June 14th, 2003

searchtools.com

Product Report: Datapark Search Engine

This is an off-site copy of the corresponding Product report page on the SearchTools.com website, and it is designed to allow you to comment on the product and/or the reporting. For more information about the topic of search and tools visit SearchTools.com where you can browse many articles, in-depth analysis and overviews of external resources.

Datapark Search Engine

Product Information

  • Datapark Search is an open source search engine.

Platform: Apache
Price: free

Features (Search and Retrieval)

  • Effective caching significantly reduces search times.
  • Supports http, https (SSL), ftp, nntp, and news URLs, plus htdb virtual URLs for indexing SQL databases, and handles the html, xml, plain, mpeg, and gif MIME types.
  • Option to query with all words, any words, or boolean queries.
  • Supports synonym lists and stopwords.
  • Indexes multilingual sites using content negotiation, and uses ispell affixes and dictionaries.
  • Multiple character sets supported.
  • Accent insensitive search.
  • Phrase segmenting for Chinese, Japanese, Korean, and Thai languages.
  • Open source web-based search engine released under the GNU General Public License (GPL).
  • Includes an indexer and a web CGI front-end.
  • Supports external parsers.
  • Results can be sorted by relevancy, popularity rank, last modified time, or importance (the product of relevancy and popularity rank).
  • Can scale to more than 300,000 pages (based on one example).
  • The DataparkSearch Reference Manual is well done and is also available in Russian.
  • Active and searchable forums in English and Russian.
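
The "importance" sort order described above can be illustrated with a minimal Python sketch. This is not DataparkSearch code: the Result class, the score values, and the sort_results helper are hypothetical, and the report does not say how the engine computes or scales relevancy and popularity rank. The sketch only shows importance as the product of the two scores.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical search hit; field names and values are assumptions."""
    url: str
    relevancy: float   # assumed to be a score where higher is better
    popularity: float  # assumed popularity rank score, higher is better

    @property
    def importance(self) -> float:
        # Importance as described in the report: relevancy times popularity.
        return self.relevancy * self.popularity


def sort_results(results, key="importance"):
    """Return results ordered from highest to lowest by the chosen score."""
    return sorted(results, key=lambda r: getattr(r, key), reverse=True)


results = [
    Result("a.example/1", relevancy=0.9, popularity=0.2),
    Result("b.example/2", relevancy=0.6, popularity=0.5),
    Result("c.example/3", relevancy=0.4, popularity=0.9),
]

for r in sort_results(results):
    print(r.url, round(r.importance, 2))
```

With these made-up scores, the most relevant page (a.example/1) drops to last place once popularity is factored in, which is the practical difference between sorting by relevancy and sorting by importance.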

Examples:

  • Sochi's Internet Search - Sochi is a resort town in southwestern Russia on the Black Sea. Its search engine has about 100,000 pages.
  • 43°N 39°E - Set up specifically to search English, Russian, German, and French sites.
  • News Lookup Service - Crawls news sites on the web and lets you search them by media type, region, and/or different parts of a page (title, body, etc.). Results can be sorted by relevancy or last modified date. News can also be browsed by region or topic.