September 20th, 2002

searchtools.com

Product Report: mnoGoSearch (formerly UdmSearch)

This is an off-site copy of the corresponding Product report page on the SearchTools.com website, and it is designed to allow you to comment on the product and/or the reporting. For more information about the topic of search and tools visit SearchTools.com where you can browse many articles, in-depth analysis and overviews of external resources.

mnoGoSearch (formerly UdmSearch)

Product Information (Unix)

Product Information (Windows)

Platforms: Linux, FreeBSD, SunOS, Solaris, BSDI, OpenBSD, HP-UX, Digital Unix, SCO Unixware, AIX, SGI Irix, YellowDog Linux, Mac OS X and other Unix systems; Windows 98, NT and 2000
Prices: free, GNU GPL, open source in C (Unix version)
  $90 for Windows Lite version (to 3,000 pages)
  $299 for Windows Pro version

The name is pronounced "mno-go-search", and the word "mnogo" means "many" in Russian (the project was part of the Mnogo.ru portal).

Features

  • Uses a SQL database back end instead of an inverted index; supports MySQL, PostgreSQL, miniSQL, Solid, InterBase, Oracle, Sybase and MS SQL using CT-Lib, EasySoft ODBC, iODBC and unixODBC
  • Mirroring and search clustering server options.
  • Indexes FTP, NNTP (Usenet news), HTTP proxy servers as well as web pages
  • Automatic language and character-set detection; can recognize multiple single-byte character sets in a document
  • Supports many character sets including UTF-8, Chinese (BIG5 and GB2312), Korean (EUC-KR) and Japanese (Shift-JIS), some disabled by default.
  • External file format converters for PDF, PostScript, Microsoft Word .doc, etc.
  • Uncompresses gzip, compress and deflate formats.
  • Indexes keywords, description and user-defined META tags
  • Can include password authorization
  • Indexing can specify weighting for page structures
  • Queries can include Boolean operators, options for stemming, synonyms and substrings.
  • Includes ispell spellchecker.
  • Results match highlighting.
  • Includes front-end systems in PHP, CGI and Perl
  • User-editable HTML templates for search results
  • Performs query tracking for later analysis
  • Windows version
    • Includes a graphical interface to the indexer, but is not free
    • Lite version (to 3,000 documents) can use a built-in database.
    • Windows Pro version now features a Windows NT Service interface for scheduling indexing.
    • Configuration Wizard
    • Improved help system
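
To illustrate the first feature above, here is a minimal sketch of how a word-to-document table in a SQL database can answer a Boolean AND query. It uses SQLite through Python's standard library purely for illustration. The table and column names (`word_index`, `url`, `word`) and the helper functions are hypothetical and are not mnoGoSearch's actual schema or API.

```python
import sqlite3

def build_index(docs):
    """Store one (url, word) row per distinct word in each document."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema for illustration only.
    conn.execute("CREATE TABLE word_index (url TEXT, word TEXT)")
    conn.execute("CREATE INDEX idx_word ON word_index (word)")
    for url, text in docs.items():
        words = set(text.lower().split())
        conn.executemany(
            "INSERT INTO word_index (url, word) VALUES (?, ?)",
            [(url, w) for w in words],
        )
    return conn

def search_all(conn, terms):
    """Return URLs containing every term (Boolean AND), sorted by URL."""
    terms = sorted({t.lower() for t in terms})
    if not terms:
        return []
    placeholders = ",".join("?" for _ in terms)
    rows = conn.execute(
        f"SELECT url FROM word_index WHERE word IN ({placeholders}) "
        "GROUP BY url HAVING COUNT(DISTINCT word) = ? ORDER BY url",
        (*terms, len(terms)),
    )
    return [url for (url,) in rows]

docs = {
    "http://example.com/a": "open source search engine",
    "http://example.com/b": "commercial search engine",
}
conn = build_index(docs)
print(search_all(conn, ["search", "open"]))  # only page "a" has both words
```

Because the word table lives in an ordinary database, the query is plain SQL and the database's own indexes do the lookup work.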

Product Report: SWISH++

SWISH++

SWISH++ Product Information

Mailing list archive

Price: Free (open source) under the GNU License
Platforms: Any Unix with C++, STL and GNU make, and Windows 95/NT/2000 (under Cygwin)

Features

  • Indexes local files, and web sites using a robot spider based on wget
  • Indexes HTML, text, LaTeX, mail files and man pages
  • Indexes PostScript, PDF, Word, Excel and PowerPoint
  • Can index and search meta tags including Dublin Core
  • Automatically excludes frequent terms
  • Allows for incremental indexing
  • Heavily geared for English
  • Queries can use Boolean And, Or and Not, plus right truncation with a wildcard character
  • Results display shows title, file URL, size and relevance rank score
  • XML DTD and schema for search results
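
As a rough illustration of the query features listed above, the following sketch builds a small in-memory inverted index, drops very frequent terms, and evaluates And, Or and Not with right truncation. It is not SWISH++ code. The function names, the `max_fraction` threshold and the use of `*` as the wildcard character are assumptions made for this example.

```python
from collections import defaultdict

def build_index(docs, max_fraction=0.8):
    """Map each word to the set of document names containing it.

    Words found in more than max_fraction of the documents are dropped,
    a simple stand-in for automatic frequent-term exclusion.
    """
    index = defaultdict(set)
    for name, text in docs.items():
        for word in text.lower().split():
            index[word].add(name)
    limit = max_fraction * len(docs)
    return {w: names for w, names in index.items() if len(names) <= limit}

def lookup(index, term):
    """Return documents matching a term; a trailing '*' means right truncation."""
    term = term.lower()
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for word, names in index.items():
            if word.startswith(prefix):
                hits |= names
        return hits
    return set(index.get(term, set()))

docs = {
    "a.html": "the index stores words",
    "b.html": "the indexer reads the files",
    "c.html": "the search page",
}
index = build_index(docs)
and_hits = lookup(index, "index*") & lookup(index, "files")  # And
or_hits = lookup(index, "index") | lookup(index, "search")   # Or
not_hits = lookup(index, "index*") - lookup(index, "files")  # Not
print(sorted(and_hits), sorted(or_hits), sorted(not_hits))
```

Set intersection, union and difference map directly onto And, Or and Not, which is why Boolean retrieval is cheap once the word-to-document sets exist.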

Articles & Reviews

  • DB2 for Linux: Full Text Searching using SWISH++ Susa.NET, 2000 by Kevin Sangeelee
    Describes how to extract records from a database and index them using SWISH++, so that people can do simple keyword searches on database content. Includes details on the DB2 side, such as replacing the tsearch function, writing the Perl index builder script, and creating a shell script.

  • Implementing search in AOLserver using SWISH++ fifthgate.org, August 2000
    Steps for setting the appropriate configuration, running the make and install as root, indexing, testing the search, setting up the faster daemon mode, connecting to AOLserver, testing and customization. Evaluating the search found that 175 MB of text books were indexed into 13 MB in 5 minutes.

Product Report: Xapian Code Library

Xapian Code Library

Project Description

See also Brightstation Muscat

Price: free, open source, GPL
Platform: Linux (primary), Solaris

SmartLogik (previously known as Brightstation and Muscat) originally started developing this code library as open source for the next version of its high-performance text retrieval system. At the end of April 2001, the company closed down the OmSeek open source effort. Participants in that group have re-emerged with Xapian.

Features

  • Scalable to hundreds of thousands of pages
  • Full access to the index: from term to document, but also from document to terms
  • Search features include Boolean operators and phrase search
  • Probabilistic relevance algorithm
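
The two-way index access and probabilistic ranking mentioned above can be sketched in a few lines. This example does not use the Xapian API, and the report does not say which weighting formula the library applies. BM25, a widely used probabilistic scheme, is an assumption here, as are the variable names and the parameter values `k1` and `b`.

```python
import math
from collections import Counter

docs = {
    1: "text retrieval system",
    2: "open source text library",
    3: "source code library",
}

# Document -> terms (with counts), and term -> documents.
doc_terms = {d: Counter(text.split()) for d, text in docs.items()}
term_docs = {}
for d, counts in doc_terms.items():
    for term in counts:
        term_docs.setdefault(term, set()).add(d)

def bm25(query, doc_id, k1=1.2, b=0.75):
    """Score one document against a query with a common BM25 variant."""
    n_docs = len(docs)
    avg_len = sum(sum(c.values()) for c in doc_terms.values()) / n_docs
    doc_len = sum(doc_terms[doc_id].values())
    score = 0.0
    for term in query.split():
        tf = doc_terms[doc_id][term]  # Counter returns 0 for absent terms
        if tf == 0:
            continue
        df = len(term_docs[term])
        # The "+ 1.0" keeps the inverse document frequency non-negative.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
    return score

ranked = sorted(docs, key=lambda d: bm25("text retrieval", d), reverse=True)
print(term_docs["library"], sorted(doc_terms[1]), ranked[0])
```

Keeping both mappings is what makes features such as "find documents similar to this one" practical: the document-to-terms direction supplies candidate query terms, and the term-to-documents direction retrieves and scores matches.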

Product Report: Zebra

Zebra

Product Information

Platform: Unix, Windows (Win32)
Price: free, open source, GNU GPL (commercial support available)

Features

  • Indexes both free text and fielded information.
  • Indexes local files only; no robot for remote spidering.
  • Input filters for recognizing structured text in SGML, XML, MARC and other ASCII text formats.
  • Free text and Boolean queries, regular expressions and right truncation.
  • Approximate matching (for spelling mistakes).
  • Responds to Z39.50 distributed search clients.
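
For readers unfamiliar with approximate matching, the sketch below shows one standard way to tolerate spelling mistakes: accept index words within a small Levenshtein edit distance of the query term. The report does not describe how Zebra implements this feature, so the technique, the function names and the `max_errors` parameter are assumptions for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]

def approximate_matches(query, vocabulary, max_errors=1):
    """Return vocabulary words within max_errors edits of the query."""
    return [w for w in vocabulary if edit_distance(query, w) <= max_errors]

vocabulary = ["search", "seared", "zebra", "index"]
print(approximate_matches("serch", vocabulary))
```

A scan over the whole vocabulary like this is only reasonable for small word lists; production engines typically use more selective structures to find candidates.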

Articles & Reviews

  • Comparing Open Source Indexers Infomotions Musings; May 29, 2001 by Eric Lease Morgan
    Describes the history and features of eight open-source search engines: freeWAIS-sf (aging code and hard to install, but good for searching email and public domain etexts); Harvest (powerful gathering features for frequently-changing data stores, good with structured documents); ht://Dig (tricky to configure, no phrase searching, automatic stemming and match word highlighting); Isearch (weak documentation and support, easy to install, dated interface, Z39.50 support); MPS Information Server (zippy indexing of both text and structured data, Z39.50 support, Perl API, limited documentation); SWISH-E (simple-to-install engine, CGIs in Perl and PHP still beta, good for HTML pages, recognizes new META tags, sorts results by field); WebGlimpse (easy to install and configure, requires commercial version for customized output); Yaz/Zebra (mainly Z39.50, no Perl API, mainly a toolkit to index and respond to distributed client queries). The article also points out that chaotic information is less than helpful, and encourages organization, structure and vocabulary control.