September 27th, 2002

Product Report: SWISH-E

SWISH-E Product Information

Mailing list archive

SWISH-E stands for "Simple Web Indexing System for Humans - Enhanced"

Price: Free (open source) under the GNU General Public License (GPL)
Platforms: Linux, Solaris, AIX, VMS, and Windows 95/NT/2000


  • Indexes local files or web sites using a robot spider
  • Indexes and searches data in tags, including Dublin Core meta tags and XML nested fields
  • Uses external converters to index binary files, including PDF, Microsoft Word, and compressed files
  • Portable indexes can be moved to other machines
  • Search allows Boolean And, Or, Not, and parentheses
  • Fuzzy matching including truncation and stemming
  • Sort results by relevance, date, size, other fields
  • Code library with API provided
  • Version 2.2, September 2002
    • External document indexing option, easy to add special indexing gatherers for databases, CMSs, etc.
    • New XML parsers: expat or libxml2
    • Improved filtering of binary file formats
    • Ignores text between <!-- noindex --> and <!-- index --> comments, and follows meta robots instructions
    • Special case indexing for "buzzwords" - complex terms including punctuation such as C++ and SWISH-E
    • Much faster indexing and searching
    • Searching, merging, and ranking results from multiple indexes
    • Improved security in temporary files and parameter checking
    • Search results layout can be edited directly or via the Perl HTML::Template
    • Result page match words highlighted in context
    • Windows binary in installer package
    • Extended documentation
    • Note: be sure to re-index after installing the new version; old indexes are not supported.
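
The <!-- noindex --> / <!-- index --> feature listed above can be pictured with a small sketch. This is not SWISH-E's own code: the function name strip_noindex and the sample page are hypothetical, and the handling of an unterminated noindex block (dropping everything to the end of the document) is an assumption made for illustration only.

```python
import re

# Hypothetical helper, not part of SWISH-E. Matches a <!-- noindex --> comment
# and everything up to the next <!-- index --> comment. If no closing comment
# exists, the match runs to the end of the document (an assumption).
NOINDEX_BLOCK = re.compile(
    r"<!--\s*noindex\s*-->.*?(?:<!--\s*index\s*-->|\Z)",
    re.IGNORECASE | re.DOTALL,
)


def strip_noindex(html: str) -> str:
    """Return the page text with noindex blocks removed before indexing."""
    return NOINDEX_BLOCK.sub("", html)


# Made-up sample page: the navigation menu should be kept out of the index.
page = (
    "<p>Product overview</p>"
    "<!-- noindex --><div>Navigation menu</div><!-- index -->"
    "<p>Release notes</p>"
)
print(strip_noindex(page))
```

A page author would wrap repeated navigation or footer text in these comments so that only the main content contributes search terms.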

Articles & Reviews

  • Comparing Open Source Indexers, Infomotions Musings; May 29, 2001, by Eric Lease Morgan
    Describes the history and features of eight open-source search engines: freeWAIS-sf (aging code and hard to install, but good for searching email and public domain etexts); Harvest (powerful gathering features for frequently changing data stores, good with structured documents); ht://Dig (tricky to configure, no phrase searching, automatic stemming and match word highlighting); Isearch (weak documentation and support, easy to install, dated interface, Z39.50 support); MPS Information Server (zippy indexing of both text and structured data, Z39.50 support, Perl API, limited documentation); SWISH-E (simple-to-install engine, CGIs in Perl and PHP still beta, good for HTML pages, recognizes new META tags, sorts results by field); WebGlimpse (easy to install and configure, requires commercial version for customized output); Yaz/Zebra (mainly Z39.50, no Perl API, mainly a toolkit to index and respond to distributed client queries). The article also points out that chaotic information is less than helpful and encourages organization, structure, and vocabulary control.

Example Sites