November 26th, 2002

searchtools.com

Product Report: Lucene

This is an off-site copy of the corresponding Product Report page on the SearchTools.com website, designed to let you comment on the product and/or the reporting. For more information about search and search tools, visit SearchTools.com, where you can browse many articles, in-depth analyses, and overviews of external resources.

Lucene

Product Information
now part of Apache Jakarta

Platform: Java (designed for cross-platform use)
Price: free, open source, Apache Software License

Features

  • Very fast indexing, minimal RAM required
  • Index compressed to about 30% of the size of the original text
  • Indexes text and HTML, document classes available for XML, PDF and RTF
  • Search supports phrase and Boolean queries, using plus and minus signs, quotation marks, and parentheses
  • Allows single- and multiple-character wildcards anywhere in the search words, plus fuzzy and proximity search
  • Will search for punctuation such as + or ?
  • Field searches for title, author, etc., and date-range searching
  • Supports most European languages
  • Option to store and display full text of indexed documents
  • Search results in relevance order
  • APIs for file format conversion, languages and user interfaces
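
As a rough illustration of the search features listed above, the sketch below pairs each feature with a sample query string written in the syntax of Lucene's standard query parser. It does not call Lucene itself; it only prints the strings. The class name, the field names (title, author, date) and the search terms are hypothetical, and the exact syntax, particularly for date ranges, may differ between Lucene releases.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: sample query strings in Lucene query parser syntax.
// Field names (title, author, date) and search terms are made up for this
// sketch; check the syntax against the Lucene release you are using.
public class LuceneQueryExamples {

    // Maps a feature from the list above to one example query string.
    static Map<String, String> examples() {
        Map<String, String> q = new LinkedHashMap<>();
        q.put("phrase", "\"search engine\"");
        q.put("boolean with parentheses", "(lucene OR jakarta) AND NOT commercial");
        q.put("plus and minus", "+lucene -commercial");
        q.put("single-character wildcard", "te?t");
        q.put("multiple-character wildcard", "index*");
        q.put("fuzzy", "roam~");
        q.put("proximity", "\"jakarta apache\"~10");
        q.put("escaped punctuation", "C\\+\\+");          // the query text C\+\+
        q.put("field search", "title:lucene AND author:goetz");
        q.put("date range", "date:[20020101 TO 20021126]"); // assumed range form
        return q;
    }

    public static void main(String[] args) {
        // Print one "feature: query" pair per line.
        examples().forEach((feature, query) ->
            System.out.println(feature + ": " + query));
    }
}
```

In a real application these strings would be handed to Lucene's query parser together with an analyzer and a default field name; that step is omitted here because the class and method signatures depend on the Lucene version.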

Articles & Reviews

  • jGuru Lucene FAQ, jguru.com, updated as of July 2002 by Otis Gospodnetic
    Helpful information for indexing, searching, updates, configuration, etc.

  • The Lucene search engine: Powerful, flexible, and free, JavaWorld, September 2000, by Brian Goetz
    Thoughtful description of implementing the Lucene search engine for searching Eyebrowse email archives, which are stored in a MySQL database. Discusses the features in some detail, including the powerful indexing and updating scheme, and includes code snippets showing how to call the Lucene API.

Examples