Describes the FullTextSQL of SharePoint Search vs. the FQL of FAST search, and other practical issues.
Excellent description of the tricky issues in near-duplicate detection, which is in some ways harder on Twitter because of the 140-character limit. Describes tokenization approaches, the Jaccard distance measure for similarity, and what the authors call "entropy": how some collections are more uniform than others.
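For readers unfamiliar with Jaccard distance, here is a minimal sketch of the general idea, not the code from the article described above. The tokenizer is an assumption (a simple lowercase word split that keeps # and @ for hashtags and mentions), and the function names are hypothetical. The distance is 1 minus the size of the intersection of the two token sets divided by the size of their union.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens.

    Assumed scheme for illustration: keeps letters, digits, apostrophes,
    and the # and @ characters so hashtags and mentions stay intact.
    """
    return set(re.findall(r"[a-z0-9#@']+", text.lower()))


def jaccard_distance(a, b):
    """Return 1 - |A & B| / |A | B| over the token sets of two strings.

    0.0 means the token sets are identical; 1.0 means they share nothing.
    Two empty strings are treated as identical (distance 0.0).
    """
    tokens_a, tokens_b = tokenize(a), tokenize(b)
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return 1.0 - len(tokens_a & tokens_b) / len(union)


if __name__ == "__main__":
    first = "Breaking: search engine released today"
    second = "BREAKING search engine released today!!"
    # Case and punctuation differ, but the token sets are the same.
    print(jaccard_distance(first, second))
```

With messages this short, a single differing token shifts the score a great deal, which is one reason the length limit makes near-duplicate detection tricky. A threshold for "near duplicate" would have to be tuned on real data.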
Lucid Imagination is a vendor of services and support for the open-source Lucene/Solr search engine code. Their new LucidWorks Enterprise package has three main parts:
- An easy installer and updater for the current release version of Lucene/Solr. This also makes installing distributed indexes and additional search servers much easier.
- Additional features that any modern search engine should have, without each developer having to compile and configure multiple packages. These include data sources using web robot, file system, and database accessors, smart defaults in query parsing, some access control, logical query processing rules, auto-complete, spellchecking, synonyms, faceted navigation, and a clean results page design.
- Ways to configure the search (rather than learning Solr calls and config files):
  - A clear RESTful API, which makes calling search very easy for application programmers.
  - An interactive browser admin interface for the people running search who are less sysadmins than librarians, information architects, usability experts, or site producers. It's an early version and there are still glitches, but it's now possible for non-programmers to get a Solr search up and running.
The LWE (LucidWorks Enterprise) package is free during development, although paid support is available; payment is required for production deployment.
All of these features round out Solr so that it can now compete with the most prominent enterprise search engines, including the Google Search Appliance, SharePoint and FAST search, Vivisimo, Coveo, Exalead, Attivio, and Endeca; even Autonomy now has a browser-based admin interface. Service vendor competitors SearchBlox and Constellio have packages for their versions of Lucene/Solr, and I will be reviewing and comparing all of them in some depth.
Disclosure: I wrote a white paper for Lucid Imagination, and critiqued early versions of the LWE.