July 11th, 2008


Sphinx (open source free search engine): New SearchTools Report

Sphinx is an open source search engine, written in C++, that uses both SQL databases and its own custom index files to provide very fast text search. The architecture scales to over a billion records by distributing the index and the querying across multiple virtual and real processors.

While it does full-text search, Sphinx is designed to work with structured content (music lyrics, products) and semi-structured content (RSS feeds, blog posts, magazine articles). Sphinx is much faster and more flexible than the built-in SQL clauses such as WHERE, ORDER BY, and GROUP BY. Because it works with this structure, it can display results with faceted metadata: for example, the widepress.com results show graphical facets including country, source, theme and date.
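To make the idea of faceted results concrete, here is a minimal sketch in plain Python of what a facet display has to compute: a count of matching documents per value of each metadata field. This is not the Sphinx API. The function name, the field names, and the sample result set are all hypothetical, and a real engine would compute these counts inside the index rather than in application code.

```python
from collections import Counter

def facet_counts(results, facet_fields):
    """Count how many results carry each value of each facet field."""
    counts = {field: Counter() for field in facet_fields}
    for doc in results:
        for field in facet_fields:
            value = doc.get(field)
            if value is not None:  # skip documents missing this facet
                counts[field][value] += 1
    return counts

# Hypothetical search results with structured metadata attached.
results = [
    {"title": "A", "country": "France", "theme": "politics"},
    {"title": "B", "country": "France", "theme": "sports"},
    {"title": "C", "country": "Japan", "theme": "politics"},
]

facets = facet_counts(results, ["country", "theme"])
for field, counter in facets.items():
    print(field, dict(counter))
```

Each counter can then be rendered next to the result list as a clickable facet, with the count showing how many results remain if the user narrows by that value.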

Sphinx does not have a robot crawler, although it can accept input in XML, which can be generated by a crawler. It connects directly to MySQL and PostgreSQL, and has web scripts for external sources. APIs are available in PHP, Python, Java, Perl and Ruby.
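As a rough illustration of feeding crawler output to Sphinx as XML, the sketch below turns a list of crawled pages into a document set. It assumes the layout of Sphinx's xmlpipe2 format (a `sphinx:docset` wrapper, a `sphinx:schema` listing the fields, and one `sphinx:document` per page); check the Sphinx documentation for the exact schema your version accepts. The function name, the field names, and the sample page are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

def to_xml_docset(pages, fields=("title", "body")):
    """Build an XML document set from crawled pages (assumed xmlpipe2 layout)."""
    lines = [
        '<?xml version="1.0" encoding="utf-8"?>',
        "<sphinx:docset>",
        "<sphinx:schema>",
    ]
    # Declare each full-text field once in the schema.
    for name in fields:
        lines.append("<sphinx:field name=%s/>" % quoteattr(name))
    lines.append("</sphinx:schema>")
    # Emit one document element per page, escaping markup in the text.
    for page in pages:
        lines.append("<sphinx:document id=%s>" % quoteattr(str(page["id"])))
        for name in fields:
            lines.append("<%s>%s</%s>" % (name, escape(page.get(name, "")), name))
        lines.append("</sphinx:document>")
    lines.append("</sphinx:docset>")
    return "\n".join(lines)

# Hypothetical crawler output: one page with characters that need escaping.
pages = [{"id": 1, "title": "Fish & chips", "body": "A <b>crawled</b> page."}]
xml_text = to_xml_docset(pages)
print(xml_text)
```

A crawler script would print this text to standard output, so the indexer can read the stream without an intermediate database.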

Read more and tell me about your Sphinx experience here.