February 1st, 2007


DataparkSearch - SearchTools Report Update

DataparkSearch is a free, open-source search engine written in C by some smart people in Russia as an offshoot of the mnoGoSearch project.

The biggest strength of this application is how well it handles languages and character sets. It supports internationalized domain names and proper word segmentation (tokenization) in many languages, including Chinese, Japanese, Korean and Thai. It can perform language detection on both text files and user queries. Spellchecking and query expansion with abbreviations and synonyms are handled on a per-language basis.

This search engine has some fancy Information Retrieval features like fuzzy searching, Boolean queries, and its own "Neo popularity ranking" based on neural network research and link analysis. Results templates include several innovative options other than simple listings (I'm not sure if I like them, but they're interesting). It also caches the index files, search templates and the code, and can distribute indexes and search servers among multiple machines for better response time.

However, the code is distributed in source form for local compilation, and all the features are set via config files and runtime parameters, so it requires some comfort with the command line and programming tools. On the other hand, there's an excellent manual and an active forum of users and developers.
