October 13th, 2008

searchtools.com

Why MediaWiki Site Search Stinks: 6 Good Reasons

Wikipedia, and particularly the related sites running the same software, MediaWiki, have some of the worst site search I have ever seen. The default installation's query processing is absurdly limited, the retrieval is crippled by bad settings, the relevance is unclear, and the results page is not just ugly but contradictory and confusing.

I will be posting more detailed analyses supporting each of these statements, linked from this blog.

Summary

Default versions of the MediaWiki search engine are very nearly unusable. If you run a MediaWiki site, check the page Special:Version. If there is no mention of a search extension, then run, do not walk, to replace the site search module. At least use the MWSearch (Lucene) extension, a version of which is used on Wikipedia itself, or, better, the Sphinx search extension (which powers the New World Encyclopedia search). Your wiki readers will thank you.

What MediaWiki Default Search Does Wrong

  1. No Indexing of Short Words

More to come.

There are also some things that the MediaWiki search does right, but mostly, I've found it just stinks.

Arguments? Questions? Comments? I'm interested in other people's experiences, and may be able to deconstruct problems with the search.


No Indexing of Short Words: Why MediaWiki's Site Search Stinks, part 1

The MediaWiki search engine defaults to a four-letter minimum word length. Seriously. Not only will it not search for one-, two-, or three-letter words, they're not even in the index: they are completely unfindable. There is no way to search for perfectly reasonable words like fan, lab, qi, or pH. The bartendersdatabase.com wiki can't find rum, while Google finds 731 results on that same site (though some are duplicates).

  • To add insult to injury, the no-matches page does not explain the limitation; it just looks like there's nothing with that word on the site. For example, on the Cunnan medieval re-enactors wiki, there are no matches for new, despite the fact that they have (according to Google) 947 pages with one or more instances of the word.

    [Screenshot: no-matches page for short words]


  • This problem has been fixed on Wikipedia itself, so users can search for vi, M, REM, etc., because it's impossible to ignore that those are perfectly good words to search for. Wikipedia is running a version of the MWSearch/Lucene extension, and really, everyone ought to be. Or just switch to Sphinx.
  • If you cannot install a real search, there's a way to change the minimum word length via the instructions hidden on MetaWiki. Although there's nothing about it on the MediaWiki distribution site, it seems to be working for people.
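
To see why a minimum word length makes short words completely unfindable, rather than just badly ranked, here is a minimal Python sketch of a toy index. This is not MediaWiki's code: the function names and the sample pages are invented for illustration, and the only thing borrowed from the behavior described above is the four-letter default.

```python
import re
from collections import defaultdict


def build_index(pages, min_word_len=4):
    """Build a toy inverted index, skipping words shorter than min_word_len.

    Assumption: this only mimics the effect of a minimum word length;
    it is not how MediaWiki or its database actually builds the index.
    """
    index = defaultdict(set)
    for title, text in pages.items():
        for word in re.findall(r"[A-Za-z0-9]+", text.lower()):
            # Short words are dropped here, so they never reach the index.
            if len(word) >= min_word_len:
                index[word].add(title)
    return index


def search(index, query):
    """Return the sorted titles of pages the index lists for a single word."""
    return sorted(index.get(query.lower(), set()))


# Hypothetical pages, made up for this example.
pages = {
    "Daiquiri": "White rum, lime juice, and sugar.",
    "Mojito": "Rum, mint, lime, sugar, and soda water.",
}

default_index = build_index(pages)                  # four-letter minimum
relaxed_index = build_index(pages, min_word_len=3)  # three-letter minimum

print(search(default_index, "rum"))   # []
print(search(relaxed_index, "rum"))   # ['Daiquiri', 'Mojito']
print(search(default_index, "lime"))  # ['Daiquiri', 'Mojito']
```

With the four-letter minimum, "rum" is thrown away at indexing time, so no query can ever bring it back. That is also why lowering the minimum generally only helps once the index has been rebuilt.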

Stopword hell, coming soon

< to the main MediaWiki Site Search Stinks page


  • Current Mood
    annoyed