No Indexing of Short Words: Why MediaWiki's Site Search Stinks, part 1 
13th-Oct-2008 06:06 pm

The MediaWiki search engine defaults to a four-letter minimum word length. Seriously. It doesn't just skip one-, two-, and three-letter words when searching; they aren't even in the index, so they are completely unfindable. There is no way to search for perfectly reasonable words like fan, lab, qi, or pH. The bartendersdatabase.com wiki can't find rum, while Google finds 731 results for it on that same site (though some are duplicates).
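To see why short words are unfindable rather than merely ranked low, here is a toy model of an index with a minimum word length. It is a sketch, not MediaWiki's code: it assumes the limit behaves like MySQL's ft_min_word_len (default 4), it ignores stopwords and MySQL's real word parser, and the page titles and text are made-up sample data. Words under the limit are dropped at indexing time, so no query can ever retrieve them. The search function also reports which query terms fell below the limit, the kind of hint a no-matches page could show.

```python
import re
from collections import defaultdict

# Assumed threshold, mirroring MySQL's default ft_min_word_len = 4.
MIN_WORD_LEN = 4


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str], min_len: int = MIN_WORD_LEN) -> dict[str, set[str]]:
    """Map each indexed word to the set of page titles containing it.

    Words shorter than min_len are discarded here, so they never reach the index.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for title, text in pages.items():
        for word in tokenize(text):
            if len(word) >= min_len:
                index[word].add(title)
    return dict(index)


def search(index: dict[str, set[str]], query: str,
           min_len: int = MIN_WORD_LEN) -> tuple[set[str], list[str]]:
    """Return (titles matching every searchable term, terms too short to be indexed)."""
    terms = tokenize(query)
    too_short = [t for t in terms if len(t) < min_len]
    searchable = [t for t in terms if len(t) >= min_len]
    if not searchable:
        # Nothing left to look up: a silent empty result is exactly the problem.
        return set(), too_short
    results = set.intersection(*(index.get(t, set()) for t in searchable))
    return results, too_short


if __name__ == "__main__":
    # Hypothetical sample pages, not data from any real wiki.
    pages = {
        "Daiquiri": "White rum, lime juice and sugar.",
        "Mojito": "Rum, mint, lime and soda water.",
    }
    index = build_index(pages)
    print(search(index, "rum"))   # no matches, and "rum" is reported as too short
    print(search(index, "lime"))  # both pages match
```

Rebuilding with build_index(pages, min_len=1) and searching with the same min_len makes rum findable again, which is why lowering the limit also means re-indexing, not just flipping a setting.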

  • To add insult to injury, the no-matches page does not explain this limitation; it just looks as if nothing on the site contains that word. For example, the Cunnan medieval re-enactors wiki returns no matches for new, even though (according to Google) it has 947 pages containing the word at least once.

    no-matches page for short words

  • This problem has been fixed on Wikipedia itself, so users can search for vi, M, REM, etc., because it's impossible to ignore that those are perfectly good words to search for. Wikipedia runs a version of the MWSearch/Lucene extension, and really, everyone ought to. Or just switch to Sphinx.
  • If you cannot install a real search engine, there's a way to change the minimum word length, following instructions hidden away on Meta-Wiki. Although the MediaWiki distribution site says nothing about it, it seems to be working for people.
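Assuming the wiki uses the default MySQL backend with a MyISAM full-text index, lowering the limit generally means setting MySQL's ft_min_word_len server option under [mysqld] in my.cnf, restarting MySQL, and then rebuilding the full-text index. The sketch below covers only the first step. set_min_word_len is a hypothetical helper, and it assumes your my.cnf parses as plain INI with section headers (no !include lines before the first section). Note that configparser drops comments when it writes the file back, so keep a backup of the original.

```python
import configparser
import io


def set_min_word_len(cnf_text: str, min_len: int) -> str:
    """Return a copy of my.cnf text with ft_min_word_len set under [mysqld].

    Assumption: the text is plain INI; bare flags such as
    'skip-external-locking' are tolerated via allow_no_value.
    Comments in the input are NOT preserved in the output.
    """
    parser = configparser.ConfigParser(allow_no_value=True)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(cnf_text)
    if not parser.has_section("mysqld"):
        parser.add_section("mysqld")
    parser.set("mysqld", "ft_min_word_len", str(min_len))
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    # Hypothetical minimal config; a real my.cnf will have more settings.
    sample = "[mysqld]\nport = 3306\nskip-external-locking\n"
    print(set_min_word_len(sample, 2))
```

A limit of 2 covers qi and pH; 1 would also cover M. After restarting MySQL, the existing index still lacks the short words. MySQL's documentation suggests REPAIR TABLE ... QUICK for MyISAM tables, which for MediaWiki would be the searchindex table (with any $wgDBprefix prepended); alternatively, MediaWiki's maintenance/rebuildtextindex.php rebuilds it.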

Stopword hell, coming soon

< to the main MediaWiki Site Search Stinks page

Arguments? Questions? Comments? I'm interested in other people's experiences, and may be able to deconstruct problems with the search.

15th-Feb-2011 06:44 pm (UTC)
Straight to the point and well written! Why can’t everyone else be like this?
15th-Feb-2011 11:04 pm (UTC)
I'm glad you like it; it hurts me to see such bad search implementations.
20th-Mar-2011 08:18 pm (UTC)
Awesome post. Really enjoyed reading your blog posts.
