SearchTools Blog
No Indexing of Short Words: Why MediaWiki's Site Search Stinks, part 1 
13th-Oct-2008 06:06 pm
searchtools.com

The MediaWiki search engine defaults to a four-letter minimum word length. Seriously. Not only will it not search for one-, two-, or three-letter words, they're not even in the index: they are completely unfindable. There is no way to search for perfectly reasonable words like fan, lab, qi, or pH. The bartendersdatabase.com wiki can't find rum, while Google finds 731 matches on that same site (though some are duplicates).

  • To add insult to injury, the no-matches page does not explain this limitation; it just looks like there's nothing with that word on the site. For example, on the Cunnan medieval re-enactors wiki, there are no matches for new, even though they have (according to Google) 947 pages with one or more instances of the word.

    [Image: no-matches page for short words]


  • This problem has been fixed on Wikipedia itself, so users can search for vi, M, REM, etc., because it's impossible to ignore that those are perfectly good words to search for. Wikipedia is running a version of the MWSearch/Lucene extension, and really, everyone ought to be. Or just switch to Sphinx.
  • If you cannot install a real search, there's a way to change the minimum word length via the instructions hidden on MetaWiki. Although there's nothing about it on the MediaWiki distribution site, it seems to be working for people.
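
To make the problem concrete, here is a rough Python sketch of a toy inverted index with a minimum word length. It is not MediaWiki's code, and the function names (tokenize, build_index, search) and the sample pages are invented for illustration. It shows why a dropped word is unfindable, how a search could at least report which query words were too short, and why lowering the minimum presumably only helps once the index has been rebuilt.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, min_word_len=4):
    """Map each word of at least min_word_len characters to the pages containing it."""
    index = defaultdict(set)
    for title, text in pages.items():
        for word in tokenize(text):
            if len(word) >= min_word_len:  # shorter words never reach the index
                index[word].add(title)
    return index


def search(index, query, min_word_len=4):
    """Return (matching titles, query words too short to have been indexed)."""
    words = tokenize(query)
    too_short = [w for w in words if len(w) < min_word_len]
    usable = [w for w in words if len(w) >= min_word_len]
    if not usable:
        return set(), too_short
    hits = set.intersection(*(index.get(w, set()) for w in usable))
    return hits, too_short


# Hypothetical pages, made up for this example.
pages = {
    "Daiquiri": "White rum, lime juice and sugar.",
    "Mojito": "Rum, mint, lime and soda water.",
    "Martini": "Gin and dry vermouth.",
}

index = build_index(pages)                    # four-letter minimum
print(search(index, "rum"))                   # (set(), ['rum'])
print(search(index, "lime"))                  # both rum drinks match

rebuilt = build_index(pages, min_word_len=1)  # lower minimum, index rebuilt
print(search(rebuilt, "rum", min_word_len=1))
```

The second value returned by search is the part a no-matches page could show to the user: instead of implying the word appears nowhere on the site, it could say that "rum" was too short to be indexed.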

Stopword hell, coming soon

< to the main MediaWiki Site Search Stinks page

Arguments? Questions? Comments? I'm interested in other people's experiences, and may be able to deconstruct problems with the search.

Comments 
15th-Feb-2011 06:44 pm (UTC)
Anonymous
Straight to the point and well written! Why can’t everyone else be like this?
15th-Feb-2011 11:04 pm (UTC)
I'm glad you like it, it hurts me to see such bad search implementation.
20th-Mar-2011 08:18 pm (UTC)
Anonymous
Awesome post. Really enjoyed reading your blog posts.