October 14th, 2008


Stop Words Must Go (WikiMedia Search Analysis)

The MediaWiki search excludes 547 words as stop words by default. But they're perfectly good words (you can see them on the searchtools site). The list is a MySQL full-text search default, and the MediaWiki people have never changed it. Exactly like the short words in the previous rant, these words are not indexed at all, so the search engine can never retrieve them. Stop words include: able, about, above, according, across, actually, after... So a site search made up only of one or more of those words returns "No page text matches", even when there are pages containing those words.
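The following is a minimal Python sketch of an inverted index that drops stop words when it builds the index. It roughly mimics the behavior described above, but it is not MySQL's or MediaWiki's code. The `STOPWORDS` set is only a tiny subset of the list quoted above. The function names, the sample pages, and the AND semantics for multi-word queries are hypothetical, chosen for illustration.

```python
import re
from collections import defaultdict

# Hypothetical: a tiny subset of MySQL's default full-text stop word list,
# not the full list.
STOPWORDS = {"able", "about", "above", "according", "across", "actually", "after"}


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages, stopwords=STOPWORDS):
    """Map each indexed word to the set of page titles containing it.

    Stop words are skipped here, so they never enter the index at all.
    """
    index = defaultdict(set)
    for title, text in pages.items():
        for word in tokenize(text):
            if word in stopwords:
                continue  # never indexed, so never retrievable
            index[word].add(title)
    return index


def search(index, query, stopwords=STOPWORDS):
    """Return pages containing every non-stop-word query term (AND semantics)."""
    terms = [w for w in tokenize(query) if w not in stopwords]
    if not terms:
        return set()  # the "No page text matches" case
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


# Hypothetical sample pages.
pages = {
    "Help:Moving": "Read about moving pages after a rename.",
    "Policy": "Editors are able to revert changes.",
}

index = build_index(pages)
print(search(index, "about"))   # set()
print(search(index, "moving"))  # {'Help:Moving'}

# With the stop word list emptied, "about" becomes findable.
full_index = build_index(pages, stopwords=set())
print(sorted(search(full_index, "about", stopwords=set())))  # ['Help:Moving']
```

The stop word filter runs at indexing time, not just at query time. So relaxing the query side alone would not help: the words are simply absent from the index, and the index has to be rebuilt without the list before they can be found.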
