October 16th, 2008


Provide Reasonable Search Functionality (MediaWiki Search Analysis)

Why MediaWiki's Site Search Stinks, Reason #3

The default behavior of the MediaWiki search engine is to find only pages which match every word in the search query (find all). When there are quotes around terms in the query, it will only find pages with those terms as a phrase, which is nice. Unlike other search engines, however, there's no way to override this behavior.
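To make that default concrete, here is a minimal Python sketch of "find all" matching with quoted phrases. It is a toy built on assumptions, not MediaWiki's actual code: the tokenizer, the function names and the sample pages are all hypothetical.

```python
import re

def tokenize(text):
    """Lowercase text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_query(query):
    """Separate quoted phrases from bare words."""
    phrases = [tokenize(p) for p in re.findall(r'"([^"]*)"', query)]
    bare_words = tokenize(re.sub(r'"[^"]*"', " ", query))
    return bare_words, [p for p in phrases if p]

def contains_phrase(tokens, phrase):
    """True if the phrase's words occur consecutively in tokens."""
    n = len(phrase)
    return any(tokens[i:i + n] == phrase for i in range(len(tokens) - n + 1))

def find_all(pages, query):
    """'Find all' behaviour: every bare word and every phrase must match."""
    words, phrases = parse_query(query)
    hits = []
    for title, text in pages.items():
        tokens = tokenize(text)
        vocabulary = set(tokens)
        if all(w in vocabulary for w in words) and all(
            contains_phrase(tokens, p) for p in phrases
        ):
            hits.append(title)
    return hits

# Hypothetical sample pages standing in for a recipe wiki.
PAGES = {
    "Christmas pudding": "Mix flour, suet and raisins for a rich Christmas pudding.",
    "Plum pudding": "An old name for Christmas pudding, made with raisins.",
    "Bread pudding": "Stale bread pudding with raisins, a Christmas favourite.",
}

print(find_all(PAGES, "christmas pudding"))    # all three pages
print(find_all(PAGES, '"christmas pudding"'))  # only the first two
```

Note that nothing in this design lets the user loosen the AND: every word is always required, which is exactly the limitation the rest of this post is about.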

Can't exclude terms from search results

There is no way to exclude pages containing a term from the search results: for example, to find a Christmas pudding recipe without flour on recipes.wikia.com. The MediaWiki search supports neither a Boolean NOT operator nor a minus sign (-) before a word, a query convention in web search engines going back to AltaVista and HotBot. So the search can't limit the results to pages about cheese that don't include the word processed on a recipe site, pages about Unix that don't mention Linux, or, on cdwiki.de, pages about chant without Gregorian.
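The missing exclusion operator is not hard to express. This hypothetical Python sketch accepts both a leading minus and an uppercase NOT; the function names and sample recipe pages are invented for illustration, and a real engine would apply the same set logic to an inverted index rather than rescanning every page.

```python
import re

def words_in(text):
    """Set of lowercase alphanumeric words in text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def parse_exclusions(query):
    """Split a query into required and excluded words.

    Two conventions are accepted: a leading minus ('-processed') and an
    uppercase Boolean NOT before a term ('NOT processed').
    """
    required, excluded = set(), set()
    negate_next = False
    for raw in query.split():
        if raw == "NOT":
            negate_next = True
            continue
        if raw.startswith("-") and len(raw) > 1:
            excluded |= words_in(raw[1:])
        elif negate_next:
            excluded |= words_in(raw)
        else:
            required |= words_in(raw)
        negate_next = False
    return required, excluded

def search_excluding(pages, query):
    """Pages containing every required word and none of the excluded ones."""
    required, excluded = parse_exclusions(query)
    hits = []
    for title, text in pages.items():
        vocabulary = words_in(text)
        if required <= vocabulary and not (excluded & vocabulary):
            hits.append(title)
    return hits

# Hypothetical recipe pages.
PAGES = {
    "Cheese sauce": "Melt cheddar cheese into a roux.",
    "Cheese dip": "Processed cheese melted with salsa.",
    "Mac and cheese": "Pasta baked with cheese and breadcrumbs.",
}

print(search_excluding(PAGES, "cheese -processed"))     # sauce and mac
print(search_excluding(PAGES, "cheese NOT processed"))  # same result
```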

No way to search for one or more among several terms

This search engine does not support the Boolean OR operator, which web search engines have sometimes implemented as a comma (,) or as a radio button indicating that the search should find pages matching any of the search terms. There are quite a few situations in which finding pages with synonyms or variant spellings would be very convenient, but MediaWiki can't do that. So there's no way to search in one pass for either currant or raisin on a recipe site, for German, Austrian or Deutsche music on cdwiki.de, or, on technical sites, for Red Hat or redhat, or for perl or PHP. The user has to enter each search separately.
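A sketch of OR handling, again in Python and again hypothetical: the query is split on an uppercase OR or a comma into alternatives, and a page matches if it contains every word of at least one alternative. That also covers the Red Hat versus redhat case, where one alternative is two words.

```python
import re

def words_in(text):
    """List of lowercase alphanumeric words in text."""
    return re.findall(r"[a-z0-9]+", text.lower())

def parse_alternatives(query):
    """Split on ' OR ' or ',' into alternatives, each an AND of its words."""
    parts = re.split(r"\s+OR\s+|,", query)
    return [set(words_in(p)) for p in parts if words_in(p)]

def search_any(pages, query):
    """Pages matching all the words of at least one alternative."""
    alternatives = parse_alternatives(query)
    hits = []
    for title, text in pages.items():
        vocabulary = set(words_in(text))
        if any(alt <= vocabulary for alt in alternatives):
            hits.append(title)
    return hits

# Hypothetical pages from a technical wiki.
PAGES = {
    "Install on Red Hat": "Steps for Red Hat Enterprise Linux.",
    "Redhat packages": "Building RPMs on redhat systems.",
    "Debian setup": "Using apt on Debian.",
}

print(search_any(PAGES, "red hat OR redhat"))  # first two pages
print(search_any(PAGES, "redhat, debian"))     # comma works the same way
```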

Does not search for plural or other versions of words

Most full-text search engines automatically expand the query, or the matching, to cover multiple forms of words. This is particularly useful for plurals: many users simply assume they are getting both singular and plural forms, and often multiple forms of verbs as well. The MediaWiki search does none of this. On the recipe wiki, users have to do two searches to find both raisin and raisins; on the Knoppix wiki, a separate search for each form of the word: install, installs, installer, installing, installed. On the Tolkien Gateway, it's not enough to search for dwarf; one also has to perform separate searches for dwarfs and dwarves to find the appropriate pages.
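Matching several word forms is usually done with a stemmer applied to both the query and the indexed text. The sketch below uses a deliberately crude suffix stripper (remove -ing, -ed, -er and -s repeatedly, keeping at least three letters) plus a hand-maintained table for irregular forms such as dwarves. It is an assumption for illustration, not the Porter stemmer and not anything MediaWiki ships; it will mangle many words that these examples don't exercise.

```python
import re

# Hypothetical hand-maintained table for forms suffix rules can't reach.
IRREGULAR = {"dwarves": "dwarf"}
# Crude suffix rules tried in order (NOT the Porter stemmer).
SUFFIXES = ("ing", "ed", "er", "s")
MIN_STEM = 3

def stem(word):
    """Reduce a word to a rough stem by repeated suffix stripping."""
    word = word.lower()
    if word in IRREGULAR:
        return IRREGULAR[word]
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if suffix == "s" and word.endswith("ss"):
                continue  # leave words like 'process' alone
            if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
                word = word[: -len(suffix)]
                changed = True
                break
    return word

def stems_in(text):
    """Set of stems for every word in text."""
    return {stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}

def search_stemmed(pages, query):
    """AND search that compares stems instead of exact words."""
    wanted = stems_in(query)
    return [title for title, text in pages.items() if wanted <= stems_in(text)]

# Hypothetical pages from a Tolkien wiki.
PAGES = {
    "Dwarves": "The dwarves of the Lonely Mountain.",
    "Thorin": "Thorin was a dwarf king.",
    "Elves": "The elves of Lothlorien.",
}

print([stem(w) for w in ["install", "installs", "installer", "installing", "installed"]])
print(search_stemmed(PAGES, "dwarfs"))  # finds both dwarf pages
```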

No truncation or wildcards

Many database and regular-expression-based search engines perform substring searches, so a search for Thai can also find Thailand. Log analysis shows that many people stop partway through when they don't know how to spell a long word, so they type ratat and hope the search engine can guess they mean Ratatouille. Sometimes truncation must be requested explicitly, usually with an asterisk (*). Search engines built on inverted indexes generally support only truncation at the end of a word (as in ratat*), while some engines with more structured data fields can also handle wildcards at the beginning or in the middle of a word.
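Why end-of-word truncation is the cheap case can be sketched in Python: if the term dictionary is kept sorted, a trailing asterisk becomes a binary search for the first matching term followed by a short forward scan, while a wildcard at the start or in the middle forces a check of every term. The index layout and function names here are hypothetical.

```python
import bisect
import fnmatch
import re

def build_index(pages):
    """Map each lowercase word to the set of page titles containing it."""
    index = {}
    for title, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(title)
    return index

def expand_prefix(sorted_terms, prefix):
    """Binary-search a sorted term list, then scan while terms share prefix."""
    start = bisect.bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break
        matches.append(term)
    return matches

def expand_pattern(terms, pattern):
    """Leading or middle wildcards ('*land'): every term must be tested."""
    return [t for t in terms if fnmatch.fnmatchcase(t, pattern)]

def truncation_search(index, query):
    """Resolve a query with optional wildcards to the sorted titles it matches."""
    query = query.lower()
    terms = sorted(index)  # re-sorted per call for brevity only
    if query.endswith("*") and "*" not in query[:-1] and "?" not in query:
        expanded = expand_prefix(terms, query[:-1])
    elif "*" in query or "?" in query:
        expanded = expand_pattern(terms, query)
    else:
        expanded = [query] if query in index else []
    titles = set()
    for term in expanded:
        titles |= index[term]
    return sorted(titles)

# Hypothetical recipe pages.
PAGES = {
    "Ratatouille": "Ratatouille is a stewed vegetable dish.",
    "Thai curry": "A green curry from Thailand.",
    "Pad thai": "Thai rice noodles.",
}
INDEX = build_index(PAGES)

print(truncation_search(INDEX, "ratat*"))  # ['Ratatouille']
print(truncation_search(INDEX, "thai*"))   # matches thai and thailand
```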

While the MediaWiki application itself can run SQL queries on the database internally, I can find no way for end-users to access that functionality.
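For comparison, the substring search a database offers is short to write in SQL. This Python sketch uses the standard sqlite3 module with a hypothetical pages(title, body) table, which is not MediaWiki's schema. A LIKE pattern with a leading % can't use an ordinary index, so it scans every row.

```python
import sqlite3

def substring_search(conn, fragment):
    """Titles of pages whose title or body contains fragment anywhere."""
    # Escape LIKE's own wildcards so user input is matched literally.
    escaped = (
        fragment.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    pattern = f"%{escaped}%"
    # SQLite's LIKE is case-insensitive for ASCII letters by default.
    rows = conn.execute(
        "SELECT title FROM pages "
        "WHERE title LIKE ? ESCAPE '\\' OR body LIKE ? ESCAPE '\\' "
        "ORDER BY title",
        (pattern, pattern),
    )
    return [title for (title,) in rows]

# Hypothetical in-memory table; MediaWiki's real schema is different.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO pages (title, body) VALUES (?, ?)",
    [
        ("Ratatouille", "A stewed vegetable dish."),
        ("Thai curry", "A green curry from Thailand."),
        ("Pad thai", "Rice noodles."),
    ],
)

print(substring_search(conn, "ratat"))  # ['Ratatouille']
print(substring_search(conn, "thai"))   # both Thai pages
```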
