SearchTools Blog (searchtools) wrote,

Search for Words with Multiple Meanings

A search engine claims to create lists of context-based homonyms, which sounds like a darn good idea to me. But then I started thinking about the nature of spelling and language, especially in English, which stems from so many sources (including Norse, who would guess?).

The classic IR example of an ambiguous search term is "bank" -- does that mean "financial institution", "to store something", "side of a river", "airplane maneuver" or what? How should the search engine handle this situation? It gets even more complex when names, slang, acronyms and abbreviations are added to the mix. Does a person searching for "coke" want to find the cola, the drug, or the form of coal? How about searching for "freddy mac" or "jones" or "ARIA"?

There have been several different approaches to address this problem.

Research-oriented information retrieval often takes a cluster approach, trying to group like elements by concept. Visualization tools use various graphical displays to help researchers see the relationships among these ideas.

Some search engines show other words frequently found in the same locations, to encourage searchers to choose one of the meanings.
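
As a rough illustration of that idea, here is a minimal sketch in Python. It assumes a tiny in-memory list of documents and a hand-picked stopword list (DOCS, STOPWORDS and cooccurring_terms are made-up names for this example, not any engine's actual API). It counts which other words show up in the same documents as the query term, so the top results can be offered as hints toward one meaning.

```python
from collections import Counter
import re

# Hypothetical toy corpus; a real engine would consult its index instead.
DOCS = [
    "the bank approved the loan and the mortgage",
    "we fished from the river bank near the bridge",
    "the bank raised its loan interest rate",
]

# Hand-picked stopwords for this example only.
STOPWORDS = {"the", "and", "a", "from", "we", "its", "near", "of", "to"}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cooccurring_terms(query, docs, top_n=5):
    """Return the words most often found in the same documents as query."""
    query = query.lower()
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        if query in tokens:
            # Count each companion word once per document.
            counts.update({t for t in tokens if t != query and t not in STOPWORDS})
    return counts.most_common(top_n)

suggestions = cooccurring_terms("bank", DOCS)
```

With this toy data, "loan" comes out on top because it shares two documents with "bank"; words like "river" and "mortgage" each share one, and a searcher could pick whichever matches the intended meaning.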

Another approach is to highlight the matched words with surrounding text from the found documents. This is like the librarian's "Key Word In Context" (KWIC) listings, and was pioneered in web search interfaces by Google.
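
A KWIC-style listing can be sketched in a few lines of Python. This is only an illustration under simple assumptions (whole-word, case-insensitive matching and a fixed character window; the function name kwic and the width parameter are invented for the example), not how any particular search engine builds its snippets.

```python
import re

def kwic(text, keyword, width=30):
    """Return one line per whole-word match, with up to `width`
    characters of context on each side of the matched keyword."""
    pattern = re.compile(rf"\b{re.escape(keyword)}\b", flags=re.IGNORECASE)
    lines = []
    for m in pattern.finditer(text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{m.group(0)}]{right}")
    return lines

for line in kwic("She sat on the bank of the river. The bank was closed.", "bank", width=15):
    print(line)
```

Seeing "bank of the river" next to "The bank was closed" lets the searcher tell the two meanings apart at a glance, without the engine having to decide which one was meant.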

Oddly enough, there seems to be no accepted linguistic term for words which are spelled the same, may or may not sound the same, but mean different things:

  • Homonyms sound the same or are spelled the same but mean different things (e.g. bore vs. boar).

  • Homophones sound the same but are different in meaning or spelling or both.

  • Homographs are spelled the same and may or may not sound the same, but mean different things (e.g. bow, card, swallow). Note: many linguists use this term only for words that are spelled the same but do not sound the same.

  • Polysemes have the same etymological word source but multiple meanings (according to some).

  • Heteronyms are spelled the same but have different pronunciations (according to some).


For text search purposes, we only care about homographs, because the spelling is what matters.
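
To make that concrete, here is a small Python sketch of a spelling-keyed lookup. The SENSES table is a hand-built, hypothetical stand-in for a real homograph list, filled with the "bank" and "coke" meanings mentioned earlier, and ambiguous_terms is an invented helper name. Pronunciation plays no part: the only key is the spelling typed into the search box.

```python
# Hypothetical, hand-built sense table keyed on spelling alone.
SENSES = {
    "bank": ["financial institution", "to store something",
             "side of a river", "airplane maneuver"],
    "coke": ["the cola", "the drug", "the form of coal"],
}

def ambiguous_terms(query):
    """Map each query word that has more than one listed sense to its senses."""
    found = {}
    for term in query.lower().split():
        senses = SENSES.get(term, [])
        if len(senses) > 1:
            found[term] = senses
    return found

flagged = ambiguous_terms("Bank loan")
```

A search interface could use the flagged terms to decide when to offer the searcher a choice of meanings, and leave unambiguous queries alone.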

Definitions
http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsAHomograph.htm
http://www2.hawaii.edu/~fredr/homonymy.htm

List of English Homographs

http://www.johnsesl.com/templates/vocab/homographs.php
http://rec-puzzles.org/new/sol.pl/language/english/meaning/synonyms/contranym

Lists of English Homographs (changed pronunciation)
http://en.wikipedia.org/wiki/List_of_Homographs
http://www.marlodge.supanet.com/wordlist/homogrph.html (claims to be homographs, but only includes changed pronunciation)
http://www.opundo.com/homographs.htm (ditto)
http://markandkatiecraven.home.att.net/homograph.htm
http://www-personal.umich.edu/~cellis/heteronym.html (heteronyms only)