December 17th, 2010

SearchTools - interesting links 12/17/2010

  • TagSoup - brute force HTML parser

    TagSoup is designed as a parser, not a whole application; it isn't intended to permanently clean up bad HTML, as HTML Tidy does, only to parse it on the fly. Therefore, it does not convert presentation HTML to CSS or anything similar. It does guarantee well-structured results: tags will wind up properly nested, default attributes will appear appropriately, and so on.

    tags: file-formats java xml

  • Speller Challenge (spellchecking algorithms for search)

    The Speller Challenge: build the best speller, one that proposes the most plausible spelling alternatives for each search query. It uses the TREC 2008 Million Query Track for training and the Bing Test Dataset for evaluation. The first prize is $10,000 and the gratitude of the orthographically challenged.

    tags: spellchecker research

  • SearchTools links & notes on Diigo

    Slightly circular to link to my own library, but it's much shorter.

Posted from Diigo. The rest of my favorite links are here.