Avi is Program Chair for ESS Fall

I'm happy to announce that I'm the new Program Chair for the Enterprise Search Summit Fall conference. It's really exciting to work on this!

I'll be posting a call for presentations soon, so please think about what you might want to talk about.

Enterprise Search Summit Fall 2011
November 1-3, in Washington DC, with KMWorld, Taxonomy Boot Camp, and SharePoint Summit

Any suggestions?

Judge rules against Google Books settlement

I think this is a Really Good Thing, and the judge agrees, saying that the settlement went way too far in planning

"to implement a forward-looking business arrangement that would grant Google significant rights to exploit entire books, without permission of the copyright owners. Indeed, the ASA would give Google a significant advantage over competitors, rewarding it for engaging in wholesale copying of copyrighted works without permission, while releasing claims well beyond those presented in the case."

"OPINION: In the end, I conclude that the ASA is not fair, adequate, and reasonable."

Read the whole thing:

justia Google Books

links: Boilerplate code library, enterprise relevance, HTML5

  • boilerpipe - removes clutter around web page content (java code library)

    The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example: news article extraction) and may also be easily extended for individual problem settings. Extracting content is very fast (milliseconds), needs only the input document (no global or site-level information required), and is usually quite accurate.

    Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0. The algorithms used by the library are based on (and extend) some concepts of the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010, the Third ACM International Conference on Web Search and Data Mining, New York City, NY, USA. The paper and the presentation slides are available online. A video of the presentation is freely available (turn the speaker balance to the left to improve audio quality). Commercial support is available through Kohlschütter Search Intelligence.

    tags: analysis APIs indexing

  • What makes relevance such a challenge in the enterprise? (sharepoint & fast search blog)

    Nice overview of why internal search is often worse than web search: mainly that there's little meaningful linking within an intranet, little incentive to make a site easily searchable, and security issues with access control.  The post recommends realistic expectations, not indexing low-value content, looking at third-party relevance tools, offering scope or zoned search, and tagging content.

    tags: enterprise search engines intranets overviews relevance

  • HTML5 specification, w3

    Complex and difficult to read, though I can tell they're trying to make it easier.

    tags: site-search web-search research

  • HTML5 - A Step Forward Towards Semantic Web

    Nice introduction to the new structural tags in HTML5: section, article, aside, header, hgroup, footer, and nav, and new content tags: figure, video, audio, canvas.

    tags: semantic search web-search
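To make the "shallow text features" idea from the boilerpipe entry above a bit more concrete, here is a minimal Python sketch. It is not boilerpipe's code or API, and it is much simpler than the classifiers described in the paper. It only assumes that two cheap per-block features, word count and link density (the share of words inside links), are enough to separate a long text paragraph from navigation and footer clutter. The names `Block`, `BlockSplitter`, `extract_main_text`, the `BLOCK_TAGS` set, and both thresholds are hypothetical choices made for this illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Hypothetical set of tags treated as block boundaries in this sketch.
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3",
              "section", "article", "aside", "header", "footer", "nav"}


@dataclass
class Block:
    words: int = 0        # total words in the block
    link_words: int = 0   # words that appear inside <a> elements
    text: str = ""

    @property
    def link_density(self) -> float:
        return self.link_words / self.words if self.words else 0.0


class BlockSplitter(HTMLParser):
    """Splits a page into text blocks and records the two features."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._current = Block()
        self._in_link = 0
        self._skip = 0  # depth inside <script>/<style>

    def _flush(self):
        # Close the current block, keeping it only if it has any words.
        if self._current.words:
            self.blocks.append(self._current)
        self._current = Block()

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self._skip:
            return
        tokens = data.split()
        if not tokens:
            return
        self._current.words += len(tokens)
        if self._in_link:
            self._current.link_words += len(tokens)
        self._current.text = (self._current.text + " " + " ".join(tokens)).strip()

    def close(self):
        super().close()
        self._flush()


def extract_main_text(html, min_words=10, max_link_density=0.33):
    """Keep blocks that are long enough and not dominated by links.

    Both thresholds are illustrative guesses, not values from boilerpipe.
    """
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    kept = [b.text for b in splitter.blocks
            if b.words >= min_words and b.link_density <= max_link_density]
    return "\n".join(kept)


SAMPLE = """
<html><body>
<div class="nav"><a href="/">Home</a> <a href="/about">About</a> <a href="/contact">Contact us</a></div>
<p>This paragraph stands in for the main article text and is long enough to pass the word count threshold.</p>
<div class="footer"><a href="/terms">Terms</a> Copyright 2011</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE))
```

On the sample page, the navigation block is all link text and the footer block is too short, so only the paragraph survives. Like the library's description says of its own approach, this needs nothing but the input document: no site-level templates or cross-page comparison.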

Posted from Diigo. The rest of my favorite links are here.

notes on Google CSE maximum results, Yahoo BOSS (Build your Own Search Service), clouds from Attivio

links: Google Mini vs. Funnelback, Google Recipe Search, corporate analytics delays

link: Remedies for Search Bias (Ben Edelman re Google manual relevance biases)

links: ElasticSearch interview, Greplin Personal Cloud Search article

link: Moving from Oracle Text to Solr/Lucene @ Digital Collections - Blog entry 2009
