SearchTools Blog
Recent Entries 
22nd-Mar-2011 07:19 pm - Avi is Program Chair for ESS Fall
searchtools.com
I'm happy to announce that I'm the new Program Chair for the Enterprise Search Summit Fall conference. It's really exciting to work on this!

I'll be posting a call for presentations soon, so please think about what you might want to talk about.

Enterprise Search Summit Fall 2011
November 1-3, in Washington DC, with KMWorld, Taxonomy Boot Camp, and SharePoint Summit

Any suggestions?
searchtools.com
I think this is a Really Good Thing, and the Judge agrees, saying that the settlement went way too far in planning

"to implement a forward-looking business arrangement that would grant Google significant rights to exploit entire books, without permission of the copyright owners. Indeed, the ASA would give Google a significant advantage over competitors, rewarding it for engaging in wholesale copying of copyrighted works without permission, while releasing claims well beyond those presented in the case."

"OPINION: In the end, I conclude that the ASA is not fair, adequate, and reasonable."

Read the whole thing: http://docs.justia.com/cases/federal/district-courts/new-york/nysdce/1:2005cv08136/273913/971/

Justia Google Books category: http://dockets.justia.com/docket/new-york/nysdce/1:2005cv08136/273913/ (general)
searchtools.com
  • boilerpipe - removes clutter around web page content (java code library)

    The boilerpipe library provides algorithms to detect and remove the surplus "clutter" (boilerplate, templates) around the main textual content of a web page. The library already provides specific strategies for common tasks (for example, news article extraction) and may also be easily extended for individual problem settings. Extracting content is very fast (milliseconds), needs only the input document (no global or site-level information required), and is usually quite accurate. Boilerpipe is a Java library written by Christian Kohlschütter. It is released under the Apache License 2.0. The algorithms used by the library are based on (and extend) some concepts of the paper "Boilerplate Detection using Shallow Text Features" by Christian Kohlschütter et al., presented at WSDM 2010, the Third ACM International Conference on Web Search and Data Mining, in New York City, NY, USA. A video of the presentation is freely available on Videolectures.net (turn speaker balance to the left to improve audio quality). Commercial support is available through Kohlschütter Search Intelligence.

    tags: analysis APIs indexing

  • What makes relevance such a challenge in the enterprise? (sharepoint & fast search blog)

    Nice overview of why internal search is often worse than web search: mainly that there's little meaningful linking within an intranet, little incentive to make a site easily searchable, and security issues with access control.  The post recommends realistic expectations, not indexing low-value content, looking at third-party relevance tools, offering scope or zoned search, and tagging content.

    tags: enterprise search engines intranets overviews relevance

  • HTML5 specification, w3

    Complex and difficult to read, though I can tell they're trying to make it easier.

    tags: site-search web-search research

  • HTML5 - A Step Forward Towards Semantic Web

    Nice introduction to the new structural tags in HTML5: section, article, aside, header, hgroup, footer, and nav, and new content tags: figure, video, audio, canvas.

    tags: semantic search web-search
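
To give a feel for the "shallow text features" idea mentioned in the boilerpipe entry above, here is a toy sketch in Python. It is not boilerpipe's code or API (boilerpipe is a Java library); it only illustrates the general approach of splitting a page into text blocks and keeping the ones that have enough words and a low share of link text. The names `Block`, `BlockSplitter`, and `extract_main_text`, the list of block tags, and both thresholds are assumptions made up for this illustration.

```python
from dataclasses import dataclass
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption for this sketch).
BLOCK_TAGS = {"p", "div", "li", "td", "h1", "h2", "h3", "ul", "ol",
              "nav", "header", "footer", "article", "section", "aside"}


@dataclass
class Block:
    """One run of text between block boundaries, with two shallow features."""
    words: int = 0        # total words in the block
    link_words: int = 0   # words that sit inside <a> elements
    text: str = ""

    @property
    def link_density(self) -> float:
        return self.link_words / self.words if self.words else 0.0


class BlockSplitter(HTMLParser):
    """Splits HTML into Blocks and counts words and link words per block."""

    def __init__(self):
        super().__init__()
        self.blocks = [Block()]
        self.link_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self.blocks.append(Block())

    def handle_endtag(self, tag):
        if tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in BLOCK_TAGS:
            self.blocks.append(Block())

    def handle_data(self, data):
        words = data.split()
        if not words:
            return
        block = self.blocks[-1]
        block.words += len(words)
        if self.link_depth:
            block.link_words += len(words)
        block.text = (block.text + " " + " ".join(words)).strip()


def extract_main_text(html, min_words=10, max_link_density=0.33):
    """Keep blocks that are long enough and not dominated by link text.

    Both thresholds are illustrative guesses, not values from the paper.
    """
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    kept = [b.text for b in splitter.blocks
            if b.words >= min_words and b.link_density <= max_link_density]
    return "\n".join(kept)


SAMPLE_HTML = """
<div><a href="/">Home</a> <a href="/about">About</a> <a href="/login">Log in</a></div>
<p>The boilerpipe library detects and removes the clutter around the main
textual content of a web page, such as navigation and templates.</p>
<div>Copyright 2011</div>
"""

if __name__ == "__main__":
    print(extract_main_text(SAMPLE_HTML))
```

Like the library described above, this needs only the single input document, with no site-level information. A real extractor would use more features and tuned or learned thresholds.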

Posted from Diigo. The rest of my favorite links are here.

28th-Feb-2011 10:30 pm - New InfoDocket information service
searchtools.com

searchtools.com
  • The ROI of User Experience with Dr. Susan Weinschenk (video)

    A spectacular drawing animation/video about how to see the User Experience as a way to avoid expensive errors and duplication.  It would be great to show to a doubter.

    tags: user experience

  • Taxonomy Fairy Tales (video)

    Patrick Lambe and Matt Moore discuss why internal and enterprise search engines don't work as well as web search, due to a lack of meaningful hypertext links and different expectations (I agree).  They mention the value of taxonomies for improving search, and the fact that there is no Taxonomy Fairy or magic automated system to organize things.  And they recommend that most of the taxonomy come from the bottom up, by categorizing stuff and seeing how it fits together.  Very engaging and I'm glad that I agree with them.

    tags: taxonomy tagging intranet enterprise search engines

searchtools.com
The new Greplin service is like desktop search, only it indexes online accounts. It uses OAuth to get personal posts and timelines from Twitter, Facebook, Gmail, Google Docs & Calendar, Dropbox, and LinkedIn*. The Greplin service is cloud-hosted, and the company says that it updates the indexes around every 20 minutes, though it can take as long as a day.

Greplin provides a secure web page to search, dividing the results by kinds of materials such as streams, messages, people, events, and files, with optional filters for the sources. It's all done with a ton of JavaScript. But within these results, there are some really old items, like my son's high school charity fundraiser in "events" and email messages from 2007.

The disconcerting part is that there's a seemingly-random mix of internal and external content. For example, searches find both everything on my rather large Twitter Timeline and my Twitter Direct Messages, which are a bit more private. Greplin doesn't find my Enterprise Search Engine Professionals group posts in LinkedIn, which is where I spend most of my time. Yet it does turn up that wild directory project I worked on three years ago in Google Docs, which I don't actually want to see right now. And there's no way to control those things, as of yet.

The API, "coming soon", may solve some of these problems, if it allows for additional clients. I hope it also has services to make other social networks indexable, so that people can track their own posts on WordPress, BlogSpot, LiveJournal, etc.

* free service: the paid version currently also indexes EverNote, Yammer, and Google Apps Docs, Calendar, and Mail. They're also working on Dropbox attachments, Salesforce, Box.net, Basecamp, and Google Voice.