<?xml version="1.0" encoding="utf-8"?>
<!-- If you are running a bot please visit this policy page outlining rules you must respect. http://www.livejournal.com/bots/ -->
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:lj="http://www.livejournal.com">
  <id>urn:lj:livejournal.com:atom1:searchtools</id>
  <title>SearchTools Blog</title>
  <subtitle>SearchTools Blog</subtitle>
  <author>
    <name>SearchTools Blog</name>
  </author>
  <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/"/>
  <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom"/>
  <updated>2008-05-06T18:09:10Z</updated>
  <lj:journal username="searchtools" type="personal"/>
  <link rel="service.feed" type="application/x.atom+xml" href="http://searchtools.livejournal.com/data/atom" title="SearchTools Blog"/>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:77291</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/77291.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=77291"/>
    <title>A First Taxonomy for "Search Log Junk"</title>
    <published>2008-05-06T17:57:06Z</published>
    <updated>2008-05-06T18:09:10Z</updated>
    <content type="html">&lt;p&gt;Search logs contain a lot of weird things, and some of them can have a significant effect on search log analysis.  Having looked at tens of thousands of lines of search log entries, I offer this first attempt at defining some of the weirdest and least useful kinds of log entry, which I call "Search Log Junk".   Here are the types of junk that I've seen most frequently:&lt;/p&gt;

&lt;dl&gt;
	&lt;dt&gt;&lt;b&gt;Empty Queries&lt;/b&gt;&lt;/dt&gt;
	&lt;dd&gt;Queries without any query text or usable parameters.  These can appear when people think the &amp;quot;Search&amp;quot; button is important in and of itself.  Or perhaps search is in the first page form, and the cursor gets into that field and users press Return.  These are often sent from the home page, according to the referer fields I've seen. &lt;br /&gt;
		&lt;br /&gt;
		The first thing is to make sure that the search engine is doing something reasonable in this case.   This could be just bringing up a helpful search page, adding  a script to bring up an error dialog, or a script to ignore the empty query. I'm leaning towards the last option.&lt;br /&gt;
		&lt;br /&gt;
		I've found only a couple of ways to use this information. Empty queries are still useful for traffic and response time metrics, and I think it's useful to check the top referring pages occasionally. A lot of empty queries from a page deep within a site may indicate some navigation problems.&lt;br /&gt;
		&lt;br /&gt;
	&lt;/dd&gt;
	&lt;dt&gt;&lt;b&gt;Repeat Queries&lt;/b&gt;&lt;/dt&gt;
	&lt;dd&gt;Multiple identical queries to the search engine from the same IP or user ID.  My best guess is that the client is calling for a refresh automatically  -- my favorite was thousands of queries over months for two dots: &amp;quot;..&amp;quot;.&lt;br /&gt;
		&lt;br /&gt;
		Again, this is useful for traffic metrics and possibly for identifying really weird incoming links. For most situations, it won't affect the statistics in any important way. But if there are hundreds of repeat queries by the same client, removing them from the database  allows you to concentrate on the real data.  You may also want to ban that IP address.&lt;br /&gt;
		&lt;br /&gt;
	&lt;/dd&gt;
	&lt;dt&gt;&lt;b&gt;Robot crawlers&lt;/b&gt;&lt;/dt&gt;
	&lt;dd&gt;Having search engine robots and intelligent agents crawl your search results may be a good thing. Incoming links are always good, and it may be that the search results page on your site for emerald green widgets is number one in webwide search results and drives good traffic. However, there may be other robots wasting your search engine cycles: for those, a combination of robots.txt and banning their IP addresses will help.&lt;br /&gt;
		&lt;br /&gt;
	&lt;/dd&gt;
	&lt;dt&gt;&lt;b&gt;Server hacks&lt;br /&gt;
	&lt;/b&gt;&lt;/dt&gt;
	&lt;dd&gt;Search engines are attacked by the standard web server hacking parameters, such as &amp;quot;phpmyadmin&amp;quot;. They may also be subject to buffer overflow and other attacks, so they should be included in standard website security audits and checklists.&lt;br /&gt;
		&lt;br /&gt;
	&lt;/dd&gt;
	&lt;dt&gt;&lt;b&gt;Guestbook spam&lt;/b&gt;&lt;/dt&gt;
	&lt;dd&gt;There are automated advertising services that insert fake comments with URLs into form fields, guestbooks, blogs and wikis (and there's a &lt;a href="http://en.wikipedia.org/wiki/Spam_in_blogs"&gt;wikipedia page about them&lt;/a&gt;).  Many of them do the same with search fields, which explains why logs contain bizarre queries with spaces, HTML formatting and URLs in them.&lt;/dd&gt;
	&lt;dd&gt;&lt;br /&gt;
	For sites with light search traffic, these meaningless entries can cause problems with both traffic metrics and top query listings. Even for sites with thousands of queries per day, they can distort  statistics about the average length of query, so removing them from your analysis database is a good idea.&lt;br /&gt;
	&lt;br /&gt;
	It's fairly easy to identify these queries with simple regular expressions looking for href, http and .com. I haven't heard of any search engines which filter this, though some may be doing it without bothering their customers about it. &lt;br /&gt;
&lt;br /&gt;
	&lt;/dd&gt;
	&lt;dt&gt;&lt;b&gt;Internal testing queries&lt;/b&gt;&lt;/dt&gt;
	&lt;dd&gt;For light traffic sites, any kind of automated testing, or even heavy manual testing, can change the search log significantly -- especially given how quickly the Long Tail shows up.  Remove queries from testers by user ID or IP address to look at real user data.&lt;/dd&gt;
&lt;/dl&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:77043</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/77043.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=77043"/>
    <title>partly offline due to injury</title>
    <published>2008-03-04T23:22:11Z</published>
    <updated>2008-03-04T23:45:18Z</updated>
    <content type="html">I slipped on a stepladder and broke my left leg (tibial plateau fracture) and then chipped my right heel while on crutches.  My office is not really wheelchair accessible, nor can I go down my house's steps without great effort, so I'm working remotely, part-time. &lt;br /&gt;&lt;br /&gt; I am trying to read email every day and respond in a timely way, so if you've left a voice message or sent email that I have not answered, please try again (by email if possible).  Apologies for your inconvenience.&lt;br /&gt;&lt;br /&gt;Avi</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:76770</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/76770.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=76770"/>
    <title>where has Entopia gone?</title>
    <published>2008-01-23T21:55:04Z</published>
    <updated>2008-01-23T22:03:32Z</updated>
    <content type="html">One of my clients is interested in Entopia, so I was taking a look.&lt;br /&gt;&lt;br /&gt;I tried to go to the web site, but it had been replaced by one of those placeholder spam sites that pop up several spammy windows. It seems like the kind of thing that might have viruses, worms or trojans, so I'd advise against opening the site in IE, or really, at all on a Windows machine.&lt;br /&gt;&lt;br /&gt;No one answered at one phone number; the other two I found were disconnected.&lt;br /&gt;&lt;br /&gt;Casualty of the recession?  Acquired by someone?  It's a mystery, and I'm curious.&lt;br /&gt;&lt;br /&gt;ETA: The Wayback Machine (archive.org) has an actual home page as of &lt;a href="http://web.archive.org/web/20060613051353/http://entopia.com/"&gt;June 13, 2006&lt;/a&gt; and an empty page as of &lt;a href="http://web.archive.org/web/20060701042058/http://www.entopia.com/"&gt;July 1&lt;/a&gt; of that year.  I always thought they were promising more than they could deliver, so this is perhaps confirmation.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:76543</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/76543.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=76543"/>
    <title>Small updates to Search Tools reports</title>
    <published>2007-12-19T01:28:15Z</published>
    <updated>2007-12-19T01:34:35Z</updated>
    <content type="html">We've updated the following reports on search engines large and small in the last few weeks:
&lt;ul&gt;

	&lt;li&gt;i411 has changed its name to &lt;a href="http://www.searchtools.com/tools/intelligenx.html"&gt;Intelligenx&lt;/a&gt; and added autocategorization and multiple language support. &lt;/li&gt;
	&lt;li&gt;&lt;a href="http://searchtools.com/tools/engenium.html"&gt;Engenium&lt;/a&gt; now has an OEM library and an automatic clustering module.&lt;/li&gt;
	&lt;li&gt;&lt;a href="http://searchtools.com/tools/freefind.html"&gt;FreeFind&lt;/a&gt; now has wildcards for excluding URL paths from indexing, indexes common office document file formats, relevance weight adjustments for URL paths (with wildcards), and some really nice indexing reports -- URLs extracted, server response, status, and which URLs are actually in the searchable index.&lt;/li&gt;
	&lt;li&gt;&lt;a href="http://searchtools.com/tools/homepagesearchengine.html"&gt;HomePageSearchEngine&lt;/a&gt; now indexes more file types.&lt;/li&gt;
	&lt;li&gt;&lt;a href="http://searchtools.com/tools/doclinx.html"&gt;Doclinx&lt;/a&gt; now has a web monitoring agent, with support for speech recognition, for research and competitive intelligence, and a language analyzer.&lt;/li&gt;
	&lt;li&gt;&lt;a href="http://searchtools.com/tools/booleansearch.html"&gt;Boolean Search&lt;/a&gt; now runs natively on both PPC and Intel Mac OS X systems, includes web-based admin, spellchecking and match term highlighting in search results, template and AppleScript integration for search results formatting, standalone search server, and regular expressions in queries.&lt;/li&gt;
	&lt;li&gt;&lt;a href="http://searchtools.com/tools/crawl-it.html"&gt;Crawl-it remote service&lt;/a&gt;  is still being supported.&lt;/li&gt;

	&lt;li&gt;&lt;a href="http://searchtools.com/tools/datagold.html"&gt;Datagold&lt;/a&gt; is no longer a separate search; it's now part of an online archiving suite.&lt;/li&gt;

	&lt;li&gt;&lt;a href="http://searchtools.com/tools/educesoft.html"&gt;Educasoft&lt;/a&gt; has no indication of continuing development &lt;/li&gt;

&lt;/ul&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:76135</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/76135.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=76135"/>
    <title>Search Conferences Listing updated</title>
    <published>2007-09-19T23:58:28Z</published>
    <updated>2007-09-19T23:58:28Z</updated>
    <content type="html">This &lt;a href="http://www.searchtools.com/info/conferences.html"&gt;list&lt;/a&gt; covers all the search and related conferences I know about. &lt;br /&gt;&lt;br /&gt;At the &lt;a href="http://www.enterprisesearchsummit.com/west/"&gt;Enterprise Search Summit West&lt;/a&gt; I will be doing a pre-conference workshop on Critical Success Factors (how search engines work and how to make them better), giving a presentation on Tuning Search using Analytics, and moderating a panel on Good Practices for Search User Interfaces. At the &lt;a href="http://www.ftponline.com/conferences/webbuilder/2007/agenda.aspx"&gt;Web Builder 2.0&lt;/a&gt; conference, I'll be presenting on Web Site Search and the User Experience. If you are a reader of this web site, please come and say hi, and if you'd like an online presentation to your organization or company, I do those as well. &lt;br /&gt;&lt;br /&gt;To suggest a conference for the listing, please &lt;a href="http://www.searchtools.com/site/contact.html"&gt;leave a comment&lt;/a&gt; and I'll add it.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:75856</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/75856.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=75856"/>
    <title>Critique of the Google Custom Search Traffic Report</title>
    <published>2007-08-29T22:35:21Z</published>
    <updated>2007-08-29T22:50:19Z</updated>
    <content type="html">&lt;p&gt;Edward Tufte would be disappointed in Google.  The traffic reports in the Google Custom Search Business Edition are not only insufficient, but somewhat misleading.&lt;/p&gt;
	&lt;p&gt;Below is a picture from a CSBE search for a B2B site that I helped install in August 2007. The fact that it's a line chart, with no data points given and filled underneath, makes it look active.  It seems as though something's happening, the traffic is making progress, or worse, losing ground.  The deep dips look scary, as though the site has done something wrong.&lt;/p&gt;
&lt;a name="cutid1"&gt;&lt;/a&gt;	&lt;p&gt;&lt;img src="http://www.searchtools.com/images/gcsbe-monthly-traffic.gif" width="700" height="356" /&gt;&lt;/p&gt;
	&lt;p&gt;The problems come about  because it's the &lt;i&gt;wrong graph format&lt;/i&gt; for the content.  This is very simple data: one point per day.  Look at it as a simple bar graph and it suddenly seems more reasonable.  The traffic resolves itself into a rhythm: the dips are on weekends -- all the customers are home. I don't know why they got it so wrong, but it's worth getting right. &lt;/p&gt;
	&lt;p&gt;&lt;img src="http://www.searchtools.com/images/searchtools-traffic-example.gif" width="700" height="383" /&gt;&lt;/p&gt;

	&lt;p&gt;&lt;a href="http://www.edwardtufte.com/"&gt;Edward Tufte&lt;/a&gt; wrote some enlightening books on these topics, including  &lt;i&gt;The Visual Display of Quantitative Information,&lt;/i&gt; which taught those of us paying attention that how data is presented deeply affects how it is received. I highly recommend getting some of Tufte's books, &lt;a href="http://www.amazon.com/gp/search?ie=UTF8&amp;amp;keywords=edward%20tufte&amp;amp;tag=searchtoolscom&amp;amp;index=books&amp;amp;linkCode=ur2&amp;amp;camp=1789&amp;amp;creative=9325"&gt;from Amazon&lt;/a&gt;&lt;img src="http://www.assoc-amazon.com/e/ir?t=searchtoolscom&amp;amp;l=ur2&amp;amp;o=1" width="1" height="1" border="0" alt="" style="border:none !important; margin:0px !important;" /&gt;, &lt;a href="http://www.powells.com/partner/24574/s?kw=Tufte+Edward"&gt;from Powell's&lt;/a&gt; or &lt;a href="http://worldcat.org/search?q=tufte,+edward&amp;amp;fq=dt%3Abks&amp;amp;qt=facet_dt%3A"&gt;from your library (using WorldCat)&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;Please comment whether you agree or disagree. I haven't seen quite this problem in other search engine traffic reports, but I'm wondering what other interfaces might look like, and what you think is best. Tell me your opinions, please!&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:75633</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/75633.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=75633"/>
    <title>Google Search Appliance and Mini - SearchTools Report Updated</title>
    <published>2007-08-20T20:55:48Z</published>
    <updated>2007-08-20T20:57:19Z</updated>
    <content type="html">&lt;p&gt;I have updated my  report on the &lt;a href="http://searchtools.com/tools/google-app.html"&gt;GSA and Mini search appliances&lt;/a&gt;, with detail based in part on my recent experiences customizing a Google Mini. The report includes information on the pricing as far as I could find it, the terms of licensing, new features, links to informative documents, and features that are not included with the Mini appliance.&lt;/p&gt;
&lt;p&gt;Once I update my &lt;a href="http://www.searchtools.com/analysis/google-appliance-v3.html"&gt;full product review&lt;/a&gt;, I will have a chance to pay attention to other search engines, and that will be lovely.&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:75274</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/75274.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=75274"/>
    <title>Google CSE - different results when searching more than three sites</title>
    <published>2007-08-16T22:03:07Z</published>
    <updated>2007-08-16T22:03:07Z</updated>
    <content type="html">A &lt;a href="http://www.google.com/support/customsearch/bin/answer.py?answer=70392&amp;amp;topic=11502"&gt;support document&lt;/a&gt; for the Google CSE (Custom Search Engine) and CSBE (Custom Search Business Edition) notes that some results may be different than those found in the same search on Google.com.  It attributes this to including more than three sites in the CSE, and says that the CSE is using a subset of the Google.com index.  &lt;br /&gt;&lt;br /&gt;They recommend limiting the CSE to three sites, changing the behavior to 'Search the entire web but emphasize included sites', or adding refinements that have the same effect.&lt;br /&gt;&lt;br /&gt;As of August 16, 2007, the support note says "We're working to bring more complete results to all Custom Search Engines."</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:75028</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/75028.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=75028"/>
    <title>Google Launches Site Search Service for Business</title>
    <published>2007-08-03T17:43:11Z</published>
    <updated>2007-08-03T19:37:35Z</updated>
    <content type="html">Google's Custom Search Business Edition uses the Google web search index limited by site or sites. It provides most of the Google web search features and is very cheap: only $100 per year for up to 5,000 pages, or $500 per year for up to 50,000 pages. &lt;a href="http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=37075"&gt;More here at my InfoToday article&lt;/a&gt; / more at the &lt;a href="http://searchtools.com/tools/google-service.html"&gt;SearchTools Google Service report page&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;What do you think of it?</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:74946</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/74946.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=74946"/>
    <title>New Google hosted search with no advertising</title>
    <published>2007-07-19T23:48:35Z</published>
    <updated>2007-07-19T23:48:35Z</updated>
    <content type="html">&lt;p&gt;Called the &lt;a href="http://www.google.com/enterprise/csbe/index.html"&gt;Google Custom
	Search Business Edition&lt;/a&gt;, this is a hosted site search, designed for small businesses with web
	site content, who don't want the advertising displayed on the older &lt;a href="http://google.com/coop/cse/"&gt;Custom
	Search Engine&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This version uses Google's existing index of the Internet, searching all the pages they know about
	on the specified sites including non-HTML file types, using their query language, retrieval and relevance
	algorithms, and searching in multiple languages and character sets. Like the web search engine, there
	is no way to index pages protected by access control such as passwords or ACLs. &lt;/p&gt;
&lt;p&gt;The default interface customization
	is limited to a logo and colors of the results page border, title, background, text and links, but the
	XML results format is fairly configurable using the Google AJAX Search API. While there is no structure
	in place to display site advertising on search results, presumably one could do that very easily with
	XML results. Reports are limited to top queries and queries per day/week/month/all, but can be connected
	to the Google Activity Monitor site traffic analysis tool. &lt;/p&gt;
&lt;p&gt;Note that Google will not guarantee that they'll crawl all of the pages of a particular site, update
	on-demand, or even update frequently. Using this service will not improve a site's position in the Google.com
	search results. &lt;/p&gt;
&lt;p&gt;Pricing is $100 per year for up to 5,000 pages; $500 per year for up to 50,000 pages (both payable by
	credit card via Google Checkout). According to &lt;a href="http://www.ecommerce-guide.com/news/article.php/3689231"&gt;ecommerce-guide.com&lt;/a&gt;, it seems to go to a $15,000 per year
	fee for up to 1 million pages, but potential customers should contact the company. (Non-profits, universities
	and government agencies can use the standard &lt;a href="http://google.com/coop/cse/"&gt;Custom
	Search&lt;/a&gt; and
	opt out of advertising.)&lt;/p&gt;</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:74698</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/74698.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=74698"/>
    <title>Swish-e - SearchTools Report Updated</title>
    <published>2007-05-04T00:00:24Z</published>
    <updated>2007-05-04T00:00:24Z</updated>
    <content type="html">&lt;a href="http://www.searchtools.com/tools/swish-e.html"&gt;Swish-e&lt;/a&gt; is a free open-source Unix search engine that is fast at indexing and searching, and quite flexible. It can handle simple authentication and indexes HTML, text, XML, and (via converters) PDF, MS Word, Excel and MP3 ID3 tags, with an emphasis on storing fields/tags for specifying during search. Results can be sorted by relevance, date, size, and other fields. It runs as a CGI to a web server (Apache recommended), and has a fairly active user and developer base. New features include adjustments to the relevance algorithm, a "near" operator and a "?" single-character wildcard operator (in addition to "*").</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:74380</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/74380.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=74380"/>
    <title>DolphinSearch - SearchTools Report Updated</title>
    <published>2007-05-03T23:52:53Z</published>
    <updated>2007-05-03T23:52:53Z</updated>
    <content type="html">&lt;a href="http://www.searchtools.com/tools/dolphinsearch.html"&gt;DolphinSearch&lt;/a&gt; is a search appliance with an unusual search algorithm based on neural networks and dolphin research. It is designed for legal research and corporate compliance, and integrates into an enterprise document management system. (May 3)</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:74091</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/74091.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=74091"/>
    <title>Analysis &amp; Review of  the Webinator Search Engine</title>
    <published>2007-04-24T23:23:02Z</published>
    <updated>2007-05-03T23:55:41Z</updated>
    <content type="html">&lt;p&gt;In this review, I cover every aspect of the Thunderstone Webinator search engine, looking
		at what's possible, what's special and what's missing. I've been much helped by the posts on the Webinator
		support mailing list and the frank answers from Thunderstone's representative, as well as several working
		indexes on one of their test appliances. &lt;/p&gt;

See my &lt;a href="http://www.searchtools.com/analysis/webinator-review.html"&gt;full review&lt;/a&gt; for details of indexing, access control, query processing, retrieval, relevance ranking, results page layout and search reports, and my conclusions.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:73489</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/73489.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=73489"/>
    <title>IBM OmniFind Yahoo Edition -  new SearchTools Report</title>
    <published>2007-04-15T22:41:17Z</published>
    <updated>2007-05-02T21:47:36Z</updated>
    <content type="html">&lt;a href="http://www.searchtools.com/tools/omnifind.html"&gt;OmniFind Yahoo! Edition&lt;/a&gt;, a free search engine based on the open-source &lt;a href="http://www.searchtools.com/tools/lucene.html"&gt;Lucene&lt;/a&gt; core, is a reasonably full-featured search that can index up to 500,000 pages, making it an interesting competitor to the &lt;a href="http://www.searchtools.com/tools/google-app.html"&gt;Google Search Appliance&lt;/a&gt;, &lt;a href="http://www.searchtools.com/tools/ultraseek.html"&gt;Autonomy Ultraseek&lt;/a&gt; and &lt;a href="http://www.searchtools.com/tools/solr.html"&gt;Solr&lt;/a&gt;, as well as lower-end search engines.  &lt;br /&gt;&lt;br /&gt;Features include an automated install package for Windows and Linux, browser administration, a powerful web crawling robot, a file system remote crawler, index support for over 400 file types (using the Inside Out system for file reading), query parsing that recognizes Internet Query Operators and Boolean operators, a spellchecker, synonyms and suggestions, and Lucene-based stemming.   It indexes and searches Arabic, Czech, Danish, German, Greek, English, Spanish, Finnish, French, Hebrew, Italian, Japanese, Korean, Dutch, Norwegian, Polish, Portuguese, Russian, Swedish, Simplified Chinese, and Traditional Chinese. Searches can be sent via REST, and the results formatted within the admin interface, or sent back as ATOM, HTML with XSLT or XML, and linked to optional local document caching.  Enterprise support is available from IBM.  &lt;br /&gt;&lt;br /&gt;There are some first-release glitches, but it's a well-designed package that's easy to use interactively, with some powerful automation interfaces ready for those who need more flexibility. Definitely worth a look.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:73323</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/73323.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=73323"/>
    <title>Info Today Report: "Enterprise Search: Deployment, Usage and Trends"</title>
    <published>2007-04-12T21:09:30Z</published>
    <updated>2007-04-12T21:16:12Z</updated>
    <content type="html">A survey of 250 professionals connected to search in their enterprise has some enlightening results.  They came from a fairly wide variety of industries, organization sizes, departments and roles (described in detail in the report), so the results are generally applicable.  &lt;br /&gt;&lt;br /&gt;This survey contradicts conventional wisdom by reporting that 62% of these enterprises have more than one search engine, with 27% having four or more search engines.  In my view, this indicates the understanding that one search cannot solve all problems, and that some areas will require specialized, and usually more powerful, search solutions.&lt;br /&gt;&lt;br /&gt;The other response which surprised me was that 20% of respondents said they already provide search for audio and video, and 35% said they want to do so in the future.  I suppose some of that is podcasts and training videos, and it's a big challenge for search, although much easier if there are transcripts or textual captions.&lt;br /&gt;&lt;br /&gt;The report also covers integration with other applications (mainly CMS and KM), current search solutions, vendor support satisfaction, software vs. hosted search vs. appliance (only 17% reported using a search appliance), upgrade plans, and search features currently available and desired for the future.  There's a long section about the respondents' relative emphasis on various criteria for selecting a search solution, covering ease of use, features, integration, cost, scalability, speed, vendor reputation, ease of installation, upgradability, and vendor support.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.enterprisesearchcenter.com/Articles/ReadArticle.aspx?ArticleID=35706"&gt;This report is available on the Enterprise Search Center, at a cost of $495 US.&lt;/a&gt;  The study was conducted by Shore Communications and Faulkner Information Services.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:73028</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/73028.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=73028"/>
    <title>Solr Open Source Enterprise Search and Faceted Metadata Server - new SearchTools Report</title>
    <published>2007-03-30T18:43:29Z</published>
    <updated>2007-03-30T18:44:56Z</updated>
    <content type="html">The &lt;a href="http://www.searchtools.com/tools/solr.html"&gt;Solr&lt;/a&gt; Java open-source search engine builds on the &lt;a href="http://www.searchtools.com/tools/lucene.html"&gt;Lucene&lt;/a&gt; engine, adding more standard tools for indexing, query processing and sending back results. While Solr does not have a site indexing crawler, it can use &lt;a href="http://www.searchtools.com/tools/nutch.html"&gt;Nutch&lt;/a&gt; or any other robot crawler, and accept content converted from any native format to a simple XML schema. The architecture provides powerful tools for analyzing and transforming text to create a very rich index with structured fields. It accepts a wide variety of query operators, and parameters to control retrieval and ranking, including sorting on specified fields. The default relevance ranking can be tuned to suit the needs of the users and content, and the search results provide the valuable match terms in context for each item. In addition to lists of results, Solr has &lt;a href="http://www.searchtools.com/info/faceted-metadata.html"&gt;faceted metadata&lt;/a&gt; displays dynamically calculated for search results, allowing users to drill down on topics, date ranges, price, brand or any other attribute. For scaling to millions of documents and high search traffic, the system offers caching configuration, index replication, and autowarming for new starts. It's installed on such high-traffic sites as CNET, Shopper.com, and The Internet Archive, showing its scalability, and has active developer and user mailing lists.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:72861</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/72861.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=72861"/>
    <title>Fluid Dynamics Search - SearchTools Report Updated</title>
    <published>2007-02-16T00:38:23Z</published>
    <updated>2007-02-16T00:38:23Z</updated>
    <content type="html">The &lt;a href="http://www.searchtools.com/tools/fluiddynamics.html"&gt;Fluid Dynamics Search Engine&lt;/a&gt; is a Perl CGI script that performs nicely on sites below 10,000 pages.  It can crawl links or read a local file system to gather text, HTML and PDF files, and includes extensive controls for excluding pages.  Search supports Internet query operators, Boolean operators and quotes.  There's an option to allow public submission of URLs for topical portal search, and the admin is all done via a browser interface. It costs only $40, runs on Unix, Windows and Mac OS X, and the documentation is excellent.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:72672</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/72672.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=72672"/>
    <title>Alkaline - SearchTools Report Updated</title>
    <published>2007-02-15T23:30:50Z</published>
    <updated>2007-02-15T23:30:50Z</updated>
    <content type="html">&lt;a href="http://www.searchtools.com/tools/alkaline.html"&gt;Vestris Alkaline&lt;/a&gt;, from Switzerland, has been around for a long time but is still very actively updated.  Running on Unix and Windows NT, it has a web crawler that can handle multiple sites, with extensive rules options for including and excluding pages by URL and extension.  It is mainly focused on web pages but external filters allow indexing of XML, PDF, Microsoft Word, WordPerfect and other documents.  Can handle password and Windows NTLM access control, but displays all results (no hit-level authentication).  Query features include Internet and Boolean operators, wildcards and number search; admins can adjust results weighting using a local GUI configuration interface. Standalone search server can run on any port.  Written in C++ for binary distribution, but source code licensing is available.  Low price: free for noncommercial sites, $350 for commercial sites.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:72434</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/72434.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=72434"/>
    <title>i411 - SearchTools Report Updated</title>
    <published>2007-02-14T23:47:57Z</published>
    <updated>2007-02-14T23:49:40Z</updated>
    <content type="html">&lt;a href="http://www.searchtools.com/tools/i411.html"&gt;i411&lt;/a&gt; is a &lt;a href="http://www.searchtools.com/info/faceted-metadata.html"&gt;faceted metadata search and browse engine&lt;/a&gt;, capable of scaling to very large deployments, such as the &lt;a href="http://dexonline.com"&gt;DexOnline&lt;/a&gt; yellow pages site, which uses it for both search results and browse navigation. The most recent version adds a web crawler to the local file and database connectors, a natural language module that can extract entities from queries and provide concept-based spellcheck, more flexibility in the search flow, and a SiteOptimizer analytics and reporting module to expose site dynamics and user behavior.&lt;br /&gt;&lt;br /&gt;(Disclaimer: I consulted with DexOnline and helped them choose the engine among a very strong field of candidates.)</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:72080</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/72080.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=72080"/>
    <title>DataparkSearch - SearchTools Report Update</title>
    <published>2007-02-01T22:43:15Z</published>
    <updated>2007-02-01T22:43:15Z</updated>
    <content type="html">&lt;a href="http://searchtools.com/tools/dataparksearch.html"&gt;DataparkSearch&lt;/a&gt; is a free open-source search engine written in C by some smart people in Russia as an offshoot of the &lt;a href="http://searchtools.com/tools/mnogosearch.html"&gt;MnogoSearch project&lt;/a&gt;. &lt;br /&gt;&lt;br /&gt;The biggest strength of this application is how well it handles languages and character sets.  It supports internationalized domain names and proper word-segmentation (tokenization) in many languages including Chinese, Japanese, Korean and Thai.  It can perform language detection on both text files and user queries.  Spellchecking, abbreviation and synonym query expansion are on a per-language basis.  &lt;br /&gt;&lt;br /&gt;This search engine has some fancy Information Retrieval features like fuzzy searching, Boolean queries, and its own "Neo popularity ranking" based on neural network research and link analysis.  Results templates include several that are more innovative than simple listings (I'm not sure if I like them, but they're interesting).   It also has caching of the index files, search templates and the code, and can distribute indexes and search servers among multiple machines, for better response time.&lt;br /&gt;&lt;br /&gt;However, the code is distributed in source format for local compilation, and all the features are set via config files and runtime parameters --  it requires some comfort with command lines and programming tools.  But there's an excellent manual and an active forum of users and developers.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:71796</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/71796.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=71796"/>
    <title>Xapian Code Library SearchTools Report Updated</title>
    <published>2007-01-22T23:28:01Z</published>
    <updated>2007-01-22T23:28:01Z</updated>
    <content type="html">&lt;a href="http://www.searchtools.com/tools/xapian.html"&gt;Xapian&lt;/a&gt; is an active open source high-performance text retrieval system, based on years of research and scalable to very large sets of documents. It now includes the Omega search engine, an application that implements the code library and makes it relatively simple to install and run.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:71440</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/71440.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=71440"/>
    <title>Webetiser (formerly re.s@earch suite) SearchTools Report Updated</title>
    <published>2007-01-22T23:24:18Z</published>
    <updated>2007-01-22T23:24:18Z</updated>
    <content type="html">&lt;a href="http://www.searchtools.com/tools/webetiser.html"&gt;Webetiser&lt;/a&gt;, which comes in a free version, &amp;quot;Worx&amp;quot;,  and various paid versions (including those for distribution via CD or other media), runs primarily on Windows NT. It has indexing and configuration wizards using a Windows GUI, and the paid versions read Word, Excel, PowerPoint and Acrobat PDF files (via OLE). It can index local file shares, and offers a reasonable set of templates for results presentation, using a set of JavaScript files. It's been around for a while as re.s@earch suite, and has some happy customers.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:71361</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/71361.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=71361"/>
    <title>Search Tools discontinued or no longer developed</title>
    <published>2007-01-22T21:33:58Z</published>
    <updated>2007-01-22T23:06:58Z</updated>
    <content type="html">The &lt;a href="http://www.searchtools.com/tools/docfather.html"&gt;Siteforum Docfather search engine&lt;/a&gt; is now part of the SFS-Software Siteforum Online Enterprise Suite.&lt;br /&gt;&lt;br /&gt;The &lt;a href="http://www.searchtools.com/tools/harvest.html"&gt;Harvest&lt;/a&gt; suites of web crawling and robot spidering tools are no longer being developed. They were some of the first developed in the field, and were widely used in the 1990s for locating and collecting information using multiple standard protocols.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.searchtools.com/tools/rutersearch.html"&gt;Rutersearch&lt;/a&gt; is gone: it was a personal project and is no longer distributed.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:71105</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/71105.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=71105"/>
    <title>Subject Search Server (SSServer) SearchTools Report Updated</title>
    <published>2007-01-22T21:21:40Z</published>
    <updated>2007-01-22T21:21:40Z</updated>
    <content type="html">&lt;a href="http://www.searchtools.com/tools/ssserver.html"&gt;Subject Search Server&lt;/a&gt; indexes local text and HTML files only, and handles many languages and character sets. It has a technical and ambitious interface, offering control over the length and number of extracts to display, as well as defaulting to fuzzy search -- matching parts of the query terms rather than the more standard exact match -- which can be changed in the search form. Although it's free (with a link to the Kryloff site) on Windows, Linux and FreeBSD, it lacks a robot spider for indexing via HTTP, has a less-than-user-friendly search form and results pages, and has no admin interface, using only configuration files.</content>
  </entry>
  <entry>
    <id>urn:lj:livejournal.com:atom1:searchtools:70781</id>
    <link rel="alternate" type="text/html" href="http://searchtools.livejournal.com/70781.html"/>
    <link rel="self" type="text/xml" href="http://searchtools.livejournal.com/data/atom/?itemid=70781"/>
    <title>empty queries - what's going on?</title>
    <published>2007-01-19T23:25:47Z</published>
    <updated>2007-01-20T00:10:47Z</updated>
    <content type="html">Does anyone have a good sense of why there are so many empty searches?  As far as I can tell, most of them are searches where the user somehow clicked the search button or put the cursor into the query field and pressed the Return key.&lt;br /&gt;&lt;br /&gt;I first noticed empty searches back in the day, when I was working with a party planning site.  I insisted on getting search logs and there were all those empty queries.  In that case, it was particularly important, because the database back end took an empty query to be a request for every item in the catalog, displayed in the order entered.  When I reported it to my client, they fixed that and suddenly the load on the database went way down (a win for log analysis!)&lt;br /&gt;&lt;br /&gt;I'm still baffled.  I'm looking at a log from a big, busy site right now, and checking out some of the sessions which include empty queries.  Most of the time, the user just goes away.  Sometimes they do another empty query (or click the search button). If they type a search term, it's almost never anything to do with the site contents -- they probably think they're on a webwide search engine.&lt;br /&gt;&lt;br /&gt;So what &lt;em&gt;should&lt;/em&gt; a search engine do if there's nothing in the search box?&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Show a simple query page (this is what most of them do).&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Do nothing.  This seems pretty easy to implement, with just a little JavaScript around the search form.  But will it confuse people?  Frustrate them?  Make them want to kick the computer?&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Show a little dialog that says they have to enter something in the search field.  I know an intranet that does this, and it seems to work very nicely, but that's a controlled environment.&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;br /&gt;Has anyone done any usability testing with this problem?  Any insights?  I'm boggled and would very much appreciate ideas here.</content>
  </entry>
</feed>
