March 27th, 2003

Search Tools Report: Faceted Metadata Search

Faceted Metadata Search and Browse

Metadata is information about information: more precisely, it's structured information about resources. This can be a single set of hierarchical subject labels, such as a Yahoo or Open Directory Project category. More often, the metadata has several facets: attributes in various orthogonal sets of categories. This is often stored in database record fields and tables, especially for product catalogs.

Examples of faceted metadata include:

  • Music store: songs have attributes such as artist, title, length, genre, date...
  • Recipes: cuisine, main ingredients, cooking style, holiday...
  • Travel site: articles have authors, dates, places, prices...
  • Regulatory documents: product and part codes, machine types, expiration dates...
  • Image collection: artist, date, style, type of image, major colors, theme...

In all these cases, there is no single way to provide navigation for everyone, because users have such disparate needs. One person might want to look through all the U2 albums, while another is looking for classical guitar or 1940s jazz releases.

Traditional Approaches to Structured Data Access

Parametric Search

Traditional field-based or parametric search engines for structured data have used a command line or provided a form to fill out:

AU:rosenfeld TI:web PB:oreilly



These require a lot of knowledge on the searcher's side: they have to know the values or choose from a popup menu. If they include too many parameters, they will probably not find any records that match their requirements -- a dead end. The possible values are hidden from the searcher, so all the work the editorial staff has done in defining and assigning attributes is lost.
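As a rough illustration of the dead-end problem, here is a minimal Python sketch of a parametric search. The field codes AU, TI and PB come from the example query above; the sample records, the YR field and the function names are invented for illustration and do not describe any particular product.

```python
# Invented sample records; only the AU/TI/PB field codes come from the example query.
RECORDS = [
    {"AU": "rosenfeld", "TI": "information architecture for the web", "PB": "oreilly"},
    {"AU": "smith", "TI": "web search basics", "PB": "example press"},
]


def parse_query(query):
    """Split 'AU:rosenfeld TI:web' into {'AU': 'rosenfeld', 'TI': 'web'}."""
    params = {}
    for token in query.split():
        field, _, value = token.partition(":")
        if not value:
            continue  # ignore tokens that are not FIELD:value pairs
        params[field.upper()] = value.lower()
    return params


def parametric_search(query, records):
    """Return only the records where every requested field contains its value."""
    params = parse_query(query)
    return [
        record for record in records
        if all(value in record.get(field, "") for field, value in params.items())
    ]


matches = parametric_search("AU:rosenfeld TI:web PB:oreilly", RECORDS)
dead_end = parametric_search("AU:rosenfeld TI:web PB:oreilly YR:1999", RECORDS)
```

Because every parameter must match, one extra parameter (here the hypothetical YR field) is enough to empty the result list, and nothing in the form tells the searcher which values would have worked.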

Full-Text Search

Full text search engines can index all HTML metadata or gather data from multiple database fields or tables. However, full text search wipes out the value of the metadata: a number 3 is just a number, not a size, price, product ID or other meaningful number, as it is in the context of the tagged page or database record. Similarly, it's hard to know whether a recipe, for example, has chili pepper as a significant ingredient or a minor flavoring. While many searches are just fine without that information, there are other cases where providing that context would be extremely helpful.
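The loss of context can be shown with a small Python sketch. The records and function names below are invented for illustration: a full-text match on "3" finds both products, while a field-aware match can tell a size from a price.

```python
# Invented sample records: "3" is a size in one and a price in the other.
RECORDS = [
    {"name": "garden hose", "size": 3, "price": 25},
    {"name": "hose clamp", "size": 1, "price": 3},
]


def full_text_search(term, records):
    """Flatten each record into plain words and match the term anywhere."""
    hits = []
    for record in records:
        tokens = " ".join(str(value) for value in record.values()).lower().split()
        if term.lower() in tokens:
            hits.append(record)
    return hits


def fielded_search(field, value, records):
    """Match the value only in the named field, keeping its meaning."""
    return [record for record in records if record.get(field) == value]


anywhere = full_text_search("3", RECORDS)      # both records match
size_three = fielded_search("size", 3, RECORDS)  # only the garden hose matches
```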

Faceted Metadata Search Solution

A good solution to these problems involves exposing the facets in dynamic taxonomies, so that the search user can see exactly the options they have available at any time. They can switch easily between searching and browsing, using their own terminology for search while recognizing the organization and vocabulary of the data.

Features for metadata search include

  • Displaying aspects of the current results set in multiple categorization schemes
  • Showing only populated categories, no dead-ends (links leading to empty lists)
  • Displaying a count of the contents of each category, warning the user how many more choices they will see
  • Generating groupings on the fly, such as size, price or date
  • Drilling down by facet, so a diamond buyer could choose price, clarity, size and setting
  • Adding special facets within categories: a Yellow Pages site would want to show cuisine and location for restaurant listings but not plumbers.
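Several of the features listed above (counts per category, populated categories only, groupings generated on the fly, and drill-down) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a description of how any of the products named in this report work: the album records, the facet names, the price boundaries and the function names are all invented for illustration.

```python
from collections import Counter

# Invented sample data for a music store.
ALBUMS = [
    {"title": "Album A", "genre": "rock", "format": "CD", "price": 14.99},
    {"title": "Album B", "genre": "rock", "format": "vinyl", "price": 24.99},
    {"title": "Album C", "genre": "jazz", "format": "CD", "price": 9.99},
    {"title": "Album D", "genre": "classical", "format": "CD", "price": 17.99},
]


def price_range(price):
    """Group a raw price into a range label generated on the fly."""
    if price < 10:
        return "under $10"
    if price < 20:
        return "$10 to $19.99"
    return "$20 and up"


# Each facet is a function that reads one attribute from a record.
FACETS = {
    "genre": lambda record: record["genre"],
    "format": lambda record: record["format"],
    "price range": lambda record: price_range(record["price"]),
}


def drill_down(records, selections):
    """Keep only the records that match every selected facet value."""
    return [
        record for record in records
        if all(FACETS[name](record) == value for name, value in selections.items())
    ]


def facet_counts(records):
    """Count the values of each facet within the current result set.

    Values with no matching records never appear in the counts,
    so no displayed option can lead to an empty list.
    """
    return {
        name: Counter(get_value(record) for record in records)
        for name, get_value in FACETS.items()
    }


results = drill_down(ALBUMS, {"format": "CD"})
counts = facet_counts(results)
```

After choosing the CD format, the counts are recomputed from the three remaining albums, so "vinyl" and "$20 and up" simply disappear from the options rather than leading to empty lists.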


Tower Records (Endeca)
Do a search for your favorite artist or record title, and you'll see a list of search results, and on the left, a set of options including genre, album feature, price range, format and more.

American Express Travel and Leisure (i411)
After doing a search, the mid-right listing shows options for the matching articles, allowing travelers to choose the one that is most likely to answer their questions.

(Siderean)
In this case, a search for beach houses which have internet connections finds some results, and the interface allows vacationers to search and browse by the country, cost, number of bedrooms, and other criteria.

Applying Faceted Metadata Approaches to Unstructured Text

Most site and intranet documents don't have such rich tagging. They may have a title, modification date, and author, and almost all have a location and file size. However, there are tools available to perform entity extraction and external tagging, recognizing companies, people, products and other standard text. Using these tags, even unstructured documents can be approached with faceted metadata searching.
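A very simple form of entity extraction is a lookup against lists of known names. The Python sketch below is only a toy: the name lists and function name are assumptions made for illustration, and real entity extraction tools use far more sophisticated techniques. It does show how extracted tags can become facet values for otherwise unstructured text.

```python
import re

# Hypothetical lookup lists; a real tool would use much larger vocabularies.
KNOWN_NAMES = {
    "company": ["Endeca", "Inxight", "Siderean"],
    "person": ["Marti Hearst", "Ramana Rao"],
}


def extract_facets(text):
    """Return {facet: [names found]} for every known name that appears in the text."""
    facets = {}
    for facet, names in KNOWN_NAMES.items():
        found = [
            name for name in names
            if re.search(r"\b" + re.escape(name) + r"\b", text, re.IGNORECASE)
        ]
        if found:
            facets[facet] = found
    return facets


tags = extract_facets("Ramana Rao is the founder of Inxight.")
```

The resulting tags can be stored alongside the title, date and author, and then counted and filtered like any other facet.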

Faceted Metadata Search Resources

Flamenco Project
UC Berkeley professor Marti Hearst is investigating how faceted metadata can provide a dynamic information-architecture context for browsing and searching on web sites. She reports on extensive usability studies done with both textual and image databases.
Peter Merholz on Faceted Metadata Search (early 2002)
An eminent information architect explains the value of creating and searching faceted metadata.
SearchTools Report on Faceted Classification 
Faceted metadata is not just for search: it's a way of describing content in its many aspects. It is more flexible and extensible than traditional hierarchical organizations, because it does not attempt to put things in one category only. There are many systems for creating and maintaining faceted classifications.

Facets and Multiple Angles of Access Information Flow Newsletter, August 2002 by Ramana Rao
Insights into faceted metadata from the founder of Inxight, starting with how an information seeker might look for a resource. Describes the actual challenges of developing a faceted system with a diverse collection of documents. Points out that the value of this approach lies in removing limits on access to resources.
When Search Is Not Enough: Guided Navigation from Endeca IDC Bulletin, May 2002 by Mary Flanagan and Susan Feldman
A commissioned report, describing the problems of information overload in search results and the faceted metadata search implemented by Endeca's Navigation Engine. Provides business background, competitive analysis and software overviews. Uses Tower Records as a case study.
Dynamic Taxonomies: A model for Large Information Bases (link to PDF) IEEE Transactions on Knowledge and Data Engineering, May/June 2000 by Giovanni M. Sacco
Academic paper starts with the problems of searching and browsing huge data sets, and of expressing multiple taxonomies. Describes dynamic taxonomies derived from documents by analyzing their concepts, and a visual framework to browse the content. This allows users to find appropriate documents in a few clicks, even in very large data sets. Because the system only displays categories containing documents that fit the criteria, the users will never be in a situation where there are no documents. (See Universal Knowledge Processor page).

Faceted Metadata Search Engines

  • Endeca - online stores, directories, and intranets
  • Flamenco - working systems as described by Prof. Hearst of UC Berkeley
  • i411 - directories, online content, stores
  • Inxight - intranets with semi-structured data, such as pharmaceuticals
  • Siderean Seamark - online stores, content sites
  • Universal Knowledge Processor - examples include online catalog, image database, and newspaper

SearchTools Report: E-Commerce Search Engines

Search Engines for Online Stores and Commerce Sites


Online stores and product catalogs need a search function so that customers can bypass any hierarchical navigation and find the things they want to buy. It's a form of customer service, like having a knowledgeable sales person who can answer questions accurately.

Many online stores have no search engine or an inadequate one: when customers can't find what they want, they may well leave and never return.

SearchTools Commerce Search Checklist

  • Accept multiword keyword searches, without requiring SQL commands or Boolean operators.
  • Find matches on some or all words as keywords, as well as phrases
    • Better to find something than nothing.
  • Allow customers to enter product codes and product brand names as well as general topics.
    • Make repeat customers happy
  • Default to searching all the product information, but recognize a few special fields, such as size, color and price (see faceted metadata search)
  • Include synonyms, so a search for "red sweater" will find scarlet cardigans and magenta crew-necks.
  • Include site information, such as order status and return processing.
  • Index extensive product information, even if it's stored in back-end databases
  • Sort results so the most likely to be relevant items come first.
    • Perform user testing to learn what relevance means to customers
    • Analyze your search logs to find out what people ask for and where they go
    • Adjust the search indexing to include everything that's useful
  • Format results listings to show the most valuable information:
    • Emphasize the matching text using bold or colors, so it's clear why the item was found.
    • For physical products, such as clothes or groceries, show pictures.
    • Show the most important elements, such as price, size, brand name or compatibility information.
    • Include inventory status, so it's clear what's available and what must be backordered.
  • For searches which don't find any matches, provide a clear and helpful error page.
  • Generate a search log, which store operators can consult for free market research:
    • what's popular
    • what's trending down
    • what customers look for that you don't carry
    • what misspellings and typos they commonly make.
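Two of the checklist items, matching on some or all words and including synonyms, can be sketched together in Python. The synonym table, product names and function names are invented for illustration, and the sketch uses singular product names to avoid the separate problem of plurals and word stems.

```python
# Hypothetical synonym table based on the "red sweater" example in the checklist.
SYNONYMS = {
    "red": {"scarlet", "magenta"},
    "sweater": {"cardigan", "crew-neck"},
}

PRODUCTS = ["scarlet cardigan", "magenta crew-neck", "red scarf", "blue jeans"]


def expand(word):
    """Return the word together with any synonyms listed for it."""
    return {word} | SYNONYMS.get(word, set())


def search(query, products):
    """Rank products by how many query words (or their synonyms) they contain.

    Products matching all words come first, partial matches follow,
    and products matching nothing are dropped.
    """
    words = query.lower().split()
    scored = []
    for product in products:
        tokens = set(product.lower().split())
        hits = sum(1 for word in words if expand(word) & tokens)
        if hits:
            scored.append((hits, product))
    scored.sort(key=lambda pair: -pair[0])  # stable sort keeps catalog order for ties
    return [product for _, product in scored]


found = search("red sweater", PRODUCTS)
```

Here the red scarf matches only one of the two words, so it is still shown, but after the items that match both: better to find something than nothing.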

Articles and Reports on Commerce Search

  • Evaluating 25 E-Commerce Search Engines, $99 from 37Signals, January 2003
    Research firm performed a systematic evaluation of searching online stores. Criteria were accuracy and relevance for simple searches, handling misspellings, responding to "mixed" specifications (such as color, size and material in the same search), automatically expanding to synonyms and related terms, providing options for sorting and filtering results, and handling failed searches where no matches were found. They found that 92% of the commerce sites (including Lands' End, Amazon, QVC and the Apple Store) found relevant results for standard searches, but most had significant problems with the other tests. Includes detailed analysis and screenshots of the results, and a rating for each site.

  • At Talbot's, a search for increased web sales pays off Internet Retailer, August 13, 2002
    Report from eTail 2002 conference on a presentation by the e-commerce merchandise manager for Talbot's clothing store online. She gave examples of problem searches, such as looking for "clutch" meaning a purse, but getting irrelevant results. At Talbot's, installing EasyAsk took less than a month and the search engine recognizes the context of search queries. With the new search engine and a search field on the front page, the number of product searches has increased 267%, while average orders have grown 18% and conversion rates have improved by 34%.
  • E-retailers seek improved search engines ComputerWorld August 12, 2002 by Carol Sliwa
    Report from eTail 2002 conference about online commerce sites responding to user demand by improving their search engines. Describes how Talbot's implementation of EasyAsk increased the average order size when using search by 18%, and the number of shoppers who search and then purchase by 34%. Neiman Marcus chose iPhrase and Spiegel chose Endeca after extensive comparisons.

  • When Searching Is No Longer Enough. Internet Retailer; April 2002 by Kurt Peters
    Discusses online store search engines: Mercado, EasyAsk, Endeca, and Netrics, referring to the 2001 Forrester report. Describes customer needs for technology that allows both searching and browsing by category or product attribute. Includes evidence of the good results at Tower Records with Endeca, but finds few examples of faceted metadata usage. Includes additional references to studies of e-commerce search engine demands and limits, suggesting significant changes from the traditional "relevance ranking" approach. Changes include recognizing the problems of a long search result, redirecting searches for brands or products not carried to similar available products, sorting and drilling down on product attributes, spelling correction and synonyms.

  • Desperately Seeking Search Technology (Commentary) BusinessWeek Online, September 24, 2001 by Robert D. Hof
    Quotes eminent analysts from Jupiter, Patricia Seybold Group and Forrester to support the value of a good search engine on commerce web sites in particular. Recommends ultrafast updates, as exemplified by FAST on eBay; tolerance of misspellings [and typos]; synonym recognition, such as EasyAsk on Lands' End; and search fields on every page, like Ritz Interactive. Also suggests using Amazon-like recommendations and providing information stored in private product databases to web search indexers such as Google.

  • The Search For Success InternetWeek, March 29, 2001 by Jody Dodson
    An expert in customer service for Internet business points out that fixing the site search may be much more cost-effective than complex CRM solutions. He recommends considering outsourcing using Ask Jeeves or a similar service; providing a content directory as well as a search engine; matching search form to function; knowing the audience and providing the appropriate search capabilities (such as product codes); and explaining the search functionality more clearly.

  • Not All Site Features Turn Online Shoppers Into Buyers PricewaterhouseCoopers, March 6, 2001
    A survey of 547 Internet users in January 2001 found that over three-quarters of the respondents (77%) use search features. Search functionality is considered the most important feature for online shopping by 43%, beating product information (40%), when choosing where to shop: both features led customer service, personalization and wish lists in selecting sites. When deciding what to buy, search functions also play an important role, although enlarged product images, availability and comparison guides are more directly involved.
  • Revving Up the Search Engines to Keep the E-Aisles Clear New York Times, February 28, 2001 by Lisa Guernsey (registration may be required to read this article)
    Discusses the difficulty of locating items in online stores, referring to the Forrester report of last spring. Describes the use of thesaurus tools for synonym searching and taking advantage of database structure in online stores. Quotes the vendors Mercado, which provides search for WebVan and Tower Records, and EasyAsk, as well as the chief scientist at Verity.
  • Building Web Sites With Depth Web Techniques, February 2001 by Jakob Nielsen and Marie Tahir
    In a discussion of e-commerce sites, these analysts point out that search engines are an area that could be a strength of online business, but are generally a waste of time. They recommend making sure that the search engine covers the "nonproduct needs" such as how to pay, check a gift registry, and return items. They suggest designing thoughtful results, especially when there is no item that matches a search exactly (see our report on No Matches Pages). Another way to reduce the number of results is winnowing, allowing users to narrow down the list.

  • NNGroup report on e-commerce search engines, late 2000
    Focuses on user experience. It says very much the same things we've been saying about search forms, results pages and search failures. Includes some solid test data backing up the recommendations to use a search box, recognize synonyms, accept various operators and errors, show helpful results metadata, explain results, handle search failure, and perform extensive search log analysis. Well worth the $45 to download the PDF report. See also Search Tools Report on Search User Interface and User Experience.

  • Must Search Stink? Forrester Report, June 2000 by Paul Hagen
    Focused on e-commerce and B2B sites, this report describes the importance of site searching and the problems with standard search engines. It emphasizes that a simple term frequency algorithm for relevance ranking will often fail to return the best matches at the top of the list. It also points out how important content management, metadata and information architecture are for good search results.

    Recommendations include building a vocabulary and synonym listings so that searches for a specific term will find pages with all variants and equivalent terms, improving content management, and implementing good user interfaces to the search engine. They even have a section on the benefits of fixing search, showing how it makes bottom-line sense.

    Cost of search, mainly for e-commerce sites, is given at $150,000 for a search engine, $150,000 to integrate with existing databases and $60,000 for user interface and testing, along with an estimate of $4 per page or item for page titles, descriptions, removing duplicates and creating a controlled vocabulary.

    SearchTools Analysis of the Forrester Report

    I like this report a lot: it's clear on how important site search is and how traditional algorithms fail to retrieve and sort the results well. However, they don't emphasize the special issues that may arise in searching structured data (such as product catalogs), and they mix up retrieving relevant items with ranking (so that the relevant items appear on the first page). In addition, they don't understand how much people like a simple search field -- it's a Web convention that is now inescapable. And the cost is appropriate for a large e-commerce site, rather than a smaller or simpler site, such as an online magazine, small store, or corporate site.
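    The cost figures quoted from the Forrester report can be combined into a rough estimate. In the Python sketch below, the dollar amounts are the ones given in the report summary above; the function name and the catalog size of 10,000 items are assumptions chosen only to show the arithmetic.

```python
# Figures as quoted from the Forrester report summary.
SEARCH_ENGINE = 150_000
DATABASE_INTEGRATION = 150_000
INTERFACE_AND_TESTING = 60_000
PER_ITEM_CLEANUP = 4  # titles, descriptions, duplicate removal, controlled vocabulary


def estimated_search_cost(item_count):
    """Fixed costs plus the per-page or per-item content cost, in dollars."""
    fixed = SEARCH_ENGINE + DATABASE_INTEGRATION + INTERFACE_AND_TESTING
    return fixed + PER_ITEM_CLEANUP * item_count


# Hypothetical catalog of 10,000 items: 360,000 fixed + 40,000 content work.
total = estimated_search_cost(10_000)
```

    The fixed costs dominate at this scale, which is consistent with the point that these figures suit a large e-commerce site rather than a small store.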

Commerce Search Engines