June 11th, 2010


Thoughts on Federated and Aggregated Search

Federated vs. Aggregated Search Architectures

Federated search systems accept user queries, translate them into each target's query language, and send them to one or more remote search engines. They then display the results, sometimes in separate blocks, sometimes merged together. This approach is vital for searching external or un-owned data sources, such as national patent databases or legal archives. Federated search requires a lot of work to translate queries and deal with results. The heavy lifting is done at search time, which is good for absolutely current content and access control.
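To make the federated flow concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `Backend` and `Hit` types, the two made-up sources, and their query syntaxes are illustrations only, and the `search` callables stand in for real remote calls. It also assumes scores are comparable across sources, which is rarely true in practice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Hit:
    source: str
    title: str
    score: float  # assumption: comparable across sources (often not the case)


@dataclass
class Backend:
    name: str
    translate: Callable[[str], str]     # user query -> this engine's query syntax
    search: Callable[[str], List[Hit]]  # stand-in for a remote search call


def federated_search(query: str, backends: List[Backend]) -> Dict[str, List[Hit]]:
    """Translate the query for each backend at search time; keep one block per source."""
    return {b.name: b.search(b.translate(query)) for b in backends}


def merge_blocks(blocks: Dict[str, List[Hit]]) -> List[Hit]:
    """Optionally flatten the per-source blocks into one list, best score first."""
    hits = [hit for block in blocks.values() for hit in block]
    return sorted(hits, key=lambda hit: hit.score, reverse=True)


# Two made-up sources with different (invented) query syntaxes.
patents = Backend(
    name="patents",
    translate=lambda q: " AND ".join(q.split()),
    search=lambda native: [Hit("patents", f"Patent for [{native}]", 0.7)],
)
legal = Backend(
    name="legal",
    translate=lambda q: f'"{q}"',
    search=lambda native: [Hit("legal", f"Ruling on [{native}]", 0.9)],
)

blocks = federated_search("fish ladder", [patents, legal])  # separate blocks
merged = merge_blocks(blocks)                               # or one merged list
```

Note that nothing is stored locally: every query goes out to the sources, which is why the content and access rules are always current, and also why each new source means another translator to write.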

Aggregated search systems gather and index text from many different data sources. When the user sends a query, it can be handled locally. Aggregated search requires some work to get data from multiple data sources, and the ability to scale the index as it grows, sometimes nearly exponentially.
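By contrast, a rough sketch of the aggregated approach looks like this. It is a toy inverted index, not any particular product's design: the `AggregatedIndex` class, the source names, and the sample documents are all invented for illustration, and tokenizing is just lowercasing and splitting on whitespace.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

DocRef = Tuple[str, str]  # (source name, document id)


class AggregatedIndex:
    """Toy inverted index: text from many sources, gathered ahead of time."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[DocRef]] = defaultdict(set)

    def add_source(self, source: str, documents: Dict[str, str]) -> None:
        """Index-time work: pull the text from one source into the shared index."""
        for doc_id, text in documents.items():
            for term in text.lower().split():
                self.postings[term].add((source, doc_id))

    def search(self, query: str) -> List[DocRef]:
        """Search-time work: a purely local lookup, documents must match all terms."""
        terms = query.lower().split()
        if not terms:
            return []
        # .get avoids creating empty entries for unknown terms in the defaultdict
        matches = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return sorted(matches)


index = AggregatedIndex()
index.add_source("wiki", {"w1": "federated search basics", "w2": "fish recipes"})
index.add_source("files", {"f1": "aggregated search index"})

any_search = index.search("search")
both_terms = index.search("federated search")
```

Here the work happens up front in `add_source`, so queries are fast and local, but the index only knows what it was given the last time each source was gathered.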

My research for this presentation indicated that each is useful in specific circumstances (I know, no surprise there). Many data sources are obviously best accessed by one or the other, but it's the corner cases that are tricky. Aspects to consider include:

  • size of the content in the source
  • how often your users need that content
  • content change rate
  • importance of real-time access control permissions changes
  • content licensing rules
  • available tools for indexing / querying
  • difficulty of extracting and indexing
  • quality of the internal search engine
  • difficulty of sending queries and receiving results

Slides (with fish!) presented by Avi Rappoport at ESS, May 2010

Federated and Aggregated Search, Web View (color PDF)

Federated and Aggregated Search, Printable (grayscale 4-up PDF)

Comments? Arguments? Explanations? Please discuss below. Want an analysis of your data sources? I can do that: comment here or send me a message.