July 23rd, 2003

searchtools.com

Product Report: Lotus Extended Search

This is an off-site copy of the corresponding Product report page on the SearchTools.com website, and it is designed to allow you to comment on the product and/or the reporting. For more information about the topic of search and tools visit SearchTools.com where you can browse many articles, in-depth analysis and overviews of external resources.

Lotus Extended Search

Product Information

Price: [1998 pricing] $3,995 (1-4 processors), $9,995 (5+ processors); $30 for each client access license
Platform: Windows NT 4, 2000, Unix AIX 4.3.3

Features

  • A metasearch engine for Notes, legacy databases and Web search sites.
  • Distributes queries and aggregates results.
  • Does not require Lotus Domino or Notes clients, though both are supported.
  • Searches Lotus Notes, IBM DB2, Oracle, Sybase, MS SQL Server, MS Access, Domino.Doc, MS Index Server, MS SiteServer, MS Exchange public email storage and over 18 Web search sites (e.g. HotBot, Excite, AltaVista and Usenet newsgroups).
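
The "distributes queries and aggregates results" idea can be sketched in a few lines of Python. This is only an illustration of metasearch in general, not how Lotus Extended Search works internally: the two source functions, the result identifiers and the reciprocal-rank merging rule are all assumptions made for the example, and a real product would query its sources in parallel.

```python
def notes_source(query):
    # Hypothetical stand-in for a Notes database search; returns ranked IDs.
    return ["doc/a", "doc/b"]

def web_source(query):
    # Hypothetical stand-in for a Web search site; returns ranked IDs.
    return ["doc/b", "doc/c"]

def metasearch(query, sources):
    """Send the query to every source and merge the ranked result lists."""
    merged = {}
    for source_name, search_fn in sources.items():
        for rank, url in enumerate(search_fn(query), start=1):
            entry = merged.setdefault(url, {"url": url, "sources": [], "score": 0.0})
            entry["sources"].append(source_name)
            # Assumed merging rule: add 1/rank for each source listing the item.
            entry["score"] += 1.0 / rank
    return sorted(merged.values(), key=lambda e: e["score"], reverse=True)

sources = {"notes": notes_source, "web": web_source}
results = metasearch("widget quality", sources)
```

An item returned by both sources ("doc/b" here) appears once in the merged list and rises to the top because both sources contribute to its score.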

Articles

  • Anatomy of a Domino e-commerce Web site Notes.net; October 2, 2000 by Michael Patrick
    Case study of an educational bookstore, starting with navigation and searching. Describes how to index a database, how it works behind the scenes and what the elements of the form contain.

  • Domino R5: Domain Search Notes.net; May 3, 1999 by Susan Florio with David Kajmo
    How to set up a search engine for a Domino-driven web site.

  • Lotus moves to converge knowledge management and collaborative computing PC Week Online, June 23, 1998 by Christy Walker
    Describes the Lotus plan to add searching (Domino Extended Search), data mining, a middleware "preference engine" for customizing search results, and updated conferencing and discussion tools. Lotus will be integrating its acquisitions, DataBeam Conference Server and Ubique VirtualPlaces, into Notes and working with business partners on additional knowledge management features.

  • The Search is On for Lotus InternetWeek, July 27, 1998 by Jeffrey Schwartz

  • Lotus to introduce Domino Extended Search 1.0 PC Week Online, July 27, 1998 by Christy Walker
    Descriptions of the new Extended Search add-on to Domino servers. It will allow users on either Lotus Notes or web clients to search multiple databases and webwide search services simultaneously, by accessing the existing indexes. Databases can be Notes or any ODBC-compatible system. LotusScript allows integrators or administrators to customize the application. An API allows access to other site search engines, and Lotus plans to offer direct interfaces to major engines in future releases.

Product Report: Smartlogik Discover (APR)

Smartlogik Discover (APR)

see also: Xapian

Product Information

Platforms: Windows NT, 2000; Unix: Sun Solaris, others on request
Pricing: Typical projects range from about £150,000 to £450,000 for deployment and the license fee.

Background

The Muscat search engine comes from research performed by Dr. Martin Porter of Cambridge University (UK), and was commercialized in 1984 by Cambridge CD Publishing. It was subsequently sold to MAID which became The Dialog Corporation and in 2000 changed its name to Bright Station plc: it seems to have been spun off in 2001 to Smartlogik. In April 2002, the assets of Smartlogik were bought by Applied Psychology Research Ltd. APR offers support for existing users of Muscat/Smartlogik software and continues to develop products based on the technology.

June 2002: Muscat users can contact Lemur Consulting for support such as a spidering/indexing system.

Features

  • Can index free text such as web pages
  • Also has tools for extracting data from structured datasets like Oracle or Lotus Notes.
  • It applies probability theory and linguistic inference algorithms to natural language searching.
  • Performs language recognition and language-specific stemming.
  • Stopword removal.
  • Noun and document structure recognition, word position analysis.
  • Find similar documents feature.
  • Also supports Boolean and wildcard searching.
  • Dynamic word weighting gives priority to rare words.
  • Personalized alerting (also known as filtering or current awareness).
  • Flexible configuration features.
  • J2EE, XML and API integration.
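
To illustrate what "dynamic word weighting gives priority to rare words" can mean in practice, here is a minimal Python sketch of inverse document frequency, a common general-purpose weighting. It is an assumption chosen for illustration, not Smartlogik's or Muscat's actual formula, and the sample documents are invented.

```python
import math

def idf_weights(documents):
    """Weight each word by log(N / number of documents containing it)."""
    n = len(documents)
    doc_freq = {}
    for text in documents:
        # Count each word once per document, however often it occurs there.
        for word in set(text.lower().split()):
            doc_freq[word] = doc_freq.get(word, 0) + 1
    return {word: math.log(n / count) for word, count in doc_freq.items()}

documents = [
    "search engine review",
    "search engine pricing",
    "search stemming",
]
weights = idf_weights(documents)
```

In this sample, "search" occurs in every document and gets a weight of zero, while "stemming" occurs in only one and gets the highest weight.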

Articles

  • Unstructured Information Management Report Infosphere, March, 2003 by Magnus Stensmo and Mikael Thorson, $325/€295 for a single PDF license
    General report on search and categorization tools. Describes how the APR Smartlogik search and categorization system integrates with user profiling for better results. Considers this to be a "solid market offering".
  • Seeking far and wide for the right data InfoWorld, August 27 / September 3, 2001 by Cathleen Moore.
    Describes the value of search engines and categorization as essential elements of corporate portal infrastructures, to handle the "deluge" of information within enterprises. Quotes Aberdeen analyst Guy Creese who points out that without a good way to search, corporations would be "blowing their investment in the content". Covers recent announcements of search and categorization features by Autonomy, Verity, AltaVista, iPhrase, and Smartlogik (Muscat).

  • Search technology gains recognition InfoWorld, July 30, 2001 by Cathleen Moore
    Covers search and categorization technology for corporations from Verity and Smartlogik (Muscat). Includes analyst quotes about the importance of search in handling huge amounts of content. Describes Verity "Social Networks" using business rules to make taxonomies more useful. Smartlogik's Muscatdiscovery search engine stresses natural-language searching and Muscatstructure provides rules-based categorization; both can integrate into application servers using Java or COM+.

SearchTools Report: Full-Text Searching and Database Content

Searching for Text Information in Databases

Databases provide the content storage for many sites, which dynamically create web pages around them, including ecommerce catalog sites, online news, and even entertainment sites. Intranets often contain large amounts of text stored in databases as well.

These databases generally have their own search functions, which may appear to take the place of a full-text search engine. But that's not always the case. Database search is not oriented towards text search and relevance ranking: it is great for locating widgets by part number and listing the inventory of leather slippers, but not so good at helping site visitors find the articles on widget quality or comparing leather and fleece slippers. Text search engines are often the right solution, moving the processing load from the database to the search engine.

Database search vs. text search

Searching Many Fields and Tables

Databases store their information neatly organized into fields, such as product name, category, description, price and so on. However, most people don't like to choose a field before searching: who knows whether "widget" is in the name, category, or description field? While databases can set up complex queries to find the search words in all applicable fields, this makes them slower to respond, requires more memory, and is more difficult to program. Text search engines store this information in a single index and can find words in any field for a record. Many high-end search engines can also store field information, so searches can be limited to a specific field as well.
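
As a rough illustration of the difference, the sketch below uses Python's built-in sqlite3 module with a made-up products table (the table, column names and sample rows are assumptions for the example). The database query has to name every field it checks, while the simple index maps each word to the records containing it and remembers the field only as extra information.

```python
import sqlite3
from collections import defaultdict

# Hypothetical catalog table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products "
    "(id INTEGER PRIMARY KEY, name TEXT, category TEXT, description TEXT)"
)
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?)",
    [
        (1, "Deluxe Widget", "Hardware", "Our best seller"),
        (2, "Gear Oil", "Widget Care", "Keeps parts moving"),
        (3, "Leather Slippers", "Footwear", "Not a widget at all"),
        (4, "Fleece Slippers", "Footwear", "Warm and soft"),
    ],
)

def database_search(word):
    """Database approach: the query must list every field to check."""
    pattern = f"%{word}%"
    rows = conn.execute(
        "SELECT id FROM products "
        "WHERE name LIKE ? OR category LIKE ? OR description LIKE ? "
        "ORDER BY id",
        (pattern, pattern, pattern),
    )
    return [row[0] for row in rows]

def build_index():
    """Text-search approach: one index of word -> (record id, field) pairs."""
    index = defaultdict(set)
    for rec_id, name, category, description in conn.execute("SELECT * FROM products"):
        fields = (("name", name), ("category", category), ("description", description))
        for field, text in fields:
            for word in text.lower().split():
                index[word].add((rec_id, field))
    return index

index = build_index()

def index_search(word, field=None):
    """Look a word up in any field, or limit the search to one field."""
    hits = index.get(word.lower(), set())
    return sorted({rec_id for rec_id, f in hits if field is None or f == field})
```

Calling index_search("widget") finds the word wherever it occurs, and index_search("widget", field="name") limits the same lookup to one field without a different query.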

Simple Search Commands

In many cases, users must type in complex Boolean or SQL (Structured Query Language) commands for searching. Some databases, such as MySQL, are limited to "Or" searches -- they will return all records which match any of the search terms. Others default to exact phrase matching, so a search for "fuzzy slippers" would not find "slippers, fuzzy".

While some programmers and librarians enjoy the control that query languages give them, most people do not. This is particularly important for searching multiple words: if someone types in "brown bear", they probably also want to see records containing "bear with a brown coat", without having to type "bear AND brown". Search logs show clearly that most people will not do anything complicated while searching -- they will give up if the search is too hard to use. Full-text search engines offer simple and flexible search options, with most providing an Advanced Search feature for power users.
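
The contrast between exact phrase matching and a friendlier "all words, any order" default can be sketched as follows. This is a minimal Python illustration with hypothetical function names, not the behavior of any particular product.

```python
import re

def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())

def exact_phrase_match(query, text):
    """Strict matching: the query must appear as typed."""
    return query.lower() in text.lower()

def all_words_match(query, text):
    """Simple default: every query word must appear, in any order."""
    query_words = tokenize(query)
    text_words = set(tokenize(text))
    return bool(query_words) and all(w in text_words for w in query_words)
```

With these definitions, all_words_match("fuzzy slippers", "Slippers, fuzzy") succeeds where exact_phrase_match fails, and the searcher never has to type an AND operator.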

Flexible Search Processing

Database search functions tend to look for exact matches in capitalization and characters. If someone searches for "pokemon", they won't find records with "Pokemon" or "pokémon" in them. Many text search engines will automatically match lowercase searches with text in any case, and will adjust for extended and diacritical characters. Some also allow search administrators to set up synonyms for searching ("doctor" and "physician"), and automatically perform stemming (find "octopi" when searching for "octopus"). A new trend in text search engines is to incorporate spelling checkers, which are particularly useful for queries that do not find any matches. By checking the words already in the index, the search engine spelling checker can offer suggestions that apply to this particular content.
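
Two of these ideas, folding case and diacritics and suggesting spellings from words already in the index, can be sketched with Python's standard library. The sample vocabulary and the similarity cutoff are assumptions for the example; real engines use their own matching rules.

```python
import difflib
import unicodedata

def normalize(word):
    """Fold case and strip diacritics so 'Pokémon' and 'pokemon' compare equal."""
    decomposed = unicodedata.normalize("NFKD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# Hypothetical index vocabulary, stored in normalized form.
vocabulary = {normalize(w) for w in ["Pokémon", "slippers", "sheepskin", "shearling"]}

def suggest(query_word, vocabulary, limit=3):
    """Offer close spellings, drawn only from words that are actually indexed."""
    return difflib.get_close_matches(
        normalize(query_word), sorted(vocabulary), n=limit, cutoff=0.75
    )
```

Because suggestions come from the index itself, a misspelled query such as "slipers" is only ever corrected to a word that will find something on this site.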

Some databases even return a list of everything they have if a user enters an empty search string -- this requires significant server processing time. Search engines tend to just report that they found no matches.

Database Structure

Many databases are not designed for easy searching. In a wine database, for example, it may be difficult to find wines from a certain region, such as the Napa Valley of California, because the location is stored in a table far away from the wine names. Some text search engines, such as dtSearch, FAST Data Search, Ultraseek, and Verity K2, provide a rich data set for searching. Mercado's IntuiFind provides options to entirely rearrange a database structure, if that's what's necessary for searching.
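
One common way around this, sketched below with Python's sqlite3 module, is to join the scattered tables once at indexing time and hand the search engine a single flattened text per record. The three-table wine schema and its rows are invented for the example, and this is not a description of how any of the products named above work.

```python
import sqlite3

# Hypothetical normalized schema: the region is two joins away from the wine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT, state TEXT);
CREATE TABLE wineries (id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
CREATE TABLE wines (id INTEGER PRIMARY KEY, name TEXT, winery_id INTEGER);
INSERT INTO regions VALUES (1, 'Napa Valley', 'California');
INSERT INTO regions VALUES (2, 'Willamette Valley', 'Oregon');
INSERT INTO wineries VALUES (1, 'Example Cellars', 1);
INSERT INTO wineries VALUES (2, 'Sample Vineyards', 2);
INSERT INTO wines VALUES (1, 'Estate Cabernet', 1);
INSERT INTO wines VALUES (2, 'Reserve Pinot Noir', 2);
""")

def wine_documents(conn):
    """Join the tables once and build one searchable text per wine."""
    rows = conn.execute("""
        SELECT wines.id, wines.name, wineries.name, regions.name, regions.state
        FROM wines
        JOIN wineries ON wines.winery_id = wineries.id
        JOIN regions ON wineries.region_id = regions.id
        ORDER BY wines.id
    """)
    return {row[0]: " ".join(row[1:]) for row in rows}

def search(documents, query):
    """Return IDs of documents containing every query word."""
    words = query.lower().split()
    return [
        doc_id
        for doc_id, text in documents.items()
        if all(w in text.lower().split() for w in words)
    ]

documents = wine_documents(conn)
```

A search for "napa cabernet" now matches on the flattened text, even though the two words live in different tables in the database.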

Response Time and Database Resources

Databases are optimized to search for exact words and phrases, and they tend to respond very slowly otherwise. So if a searcher wants to find sheepskin or shearling in the same search, databases will tend to do two searches and then merge the results. Full text search engines are designed to store these words in a single index, so they can perform these kinds of searches efficiently and return quickly.
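
A minimal sketch of why this is cheap for a text search engine: with an inverted index, an "either word" search is just a lookup per word followed by a set union. The documents below are invented, and real engines use far more compact index structures.

```python
from collections import defaultdict

# Hypothetical records for illustration.
documents = {
    1: "sheepskin slippers",
    2: "shearling boots",
    3: "leather slippers",
}

def build_inverted_index(documents):
    """Map each word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index

def search_any(index, words):
    """Return documents containing any of the words (an OR search)."""
    result = set()
    for word in words:
        result |= index.get(word.lower(), set())
    return sorted(result)

index = build_inverted_index(documents)
```

Here search_any(index, ["sheepskin", "shearling"]) reads two small sets from the same index and merges them, with no second pass over the records.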

In addition to the time, searching databases requires additional back-end and server resources. By storing a search index on a separate server and searching that rather than the live data, a text search engine can perform queries without additional demands on the database itself.

Results Per Page

Many database search engines will happily display all results on the same page, whether there are 8 or 8,000 records. Text search engines have a mechanism for dividing up the results and providing navigation from the first page to following pages (and back).
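
The paging mechanism itself is simple, as this Python sketch suggests. The function name, the returned keys and the default of 10 results per page are assumptions for the example.

```python
def paginate(results, page, per_page=10):
    """Return one page of results plus the numbers needed for navigation."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    # Ceiling division; an empty result list still counts as one (empty) page.
    total_pages = max(1, -(-len(results) // per_page))
    start = (page - 1) * per_page
    return {
        "items": results[start:start + per_page],
        "page": page,
        "total_pages": total_pages,
        "previous": page - 1 if page > 1 else None,
        "next": page + 1 if page < total_pages else None,
    }

results = list(range(1, 26))  # stand-in for 25 matching records
last_page = paginate(results, page=3)
```

The "previous" and "next" values are what a results template would turn into navigation links; None means the link should be omitted.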

Sorting by Relevance

Text search engines sort by relevance, as determined by the number and location of matched words in the result page or record. Database search functions sort by size, or price, or date, or the order in which the items were entered in the database!

Many text search engines can sort results by date as an option, and a few of them can sort by price, size, geographic location, etc.
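
A toy version of relevance sorting "by number and location of matched words" might look like the Python sketch below. The title weight of 3, the record layout and the sample records are assumptions made for the example; commercial engines use much more elaborate formulas.

```python
import re

TITLE_WEIGHT = 3  # assumed: a match in the title counts three times as much

def tokenize(text):
    return re.findall(r"\w+", text.lower())

def relevance(query, record):
    """Count query-word matches, weighting matches in the title more heavily."""
    title = tokenize(record["title"])
    body = tokenize(record["body"])
    return sum(
        TITLE_WEIGHT * title.count(word) + body.count(word)
        for word in tokenize(query)
    )

def rank(query, records, sort_by="relevance"):
    """Return matching records, most relevant first, or newest first by date."""
    matched = [r for r in records if relevance(query, r) > 0]
    if sort_by == "date":
        return sorted(matched, key=lambda r: r["date"], reverse=True)
    return sorted(matched, key=lambda r: relevance(query, r), reverse=True)

# Hypothetical records for illustration.
records = [
    {"id": 1, "title": "Widget quality report",
     "body": "How we test every widget for quality", "date": "2003-01-15"},
    {"id": 2, "title": "Slipper care",
     "body": "A widget is not needed to clean slippers", "date": "2003-06-01"},
    {"id": 3, "title": "Inventory",
     "body": "Leather slippers and fleece slippers in stock", "date": "2003-03-10"},
]
```

For the query "widget quality", record 1 scores highest because both words appear in its title and its body; passing sort_by="date" instead puts the newer record 2 first.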


Connecting Full Text Search Indexer to the Database

To add a text search engine to a database-generated site, the engine can connect to the database directly, using SQL, ODBC, JDBC or a native connector. This is efficient because it reduces the overhead of going through the HTTP server and creating a new session for every record. The indexed data also contains only the content text, rather than navigation links, copyright information and other extraneous content. In some cases, the database can track new and changed records, and give only this new data to the search engine indexer, on demand or as part of an application interface.

If there is no way to connect directly to the database, a search engine spider can crawl the pages generated from the database, viewing the pages as a browser would. In some cases, this can be a special crawl for pages designed just for indexing, whether detected by the server or as a special set of URLs that are then converted to user-viewable URLs. Again, the database should provide only new and changed records, rather than all records every time.
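
The "only new and changed records" idea can be sketched with a direct SQL connection and a modification timestamp. In this Python example the articles table, its modified column and the integer timestamps are all assumptions, and the sketch does not handle deleted records, which a real indexer would also need to track.

```python
import sqlite3

# Hypothetical content table with a last-modified timestamp per record.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE articles (id INTEGER PRIMARY KEY, body TEXT, modified INTEGER)"
)
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "widget quality", 100), (2, "leather slippers", 200)],
)

search_index = {}  # record id -> set of indexed words

def index_changes(conn, index, since):
    """Index only records modified after `since`.

    Returns the newest timestamp seen (the next checkpoint) and the
    number of records that were indexed on this run.
    """
    newest = since
    count = 0
    rows = conn.execute(
        "SELECT id, body, modified FROM articles "
        "WHERE modified > ? ORDER BY modified",
        (since,),
    )
    for rec_id, body, modified in rows:
        index[rec_id] = set(body.lower().split())
        newest = max(newest, modified)
        count += 1
    return newest, count
```

The first call, index_changes(conn, search_index, 0), indexes everything and returns a checkpoint; later calls that pass the saved checkpoint touch only the records changed since then.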

Tools for Extracting Database Content

  • Quigo Intelligence - interfaces with databases and makes the content visible to search engine crawlers or as XML feeds while retaining the valuable structures.
  • Your Amigo - has indexing agents which can correlate database access forms with backend database APIs, either for the internal search engine or as a feed to other search engines.

Search Engines with Database Interfaces


Databases with Search Engine Interfaces

These databases incorporate a full-text search function in some way.

  • IXIASOFT TEXTML database - XML database with real-time index synchronization, grouping of elements and attributes in indexes. Search features include Boolean, proximity and frequency operators. Supports results sorting; the site is not clear on the relevance algorithm.
  • Jasmine (Computer Associates) - AmTSG Controlled Vocabulary Search
  • MySQL - handles simple and Boolean queries, returns results sorted by relevance, using a TF/IDF ranking algorithm.
  • Oracle Search - various text and multimedia search engines
  • SQL Server
  • Sybase - Verity API
  • Thunderstone Texis and Webinator - high-end RDBMS with integrated search functionality.

Product Report: Siderean Seamark Metadata Search

Siderean Seamark Metadata Search

(formerly code named Teapot)

Product Information

Platform: RDF toolbox Java 2 (v.1.31), Windows 2000 and XP, Unix: Solaris, Red Hat Linux
Price: contact company

Features

  • Designed for faceted metadata for structured data including e-commerce product databases, news feeds, or scientific articles
  • Uses RDF as an intermediate storage format.
  • Admin interface provides tools for adjusting categories.
  • Uses Lucene search engine for queries.
  • Performs a default "and" search, finding items which match all query words.
  • Extends search results with browsable sets of options such as price, size, color, etc.
  • Can deliver results using Web Services and SOAP.
  • Option for simple template search results in browser.
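
The combination of a default "and" search with browsable sets of options can be illustrated with a small Python sketch. It does not use RDF or Lucene and is not Seamark's implementation; the item list, the facet names and the function names are assumptions for the example.

```python
from collections import Counter

# Hypothetical product records with two facets: color and price band.
items = [
    {"name": "Gold band ring", "color": "gold", "price_band": "over $100"},
    {"name": "Gold signet ring", "color": "gold", "price_band": "over $100"},
    {"name": "Silver ring", "color": "silver", "price_band": "under $100"},
    {"name": "Silver necklace", "color": "silver", "price_band": "under $100"},
]

def search(items, query):
    """Default 'and' search: keep items whose name contains every query word."""
    words = query.lower().split()
    return [
        item for item in items
        if all(w in item["name"].lower().split() for w in words)
    ]

def facet_counts(results, facets):
    """For each facet, count how many results fall under each value."""
    return {
        facet: dict(Counter(item[facet] for item in results))
        for facet in facets
    }

ring_results = search(items, "ring")
ring_facets = facet_counts(ring_results, ["color", "price_band"])
```

The counts in ring_facets are what a results page would show beside the hits, so the visitor can narrow "ring" to gold or silver with one click instead of typing another query.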

Examples

  • BeachHouse.com - beachfront rental properties search engine
  • Fortunoff Fine Jewelry - online store
  • Recipes - example on developer site
  • Medical journal articles - example on developer site
    Disclosure: Search Tools Consulting has provided analysis and information to this search engine developer. We do not give them personal information from site visitors or surveys, or allow our relationships with any vendors to change any product review or analysis.