Databases provide the content storage for many sites, which dynamically create
web pages around them, including ecommerce catalog sites, online news, and
even entertainment sites. Intranets often contain large amounts of text stored
in databases as well.
These databases generally have their own search functions, which may appear
to take the place of a full-text search engine. But that's not always the
case. Database search is not oriented towards text search and relevance ranking:
it is great for locating widgets by part number and listing the inventory
of leather slippers, but not so good at helping site visitors find the articles
on widget quality or comparing leather and fleece slippers. Text search engines
are often the right solution, moving the processing load from the database
to the search engine.
Database search vs. text search
Searching Many Fields and Tables
Databases store their information neatly organized into fields, such as
product name, category, description, price and so on. However most people
don't like to choose a field before searching: who knows whether widget
is in the name, category, or description
field? While databases can set up complex queries to find the search words
in all applicable fields, this makes them slower to respond, requires more
memory, and is more difficult to program. Text search engines store this
information in a single index and can find words in any field for a record.
Many high-end search engines can also store field information, so searches
can be limited to a specific field as well.
Simple Search Commands
In many cases, users must type in complex Boolean or SQL (Structured Query
Language) commands for searching. Some databases, such as MySQL, are limited
to "Or" searches -- they will return all records
which match any of the search terms. Others default to exact phrase matching,
so a search for fuzzy slippers would not find slippers,
While some programmers and librarians enjoy the control that query languages
give them, most people do not. This is particularly important for searching
multiple words: if someone types in brown bear, they probably
want to see records with bear with a brown coat in them as
well, without typing bear AND brown. Search logs show clearly
that most people will not do anything complicated while searching -- they
will give up if the search is too hard to use. Full-text search engines
offer simple and flexible search options, with most providing an Advanced
Search feature for power users.
Flexible Search Processing
Database search functions tend to look for exact matches in capitalization
and characters. If someone searches for pokemon, they won't
find records with Pokemon or pokémon in
them. Many text search engines will automatically match lowercase searches
with any text, and will adjust for extended and diacritical characters.
Some also allow search administrators to set up synonyms for searching (doctor
and physician), and automatically perform stemming (find
octopi when searching for octopus). A new trend
in text search engines is to incorporate spelling checkers, which are particularly
useful for queries which do not find any matches. By checking the words
already in the index, the search engine spelling checker can offer suggestions
which apply to this particular content, which is extremely useful.
Some databases even return a list of everything they have if a user enters
an empty search string -- this requires significant server processing time.
Search engines tend to just report that they found no matches.
Many databases are not designed for easy searching. In a wine database,
for example, it may be difficult to find wines from a certain region, such
as the Napa valley of California, because the location is stored in a table
far away from the wine names. Some text search engines, such as dtSearch,
FAST Data Search, Ultraseek, and Verity K2, provide a rich data set for
searching. Mercado's IntuiFind provides options to entirely rearrange a
database structure, if that's what's necessary for searching.
Response Time and Database Resources
Databases are optimized to search for exact words and phrases, and they
tend to respond very slowly otherwise. So if a searcher wants to find sheepskin
or shearling in the same search, databases will tend to do
two searches and then merge the results. Full text search engines are designed
to store these words in a single index, so they can perform these kinds
of searches efficiently and return quickly.
In addition to the time, searching databases requires additional back-end
and server resources. By storing a search index on a separate server and
searching that rather than the live data, a text search engine can perform
queries without additional demands on the database itself.
Results Per Page
Many database search engines will happily display all results on the same
page, whether there are 8 or 8,000 records. Text search engines have a mechanism
for dividing up the results and providing navigation from the first page
to following pages (and back).
Sorting by Relevance
Text search engines sort by relevance, as determined
by the number and location of matched words in the result page or record.
Database search functions sort by size, or price, or date, or the order
in which the items were entered in the database!
Many text search engines can sort results by date as
an option, and a few of them can sort by price, size, geographic location,
Connecting Full Text Search Indexer to the Database
To add a text search engine to a database-generated site, the engine can
connect to the database directly, using SQL, ODBC, JDBC or a native connector.
This is efficient because it reduces the overhead of going through the HTTP
server and creating a new session for every record. It also contains only
the content text, rather than navigation links, copyright information and
other inappropriate content. In some cases, the database can track new and
changed records, and only give this new data to the search engine indexer,
on demand or as part of an application interface.
If there is no way to connect directly to the database, a search engine
spider can crawl the pages generated from the database, viewing the pages
as a browser would. In some cases, this can be a special crawl for pages
designed just for indexing, whether detected by the server or as a special
set of URLs that are then converted to user-viewable URLs. Again, the database
should provide only new and changed records, rather than all records every
Tools for Extracting Database Content
- Quigo Intelligence -
interfaces with databases and makes the content visible to search engine
crawlers or as XML feeds while retaining the valuable structures.
- Your Amigo - has indexing
agents which can correlate database access forms with backend database
APIs, either for the internal search engine or as a feed to other search
Search Engines with Database Interfaces
- Albert web - interfaces to Oracle,
Sybase, mySQL, PostgreSQL, Informix, MS-SQL Server, ODBC and dBase formats
- Aurweb combines database and
- Convera RetrievalWare and Web
Express - native bridges to Oracle, Sybase, Informix and MS SQL,
- Dieselpoint Search - JDBC
connector to databases including DB2, Oracle, Informix, MS SQL, MySQL, PostgreSQL,
- Domino (Lotus) Extended Search
- can query Lotus Notes, IBM DB2, Oracle, Sybase, MS SQL Server, MS
- dtSearch Web - ODBC
- EasyAsk - All ODBC Data Sources;
native SQL for Oracle, Sybase, DB2, Informix and SQL Server; supports Flat
Files and Database Views
- Endeca - faceted metadata search,
dynamic taxonomies with database attributes
- FAST Data Search - JDBC
and ODBC connectors to Oracle and other DBMSs.
- IMP Database Search Engine - Replaces
Microsoft SQL Server 2000's full-text search.
- interMedia (Oracle multimedia
- Isys:web - ODBC
- Mercado IntuiFind - ODBC, JDBC
- MondoSearch - Microsoft
Commerce Server partner
- MPS Information Server - various
- Muscat - Oracle gateway
- OMNIDEX - ODBC, JDBC, JNI
and the OmniAccess API. This search engine does not require Boolean or SQL,
groups many fields, case insensitive, diacritical conversion, etc.
- Oracle Search - text
and multimedia search engines
- Plumtree Portal Search Server
(formerly Ripfire) - JDBC to Oracle, Sybase, Informix and SQL Server, etc.
- Siderean Seamark Metadata Search
(formerly Teapot) - faceted parametric search shows alternate aspects from
- Thunderstone Webinator and Texis
RDBMS - text search integrated into high-end database software
- Triplehop MatchPoint (proprietary
- Ultraseek - JDBC, also ODBC
on Windows NT/2000 and direct Oracle access on Unix.
- Verity - ODBC, Sybase has a
- YourAmigo - flexible and
customizable tools to extract content from complex databases, JDBC and ODBC
Databases with Search Engine Interfaces
These databases incorporate a full-text search function in some way.
- IXIASOFT TEXTML database -XML database
with real-time index synchronization, grouping of elements and attributes
in indexes. Search features include Boolean, proximity and frequency operators.
Results sorting, site is not clear on their relevance algorithm.
- Jasmine (Computer Associates) - AmTSG
Controlled Vocabulary Search
- handles simple and Boolean queries, returns results sorted by relevance,
using a TV/IDF ranking algorithm.
- Oracle Search - various
text and multimedia search engines
- SQL Server
- Sybase - Verity API
- Thunderstone Texis and Webinator
- high-end RDBMS with integrated search functionality.