This is an off-site copy of the corresponding Product report page on the SearchTools.com website, and it is designed to allow you to comment on the product and/or the reporting. For more information about the topic of search and tools visit SearchTools.com where you can browse many articles, in-depth analysis and overviews of external resources.
Price: free, open source under the GNU General Public License
Platforms: Solaris, HP-UX, IRIX, SunOS, Linux, Mac OS X, Mac OS 9 (from
Potential Security Flaw: users of versions 3.1.0b2 through 3.1.5 and 3.2.0b3 should upgrade.
- Free and open-source, written in C++
- Extremely helpful user group
- Designed for source-level modification and customization
- ConfigDig: a template-based HTML front end for easy search administration
from any browser. This allows remote configuration by search admins who are
not experts at the Unix command line.
- Can handle multiple sites and over 100,000 pages
- Index spider is quite robust and handles error conditions gracefully.
- Metadata indexing is configurable, and it is easy to add Dublin Core (DC) tags.
- Can index PDF, MS Word, PowerPoint (see tips)
- Many options for indexing and searching non-exact matches, including stemming,
soundex and fuzzy matching.
- Note: version 3.2 will have phrase matching; the current version only does
AND or OR Boolean searches.
- Searching on field or metadata contents not yet implemented.
- Version 3.2 still in beta test
October 2001: fixes a potential security flaw; 3.1.x users should update
to this version.
- 3.1.5, February 2000: fixes bugs including a serious security-related
bug in all previous releases.
- 3.1.3, September 1999: fixed additional bugs, META robot parsing, compound-word
- 3.1.2, April 1999 (RedHat updates August 1999): improved Acrobat PDF compatibility,
META description tag display, many bug fixes including Y2K improvements.
- Version 3.1 released February 9, 1999.
Articles & Reviews
Open Road: Using ht://Dig UnixReview, April 2002 by Joe "Zonker"
Part 1 is a short but helpful discussion of how the indexing and search work,
formatting results, scheduling and configuration. Part
2 talks about tuning the search engine for speed and efficiency.
Open Source Indexers Infomotions Musings; May 29, 2001 by Eric Lease
Describes the history and features of eight open-source search engines: freeWAIS-sf
(aging code and hard to install, but good for searching email and public domain
etexts); Harvest (powerful gathering features for
frequently-changing data stores, good with structured documents); ht://Dig
(tricky to configure, no phrase searching, automatic stemming and match word
highlighting); Isearch (weak documentation and
support, easy to install, dated interface, Z39.50 support); MPS
Information Server (zippy indexing of both text and structured data, Z39.50
support, Perl API, limited documentation); SWISH-E
(simple to install engine, CGIs in Perl and PHP still beta, good for HTML
pages, recognizes new META tags, sorts results by field); WebGlimpse
(easy to install and configure, requires commercial version for customized
output); Yaz/Zebra (mainly Z39.50, no Perl API, mainly
a toolkit to index and respond to distributed client queries). Article also
points out that chaotic information is less than helpful and encourages organization,
structure and vocabulary control.
love it when a plan comes together PalmPower magazine: March 2001 by
Rambling but cheerful description of setting up a search engine for ZATZ web
sites using ht://Dig, indexing only the appropriate articles and not the alternate
forms or contents pages. Some digressions into robots.txt, Linux and PHP.
- Indexing File Formats: from the ht://Dig
mailing list, 19 December, 2000 by David Adams
- ht://Dig can index PowerPoint using ppt2html, though perhaps not the
- It can also index Microsoft Word documents using wp2html ("which
extracts the 'subject' from the document summary and puts it in the header,
which might be a problem") and catdoc (which "does often include
gibberish in its output, and you could find removing the -b option in
the call of catdoc an improvement.")
- Doc2html.pl uses pdfinfo to extract the title of the .PDF file, and
I have seen .PDF documents where the title is 'þÿ ' for some reason. You
might need to modify doc2html.pl to suppress such titles.
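
The 'þÿ ' title described above is most likely the UTF-16 byte-order mark (bytes FE FF) read as Latin-1 text, which happens when a PDF stores its title as UTF-16. The sketch below shows one way to suppress or recover such titles. It is written in Python for illustration rather than as a patch to the Perl in doc2html.pl, and the function name clean_pdf_title is hypothetical; it assumes the title was read from pdfinfo output as Latin-1.

```python
# Hypothetical helper, not part of ht://Dig or doc2html.pl.
# Assumption: raw_title is the title string from pdfinfo, decoded as Latin-1.

# The UTF-16 big-endian byte-order mark (bytes FE FF) shows up as the two
# characters 'þÿ' when those bytes are interpreted as Latin-1.
BOM_AS_LATIN1 = "\u00fe\u00ff"


def clean_pdf_title(raw_title: str) -> str:
    """Return a usable title, or an empty string if only debris remains."""
    title = raw_title
    if title.startswith(BOM_AS_LATIN1):
        try:
            # Undo the Latin-1 misreading and decode the bytes as UTF-16.
            # Python's "utf-16" codec consumes the leading byte-order mark.
            title = title.encode("latin-1").decode("utf-16")
        except (UnicodeEncodeError, UnicodeDecodeError):
            # The text could not be recovered; drop the two BOM characters.
            title = title[len(BOM_AS_LATIN1):]
    return title.strip()
```

An empty result signals that the title should be suppressed, so the caller can fall back to another label such as the file name.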
- Search Engines: The Hunt
Is On Network Computing Magazine: October 16, 2000 by Avi Rappoport
In-depth discussion of search engines for e-commerce and other web sites covers
features and future trends, software vs. services, database vs. text searching,
searching, and open-source
search engines covering ht://Dig and mnoGoSearch.
- Search This!
Developer Shed, March 15, 1999 by Colin Viebrock
Helpful hints and information about installing, configuring, indexing,
searching, and displaying results, specifically for those running PHP servers.
- ht://Dig: Recognized META information
How to set ht://Dig to recognize meta keywords, email addresses and other
- ht://Dig survey
List of more than 35 ht://Dig installations, including number of servers,
of documents, of words; update frequency; number of hits per day; index size,
primary use (intranet, educational, etc.), and problems.
gegen den Compass Server (in German) late 1998, minor updates 1999,
by Walter Hafner
Evaluation compares Compass with ht://Dig,
and PLWeb. Praises Compass for its browser administration
system, filtering, virtual hosts, and realtime monitoring; downsides are its proprietary
database format and price. PLWeb has some text configuration files, very
efficient customization, good indexing speed and scheduling, multiple indexes
can be created, and it is free; however it had problems with multiple servers,
domains and virtual hosts. Ht://Dig is open source and free, but can be hard
to configure, slow to index and search large indexes, and provides minimal
Developer.Com Guide To Search Engines Wes Sonnenreich and Tim MacInta:
John Wiley & Sons, February 1998, ISBN 0471246387 $34.99.
A wide-ranging book covering everything from the beginnings of the robot
spiders crawling and indexing the web to analysis of the major webwide search
engines to detailed information on installing and configuring six local site
search tools. The programs covered are AltaVista
Search Intranet, Excite for Web Servers, Harvest,
ht://Dig, Phantom and Ultraseek. Also describes BDDBot,
an ongoing collaborative project to create a Java web server and search spider,
using open source under the GNU General Public License.
of Site Search Packages WebReview Nov. 21, 1997
Chart covering Excite, Microsoft Index Server,
ht://Dig, Verity Search97, Netscape
Catalog and SWISH.
- Web Site Search Engines
(Appendices) The PIPER Letter: November, 1996
Covers Basis Webserver, Excerpt, Excite,
Folio, freeWAIS-sf, FrontPage, Fulcrum,
Glimpse, ht://Dig, Ice, Isearch, PLWeb,
Swish, TEAMate, WebFind/WebIndex, and WebSite.