SearchTools Blog (searchtools) wrote,

Search Indexing Spiders and Date Problems

Document Date Issues for Search Indexing

Search indexing spiders (aka robots and crawlers) follow links in HTML pages to find new pages. They also revisit pages already in the index to see whether the content has changed. Generally, they do this in one of three ways: by requesting the whole page again (HTTP GET); more efficiently, by requesting just the headers (HTTP HEAD); or, most efficiently, by sending an If-Modified-Since request (a conditional GET), which returns the whole page only if it has been updated since the spider last asked for it. Otherwise the server answers "304 Not Modified" with no page body.
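To make the conditional GET concrete, here is a minimal sketch using Python's standard urllib. The function names and the idea of storing the last crawl time as a Unix timestamp are illustrative assumptions, not how any particular spider works. Note that urllib reports a 304 response by raising HTTPError, so the sketch treats that case as "unchanged, skip re-indexing".

```python
import urllib.error
import urllib.request
from email.utils import formatdate


def build_conditional_request(url, last_crawl_epoch):
    """Build a GET that asks for the body only if it changed after last_crawl_epoch.

    last_crawl_epoch (a Unix timestamp) is a hypothetical value the spider
    would have saved on its previous visit.
    """
    req = urllib.request.Request(url, method="GET")
    # HTTP dates are always expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    req.add_header("If-Modified-Since", formatdate(last_crawl_epoch, usegmt=True))
    return req


def fetch_if_modified(url, last_crawl_epoch, timeout=10):
    """Return the new page body, or None if the server says 304 Not Modified."""
    req = build_conditional_request(url, last_crawl_epoch)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.read()  # 200 OK: page changed, re-index it
    except urllib.error.HTTPError as err:
        if err.code == 304:
            return None  # unchanged: no need to re-index
        raise  # any other error status is a real failure
```

The HEAD variant is the same request with `method="HEAD"`; the spider then compares the returned Last-Modified header itself instead of letting the server decide.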

If the reported date is far in the past, in the future, or the very instant the indexer requests the page, the indexer wastes cycles re-indexing unchanged content. Worse, a wrong date misleads searchers about how current the content is, which is a vital element in assessing the value of a search result. Dates on web servers are unreliable, which is one reason Google's and Yahoo's web search results rarely even show a page date. Enterprise search can do better, if you can make the required changes on the server or publishing side.
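One way an enterprise indexer might catch the bad dates described above is to sanity-check the Last-Modified header against the moment it made the request. The sketch below is only an illustration: the cutoff year (1995) and the five-second tolerance are hypothetical thresholds chosen for the example, and the category names are invented here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Hypothetical thresholds; tune them for your own servers.
EARLIEST_PLAUSIBLE = datetime(1995, 1, 1, tzinfo=timezone.utc)
NOW_TOLERANCE = timedelta(seconds=5)


def classify_last_modified(header_value, request_time):
    """Label a Last-Modified value as plausible or as one of the suspect cases.

    request_time must be a timezone-aware datetime for when the request was sent.
    """
    try:
        modified = parsedate_to_datetime(header_value)
    except (TypeError, ValueError):  # older Pythons raise TypeError, newer ValueError
        return "unparseable"
    if modified.tzinfo is None:  # "-0000" dates come back naive; treat them as UTC
        modified = modified.replace(tzinfo=timezone.utc)
    if modified > request_time + NOW_TOLERANCE:
        return "future"
    if request_time - modified <= NOW_TOLERANCE:
        return "request-time"  # server just stamps "now" on every response
    if modified < EARLIEST_PLAUSIBLE:
        return "far-past"  # e.g. a zeroed clock reporting 1970
    return "plausible"
```

Pages that repeatedly come back as "request-time" are good candidates for content hashing instead of date comparison, and none of the suspect dates should be shown to searchers as the page date.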

Details of Search Indexing and Page Date Problems
