10.3 Excluding Documents from Being Indexed

One way to improve search results is to control which content is indexed, so that irrelevant documents don’t crowd out relevant ones.

10.3.1 Using the Extensions to Exclude Option

You can use the Extensions to Exclude option to direct QuickFinder to ignore specific file types. For example, if you don’t want Word or PowerPoint documents to be included in search results, you can specify DOC and PPT in the Extensions to Exclude field. When these document types are encountered during an indexing job, QuickFinder skips them.

For more information on the Extensions to Exclude option, see Creating an Advanced Crawled Index and Creating an Advanced File System Index.

Using the Extensions to Include Option

As mentioned above, you can use the Extensions to Exclude option to direct QuickFinder to ignore specific file types. However, if you can’t anticipate every extension you need to exclude, use the Extensions to Include option instead and list only the acceptable file extensions. A typical list would specify HTM, HTML, PDF, TXT, and DOC.

HINT: When entering extensions in the Extensions to Exclude box, separate each extension by a space or a hard return. Don’t use commas. For example:

htm html pdf txt doc

Using the Robots Meta Tag

Another effective way of controlling what QuickFinder indexes is to use the Robots meta tag. This tag is inserted into the header section of a document and contains instructions about what should (or should not) be indexed.

When a Web-based search engine encounters a document containing the Robots meta tag, the search engine does as the meta tag instructs.

There are several values you can specify in the Robots meta tag:

NOINDEX: Indicates that the document is not to be indexed.

NOFOLLOW: Indicates that hypertext links in the document are not to be crawled.

FOLLOW: Indicates that hypertext links in the document should be crawled.

ALL: Indicates that the document can be indexed and all links can be crawled.

NONE: Indicates that the document is not to be indexed and that hypertext links are not to be crawled.

To include the Robots meta tag, use this syntax in the header section of the document:

<META name="Robots" content="value, optional_value">
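
As an illustration of the syntax above, the following sketch shows a complete page that asks not to be indexed and asks that its links not be crawled. The page title and body text are hypothetical placeholders; only the META line comes from the syntax described in this section.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Hypothetical page title, for illustration only -->
  <title>Internal Draft</title>
  <!-- Robots meta tag: do not index this page, do not crawl its links -->
  <META name="Robots" content="NOINDEX, NOFOLLOW">
</head>
<body>
  <p>Draft content that should not appear in search results.</p>
</body>
</html>
```

According to the value list above, specifying NOINDEX and NOFOLLOW together should have the same effect as specifying NONE on its own.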

Using the Robots Comment Tag

You can also use the Robots Comment tag to exclude specific sections of HTML documents from your search results. For example, you might not want such sections as repetitive headers, footers, navigation bars, and server-side includes to be indexed.

HINT: You can also place these tags at the top and bottom of all include files so these sections are never indexed when they are part of a larger document.

To direct QuickFinder where to begin skipping content while indexing:

  1. At the point in your HTML document where you want QuickFinder to begin skipping content while indexing, use the following tag:

    <!--*Robots NoIndex-->

  2. Just after the content you want skipped, use the following tag:

    <!--*Robots Index-->
    
  3. Save your changes and index (or reindex) the content.
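
The steps above can be sketched as follows for a page with a repeated navigation bar. This is a minimal example, not a definitive template: the page structure, link targets, and text are hypothetical, and it assumes the Robots Comment tags close with the standard HTML comment terminator (-->). Verify the exact tag form against your QuickFinder version.

```html
<!DOCTYPE html>
<html>
<head>
  <!-- Hypothetical page title, for illustration only -->
  <title>Product Overview</title>
</head>
<body>
  <!-- Start skipping: the navigation bar repeats on every page -->
  <!--*Robots NoIndex-->
  <ul>
    <li><a href="/">Home</a></li>
    <li><a href="/products/">Products</a></li>
  </ul>
  <!--*Robots Index-->
  <!-- Indexing resumes here: this is the page's unique content -->

  <h1>Product Overview</h1>
  <p>This paragraph is indexed because it follows the Robots Index tag.</p>
</body>
</html>
```

Because both tags are ordinary HTML comments, browsers ignore them and the page renders exactly as it would without them.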