You can improve the accuracy of your search results by following these indexing guidelines:
When defining and creating your indexes, start with the highest possible Web site URLs and file system paths.
If content is showing up in your search results that you don't want included, try removing some paths or URLs from your defined indexes. Also, try excluding specific subdirectories that you know or suspect might contain content that you don't want searched.
If you've indexed too many file types and cluttered your search results, try removing file types that you don't want indexed by using the Extensions to Exclude option on the Define Index page. See Using the Define Crawled Index (Advanced) Page and Using the Define File System Index (Advanced) Page for more information.
Use the Robots META tag in your Web site's content.
Exclude documents or specific sections of documents, including headers, footers, and navigation bars.
Use the Robots Exclusions standard (the /robots.txt files found on almost all Web sites). The Index Definition’s excludePaths settings are automatically combined with the robots.txt settings when crawling and indexing. You can turn robots.txt support on or off.
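For example, a minimal /robots.txt file that asks all crawlers to skip two directories might look like this (the directory names are illustrative):

```
User-agent: *
Disallow: /cgi-bin/
Disallow: /internal/
```

Because QuickFinder combines these rules with the Index Definition's excludePaths settings, either mechanism can be used to keep such directories out of the index.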
One way to improve search results is to control what content is actually indexed, so that relevant information isn't buried among irrelevant matches.
You can use the Extensions to Exclude option to direct QuickFinder to ignore specific file types. For example, if you don't want Word or PowerPoint* documents to be included in search results, you would enter DOC and PPT in the Extensions to Exclude field. When these document types are encountered during an indexing job, QuickFinder skips them.
For more information on the Extensions to Exclude option, see Using the Define Crawled Index (Advanced) Page and Using the Define File System Index (Advanced) Page.
As mentioned above, you can use the Extensions to Exclude option to direct QuickFinder to ignore specific file types. However, if you can’t specify all of the extensions to exclude, use the Extensions to Include option and specify all acceptable file extensions. A typical list would specify HTM, HTML, PDF, TXT, and DOC.
HINT: When entering extensions in the Extensions to Exclude box, separate each extension by a space or a hard return. Avoid using commas. For example:
htm html pdf txt doc
Another effective way of controlling what QuickFinder indexes is using the Robots META tag. This tag is inserted into the header section of a document and contains instructions about what should (or should not) be indexed.
When a Web-based search engine encounters a document containing the Robots META tag, the search engine does as the META tag instructs.
There are several values you can specify in the Robots META tag:
NOINDEX: Indicates that the document is not to be indexed.
NOFOLLOW: Indicates that hypertext links in the document are not to be crawled.
FOLLOW: Indicates that hypertext links in the document should be crawled.
ALL: Indicates that the document can be indexed and all links can be crawled.
NONE: Indicates that the document is not to be indexed and that hypertext links are not to be crawled.
To include the Robots META tag, use this syntax:
<META name="Robots" content="value, optional_value">
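For example, to keep a page out of the index while still allowing the links it contains to be crawled, you might place a tag like this in the document's HEAD section (the title is illustrative):

```html
<HTML>
<HEAD>
<TITLE>Internal Draft</TITLE>
<!-- Do not index this page, but do follow its links -->
<META name="Robots" content="NOINDEX, FOLLOW">
</HEAD>
</HTML>
```

A search engine that honors the Robots META tag skips the page's text but continues crawling through its hypertext links.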
You can also use the Robots Comment tag to exclude specific sections of HTML documents from your search results. For example, you might not want such sections as repetitive headers, footers, navigation bars, and server-side includes to be indexed.
HINT: You can also place these tags at the top and bottom of all include files so these sections never get indexed when part of a larger document.
To direct QuickFinder where to begin skipping content while indexing:
At the point in your HTML document where you want QuickFinder to begin skipping content while indexing, enter the following tag:
<!--*Robots NoIndex-->
Just after the content you want skipped, enter the following tag:
<!--*Robots Index-->
Save your changes and index (or reindex) the content.
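Putting the steps together, a page that excludes its navigation bar from indexing might look like this (the link targets and text are illustrative):

```html
<HTML>
<BODY>
<!--*Robots NoIndex-->
<!-- Navigation bar: skipped during indexing -->
<A HREF="home.html">Home</A> | <A HREF="products.html">Products</A>
<!--*Robots Index-->
<P>Main page content, which is indexed normally.</P>
</BODY>
</HTML>
```

Everything between the NoIndex and Index comment tags is skipped; content outside the pair is indexed as usual.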