5.5 Creating Indexes

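QuickFinder creates two types of indexes: crawled indexes, which are built by following links on Web sites, and file system indexes, which are built from files on a server.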

There are two forms you can use to create each type of index: the standard form and the advanced form. For example, Define Crawled Index is the standard form for creating a crawled index, while the Define Crawled Index (Advanced) form offers more options, including options that override default virtual search server settings. Both forms are described in the following sections.

5.5.1 Defining a New Crawled Index

  1. On the Global Settings page of QuickFinder Server Manager, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, click New Crawled Index, then click Define Index.

  3. In the Index Name field, specify a name for your index.

    A name can be a word, phrase, or a numeric value. If the virtual search server you are working on contains, or will contain, a large number of indexes, you might want to use a numbering scheme to help you manage multiple indexes more effectively. However, the name you enter here appears on the default search page, so you might want to choose a name that can be understood by users of your search services.

  4. Under Web Sites to Crawl, specify the URL of the Web site that you want indexed.

    You can specify just the URL, such as www.mycompany.com, or you can also append a complete path, down to the file level, such as www.mycompany.com/path/index.html.

  5. If desired, add another URL.

  6. To add additional URLs, click Add More URLs.

  7. Click Apply Settings.

Using the Define Crawled Index (Advanced) Page

The Define Crawled Index (Advanced) page offers some additional options beyond those available in the standard Define Crawled Index page. Changes made using this page override default virtual search server settings.

  1. On the Global Settings page of QuickFinder Server Manager, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, click New Crawled Index, then click Define Index.

  3. On the Define Crawled Index page, click Advanced Index Definition.

  4. In the Index Name field, specify a name for your new index.

    A name can be a word, phrase, or a numeric value. If the virtual search server you are working on contains, or will contain, a large number of indexes, you might want to use a numbering scheme to help you manage multiple indexes more effectively. However, the name you enter here appears on the default search page, so you might want to choose a name that can be understood by users of your search services.

  5. In the Index Description field, specify an optional description of the index to be created.

  6. Under Web Sites to Crawl, specify the URL of the Web site to be indexed.

    If you enter a filename at the end of the URL, then just that file is indexed.

  7. (Optional) Use the Path Weight option to boost or degrade search results based on the path.

    A weight of 100 makes the path’s relevance normal. Increasing the weight makes the path more relevant, and lowering the weight makes the path less relevant.

  8. (Optional) Select Use Only As a Crawl Filter if you don't want QuickFinder to use the URL you specified in the URL of Web Site field to begin indexing.

    Any subsequent links found that contain a URL matching the one you specified in the URL of Web Site field are followed and subsequently indexed.

  9. (Optional) If you want to mask the actual URL displayed in the search results template, enter an alternate URL in the Show URL In Search Results As field.

    For example, if you want to index a Web server that is used inside of your company but allow your customers access to some of the data, you could hide the actual internal URL with the URL of your public Web site.

  10. In the Subdirectories to Exclude text box, specify the directories that you want QuickFinder not to index.

    For example, /marketing or /sales/doc.

  11. To direct QuickFinder to include or exclude specific file types, click Extensions to Include or Extensions to Exclude and then specify the extensions, such as HTM PDF TXT, separating each one with a single space.

  12. To add additional URLs, click Define More Web Sites.

  13. To delete a URL, click Remove Web Site.

  14. In the Additional URLs text box, specify any other URLs that you want indexed (for example, www.mycompany.com/marketing).

    This allows you to specify additional areas of information found on other Web sites without including all of the content of those sites in your searches.

    When QuickFinder encounters links found in the pages of Additional URLs that point to pages specified in Web Sites to Crawl, QuickFinder follows those links. All other links that go outside of Web Sites to Crawl are not followed.

  15. Use the Off-Site URLs option to determine the maximum number of off-site URLs (those URLs not located within any of the URLs specified in Web Sites to Crawl) that QuickFinder should index.

    In the URLs to Exclude field, list the off-site URLs that you want to exclude from indexing.

  16. Use the Adjust Individual URL Relevance option to adjust the relevance of individual items within the index.

    Adjustment values can range from 1 to 200. Values higher than 100 increase the calculated relevance of the item on the search results page, and values lower than 100 decrease the calculated relevance of the item. The value specified here is combined with other values to determine the final relevance.

  17. Under Additional Settings, enter the absolute path to where you want the index files stored in the Location of Index Files field.

    For example, volume:\searchroot\sites\mysites.

    By default, index files are stored at volume:\searchroot\sites\default\indexes\.

    Changes made to Additional Settings override Default Settings.

  18. From the Level of Detail in Indexing Logs drop-down list, select the amount of information you want included in the index logs.

    Option       Description
    Disabled     Turns off index logging.
    Terse        Lists only the URLs indexed.
    Normal       Lists the URLs indexed and the results of the crawl.
    Verbose      Lists the URLs indexed, the results of the crawl, and the links that were skipped during the crawl.
    New Links    Lists the URLs indexed, the results of the crawl, the links that were skipped, and any new links found during the crawl.
    All Links    Lists the URLs indexed, the results of the indexing, the links that were skipped, and all links found during the crawl.

  19. From the Encoding (If Not in META Tags) drop-down list, select the encoding to be used by files being indexed that do not contain an encoding specification.

  20. Use the Index Weight option to boost or degrade search results based on the item's index.

    A weight of 100 makes the item's relevance normal. Increasing the weight makes the item more relevant, while lowering the weight makes the item less relevant.

  21. In the Maximum Index Depth field, enter the number of jumps (or links) from the starting URL that QuickFinder should crawl.

  22. In the Maximum File Size to Index field, enter the maximum file size (in bytes) that QuickFinder should index.

    Files exceeding this size are not indexed and, therefore, are not included in search results.

  23. In the Maximum Time to Download a URL field, enter the number of seconds that QuickFinder should wait before it automatically skips indexing the specified URL.

  24. To direct QuickFinder to pay attention to the case of filenames and directory names, click Yes next to URLs Are Case Sensitive.

  25. To direct QuickFinder to crawl dynamic content (URLs containing the question mark [?]), click Yes next to Crawl Dynamic URLs.

    For more information about indexing dynamic content, see Section 5.7, Indexing Dynamic Web Content.

  26. Click Yes next to Obey Robots.txt Exclusions When Crawling if you want QuickFinder to follow the instructions found in any Robots META tags.

    For more information, see Using the Robots META Tag.

  27. Click Yes next to Index May Be Copied to Other Clustered Servers if you want to allow this index to be copied to other servers in a QuickFinder Synchronization cluster.

    For more information about QuickFinder Synchronization, see Section 4.4, Synchronizing Data Across Multiple QuickFinder Servers.

  28. Under User Credentials, specify the method of authentication for any servers to be indexed that require user authentication before they can be accessed.

    • Basic Authentication: If you know that the server to be indexed requires basic authentication, specify the username and password in the fields provided.

    • Form-Based Authentication: If the server to be indexed uses form-based authentication, leave the Basic Authentication fields blank and type the correct user credentials in the Form-based Authentication field, placing each new entry on its own line. For example:

      userid:admin
      password:novell
      context:novell
      tree:marketingtree
      
  29. (Optional) If the Web sites you are indexing require users to log in at a specific URL (such as login.digitalairlines.com), specify the login URLs in the Alternate Login URLs field.

    After the session cookies are returned, QuickFinder sends the appropriate ones as needed to the Web sites being indexed.

  30. In the HTTP Headers field, specify any additional headers and values you want included with each HTTP request, placing each header on a separate line.

    Some Web sites require specific information in HTTP headers when attempts are made to access them. If your Web site uses form-based or cookie-based authentication, you can specify such information here.

  31. Click Apply Settings.
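The Maximum Index Depth setting in the steps above can be pictured as a breadth-first walk that stops following links after a fixed number of jumps from the starting URL. The following is a minimal Python sketch of that idea, not QuickFinder's own crawler: the `links` dictionary is a hypothetical in-memory link graph standing in for real Web pages, and `crawl_to_depth` is an illustrative name.

```python
from collections import deque

def crawl_to_depth(links, start_url, max_depth):
    """Return URLs reachable from start_url within max_depth jumps, in visit order."""
    seen = {start_url}
    order = [start_url]
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order

# Hypothetical link graph: each key links to the URLs in its list.
links = {
    "www.mycompany.com": ["www.mycompany.com/a", "www.mycompany.com/b"],
    "www.mycompany.com/a": ["www.mycompany.com/a/deep"],
    "www.mycompany.com/a/deep": ["www.mycompany.com/a/deep/deeper"],
}

# With a depth of 1, only the starting URL and the pages it links to are visited.
print(crawl_to_depth(links, "www.mycompany.com", 1))
```

In this sketch a depth of 0 visits only the starting URL; whether QuickFinder counts depth the same way is an assumption.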

Configuring Rights-Based Search Results

  1. On the Global Settings page of QuickFinder Server Manager, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, select New Crawled Index, then click Define Index.

  3. On the Define Crawled Index page, click Advanced Index Definition.

  4. Under Rights-based Search Results, configure authorization checking by selecting one of the following options:

    • Use Default: Select this option if you want this index to use the default authorization checking setting specified on the Index Settings page of your virtual search server.

    • Off: If you want all users to have access to this index, select this option. No authorization checking is done.

    • By Index: To enable rights checking for this index, specify a file that exists on your server that can be used in verifying user access. By creating a file and setting access rights to it, QuickFinder can verify access to this index based on the rights to the file. Click Use Default Path if one was specified on the Index Settings page.

  5. From the Unauthorized Hits Filtered By drop-down list, select one of the following filters:

    • Use Default: Select this option if you want the current index to use the default setting found on the Index Settings page.

    • Search Engine: When you select this option, users attempting to search the index without first logging in do not see any of the unauthorized hits on the search results page. If the user doesn’t have access to any search results, then the system returns a No Results Found message on the search results page.

    • Templates: When you select this option, users attempting to search the index without first logging in to the system receive results, but they are then required to provide a username and password before being allowed to see the contents.

  6. Click Apply Settings.

After you define an index, you must generate it to make it searchable. See Generating Indexes.

5.5.2 Defining a New File System Index

  1. On the Global Settings page of QuickFinder Server Manager, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, click New File System Index, then click Define Index.

  3. In the Index Name field, specify a name for your index.

    A name can be a word, phrase, or a numeric value. If the virtual search server you are working on contains, or will contain, a large number of indexes, you might want to use a numbering scheme to help you manage multiple indexes more effectively. However, the name you enter here appears on the default search page, so you might want to choose a name that can be understood by users of your search services.

  4. In the Server Path to be Indexed field, specify the absolute path to the folder containing the information that you want indexed (for example, sys:\sales\reports).

  5. In the Corresponding URL Prefix field, specify the URL that should be used by the search results page to access the individual files (for example, /sales).

  6. To add additional paths, click Add More Paths.

  7. Click Apply Settings.

After you define an index, you must generate it to make it searchable. See Generating Indexes.
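The relationship between the Server Path to be Indexed and the Corresponding URL Prefix can be sketched as a simple path substitution. The Python example below is an illustration only: it assumes that the part of a file's path below the server path is appended to the URL prefix, and `file_to_url` is a hypothetical helper, not part of QuickFinder.

```python
def file_to_url(file_path, server_path, url_prefix):
    """Map a file under server_path to the URL shown in search results."""
    # Treat backslashes and forward slashes alike and ignore case differences.
    normalized_file = file_path.replace("\\", "/")
    normalized_root = server_path.replace("\\", "/").rstrip("/")
    if not normalized_file.lower().startswith(normalized_root.lower() + "/"):
        raise ValueError("file is not under the indexed server path")
    relative = normalized_file[len(normalized_root):]  # keeps the leading "/"
    return url_prefix.rstrip("/") + relative

# Uses the example values from the steps above; the file name is made up.
print(file_to_url(r"sys:\sales\reports\q1\summary.htm", r"sys:\sales\reports", "/sales"))
```

Under these assumptions, a file stored at sys:\sales\reports\q1\summary.htm would be listed in search results as /sales/q1/summary.htm.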

Using the Define File System Index (Advanced) Page

  1. On the QuickFinder Server Manager Global Settings page, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, click New File System Index, then click Define Index.

  3. On the Define File System Index page, click Advanced Index Definition.

  4. In the Index Name field, specify a name for your new index.

    A name can be a word, phrase, or a numeric value. If the virtual search server you are working on contains, or will contain, a large number of indexes, you might want to use a numbering scheme to help you manage multiple indexes more effectively. However, the name you enter here appears on the default search page, so you might want to choose a name that can be understood by users of your search services.

  5. In the Index Description field, specify an optional description of the index to be created.

  6. Under Path Information, specify the absolute path to the folder containing the information that you want indexed in the Server Path field (for example, sys:\sales\reports).

  7. (Optional) Use the Path Weight option to boost or degrade search results based on the path.

    A weight of 100 makes the path’s relevance normal. Increasing the weight makes the path more relevant, while lowering the weight makes the path less relevant.

  8. In the Corresponding URL Prefix field, enter the URL that should be used by the search results page to access the individual files (for example, /sales).

  9. To exclude specific subdirectories from being indexed, specify their relative paths in the Subdirectories to Exclude field.

  10. To direct QuickFinder to include or exclude specific file types, click Extensions to Include or Extensions to Exclude and then type the extensions, separating each one with a single space, such as HTM PDF TXT.

  11. (Optional) To add additional paths, click Define More Paths.

  12. (Optional) To delete a path, click Remove Path.

  13. Use the Adjust Individual File Relevance option to adjust the relevance of individual items within the index.

    Adjustment values can range from 1 to 200. Values higher than 100 increase the calculated relevance of the item on the search results page, while values lower than 100 decrease the calculated relevance of the item. The value specified here is combined with other values to determine the final relevance.

  14. In the Location of Index Files field, specify the absolute path to where you want the index files stored.

    For example, sys:\qfsearch\sites\mysites on NetWare or /var/lib/qfsearch/sites/mysites on Linux.

    By default, index files are stored at volume:\searchroot\sites\site_name\indexes\.

  15. From the Level of Detail in Indexing Logs drop-down list, select the amount of information you want included in the index logs.

    Option       Description
    Disabled     Turns off index logging.
    Terse        Lists only the files indexed.
    Normal       Lists the files indexed and the results of the crawl.
    Verbose      Lists the files indexed, the results of the crawl, and the links that were skipped during the crawl.
    New Links    Lists the files indexed, the results of the crawl, the links that were skipped, and any new links found during the crawl.
    All Links    Lists the files indexed, the results of the indexing, the links that were skipped, and all links found during the crawl.

  16. From the Encoding (If Not in META Tags) drop-down list, select the encoding to be used when indexing files that do not contain an encoding specification.

    For example, HTML files can specify their encoding with a Content-Type META tag.

  17. Use the Index Weight option to boost or degrade search results based on the item's index.

    A weight of 100 makes the item's relevance normal. Increasing the weight makes the item more relevant, while lowering the weight makes the item less relevant.

  18. In the Maximum Index Depth field, specify the number of directories below the starting directory that QuickFinder should search.

    This lets you limit how far (or deep) into a file server QuickFinder should search.

  19. In the Maximum File Size to Index field, specify the maximum file size (in bytes) that QuickFinder should index.

    Files exceeding this size are not indexed and are not included in search results.

  20. (Optional) Click Yes next to Index May Be Copied To Other Clustered Servers if you want this index shared with other QuickFinder servers in a QuickFinder Synchronization cluster.

    For more information about QuickFinder Synchronization, see Section 4.4, Synchronizing Data Across Multiple QuickFinder Servers.

  21. Click Apply Settings.
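The combined effect of the Maximum Index Depth and Maximum File Size to Index settings can be sketched as a filter over candidate files. The Python example below is only an illustration: the file list is made up, `select_files` is a hypothetical name, and it assumes that a depth of 0 means the starting directory only.

```python
def select_files(files, max_depth, max_size_bytes):
    """Keep files no deeper than max_depth subdirectories and no larger than max_size_bytes."""
    selected = []
    for relative_path, size in files:
        depth = relative_path.count("/")  # number of subdirectories below the start
        if depth <= max_depth and size <= max_size_bytes:
            selected.append(relative_path)
    return selected

# Hypothetical files, given as (path relative to the starting directory, size in bytes).
files = [
    ("index.htm", 2_000),
    ("q1/summary.htm", 5_000),
    ("q1/raw/data.txt", 900),
    ("q2/video.htm", 50_000_000),
]

# q1/raw/data.txt is too deep and q2/video.htm is too large, so both are skipped.
print(select_files(files, max_depth=1, max_size_bytes=1_000_000))
```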

Configuring Rights-Based Search Results

  1. On the QuickFinder Server Manager Global Settings page, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, select New File System Index, then click Define Index.

  3. On the Define File System Index page, click Advanced Index Definition.

  4. Under Rights-based Search Results, configure authorization checking by selecting one of the following options:

    • Use Default: Select this option if you want this index to use the default authorization checking setting specified on the Index Settings page of your virtual search server.

    • Off: If you want all users to have access to this index, select this option. No authorization checking is done.

    • By Index: To enable rights checking for this index, specify a file that exists on your server that can be used in verifying user access. By creating a file and setting access rights to it, QuickFinder can verify access to this index based on the rights to the file. Click Use Default Path if one was specified on the Index Settings page.

    • By Result Item: If checked, QuickFinder verifies the user’s access rights to each hit. This is not recommended for high traffic servers because checking every hit can slow down server performance.

  5. From the Unauthorized Hits Filtered By drop-down list, select one of the following filters:

    • Use Default: Select this option if you want the current index to use the default setting found on the Index Settings page.

    • Search Engine: When you select this option, users attempting to search the index without first logging in do not see any of the unauthorized hits on the search results page. If the user doesn’t have access to any search results, then the system returns a No Results Found message on the search results page.

    • Templates: When you select this option, users attempting to search the index without first logging in to the system receive results, but they are then required to provide a username and password before being allowed to see the contents.

  6. Click Apply Settings.

After you define an index, you must generate it to make it searchable. See Generating Indexes.

5.5.3 Searching across Multiple Indexes

QuickFinder can search across multiple indexes within a single virtual search server. However, searching a single index is generally faster than searching across multiple indexes.

Restricting Search Results to Specific Areas

You can restrict search results to specific areas of your file or Web server in the following ways:

  • Using multiple indexes and using the &index=index_name query parameter.

  • Using a single index and restricting results to certain URL paths using the &filefilter=path query parameter.

  • Using a single index and restricting results to certain values in document fields by including /fieldname=value with either the query=value or filter=value search parameters.

HINT: Using the last option requires that indexed documents contain summary fields such as META tags. This option works for almost any file format that contains document summary fields, including HTML, XML, PDF, Word, and WordPerfect.

For information about preventing QuickFinder from indexing specific content, see Excluding Documents from Being Indexed.
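The first two restriction methods listed above amount to adding a parameter to the search request. The following Python sketch shows one way such a URL might be assembled. The `index`, `filefilter`, and `query` parameter names come from this section; the endpoint in `SEARCH_URL` and the function name are hypothetical, so substitute the address of your own search service.

```python
from urllib.parse import urlencode

# Hypothetical search endpoint; replace it with the URL of your own search service.
SEARCH_URL = "http://www.mycompany.com/qfsearch/SearchServlet"

def build_search_url(query, index=None, filefilter=None):
    """Build a search URL, optionally restricted to one index or one URL path."""
    params = [("query", query)]
    if index:
        params.append(("index", index))
    if filefilter:
        params.append(("filefilter", filefilter))
    return SEARCH_URL + "?" + urlencode(params)

# Restrict results to the index named "sales".
print(build_search_url("quarterly report", index="sales"))

# Restrict results to documents under the /sales/reports path.
print(build_search_url("quarterly report", filefilter="/sales/reports"))
```

Note that `urlencode` percent-encodes the slashes in the path value, which is the standard treatment of reserved characters in query strings.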

5.5.4 Indexing Content on a Password-Protected Web Site

If the Web servers you want to index require authentication, there are two methods for providing the correct user credentials: basic authentication and form-based authentication. Which one you choose depends on how authentication is implemented on the Web sites you index. For example, if you are indexing www.company1.com and it uses basic authentication, then enter the username (user ID) and password in the Basic Authentication fields. In this case, the credentials are sent using an HTTP authorization header with every request made to the server of the URL you have specified.
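For reference, the HTTP authorization header used by basic authentication carries the username and password joined by a colon and Base64-encoded. The Python sketch below shows how such a header is formed in general; it is not taken from QuickFinder, and the credentials are the sample values used elsewhere in this section.

```python
import base64

def basic_auth_header(username, password):
    """Build the HTTP Authorization header for basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": "Basic " + token}

# Sample credentials only; the header would accompany each request to the server.
print(basic_auth_header("admin", "novell"))
```

Because Base64 is an encoding rather than encryption, credentials sent this way are readable by anyone who can observe the request unless the connection itself is encrypted.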

However, if www.company1.com uses a form-based authentication method, leave the Basic Authentication fields blank and type the correct user credentials in the Form-based Authentication text box, placing each new entry on its own line. For example:

userid:admin
password:novell
context:novell
tree:marketingtree

In Form-based authentication, the first time the Web site is indexed, the credentials are sent and a session cookie is returned. Thereafter, QuickFinder uses the session ID in the cookie for authentication and the credentials are no longer sent to the Web site.

HINT: If you are indexing more than one URL and each one requires a different set of credentials, we recommend that you create a separate index for each URL.
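The one-entry-per-line format of the Form-based Authentication text box can be read as a list of name and value pairs separated by a colon. The Python sketch below shows how such text might be split into form fields; `parse_form_credentials` is a hypothetical helper, and how QuickFinder itself parses the field is an assumption.

```python
def parse_form_credentials(text):
    """Turn 'name:value' lines into a dictionary of form fields."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, _, value = line.partition(":")  # split on the first colon only
        fields[name.strip()] = value.strip()
    return fields

# The sample entries from this section.
credentials = """userid:admin
password:novell
context:novell
tree:marketingtree"""

print(parse_form_credentials(credentials))
```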

5.5.5 Indexing Volumes on Remote Servers

You can index file content stored on remote volumes using the Novell NFS Gateway product, included with NetWare 6.5.

To be able to index remote servers, first install and configure the NFS Gateway. Then create a new file system index and specify the path to a volume on a remote server. QuickFinder does the rest.

For more information about installing and configuring NFS Gateway, see the NFS Gateway for NetWare documentation.

5.5.6 Generating Indexes

After you define an index, you must generate it before it can be used for searching. Generating an index is the process in which QuickFinder Server examines file server or Web server content, gathers keywords, titles, and descriptions, and then includes them in the index.

  1. On the QuickFinder Server Manager Global Settings page, click Manage in the row of the virtual search server that you want to work with.

  2. Click Generate in the Action column of the index that you want to work with.

    The Active Jobs screen indicates the status of the current indexing jobs. When there is no current index job, the status page reads No indexing jobs are currently running or defined.

  3. To cancel the current indexing jobs, click Cancel in the Status column.

You can direct QuickFinder to automatically update your indexes on specific dates and at specific times by scheduling events. For more information, see Section 5.9, Automating Index and Server Maintenance.

Generating an Index For a Linux-mounted NSS Volume

To generate an index for a Linux-mounted NSS volume, the novlwww user or www group must have read access to the NSS volume. To grant this access, verify that the novlwww user and the www group are LUM-enabled, then run the rights utility to assign the user or group trustee rights to the volume.

NOTE: The novlwww user is the Tomcat user defined by Open Enterprise Server. QuickFinder runs as a servlet, so its access rights are the same as those of the Tomcat user.

Generating a File System Index

When generating a file system index and specifying a set of filename extensions to index, you could end up indexing files you don't want.

For example, you index your entire hard drive and look for only HTM and HTML files. There are about 10,000 properly matching files on your file system, but you end up with over 30,000 files in your index. This is because the file system scanner includes files with no filename extensions. In some cases, including files with no extension is better than not including them, but in this case, the index of all the HTML files on your hard drive is fairly useless because it contains a large number of non-HTML files.

To avoid this kind of situation, manually modify the QuickFinder Server configuration file (/var/lib/qfsearch/Sites/qfind.cfg on Linux and sys:/qfsearch/sites/qfind.cfg on NetWare). Add “IncludeNoExtension N” to the <Directory> section of an index definition to prevent files with no filename extensions from being included (it defaults to Y). Put this line next to the “IncludeExtension” setting, which lists all of the filename extensions to be indexed.
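The effect of this setting can be illustrated with a small Python sketch of an extension filter. This is not QuickFinder's scanner: `should_index` and its `include_no_extension` argument are hypothetical names that mirror the behavior described above, and the file names are made up.

```python
import os

def should_index(filename, include_extensions, include_no_extension=True):
    """Decide whether a file passes the extension filter."""
    extension = os.path.splitext(filename)[1].lstrip(".").upper()
    if not extension:
        # Files without an extension are governed by their own switch.
        return include_no_extension
    return extension in include_extensions

include = set("HTM HTML".split())  # extensions separated by single spaces
names = ["index.html", "README", "notes.txt", "Makefile"]

# Default behavior: files with no extension slip into the index.
print([n for n in names if should_index(n, include)])

# Equivalent of setting IncludeNoExtension to N: only matching extensions remain.
print([n for n in names if should_index(n, include, include_no_extension=False)])
```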