8.5 Creating Indexes

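QuickFinder creates two types of indexes: crawled indexes, which are built by following links on Web sites, and file system indexes, which are built from files stored on a server.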

There are two forms you can use to create each type of index: the standard form and the advanced form. For example, the Define Crawled Index form is the standard form for creating a crawled index, and the Define Crawled Index (Advanced) form offers additional options, including options that override default virtual search server settings. Both forms are described in the following sections.

8.5.1 Creating a Crawled Index

  1. On the Global Settings page of QuickFinder Server Manager, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, click New crawled index, then click Define Index.

  3. In the Index Name field, specify a name for your index.

    A name can be a word, phrase, or a numeric value. If the virtual search server you are working on contains, or will contain, a large number of indexes, you might want to use a numbering scheme to help you manage multiple indexes more effectively. However, the name you specify here appears on the default search page, so you might want to choose a name that can be understood by users of your search services.

  4. Under Web Sites to Crawl, specify the URL of the Web site that you want indexed.

    You can specify just the URL, such as www.mycompany.com, or you can also append a complete path, down to the file level, such as www.mycompany.com/path/index.html.

  5. (Optional) To add additional URLs, click Add More URLs.

  6. Click Apply Settings.

8.5.2 Creating an Advanced Crawled Index

The Define Crawled Index (Advanced) page offers some additional options beyond those available in the standard Define Crawled Index page. Changes made using this page override default virtual search server settings.

  1. On the Global Settings page of QuickFinder Server Manager, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, click New crawled index, then click Define Index.

  3. On the Define Crawled Index page, click Advanced Index Definition.

  4. In the Index Name field, specify a name for your new index.

    A name can be a word, phrase, or a numeric value. If the virtual search server you are working on contains, or will contain, a large number of indexes, you might want to use a numbering scheme to help you manage multiple indexes more effectively. However, the name you specify here appears on the default search page, so you might want to choose a name that can be understood by users of your search services.

  5. In the Index description field, specify an optional description of the index to be created.

  6. Under Web Sites to Crawl, specify the URL of the Web site to be indexed.

    If you specify a filename at the end of the URL, just that file is indexed.

  7. (Optional) Use the Path Weight option to boost or degrade search results based on the path.

    A weight of 100 makes the path’s relevance normal. Increasing the weight makes the path more relevant, and lowering the weight makes the path less relevant.

  8. (Optional) Select Use only as a crawl filter if you don’t want QuickFinder to use the URL you specified in the URL of Web Site field to begin indexing.

    Any subsequent links found that contain a URL matching the one you specified in the URL of Web Site field are followed and subsequently indexed.

  9. (Optional) If you want to mask the actual URL displayed in the search results template, specify an alternate URL in the Show URL in search results as field.

    For example, if you want to index a Web server that is used inside of your company but only allow your customers access to some of the data, you could hide the actual internal URL with the URL of your public Web site.

  10. In the Subdirectories to exclude text box, specify the directories that you want QuickFinder not to index.

    For example, /marketing or /sales/doc.

  11. To direct QuickFinder to include or exclude specific file types, click Extensions to include or Extensions to exclude and then specify the extensions, such as HTM PDF TXT, separating each one with a single space.

  12. To add additional URLs, click Define More Web Sites.

  13. To delete a URL, select it and click Remove Web Site.

  14. In the Additional URLs text box, specify any other URLs that you want indexed (for example, www.mycompany.com/marketing).

    This allows you to specify additional areas of information found on other Web sites, but not include all of the content of those sites to your searches.

    When QuickFinder encounters links found in the pages of Additional URLs that point to pages specified in Web Sites to Crawl, QuickFinder follows those links. All other links that go outside of Web Sites to Crawl are not followed.

  15. Use the Off-Site URLs option to determine the maximum number of off-site URLs (those URLs not located within any of the URLs specified in Web Sites to Crawl) that QuickFinder should index.

    In the URLs to Exclude field, list the off-site URLs that you want to exclude from indexing.

  16. Use the Adjust Individual URL Relevance option to adjust the relevance of individual items within the index.

    Adjustment values can range from 1 to 200. Values higher than 100 increase the calculated relevance of the item on the search results page, and values lower than 100 decrease the calculated relevance of the item. The value specified here is combined with other values to determine the final relevance.

  17. Under Additional Settings, specify the absolute path to where you want the index files stored in the Location of Index Files field.

    For example, /var/lib/qfsearch/sites/mysites.

    By default, index files are stored at /var/lib/qfsearch/Sites/default/indexes/index_name.

    Changes made to Additional Settings override Default Settings.

  18. From the Level of detail in indexing logs drop-down list, select the amount of information you want included in the index logs.

    Option       Description
    Disabled     Turns off index logging.
    Terse        Lists only the URLs indexed.
    Normal       Lists the URLs indexed and the results of the crawl.
    Verbose      Lists the URLs indexed, the results of the crawl, and the links that were skipped during the crawl.
    New Links    Lists the URLs indexed, the results of the crawl, the links that were skipped, and any new links found during the crawl.
    All Links    Lists the URLs indexed, the results of the indexing, the links that were skipped, and all links found during the crawl.

  19. From the Encoding (if not in META tags) drop-down list, select the encoding to be used by files being indexed that do not contain an encoding specification.

  20. In the Field names to display on Search Results pages field, list any field names that you want to be included on the search results page (for example, author date_created product dc.copyright).

    To display the field on the search results pages, add the corresponding $$Variable to the template (for example, $$author $$date_created $$product $$copyright).

  21. Use the Index Weight option to boost or degrade search results based on the item’s index.

    A weight of 100 makes the item’s relevance normal. Increasing the weight makes the item more relevant, and lowering the weight makes the item less relevant.

  22. In the Maximum index depth field, specify the number of jumps (or links) from the starting URL that QuickFinder should crawl.

  23. In the Maximum file size to index field, specify the maximum file size (in bytes) that QuickFinder should index.

    Files exceeding this size are not indexed and are not included in search results.

  24. In the Maximum time to download a URL field, specify a number (in seconds) before QuickFinder automatically skips the indexing of the specified URL.

  25. In the Delay between URL requests field, specify the amount of time (in milliseconds) QuickFinder should delay before attempting to index a URL.

  26. To direct QuickFinder to pay attention to the case of filenames and directory names, click Yes next to URLs are case sensitive.

  27. To direct QuickFinder to crawl dynamic content (URLs containing the question mark [?]), click Yes next to Crawl dynamic URLs.

    For more information about indexing dynamic content, see Section 8.7, Indexing Dynamic Web Content.

  28. Click Yes next to Obey Robots.txt exclusions when crawling if you want QuickFinder to follow the instructions found in any Robots meta tags.

    For more information, see Using the Robots Meta Tag.

  29. Click Yes next to Index may be copied to other clustered servers if you want to allow this index to be copied to other servers in a QuickFinder Synchronization cluster.

    For more information about QuickFinder Synchronization, see Section 9.0, Synchronizing Data Across Multiple QuickFinder Servers.

  30. Click Yes next to Always activate new index to activate the newly generated index regardless of its size.

    The default setting for the Always activate new index option is No. When it is set to No, the newly generated index is compared with the current one. If the new index is small compared with the existing one, an error message is displayed in the Admin console.

  31. If the URLs to be crawled require authentication, use the Type of Authentication required to crawl web site option to select the methods for providing the correct user credentials.

    • Basic: If you know that the server to be indexed requires basic authentication, select Basic, then specify the username and password in the Crawler Credentials fields.

      For example, if you are indexing www.company1.com and it uses basic authentication, specify the username (user ID) and password in the Crawler Credentials fields. In this case, the credentials are sent using an HTTP authorization header with every request made to the server of the URL you have specified.

    • Form: If the server to be indexed uses form-based authentication, type the correct user credentials in the Form Fields box. For example: UserIDField:$$UserID

      In form-based authentication, the first time the Web site is indexed, the credentials are sent and a session cookie is returned. Thereafter, QuickFinder uses the session ID in the cookie for authentication and the credentials are no longer sent to the Web site.

      If you are indexing more than one URL and each one requires a different set of credentials, we recommend that you create a separate index for each URL.

      By default, QuickFinder Server sends the form-based credentials by using the HTTP Post protocol. If the Web sites being indexed require the HTTP Get protocol, deselect the Send login data using HTTP post protocol check box. When this option is not selected, QuickFinder Server sends the form-based credentials as query parameters to the URLs being indexed.

  32. (Optional) If the Web sites you are indexing require users to log in at a specific URL (such as login.digitalairlines.com), specify the login URLs in the Alternate Login URLs field.

    After the session cookies are returned, QuickFinder sends the appropriate cookies as needed to the Web sites being indexed.

  33. Select Yes next to Use Crawler Credentials when Highlighting to use the Crawler Credentials specified in Step 31 instead of the search user’s credentials when requesting the specified documents.

  34. In the HTTP Headers field, specify any additional headers and values you want included with each HTTP request, placing each header on a separate line.

    Some Web sites require specific information in HTTP headers when attempts are made to access them. If your Web site uses form-based or cookie-based authentication, you can specify the information here.

  35. Click Apply Settings.

After you define an index, you must generate it to make it searchable. See Generating Indexes.
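The crawl-scope settings in Steps 10, 11, 22, and 27 can be pictured as a single yes/no test applied to each link the crawler discovers. The following Python sketch is illustrative only and is not QuickFinder's implementation. The CrawlSettings class, its field names, and the should_crawl function are hypothetical, and the matching rules (prefix match on excluded subdirectories, case-insensitive extension match, paths without an extension always allowed) are assumptions.

```python
from dataclasses import dataclass
from urllib.parse import urlparse
import posixpath


@dataclass
class CrawlSettings:
    """Hypothetical container mirroring a few fields of the advanced form."""
    max_depth: int = 5               # Maximum index depth (links from the starting URL)
    excluded_subdirs: tuple = ()     # Subdirectories to exclude, e.g. ("/marketing",)
    include_extensions: tuple = ()   # Extensions to include; empty means no restriction
    crawl_dynamic_urls: bool = False # Crawl dynamic URLs (those containing "?")


def should_crawl(url: str, depth: int, settings: CrawlSettings) -> bool:
    """Return True if a discovered link falls inside the configured crawl scope."""
    if depth > settings.max_depth:
        return False
    if "?" in url and not settings.crawl_dynamic_urls:
        return False
    # Accept URLs written without a scheme, such as www.mycompany.com/path.
    parsed = urlparse(url if "://" in url else "http://" + url)
    path = parsed.path or "/"
    for subdir in settings.excluded_subdirs:
        prefix = subdir.rstrip("/")
        if path == prefix or path.startswith(prefix + "/"):
            return False
    if settings.include_extensions:
        ext = posixpath.splitext(path)[1].lstrip(".").upper()
        allowed = {e.upper() for e in settings.include_extensions}
        # Assumption: paths with no extension (such as directories) are not filtered.
        if ext and ext not in allowed:
            return False
    return True
```

For example, with a maximum depth of 2, the excluded subdirectories /marketing and /sales/doc, and the extensions HTM PDF TXT, this test accepts www.mycompany.com/path/index.htm at depth 1 and rejects www.mycompany.com/marketing/plan.pdf.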

8.5.3 Configuring Rights-Based Search Results for Crawled Indexes

  1. On the Global Settings page of QuickFinder Server Manager, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, select New crawled index, then click Define Index.

  3. On the Define Crawled Index page, click Advanced Index Definition.

  4. Under Rights-based Search Results, configure authorization checking by selecting one of the following options:

    • Use Default: Select this option if you want this index to use the default authorization checking setting specified on the Index Settings page of your virtual search server.

    • Off: If you want all users to have access to this index, select this option. No authorization checking is done.

    • by Index: To enable rights checking for this index, specify a file that exists on your server that can be used in verifying user access. By creating a file and setting access rights to it, QuickFinder can verify access to this index based on the rights to the file. Click Use default path if a path was specified on the Index Settings page.

      NOTE: The NCP (eDirectory) rights-based search results option, which is supported for remote NCP volumes, does not apply to crawled indexes. For indexes of remote NCP volumes, search results can be restricted based on the logged-in user’s rights to individual files and directories.

  5. From the Unauthorized hits filtered by drop-down list, select one of the following filters:

    • Use Default: Select this option if you want the current index to use the default setting found on the Index Settings page.

    • Search Engine: When you select this option, users attempting to search the index without logging in do not see any of the unauthorized hits on the search results page. If the user doesn’t have access to any search results, then the system returns a No Results Found message on the search results page.

    • Templates: When you select this option, users attempting to search the index without logging in to the system receive results, but they are then required to provide a username and password before being allowed to see the contents.

  6. Click Apply Settings.

After you define an index, you must generate it to make it searchable. See Generating Indexes.

8.5.4 Creating a File System Index

  1. On the Global Settings page of QuickFinder Server Manager, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, click New file system index, then click Define Index.

  3. In the Index Name field, specify a name for your index.

    A name can be a word, phrase, or a numeric value. If the virtual search server you are working on contains, or will contain, a large number of indexes, you might want to use a numbering scheme to help you manage multiple indexes more effectively. However, the name you specify here appears on the default search page, so you might want to choose a name that can be understood by users of your search services.

  4. In the Server Connection field, select Yes if the files to be indexed are on an NCP server, then specify the NCP server name, a valid username, password, and the character set of the server.

    NOTE: You must specify the username in the user.ou.o format.

    The user must have at least read rights to all the files. If you want to do rights-based searches, the user must have administrator rights to the NCP server.

    The Server Charset option must be set correctly so that the URLs can be properly encoded (according to the server encoding).

    This option is useful if you have a local NSS volume on the same machine as your QuickFinder index and you want to create a rights-based search for your users, or if you have a local or remote NCP server (such as a NetWare server, another server with NCP on it, or a local indexing machine) and you want to centralize your indexing.

    If you choose this option, make sure that the Corresponding URL Prefix option in the Path Information section contains a complete URL so that your users can access the indexed files from the NCP server. Also, if you are planning on indexing a large number of files (for example, over a million), your system should have at least 2 GB of memory.

  5. In the Server path to be indexed field, specify the absolute path to the folder containing the information that you want indexed (for example, /var/lib/qfsearch/data).

  6. In the Corresponding URL prefix field, specify the URL that should be used by the search results page to access the individual files (for example, /sales).

    You can also specify a file URL containing the UNC path of the server and path. The syntax is file://///server-dns-name/volume/path.

    If the filename contains non-ASCII characters, set the return encoding of the search results page to match the encoding of the client’s machine. You can set the return encoding on the General Settings page, or the client can set it by specifying the encoding in the retencoding search parameter sent to the search server. For example, the default English Windows encoding is Windows-1252, and the Japanese encoding is Shift-JIS. In order for the page to open, the client must already be authenticated to the server that the path points to.

  7. To add additional paths, click Add More Paths.

  8. Click Apply Settings.

After you define an index, you must generate it to make it searchable. See Generating Indexes.
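The relationship between the Server path to be indexed field and the Corresponding URL prefix field can be illustrated as a prefix substitution: the part of a file's path that lies below the server path is appended to the URL prefix. The sketch below is an assumption about how a results-page link might be derived, not QuickFinder's actual code. The function name file_to_url is hypothetical, and percent-encoding with UTF-8 is an assumption (the real encoding depends on the Server Charset and return encoding settings).

```python
import posixpath
from urllib.parse import quote


def file_to_url(file_path: str, server_path: str, url_prefix: str) -> str:
    """Map an indexed file's absolute path to the URL shown in search results."""
    root = posixpath.normpath(server_path)
    target = posixpath.normpath(file_path)
    # Reject files that are not located under the indexed server path.
    if posixpath.commonpath([root, target]) != root:
        raise ValueError(f"{file_path!r} is not under {server_path!r}")
    relative = posixpath.relpath(target, root)
    if relative == ".":
        relative = ""
    # quote() leaves "/" intact and percent-encodes spaces and non-ASCII characters.
    return url_prefix.rstrip("/") + "/" + quote(relative)
```

With the example values from this section, file_to_url("/var/lib/qfsearch/data/reports/q1 summary.html", "/var/lib/qfsearch/data", "/sales") yields /sales/reports/q1%20summary.html.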

8.5.5 Creating an Advanced File System Index

  1. On the QuickFinder Server Manager Global Settings page, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, click New file system index, then click Define Index.

  3. On the Define File System Index page, click Advanced Index Definition.

  4. In the Index Name field, specify a name for your new index.

    A name can be a word, phrase, or a numeric value. If the virtual search server you are working on contains, or will contain, a large number of indexes, you might want to use a numbering scheme to help you manage multiple indexes more effectively. However, the name you specify here appears on the default search page, so you might want to choose a name that can be understood by users of your search services.

  5. In the Index description field, specify an optional description of the index to be created.

  6. In the Server Connection field, select Yes if the files to be indexed are on an NCP server, then specify the NCP server name, a valid username, password, and the character set of the server.

    The user must have at least read rights to all the files. If you want to do rights-based searches, the user must have administrator rights to the NCP server.

    The Server Charset option must be set correctly so that the URLs can be properly encoded (according to the server encoding).

    This option is useful if you have a local NSS volume on the same machine as your QuickFinder index and you want to create a rights-based search for your users, or if you have a local or remote NCP server (such as a NetWare server, another server with NCP on it, or a local indexing machine) and you want to centralize your indexing.

    If you choose this option, make sure that the Corresponding URL Prefix option in the Path Information section contains a complete URL so that your users can access the indexed files from the NCP server. Also, if you are planning on indexing a large number of files (for example, over a million), your system should have at least 2 GB of memory.

  7. Under Path Information, specify the absolute path to the folder containing the information that you want indexed in the Server path field (for example, /var/lib/qfsearch/data).

  8. (Optional) Use the Path Weight option to boost or degrade search results based on the path.

    A weight of 100 makes the path’s relevance normal. Increasing the weight makes the path more relevant, while lowering the weight makes the path less relevant.

  9. In the Corresponding URL prefix field, specify the URL that should be used by the search results page to access the individual files (for example, /sales).

    You can also specify a file URL containing the UNC path of the server and path. The syntax is file://///server-dns-name/volume/path.

    If the filename contains non-ASCII characters, set the return encoding of the search results page to match the encoding of the client’s machine. You can set the return encoding on the General Settings page, or the client can set it by specifying the encoding in the retencoding search parameter sent to the search server. For example, the default English Windows encoding is Windows-1252, and the Japanese encoding is Shift-JIS. In order for the page to open, the client must already be authenticated to the server that the path points to.

  10. To exclude specific subdirectories from being indexed, specify their relative paths in the Subdirectories to exclude field.

  11. To direct QuickFinder to include or exclude specific file types, click Extensions to include or Extensions to exclude and then type the extensions, separating each one with a single space, such as HTM PDF TXT.

  12. (Optional) To add additional paths, click Define More Paths.

  13. (Optional) To delete a path, select it and click Remove Path.

  14. Use the Adjust Individual File Relevance option to adjust the relevance of individual items within the index.

    Adjustment values can range from 1 to 200. Values higher than 100 increase the calculated relevance of the item on the search results page, and values lower than 100 decrease the calculated relevance of the item. The value specified here is combined with other values to determine the final relevance.

  15. In the Location of index files field, specify the absolute path to where you want the index files stored.

    For example, /var/lib/qfsearch/sites/mysites.

    By default, index files are stored at /var/lib/qfsearch/Sites/default/indexes/index_name.

  16. From the Level of detail in indexing logs drop-down list, select the amount of information you want included in the index logs.

    Option       Description
    Disabled     Turns off index logging.
    Terse        Lists only the files indexed.
    Normal       Lists the files indexed and the results of the crawl.
    Verbose      Lists the files indexed, the results of the crawl, and the links that were skipped during the crawl.
    New Links    Lists the files indexed, the results of the crawl, the links that were skipped, and any new links found during the crawl.
    All Links    Lists the files indexed, the results of the indexing, the links that were skipped, and all links found during the crawl.

  17. From the Encoding (if not in META tags) drop-down list, select the encoding to be used when indexing files that do not contain an encoding specification.

    For example, HTML files can specify their encoding with a Content-Type meta tag.

  18. In the Field names to display on Search Results pages field, list any field names that you want to be included on the search results page (for example, author date_created product dc.copyright).

    To display the field on the search results pages, add the corresponding $$Variable to the template (for example, $$author $$date_created $$product $$copyright).

    The field data is stored in the index and causes the index size to increase.

  19. Use the Index Weight option to boost or degrade search results based on the item’s index.

    A weight of 100 makes the item’s relevance normal. Increasing the weight makes the item more relevant, and lowering the weight makes the item less relevant.

  20. In the Maximum index depth field, specify the number of directories from the starting directory QuickFinder should search.

    This lets you limit how far (or deep) into a file server QuickFinder should search.

  21. In the Maximum file size to index field, specify the maximum file size (in bytes) that QuickFinder should index.

    Files exceeding this size are not indexed and are not included in search results.

  22. (Optional) Click Yes next to Index may be copied to other clustered servers if you want this index shared with other QuickFinder servers in a QuickFinder Synchronization cluster.

    For more information about QuickFinder Synchronization, see Section 9.0, Synchronizing Data Across Multiple QuickFinder Servers.

  23. Click Yes next to Always activate new index to activate the newly generated index regardless of its size.

    The default setting for the Always activate new index option is No. When it is set to No, the newly generated index is compared with the current one. If the new index is small compared with the existing one, an error message is displayed in the Admin console.

  24. Click Apply Settings.

After you define an index, you must generate it to make it searchable. See Generating Indexes.
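The Path Weight, Adjust Individual File Relevance, and Index Weight options (Steps 8, 14, and 19) all treat 100 as neutral. This guide states only that the values are combined to determine the final relevance; it does not say how. The following Python sketch assumes, purely for illustration, that each value acts as a percentage multiplier on a base relevance score. The function name adjusted_relevance, the multiplicative rule, and the application of the 1 to 200 range to every weight are assumptions, not documented QuickFinder behavior.

```python
def adjusted_relevance(base_score: float, *weights: int) -> float:
    """Scale a base relevance score by percentage-style weights (100 = neutral)."""
    score = base_score
    for weight in weights:
        # Assumption: every weight shares the 1-200 range given for individual adjustments.
        if not 1 <= weight <= 200:
            raise ValueError("weight must be between 1 and 200")
        score *= weight / 100
    return score
```

Under this assumed model, a path weight of 100, an individual adjustment of 50, and an index weight of 100 would halve an item's base score, while leaving all three at 100 would leave the score unchanged.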

8.5.6 Configuring Rights-Based Search Results for File System Indexes

  1. On the QuickFinder Server Manager Global Settings page, click Manage in the row of the virtual search server that you want to work with.

  2. Under Define a New Index, select New file system index, then click Define Index.

  3. On the Define File System Index page, click Advanced Index Definition.

  4. Under Rights-based Search Results, configure authorization checking by selecting one of the following options:

    • Use Default: Select this option if you want this index to use the default authorization checking setting specified on the Index Settings page of your virtual search server.

    • Off: If you want all users to have access to this index, select this option. No authorization checking is done.

    • by Index: To enable rights checking for this index, specify a file that exists on your server that can be used in verifying user access. By creating a file and setting access rights to it, QuickFinder can verify access to this index based on the rights to the file. Click Use Default Path if one was specified on the Index Settings page.

    • by Result Item: If checked, QuickFinder verifies the user’s access rights to each hit. This is not recommended for high-traffic servers because checking every hit can slow down server performance.

      NOTE: Rights-based search results by Result Item work only if the NCP volume, whether on a local or remote machine, is indexed through the NCP channel. For the NCP (eDirectory) rights-based results option to work, you must set Is NCP Server index to Yes.

  5. From the Unauthorized hits filtered by drop-down list, select one of the following filters:

    • Use Default: Select this option if you want the current index to use the default setting found on the Index Settings page.

    • Search Engine: When you select this option, users attempting to search the index without logging in do not see any of the unauthorized hits on the search results page. If the user doesn’t have access to any search results, then the system returns a No Results Found message on the search results page.

    • Templates: When you select this option, users attempting to search the index without logging in to the system receive results, but they are then required to provide a username and password before being allowed to see the contents.

  6. Click Apply Settings.

After you define an index, you must generate it to make it searchable. See Generating Indexes.
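The difference between the Search Engine and Templates filters in Step 5 can be summarized as follows: the first removes unauthorized hits before the results page is built, and the second lists every hit but asks for credentials when an unauthorized item is opened. The Python sketch below models that distinction only. The Hit class, the mode names, and the apply_unauthorized_filter function are hypothetical, and the authorized flag stands in for whatever rights check the server performs.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hit:
    url: str
    authorized: bool  # assumed outcome of a rights check for the searching user


def apply_unauthorized_filter(hits: List[Hit], mode: str) -> List[Tuple[str, bool]]:
    """Return (url, login_required) pairs for the search results page."""
    if mode == "search_engine":
        # Unauthorized hits are dropped, so they never appear in the results.
        return [(h.url, False) for h in hits if h.authorized]
    if mode == "templates":
        # Every hit is listed; unauthorized ones require a login when opened.
        return [(h.url, not h.authorized) for h in hits]
    raise ValueError(f"unknown filter mode: {mode!r}")
```

In this model, a user with no authorized hits gets an empty list in search_engine mode, which corresponds to the No Results Found message described above.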

8.5.7 Searching Across Multiple Indexes

QuickFinder can search across multiple indexes within a single virtual search server. However, searching a single index is generally faster than searching across multiple indexes.

Restricting Search Results to Specific Areas

You can restrict search results to specific areas of your file or Web server in the following ways:

  • Using multiple indexes and using the &index=index_name query parameter.

  • Using a single index and restricting results to certain URL paths by using the &filefilter=path query parameter.

  • Using a single index and restricting results to certain values in document fields by including ^fieldname=value with either the query=value or filter=value search parameters.

HINT: Using the last option requires that indexed documents contain summary fields such as meta tags. This option works for almost any file format that contains document summary fields, including HTML, XML, PDF, Word, and WordPerfect.

For information about preventing QuickFinder from indexing specific content, see Excluding Documents from Being Indexed.
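As an illustration of the first two methods, the following Python sketch assembles a search URL that carries the index and filefilter query parameters. Only the parameter names query, index, and filefilter come from this section. The function name build_search_url and the base URL http://search.example.com/search are placeholders, not the actual QuickFinder search address.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, index=None, filefilter=None):
    """Build a search URL restricted to one index and/or one URL path."""
    params = [("query", query)]
    if index:
        params.append(("index", index))            # search only the named index
    if filefilter:
        params.append(("filefilter", filefilter))  # keep only hits under this path
    # urlencode() percent-encodes reserved characters such as "/" and spaces.
    return base_url + "?" + urlencode(params)


url = build_search_url(
    "http://search.example.com/search",  # placeholder address
    "annual report",
    index="sales",
    filefilter="/marketing",
)
```

Omitting index or filefilter leaves the corresponding restriction off, so the same helper covers an unrestricted search.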

8.5.8 Indexing Content on a Password-Protected Web Site

If the Web servers you want to index require authentication, there are two methods for providing the correct user credentials: basic authentication and form-based authentication. Which one you choose depends on how authentication is implemented on the Web sites you index. For example, if you are indexing www.company1.com and it uses basic authentication, specify the username (user ID) and password in the Crawler Credentials fields. In this case, the credentials are sent using an HTTP authorization header with every request made to the server of the URL you have specified.

However, if www.company1.com uses a form-based authentication method, leave the Crawler Credentials fields blank and type the correct user credentials in the Form Fields text box. For example: UserIDField:$$UserID.

In Form-based authentication, the first time the Web site is indexed, the credentials are sent and a session cookie is returned. Thereafter, QuickFinder uses the session ID in the cookie for authentication and the credentials are no longer sent to the Web site.

HINT: If you are indexing more than one URL and each one requires a different set of credentials, we recommend that you create a separate index for each URL.
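For basic authentication, the HTTP authorization header mentioned above has a standard form: the word Basic followed by the Base64 encoding of username:password. The following Python sketch shows how such a header is constructed in general. It is not taken from QuickFinder, the function name basic_auth_header is hypothetical, and UTF-8 encoding of the credentials is an assumption.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header used by HTTP basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because Base64 is an encoding and not encryption, credentials sent this way are readable by anyone who can observe the request unless the connection itself is protected.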

8.5.9 Indexing Volumes on Remote Servers

For information on indexing volumes on remote servers, see Step 6 in Section 8.5.5, Creating an Advanced File System Index.

8.5.10 Generating Indexes

After you define an index, you must generate it before it can be used for searching. Generating an index is the process in which QuickFinder Server examines file server or Web server content, gathers keywords, titles, and descriptions, and then includes them in the index.

Generating an Index

  1. On the QuickFinder Server Manager Global Settings page, click Manage in the row of the virtual search server that you want to work with.

  2. Click Generate in the Action column of the index that you want to work with.

    The Active Jobs page indicates the status of the current indexing jobs. When there is no current index job, the status page reads No indexing jobs are currently running or defined.

  3. To cancel the current indexing jobs, click Cancel in the Status column.

You can direct QuickFinder to automatically update your indexes on specific dates and at specific times by scheduling events. For more information, see Section 8.9, Automating Index and Server Maintenance.

Generating an Index For a Linux-Mounted NSS Volume

To generate an index for a Linux-mounted NSS volume, the wwwrun user or the www group must have read access to the NSS volume. To grant this access, verify that the wwwrun user and the www group are LUM-enabled, then run the rights utility to assign the user or group trustee rights to the volume.

Generating a File System Index

When generating a file system index and specifying a set of filename extensions to index, you could end up indexing files you don’t want.

For example, suppose you index your entire hard drive and include only HTM and HTML files. There are about 10,000 properly matching files on your file system, but you end up with over 30,000 files in your index. This is because the file system scanner includes files with no filename extensions. In some cases, including files with no extension is better than not including them, but in this case, the index of all the HTML files on your hard drive is not helpful because it contains a large number of non-HTML files.

To avoid this kind of situation, manually modify the QuickFinder Server configuration file:

  1. Open the /var/lib/qfsearch/Sites/default/qfind.cfg file.

  2. In the <Directory> of an index definition section, add the following entry next to the Include Extension HTM HTML line:

    IncludeNoExtension N

This prevents files with no filename extensions from being included.
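The effect of this setting can be illustrated with a small Python sketch that applies an extension list to a set of filenames. This is a model of the behavior described in this section, not QuickFinder's scanner. The function name is_selected and its parameters are hypothetical, and the include_no_extension flag merely stands in for the IncludeNoExtension entry.

```python
import posixpath


def is_selected(filename, include_extensions, include_no_extension=True):
    """Decide whether a file passes the extension filter."""
    ext = posixpath.splitext(filename)[1].lstrip(".").upper()
    if not ext:
        # Files without an extension pass unless explicitly disabled.
        return include_no_extension
    return ext in {e.upper() for e in include_extensions}


files = ["index.html", "README", "notes.txt", "Makefile", "page.htm"]
default_result = [f for f in files if is_selected(f, ("HTM", "HTML"))]
strict_result = [f for f in files if is_selected(f, ("HTM", "HTML"), include_no_extension=False)]
```

In this model, default_result still contains README and Makefile alongside the two HTML files, whereas strict_result contains only index.html and page.htm.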

HINT: QuickFinder Server can only index files that are accessible through local file system calls. If you mount a volume or map a drive to a remote server and the local system sees it as a local drive, QuickFinder Server can index it.