Indexing Documents

Documents stored in GroupWise libraries need to be indexed so users can locate documents using the Find feature in the GroupWise Windows client. Your organization might need dedicated indexing to minimize performance degradation and network congestion. You might also need dedicated indexing so users can have prompt access to newly created documents.


Understanding DMS Indexing

Before determining if you will need dedicated indexing, you should have a basic understanding of how indexing works in GroupWise.


Index Storage

When documents are indexed, the information is stored in QuickFinder™ indexes, which are located in a library's index subdirectory. A library's QuickFinder index is partitioned into ten *.idx files. In addition, temporary *.inc (incremental) files are created to hold each day's new index information. The *.inc files are combined into the *.idx files once per day, usually at midnight.

In a system with multiple libraries, each library has its own set of QuickFinder index files. Depending on how many libraries belong to a post office, and how many post offices with libraries are in your GroupWise system, there can be many sets of QuickFinder index files.


Index Content

Indexing can include a document's full text (depending on its document type), and it always includes the document's property sheet information (subject, author, version descriptions, and so on). Both newly edited and newly created documents are indexed. As a result, indexing volume depends on how many existing documents are edited as well as on how many new documents are created.

Newly created documents must be indexed before users can search for them. When you set up your indexing strategy, you must know how quickly users need access to newly created documents.

By default, a search covers only the QuickFinder indexes in the user's default library, but users can choose to search for documents in other libraries to which they have access.


Indexing Performed by the POA

Indexing is among the many functions of the Post Office Agent (POA). To learn more about POA functions, see Role of the Post Office Agent.

You can configure the POA for a post office to meet basic indexing needs. See Regulating Indexing.

To support greater indexing needs, you can set up an additional POA that is dedicated to indexing. See Configuring a Dedicated Indexing POA.

Indexing needs vary widely, so not every library needs a dedicated POA for indexing documents:

  • In a small GroupWise system that has only one post office and one library, indexing can easily be done by the one POA.
  • In a post office with heavy DMS usage, one or more additional POAs can be dedicated to indexing the documents.
  • In a large system that has a DMS post office housing all libraries in the GroupWise system, indexing can be done by the DMS post office's POAs.

A library can have more than one POA dedicated to indexing its documents. Because the library's QuickFinder index is partitioned into ten separate *.idx files, an organization that is extremely document-intensive can boost indexing performance by using up to ten POAs dedicated to indexing. These POAs will not conflict with each other in performing indexing because the *.idx and *.inc files are locked during the indexing process.
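The locking behavior described above can be illustrated with a small Python model. This is a hypothetical sketch and not GroupWise code. The class PartitionedIndex and the function effective_parallelism are illustrative names. In-process threading.Lock objects stand in for the file locks that the POAs place on the *.idx and *.inc files, and indexing a partition is reduced to recording which worker updated it.

```python
import threading

NUM_PARTITIONS = 10  # a library's QuickFinder index is split into ten *.idx files


class PartitionedIndex:
    """Hypothetical model of a library index whose partitions are locked while being updated."""

    def __init__(self, partitions: int = NUM_PARTITIONS) -> None:
        self.locks = [threading.Lock() for _ in range(partitions)]
        self.updated_by: dict[int, str] = {}  # partition number -> worker that updated it

    def run_worker(self, name: str, start: int = 0) -> None:
        """One indexing pass by one worker; `start` staggers where each worker begins."""
        count = len(self.locks)
        for offset in range(count):
            part = (start + offset) % count
            lock = self.locks[part]
            if not lock.acquire(blocking=False):
                continue  # another worker holds this partition, so skip it
            try:
                if part not in self.updated_by:
                    # Stand-in for merging queued index entries into this partition.
                    self.updated_by[part] = name
            finally:
                lock.release()


def effective_parallelism(workers: int, partitions: int = NUM_PARTITIONS) -> int:
    """Workers beyond the number of partitions have nothing left to lock."""
    return min(workers, partitions)


index = PartitionedIndex()
threads = [threading.Thread(target=index.run_worker, args=(f"POA{i}", i * 3)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(index.updated_by))  # every partition is updated exactly once
```

A worker skips any partition that another worker is holding. Each partition therefore ends up updated once, no matter how the threads interleave. The model also shows why a library gains nothing from more than ten indexing POAs: there are only ten partitions to lock.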

You can also temporarily use multiple indexing POAs while importing documents to shorten the import time.


Indexing Cycle

The frequency of indexing is determined by the POA QuickFinder Interval setting. The default is once every 24 hours at 8:00 p.m. This might be often enough in an organization where document usage is minimal, or where searching for newly created documents is not mission-critical.

You can specify the QuickFinder Interval setting in one-hour increments. For example, a setting of 1 allows users to find documents created as recently as an hour ago. Whether you need a dedicated indexer at this frequency depends on how many documents are queued for indexing per hour.

You can set the QuickFinder Interval to 0 (zero) for continuous indexing. This is recommended for organizations where document usage is intensive, or where users routinely need to find documents that have just been created. If document usage is intensive in your organization, you might need a separate indexer server dedicated to continuous indexing because the post office server's performance could become unacceptably slow if continuous indexing is performed on it.
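The interaction between the indexing schedule and the rate at which documents arrive can be sketched in Python. This is a hypothetical model and not the POA's scheduler. It assumes that runs start at a base hour (8:00 p.m. by default, as stated above) and repeat every QuickFinder Interval hours before and after that base hour. It also assumes that an interval of 0 means a document is picked up immediately. The names next_index_run and documents_per_run are illustrative.

```python
from datetime import datetime, timedelta


def next_index_run(created: datetime, base_hour: int = 20, interval_hours: int = 24) -> datetime:
    """First scheduled indexing run at or after `created` (hypothetical schedule model).

    Assumes runs occur at `base_hour` and every `interval_hours` before and after it;
    an interval of 0 means continuous indexing.
    """
    if interval_hours < 0:
        raise ValueError("interval_hours must be 0 or a positive number of hours")
    if interval_hours == 0:
        return created  # continuous: picked up right away (processing time ignored)
    step = timedelta(hours=interval_hours)
    run = created.replace(hour=base_hour, minute=0, second=0, microsecond=0)
    while run > created:  # move back to a run at or before the document's creation
        run -= step
    while run < created:  # then forward to the first run at or after it
        run += step
    return run


def documents_per_run(new_per_hour: int, edited_per_hour: int, interval_hours: int) -> int:
    """Documents queued for one run; both new and edited documents are indexed.

    For continuous indexing (interval 0), this returns the hourly arrival rate instead.
    """
    return (new_per_hour + edited_per_hour) * max(interval_hours, 1)


created = datetime(2024, 5, 6, 9, 30)
print(next_index_run(created))                    # 2024-05-06 20:00:00 (default daily run)
print(next_index_run(created, interval_hours=1))  # 2024-05-06 10:00:00 (hourly runs)
print(documents_per_run(30, 50, 24))              # 1920 documents in one daily batch
```

In this model, a document created at 9:30 a.m. waits more than ten hours under the default schedule but only half an hour with a setting of 1. A longer interval also means each run must process a larger batch.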


Bandwidth Considerations

A primary factor in network speed is bandwidth. This is the amount of data that can be passed through the network per second. If a network's bandwidth is not sufficient for handling heavy traffic, intensive document indexing can degrade network performance.
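The definition of bandwidth above can be turned into a rough transfer-time estimate. This shows how a burst of indexing traffic can crowd a link. The following is a minimal sketch under stated assumptions: the function name transfer_seconds and its parameters are hypothetical, protocol overhead is ignored, and available_fraction is an assumed share of the link that other traffic is not already using.

```python
def transfer_seconds(megabytes: float, link_mbps: float, available_fraction: float = 1.0) -> float:
    """Rough time to move `megabytes` of data over a link rated at `link_mbps` megabits/second.

    `available_fraction` is the assumed share of the link not already used by other traffic.
    Protocol overhead is ignored, so real transfers take longer.
    """
    if link_mbps <= 0 or not 0 < available_fraction <= 1:
        raise ValueError("link speed must be positive and available_fraction in (0, 1]")
    megabits = megabytes * 8  # 1 byte = 8 bits
    return megabits / (link_mbps * available_fraction)


print(transfer_seconds(100, 100))        # 8.0 seconds on an otherwise idle link
print(transfer_seconds(100, 100, 0.25))  # 32.0 seconds when only a quarter is free
```

The figures are illustrative only. The point is that the same indexing workload takes several times longer, and holds the link for longer, when it competes with messaging traffic.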

A number of elements affect network bandwidth: cable types, transmission protocols, and hardware. Ethernet networks are susceptible to wide fluctuations in transmission speed during periods of heavy traffic. WAN links in particular benefit from reduced network traffic.

If you locate a post office in close proximity to its users, you will have less traffic through routers, bridges, and other network hardware. Running GroupWise in client/server access mode also reduces network traffic.

GroupWise users can add heavy messaging traffic to your existing network. DMS usage will add document indexing traffic as well. These factors could create much more network bandwidth usage than you have previously experienced.


Indexer Configurations

Following are five basic examples of how dedicated indexers can be configured. The examples do not cover all possibilities. You can combine elements from these configurations to customize indexing for your organization.

In all configuration examples, the post office can contain multiple libraries, although the Single Server with One POA configuration is best suited to only one library. In the other configuration examples, one or more POAs can be set up for indexing documents for all libraries in the post office.


Single Server with One POA

One POA runs on the post office server and performs all POA functions for the post office and its libraries. This basic configuration is best suited for a small system, or a decentralized library configuration with small post offices that each have a library. For more information, see Centralized vs. Decentralized Library Configurations.


Single machine with one POA
Advantages:
  • Default configuration; no additional setup is required.
  • Troubleshooting is limited to a single server.

Disadvantages:
  • All operations are performed on one server, which can degrade performance if your organization performs many DMS operations.
  • If you lengthen the QuickFinder Interval to lessen the load on the POA, users must wait longer before they can find new documents or find modified documents by newly added keywords.

Single Server with Multiple POAs

It is possible to run more than one POA for the same post office on the same server.


Single machine with multiple POAs
Advantages:
  • None.

Disadvantages:
  • Many processes running on one server can slow it down.
  • The server is a single point of failure; a problem that shuts it down stops all of its POAs.

There are no advantages to running multiple POAs on the same server. If you need more than one POA, run the additional POA on a separate server, as described in Dedicated Indexer Server.


Dedicated Indexer Server

You can have the post office on one server and a POA dedicated to indexing DMS documents on another server. This configuration is useful for systems of any size with heavy DMS usage.


Dedicated indexing machine
Advantages:
  • Dedicated server for quicker DMS indexing. This is useful for organizations that are document-intensive.
  • Messaging post office is not hampered by DMS indexing.

Disadvantages:
  • Network traffic can increase significantly during periods of intense indexing.
  • Multiple server hardware is required.

Dedicated Indexer Server on an Isolated Network Segment

You can have the post office on one server and a POA dedicated to indexing documents on another server that is on an isolated network segment. This configuration minimizes bandwidth congestion for the production network segment.


Post office on one machine and the dedicated indexing POA on another machine
Advantages:
  • Dedicated server for quicker DMS indexing. This is useful for organizations that are document-intensive.
  • Messaging post office is not hampered by DMS indexing.
  • The large amount of information passed between the post office server and the indexing server does not congest the bandwidth of the production network segment.

Disadvantages:
  • Multiple server hardware is required.
  • A dedicated network segment is required, including a second network interface card that is linked directly to the indexer server.
  • For multiple indexing servers, a dedicated hub might be needed.

Dedicated DMS Post Office

You can have one post office that is dedicated to messaging and another to DMS. This configuration is useful for post offices that have heavy DMS usage. For a review of this configuration, see Centralized Libraries.


Dedicated DMS post office
Advantages:
  • Dedicated POA for quicker DMS indexing. This is useful for organizations that are document-intensive.
  • Messaging post office is not hampered by DMS traffic and indexing.
  • Messaging and DMS databases are logically separated, which makes processes such as backing up databases easier.
  • This configuration is ideal for creating a centralized library configuration.

Disadvantages:
  • High-end hardware is required for the DMS server.
  • An additional post office and POA must be maintained.
  • Client/server access mode is required for searching and accessing documents.
  • Users who cannot use client/server mode need remote access, which uses the slower store-and-forward process to search for and access documents.


Determining Your Indexing Needs

The following table presents some indexing considerations and suggests an indexing configuration based on how the considerations pertain to your indexing needs:

Consideration                         Single Server  Dedicated          Dedicated Indexer  Dedicated DMS
                                      with One POA   Indexer Server     Server on an       Post Office
                                                                        Isolated Network
                                                                        Segment
------------------------------------  -------------  -----------------  -----------------  -------------
Does the post office own multiple     No             Yes or No          Yes or No          Yes
libraries?

What is the expected indexing         Light          Light or Moderate  Moderate or Heavy  Heavy
volume (per hour)?

Is hardware available for a           No             Yes                Yes                Yes
dedicated indexer server?

Could bandwidth congestion be a       No             Maybe              Maybe or Yes       Yes
problem?

Use the Indexing Worksheet to estimate the indexing needs of the libraries in your GroupWise system. Each worksheet accommodates three libraries.

Identify each library (worksheet items 1 and 2). Estimate the impact of each consideration in each library (worksheet items 3 through 6). Then compare your estimates for each library to the values in the table above to determine the indexing configuration for each library (worksheet item 7).
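The comparison in worksheet item 7 can be expressed as a simple matching rule. The following is a minimal Python sketch under stated assumptions. Each table cell is treated as the set of answers that fit that configuration, and configurations are ranked by how many of the four considerations they match. The table does not say how to break ties, so in this sketch tied configurations keep the table's column order, and the administrator makes the final choice. The names CONFIGURATIONS and rank_configurations are hypothetical.

```python
# Hypothetical encoding of the indexing-considerations table; keys mirror worksheet items 3-6.
CONFIGURATIONS: dict[str, dict[str, set[str]]] = {
    "Single Server with One POA": {
        "multiple_libraries": {"No"},
        "volume": {"Light"},
        "spare_hardware": {"No"},
        "bandwidth_congestion": {"No"},
    },
    "Dedicated Indexer Server": {
        "multiple_libraries": {"Yes", "No"},
        "volume": {"Light", "Moderate"},
        "spare_hardware": {"Yes"},
        "bandwidth_congestion": {"Maybe"},
    },
    "Dedicated Indexer Server on an Isolated Network Segment": {
        "multiple_libraries": {"Yes", "No"},
        "volume": {"Moderate", "Heavy"},
        "spare_hardware": {"Yes"},
        "bandwidth_congestion": {"Maybe", "Yes"},
    },
    "Dedicated DMS Post Office": {
        "multiple_libraries": {"Yes"},
        "volume": {"Heavy"},
        "spare_hardware": {"Yes"},
        "bandwidth_congestion": {"Yes"},
    },
}


def rank_configurations(answers: dict[str, str]) -> list[tuple[str, int]]:
    """Return (configuration, number of matching considerations), best match first.

    `answers` must contain all four keys; a missing key raises KeyError.
    """
    scores = []
    for config, accepted in CONFIGURATIONS.items():
        matches = sum(answers[key] in values for key, values in accepted.items())
        scores.append((config, matches))
    # sorted() is stable even with reverse=True, so ties keep the table's column order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


library = {"multiple_libraries": "Yes", "volume": "Heavy",
           "spare_hardware": "Yes", "bandwidth_congestion": "Yes"}
for name, matched in rank_configurations(library):
    print(f"{matched}/4  {name}")
```

For the heavy-usage library in the example, both the isolated-segment indexer and the dedicated DMS post office match all four considerations. The table alone cannot decide between them, which is why the final choice in item 7 remains a judgment call.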


Indexing Worksheet

For instructions on how to use this worksheet, see Determining Your Indexing Needs.

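                                         Library       Library       Library

1) Library:                              ____________  ____________  ____________

2) Library's Post Office:                ____________  ____________  ____________

3) Multiple Libraries per Post Office?   ____________  ____________  ____________
  • Yes
  • No

4) Expected Indexing Volume (per hour):  ____________  ____________  ____________
  • Light
  • Moderate
  • Heavy

5) Additional Server Available?          ____________  ____________  ____________
  • Yes
  • No

6) Bandwidth Congestion Possible?        ____________  ____________  ____________
  • Yes
  • Maybe
  • No

7) Indexer Configuration:                ____________  ____________  ____________
  • Single server with one POA
  • Dedicated indexer server
  • Dedicated indexer server on an isolated network segment
  • Dedicated DMS post office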


Implementing Indexing

For libraries where a single POA running on the post office server will provide adequate indexing support for the post office's libraries, follow the instructions in Regulating Indexing to implement indexing.

For libraries where additional POAs running on separate servers are required to support the indexing needs of the post office's libraries, follow the instructions in Configuring a Dedicated Indexing POA to implement indexing.