Sizing a GroupWise System
Novell Cool Solutions: Feature
Posted: 3 May 1999
This document gives guidelines for planning and implementing the deployment of GroupWise 5.5. Most recommendations can be applied to 5.2, but Novell suggests they should be used for GroupWise 5.5. These recommendations are based upon a mixture of in-house testing, third-party lab performance studies, and Technical Support's experience.
As a preface to the recommendations, a list of assumptions has been given. These assumptions can be used as guidelines for planning the growth of an existing GroupWise system or the installation of a new GroupWise system.
It is intended that these recommendations be used with discretion based upon each specific implementation and the circumstances surrounding it.
This is a living document and will change as determined by experience, new technology and noteworthy feedback.
The list of assumptions below provides a general context for which recommendations are provided. If your GroupWise system differs substantially from the listed assumptions, you would need to take those differences into account when following the recommendations.
- 5.5 or higher
- Any patch level of 5.5
- Pentium Pro 200 or better
- Client to server LAN speed of 10 Mbps
- Server to server LAN speed of 100 Mbps
- Only handles e-mail
- Users are not logging into this server for applications, NDS authentication, or file/print services.
- The server is not being used for NDS replication.
- One per server
- One MTA per domain
- One per server
- Run on same server as the domain
- Run with IP links to other domains and post offices
- Run on separate server from the post office (data store) or Post Office Agent (POA). (This recommendation is specific for those domains that act as high volume mail traffic hubs. In most circumstances there is not a problem with running a domain with an MTA on the same server as a single post office with a POA.)
- One per server
- One POA per post office
- One per server
- Run on same server where the post office (data store) resides
- TCP/IP (client/server) connection is used for client access
- When running Document Management with a moderately to heavily used library, it is recommended that you have a second POA running on a separate server to perform the indexing tasks of the POA.
Based upon the above assumptions, the following are the recommendations for sizing a GroupWise system:
There is no limitation to the number of domains contained in a system. However, Novell Technical Services does not recommend that this number get too high because of difficulties in synchronization. The more domains in a system, the more databases must be synchronized. It would be inaccurate to provide a static number to define "too high." This needs to be determined on an individual basis, keeping in mind that more is not necessarily better. Here are some guidelines to help determine if and when a domain is needed:
- The primary domain should not have any post offices and should have a
direct TCP/IP link to all secondary domains where possible. This ensures
that administration traffic is isolated from message delivery routes. It
also ensures that administration changes are replicated as quickly as
possible throughout the system. However, the primary domain should not
be used as a routing domain.
- Routing domains should be used at WAN to LAN connection points and
service the remote domains that come through these points.
- If a routing domain is going to be used, it should exist
on its own server and can service 60+ links based upon LAN/WAN traffic
- If the system contains multiple high-traffic gateways, all gateways
should be put on a separate gateway secondary domain.
Additional domains for Internet traffic (primarily GroupWise Internet
Agent and WebAccess) can be placed at WAN/LAN hub locations to reduce
the time/cost for message delivery and network bandwidth. If there are
other high-traffic gateways, they can be placed on separate domains on
separate boxes to boost the processing power.
- For WebAccess and Remote Async connections, it is sometimes cheaper
to have users access them locally, instead of using an 800 line. In this
case, putting a domain in the remote location allows the gateway to run
local to the users, reducing the cost of long distance and/or 800
- If there are poor communication lines and/or high cost leased lines,
an Async link may be preferred. A secondary domain at the remote
location would be required to allow for the Async to Async connection.
- Other situations would include the use of dial-up routing where link scheduling could reduce costs, and external synchronization with other GroupWise 5.x systems where there is a high volume of administrative traffic.
Additional recommendations are found in Appendix D, which contains Novell IS&T's GroupWise objectives and configuration for domain links.
Again, there is no limitation to the number of post offices that can be contained in a domain. In realistic terms, the number of post offices per domain is not as important as the number of messages handled by the MTA for that domain. For example, a domain with 50 post offices, each with 30 users, would be acceptable if the LAN/WAN speed is sufficient to handle the amount of traffic generated by these users. At Novell, the Orem MTA handles an average of 100,000 messages per day without problems. This MTA is on its own NetWare server and the server has no other processes running on it.
Other factors should be taken into consideration, such as the following:
- LAN/WAN speed
- Processing power of the server the MTA is running on
- Average number of administration messages per hour in the system
- Busy search demand
- Remote async requests (remote requests put a greater strain on the system than direct users)
- The number of gateways in the domain
- Average user's activity on the GroupWise system
- The number of rules the users have active
The recommendation for this is one POA per post office. This POA should also be run on the same server that houses the data store. By running this configuration, the POA is able to utilize built-in load balancing to determine which threads and/or requests get priority over others. When a second POA is loaded on the same server, one process cannot control the threads of the other process and therefore the two processes end up competing for time.
If a second POA is needed for QuickFinder indexing or for faster processing, it is recommended that you run the second POA on a different server. This can be connected via the LAN (recommended to be on the same network segment as the first server) or a second NIC can be installed with a crossover cable to either a NetWare server or an NT server where the second POA will run.
Before making recommendations, we need a tighter definition for both users and post office.
Novell defines users as clients who are actively sending, reading, or otherwise using their mailboxes. This document will refer to these users as active users to avoid confusion with other vendors, who use the term to mean the total number of users regardless of their activity. It is up to each administrator or integrator to determine what maximum percentage of total users will be actively using mail at any given time.
The term post office has different meanings with different groupware packages. Historically, GroupWise has only defined it to be a grouping of users in a given data store location. As the groupware industry grows, it is becoming more common for a post office to mean a grouping of users in a given data store on a dedicated server; in other words, mailboxes per server. Although Novell doesn't require this to be the case, it is the preferred method of implementation.
Given the second definition for post office, Novell recommends that the number of active users per post office does not exceed the range of 500 to 700. This number is based on active users concurrently using GroupWise with a direct TCP/IP connection. The total number of users is limited only to how many of those users will be active at any given time. Novell foresees this number changing as technologies change. In addition, the performance of the client, agent and server can vary dramatically as settings for both the POA and the server are adjusted. These performance settings and changes are listed in the appendixes to this document.
With the assumption that an average of only 60% to 70% of users will be logged in and using mail at any given time, GroupWise could easily support a post office of more than 1000 total users. However, the intent of this document is to help the reader distinguish what values will realistically affect the system and not just give an impressively large number. Here are some things to consider when deciding on an acceptable number of users per post office:
- LAN/WAN speed and topology. The slower the network and the more hops
a packet has to make, the slower the performance will be. This gets
worse as both the GroupWise client and GroupWise POA start flooding the
network with TCP/IP packets.
- Cleanup policies. Without standard cleanup policies implemented,
there is no way to control the size of the GroupWise databases or user
mailboxes. The larger the databases get, the longer maintenance routines
will take. In addition, as the number of messages the POA must query for
Finds and/or indexing increases, the POA slows down. The larger a user's
mailbox (database) gets, the longer it takes the client to display the
items, resulting in a dissatisfied user.
- The number and size of attachments the average user is expected to
have. In Novell Technical Services, the post office has less than 300
users because of the size of attachments each user has, including large
databases and core dumps mailed from customers. This meant we needed to
reduce the number of users on a post office for disk space reasons.
- Backup and restoration can be an issue whether the problem is too many users or too much mail. This issue requires decisions that will be very specific to each situation. Solutions could include limiting the number of users per post office, adding more disk space, implementing an e-mail policy, restricting the size of attachments, limiting the number of messages, and billing the user when limits are exceeded.
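The relationship between active users and total users described above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and it assumes the active share of users is known as a whole percentage. It applies the 500 to 700 active-user range and the 60% to 70% activity estimate from this document.

```python
def max_total_users(active_limit: int, active_percent: int) -> int:
    """Largest total user count whose expected active users stay within active_limit.

    active_limit   -- recommended ceiling of active users (500 to 700 in this document)
    active_percent -- assumed share of users active at any given time, as a whole percent
    """
    if not 0 < active_percent <= 100:
        raise ValueError("active_percent must be between 1 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return active_limit * 100 // active_percent


if __name__ == "__main__":
    for limit in (500, 700):
        for percent in (60, 70):
            total = max_total_users(limit, percent)
            print(f"{limit} active at {percent}% activity -> up to {total} total users")
```

A result from this calculation is only a starting point; the factors listed above (network speed, cleanup policies, attachments, backup) may call for a lower number.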
This section is added to provide recommendations for those installations which will provide Web-based access as their primary client access. WebAccess doesn't have users specific to it, but it provides Web-based access to a GroupWise mailbox which must be defined in a specific post office within the GroupWise system.
The recommended number of WebAccess users per post office is the same as for any other type of users: 500 to 700 active users. The basic service to the client is handled through the POA in the same way as a direct TCP/IP client (assuming that WebAccess is set up to do client/server). However, this recommendation can vary based upon several factors: Web performance is acceptably slower, since people expect the Internet to be slow; typically there are fewer active users at any given time; and because of slower speeds the average user will avoid sending large attachments. With these aspects in mind, the number of WebAccess users per post office can be greater than direct connect users.
If an ISP is used, the number of users per post office can be even greater because of the limitation imposed by the number of modems that the ISP supports at a given time. For example, if an ISP has 500 modems, then 500 active connections are all that can be supported, and the total number of users per post office can be unlimited.
These MTA memory requirements are in addition to memory needed for the operating system.
- For a small system with 3 to 5 direct links: 10 MB is the minimum required.
- For larger systems with more than 3 to 5 direct links, add the
following per link to the 10 MB minimum required:
- Light to moderate traffic (less than 50,000 messages routed per day): add 0.2 MB.
- Moderate to heavy traffic (greater than 50,000 messages routed per day): add 0.5 MB.
These numbers were gathered and compared against live systems, not lab environments. Additionally, they are based upon averages. Novell recommends that you exceed the minimums rather than just meet them.
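The MTA memory guideline can be expressed as a small calculation. The sketch below is not a Novell tool, and it rests on two assumptions that the text leaves open: the per-link amount is charged for every direct link once a system has more than 5 links, and exactly 50,000 messages per day is treated as heavy traffic. Both choices err on the high side, in line with the advice to exceed the minimums.

```python
BASE_MB = 10.0                  # minimum for a small system (3 to 5 direct links)
SMALL_SYSTEM_MAX_LINKS = 5      # upper end of the "small system" range
HEAVY_TRAFFIC_MESSAGES = 50_000 # messages routed per day


def mta_memory_mb(direct_links: int, messages_per_day: int) -> float:
    """Estimate MTA memory (MB) needed in addition to the operating system."""
    if direct_links < 0 or messages_per_day < 0:
        raise ValueError("inputs must be non-negative")
    if direct_links <= SMALL_SYSTEM_MAX_LINKS:
        return BASE_MB
    # Assumption: the per-link amount applies to every link, not only the extra ones.
    per_link_mb = 0.5 if messages_per_day >= HEAVY_TRAFFIC_MESSAGES else 0.2
    return round(BASE_MB + direct_links * per_link_mb, 1)


if __name__ == "__main__":
    print(mta_memory_mb(4, 10_000))
    print(mta_memory_mb(20, 10_000))
    print(mta_memory_mb(60, 100_000))
```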
The memory requirements for a POA will vary based upon the number of active users. The total number of users in the post office is irrelevant. Again, these numbers are based upon the above assumptions.
- 100 active users: 94 MB additional memory
- 250 active users: 208 MB additional memory
- 500 active users: 232 MB additional memory
- 700 active users: 274 MB additional memory
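For planning purposes, the figures above can be treated as a lookup table. The following sketch assumes that a user count falling between two listed tiers should be rounded up to the next tier; the document does not say how to interpolate, so this is a deliberately conservative choice, and the function name is hypothetical.

```python
import bisect

# (active users, additional memory in MB), taken from the list above
POA_MEMORY_TIERS = [(100, 94), (250, 208), (500, 232), (700, 274)]


def poa_memory_mb(active_users: int) -> int:
    """Return the additional POA memory (MB) for the tier covering active_users."""
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    tier_sizes = [users for users, _ in POA_MEMORY_TIERS]
    # bisect_left finds the first tier whose size is >= active_users.
    index = bisect.bisect_left(tier_sizes, active_users)
    if index == len(POA_MEMORY_TIERS):
        raise ValueError("more than 700 active users is outside these guidelines")
    return POA_MEMORY_TIERS[index][1]


if __name__ == "__main__":
    for users in (80, 100, 300, 700):
        print(users, "active users ->", poa_memory_mb(users), "MB additional memory")
```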
Common Questions and Answers
A. The best way to do this is to first determine the current usage or expected current usage. Then plot the expected growth of business need. What will the groupware needs of the company be over the next several years? Last of all, examine the collected data and set numbers based upon the expectations. Also, keep in mind that as technology expands, these recommendations will change and the demands on the groupware system will probably increase.
A. The answer to this is ONE! This doesn't mean that if there is currently more than one post office on a server, it is wrong. Historically, Novell has had many different answers to this question, and as a result many different configurations have been implemented. Unfortunately, most have been based upon success or failure of specific experiences and then generalized to answer this broad question. Novell recommends one post office per server as the best solution, not the only solution.
When multiple POAs run on the same server, each will initialize threads for anticipated work loads. In periods of idle time these threads have to be maintained and controlled by the owning process (the POA in this case), thus taking resources. For example, if there are three POAs on a given server, two POAs could be relatively idle and the third POA very busy. The third POA cannot use the threads of the other two POAs, and yet those threads are still taking resources just to be maintained. In addition, if one of the idle POAs hits a trigger to kick off a QF Indexing or a GWCheck, then there is no way for the busy POA to prioritize that thread with its active client/server threads. In either case, the recommendation of one post office per server and one POA per server is the best solution.
A. The memory requirements are listed above. This is the most important information to keep in mind. However, there are other things that factor into this, including the following:
- Single point of failure. If the server goes down, how many
users will be affected?
- GroupWise work load. Are users doing busy searches and cross
post office proxies, using Remote, sending large attachments, doing
document management, and using the Find feature?
- Server work load. What else is the server responsible for:
NDS authentication, NDS replication, file/print services, or other
- User access. What level of user access will the server have?
Novell's base recommendation is to only have one agent per server. However, we realize that this is asking a lot in most situations. Therefore, a more acceptable recommendation is two per server. In most cases this will be one MTA and one POA with the exception of the MTA that is responsible for more than 50 links (with at least half as direct links) acting as a routing domain.
A. There were four main considerations:
- Performance. A server's ability to handle the POA
requirements decreases as these recommendations are exceeded. This
includes TCP/IP (client/server) requests, thread usage, processor
demands, and disk I/O.
- Manageability. This includes ease of managing users,
libraries, distribution lists, and NDS rights.
- Maintenance. The time it takes to run GWCheck, back up and
restore post office data, and perform general maintenance routines.
- Disk space. The amount of space required for database growth and attachment blob areas.
A. When problems occur even though the recommendations are followed, the usual cause is the tuning of the server and/or agents, or an underestimate of user activity.
For the server and agents, the appendices following this document contain recommended settings and their descriptions. Sometimes these settings need to be adjusted to fit specific situations and circumstances.
For user activity, the best solution is to plan for growth and increased use of the e-mail system. This would include addition of post offices, addition of hardware, implementing e-mail policies, and not filling post offices to their maximum before considering a growth plan.
TID# 2943356 covers the various areas to optimize your server's performance. It also looks into areas of pro-active/preventive maintenance on your server and how to achieve the best results. These actions can also prevent the possibility of server abends and crashes.
This is the title of a downloadable document from the Novell Support Connection at http://support.novell.com. The filename is highutl1.exe; it can be found with the File Finder or on the Current Patch List page.
NTS has found the following server settings to directly affect agent performance. (The defaults are based on NetWare 4.11.)
Many of the sections below warn about having enough memory. Each additional buffer allocated takes up about 4 KB. Each service process requires about 16 KB. The best way to determine sufficient memory is to watch the LRU count and the Available Cache Buffers. If these numbers drop -- LRU below 20 minutes and Cache Buffers below 40% -- more memory is probably required.
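The memory rules of thumb in the preceding paragraph can be sketched as two helper functions. The names are hypothetical, and the values would have to be read manually from the server's monitoring screens; nothing here queries NetWare. The sketch also assumes that either indicator dropping below its threshold is worth flagging, which is a cautious reading of the text.

```python
BUFFER_KB = 4             # approximate memory per additional buffer
SERVICE_PROCESS_KB = 16   # approximate memory per service process
MIN_LRU_MINUTES = 20      # LRU sitting time below this suggests memory pressure
MIN_CACHE_BUFFERS_PCT = 40  # available cache buffers below this suggests memory pressure


def extra_memory_kb(buffers: int, service_processes: int) -> int:
    """Approximate memory (KB) consumed by pre-allocated buffers and service processes."""
    return buffers * BUFFER_KB + service_processes * SERVICE_PROCESS_KB


def needs_more_memory(lru_minutes: float, cache_buffers_percent: float) -> bool:
    """Flag the server when either indicator falls below its threshold (assumption)."""
    return lru_minutes < MIN_LRU_MINUTES or cache_buffers_percent < MIN_CACHE_BUFFERS_PCT


if __name__ == "__main__":
    print(extra_memory_kb(buffers=4000, service_processes=100), "KB")
    print(needs_more_memory(lru_minutes=15, cache_buffers_percent=60))
```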
This setting will decrease processor overhead and I/O traffic. It determines how often the directory cache buffer is refreshed. Every refresh requires a new disk read and write to memory. By increasing the value to 30 seconds, the administrator is decreasing how often the refresh takes place. There is little danger in losing data. As new files are added to a directory structure, the buffer is dynamically updated. This feature is a safeguard for rare cases in which a file was not added to the buffer.
By increasing this value, the buffer is already established and no additional resources are required to allocate more buffer space on the fly. This can eliminate processor and I/O bottlenecks.
This setting protects the system from using too much memory for directory cache buffers, but the default does not give the system enough room to grow. Setting it at 4000 gives the system some leeway, but may require the addition of physical memory.
The Read Ahead feature significantly increases performance on NetWare servers. It predicts what files are required next and loads them in memory ready for access.
This setting applies to the Read Ahead mechanism. The Read Ahead LRU Sitting Time Threshold indicates that if the LRU (Least Recently Used) time is below the specified value, the Read Ahead feature won't be used.
LRU is an algorithm that is used for memory block / page replacement. An LRU list identifies the least recently used cache blocks (blocks that have been in cache the longest time without being accessed) and flags those for use first. It provides a more efficient caching implementation. The reason for the above setting is that if there is not enough memory to access data from available cache, Read Ahead will take up memory and processor time without increasing performance. If Read Ahead is not helpful, it makes sense to not use the resources. This setting can be configured up to 1 hour. In general terms, if the LRU is 20 minutes or more, the system probably has sufficient memory. This setting could be effective anywhere from a minute to possibly 5 minutes. Be aware that this disables Read Ahead, which usually is not recommended. If this option is used a lot, it is probably time to add more memory.
Although GroupWise does much more record locking than file locking, it is wise to allocate enough file locks if there are many users on the system. This does require memory and should not be over-used.
GroupWise performs many record locks. If there are many users on the system, it is wise to allocate enough record locks. This does require memory and should not be over-used.
Service processes are dynamic. By pre-allocating them, less overhead is required to allocate them on the fly. As long as there is sufficient memory, this number can be increased. A good rule of thumb is to monitor the server during peak times. Set the Minimum Service Processes to whatever the current service processes are during peak times.
This also takes up resources. Monitor this setting in the monitor.nlm. If the current processes begin to approach the maximum, increase the maximum service processes.
Adjusting this setting can drastically increase performance. When a service process is required, a new one can be created quickly. With the default setting of 2.2 seconds, the theory is that if the system waits long enough, a process will become free. If there is sufficient memory, there is no harm in creating a process immediately upon the initial request.
Any request that is processed uses a packet receive buffer. This includes all NCP requests, SAPs, RIPs, TCP packets, etc. If the server is bombarded with requests and there are not enough packet receive buffers, the system will get bottlenecked and will start dropping requests. The result is loss of connection to users, loss of server to server connections, slowness, etc. Monitor the current packet receive buffers during peak times and make sure that the minimum is set to ensure that there are enough packet receive buffers at all times. Remember, this also takes up memory. Be sure to have sufficient memory on the server.
Note that a server hosting WebAccess should set this to 2000.
This protects the server against too many packet receive buffers allocating too much memory to processes.
If the server has sufficient memory, this setting can significantly increase productivity. As with service processes, the server will immediately spawn a new buffer without waiting to see if one becomes available first.
This feature protects the TCP/IP stack against LandAttacks. LandAttacks are packets sent to the server with the same source and destination. The packets get into a loop and can bring the server down. If the server in question has no access to the outside world, the chance of a packet doing this is extremely minimal. By turning this unneeded feature off, overhead is reduced and IP packets can be processed faster.
The POA object has many settings that can be configured. Several of these can increase or decrease the performance of the POA. These settings are available so that each administrator can change them based upon their needs for optimal performance and stability.
This setting enables the POA to detect database problems and fix them in most cases. In the long run, this will improve performance because it prevents problems from getting really bad before they are detected and corrected. It does take resources to run the GWChecks if and when problems are encountered; however, Novell recommends this be left On. The trade-off in stability is worth the possible loss in performance.
Having this set to On improves the performance of the POA. It allows the POA, at the software level, to handle caching of the databases it is working with. Novell recommends that this setting be On. However, if there are problems with database corruption, it would be a good idea to turn this setting off until the source of the problem is located and fixed.
Turn this off only if SNMP is not being used to manage the agent. This feature requires quite a bit of I/O and processor traffic. If SNMP is not being used (through ManageWise or some other SNMP manager), turning this unneeded feature off could help performance.
These are the TCP/IP threads that will handle the client/server requests. NCS has found that the appropriate setting depends on how active the users are. If the users are less active, with only an average of 5 to 10 items per day both sent and received, then 30 users per thread is sufficient. However, if the users are more active (more than 30 items are sent and received each day, in addition to Finds and busy searches), then 20 users per thread is recommended. It is important to adjust this number for each situation because each thread allocated takes memory and resources. On the other hand, if there are not enough threads allocated then there will be pending requests and that means slower performance for the end users.
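The users-per-thread ratios above translate into a simple ceiling division. This is a minimal sketch; the function name and the "light" and "heavy" labels are illustrative and are not POA settings.

```python
# Users served per TCP handler thread, from the guidance above
USERS_PER_THREAD = {"light": 30, "heavy": 20}


def tcp_handler_threads(active_users: int, activity: str) -> int:
    """Suggest a TCP handler thread count for the given number of active users."""
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    ratio = USERS_PER_THREAD[activity]
    # Ceiling division with integers; always allow at least one thread.
    return max(1, -(-active_users // ratio))


if __name__ == "__main__":
    for users in (250, 500, 700):
        print(users, "users:",
              tcp_handler_threads(users, "light"), "threads (light),",
              tcp_handler_threads(users, "heavy"), "threads (heavy)")
```

The result is a starting value only; pending requests in the POA log during peak times remain the practical signal that more threads are needed.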
The QuickFinder Indexes are created for both libraries and user databases. They are used to speed up the query results for a Find as well as a document search. It is important to note that the QuickFinder does take up a lot of resources on the server, and this will often cause the utilization to peak until the indexing is complete. With GroupWise 5.5, an additional setting was added to allow this indexing to start on a specific offset from midnight. This gives the administrator more control to run the QuickFinder during a server's slowest time.
There is no recommended number for this setting. It will vary based upon whether or not there is a library, how often the users use the Find feature and how important it is to them, and whether or not there is a POA created to handle QuickFinder Indexing. If a Find (query) is made for a specific mailbox item, and that item has not been indexed, it will still be found but the search will take longer. If a query for a document is made, and that document has not been indexed, then the document will not be found.
Application connections are virtual connections. They are the workhorse for the IP traffic between client and POA. As new communication between client and POA is required, a new application connection will be spawned. After 5 seconds of no use, the application connection will time out and terminate. An average user will use approximately 4 connections per session at any one time. Each connection takes up about 8 K of memory. When application connections hit the maximum, the oldest connection is bumped to take care of the request of the new ones. If that old one was still in use, the client will request a new one, thus causing a vicious circle. If users complain about slowness, this setting may be too low.
There must be a physical connection created in order to generate application connections. A user can have multiple physical connections. In general, one physical connection per user is sufficient because not all users are going to be active at one time. If GroupWise hits the maximum physical connections, the user will receive an error saying that they cannot connect to GroupWise at that time. Increasing the maximum connections for physical as well as applications does not pre-allocate memory. The settings are there to protect GroupWise from accessing too many resources on the server.
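The connection figures above (about 4 application connections per user, about 8 KB each, and one physical connection per user) can be combined into a rough estimate. The function below is a hypothetical planning aid, not a POA interface. Because raising the maximums does not pre-allocate memory, the memory figure is an estimate of peak use, not of a reservation.

```python
APP_CONNECTIONS_PER_USER = 4   # approximate, per session
KB_PER_APP_CONNECTION = 8      # approximate memory per application connection
PHYSICAL_CONNECTIONS_PER_USER = 1


def connection_estimate(active_users: int) -> dict:
    """Estimate connection maximums and peak application-connection memory."""
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    application = active_users * APP_CONNECTIONS_PER_USER
    return {
        "physical": active_users * PHYSICAL_CONNECTIONS_PER_USER,
        "application": application,
        "application_memory_kb": application * KB_PER_APP_CONNECTION,
    }


if __name__ == "__main__":
    print(connection_estimate(600))
```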
These two settings go together and the defaults are the recommendations. These settings are designed to keep peak performance on the server. If the processing load on the server is too heavy, the POA will start to delay the launch of new threads for 100 milliseconds. This allows the server to continue processing the current requests and still refrain from ignoring other responsibilities. The POA is also designed to load balance its requests as the threshold is approached; client/server threads become the highest priority. The POA will start to terminate other threads, such as GWCheck and QuickFinder, to free up resources for the client requests.
Another thing that can be done to improve performance on the server is to flag the WPCSIN and WPCSOUT directories (and their subdirectories) for Immediate Purge. These directories will exist below each of the post offices, domains, and gateways in the GroupWise system. In addition, each MTA will have an MSLOCAL directory structure that should be flagged for Immediate Purge as well. For more information on this, see TID# 2920356.
These directories have many files written to and deleted from them. Immediate Purge will help keep the volumes clean. If the administrator is running suballocation on the volume, the directory should have at least 30% disk space available at all times. This implies non-purgeable blocks. If the space is free but resides as purgeable blocks, utilization will be affected dramatically. By setting Immediate Purge on high-traffic directories, the cleanup tasks will not be left to the server's purgeable blocks algorithm.
TID# 2939577 discusses POA log file interpretation. It provides ideas on how to read the POA log file to help optimize these settings.
Novell's GroupWise messaging system is a complex system consisting of over 100 domains. GroupWise requires that the routes or links between these domains be manually configured. If each domain had an individual link to every other domain in the system, more than 10,000 links would have to be configured and maintained. In addition, these links could theoretically open up to 80,000 TCP connections if all were in use simultaneously, causing bottlenecks and excessive network traffic on the Novell WAN. A link configuration has been designed to deal with these issues. This appendix discusses the objectives and methods of GroupWise link configuration used to configure Novell's GroupWise system.
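The link count quoted above follows from simple arithmetic, sketched below. The sketch assumes links are counted per direction (each domain holds its own outbound link to every other domain), which is consistent with "over 100 domains" yielding "more than 10,000 links." The second function is a simplified, hypothetical comparison: a meshed backbone with every other domain linked in both directions to a single hub. It ignores the primary domain's direct links, and the backbone size used in the example is illustrative, not Novell's actual figure.

```python
def full_mesh_links(domains: int) -> int:
    """Directed links needed if every domain links directly to every other domain."""
    return domains * (domains - 1)


def hub_backbone_links(domains: int, backbone_domains: int) -> int:
    """Directed links for a meshed backbone plus one hub link pair per remaining domain."""
    if not 0 < backbone_domains <= domains:
        raise ValueError("backbone_domains must be between 1 and domains")
    backbone_mesh = backbone_domains * (backbone_domains - 1)
    spokes = 2 * (domains - backbone_domains)  # one outbound and one inbound link each
    return backbone_mesh + spokes


if __name__ == "__main__":
    total_domains = 101  # "over 100 domains"
    print("full mesh:", full_mesh_links(total_domains))
    print("backbone of 20 plus spokes:", hub_backbone_links(total_domains, 20))
```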
Several important objectives have been identified relating to link configuration issues in the following areas.
- Follow the WAN topology. It is desirable to configure links so that they follow the design of the WAN. By configuring links to hop across a minimal number of WAN segments, connectivity issues are minimized. Additionally, it is necessary to avoid bouncing traffic twice across the same segment and it is desirable to use the shortest routes available.
- Control the number of GroupWise TCP connections and bandwidth usage across WAN links. Controlling the number of TCP connections will ensure faster file transmission and in general prevent GroupWise from hogging bandwidth.
- Minimize the number of links while maximizing the number of users connected via three hops or less. It is desirable to keep the number of links to a manageable number, while still providing the fewest number of hops possible, preferably two or three hops or less between domains.
- Isolate administration and replication traffic and minimize latency of replication. It is desirable to configure the links so that administration changes do not interfere with message delivery and also so that administrative changes are replicated quickly throughout the entire system.
- Minimize delivery time. It is desirable to minimize the time that messages take to be delivered.
- Minimize response times. It is desirable to minimize the response times for remote download requests and busy searches across the network.
To achieve the listed objectives, the following principles are used when configuring links.
- The primary domain is connected directly to all domains on WAN,
including direct connection to domains servicing Async gateways. This
ensures that administration traffic is isolated from message delivery
routes. It also ensures that administration changes are replicated as
quickly as possible throughout the system. The primary domain is not
permitted to service any post offices or gateways. This serves the same
- Each WAN site has a direct link to its closest regional hub domain,
usually just one hop across a slow WAN link. This permits close control
of bandwidth usage over slow WAN links and it also vastly reduces the
number of total links requiring configuration in the system. All other
domains in the system (except the primary domain) are linked indirectly
to each WAN site via this closest regional hub domain, inbound as well
- The principal domain at each major site and regional hub domains and
service domains such as Internet gateway domains are combined in a
backbone group and have direct links to each other in mesh configuration
across high speed WAN links. This balances maximizing the number of
users connected in three hops or less and minimizes the number of links
required for the entire system. Approximately 80% of the total users
will be located on 20% of the domains in the system. This helps to
minimize both delivery time and response times.
- All domains physically connected to the same LAN are linked in a mesh configuration. This reduces the work load on the hub domains and improves performance without impacting available bandwidth or significantly increasing the number of links in the system.
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com