New! Gregg Hinchman has just written a new eBook Success with Clustering GroupWise 7: A Guide to Building a Highly Available GroupWise 7 System on an OES NetWare Cluster. He has graciously shared Chapter 1 with Cool Solutions, and we presented it in a four-part series.
Success with Clustering GroupWise: A Guide to Building a Highly Available GroupWise System
by Tay Kratzer and Gregg A. Hinchman. Purchase info: www.taykratzer.com
Updated with Part 3:
This guide is an invaluable, up-to-date resource on how to cluster GroupWise. Customers who are looking to cluster GroupWise will find in this guide real-world answers based upon experience setting up GroupWise in dozens of customer engagements.
If you are already familiar with clustering, great: most of this chapter will be a review. In this chapter we lay out some basic terms that we assume you will know and understand, and we present the basics of our GroupWise system design. Both will guide you, the reader, throughout this book.
This chapter will not guide you through any specific steps. It is meant for you to read, consider your environment, and perhaps take some actions before proceeding. For example, you may decide that you want to increase memory on the servers in your cluster. Also, to provide the maximum benefit from reading this book, we have provided strategic checkboxes [ ] in places where you will want to perform some action. We call these 'action items'. Here is an example:
[ ] Determine if you have the latest GroupWise Snapins
Chapter 1 - Clustering Basics
Download Chapter 1 here (pdf)
This section will provide you with a very 'light' view of clustering. We consider this information to be the minimum knowledge a GroupWise person must have about clustering.
The Cluster Object
What is a cluster? Simply put, it's a bunch of servers gathered round a SAN (Storage Area Network, a.k.a. tons of hard drives and disk space) with excellent communication and sharing skills. In NetWare Remote Manager (NRM), a cluster object looks like three red globes connected to one another with pipes. Figure 1.1 shows a cluster object in NRM.
Figure 1.1 The Cluster object in NetWare Remote Manager
The Cluster Node Object
A cluster node is a server that is participating in the cluster. Any cluster node (a server) can host any NSS (Novell Storage Services) pool and its associated volume at any given time. A cluster node object in NetWare Remote Manager looks like a server object with a red globe next to it. Figure 1.2 shows a cluster node object in NRM.
Figure 1.2 The Cluster Node object in NetWare Remote Manager
The Cluster Resource Object
A cluster resource object is, for the sake of conversation, an NSS Pool with some other attributes that make it a "cluster resource" and effectively a "virtual" server. Novell Cluster Services makes each cluster resource appear as an NCP (NetWare Core Protocol) server. Therefore a volume on a cluster resource (NSS Pool) becomes a server with its own IP address. Each NSS Pool should have only one NSS volume, and together they are considered one cluster resource. A cluster resource object looks like a volume object with a small red globe next to it. Figure 1.3 shows a cluster resource object in NRM.
Figure 1.3 The Cluster Resource object in NetWare Remote Manager
These cluster resources can travel between cluster nodes as needed. If one cluster node is down (failed), the clustered resource hosted by that node can "failover" to another node; if that node fails, the resource can fail over to the next, and so on, until you are out of nodes. During all this failover activity, end users are unaware of the failures because their services remain available to them.
The cluster is managed through iManager, NetWare Remote Manager, or even ConsoleOne. In the case of ConsoleOne, you need the latest NCS (Novell Cluster Services) Snapins. NetWare Remote Manager (NRM) is loaded and configured automatically whenever a NetWare 6.5 server is installed. NRM is the fastest and easiest of the cluster administration tools, and it provides all the basic functionality needed to manage a cluster. To access NetWare Remote Manager, point a web browser to: https://ipaddress of server:8009. iManager must be installed on at least one server in your eDirectory tree in order to be accessed. When installing a NetWare 6.5 server, you can manually choose to have iManager installed and running on the server. If you do, iManager can be accessed by pointing a web browser to: http://ipaddress of server/nps/iManager.html.
Cluster Administration - ConsoleOne
There are two views for managing clustering in ConsoleOne: the Cluster State View and the Console View. Both can be accessed by right-clicking the cluster object, selecting View, and then choosing either Cluster State View or Console View.
Note: To determine if you have the Clustering snapins:
- Load ConsoleOne
- Select Help|About Snapins
- Look for a Snapin called "Novell Cluster Services"
If for some reason you do not have these snapins, you can obtain these snapins as follows:
- Go to http://download.novell.com
- From the "Choose a Product" selection select "Novell Cluster Services"
- Submit the search, and find the ConsoleOne Snapin listed. For example when this guide was written the download choice was titled: "1.7 Snapin for ConsoleOne"
- Install the Snapins according to the installation instructions that accompany the download
[ ] Verify ConsoleOne Snapins version
ConsoleOne - Console View
The Console View allows you to manage all the current cluster resource configurations, and you can also add cluster resources here. Consider the Console View the configuration view. Console View offers cluster resource templates that make it easier to create cluster resources; there is even a GroupWise cluster resource template, whose load/unload scripts are meant to ease the clustering of GroupWise, but we recommend you ignore it. In practice these templates are not of much use. Luckily, you have this book: you will be given details on how to build the GroupWise cluster resource load/unload scripts. Figure 1.4 shows the Cluster Console View.
Figure 1.4 The Cluster Console View in ConsoleOne
You can modify or change cluster resources through Console View by right-clicking the resource and selecting Properties. You can change the cluster resource IP Address. What, you ask, a cluster resource has its own IP Address? Yes it does. Essentially, a cluster resource "appears" to be a server; this is known as a "virtual server". That is the magic of clustering. Each volume is known to eDirectory as a server (or rather, a virtual server) and therefore must have its own IP Address. This makes clustering GroupWise easy, because GroupWise 6.0x and above is fully IP enabled. Figure 1.5 shows the Cluster Resource object's IP Address property page in ConsoleOne.
A Cluster Resource IP Address property page
Other items that are available in the Properties of a cluster resource object are Cluster Load/Unload Scripts, Policies and Nodes.
Cluster Load/Unload Scripts
The cluster load/unload scripts are like login scripts for cluster resources. OK, not quite the same, but you get the idea. When a cluster resource is loaded, it reads and runs the load script; when it unloads, it reads and runs the unload script. The load script contains all the commands to activate the NSS pool that holds the cluster resource, for example statements such as "CVSBIND" and "NUDP" (which are explained later). There is also the command to add the secondary IP address. This secondary IP address is considered the "server" IP address; it is assigned to the clustered resource, and it is the address we use for clustering GroupWise.
Figure 1.6 shows an example of a Cluster Load Script for a GroupWise cluster resource.
A cluster load script
Each line is numbered for ease of reference in this guide; the numbers would not appear in an actual load script.
1. nss /poolactivate=PMDOMVS
2. mount PMDOMVL VOLID=254
3. CLUSTER CVSBIND ADD PMDOMVS 192.168.20.101
4. NUDP ADD PMDOMVS 192.168.20.101
5. add secondary ipaddress 192.168.20.101
6. GWUP
Line 1 activates the NSS pool, which is named PMDOMVS.
Line 2 mounts the PMDOMVL volume and assigns it a volume ID of 254.
Line 3 performs the CVSBIND ADD. This is the Cluster Virtual Server statement.
Line 4 performs the NUDP ADD. This enables the service advertising for the resource.
Line 5 adds the secondary IP address and binds it to the NetWare server.
Line 6 is the load line for GroupWise. We will discuss this at great length later in the book. For the time being, remember that we use an NCF file called "GWUP" to load GroupWise.
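As a preview, a GWUP NCF might look like the sketch below. This is a hedged illustration only: the search path, agent startup-file names and address-space name are assumptions for this example, not values from this guide; the real file is built later in the book.

```
# GWUP.NCF - hypothetical sketch; names and paths are assumptions
# Add the clustered volume's system directory to the server search path
SEARCH ADD PMDOMVL:\SYSTEM
# Load the MTA and POA into one protected address space
LOAD ADDRESS SPACE = PMDOM GWMTA @PMDOM.MTA
LOAD ADDRESS SPACE = PMDOM GWPOA @PMPO.POA
# Allow the OS to restart the space automatically after an abend
PROTECTION RESTART PMDOM
```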
Figure 1.7 shows an example of a Cluster Unload Script for a GroupWise cluster resource. Each line is numbered for ease of reference in this guide; the numbers would not appear in an actual unload script.
Cluster Unload Script for a GroupWise cluster resource
1. GWDN
2. del secondary ipaddress 192.168.20.101
3. NUDP DEL PMDOMVS 192.168.20.101
4. CLUSTER CVSBIND DEL PMDOMVS 192.168.20.101
5. nss /pooldeactivate=PMDOMVS /overridetype=question
Line 1 is the unload command for GroupWise. It is very important to unload GroupWise first, before running any other commands. This avoids database corruption, and it also avoids potential cluster resource unload failures. Again, this line will be discussed later in the book; for now, remember that we use an NCF file called "GWDN" to unload GroupWise.
Line 2 deletes the secondary IP address binding from the NetWare server
Line 3 performs the NUDP delete to stop service advertisement. NUDP has changed to provide "more graceful" client disconnects: it now waits until all NCP connections have been terminated. Because this wait time can vary, another parameter, "ODEL", was added. The ODEL parameter can be used to speed up the failover times of cluster resources, and it is specific to unload scripts only. The command takes the form NUDP ODEL (for example, NUDP ODEL PMDOMVS 192.168.20.101). See Novell Support Knowledgebase TID # 10086057.
Line 4 performs the CVSBIND delete
Line 5 deactivates the NSS pool and adds a switch to override any questions asked during the process
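Conversely, a GWDN NCF can be as simple as tearing down the protected address space the agents run in. Again, a hedged sketch only; the address-space name is an assumption carried over from the GWUP example, and the real file is covered later in the book.

```
# GWDN.NCF - hypothetical sketch; the space name is an assumption
# Unloading the address space brings down every GroupWise NLM inside it
UNLOAD ADDRESS SPACE = PMDOM
```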
Policies
Each cluster resource can have different policies. These policies state how the cluster resource will act during certain cluster events, such as failover. It is important to note here that the Failback Mode of GroupWise cluster resources should be set to "Disable"; it is better to manually fail back a cluster resource to its starting node. The Failover Mode, however, should be set to "Auto"; otherwise the cluster resource will never fail over to its next node, GroupWise will never load on that node, and you will have downtime. Figure 1.8 shows the Cluster Resource object's Policies property page in ConsoleOne.
A Cluster Resource - Policies property page
Nodes
The Nodes property page is where you assign nodes (actual servers) to the cluster resource's failover list. You can add as many or as few as you require. The cluster resource, and hence the GroupWise service, will fail over to each node in the order listed. Figure 1.9 shows the Cluster Resource "PMDOMVS_SERVER" object's Nodes property page in ConsoleOne.
A Cluster Resource - Nodes property page
ConsoleOne - Cluster State View
The Cluster State View is where you monitor the cluster, its nodes, and the cluster resources. You also load, unload and migrate cluster resources in this view. Assuming you have configured your cluster resource with nodes as previously discussed, you can simply click the cluster resource in the lower portion of the Cluster State View and, in the resulting dialog, select either the Offline or the Migrate button. If you select Offline, the cluster resource unload script will run, GroupWise will unload, and the cluster resource will go offline. It's like dismounting a volume.
In the case of migrating the cluster resource, you click the resource, and then in the resulting cluster resource manage screen, select the cluster node that will be the "Migration Target" and select the Migrate button. At this point, the cluster resource unload script will run, GroupWise will unload, and the cluster resource will go offline (actually, it's unassigned) for a moment; then the cluster load script will run, the cluster resource will be assigned to the new node, and GroupWise will load. All of this can happen within 30-60 seconds, and users may never know. It is very important from a GroupWise clustering perspective to know how to online, migrate and offline cluster resources. Before the GroupWise system can go live on the cluster, you must test your load and unload scripts. Figure 1.10 shows the Cluster State View in NetWare Remote Manager.
The Cluster State View in NetWare Remote Manager
Cluster Administration - NetWare Remote Manager
NetWare Remote Manager allows you to perform all the same tasks that exist in the Cluster State View, such as onlining, offlining and migrating a cluster resource. It also allows you to make configuration changes to a cluster resource, such as its IP Address and Nodes. These tasks are performed under the Clustering Menu link in Remote Manager, which offers two management selections: Cluster Config and Cluster Management.
Cluster Config is where you configure cluster resources. Cluster Config, shown in Figure 1.11 below, is similar to the ConsoleOne Console View. Cluster Management, shown in Figure 1.12 below, is where you manage the cluster; it is similar to the Cluster State View in ConsoleOne. The nice part about Remote Manager is that it talks directly to the server(s) without Snapins, which means you get the most accurate information; every once in a while, ConsoleOne Snapins may not function properly.
Cluster Config in NetWare Remote Manager
Cluster Management in NetWare Remote Manager
Accessing NetWare Remote Manager (NRM) is very easy. NRM is loaded and configured during the installation of the server. To access NetWare Remote Manager:
- Launch a Browser
- In the Address bar of the Browser, type HTTPS://ipaddress of server:8009
- Log in to NetWare Remote Manager. You will have to type the full context of the user. A user with Admin rights is preferred.
- Use the left slider bar to move down to the Cluster Menu link.
- View both Cluster Management and Cluster Config. (We are assuming you have a cluster in place at this time.)
[ ] Access NetWare Remote Manager
Cluster Administration - iManager
A third option for administering a cluster is iManager, the newest of the Novell utilities for administering all Novell products. iManager must be installed, either during the installation of a server or afterwards, in order to be used. iManager is a web-based administration tool that uses Apache2/Tomcat4. To access iManager:
- Launch a Browser
- In the Address bar of the Browser, type HTTPS://ipaddress of server/nps/iManager.html
- Log in to iManager. You do not have to type the full context of the user. A user with Admin rights is preferred.
- Use the left slider bar, open the Cluster Administration link.
- View both Configuration and Management. (We are assuming you have a cluster in place at this time.)
iManager - Cluster Configuration
iManager's Cluster Configuration has the same features as NRM's Cluster Config and ConsoleOne's Console View. This means you can edit the cluster resource IP address, nodes, load/unload scripts and other settings. See Figure 1.13 and Figure 1.14 for screen captures of iManager.
iManager Cluster Administration -- Configuration view
iManager Cluster Administration -- Configuration view -- Cluster Resource Configuration
iManager - Cluster Management
The iManager Cluster Management link, shown in Figure 1.15, provides all the same features as ConsoleOne Cluster State View, and NetWare Remote Manager's Cluster Management link. You can online/offline or migrate a cluster resource, and check the status of the cluster.
Note: Currently, in order to use all of the features of iManager you must use Microsoft Internet Explorer. No other browser will work in our experience.
iManager Cluster Administration --Management view
That sums up the cluster administration utilities available. Of the three we have discussed, NetWare Remote Manager is our favorite: it's the easiest to use, requires no configuring, and provides a well-rounded view of all the cluster components.
[ ] Access iManager
Cluster Enabled Volumes
A cluster-enabled volume is one that has an eDirectory/NDS volume object associated with it, and hence any application running on the volume has an NDS volume object to reference as well. We strongly recommend cluster-enabling all GroupWise volumes. In the long run, the benefits far exceed any setup issues you may encounter. There are two very good reasons to cluster-enable GroupWise resources.
First, mapped drives to the clustered resource remain available even if the resource fails over to another node, because the mapping is to the "virtual" server. Now, we know what you may be thinking: "Our GroupWise users don't map a drive to where the GroupWise post office is, so why talk about mapped drives?" But when you administer GroupWise, you almost always need at least one mapped drive to a domain. Also, in order to view or edit the Properties of a gateway, you need a mapped drive to that gateway's domain. Finally, if you are testing GroupWise clustering or upgrading GroupWise, the last thing you want to do is keep remapping drives every time a volume moves to a new cluster node.
ConsoleOne is the second reason. If the volumes are cluster-enabled, ConsoleOne will be able to connect to the GroupWise databases no matter which cluster node holds the clustered resource (after the first domain is mapped, of course). We have seen occasions where ConsoleOne will not connect to a domain unless it has a mapped drive to that domain. This may be a shortcoming in ConsoleOne; save yourself the potential grief and just cluster-enable all GroupWise volumes.
The main lesson here is that you no longer manage GroupWise based upon \\Servername\volume. Rather, you manage GroupWise based upon its clustered resource name. If PMDOMVL is the clustered resource name (cluster-enabled volume) and PMDOMVS is the virtual server, then all your mappings are to \\PMDOMVS\PMDOMVL ONLY, not to the server that is hosting the volume at that moment. This means that if the volume moves to another host node, your mapping NEVER breaks.
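For example, an administrator's drive mapping to a domain would reference the virtual server, never a physical node. A hedged sketch only; the PMDOM directory name is an assumption for illustration:

```
# Map a drive through the virtual server; this mapping survives failover
MAP M: = \\PMDOMVS\PMDOMVL\PMDOM
```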
GroupWise Server Requirements
This brief section discusses the hardware requirements for GroupWise. Understand that these are recommendations and your "mileage" may vary. From a hardware perspective, clustering GroupWise is not much different than running GroupWise on stand-alone servers. The one exception is RAM. NetWare 6 and NetWare 6.5 take larger amounts of RAM than older versions of NetWare. In order to determine the basic RAM specifications for a clustered GroupWise server, you first have to have an idea of what GroupWise services will run on that box. You also need to take into account that the node may have to support multiple instances of the GroupWise services. This means multiple MTAs, POAs and Gateways could be running on the same cluster node.
Out of the box, with just the basic installed GroupWise services (one POA, one MTA, one WebAccess Agent), plus virus-scanning software and GWTSA running on the node, a server should have 2GB of RAM. This is enough RAM to support the OS and the services just mentioned. Add to that the clustering software, and if you are planning for failover with multiple MTAs, POAs and gateways all hosted on the same node, then we recommend adding at least 1GB more of RAM. This brings the total up to 3GB of RAM per server in a NetWare 6 environment. In the case of NetWare 6.5, you might want to consider another 512MB of RAM, for a total of 3.5GB. Remember, these are recommendations for a general audience; your system may require more or less based upon factors specific to your environment.
These are safety factors for clustering. Also, when we cluster GroupWise, we use protected memory in order to run multiple instances of GroupWise services, and protected memory consumes more RAM than running GroupWise services in the OS memory space. It is far better to have a bit too much RAM than not enough. Remember, you are building a highly available GroupWise system; why let a little RAM stand in the way?
[ ] Here are a few ways to check if you have enough RAM
- Go to the server console
- Type in "MONITOR" to load the monitor utility
- Select Disk Cache Utilization
- Look at the LRU Sitting Time.
If it is listed as a long period of time, say a couple of days, then you have plenty of RAM. If, however, it shows a shorter period of time, say 4 hours, then you need more RAM. Also keep in mind how long the server has been up and running: if it's been running for only 4 hours, then the LRU Sitting Time statistic is not valid yet.
The next parameter to check is Cache Buffers, found in MONITOR under System Resources. If the "Cache buffer memory" value is low, say 40%, and LRU Sitting Time is short, you are definitely in need of more RAM. However, if the Cache Buffers are high, say 80%, then no additional RAM is needed. The gray area is when LRU Sitting Time is low and Cache Buffers are high; you may need to tune your NetWare OS and eDirectory, though this is rarely the case.
Processors and NICs
Processors and NICs are another requirement for GroupWise to perform efficiently. As always, purchase the best processor you can afford; multiple processors are nice to have, and the GroupWise agents will take some advantage of them. As for network interface cards (NICs), buy the best and, of course, the fastest available. In a severe failover situation you may have three or more post offices running on the same box, which means three or more times the traffic flowing across the NIC. Gigabit NICs are the only way to go.
Note: In a 10/100MB environment, make sure the switch ports and the servers are hard-configured for 100MB full duplex. Auto-configured NICs and ports may cause a mismatch, which will bring your network to a crawl and can, on occasion, create corruption within the databases. In a gigabit environment, hard-configure the server NIC and the switch if possible; otherwise let the server NIC auto-configure, then verify that it is running full duplex. Also verify that the switch is running full duplex.
[ ] Verify your NICs and server switch ports
ConsoleOne and Snapins
ConsoleOne is used to manage GroupWise 6.x and above. Currently (at the time this guide was written), the GroupWise 6.5.2 Snapins are available and should be implemented if you have GroupWise 6.5. The original GroupWise 6.5 Snapins had intermittent issues with relaying administrative messages to the domains. This means that if you made a change to a GroupWise object, that domain you were connected to would receive the change, but the change would not always get submitted to other domains.
Determine if you are using the latest GroupWise Snapins
To determine the Snapins you are using do the following in ConsoleOne:
- Select Help|About Snapins
- Select the "GroupWise Administration" Snapin
- Confirm that the version is the latest version, for example "6.5.2"
IP Addresses and Ports
As we stated before, GroupWise will use the clustered resource IP address; remember, this is the secondary IP address stated in the cluster load script. When you install or move GroupWise to a cluster, you will use the IP address of the cluster resource (volume) where GroupWise will reside. This guarantees that when the GroupWise volume fails over to another node, the GroupWise service will still run with the same IP address. But there is a "gotcha", and that is the IP ports. By default in GroupWise 6.0x and above, GroupWise components listen on all bound IP addresses, not just the one specified. This means that if a POA fails over to a server already running a POA, even though the two POAs have different IP addresses, they will both listen on the same common default ports, such as 1677. You can imagine the nightmare that would cause: GroupWise would become confused, messages would not get sent, and the system would grind to a halt. So how do you overcome this issue?
Well, the easiest way is planning. Make each port different for each GroupWise component. We recommend standardizing this and, for the sake of simplicity, using a numeric sequence. Let us show you. This example assumes the MTA, POA and WebAccess Agent exist on the same cluster resource.
Cluster Resource IP Address: 192.168.20.11
Cluster Resource Name: PMDOMVS
Volume Name: PMDOMVL
GroupWise MTA MTP Port: 7101
GroupWise MTA HTTP Port: 3801
GroupWise POA MTP IN IP Address: 192.168.20.11
GroupWise POA MTP IN Port: 7301
GroupWise POA Client/Server Port: 1681
GroupWise POA HTTP Port: 2801
GroupWise POA MTP OUT IP Address: 192.168.20.11 (same as MTA IP Address)
GroupWise POA MTP OUT Port: 7101 (same as MTA MTP)
GroupWise POA IMAP Port: 141
GroupWise POA CAP Port: 1021
GroupWise POA Start QuickFinder Interval: 1 (hours after midnight)
GroupWise WebAccess IP Address: 192.168.20.11 (same as MTA IP Address)
GroupWise WebAccess TCP Port: 7201
GroupWise WebAccess HTTP Port: 4801
As you may have noticed, any port on the 192.168.20.11 clustered resource ends with a 1. Also, you will note that we created a standard for all ports. This standard will make it easy for anyone who follows after you when administering or troubleshooting GroupWise. The one exception to all of this is 1677: if you would like to configure the "NGWNAMESERVER" functionality of GroupWise, then GroupWise insists on using port 1677 for the POA registered in your DNS as "NGWNAMESERVER".
Example of Port Standards
710x is for MTA MTP
720x is for WebAccess TCP Ports
730x is for POA MTP IN Ports
280x is for POA HTTP
380x is for MTA HTTP
480x is for Gateway HTTP
168x is for POA Client/Server
102x is for POA CAP
14x is for POA IMAP
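To see how such a standard translates into an agent configuration, here is a hedged sketch of POA startup-file switches carrying the example values above. The file and directory names are assumptions for illustration; verify each switch against your GroupWise agent documentation before use.

```
; PMPO.POA startup file excerpt - illustrative sketch only
/home-PMDOMVL:\pmpo
/ip-192.168.20.11
/port-1681
/httpport-2801
/mtpinipaddr-192.168.20.11
/mtpinport-7301
/mtpoutipaddr-192.168.20.11
/mtpoutport-7101
```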
Before you ever start implementing GroupWise or moving it into a cluster, create a GroupWise Design Matrix. The GroupWise Design Matrix will house all the information about the GroupWise system. The Design Matrix will specify:
- IP addresses and ports
- The cluster resource name
- The Primary cluster node (the first node the GroupWise cluster resource is regularly assigned to)
- The Primary cluster node IP address
- The "virtual" UNC path
- The volume sizes
- The GroupWise service names, such as DOM1, PO1, etc.
This notion of a design matrix is a best practice whether you are clustering GroupWise or not. Appendix A has a copy of a GroupWise Design Matrix; you may want to use it, or you may want to construct a design matrix in a spreadsheet program.
In GroupWise 6.5.2, Novell has fixed the "/ip" switch: the "/ip-<address>" form tells the POA to listen only on the TCP/IP address you specify after the switch. A POA should now listen only on the ports associated with its own IP address. This means a GroupWise Design Matrix with a standardized port-number scheme is not strictly needed: all POAs could use the same 1677 port on the same node as another POA, because each now listens for that port only on its own IP address. This is good news!
Unfortunately, we do not recommend relying upon the "/ip" switch in a clustering environment. The main reason is that we want to be absolutely, 100% guaranteed that a POA is listening only on its appropriate ports, and the only way to do that is to use the GroupWise Design Matrix and standardize the ports numerically. Think about this for a moment: you have GroupWise running in a cluster, you have banked your career on 99.999% uptime for GroupWise, and now you have to upgrade or apply a service pack. Unbeknownst to you, the upgrade or service pack could break the "/ip" switch, and now you are fighting fires trying to figure out why GroupWise is misbehaving. Please understand, we are not picking on Novell's developers here; they do an excellent job, but mistakes happen. A bit of planning and work at the beginning will save you hours of potential headaches in the future. Better safe than sorry; at least, that is our conservative approach.
[ ] Create a GroupWise Design Matrix and standardize names, ports and ip addresses
Protected Memory
Protected memory is an administrator-defined space in which NLMs load. NLMs in this space cannot cause NLMs outside the protected memory space to stop performing, in essence ABENDing. Here's a simple analogy. Pretend you are an NLM. Now, pretend your office has four walls, a floor and a ceiling: it's a protected memory space. Assuming your door is closed and you do not share your office, try to touch a person outside your office. You cannot. This is how protected memory works: it isolates an NLM from other NLMs. This means that you can have two Post Offices running on one cluster node and they will not contend with or corrupt each other. Protected memory is a feature of the NetWare OS; unless manually forced into a protected memory space, all NLMs load in the "OS" memory space. Protected memory does use more RAM and does slow performance slightly. The performance hit is negligible, but you can expect, on average, 20% more memory used than if you did not use protected memory.
Also, when you load multiple GroupWise components, say an MTA and a POA, in separate memory spaces, each loads its own copy of GWENN4.NLM in its respective memory space; so you have two copies loaded instead of the one you would have without protected memory. For GroupWise and clustering, protected memory is a gift from Novell. By loading GroupWise NLMs into their own protected, uniquely named memory spaces, the NLMs become easier to manage. If, for example, an NLM will not unload after an ABEND in a protected memory space, the memory space can simply be removed, which brings the NLM down, no questions asked!
Then there is the benefit of auto-restart of the memory space. If an NLM within the memory space ABENDs, the OS can restart the entire memory space. This pays big dividends in clustering and in striving for 99.999% uptime with GroupWise.
Example: you are running a POA and a WebAccess Agent on the same clustered resource, but each component runs in its own memory space. GWINTER.NLM ABENDs; rather than taking the whole cluster resource (and the cluster node) down with it, the failure is isolated to its memory space. Here's the kicker: the OS notices the memory space has an ABENDed NLM and restarts the space, and now the WebAccess Agent is back up and running again. This process usually takes place within 10 to 15 seconds, so unless users are pushing a button or link at that moment, they will never notice.
As you can see, protected memory is a very valuable tool for GroupWise and for clustering GroupWise. Remember, when you design the failover paths for the GroupWise cluster resources, at some point multiple GroupWise components will load on the same host node. Protected memory will allow you to better manage GroupWise and the host node.
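At the server console, protected memory is driven by a handful of commands. A hedged sketch follows; the space and startup-file names here are assumptions for illustration only.

```
# Load a POA into its own protected address space
LOAD ADDRESS SPACE = GWPO1 GWPOA @PO1.POA
# Let the OS restart the space automatically if an NLM in it abends
PROTECTION RESTART GWPO1
# Force the space (and every NLM in it) down, e.g. after a hung unload
UNLOAD KILL ADDRESS SPACE = GWPO1
```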
To see protected memory address spaces set up on your NetWare server (if any are set up) do the following:
- From the Console screen load Monitor
- From the "Available Options" dialog, select "Virtual Memory"
- Then select "Address spaces"
In the "Known Address Space" dialog you can now see the address spaces set up on your NetWare server. There will always be an "OS" address space; this is the base operating system. If you are using protected memory, you will also see the names of the protected memory spaces listed here. In NetWare 6.5, memory spaces are also listed in the "Current Screens" function. To access this, at the server console hold down the Control key and press the Escape key. You will see the Current Screens list, which lists the products running on that server; to the right, the memory space each runs in is shown.
There are occasionally patches for the NetWare OS outside of the standard NetWare service pack. We advise that you confirm that you have the latest NetWare patches. NetWare 6.5 Service Pack 2 was released on June 30, 2004; as of this writing there are no additional patches available. Novell Cluster Services also has no patches available. In the case of GroupWise 6.5, as of this writing Service Pack 2 is available and no additional patches are available. Please take a moment and check Novell's Support site for any patches that may be available before you start implementing GroupWise 6.5.2 and NetWare 6.5.2 in a clustered environment.
Note: We chose to implement NetWare 6.5 Service Pack 1b for this writing. The main reason is a few noted issues with NetWare 6.5 Service Pack 2 when it was released. Additional patches were released to compensate for the issues in Service Pack 2, most notably an eDirectory patch. Do your research on service packs and potential pitfalls before you implement any solution. It saves time and lots of suffering.
[ ] Apply patches to NetWare
NSS Configuration Parameters
NSS does not require any adjusting in order to perform efficiently for GroupWise with NetWare 6.5. However, in the case of NetWare 6, NSS adjustments were required. We will list these requirements here. There are many switches that can be added to NSS to properly configure it. Below are the ones required for GroupWise on NetWare 6.0.
For NetWare 6.0 servers with Memory greater than 1GB:
For NetWare 6.0 servers with Memory less than 1GB:
The trick is how to get these parameters to take hold. In the past, when you wanted to tune NetWare and its file system, you placed parameters in the STARTUP.NCF or the AUTOEXEC.NCF. This is not the case with NSS. Using a text editor, you will need to create a file called NSSSTART.CFG. Remember, this applies to NetWare 6.0 ONLY.
[ ] Configure the NSSSTART.CFG File for NetWare 6.0 (NetWare 6.5 not needed)
In the NSSSTART.CFG file you will place all the parameters on one line, and NSS will execute them when it loads. NSS reads this file from the C:\NWSERVER directory, so the NSSSTART.CFG will need to be placed there. Here is how the NSSSTART.CFG file should look:
Notice there are no spaces and the parameters just run together. Remember, in NetWare 6.5 there is no need for any NSS parameter changes because Novell has already optimized NSS. This section is just for the benefit of NetWare 6.0.
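For illustration only -- the switches and values below are examples of the kind of NSS tuning parameters Novell recommended for GroupWise on NetWare 6.0, not a definitive list, so verify the correct values for your server's memory against Novell's current NSS documentation -- a C:\NWSERVER\NSSSTART.CFG might look like this, all on one line with no spaces:

```
/ClosedFileCacheSize=50000/NameCacheSize=20000/OpenFileHashShift=15
```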
There are a few SET commands that need to be set on the cluster node servers that will host GroupWise cluster resources. These settings should be included in the STARTUP.NCF file of the server. Following are the set commands:
SET MEMORY PROTECTION NO RESTART INTERVAL = 0
SET HUNG UNLOAD WAIT DELAY = 60
The 'Set Memory Protection No Restart Interval' setting will allow the NetWare OS to restart a memory space immediately if an NLM causes the space to be unresponsive. This parameter is especially useful for the WebAccess Agent.
The 'Set Hung Unload Wait Delay' setting instructs the NetWare OS on how long it should wait for resources to unload before it kills the cluster resource. If you set this longer, then the cluster resource will take longer to unload and fail over. If you set it shorter, then GroupWise may not have enough time to unload. We recommend a 60-second wait delay.
[ ] Configure the STARTUP.NCF file
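To confirm the values took effect after the server restarts, you can type each SET command at the server console without a value; NetWare displays the current setting for any SET parameter entered this way:

```
# Typed without "= value", each command reports its current setting
SET MEMORY PROTECTION NO RESTART INTERVAL
SET HUNG UNLOAD WAIT DELAY
```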
Our GroupWise System Design
As we continue throughout this book we will be referring to an example GroupWise system we are clustering. Here is how this GroupWise system is designed. Figure 1.16 represents this design.
- Primary domain -- no post offices or gateways
- Secondary domain -- supports two post offices.
- One post office is just dedicated to supporting GroupWise Document Management Services (DMS). The POA for this post office runs QuickFinder Indexing continuously.
- The second post office is for email only. No libraries are in this post office.
- Secondary domain -- supports just the GWIA gateway
- Secondary domain -- supports just the WebAccess Gateway
It should also be noted that we have one WebAccess gateway per domain. This provides us with a substantial amount of flexibility in GroupWise design and management. As an example, if we decide to add another WebAccess gateway, we can just create a new cluster resource, create a new domain and WebAccess gateway, and we are finished. If we need to service one of the WebAccess Agents, we can do so without users being affected because we have a second one running.
Our GroupWise System Design
That is our GroupWise system. In order to maximize our learning, we will perform both a new GroupWise system creation process and a move of existing GroupWise components to the cluster, just like in a real environment. The secondary domain that supports the post offices will be created along with the DMS post office. We will then create a new Primary domain and install that domain onto the cluster. Next, we will move the EMAIL post office to the cluster, and create new GWIA and WebAccess gateways. This allows us to demonstrate both principles: migrating to a cluster and installing new to a cluster. We will use the Novell Server Consolidation Utility 2.5 when we move the GroupWise components to the cluster.
[ ] Decide which GroupWise components will be placed in the cluster
There are several utilities that are very helpful in clustering GroupWise. The next few paragraphs acquaint you with these utilities so you understand the role they will play in configuring GroupWise to run on your cluster.
TCPCON
TCPCON will allow you to see what ports the NetWare server is listening on. To view the listening ports in TCPCON, do the following:
- At a NetWare server that will be acting as a node in your cluster, type the following command at the console prompt: tcpcon
- From the first screen select Protocol Information
- Select TCP, then TCP Connections, and down-arrow through the listing of port numbers.
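When scrolling through the TCP Connections list, the GroupWise default ports are the ones to watch for. Assuming default agent configurations (your system may use different ports -- check your agent settings), the common defaults are:

```
1677   POA   client/server port
7100   MTA   message transfer port (MTP)
7101   POA   MTP port
7180   MTA   HTTP monitoring port
7181   POA   HTTP monitoring port
7205   WebAccess Agent port
25     GWIA  SMTP port
```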
Novell's Server Consolidation Utility
The second utility is Novell's Server Consolidation Utility (SCU). This utility is used to move data from one location to another. In our case, we use it to move a GroupWise post office to its new home on a cluster. This utility can be downloaded for free from Novell. Here's how you obtain this utility:
- Go to http://www.novell.com/download
- For the 'choose a product' selection, choose 'Novell NetWare Server Consolidation Utility' (version 2.6 was the latest when this guide was written)
The Server Consolidation Utility performs server-to-server copies and is non-destructive to data. In other words, it only copies; it does not delete or move the data. We have seen copy speeds averaging up to 4GB an hour, but when going from compressed to non-compressed volumes it can be slower. Also, you are limited to the speed of the hardware. As we have already stated, we recommend having 10/100 NICs statically set to 100 and full duplex, and the switch ports they are plugged into statically set to the same. This avoids a duplex mismatch, which can corrupt data and, more importantly, will bring all copy jobs to a crawl. Also, having the source and destination on the same switch will increase copy speeds.
We recommend performing 'pre-copies' with the Server Consolidation Utility. A pre-copy is running the scheduled copy of data while the GroupWise system is up. This can be done in the middle of the day. This copy will take the longest and will report copy errors due to open files, and that is OK. On the day of the GroupWise system move, take down the GroupWise agents running on the server, then run the same scheduled copy job again, making sure you select the 'Copy Files if newer' option. This will start the copy process but will only copy files from the source to the destination if they are newer. So any user.db, wphost.db, or wpdomain.db files, etc., will have a new modified date since the pre-copy was performed, and therefore they will all be overwritten on the destination. At first, this copy will seem to take as long as the first, but it actually will end up taking about 25-33% of the original copy time. We have seen a pre-copy job take 10 hours to copy a 20GB post office, while the second 'copy if newer' pass only takes 2.5 to 3 hours.
NetWare Logger Screen
The LOGGER screen on a NetWare 6.5 server is a valuable information tool. The logger screen will actually list out the events that take place on the server. You can use the up or down arrow or the page up or page down keys to view the log. This is very handy when testing and troubleshooting GroupWise on a server and in a cluster.
The Cluster Monitor (CMON) screen, shown in Figure 1.17, is very helpful in keeping track of the cluster environment. The CMON screen shows details on the cluster itself, including which nodes are up and running, the Epoch number, and which server is the holder of the Master IP Address. Hint: The server with the yellow 'UP' is the holder. The Epoch is the counter for events in a cluster. A server coming online into a cluster is an event. A server ABENDing out of the cluster is an event. The Master IP Address is 'the IP address' of the cluster.
Cluster Monitor Screen
Cluster Resource Manager
The Cluster Resource Manager (CRM) screen on a NetWare 6.5 server, shown in Figure 1.18, is useful as well when testing and troubleshooting cluster resource loading and unloading. This screen will provide information on cluster resources, when they load and unload and any errors directly associated with their loading/unloading.
Cluster Resource Manager Screen
NSS Management Utility
NSSMU is the NSS Management Utility that runs on the server. This utility allows you to view all devices, partitions, pools and volumes that are available to a cluster node. Within this utility you can also add, delete, activate and deactivate pools, and so on. To run NSSMU, at the server console type: nssmu
NSSMU view of Pools
NSSMU view of Volumes
Creating Cluster Resources and Cluster Volumes
In this section we will walk you through the process of creating a cluster resource (NSS Pool, or virtual server). Then we will walk you through creating a cluster-enabled volume on that cluster resource. We will be using NetWare Remote Manager to perform both operations. Our intention is not to make you an expert in the cluster resource creation process, but rather to give you a good foundation that will expand your knowledge of clustering.
Create a Cluster Resource (NSS Pool)
To get started you will need a web browser and a plan. Since we will be doing the planning in the next chapter -- aptly titled 'Planning' -- we will give you a cluster resource we have planned out in order to demonstrate how to create a cluster resource. Our cluster resource (NSS pool) name is PMDOMVS. Our volume name is PMDOMVL. Because our system is a demo system for the purpose of teaching the reader, our NSS pool will only be 100MB in size. Finally, we do assume you already have a cluster built, and that all LUNs (Logical Unit Numbers -- or as we'll call them, 'lots of disk space') are seen by all cluster nodes (servers). Let's do it!
- Launch a Browser and enter the IP address of a cluster node along with the port for NetWare Remote Manager (NRM) and hit the Enter key, like this: HTTPS://192.168.20.1:8009
- Log in to NRM
- On the left side under Manage Server select the Partition Disks Link
Figure 1.21 -- NRM Manage Server Category
- Using the right side scroll bar, scroll down till you find the 'disks' that contain all the free space. In our case, we are using iSCSI, so the iSCSI.HAM is a dead giveaway for us.
Partition Disks link
- Select the Create link next to the 'Free Disk Space'. This will launch the 'File System Creation Operations' window.
File System Creation Operations Window
- Select the 'Create a New Pool' link
Note: We have found that if you want more control over naming your cluster resources and volumes, you are better off first creating the new pool, then going back to the new pool and creating the volume. Also, for every clustered NSS Pool you create, eDirectory will attach '_Server' to the end of the name. In the case of a volume, it tags a '_Vol' onto the end of the name. We prefer to delete that, but eDirectory will keep it regardless. In the case of the volume, however, it will not keep the '_Vol' if you delete it during the creation process.
- Next, the NSS Pool Create window will appear. Fill in the pool size and the pool name. In our case the name is PMDOMVS. Initially, eDirectory will tag the '_Server' onto it; just delete it.
File System Creation Operations Window
- Then check the 'Cluster Enable Pool' box. This is very important!! If you do not check this box you will not create a clustered resource.
- Select the Create link
- An informational prompt will appear asking if you want to create the pool, select OK
Do you want to create the cluster resource?
- Now the 'Cluster Pool Enabling Information' window will appear. Fix the Virtual Server Name as you wish it to read. Again, ours is PMDOMVS.
- Next, if you are going to use CIFS, then set the CIFS name as you wish it to appear.
- Then choose the protocol you are going to use -- NCP is the only one we will use. However, if you have Windows and Apple clients in your network you may choose the CIFS and AFP protocols as well.
- Next, check the 'Auto Pool Activate' box
- Finally, fill in the IP Address for this Virtual Server (cluster resource).
Virtual Server Configuration
- Select the Create button. This will create the cluster resource and then return you to the Partition Disks window.
- At the Partition Disks window use the right side scroll bar to scroll down to view the newly created NSS Pool.
Back to Partition Disks with the Cluster Resource created
- On the left side, scroll down to the Cluster Menu category and select the Cluster Management link. This will bring up the 'cluster state' view from NRM's perspective. You will now notice the cluster resource you just created is running. You will also notice that it still has the '_Server' tagged onto it. Try as we might, eDirectory and Novell Cluster Services will just not let us forget that the cluster resource is a server.
New cluster resource running
- Again on the left side select the Cluster Config link. Note the cluster resource appears here as well. As we have stated already above, the Cluster Config link is for configuring the cluster resource.
Note: You will notice that there is a link for 'New Cluster Volume' and that the icon appears to be the same as for the cluster resource we have loaded and running. Do not let it confuse you. It's a small misrepresentation of the truth. The cluster resource is an NSS Pool, a Virtual Server, and a cluster resource. It is NOT a volume. We have to manually create the volume. If you select this link it will take you to the 'NSS Pool and Volume Create' window.
Cluster Config with new cluster resource
- Now let's switch to the LOGGER screen of the server where the cluster resource is running. In our case it's the NWGW01 server. You will note the PMDOMVS pool is activated and the secondary IP address is now resident on the NWGW01 server. This means the cluster resource is running and most importantly working as designed.
Cluster Node LOGGER screen showing the cluster resource loading
- As a final step you may want to test offline/online/migration of the cluster resource. This assures you that the cluster resource is configured and working correctly before you create a volume or place data on it.
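If you prefer the server console for this test, Novell Cluster Services provides CLUSTER commands that do the same thing as the NRM buttons. A sketch, using our resource name (with the '_Server' suffix eDirectory insists on) and node names NWGW01 and NWGW02 (substitute your own):

```
# Take the resource offline, bring it online on a specific node,
# then migrate it to another node and check the cluster state
cluster offline PMDOMVS_SERVER
cluster online PMDOMVS_SERVER NWGW01
cluster migrate PMDOMVS_SERVER NWGW02
cluster status
```

Watching the LOGGER screen on each node while these commands run is a quick way to verify the load and unload scripts behave as expected.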
That is all there is to creating a cluster resource. Next up, we need to create a volume on that cluster resource. We highly recommend you create your cluster resources first, then proceed to creating the volumes. Also, it is best practice to have a 1-to-1 relationship between cluster resources and volumes. Now it's time to create a volume on a cluster resource.
Create a Cluster Volume
Creating a cluster volume is as easy as creating a cluster resource was in the last section. We will need a browser and NRM again.
- Launch a Browser and enter the IP address of a cluster node along with the port for NetWare Remote Manager (NRM) and hit the Enter key, like this:
- Log in to NRM
- On the left side under Manage Server select the Partition Disks Link
NRM Manager Server Category -- Partition Disks link
- Using the right side scroll bar, scroll down till you find the cluster resource we created in the last section. For us the cluster resource is 'PMDOMVS'.
NRM Manager Server Category --Partition Disks link
- Select the 'Add a Volume' link next to the cluster resource. This will bring up the 'NSS Volume Create' window.
NSS Volume Create window
- Type in the volume name and select the volume attributes you desire.
- Select Create.
- An informational prompt will appear asking if you want to create the volume; select OK
'Are you sure' informational prompt
- Next, in NRM under the Manage Server category select the Volumes link. This will now display the volumes mounted on this server as shown in Figure 1.35.
Volumes mounted on the server
- Then select the Partition Disks link and scroll down to the cluster resource. You will now see the cluster volume we have created.
Note: Notice the 'Create eDir Object' link. Do not use it. If you switch to iManager or ConsoleOne you will see that the volume object already appears in eDirectory. We have found that if you do select the 'Create eDir Object' link here, the volume will be tied specifically to the server it is currently mounted on, and the eDirectory object for the volume will be listed as 'servername_volumename'. This is NOT desired if you want the volume to be a clustered volume.
New Cluster Volume mounted
- Next, switch to the LOGGER screen of the cluster node that is running the cluster resource and volume we just created. You will notice the new volume and its parent NSS pool are now both in an 'active' state. They are ready for data.
Both the NSS Pool and Volume are set to Active
- Finally, launch Windows Explorer and browse through Network Neighborhood, through the eDirectory tree, and into the context of the cluster node, resource and volume. Notice the cluster resource shows up as a Windows 'server' and under it are all the volumes it knows about that are loaded on the same cluster node as itself. This includes, of course, its own volume, PMDOMVL in our example.
Note: We are not quite sure how better to describe how Windows Explorer sees virtual servers. It's quite confusing for the end user especially if they browse to multiple virtual servers and see the exact same volumes appear under them. In this case if two cluster resources (virtual servers) are running on the same cluster node, then both virtual servers will show all volumes mounted on that cluster node. An example of this can be seen in Figure 1.38 where the volumes of SYS, NG65OS, and _ADMIN show up under the PMDOMVS Cluster Resource (virtual server). Confused? Don't be, it's just an illusion.
Windows Explorer view of cluster resources and volumes
[ ] Create Cluster Resources and volumes
That sums up the basic clustering knowledge required to proceed with clustering GroupWise. Please realize that there is a lot more to know to build a cluster and Novell does a great job in their Advanced Technical Training Clustering class. We highly recommend it.
This chapter has laid out the basic server requirements and tuning parameters for NCS (Novell Cluster Services) and for GroupWise. This chapter presented a review of clustering components and management utilities. Finally, this chapter provided only a few specific task-based instructions for setting up a cluster. We will re-list the 'action items' here for this chapter.
Chapter 1 - Consolidated Task List
[ ] 1. Verify ConsoleOne Snapins version
[ ] 2. Access NetWare Remote Manager
[ ] 3. Access iManager
[ ] 4. Check your servers to see if you have enough RAM
[ ] 5. Verify your NICs and server switch ports
[ ] 6. Determine if you are using the latest GroupWise Snapins.
[ ] 7. Create a GroupWise Design Matrix and standardize names, ports and ip addresses.
[ ] 8. Apply patches to NetWare
[ ] 9. Configure the NSSSTART.CFG File for NetWare 6.0 (NetWare 6.5 not needed)
[ ] 10. Configure the STARTUP.NCF file
[ ] 11. Take an Inventory of what GroupWise components will be placed in the cluster.
[ ] 12. Create Cluster Resources and volumes
Download Chapter 1 here (pdf)
Success with Clustering GroupWise: A Guide to Building a Highly Available GroupWise System
by Tay Kratzer and Gregg A. Hinchman
Purchase here: www.taykratzer.com.
Disclaimer: As with everything else at Cool Solutions, this content is definitely not supported by Novell (so don't even think of calling Support if you try something and it blows up).
It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.