Troubleshooting Groupwise High Availability in a Linux Cluster Environment
Novell Cool Solutions: Tip
By Martin Prikril
Posted: 29 May 2007
The Groupwise High Availability (GWHA) Service works well, but it has a "not so nice" feature in the following situation.
First, let's set up the environment. Let's suppose we have a four-node Open Enterprise Server cluster on the Linux kernel. There are six post offices, one Domain, and one Internet Agent, with two Cluster Resources on Groupwise 7 SP2. Groupwise Monitor runs on a single server with the Availability Service enabled.
And now the "not so nice" feature:
When you migrate the Resource from one node to another, depending on the Groupwise Monitor polling time and the time the agents need to stop, the GWHA Service may restart an already stopped Agent on the same cluster node. This happens because the Groupwise Monitor detects a failed (in this case, a stopped) Agent and calls the GWHA Service to start this Agent.
The Cluster does not recognize this, so it dismounts the volumes, unbinds the IP address, and calls the Cluster load script on the next node. The "new" node then starts all agents.
The problem now is that some of the agents are running on two nodes. This is a very risky condition. In our environment, Groupwise performance was very poor in this situation, and we needed several hours to diagnose the problem.
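One way to spot this condition faster is to compare the agent command lines seen on two nodes. The following is only a sketch under stated assumptions: the function name agents_on_both and the sample command lines are made up for illustration, and how the lists are collected on each node (for example from ps output) is left open.

```shell
#!/bin/sh
# agents_on_both LIST_A LIST_B
# Prints every line that occurs in both newline-separated lists.
# Each line is meant to be one agent command line as seen on a node.
agents_on_both() {
    printf '%s\n' "$1" | sort -u | while IFS= read -r agent; do
        # Skip blank lines so an empty list never matches anything.
        [ -n "$agent" ] || continue
        # -F: fixed string, -x: whole line must match, -q: no output.
        if printf '%s\n' "$2" | grep -Fxq -e "$agent"; then
            printf '%s\n' "$agent"
        fi
    done
}

# Invented sample data; real command lines and paths will differ.
node1="gwpoa @/media/nss/GW/po1/po1.poa
gwmta @/media/nss/GW/dom/dom.mta"
node2="gwpoa @/media/nss/GW/po1/po1.poa"

agents_on_both "$node1" "$node2"
# prints: gwpoa @/media/nss/GW/po1/po1.poa
```

Any line printed here is an agent that appears on both nodes and deserves a closer look.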
Here is the really simple solution:
1. At the beginning of the cluster unload script, right after the line that loads ncsfuncs, add this line:
ignore_error /etc/init.d/xinetd stop
Now the GWHA Service, which uses the xinetd daemon, cannot start the agents during the unload process.
2. At the end of the cluster unload script, put this line:
ignore_error /etc/init.d/xinetd start
This will start the Xinetd Daemon again, and the GWHA Service on this node will work for future migrations.
. /opt/novell/ncs/lib/ncsfuncs
# stop services
ignore_error /etc/init.d/xinetd stop
ignore_error /media/nss/GW/._CLUSTER/bin/stop-gw
# NCP server and IP address
ignore_error ncpcon unbind --ncpservername=CNW-VIE-01_GW_SERVER --ipaddress=$IP
ignore_error del_secondary_ipaddress $IP
# disk
exit_on_error nss /pooldeact=GW
# start xinetd
ignore_error /etc/init.d/xinetd start
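One detail of the script may be worth guarding against. Assuming exit_on_error ends the script when its command fails (which its name suggests, but this tip does not state), a failed pool deactivation would leave xinetd stopped on that node. A possible safeguard is sketched below. The helper name with_xinetd_stopped and the XINETD_INIT variable are made up for illustration and are not part of ncsfuncs.

```shell
#!/bin/sh
# Path of the xinetd init script used in this tip; overridable for testing.
XINETD_INIT=${XINETD_INIT:-/etc/init.d/xinetd}

# with_xinetd_stopped COMMAND [ARGS...]
# Stops xinetd, runs COMMAND in a subshell, starts xinetd again,
# and returns COMMAND's exit status.
with_xinetd_stopped() {
    wxs_status=0
    "$XINETD_INIT" stop || true
    # The subshell confines any "exit" inside COMMAND, so the
    # start below still runs when one of the steps aborts.
    ( "$@" ) || wxs_status=$?
    "$XINETD_INIT" start || true
    return "$wxs_status"
}
```

In an unload script, the steps between the two xinetd lines would move into a shell function (say unload_steps, again a hypothetical name), and the script would end by calling with_xinetd_stopped unload_steps and exiting with its return status, so the cluster still sees a failed unload.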