The GroupWise High Availability Service (GWHA) works very well, but it has one "not so nice" feature in the following situation.
First, the environment. Suppose we have a four-node Open Enterprise Server cluster on the Linux kernel. There are six post offices, one domain, and one Internet Agent, with two cluster resources on GroupWise 7 SP2. GroupWise Monitor runs on a single server with the Availability Service enabled.
And now the "not so nice" feature:
When you migrate the resource from one node to another, the GWHA Service may restart an already stopped agent on the same cluster node, depending on the GroupWise Monitor polling time and the time the agents need to stop. This happens because GroupWise Monitor detects a failed (in this case, stopped) agent and calls the GWHA Service to start it.
The Cluster does not recognize this, so it dismounts the volumes, unbinds the IP address, and calls the Cluster load script on the next node. The "new" node then starts all agents.
The problem now is that some of the agents are running on two nodes at the same time. This is a very risky condition. In our environment, GroupWise performance was very poor in this situation, and we needed several hours to diagnose the problem.
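To shorten that kind of diagnosis, a small check can list the GroupWise agent processes still running on a node. The following is only a sketch: it assumes the agent processes are named gwpoa, gwmta, and gwia, and the function names filter_gw_agents and list_gw_agents are made up for illustration.

```shell
#!/bin/bash
# Sketch: show GroupWise agent processes on the local node.
# Assumption: the agents run under the process names gwpoa, gwmta, gwia.

# Keep only "PID COMMAND" lines whose command is a GroupWise agent.
filter_gw_agents() {
    awk '$2 ~ /^(gwpoa|gwmta|gwia)$/ { print $1, $2 }'
}

# List agent processes on this node as "PID COMMAND" pairs.
list_gw_agents() {
    ps -eo pid=,comm= | filter_gw_agents
}
```

Running list_gw_agents on the old node after a migration should print nothing for the agents of the migrated resource. Any output there suggests that an agent was started again on that node.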
Here is the really simple solution:
1. At the beginning of the cluster unload script, add this line:
ignore_error /etc/init.d/xinetd stop
Now the GWHA Service, which depends on the xinetd daemon, cannot start the agents during the unload process.
2. At the end of the cluster unload script, put this line:
ignore_error /etc/init.d/xinetd start
This starts the xinetd daemon again, so the GWHA Service on this node keeps working for future migrations.
With both lines in place, the relevant part of the unload script looks like this:
# stop services
ignore_error /etc/init.d/xinetd stop
# NCP server and IP address
ignore_error ncpcon unbind --ncpservername=CNW-VIE-01_GW_SERVER --ipaddress=$IP
ignore_error del_secondary_ipaddress $IP
exit_on_error nss /pooldeact=GW
# start xinetd
ignore_error /etc/init.d/xinetd start
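One caveat with the script above: if the exit_on_error line fails, the script presumably exits before it reaches the final xinetd start, which would leave the GWHA Service unavailable on that node. A possible way around this is a bash trap on EXIT. The following is a minimal sketch, assuming the unload script runs under bash. The function names and the XINETD_INIT variable are hypothetical and exist only for illustration.

```shell
#!/bin/bash
# Sketch: start xinetd again even if the unload script exits early.
# XINETD_INIT is a hypothetical override; it defaults to the init script
# used in the article.
XINETD_INIT="${XINETD_INIT:-/etc/init.d/xinetd}"

# Start xinetd again so the GWHA Service works for future migrations.
restart_xinetd() {
    "$XINETD_INIT" start
}

# Stop xinetd and register the restart for whenever the shell exits,
# whether the script finishes normally or exits on an error.
stop_xinetd_for_unload() {
    "$XINETD_INIT" stop
    trap restart_xinetd EXIT
}
```

In this variant, stop_xinetd_for_unload would be called at the beginning of the unload script in place of the xinetd stop line, and the explicit xinetd start line at the end would be dropped, because the trap takes over that job.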
Disclaimer: As with everything else at Cool Solutions, this content is definitely not supported by Novell (so don't even think of calling Support if you try something and it blows up).
It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.