Troubleshooting GroupWise High Availability in a Linux Cluster Environment

Novell Cool Solutions: Tip
By Martin Prikril

Posted: 29 May 2007
 

Problem

The GroupWise High Availability (GWHA) Service works really well, but it has a "not so nice" feature in the following situation.

First, let's set up the environment. Suppose we have a four-node Open Enterprise Server cluster on the Linux kernel. There are six post offices, one domain, and one Internet Agent, with two cluster resources on GroupWise 7 SP2. GroupWise Monitor runs on a single server with the Availability Service enabled.

And now the "not so nice" feature:

When you migrate a resource from one node to another, the GWHA Service may restart an agent that has already been stopped, on the same cluster node. Whether this happens depends on the GroupWise Monitor polling interval and on how long the agents need to stop. It happens because GroupWise Monitor detects a failed (in this case, stopped) agent and calls the GWHA Service to start it.

The cluster does not notice this restart, so it dismounts the volumes, unbinds the IP address, and calls the cluster load script on the next node. The "new" node then starts all agents.

The problem now is that some of the agents are running on two nodes at the same time. This is a very risky condition. In our environment, GroupWise performance was very poor in this situation, and we needed several hours to diagnose the problem.
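To spot this condition more quickly, a small check run on the node that should no longer host the resource can help. The following is only a minimal sketch: it assumes the agent processes are named gwpoa, gwmta, and gwia, and the function names filter_gw_agents and list_local_gw_agents are made up for this illustration.

```shell
#!/bin/bash
# Sketch: report GroupWise agent processes that are still alive on this node.
# Assumption: the agent binaries are named gwpoa, gwmta and gwia.

# Reduce a list of process names (one per line, read from stdin)
# to the lines that are exactly one of the assumed agent names.
filter_gw_agents() {
    grep -E -x 'gwpoa|gwmta|gwia' || true
}

# List the agent processes on the local node; prints nothing if none run.
list_local_gw_agents() {
    ps -e -o comm= | filter_gw_agents
}
```

Calling list_local_gw_agents on the old node after a migration should print nothing. Any output suggests that an agent was restarted there while the resource was being unloaded.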

Solution

Here is the really simple solution:

1. At the beginning of the cluster unload script, add this line:

ignore_error /etc/init.d/xinetd stop

Now the GWHA Service, which runs through the xinetd daemon, cannot start the agents during the unload process.

2. At the end of the cluster unload script, put this line:

ignore_error /etc/init.d/xinetd start

This starts the xinetd daemon again, so the GWHA Service on this node keeps working for future migrations.

Example

. /opt/novell/ncs/lib/ncsfuncs

# stop services
ignore_error /etc/init.d/xinetd stop
ignore_error /media/nss/GW/._CLUSTER/bin/stop-gw

# NCP server and IP address
ignore_error ncpcon unbind --ncpservername=CNW-VIE-01_GW_SERVER --ipaddress=$IP
ignore_error del_secondary_ipaddress $IP

# disk
exit_on_error nss /pooldeact=GW

# start xinetd
ignore_error /etc/init.d/xinetd start
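In the example above, the xinetd start line comes after an exit_on_error step. If that step fails and ends the script (an assumption about how exit_on_error behaves), xinetd would stay stopped on this node. One possible variation, shown here only as a sketch, uses a shell trap so that the start command runs whenever the unload steps finish or abort. XINETD_CMD and run_unload_steps are hypothetical names introduced for this illustration and are not part of the original script.

```shell
#!/bin/bash
# Sketch only: make sure the xinetd start command runs when the unload
# steps end, even if one of them aborts with "exit".
# XINETD_CMD is a hypothetical override so the sketch can be tried safely.
XINETD_CMD="${XINETD_CMD:-/etc/init.d/xinetd}"

# The function body is a subshell, so the EXIT trap fires when the
# subshell ends, whether normally or through an early "exit".
run_unload_steps() (
    trap '"$XINETD_CMD" start' EXIT
    "$XINETD_CMD" stop
    "$@"    # the actual unload steps would be passed in here
)
```

To try it without touching xinetd, set XINETD_CMD=echo and pass a command that fails. Whether xinetd should come back after a failed unload is a judgment call for your environment, because a restarted GWHA Service could start agents on a node whose resource did not unload cleanly.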


© 2014 Novell