The STONITH (shoot-the-other-node-in-the-head) capability allows Novell Cluster Services to kill a suspect node remotely by using remote power control instead of a poison pill. Unlike a poison pill, STONITH does not require any action from the node being killed, so it can also kill non-responsive nodes.
Using STONITH requires that you have server power management technology for all nodes in the cluster. STONITH supports remotely accessible cards integrated in a cluster node’s hardware, such as Integrated Lights Out (iLO) from Hewlett-Packard (HP) and Dell Remote Access Card (DRAC) from Dell, and stand-alone web-based power switches. Refer to the vendor documentation to determine how your power management system works.
To use STONITH in Novell Cluster Services, you must create an executable script file, /opt/novell/ncs/bin/NCS_STONITH_SCRIPT, that authenticates to your power controller and turns off, cycles, or resets the power for the node. The script should take the node number as its only parameter. Node numbers are assigned as 0 to 31 for nodes 1 to 32. These are the same node numbers that appear in the /var/opt/novell/ncs/gipc.conf file. Creating the script file automatically enables STONITH for the node; you do not need to restart anything.
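Because the script receives the node number as its only parameter, it can be useful to check that parameter before contacting the power controller. The following is a minimal sketch, not part of the product: the function name is_valid_node_number is hypothetical, and the 0 to 31 range comes from the node numbering described above.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if the argument is a node number from 0 to 31.
is_valid_node_number() {
    local n="$1"
    # Accept only a string of decimal digits.
    [[ "$n" =~ ^[0-9]+$ ]] || return 1
    # Force base 10 so that input such as "08" is not parsed as octal.
    (( 10#$n >= 0 && 10#$n <= 31 ))
}

# Example: check a few candidate arguments.
for candidate in 0 31 32 abc; do
    if is_valid_node_number "$candidate"; then
        echo "$candidate: valid node number"
    else
        echo "$candidate: rejected"
    fi
done
```

A STONITH script could call such a check on "$1" first and do nothing if it fails, so that a malformed argument never reaches the power controller.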
IMPORTANT: STONITH does not replace poison pills in Novell Cluster Services. Novell Cluster Services issues poison pills before running the STONITH script.
Use the following sample scripts as a guide for how to create a script that works with your power management technology:
This section provides a sample script for HP iLO power management cards. The sample code assumes the following setup for the cluster and iLO cards:
The iLO card on each node has been pre-configured to trust any instructions sent from each of the nodes in the same cluster.
Each cluster node’s iLO card is assigned a sequential static IP address, beginning with 192.168.188.201 on node 0, 192.168.188.202 on node 1, and so on up to 192.168.188.232 on node 31.
If you use DNS names in your script instead, the script must perform the translation.
The iLO card’s command to reset power is power reset.
Refer to the HP documentation to determine the commands available for your iLO cards.
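If DNS names are used instead of sequential IP addresses, one way for the script to perform the translation is a lookup table indexed by node number. The following is a sketch under stated assumptions: the host names and the function name ilo_host_for_node are hypothetical placeholders, and only three nodes are listed.

```shell
#!/bin/bash
# Hypothetical table: the array index is the node number,
# the value is the DNS name of that node's iLO card.
ilo_hosts=(
    "ilo-node1.example.com"   # node number 0
    "ilo-node2.example.com"   # node number 1
    "ilo-node3.example.com"   # node number 2
)

# Print the iLO host name for a node number, or fail if there is no entry.
ilo_host_for_node() {
    local n="$1"
    # Accept only a string of decimal digits.
    [[ "$n" =~ ^[0-9]+$ ]] || return 1
    # Force base 10 so that input such as "08" is not parsed as octal.
    local idx=$((10#$n))
    (( idx < ${#ilo_hosts[@]} )) || return 1
    echo "${ilo_hosts[idx]}"
}

# Example: look up the iLO card for node number 1.
ilo_host_for_node 1
```

The returned name could then be passed to ssh in place of a computed IP address.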
For each node in the cluster, create the executable /opt/novell/ncs/bin/NCS_STONITH_SCRIPT file and add the script below. The presence of the file automatically enables STONITH for the cluster.
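IMPORTANT: Ensure that you replace the sample information with the settings for your system.
#!/bin/bash
# $1 is the cluster node number (0 to 31) of the node to reset.
if [ -n "$1" ]; then
    echo "Recycling power of node number $1 ... "
    # iLO addresses start at 192.168.188.201 for node number 0.
    iloIP=$(printf "192.168.188.2%02d" $(($1+1)))
    # Read the node's address from its nodeid line in gipc.conf.
    nodeIP=$(grep "^nodeid .* ${1}\>" /var/opt/novell/ncs/gipc.conf | cut -d' ' -f2)
    while true; do
        ssh "${iloIP}" power reset
        echo
        sleep 2
        # Stop once the node no longer answers a ping.
        if ! ping -c 1 "${nodeIP}" | grep "1 received" &> /dev/null
        then
            break
        fi
        sleep 5
    done
fi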
This section provides a sample script for web-based power switches. The sample code assumes the following setup for the cluster and power switch:
Each cluster node is assigned a sequential static IP address, beginning with 10.10.189.100.
Each cluster node is plugged in to a sequential outlet in the power switch, beginning with node 0 in the first outlet.
The authentication information for the power switch management interface is:
User name: admin
Password: novell
IP address: 10.10.189.149
The power management switch’s command to cycle power is CCL.
Each vendor can have different command options available. Refer to your vendor documentation to determine the commands used by your power switch.
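The mapping implied by the setup above can be sketched as a few small functions. This is an illustration only: the function names are hypothetical, the URL form is the one used by the sample power switch, and the sketch assumes that 10.10.189.100 is the address of node number 0.

```shell
#!/bin/bash
# Values from the sample setup; replace them with your own.
switch_ip="10.10.189.149"
first_node_octet=100   # assumption: node number 0 has address 10.10.189.100

# Node number 0 is plugged in to outlet 1, node number 1 to outlet 2, and so on.
outlet_for_node() {
    echo $(( $1 + 1 ))
}

# Compute the node address arithmetically so that two-digit node numbers also work.
node_ip_for_node() {
    echo "10.10.189.$(( first_node_octet + $1 ))"
}

# Build the URL that asks the switch to cycle power (CCL) on the node's outlet.
outlet_url_for_node() {
    echo "http://${switch_ip}/outlet?$(outlet_for_node "$1")=CCL"
}

# Example: show the values for node number 0.
echo "outlet: $(outlet_for_node 0)"
echo "node IP: $(node_ip_for_node 0)"
echo "URL: $(outlet_url_for_node 0)"
```

Keeping the outlet number and the node address as separate calculations avoids building the address by string concatenation, which stops producing valid addresses once the appended number reaches two digits.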
For each node in the cluster, create the executable /opt/novell/ncs/bin/NCS_STONITH_SCRIPT file and add the script below. The presence of the file automatically enables STONITH for the cluster.
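IMPORTANT: Ensure that you replace the sample information with the settings for your system.
#!/bin/bash
# Each argument is a cluster node number (0 to 31).
until [ -z "$1" ]
do
    # Node number 0 is plugged in to outlet 1, and so on.
    nodeNum=$(($1 + 1))
    # Node addresses start at 10.10.189.100 (assumed here to be node number 0).
    nodeIP="10.10.189.$((100 + $1))"
    echo "Recycling power of node number $1 (outlet $nodeNum) ... "
    while true; do
        curl -u admin:novell "http://10.10.189.149/outlet?${nodeNum}=CCL"
        echo
        sleep 2
        # Stop once the node no longer answers a ping.
        if ! ping -c 1 "${nodeIP}" | grep "1 received" &> /dev/null
        then
            break
        fi
        sleep 5
    done
    shift
done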