All HAE nodes fail to start clustering after reboot

This document (7011302) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 11 (HAE)
SUSE Linux Enterprise Server 11 (SLES)
Split Brain Detection (SBD) Partitions

Situation

The boot screen shows the following errors:

Starting OpenAIS/Corosync daemon (corosync): Starting SBD - SBD failed to start; aborting.
Failed services in runlevel 3: openais

The /etc/sysconfig/sbd configuration file on all nodes shows:
SBD_DEVICE="/dev/sdc1;/dev/sdd1;/dev/sde1"
SBD_OPTS="-W"

Querying the CIB fails:
# /usr/sbin/cibadmin -Q
Signon to CIB failed: connection failed
Init failed, could not perform requested operations

The stonith resource in /var/lib/heartbeat/crm/cib.xml shows:
<primitive class="stonith" id="stonith-sbd" type="external/sbd">
  <instance_attributes id="stonith-sbd-instance_attributes">
    <nvpair id="stonith-sbd-instance_attributes-sbd_device" name="sbd_device" value="/dev/sdb1;/dev/sdc1;/dev/sdd1"/>
  </instance_attributes>
</primitive>

Resolution

The SBD device list must be identical in /etc/sysconfig/sbd on every node and in the CIB database. There are two scenarios: either the CIB database holds the correct list of SBD devices, or the /etc/sysconfig/sbd file does. The resolution is different for each.
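Before choosing a method, it can help to confirm which two lists actually disagree. The following is a minimal sketch, not a supported tool: the function names are made up for illustration, it reads cib.xml directly because cibadmin cannot connect while the cluster is down, and it assumes the nvpair has its name attribute before its value attribute, as in the example above. The comparison is a plain string comparison, so the same devices in a different order are reported as a mismatch.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names), read-only.

# Print the SBD_DEVICE value from a sysconfig-style file.
sysconfig_sbd_devices() {
    file="${1:-/etc/sysconfig/sbd}"
    # Use the last uncommented SBD_DEVICE= assignment and strip the quotes.
    sed -n 's/^[[:space:]]*SBD_DEVICE=//p' "$file" | tail -n 1 | tr -d "\"'"
}

# Print the sbd_device value of the first matching nvpair in a CIB XML file.
# Assumes name="sbd_device" appears before value="..." on the same line.
cib_sbd_devices() {
    file="${1:-/var/lib/heartbeat/crm/cib.xml}"
    sed -n 's/.*name="sbd_device"[^>]*value="\([^"]*\)".*/\1/p' "$file" | head -n 1
}

# Compare both lists. Returns 0 when they match or when the CIB has no list,
# 1 when they differ.
check_sbd_device_lists() {
    sys=$(sysconfig_sbd_devices "$1")
    cib=$(cib_sbd_devices "$2")
    if [ -z "$cib" ]; then
        echo "CIB has no sbd_device list; SBD_DEVICE from sysconfig applies: $sys"
        return 0
    fi
    if [ "$sys" = "$cib" ]; then
        echo "match: $sys"
        return 0
    fi
    echo "MISMATCH: sysconfig=$sys cib=$cib"
    return 1
}
```

Calling check_sbd_device_lists with no arguments on a cluster node uses the default paths named in this document.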



Method 1 when CIB Database is Correct
For example, the stonith resource in the /var/lib/heartbeat/crm/cib.xml shows:
<primitive class="stonith" id="stonith-sbd" type="external/sbd">
  <instance_attributes id="stonith-sbd-instance_attributes">
    <nvpair id="stonith-sbd-instance_attributes-sbd_device" name="sbd_device" value="/dev/sdb1;/dev/sdc1;/dev/sdd1"/>
  </instance_attributes>
</primitive>

1. On one node, modify the /etc/sysconfig/sbd file.
2. Change the SBD_DEVICE variable to match the CIB database.
SBD_DEVICE="/dev/sdb1;/dev/sdc1;/dev/sdd1"
SBD_OPTS="-W"
3. Save the file, then copy /etc/sysconfig/sbd to all other nodes in the cluster
scp /etc/sysconfig/sbd node2:/etc/sysconfig/sbd

4. Recreate the sbd partitions as listed in the CIB database
sbd -d /dev/sdb1 -d /dev/sdc1 -d /dev/sdd1 create
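Steps 3 and 4 can be scripted when the cluster has more than two nodes or the device list is long. This is a rough sketch under stated assumptions: the function names are hypothetical, the node names are whatever you pass in, and the create command is only printed so it can be reviewed before being run by hand.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names).

# Turn "a;b;c" into "-d a -d b -d c" for the sbd command line.
sbd_device_args() {
    printf '%s\n' "$1" | tr ';' '\n' | while IFS= read -r dev; do
        # Skip empty entries caused by a stray or trailing semicolon.
        [ -n "$dev" ] && printf '%s %s ' -d "$dev"
    done | sed 's/ $//'
}

# Print (do not execute) the create command for a device list.
print_sbd_create_cmd() {
    echo "sbd $(sbd_device_args "$1") create"
}

# Copy the local /etc/sysconfig/sbd to each node given as an argument.
copy_sbd_config() {
    for node in "$@"; do
        scp /etc/sysconfig/sbd "$node:/etc/sysconfig/sbd"
    done
}
```

For example, copy_sbd_config node2 node3 would copy the file to two nodes with those (assumed) host names, and print_sbd_create_cmd "/dev/sdb1;/dev/sdc1;/dev/sdd1" prints the command shown in step 4.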



Method 2 when /etc/sysconfig/sbd is Correct
The correct /etc/sysconfig/sbd shows:
SBD_DEVICE="/dev/sdc1;/dev/sdd1;/dev/sde1"
SBD_OPTS="-W"

1. Rename the /etc/sysconfig/sbd file to /etc/sysconfig/sbd.save on all nodes in the cluster.
mv /etc/sysconfig/sbd /etc/sysconfig/sbd.save

2. Reboot all nodes in the cluster
3. Delete the stonith resource, then recreate it either with the correct sbd_device list or without an sbd_device parameter.
Assuming a stonith resource named stonith-sbd:
crm_resource --delete --resource stonith-sbd --resource-type primitive

crm configure primitive stonith-sbd stonith:external/sbd params sbd_device="/dev/sdc1;/dev/sdd1;/dev/sde1"
-OR-
crm configure primitive stonith-sbd stonith:external/sbd

4. Rename the /etc/sysconfig/sbd.save back to /etc/sysconfig/sbd on all nodes in the cluster
mv /etc/sysconfig/sbd.save /etc/sysconfig/sbd

5. Reboot all nodes in the cluster
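To avoid retyping the device list in step 3, the commands can be generated from the saved configuration file. The sketch below is a dry run under assumptions: the function name is hypothetical, the resource ID is supplied by you, and nothing is executed; the two commands are only printed for review.

```shell
#!/bin/sh
# Dry run only: prints the step 3 commands, executes nothing.
# Usage: print_stonith_recreate_cmds <resource-id> [sysconfig-file]
print_stonith_recreate_cmds() {
    res_id="$1"
    conf="${2:-/etc/sysconfig/sbd}"
    # Read the last uncommented SBD_DEVICE= assignment and strip the quotes.
    devices=$(sed -n 's/^[[:space:]]*SBD_DEVICE=//p' "$conf" | tail -n 1 | tr -d "\"'")
    if [ -z "$res_id" ] || [ -z "$devices" ]; then
        echo "usage: print_stonith_recreate_cmds <resource-id> [sysconfig-file]" >&2
        return 1
    fi
    echo "crm_resource --delete --resource $res_id --resource-type primitive"
    echo "crm configure primitive $res_id stonith:external/sbd params sbd_device=\"$devices\""
}
```

While the file is renamed during Method 2, pass /etc/sysconfig/sbd.save as the second argument.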

Cause

The SBD_DEVICE list in /etc/sysconfig/sbd did not match the sbd_device list of the stonith resource in the Cluster Information Base (CIB) database. The two lists must match, or the CIB stonith resource must not specify an sbd_device list at all. Without an sbd_device list in the CIB database, clustering uses the SBD_DEVICE list in /etc/sysconfig/sbd.

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011302
  • Creation Date: 02-NOV-12
  • Modified Date: 02-NOV-12
    • SUSE Linux Enterprise High Availability Extension
      SUSE Linux Enterprise Server
