Disk latency may cause unwanted node fencing

This document (7011350) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 11 (HAE)
SUSE Linux Enterprise Server 11 (SLES)
Split Brain Detection (SBD)

Situation

Occasionally a node reboots because of an SBD self-fence, even though fencing may not have been necessary. Warnings such as the following appear in the system logs:

sbd: [18584]: WARN: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18584]: WARN: Latency: No liveness for 5 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18584]: WARN: Latency: No liveness for 6 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18585]: WARN: Latency: 6 exceeded threshold 3 on disk /dev/disk/by-id/dm-uuid-mpath-3600508b40007015738922001340000
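To judge how close a node came to the watchdog timeout, it can help to extract the liveness gaps from these warnings. The following is a minimal sketch: `max_liveness_gap` is a hypothetical helper name, not part of sbd, and it assumes the exact message wording shown above. It reads log text on standard input, so it makes no assumption about where your system log is stored.

```bash
#!/bin/bash
# Hypothetical helper (not part of sbd): print the largest "No liveness for N s"
# value found in sbd latency warnings read from standard input.
# Assumes the warning wording shown in the log excerpt above.
max_liveness_gap() {
    sed -n 's/.*WARN: Latency: No liveness for \([0-9][0-9]*\) s exceeds threshold.*/\1/p' |
        sort -n | tail -n 1
}

# Example run on the warnings quoted in this document.
max_liveness_gap <<'EOF'
sbd: [18584]: WARN: Latency: No liveness for 4 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18584]: WARN: Latency: No liveness for 5 s exceeds threshold of 3 s (healthy servants: 0)
sbd: [18584]: WARN: Latency: No liveness for 6 s exceeds threshold of 3 s (healthy servants: 0)
EOF
```

A gap longer than the watchdog timeout stored in the SBD header is consistent with the self-fence described under Cause. Per-disk lines such as "Latency: 6 exceeded threshold 3 on disk ..." use different wording and are ignored by this sketch.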

The header on the SBD partition shows the default timeouts:

# /usr/sbin/sbd -d /dev/sdb1 dump
==Dumping header on disk /dev/sdb1
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 10
==Header on disk /dev/sdb1 is dumped

Resolution

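Check the health of the disks that hold the SBD partitions. Then increase the watchdog timeout, add SBD partitions, or both. The msgwait timeout should be about twice the watchdog timeout, and the two values must be changed at the same time. The timeouts are written to the SBD header when the device is created, so changing them means re-creating the devices. In the example below, -1 sets the watchdog timeout and -4 sets the msgwait timeout.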

hn1:~ # cat /etc/sysconfig/sbd
SBD_DEVICE="/dev/sdb1;/dev/sdc1;/dev/sdd1"
SBD_OPTS="-W"

hn1:~ # sbd -1 10 -4 20 -d /dev/sdb1 -d /dev/sdc1 -d /dev/sdd1 create
Initializing device /dev/sdb1
Creating version 2 header on device 3
Initializing 255 slots on device 3
Device /dev/sdb1 is initialized.
Initializing device /dev/sdc1
Creating version 2 header on device 3
Initializing 255 slots on device 3
Device /dev/sdc1 is initialized.
Initializing device /dev/sdd1
Creating version 2 header on device 3
Initializing 255 slots on device 3
Device /dev/sdd1 is initialized.

hn1:~ # sbd -d /dev/sdb1 dump
==Dumping header on disk /dev/sdb1
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 20
==Header on disk /dev/sdb1 is dumped
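After re-creating the devices, the header of each one can be checked to confirm that the new values are consistent. The sketch below is a hypothetical helper, `check_sbd_timeouts`, that parses the `sbd dump` layout shown above. It treats "about twice the watchdog value" as "at least twice", which is an assumption of this sketch rather than a documented sbd rule.

```bash
#!/bin/bash
# Hypothetical check (not an sbd feature): read "sbd -d <device> dump" output on
# standard input and report whether msgwait is at least twice the watchdog timeout.
# Assumes the header layout shown in the dumps in this document.
# Exit status: 0 = OK, 1 = msgwait too small, 2 = timeouts not found.
check_sbd_timeouts() {
    awk -F': *' '
        /^Timeout \(watchdog\)/ { wd = $2 + 0 }
        /^Timeout \(msgwait\)/  { mw = $2 + 0 }
        END {
            if (wd == "" || mw == "") { print "timeouts not found"; exit 2 }
            printf "watchdog=%d msgwait=%d\n", wd, mw
            if (mw >= 2 * wd) { print "OK" }
            else { print "WARN: msgwait is less than 2 x watchdog"; exit 1 }
        }'
}

# Example run on the re-created header shown above.
check_sbd_timeouts <<'EOF'
==Dumping header on disk /dev/sdb1
Header version     : 2
Number of slots    : 255
Sector size        : 512
Timeout (watchdog) : 10
Timeout (allocate) : 2
Timeout (loop)     : 1
Timeout (msgwait)  : 20
==Header on disk /dev/sdb1 is dumped
EOF
```

Run it once per device, for example `sbd -d /dev/sdc1 dump | check_sbd_timeouts`. A non-zero exit status flags a header whose timeouts need attention.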

Cause

SBD self-fences the node if it cannot read from the device for longer than the watchdog timeout, which defaults to 5 seconds. This behavior is essential: a node that sends a fencing message through SBD relies on the target either receiving the message or having already self-fenced because it could not read the device.

You can increase the watchdog timeout to 10 or even 20 seconds. Because the timeouts are configured when the SBD device is created, you must re-create the device to change them. Adjust the msgwait timeout at the same time so that it stays at approximately twice the watchdog timeout.
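Because the watchdog and msgwait values have to change together, it can help to derive msgwait from the watchdog value when building the command. The helper below, `sbd_create_cmd`, is hypothetical: it only prints an `sbd ... create` command using the -1 (watchdog) and -4 (msgwait) options from the Resolution example and the semicolon-separated SBD_DEVICE format from /etc/sysconfig/sbd, and it assumes a factor of exactly two.

```bash
#!/bin/bash
# Hypothetical helper: print an "sbd ... create" command for the devices listed in
# an SBD_DEVICE value (semicolon-separated, as in /etc/sysconfig/sbd), with
# msgwait set to twice the requested watchdog timeout.
# It only prints the command; it never runs sbd.
sbd_create_cmd() {
    local watchdog=$1 devices=$2 msgwait cmd dev
    msgwait=$(( 2 * watchdog ))
    cmd="sbd -1 $watchdog -4 $msgwait"
    local IFS=';'            # split the device list on semicolons
    for dev in $devices; do
        cmd="$cmd -d $dev"
    done
    printf '%s create\n' "$cmd"
}

sbd_create_cmd 10 "/dev/sdb1;/dev/sdc1;/dev/sdd1"
```

This prints the same command used in the Resolution example. Since /etc/sysconfig/sbd uses shell variable syntax, one way to feed it the configured devices is `. /etc/sysconfig/sbd; sbd_create_cmd 10 "$SBD_DEVICE"`. Review the printed command before running it, because `create` re-initializes the header and slots on every listed device.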

Adding SBD partitions reduces the impact of latency on any single device. For example, with three SBD partitions, at least two of them must exceed the latency threshold before the node self-fences.
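The effect of multiple SBD partitions can be pictured as a majority rule. The function below is a simplified model, not sbd's implementation: `would_self_fence` is a hypothetical name, it assumes a device is unhealthy when its latency exceeds the threshold, and it assumes a self-fence needs more than half of the devices to be unhealthy. Only the three-device case is stated above; the results for other device counts are an extrapolation of this model.

```bash
#!/bin/bash
# Simplified model (not sbd's actual code).
# Usage: would_self_fence THRESHOLD LATENCY...   (all values in whole seconds)
# Prints "yes" if more than half of the devices exceed the threshold, else "no".
would_self_fence() {
    local threshold=$1; shift
    local total=$# unhealthy=0 latency
    for latency in "$@"; do
        if (( latency > threshold )); then
            unhealthy=$(( unhealthy + 1 ))
        fi
    done
    # Majority rule: with 3 devices, 2 slow devices are enough; 1 is not.
    if (( unhealthy * 2 > total )); then
        echo yes
    else
        echo no
    fi
}

would_self_fence 3 6 1 1   # one slow device out of three
would_self_fence 3 6 5 1   # two slow devices out of three
```

The first call prints "no" and the second prints "yes", matching the three-partition example: a single slow disk is tolerated, two are not.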

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7011350
  • Creation Date: 12-NOV-12
  • Modified Date: 12-NOV-12
  • Products: SUSE Linux Enterprise High Availability Extension, SUSE Linux Enterprise Server