Cluster resources failing on stop take too long to recover

This document (7012355) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise High Availability Extension 11
SUSE Linux Enterprise Server 11

Situation

When a resource fails to stop, the node is supposed to be fenced to recover from the failure. However, the cluster takes 10 minutes to detect the stop failure and recover the resource.

A portion of the cluster information base (CIB) looks like this:

<op_defaults>
  <meta_attributes id="op_defaults-options">
    <nvpair id="op_defaults-options-timeout" name="timeout" value="600"/>
    <nvpair id="op_defaults-options-record-pending" name="record-pending" value="true"/>
  </meta_attributes>
</op_defaults>
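
To confirm which default timeout is in effect, the fragment above can be parsed programmatically. The following is a minimal Python sketch, not an official tool: the function name `op_default_timeout` is hypothetical, and it assumes the timeout is stored as a plain number of seconds with no unit suffix, as in the fragment shown.

```python
import xml.etree.ElementTree as ET

# The op_defaults section copied from the CIB fragment above.
CIB_FRAGMENT = """
<op_defaults>
  <meta_attributes id="op_defaults-options">
    <nvpair id="op_defaults-options-timeout" name="timeout" value="600"/>
    <nvpair id="op_defaults-options-record-pending" name="record-pending" value="true"/>
  </meta_attributes>
</op_defaults>
"""

def op_default_timeout(xml_text):
    """Return the op_defaults timeout in seconds, or None if it is not set.

    Assumption: the value is a bare integer (seconds), with no unit suffix.
    """
    root = ET.fromstring(xml_text)
    for nvpair in root.iter("nvpair"):
        if nvpair.get("name") == "timeout":
            return int(nvpair.get("value"))
    return None

timeout = op_default_timeout(CIB_FRAGMENT)
print(timeout, "seconds =", timeout // 60, "minutes")  # 600 seconds = 10 minutes
```

The 600-second value matches the 10-minute delay described in the Situation section.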

Resolution

Remove the Operation Defaults timeout value:

1. Launch the Pacemaker GUI (hb_gui).
2. Log in and select Operation Defaults.
3. Highlight "timeout" in the Meta Attributes tab.
4. Select Remove, then Yes to confirm.

Cause

The operation defaults timeout value, which applies to any stop operation that has no timeout of its own, was set too high (600 seconds).

Additional Information

If a resource does not have its own stop timeout value, the operation defaults timeout value is used. When the cluster stops a resource on a node and the resource fails to stop as it should, the cluster waits for the time specified in the operation defaults timeout before treating the stop as failed. If the value is too high, it takes too long for the resource to be started on another node, which reduces its availability. As a rule, this value should not exceed 60 seconds.
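
The fallback rule described above can be illustrated with a short Python sketch. It only models the rule as stated in this document and does not query a cluster; the names `effective_stop_timeout`, `exceeds_recommendation`, and `RECOMMENDED_MAX_SECONDS` are hypothetical, and all values are assumed to be in seconds.

```python
# Upper bound suggested in this document, in seconds.
RECOMMENDED_MAX_SECONDS = 60

def effective_stop_timeout(resource_stop_timeout, op_defaults_timeout):
    """Return the timeout that applies to a resource's stop operation.

    The resource's own stop timeout takes precedence; otherwise the
    operation defaults timeout is used. None means neither is set,
    in which case the cluster's built-in default applies.
    """
    if resource_stop_timeout is not None:
        return resource_stop_timeout
    return op_defaults_timeout

def exceeds_recommendation(timeout_seconds):
    """Return True if the timeout is set and above the suggested maximum."""
    return timeout_seconds is not None and timeout_seconds > RECOMMENDED_MAX_SECONDS

# (resource stop timeout, op_defaults timeout) pairs to illustrate the rule.
for own, default in [(None, 600), (30, 600), (None, None)]:
    t = effective_stop_timeout(own, default)
    verdict = "too high" if exceeds_recommendation(t) else "ok"
    print(own, default, "->", t, verdict)
```

The first pair corresponds to the situation in this document: the resource has no stop timeout of its own, so the 600-second default applies and a failed stop is only acted on after 10 minutes.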

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID:7012355
  • Creation Date:02-MAY-13
  • Modified Date:06-MAY-13
    • SUSE Linux Enterprise High Availability Extension
    • SUSE Linux Enterprise Server
