8.3 OCF Return Codes and Failure Recovery

The OCF specification strictly defines the exit codes that an action must return. The cluster always checks the return code against the expected result. If the two do not match, the operation is considered to have failed and a recovery action is initiated. There are three types of failure recovery:

Table 8-1 Failure Recovery Types

| Recovery Type | Description | Action Taken by the Cluster |
|---|---|---|
| soft | A transient error occurred. | Restart the resource or move it to a new location. |
| hard | A non-transient error occurred. The error may be specific to the current node. | Move the resource elsewhere and prevent it from being retried on the current node. |
| fatal | A non-transient error occurred that will be common to all cluster nodes. This means a bad configuration was specified. | Stop the resource and prevent it from being started on any cluster node. |
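The three recovery types differ mainly in where the resource may run afterwards. The following is a minimal sketch of that rule, not cluster code: the names `Recovery` and `candidate_nodes` are hypothetical, and the sketch ignores everything else that influences placement in a real cluster (constraints, scores, failure counts).

```python
from enum import Enum


class Recovery(Enum):
    """The three failure recovery types from Table 8-1."""
    SOFT = "soft"
    HARD = "hard"
    FATAL = "fatal"


def candidate_nodes(recovery, current_node, all_nodes):
    """Return the nodes on which the resource may still be started.

    Hypothetical helper for illustration only.
    """
    if recovery is Recovery.SOFT:
        # Transient error: restart in place or move, so every node qualifies.
        return list(all_nodes)
    if recovery is Recovery.HARD:
        # Possibly node-specific error: do not retry on the current node.
        return [node for node in all_nodes if node != current_node]
    # FATAL: bad configuration, common to all nodes, so no node qualifies.
    return []


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]
    for recovery in Recovery:
        print(recovery.value, candidate_nodes(recovery, "node1", nodes))
```

An empty result for `fatal` corresponds to the resource being stopped and not started anywhere until the configuration is corrected.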

When an action is considered to have failed, the following table shows the OCF return codes and the type of recovery the cluster initiates for each code received.

Table 8-2 OCF Return Codes

| OCF Return Code | OCF Alias | Description | Recovery Type |
|---|---|---|---|
| 0 | OCF_SUCCESS | Success. The command completed successfully. This is the expected result for all start, stop, promote and demote commands. | soft |
| 1 | OCF_ERR_GENERIC | Generic "there was a problem" error code. | soft |
| 2 | OCF_ERR_ARGS | The resource’s configuration is not valid on this machine (for example, it refers to a location or tool not found on the node). | hard |
| 3 | OCF_ERR_UNIMPLEMENTED | The requested action is not implemented. | hard |
| 4 | OCF_ERR_PERM | The resource agent does not have sufficient privileges to complete the task. | hard |
| 5 | OCF_ERR_INSTALLED | The tools required by the resource are not installed on this machine. | hard |
| 6 | OCF_ERR_CONFIGURED | The resource’s configuration is invalid (for example, required parameters are missing). | fatal |
| 7 | OCF_NOT_RUNNING | The resource is not running. The cluster will not attempt to stop a resource that returns this for any action. This return code may or may not require resource recovery, depending on the expected resource status. If it is unexpected, soft recovery is initiated. | N/A |
| 8 | OCF_RUNNING_MASTER | The resource is running in Master mode. | soft |
| 9 | OCF_FAILED_MASTER | The resource is in Master mode but has failed. The resource will be demoted, stopped and then started (and possibly promoted) again. | soft |
| other | N/A | Custom error code. | soft |
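The return code table can be read as a lookup that applies only when the return code differs from the expected result. The sketch below illustrates that reading. The function name `recovery_for` and the dictionary `RECOVERY_BY_CODE` are assumptions made for this example and are not part of any cluster API. Code 7 (OCF_NOT_RUNNING) is mapped to soft, following the table's note that an unexpected "not running" result leads to soft recovery.

```python
# Recovery type per OCF return code, as listed in Table 8-2.
RECOVERY_BY_CODE = {
    0: "soft",   # OCF_SUCCESS (a failure only if another code was expected)
    1: "soft",   # OCF_ERR_GENERIC
    2: "hard",   # OCF_ERR_ARGS
    3: "hard",   # OCF_ERR_UNIMPLEMENTED
    4: "hard",   # OCF_ERR_PERM
    5: "hard",   # OCF_ERR_INSTALLED
    6: "fatal",  # OCF_ERR_CONFIGURED
    7: "soft",   # OCF_NOT_RUNNING (soft only when unexpected)
    8: "soft",   # OCF_RUNNING_MASTER
    9: "soft",   # OCF_FAILED_MASTER
}


def recovery_for(return_code, expected_code):
    """Return the recovery type for an action result, or None if it succeeded.

    Hypothetical helper: an action has failed only when its return code
    does not match the expected result.
    """
    if return_code == expected_code:
        return None
    # Codes not listed in the table are custom error codes: soft recovery.
    return RECOVERY_BY_CODE.get(return_code, "soft")


if __name__ == "__main__":
    print(recovery_for(0, 0))   # start succeeded as expected
    print(recovery_for(7, 7))   # probe expected the resource to be stopped
    print(recovery_for(7, 0))   # monitor expected a running resource
    print(recovery_for(6, 0))   # invalid configuration
```

Passing the expected code explicitly reflects that the same return code can be a success in one context (for example, 7 when the resource is expected to be stopped) and a failure in another.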