Novell Doc: NDK: Cluster Services Developer Kit

D.1 HMO Architecture

Because cluster-enabled resources fail over automatically when a server fails, you can access services without interruption. However, an individual service can sometimes become unavailable or very slow to respond while other services and the server itself run normally.

While the Cluster Services Features monitor the status of each node in a cluster, the HMO utility monitors the system performance, or health, of specific individual services that you enable within a cluster using the Cluster Services Developer Kit.

The health performance parameters that trigger a response are service-specific and are configured when the service is enabled. Once enabled, the HMO helps monitor and manage the service within the cluster. If a health problem is detected within the service, the HMO can trigger various responses to restore the service to optimal levels.

D.1.1 HMO NLM Structure

The HMO (hmonx_nlm) in section A of the figure is a two-tiered NLM that consists of a skeleton (hmo.c) and a second component that is configured to pass information that is specific to the service being monitored (ldaphmo.c). The skeleton contains two generic interface probes called looksAlive and isAlive. You configure these probes to monitor the health of the service and also to specify how Cluster Services manages the service if service health levels reach set minimums.
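The two-tier split described above can be pictured as a generic routine that calls two service-supplied function pointers. The following is a minimal sketch only: the type and function names (hmo_status, hmo_service, hmo_probe_failed, and the demo probes) are hypothetical and are not taken from hmo.c or ldaphmo.c.

```c
#include <assert.h>
#include <stddef.h>

/* Result of a single health probe (hypothetical type for this sketch). */
typedef enum { HMO_HEALTHY = 0, HMO_UNHEALTHY = 1 } hmo_status;

/* Service-specific tier: the two probes that the skeleton calls. */
typedef struct {
    const char *service_name;
    hmo_status (*looksAlive)(void *ctx); /* cursory, frequent check     */
    hmo_status (*isAlive)(void *ctx);    /* deeper, less frequent check */
    void *ctx;                           /* service-specific state      */
} hmo_service;

/* Generic tier: runs one probe and returns 1 if the service failed it. */
static int hmo_probe_failed(const hmo_service *svc, int deep)
{
    assert(svc != NULL && svc->looksAlive != NULL && svc->isAlive != NULL);
    hmo_status s = deep ? svc->isAlive(svc->ctx) : svc->looksAlive(svc->ctx);
    return s != HMO_HEALTHY;
}

/* Stand-in probes for an imaginary service; ctx points to an int "up" flag. */
static hmo_status demo_looks_alive(void *ctx)
{
    (void)ctx;              /* the cursory check ignores the flag here */
    return HMO_HEALTHY;
}

static hmo_status demo_is_alive(void *ctx)
{
    return (*(int *)ctx) ? HMO_HEALTHY : HMO_UNHEALTHY;
}
```

In this arrangement, supporting a new service would mean supplying a different pair of probe functions while the generic routine stays unchanged.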

Figure D-1 LDAP Health Monitor Object Architecture

The lightweight looksAlive probe is designed to do a cursory test of the service; it is designed to run more frequently than isAlive. If the service fails according to the thresholds configured in looksAlive or isAlive, the skeleton uses the NCSSDK to trigger one or more response actions. These actions ensure that the service retains maximum health and availability, as illustrated by section B in the figure. Response actions might include the following:

Migrating the service to another more reliable resource within the cluster
Restarting the service on the same node
Directing the service to use another network
Stopping the current processing and moving to the next task (typical for an e-mail service processing a message with a virus)
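One way to model "one or more response actions" is a set of bit flags with one handler per action. The sketch below is illustrative only: the flag names, the handler signature, and hmo_respond are assumptions, and the stub handler stands in for whatever NCSSDK calls a real HMO would make.

```c
#include <assert.h>
#include <stddef.h>

/* Response actions from the list above, as bit flags so that several
 * can be configured together. All names here are hypothetical. */
enum {
    HMO_ACT_MIGRATE        = 1 << 0, /* move the service to another resource */
    HMO_ACT_RESTART        = 1 << 1, /* restart the service on the same node */
    HMO_ACT_SWITCH_NETWORK = 1 << 2, /* direct the service to another network */
    HMO_ACT_SKIP_TASK      = 1 << 3  /* stop the current task, take the next  */
};
#define HMO_ACT_COUNT 4

/* One handler per action; returns 0 on success. A real HMO would call
 * into the NCSSDK inside these handlers. */
typedef int (*hmo_action_fn)(const char *service);

/* Runs every configured action in bit order; returns how many succeeded. */
static int hmo_respond(unsigned actions,
                       const hmo_action_fn handlers[HMO_ACT_COUNT],
                       const char *service)
{
    int ok = 0;
    assert(handlers != NULL);
    for (int bit = 0; bit < HMO_ACT_COUNT; bit++) {
        if ((actions & (1u << bit)) && handlers[bit] != NULL) {
            if (handlers[bit](service) == 0)
                ok++;
        }
    }
    return ok;
}

/* Stub handler that only counts how often it was invoked. */
static int stub_calls = 0;
static int stub_action(const char *service)
{
    (void)service;
    stub_calls++;
    return 0;
}
```

Passing HMO_ACT_RESTART | HMO_ACT_MIGRATE as the actions argument would run both handlers in bit order, so the migrate handler runs first in this sketch.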

HMOLDAP Example

The hmoldap_nlm file monitors the health of LDAP running in the cluster. In this example (as illustrated in section C), looksAlive might be configured to perform a simple_bind every 20 seconds, while isAlive might be called to perform a deeper ldap_search query of the service every 60 seconds.
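The 20-second and 60-second intervals in this example can be illustrated with a small scheduling routine driven by a simulated clock. This is a sketch under stated assumptions: probe_due and count_probes are hypothetical helpers, no LDAP calls are made, and the rule that isAlive takes precedence when both probes fall due is a design choice of this sketch rather than documented HMO behavior.

```c
#include <assert.h>

/* Intervals from the example above, in seconds. */
#define LOOKS_ALIVE_INTERVAL 20
#define IS_ALIVE_INTERVAL    60

typedef enum { PROBE_NONE, PROBE_LOOKS_ALIVE, PROBE_IS_ALIVE } probe_kind;

/* Decides which probe is due at time `now` (seconds since start).
 * When both intervals elapse at once, only the deeper probe is chosen
 * (an assumption of this sketch). */
static probe_kind probe_due(long now)
{
    assert(now >= 0);
    if (now == 0)
        return PROBE_NONE;
    if (now % IS_ALIVE_INTERVAL == 0)
        return PROBE_IS_ALIVE;
    if (now % LOOKS_ALIVE_INTERVAL == 0)
        return PROBE_LOOKS_ALIVE;
    return PROBE_NONE;
}

/* Counts how many probes of each kind fall due in the first `seconds`
 * seconds of simulated time. */
static void count_probes(long seconds, int *looks, int *deep)
{
    *looks = 0;
    *deep = 0;
    for (long t = 1; t <= seconds; t++) {
        probe_kind k = probe_due(t);
        if (k == PROBE_LOOKS_ALIVE)
            (*looks)++;
        else if (k == PROBE_IS_ALIVE)
            (*deep)++;
    }
}
```

Over two simulated minutes, this schedule runs the cursory probe at 20, 40, 80, and 100 seconds and the deeper probe at 60 and 120 seconds.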

If the probes determine from the query results that the service has failed, the skeleton might invoke the NCSSDK to migrate the LDAP server to a more stable environment within the cluster or to trigger some other action. You specify the response actions you want when you set up the skeleton tier of the NLM, as demonstrated in the Health Monitor Object (HMO) sample code.
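The text does not say how probe results are turned into a failure decision. One common approach, shown here purely as an assumption, is to require several consecutive failed probes before responding, so that a single slow query does not cause a migration. The hmo_failure_tracker type and hmo_record function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Tracks consecutive probe failures for one service (hypothetical). */
typedef struct {
    int consecutive_failures;
    int threshold; /* failures in a row before a response fires */
} hmo_failure_tracker;

/* Records one probe result. Returns 1 when the threshold is reached,
 * then resets the counter so the response fires once per failure streak.
 * A successful probe clears the streak. */
static int hmo_record(hmo_failure_tracker *t, int probe_ok)
{
    assert(t != NULL && t->threshold > 0);
    if (probe_ok) {
        t->consecutive_failures = 0;
        return 0;
    }
    t->consecutive_failures++;
    if (t->consecutive_failures >= t->threshold) {
        t->consecutive_failures = 0;
        return 1;
    }
    return 0;
}
```

With a threshold of 3, two failures followed by a success would not trigger a response, while three failures in a row would.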

Implementing this API is an effective way to reduce the impact of LDAP servers that occasionally lock up. The looksAlive probe performs a simple bind, and the isAlive probe performs a deeper search.

The cluster resource is an IP address over which the client is configured to access LDAP. Consequently, when the IP address moves from one LDAP server in the cluster to another, clients automatically switch to an alternate LDAP server, even the "dumb" LDAP clients that are configured to use only one specific LDAP server. This functionality provides higher reliability for these LDAP clients.

Use the Health Monitor Object (HMO) sample code as a template to write HMOs for services other than LDAP.