4.3 Additional Cluster Operating Instructions

The following instructions provide additional information for operating Novell Cluster Services.

4.3.1 Installing NetWare on a Server To Be Added to an Existing Cluster

  1. Install fibre channel hardware.

    NetWare will automatically detect and load the proper drivers when it installs.

  2. Install NetWare, including the latest Service Pack, on the new server.

  3. Reboot the server.

  4. Install Novell Cluster Services on the new server.

  5. (Conditional) If you have changed the failover order, add the new server to the failover list.

    By default, Novell Cluster Services will include all nodes on the failover list, including newly added nodes.

4.3.2 Readding a Node That Was Previously in the Cluster

Previously, if you were reinstalling a cluster node and using the same server name, you had to reconfigure Novell Cluster Services on that node and restart cluster software on all nodes in the cluster prior to adding the node back into the cluster. An option now exists in iManager to automate the process for readding a node to a cluster.

  1. If necessary, install NetWare, including the latest Service Pack, on the server, using the same node name and IP address.

  2. In the left column of the main iManager page, locate Clusters, then click the Cluster Options link.

  3. Type the cluster name or browse and select it, then click the Repair button under the cluster name.

    This will automatically configure Novell Cluster Services on the server. This lets you keep the existing cluster nodes running and in the cluster while reinstalling a failed cluster node.

  4. After clicking the Repair button to automatically configure Novell Cluster Services on the new cluster node, copy the ldncs.ncf and uldncs.ncf files from the sys:\system directory on one of the other cluster servers to the sys:\system directory of the new cluster server.

4.3.3 Cluster-Enabled Volume Connection Required for Some Utilities

Because Novell Cluster Services uses eDirectory™ to find objects and resolve names, you must first establish a client connection to a cluster-enabled volume for it to be visible to certain utilities.

Do this by browsing to and selecting the eDirectory Volume object using Windows Explorer.

4.3.4 Some Applications Do Not Fail Over

Although all NetWare 6.5 applications will run on a cluster node, not all applications are capable of being configured as a cluster application and failed over to a new node.

4.3.5 Cluster Services Fails to Start

If you chose to have Novell Cluster Services software start automatically after installation, you might encounter a problem with Cluster Services software starting on some servers immediately after the installation. This problem is due to Cluster Services software starting prior to cluster-related eDirectory objects being created and replicated.

To resolve this problem, wait for a few minutes and then manually start Cluster Services software by entering ldncs at the server console.

4.3.6 Preventing Cascading Failovers

Cascading failover occurs when a bad cluster resource causes a server to fail, then fails over to another server and causes it to fail, and then continues failing over to and bringing down additional cluster servers until possibly all servers in the cluster have failed.

Novell Cluster Services now incorporates functionality that detects if a node has failed because of a bad cluster resource and prevents that bad resource from failing over to other servers in the cluster.

This functionality is enabled by default when you install Novell Cluster Services. Cascading failover prevention can be disabled by adding the /hmo=off parameter to the clstrlib command in the sys:\system\ldncs.ncf file.

After adding the parameter, the line should appear as follows:

clstrlib /hmo=off

If you disable cascading failover prevention on one cluster server, you must do it on all servers in the cluster.

You must manually unload and reload Novell Cluster Services software on every cluster server in order for this change to take effect. To do this, use the uldncs command to unload cluster software and the ldncs command to reload cluster software.
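
As an illustration of the edit described above, the following Python sketch appends /hmo=off to the clstrlib line in the text of an ldncs.ncf file. It is a minimal sketch, not a Novell-supplied tool: the function name disable_cascade_prevention is hypothetical, the sample text stands in for a real ldncs.ncf, and the code assumes that clstrlib is the first word on its line. The edited file must still be placed on every cluster server, and the cluster software unloaded and reloaded, as described above.

```python
def disable_cascade_prevention(ncf_text: str) -> str:
    """Return ncf_text with /hmo=off appended to the clstrlib line if it is missing."""
    out = []
    for line in ncf_text.splitlines():
        tokens = line.split()
        # Assumption: the clstrlib command is the first word on its line.
        is_clstrlib = bool(tokens) and tokens[0].lower() == "clstrlib"
        already_off = "/hmo=off" in [t.lower() for t in tokens]
        if is_clstrlib and not already_off:
            line = line.rstrip() + " /hmo=off"
        out.append(line)
    # Preserve a trailing newline if the input had one.
    return "\n".join(out) + ("\n" if ncf_text.endswith("\n") else "")


# Hypothetical one-line stand-in for the contents of sys:\system\ldncs.ncf.
print(disable_cascade_prevention("clstrlib\n"), end="")  # prints: clstrlib /hmo=off
```

The function leaves a line unchanged if the parameter is already present, so running it twice on the same text is harmless.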

Resource Quarantine

If cascading failover prevention is enabled, a resource might be put into quarantine if it causes server abends within a three-day period. If Novell Cluster Services software determines that the resource is likely responsible for the abends and that loading it would put the cluster in grave danger, it places the resource in a comatose state (quarantines it) rather than letting it load on, and potentially bring down, other cluster nodes.

The resource can still be manually brought online and manually migrated to other cluster nodes. To get the resource out of quarantine, you can disable cascading failover prevention. You can then re-enable cascading failover prevention by removing the /hmo=off parameter from the clstrlib line in the sys:\system\ldncs.ncf file, then unloading and reloading Novell Cluster Services software.

Novell Cluster Services does the following to determine if a resource should be put into quarantine:

  1. Traces back the history of node failures for the suspected bad resource. This includes:
    • What node the resource was running or loading on.
    • Whether the node failed.
    • The state the resource was in when the node failed.
    • Whether other resources were trying to load when the node failed.
  2. Repeats the above process until one of the following happens:
    • The end of the cluster log file is reached.
    • Enough node failures are found.
    • The node is found not to have failed.
    • The entries in the log file are more than three days old.
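
The trace-back loop above can be sketched in Python as follows. This is only an illustration of the stopping conditions, not the actual Novell Cluster Services implementation: the LogEntry record, its field names, the trace_failures function, and the enough_failures threshold are all hypothetical, and the real cluster log format is not described here. The sketch assumes the log entries are supplied newest first.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class LogEntry:
    """One hypothetical cluster-log record about the suspected resource."""
    age_days: float        # how long ago the event was logged
    node: str              # node the resource was running or loading on
    node_failed: bool      # whether that node failed
    resource_state: str    # state of the resource when the node failed
    others_loading: bool   # whether other resources were trying to load


def trace_failures(entries: Iterable[LogEntry], enough_failures: int,
                   max_age_days: float = 3.0) -> List[LogEntry]:
    """Collect consecutive node failures, newest first, until a stop condition.

    Stops at the first of: end of the log, an entry older than max_age_days,
    a node that did not fail, or enough_failures failures collected.
    """
    failures: List[LogEntry] = []
    for entry in entries:
        if entry.age_days > max_age_days:
            break
        if not entry.node_failed:
            break
        failures.append(entry)
        if len(failures) >= enough_failures:
            break
    return failures


# Hypothetical log: two recent failures and one that is more than three days old.
log = [
    LogEntry(0.1, "node1", True, "loading", False),
    LogEntry(0.5, "node2", True, "loading", False),
    LogEntry(4.0, "node3", True, "running", False),
]
print([e.node for e in trace_failures(log, enough_failures=5)])  # ['node1', 'node2']
```

The sketch only gathers the failure history. How the cluster software weighs that history when deciding whether to quarantine a resource is not specified here.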

If the resource attempts to load on a node where it was previously loaded and there are additional nodes still available in the cluster, it will not be quarantined and will be allowed to load. Also, a resource is not quarantined when it is initially brought online.

Factors that might contribute to a resource being quarantined include:

  • A large number of node failures
  • No other resources are causing node failures
  • The resource never reaches a running state

Factors that might help prevent a resource from being quarantined include:

  • A small number of node failures
  • Other resources are causing node failures
  • The resource reaches a running state
  • There is one node left up and running in the cluster

Resource quarantine is disabled if:

  • Cascading failover prevention is turned off.

    This is done by adding the /hmo=off parameter to the clstrlib command in the sys:\system\ldncs.ncf file.

  • There is no shared storage (SAN) or SBD partition.
  • There are enough nodes in the cluster to form a quorum.

4.3.7 Cluster Maintenance Mode

Cluster maintenance mode lets you temporarily suspend the cluster heartbeat while hardware maintenance is being performed. This is useful if you want to reset or power down the LAN switch without bringing down cluster servers. See Section 4.4, Novell Cluster Services Console Commands, for more information.

If the master server in the cluster goes down while the cluster is in cluster maintenance mode, you must enter cluster maintenance off on all remaining cluster servers to bring the cluster out of maintenance mode. This is only necessary if the master server in the cluster goes down. If the master server in the cluster is up, you can enter cluster maintenance off on one server in the cluster to bring the entire cluster out of maintenance mode.
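
The rule above for leaving maintenance mode can be summarized in a short Python sketch. The function name servers_needing_maintenance_off and the server names are hypothetical. The sketch only decides where the cluster maintenance off command must be entered and does not issue the command.

```python
from typing import List


def servers_needing_maintenance_off(remaining_servers: List[str],
                                    master_is_up: bool) -> List[str]:
    """Return the servers on which to enter 'cluster maintenance off'."""
    if not remaining_servers:
        return []
    if master_is_up:
        # One server is enough; the first is picked arbitrarily.
        return remaining_servers[:1]
    # The master went down during maintenance mode:
    # the command is needed on every remaining server.
    return list(remaining_servers)


print(servers_needing_maintenance_off(["srv1", "srv2", "srv3"], master_is_up=False))
# ['srv1', 'srv2', 'srv3']
```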

4.3.8 Unload Cluster Services Software When Servicing Shared Storage

If you need to power down or recycle your shared storage system, you must unload Cluster Services software prior to doing so. You can unload the software using the uldncs command at the server console of each cluster server.

4.3.9 Displaying Bound Virtual IP Addresses

To verify that a virtual IP address is bound, enter display secondary ipaddress at the server console of the cluster server where the virtual IP address is assigned. This will display all bound virtual IP addresses. A maximum of 256 virtual IP addresses can be bound.

4.3.10 Upgrading the NSS Volume Media Format

A new media format for NSS volumes is available for OES SP1 NetWare and NetWare 6.5 SP4 that provides improved support for hard links. The media is not automatically upgraded to the new format when you upgrade to OES SP1 NetWare or NetWare 6.5 SP4.

See Upgrading the Media Format (NetWare) in the Novell Storage Services File System Administration Guide for OES for more information and to determine if you should upgrade your NSS media format.