1.1 Cluster Services Features

Here are some of the key features provided by NetWare Cluster Services (NCS):

1.1.1 Multi-Node Distributed Failover

When a node fails, distributed failover allows applications and services to be distributed to multiple surviving servers to prevent an overload of any single node. Multi-node failover support, known as Fan-out Failover™, allows users to have continued access to their resources even in the event of major failures where more than one node in the cluster goes down.

The cluster’s resources can be configured such that if one or more of the cluster nodes ever fail, any surviving nodes in the cluster can take over for the failed nodes. In clusters that use shared disk storage systems, the surviving nodes remount the failed nodes’ volumes and activate any cluster resources that had been provided by the failed nodes, such as server-based applications.
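The fan-out policy described above can be sketched as a toy redistribution in Python. The `fan_out` helper and the node and resource names are illustrative assumptions, not part of the NCS API; they only show how a failed node’s resources can be spread across survivors instead of landing on one server:

```python
# Toy simulation of Fan-out Failover: a failed node's resources are
# redistributed round-robin across the surviving nodes so that no
# single node absorbs the whole load. Names are illustrative only.

def fan_out(resources, survivors):
    """Assign each orphaned resource to a surviving node, round-robin."""
    return {res: survivors[i % len(survivors)]
            for i, res in enumerate(resources)}

if __name__ == "__main__":
    failed_node_resources = ["VOL_DATA", "GROUPWISE", "WEB", "DHCP"]
    surviving_nodes = ["NODE2", "NODE3", "NODE4"]
    for res, node in fan_out(failed_node_resources, surviving_nodes).items():
        print(f"{res} -> {node}")
```

A real failover policy would weigh each node’s current load rather than rotate blindly, but the round-robin sketch captures the core idea: the work is divided, not dumped on one survivor.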

1.1.2 Transparent Client Reconnect

When a node failure occurs, any NetWare clients that were logged into that NetWare server are automatically and transparently reconnected to the surviving node in the cluster responsible for taking over the failed node’s data. Transparent client reconnect preserves users’ drive mappings when their volumes are remounted on a surviving server.

This feature also supports open files and file locks on Win95/98 clients, depending on the operational mode of the file. Additionally, server-based applications with transaction tracking can be configured such that, when a node failure occurs, transactions proceed uninterrupted by the failure and unnoticed by the user.

1.1.3 No Downtime for Server Maintenance

NetWare Cluster Services provides users continual access to the network resources provided by a cluster, even during routine maintenance shutdowns of individual NetWare servers within the cluster. Before shutting down a server for maintenance, network administrators can migrate that server’s cluster resources to another NetWare server in the cluster. As a result, users experience no interruption in access to or service from the network resources provided by the cluster.

1.1.4 Automatic Trustee Migration

On NetWare 5.x, when a cluster node takes over cluster resources for a failed node, the Novell eDirectory trustee rights from the failed node are automatically transferred to the surviving node. This allows users to have continued access to the data and services provided to them by their trustee rights.

With NetWare 6, all trustee rights are represented by a globally unique ID that does not require translation between servers. Therefore, the trustee migration process is redundant in NetWare 6 clusters and is no longer used.

1.1.5 Split Brain Detection

The patent-pending Split Brain Detector (SBD) avoids data corruption during split brain conditions by preventing multiple network servers in a cluster from trying to mount and access the same volume of a failed node. A split brain condition exists when a disruption in LAN communication makes it impossible for normal inter-node communication to take place within the cluster.

In this event, certain nodes can become isolated from the others, so that each separated group determines that it contains the only surviving nodes. This creates a dangerous situation because all nodes might still have access to the shared data storage system. If two separate nodes then access the same volume, data could be corrupted. NetWare Cluster Services’ SBD can detect and resolve these conditions, ensuring that no data corruption occurs.
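The resolution step can be illustrated with a toy tie-breaking rule in Python. NCS’s actual SBD arbitrates through the shared storage system, so the majority-vote rule and the `surviving_partition` helper below are illustrative assumptions only; what matters is that exactly one partition is allowed to keep the shared volumes:

```python
def surviving_partition(partitions):
    """Pick the single partition allowed to keep the shared storage.

    Toy rule: the largest partition wins; a tie is broken in favor of
    the partition containing the lowest-numbered node. Every losing
    partition must abandon the shared disks, preventing two isolated
    groups from mounting the same volume.
    """
    return max(partitions, key=lambda p: (len(p), -min(p)))

if __name__ == "__main__":
    # A LAN fault has split a five-node cluster into two groups.
    groups = [[1, 2, 3], [4, 5]]
    print("Survivor:", surviving_partition(groups))
```

The design point is deterministic arbitration: every node, working only from what it can still observe, must reach the same answer about which partition survives, or two groups could each decide to mount the disks.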

1.1.6 NSS Support

NetWare Cluster Services support for Novell Storage Services™ (NSS) volumes in shared storage systems considerably reduces the time it takes to remount a volume from a failed node on a surviving node in the cluster. Faster volume remounting facilitates the transparent migration of resources to surviving nodes.

1.1.7 Single-Point Administration

NetWare Cluster Services leverages the power of Novell eDirectory by taking advantage of its single-point administration capabilities. Cluster resource information, protocols, and policies are maintained in eDirectory cluster containers and objects and can be managed through ConsoleOne®, Novell’s Java*-based management tool. Additionally, network administrators can use a Web browser to view the current status of the cluster’s servers and resources from any Internet connection.

1.1.8 Epoch Number

The epoch number indicates the number of times the cluster state has changed. The cluster state changes every time a server joins or leaves the cluster. For example, when a group of clustered nodes is activated and the nodes register with each other, an epoch event is logged. When a node subsequently joins or leaves the cluster, the epoch number is incremented by one. In this way, changes within the cluster are identified and documented so that they can be monitored and managed. This process is described more fully by NCS_EPOCH_CHANGE_CALLBACK.
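As a rough illustration, the epoch bookkeeping can be modeled in Python. The `ClusterMembership` class and its callback registration are hypothetical stand-ins, loosely mirroring the kind of notification that NCS delivers through NCS_EPOCH_CHANGE_CALLBACK; they are not the actual NCS interface:

```python
class ClusterMembership:
    """Toy membership tracker: the epoch number is incremented on every
    join or leave, and registered callbacks are notified of each change.
    Illustrative only; not the real NCS data structures."""

    def __init__(self):
        self.epoch = 0
        self.members = set()
        self._callbacks = []

    def on_epoch_change(self, fn):
        """Register fn(epoch, members) to run after every state change."""
        self._callbacks.append(fn)

    def _bump(self):
        self.epoch += 1
        for fn in self._callbacks:
            fn(self.epoch, frozenset(self.members))

    def join(self, node):
        self.members.add(node)
        self._bump()

    def leave(self, node):
        self.members.discard(node)
        self._bump()

if __name__ == "__main__":
    cluster = ClusterMembership()
    cluster.on_epoch_change(
        lambda e, m: print(f"epoch {e}: members {sorted(m)}"))
    cluster.join("NODE1")
    cluster.join("NODE2")
    cluster.leave("NODE1")
```

Because the epoch only ever increases, any component comparing epoch numbers can tell whether its view of the membership is stale, which is what makes the counter useful for monitoring.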