1.4 Clustering for High Availability

An OES Cluster Services for Linux cluster consists of the following components:

- 2 to 32 OES servers, each containing at least one local disk device.
- Cluster Services software running on each Linux server in the cluster.
- A shared disk subsystem connected to all servers in the cluster (optional, but recommended for most configurations).
- Equipment to connect servers to the shared disk subsystem, such as one of the following:
  - High-speed Fibre Channel cards, cables, and switches for a Fibre Channel SAN
  - Ethernet cards, cables, and switches for an iSCSI SAN
  - SCSI cards and cables for external SCSI storage arrays

The benefits that OES Cluster Services provides can be better understood through the following scenario.

Suppose you have configured a three-server cluster, with a web server installed on each of the three servers in the cluster. Each of the servers in the cluster hosts two websites. All the data, graphics, and web page content for each website are stored on a shared disk system connected to each of the servers in the cluster. Figure 1-1 depicts how this setup might look.

Figure 1-1 Three-Server Cluster

During normal cluster operation, each server is in constant communication with the other servers in the cluster and performs periodic polling of all registered resources to detect failure.
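The detection idea described above can be illustrated with a minimal sketch. It is not the Cluster Services implementation: the class name, the node names, and the 8-second tolerance are assumptions chosen for illustration. Each node's most recent heartbeat time is recorded, and a node that has been silent for longer than the tolerance is reported as failed.

```python
from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Hypothetical monitor: remembers when each node was last heard from."""
    tolerance: float = 8.0  # assumed value, in seconds; not taken from the product
    last_seen: dict = field(default_factory=dict)

    def record(self, node: str, now: float) -> None:
        """Store the time (in seconds) of the latest heartbeat from a node."""
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> list:
        """Return the nodes that have been silent for longer than the tolerance."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.tolerance
        )


monitor = HeartbeatMonitor()
for node in ("web1", "web2", "web3"):
    monitor.record(node, now=0.0)

# web2 and web3 keep reporting; web1 goes silent.
monitor.record("web2", now=9.0)
monitor.record("web3", now=9.0)

print(monitor.failed_nodes(now=10.0))  # ['web1']
```

Passing the clock value in as a parameter, rather than reading the system time, keeps the sketch deterministic and easy to test.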

Suppose Web Server 1 experiences hardware or software problems, and the users who depend on Web Server 1 for Internet access, email, and information lose their connections. Figure 1-2 shows how resources are moved when Web Server 1 fails.

Figure 1-2 Three-Server Cluster after One Server Fails

Web Site A moves to Web Server 2 and Web Site B moves to Web Server 3. IP addresses and certificates also move to Web Server 2 and Web Server 3.

When you configured the cluster, you decided where the websites hosted on each web server would go if a failure occurred. You configured Web Site A to move to Web Server 2 and Web Site B to move to Web Server 3. This way, the workload once handled by Web Server 1 is evenly distributed between the two surviving servers.
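One way to picture this configuration is as an ordered list of preferred servers for each website. The following sketch is illustrative only: the `PREFERRED_NODES` table and the `failover_target` function are hypothetical names, and the third entry in each list is an assumption that goes beyond the scenario. A resource goes to the first server in its list that is still active.

```python
# Hypothetical preference table: most preferred server first.
PREFERRED_NODES = {
    "Web Site A": ["Web Server 1", "Web Server 2", "Web Server 3"],
    "Web Site B": ["Web Server 1", "Web Server 3", "Web Server 2"],
}


def failover_target(resource, active_nodes):
    """Return the first preferred server that is still active, or None."""
    for node in PREFERRED_NODES[resource]:
        if node in active_nodes:
            return node
    return None


# Web Server 1 has failed, so only two servers remain active.
survivors = {"Web Server 2", "Web Server 3"}
for site in PREFERRED_NODES:
    print(site, "->", failover_target(site, survivors))
# Web Site A -> Web Server 2
# Web Site B -> Web Server 3
```

Giving the two sites different second choices is what spreads the load of the failed server across both survivors.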

When Web Server 1 failed, Cluster Services software did the following:

- Detected a failure.
- Remounted the shared data directories (that were formerly mounted on Web Server 1) on Web Server 2 and Web Server 3 as specified.
- Restarted applications (that were running on Web Server 1) on Web Server 2 and Web Server 3 as specified.
- Transferred IP addresses to Web Server 2 and Web Server 3 as specified.
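The sequence of actions listed above can be sketched as a small simulation. Nothing here calls real Cluster Services software: the `Resource` class, the `fail_over` function, the mount paths, and the IP addresses (taken from the 192.0.2.0/24 documentation range) are all hypothetical. The sketch only records, in order, the actions that would be taken for each resource hosted on the failed server.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    volume: str   # shared data directory (hypothetical path)
    ip: str       # hypothetical address from a documentation range
    node: str     # server currently hosting the resource
    target: str   # server configured to take over on failure


def fail_over(resources, failed_node):
    """Move every resource off failed_node and return the ordered action log."""
    log = [f"detected failure of {failed_node}"]
    for res in resources:
        if res.node != failed_node:
            continue  # resources on healthy servers are left alone
        log.append(f"mount {res.volume} on {res.target}")
        log.append(f"start {res.name} on {res.target}")
        log.append(f"bind {res.ip} on {res.target}")
        res.node = res.target
    return log


resources = [
    Resource("Web Site A", "/mnt/site_a", "192.0.2.11", "Web Server 1", "Web Server 2"),
    Resource("Web Site B", "/mnt/site_b", "192.0.2.12", "Web Server 1", "Web Server 3"),
    Resource("Web Site C", "/mnt/site_c", "192.0.2.13", "Web Server 2", "Web Server 3"),
]

for line in fail_over(resources, "Web Server 1"):
    print(line)
```

In the sketch, the data is mounted before the application starts, and the IP address is bound last, so clients are redirected only once the site is ready to serve them. This mirrors the order of the list above; the real product's load order is defined by each resource's configuration.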

In this example, the failover process happened quickly. Users regained access to website information within seconds and, in most cases, without logging in again.

Now suppose the problems with Web Server 1 are resolved, and Web Server 1 is returned to a normal operating state. Web Site A and Web Site B automatically fail back (that is, they are moved back to Web Server 1) if failback is configured for those resources, and web server operation returns to the way it was before Web Server 1 failed.
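The condition "if failback is configured for those resources" can be made concrete with a short sketch. The `Placement` class, its `auto_failback` flag, and the `on_node_rejoin` function are hypothetical names used for illustration, not product settings. When a server rejoins, only the resources whose home is that server and that have automatic failback enabled are moved back.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    resource: str
    home: str            # most preferred server for the resource
    current: str         # server hosting the resource right now
    auto_failback: bool  # hypothetical per-resource setting


def on_node_rejoin(placements, node):
    """Move eligible resources back to node; return the names that moved."""
    moved = []
    for p in placements:
        if p.home == node and p.current != node and p.auto_failback:
            p.current = node
            moved.append(p.resource)
    return moved


placements = [
    Placement("Web Site A", "Web Server 1", "Web Server 2", auto_failback=True),
    Placement("Web Site B", "Web Server 1", "Web Server 3", auto_failback=False),
]

print(on_node_rejoin(placements, "Web Server 1"))  # ['Web Site A']
```

In this sketch, Web Site B stays on Web Server 3 until an administrator moves it, because automatic failback is disabled for it.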

OES Cluster Services also provides resource migration capabilities. You can move applications, websites, and so on to other servers in your cluster without waiting for a server to fail.

For example, you could manually move Web Site A or Web Site B from Web Server 1 to either of the other servers in the cluster. You might want to do this to upgrade or perform scheduled maintenance on Web Server 1, or to increase performance or accessibility of the websites.