1.2 Disaster Recovery Implementations

Stretch clusters and clusters of clusters are two approaches for making shared resources available across geographically distributed sites so that a second site can be called into action after one site fails. Before choosing an approach, you must understand how the applications you use and the storage subsystems in your network deployment determine whether a stretch cluster or a cluster of clusters is possible for your environment.

1.2.1 LAN-Based versus Internet-Based Applications

Traditional LAN applications require a LAN infrastructure that must be replicated at each site, and might require relocation of employees to allow the business to continue. Internet-based applications allow employees to work from any place that offers an Internet connection, including homes and hotels. Moving applications and services to the Internet frees corporations from the restrictions of traditional LAN-based applications.

By using Novell exteNd Director portal services, Novell Access Manager, and ZENworks, you can deliver all services, applications, and data through the Internet. If service is lost at one site, users retain full access to the services and data from the other mirrored sites by virtue of the ubiquity of the Internet.

1.2.2 Host-Based versus Storage-Based Data Mirroring

For clustering implementations that are deployed in data centers in different geographic locations, the data must be replicated between the storage subsystems at each data center. Data-block replication can be performed by host-based mirroring, which provides synchronous replication over short distances of up to 10 km. More typically, replication of data blocks between the storage systems in the data centers is performed by SAN hardware, which allows synchronous mirrors over greater distances.
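
These distance limits follow from signal latency. Light travels through optical fiber at roughly 200,000 km/s, or about 5 microseconds per kilometer one way, and a synchronous write cannot be acknowledged until the remote copy is committed. As a rough illustration that ignores switch and array overhead: a 10 km link adds about 2 x 10 x 5 = 100 microseconds of round-trip delay to every write, whereas a 300 km link adds about 3 milliseconds, which is why synchronous mirroring over long distances requires specialized SAN hardware.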

For stretch clusters, host-based mirroring is required to provide synchronous mirroring of the SBD (split-brain detector) partition between sites. This means that stretch-cluster solutions are limited to distances of 10 km.
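
For example, Novell Cluster Services provides the sbdutil utility for creating SBD partitions. The following is a minimal sketch of creating a host-mirrored SBD for a stretch cluster, assuming two small shared LUNs (here sdc at site 1 and sdd at site 2, both example device names) that are visible to all nodes; verify the exact sbdutil options for your OES 2 Linux release before using them:

    # Create the SBD partition for cluster CLUS1 on two devices so that
    # Novell Cluster Services software-mirrors it between the sites.
    # (Device names and cluster name are examples; run 'man sbdutil' on
    # your OES 2 Linux node to confirm the options for your release.)
    sbdutil -c -n CLUS1 -d sdc -d sdd

    # Confirm that the nodes can find and read the SBD partition.
    sbdutil -f    # locate the SBD partition for this cluster
    sbdutil -v    # view the entries stored in the SBD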

Table 1-1 compares the benefits and limitations of host-based and storage-based mirroring.

Table 1-1 Comparison of Host-Based and Storage-Based Data Mirroring

Geographic distance between sites
  Host-based mirroring: Up to 10 km.
  Storage-based mirroring: Can be up to and over 300 km. The actual distance is limited only by the SAN hardware and media interconnects for your deployment.

Mirroring the SBD partition
  Host-based mirroring: Yes; an SBD can be mirrored between two sites.
  Storage-based mirroring: Yes, if mirroring is supported by the SAN hardware and media interconnects for your deployment.

Synchronous data-block replication of data between sites
  Host-based mirroring: Yes.
  Storage-based mirroring: Yes; requires a Fibre Channel SAN or iSCSI SAN.

Failover support
  Host-based mirroring: No additional configuration of the hardware is required.
  Storage-based mirroring: Requires additional configuration of the SAN hardware.

Failure of the site interconnect
  Host-based mirroring: LUNs can become primary at both locations (split-brain problem).
  Storage-based mirroring: Clusters continue to function independently. Minimizes the chance of LUNs at both locations becoming primary (split-brain problem).

SMI-S compliance
  Host-based mirroring: If the storage subsystems are not SMI-S compliant, the storage subsystems must be controllable by scripts running on the nodes of the cluster.
  Storage-based mirroring: If the storage subsystems are not SMI-S compliant, the storage subsystems must be controllable by scripts running on the nodes of the cluster.
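
When the storage subsystems are not SMI-S compliant, the scripts mentioned in the last row typically wrap whatever command-line tool the array vendor provides. The following Bash sketch illustrates the idea only; vendorcli, its arguments, and the mirror-group name are hypothetical placeholders for your SAN vendor's actual tools:

    #!/bin/bash
    # Hypothetical sketch: promote the local copy of a mirrored LUN to
    # primary so that cluster resources can be brought online at this site.
    # Replace 'vendorcli' and its arguments with your array vendor's CLI.

    MIRROR_GROUP="site2-data-mirror"    # example mirror-group name

    # Ask the array to promote the secondary copy at this site.
    vendorcli mirror promote --group "$MIRROR_GROUP" || exit 1

    # Rescan the SCSI bus so the node sees the now-writable LUN.
    # (rescan-scsi-bus.sh ships with the SCSI utilities on SLES-based OES.)
    rescan-scsi-bus.sh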

1.2.3 Stretch Clusters versus Cluster of Clusters

A stretch cluster and a cluster of clusters are two clustering implementations that you can use with Novell Cluster Services to achieve your desired level of disaster recovery. This section describes each deployment type, then compares the capabilities of each.

Novell Business Continuity Clustering automates some of the configuration and processes used in a cluster of clusters. For information, see Section 1.3, Business Continuity Clustering.

Stretch Clusters

A stretch cluster consists of a single cluster where the nodes are located in two geographically separate data centers. All nodes in the cluster must be in the same Novell eDirectory tree, which requires the eDirectory replica ring to span data centers. The IP addresses for nodes and cluster resources in the cluster must share a common IP subnet.

At least one storage system must reside in each data center. The data is replicated between locations by using host-based mirroring or storage-based mirroring. For information about using mirroring solutions for data replication, see Section 1.2.2, Host-Based versus Storage-Based Data Mirroring. Link latency can occur between nodes at different sites, so the heartbeat tolerance between nodes of the cluster must be increased to allow for the delay.
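
Cluster protocol settings such as the heartbeat and tolerance are configured in the cluster's properties (for example, in iManager under the Clusters task). Before and after raising the tolerance, you can observe how the link behaves from any node by using the Novell Cluster Services console commands; a brief sketch, assuming the standard cluster command on OES 2 Linux:

    # Display this node's view of cluster membership.
    cluster view

    # Show heartbeat statistics for the cluster nodes; watch these over
    # time to see whether link latency is approaching the tolerance value.
    cluster stats display

    # Reset the statistics before taking a fresh measurement.
    cluster stats clear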

The split-brain detector (SBD) is mirrored between the sites. Failure of the site interconnect can result in LUNs becoming primary at both locations (split brain problem) if host-based mirroring is used.

In the stretch-cluster architecture shown in Figure 1-1, the data is mirrored between two data centers that are geographically separated. The server nodes in both data centers are part of one cluster, so that if a disaster occurs in one data center, the nodes in the other data center automatically take over.

Figure 1-1 Stretch Cluster

Cluster of Clusters

A cluster of clusters consists of multiple clusters in which each cluster is located in a geographically separate data center. Each cluster can be in different Organizational Unit (OU) containers in the same eDirectory tree, or in different eDirectory trees. Each cluster can be in a different IP subnet.

A cluster of clusters provides the ability to fail over selected cluster resources, or all cluster resources, from one cluster to another cluster. For example, the cluster resources in one cluster can fail over to separate clusters by using a multiple-site fan-out failover approach. A given service can be provided by multiple clusters. Resource configurations must be manually replicated to each peer cluster and kept synchronized. Failover between clusters requires manual management of the storage systems and the cluster.
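
In practice, a manual cross-cluster failover combines SAN-level and cluster-level steps. The following sketch assumes the standard cluster command on OES 2 Linux and an example resource named POOL1_SERVER; the storage-promotion step depends entirely on your SAN vendor's tools (for example, a script like the one sketched in Section 1.2.2):

    # 1. At the surviving site, make the mirrored LUN writable (primary)
    #    by using your SAN vendor's tools or management interface.

    # 2. Bring the replicated resource online in the surviving cluster.
    #    (POOL1_SERVER is an example resource name.)
    cluster online POOL1_SERVER

    # 3. Verify that the resource is running and the cluster is healthy.
    cluster resources
    cluster status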

Nodes in each cluster access only the storage systems co-located in the same data center. Typically, data is replicated by using storage-based mirroring. Each cluster has its own SBD partition. The SBD partition is not mirrored across the sites, which minimizes the chance of a split-brain problem occurring when host-based mirroring is used. For information about using mirroring solutions for data replication, see Section 1.2.2, Host-Based versus Storage-Based Data Mirroring.

In the cluster-of-clusters architecture shown in Figure 1-2, the data is synchronized by the SAN hardware between two data centers that are geographically separated. If a disaster occurs in one data center, the cluster in the other data center takes over.

Figure 1-2 Cluster of Clusters

Comparison of Stretch Clusters and Cluster of Clusters

Table 1-2 compares the capabilities of a stretch cluster and a cluster of clusters.

Table 1-2 Comparison of Stretch Cluster and Cluster of Clusters

Number of clusters
  Stretch cluster: One.
  Cluster of clusters: Two or more.

Number of geographically separated data centers
  Stretch cluster: Two.
  Cluster of clusters: Two or more.

eDirectory trees
  Stretch cluster: Single tree only; requires the replica ring to span data centers.
  Cluster of clusters: One or multiple trees.

eDirectory Organizational Units (OUs)
  Stretch cluster: Single OU container for all nodes. As a best practice, place the cluster container in an OU separate from the rest of the tree.
  Cluster of clusters: Each cluster can be in a different OU; each cluster is in a single OU container. As a best practice, place each cluster container in an OU separate from the rest of the tree.

IP subnet
  Stretch cluster: IP addresses for nodes and cluster resources must be in a single IP subnet. Because the subnet spans multiple locations, you must ensure that your switches handle gratuitous ARP (Address Resolution Protocol); a test sketch follows this table.
  Cluster of clusters: IP addresses in a given cluster are in a single IP subnet. Each cluster can use the same or different IP subnets. If you use the same subnet for all clusters in the cluster of clusters, you must ensure that your switches handle gratuitous ARP.

SBD partition
  Stretch cluster: A single SBD is mirrored between the two sites by using host-based mirroring, which limits the distance between data centers to 10 km.
  Cluster of clusters: Each cluster has its own SBD. Each cluster can have an on-site mirror of its SBD for high availability. If the cluster of clusters uses host-based mirroring, the SBD is not mirrored between sites, which minimizes the chance of LUNs at both locations becoming primary.

Failure of the site interconnect if using host-based mirroring
  Stretch cluster: LUNs might become primary at both locations (split-brain problem).
  Cluster of clusters: Clusters continue to function independently.

Storage subsystem
  Stretch cluster: Each cluster accesses only the storage subsystem on its own site.
  Cluster of clusters: Each cluster accesses only the storage subsystem on its own site.

Data-block replication between sites (for information about data replication solutions, see Section 1.2.2, Host-Based versus Storage-Based Data Mirroring)
  Stretch cluster: Yes; typically uses storage-based mirroring, but host-based mirroring is possible for distances up to 10 km.
  Cluster of clusters: Yes; typically uses storage-based mirroring, but host-based mirroring is possible for distances up to 10 km.

Clustered services
  Stretch cluster: A single service instance runs in the cluster.
  Cluster of clusters: Each cluster can run an instance of the service.

Cluster resource failover
  Stretch cluster: Automatic failover to preferred nodes at the other site.
  Cluster of clusters: Manual failover to preferred nodes on one or multiple clusters (multiple-site fan-out failover). Failover requires additional configuration.

Cluster resource configurations
  Stretch cluster: Configured for a single cluster.
  Cluster of clusters: Configured for the primary cluster that hosts the resource, then the configuration is manually replicated to the peer clusters.

Cluster resource configuration synchronization
  Stretch cluster: Controlled by the master node.
  Cluster of clusters: Manual process that can be tedious and error-prone.

Failover of cluster resources between clusters
  Stretch cluster: Not applicable.
  Cluster of clusters: Manual management of the storage systems and the cluster.

Link latency between sites
  Stretch cluster: Can cause false failovers. The cluster heartbeat tolerance between master and slave must be increased to as high as 30 seconds; monitor cluster heartbeat statistics, then tune down as needed.
  Cluster of clusters: Each cluster functions independently in its own geographical site.
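
As noted in the IP subnet row above, switches must pass gratuitous ARP so that clients learn the new location of a failed-over IP address. One way to spot-check this is with the arping utility from iputils, which is available on OES 2 Linux; the interface and address below are examples:

    # From the node that owns the resource IP address, send gratuitous
    # ARP announcements for it (-U is unsolicited/gratuitous mode).
    arping -U -c 3 -I eth0 10.10.10.44

    # On a host attached to another switch, confirm that the ARP cache
    # entry for the address was updated.
    ip neigh show | grep 10.10.10.44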

Evaluating Disaster Recovery Implementations for Clusters

Table 1-3 illustrates why a cluster-of-clusters solution is less problematic to deploy than a stretch-cluster solution. Manual configuration is not a problem when you use Novell Business Continuity Clustering for your cluster of clusters.

Table 1-3 Advantages and Disadvantages of Stretch Clusters versus Cluster of Clusters

Stretch Cluster

Advantages:

  • It automatically fails over when configured with host-based mirroring.

  • It is easier to manage than separate clusters.

  • Cluster resources can fail over to nodes in any site.

Disadvantages:

  • The eDirectory partition must span the sites.

  • Failure of the site interconnect can result in LUNs becoming primary at both locations (split-brain problem) if host-based mirroring is used.

  • An SBD partition must be mirrored between sites.

  • It accommodates only two sites.

  • All IP addresses must reside in the same subnet.

Other Considerations:

  • Host-based mirroring is required to mirror the SBD partition between sites.

  • Link variations can cause false failovers.

  • You could consider partitioning the eDirectory tree to place the cluster container in a partition separate from the rest of the tree.

  • The cluster heartbeat tolerance between master and slave must be increased to accommodate link latency between sites.

    You can set this as high as 30 seconds, monitor cluster heartbeat statistics, and then tune down as needed.

  • Because all IP addresses in the cluster must be on the same subnet, you must ensure that your switches handle gratuitous ARP.

    Contact your switch vendor or consult your switch documentation for more information.

Cluster of Clusters

Advantages:

  • eDirectory partitions don't need to span the data centers.

  • Each cluster can be in different OUs in the same eDirectory tree.

  • IP addresses for each cluster can be on different IP subnets.

  • Cluster resources can fail over to separate clusters (multiple-site fan-out failover support).

  • Each cluster has its own SBD.

    Each cluster can have an on-site mirror of its SBD for high availability.

    If the cluster of clusters uses host-based mirroring, the SBD is not mirrored between sites, which minimizes the chance of LUNs at both locations becoming primary.

Disadvantages:

  • Resource configurations must be manually synchronized.

  • Storage-based mirroring requires additional configuration steps.

Other Considerations:

  • Depending on the platform used, storage arrays must be controllable by scripts that run on OES 2 Linux if the SANs are not SMI-S compliant.