1.2 Disaster Recovery Implementations

Novell Cluster Services supports two main implementations for achieving your desired level of disaster recovery: a stretch cluster and a cluster of clusters. The Novell Business Continuity Clustering product automates some of the configuration and processes used in a cluster of clusters.

1.2.1 Stretch Clusters vs. Cluster of Clusters

Stretch Clusters

A stretch cluster consists of one cluster in which the nodes in the cluster are located in geographically separate areas. All nodes in the cluster must be in the same eDirectory™ tree. In this architecture, the data is mirrored between two data centers that are geographically separated. All the machines in both data centers are part of one cluster, so that if a disaster occurs in one data center, the other automatically takes over.

Figure 1-1 Stretch Cluster

Cluster of Clusters

A cluster of clusters consists of two or more clusters in which each cluster is located in a geographically separate area. A cluster of clusters provides the ability to fail over selected cluster resources or all cluster resources from one cluster to another cluster. Typically, replication of data blocks between SANs is performed by SAN hardware, but it can be done by host-based mirroring for synchronous replication over short distances.

Figure 1-2 Cluster of Clusters

Implementation Comparison

Table 1-1 Disaster Recovery Implementation Comparison

Advantages

Stretch Cluster:

  • It automatically fails over.

  • It is easier to manage than separate clusters.

Cluster of Clusters:

  • The chance of LUNs at both locations becoming primary is minimized.

  • eDirectory partitions don't need to span the cluster.

  • Each cluster can be in a separate eDirectory tree.

  • IP addresses for each cluster can be on different IP subnets.

  • It accommodates more than two sites, and cluster resources can fail over to separate clusters (multiple-site fan-out failover support).

  • SBD partitions are not mirrored between sites.

Disadvantages

Stretch Cluster:

  • Failure of site interconnect can result in LUNs becoming primary at both locations (split brain problem) if host-based mirroring is used.

  • An SBD partition must be mirrored between sites.

  • It accommodates only two sites.

  • All IP addresses must reside in the same subnet.

  • The eDirectory partition must span the cluster.

Cluster of Clusters:

  • Resource configurations must be kept in sync manually.

Other Considerations

Stretch Cluster:

  • Host-based mirroring is required to mirror the SBD partition between sites.

  • Link variations can cause false failovers.

  • You could consider partitioning the eDirectory tree to place the cluster container in a partition separate from the rest of the tree.

  • The cluster heartbeat must be increased to accommodate link latency between sites.

    You can set this as high as 30 seconds, monitor cluster heartbeat statistics, and then tune down as needed.

  • Because all IP addresses in the cluster must be on the same subnet, you must ensure that your routers handle gratuitous ARP.

    Contact your router vendor or consult your router documentation for more information.

Cluster of Clusters:

  • Depending on the platform used, storage arrays must be controllable by scripts that run on NetWare® or Linux if the SANs are not SMI-S compliant.
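The hard constraints in Table 1-1 can be read as a checklist. The following Python sketch is only an illustration of that reading and is not part of any Novell tool: the `Requirements` fields and both function names are hypothetical, and the checks cover just the stretch-cluster requirements stated in this section (two sites at most, one IP subnet, one eDirectory tree, and a mirrored SBD partition).

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    """Hypothetical summary of a deployment; field names are illustrative."""
    sites: int                    # number of geographically separate sites
    single_subnet_possible: bool  # can all cluster IP addresses share one subnet?
    single_tree: bool             # are all nodes in the same eDirectory tree?
    can_mirror_sbd: bool          # can the SBD partition be mirrored between sites?


def stretch_cluster_blockers(req: Requirements) -> list[str]:
    """Return the stated constraints that rule out a stretch cluster."""
    blockers = []
    if req.sites > 2:
        blockers.append("accommodates only two sites")
    if not req.single_subnet_possible:
        blockers.append("all IP addresses must reside in the same subnet")
    if not req.single_tree:
        blockers.append("all nodes must be in the same eDirectory tree")
    if not req.can_mirror_sbd:
        blockers.append("an SBD partition must be mirrored between sites")
    return blockers


def suggest(req: Requirements) -> str:
    """Suggest an implementation based only on the hard constraints above."""
    if stretch_cluster_blockers(req):
        return "cluster of clusters"
    # Nothing rules out a stretch cluster; the softer trade-offs in the
    # table (manageability, split-brain risk) still need human judgment.
    return "either"


if __name__ == "__main__":
    four_sites = Requirements(sites=4, single_subnet_possible=False,
                              single_tree=True, can_mirror_sbd=True)
    print(suggest(four_sites), stretch_cluster_blockers(four_sites))
```

A result of "either" means only that no listed requirement excludes a stretch cluster. The remaining advantages and disadvantages in the table still have to be weighed for the specific environment.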

1.2.2 Novell Business Continuity Clusters

A Novell Business Continuity Cluster is a cluster of clusters like the one described in Section 1.2.1, except that cluster configuration, maintenance, and synchronization are automated by specialized software.

Novell Business Continuity Clustering software is an integrated set of tools that automates the setup and maintenance of a business continuity infrastructure. Unlike competing solutions that attempt to build stretch clusters, Novell Business Continuity Clustering uses a cluster of clusters. Each site has its own independent clusters, and the clusters at each of the geographically separate sites are treated as “nodes” in a larger cluster, so a whole site can do fan-out failover to multiple other sites. Although this can be done manually with a cluster of clusters, Novell Business Continuity Clustering automates it by using eDirectory and policy-based management of the resources and storage systems.

Novell Business Continuity Clustering software provides the following advantages:

  • Integrates with SAN hardware devices to automate the failover process by using standards-based mechanisms such as SMI-S.

  • Utilizes Novell Identity Manager technology to automatically synchronize and transfer cluster-related eDirectory objects from one cluster to another.

  • Provides the capability to fail over as few as one cluster resource, or as many as all cluster resources.

  • Includes intelligent failover that lets you do site failover testing as a standard practice.

  • Provides scripting capability for enhanced control and customization.

  • Provides simplified business continuity cluster configuration and management by using the browser-based iManager management tool.

  • Runs on Linux* and NetWare.
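To make the idea of clusters acting as “nodes” in a larger cluster more concrete, the following Python sketch models fan-out failover of cluster resources between sites. It is a simplified illustration and not the product's actual algorithm or API: the function name, the data structures, and the assumption that each resource carries an ordered list of preferred peer clusters are all hypothetical.

```python
def fan_out_failover(failed, placement, preferred_peers, online):
    """Move every resource hosted on `failed` to its first online preferred peer.

    failed          -- name of the cluster (site) that went down
    placement       -- dict mapping resource name -> cluster currently hosting it
    preferred_peers -- dict mapping resource name -> ordered list of peer clusters
                       (an assumed policy format, for illustration only)
    online          -- set of cluster names that are still reachable

    Returns a new placement dict; the input is not modified. A resource with
    no online peer stays assigned to the failed cluster (that is, unserved).
    """
    new_placement = dict(placement)
    for resource, cluster in placement.items():
        if cluster != failed:
            continue  # resources on healthy clusters stay where they are
        for peer in preferred_peers.get(resource, []):
            if peer in online and peer != failed:
                new_placement[resource] = peer
                break
    return new_placement


if __name__ == "__main__":
    placement = {"web": "siteA", "mail": "siteA", "db": "siteB"}
    peers = {"web": ["siteB", "siteC"], "mail": ["siteC", "siteB"], "db": ["siteA"]}
    # siteA fails; its resources fan out to different surviving sites.
    print(fan_out_failover("siteA", placement, peers, online={"siteB", "siteC"}))
```

In this example the two resources from the failed site land on two different surviving sites, which is the fan-out behavior. Failing over a single resource, or all of them, corresponds to how many entries in the placement point at the failed cluster.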

1.2.3 Usage Scenarios

Several Business Continuity Clustering usage scenarios can be used to achieve the desired level of disaster recovery. Three possible scenarios are described in this section.

Two-Site Business Continuity Cluster Solution

The two-site solution can be used in one of two ways:

  • A primary site in which all services are normally active, and a secondary site that is effectively idle, with the data mirrored to it and the applications and services ready to load if needed.

  • Two active sites each supporting different applications and services. Either site can take over for the other site at any time.

The first option is typically used when the purpose of the secondary site is primarily testing by the IT department. The second option is typically used in a company that has more than one large site of operations.

Figure 1-3 Two-Site Business Continuity Cluster

Multiple-Site Business Continuity Cluster Solution

This is a large Business Continuity Cluster solution capable of supporting up to 32 nodes per site and more than two sites. Services and applications can do fan-out failover between sites. Replication of data blocks is typically done by SAN vendors, but can be done by host-based mirroring for synchronous replication over short distances. The illustration below depicts a four-site business continuity cluster.

Figure 1-4 Multiple-Site Business Continuity Cluster

Using the Novell Portal Services, iChain®, and ZENworks® products, all services, applications, and data can be rendered through the Internet, allowing for loss of service at one site but still providing full access to the services and data by virtue of the ubiquity of the Internet. Data and services continue to be available from the other mirrored sites. Moving applications and services to the Internet frees corporations from the restrictions of traditional LAN-based applications. Traditional LAN applications require a LAN infrastructure that must be replicated at each site, and might require relocation of employees to allow the business to continue. Internet-based applications allow employees to work from any place that offers an Internet connection, including homes and hotels.

Low-Cost Business Continuity Cluster Solution

The low-cost business continuity cluster solution is similar to the previous two solutions, but replaces Fibre Channel arrays with iSCSI arrays. Data block mirroring can be accomplished with either iSCSI-based block replication or host-based mirroring. In either case, snapshot technology can allow for asynchronous replication over long distances. However, the low-cost solution does not necessarily deliver the performance associated with higher-end Fibre Channel storage arrays.