Tech Talk 1 by Ken Baker
How Many Nines Do You Need?
The Site-to-Site High Availability of Novell Business Continuity Clustering
Preparing to Implement Novell Business Continuity Clustering
The first step in ensuring that your critical data and services remain available in the event of a disaster is to design your infrastructure based on your business needs. This means identifying the key system factors that drive your business. You need to determine which of your services are most critical to your operations, where those services currently run, and where they need to be running to ensure business continuity.
One of the keys to business continuity is to make sure your individual data center clusters are rock solid. However, your business continuity plans also need to take into consideration Local Area Network (LAN) connectivity, storage area network (SAN) connectivity, storage design and eDirectory design.
Your main goal for LAN connectivity in your clusters is to protect your heartbeat process to avoid false split-brain scenarios. The heartbeat process basically consists of a node sending out a ping to let other cluster members know that it is running as expected. If connectivity problems prevent a node’s heartbeat ping from reaching other nodes in the cluster, the cluster must decide that the node is not functioning and must cast that node out of the cluster, moving its cluster resources to another functioning node. But if the node that was cast out is actually still alive, both sides can try to assume responsibility for all the cluster resources. This split-brain condition results in diverging sets of data on those servers and can even corrupt data on the shared disk.
To ensure that heartbeat pings reach other cluster members and avoid unnecessary resource migrations caused by split-brain scenarios, Novell employs a patented split-brain detection method that uses both LAN- and SAN-based communication to determine the true state of cluster nodes. So, even if a server loses LAN connectivity, Novell Cluster Services can still receive heartbeat pings via SAN-based communications and vice versa.
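The idea of combining two heartbeat channels can be illustrated with a minimal Python sketch. This is not Novell's patented method or any Novell Cluster Services API; the `HeartbeatStatus` type, the `classify_node` function and the 8-second tolerance are all hypothetical and chosen only for illustration. The sketch treats a node as failed only when its heartbeat is stale on both the LAN and the SAN channel.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatStatus:
    """Hypothetical record of how long ago a node was last heard from."""
    node: str
    lan_age_s: float  # seconds since the last heartbeat seen over the LAN
    san_age_s: float  # seconds since the last heartbeat seen via the SAN


def classify_node(status: HeartbeatStatus, tolerance_s: float = 8.0) -> str:
    """Classify a node using both channels (tolerance is an assumed value)."""
    lan_ok = status.lan_age_s <= tolerance_s
    san_ok = status.san_age_s <= tolerance_s
    if lan_ok and san_ok:
        return "healthy"
    if lan_ok or san_ok:
        # One channel is silent, but the node is still alive on the other,
        # so migrating its resources would risk a split brain.
        return "degraded"
    # Silent on both channels: safe to treat the node as failed.
    return "failed"


if __name__ == "__main__":
    for s in (
        HeartbeatStatus("node1", lan_age_s=1.0, san_age_s=1.0),
        HeartbeatStatus("node2", lan_age_s=30.0, san_age_s=1.0),
        HeartbeatStatus("node3", lan_age_s=30.0, san_age_s=30.0),
    ):
        print(s.node, classify_node(s))
```

The design point is that a single silent channel is treated as a connectivity problem rather than as a node failure, which is what prevents an unnecessary resource migration.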
Even though Novell Cluster Services provides multiple paths for heartbeat communications, it is still a good idea to have redundant LAN communication paths between clients and cluster nodes. It is also recommended that you have a dedicated virtual LAN (VLAN) and a dedicated IP address range for each cluster. You should also have redundant links between your data center sites. These steps provide additional protection against unnecessary resource migrations and divergent data sets.
In terms of SAN connectivity, you need to ensure redundant access to the Split Brain Detection partition to avoid false SAN device alerts. You also need to ensure redundant connections to each data disk. This might mean connecting cluster nodes and storage systems via two independent fabrics, configuring two paths between cluster nodes and storage systems using native multipathing technology or vendor-specific solutions, or having a minimum of two mirror connections between storage systems over different fabrics and Wide Area Networks (WANs).
For storage design, you need to have independent resources for your failover. In a Business Continuity Clustering environment, the Logical Unit Number (LUN) is the failover unit. While it is possible to have multiple storage pools or partitions for each LUN, it is not recommended. The primary storage design principle is to have one storage pool per LUN. Also, as mentioned previously, data must be mirrored between data centers. You can implement host-based mirroring, but storage-based mirroring is recommended. If you use host-based mirroring, make sure that mirrored partitions are accessible only to the nodes of one of the member clusters at any given time.
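The one-pool-per-LUN principle lends itself to a simple planning check. The following Python sketch assumes you have written down your storage layout as a mapping from pool name to LUN identifier; the function name, the mapping format and the sample names are hypothetical and are not produced by any Novell tool.

```python
from collections import defaultdict


def luns_with_multiple_pools(pool_to_lun: dict[str, str]) -> dict[str, list[str]]:
    """Return each LUN that carries more than one storage pool."""
    pools_by_lun: dict[str, list[str]] = defaultdict(list)
    for pool, lun in pool_to_lun.items():
        pools_by_lun[lun].append(pool)
    # Keep only the LUNs that break the one-pool-per-LUN principle.
    return {
        lun: sorted(pools)
        for lun, pools in pools_by_lun.items()
        if len(pools) > 1
    }


if __name__ == "__main__":
    # Hypothetical layout: two pools share lun-02.
    layout = {
        "POOL_MAIL": "lun-01",
        "POOL_HOME": "lun-02",
        "POOL_APPS": "lun-02",
    }
    print(luns_with_multiple_pools(layout))
```

Because the LUN is the unit that fails over, any LUN reported here would move two pools together, which removes the independence the design principle is meant to give you.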
A major part of preparing for a Novell Business Continuity Clustering implementation will be figuring out what data to replicate. While you might want to mirror all the data from one data center SAN onto another data center SAN, it’s typically not feasible financially or operationally. You need to determine how much business data you need to replicate between your data centers, and that begins with an assessment of your various data sets.
You also need to determine the frequency of data replication. Do you need your data mirrored in real time, or is some level of latency acceptable? Your SAN vendor can be a valuable resource in helping you determine the best scenarios for your unique data replication needs.
Basic rules of thumb for real-time mirroring include having link speeds of 1 Gbps or better, Fibre Channel cable lengths of less than 200 kilometers between sites, and dedicated links. For distances greater than 200 kilometers, factors to consider include the amount of data being transferred, the bandwidth of the link, and whether or not snapshot technology is being used.
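These rules of thumb can be written down as a short checklist function. The Python sketch below is illustrative only: the function and parameter names are hypothetical, it reads the link-speed threshold as 1 gigabit per second, and it flags a distance of exactly 200 kilometers because the rule asks for less than that.

```python
def mirroring_concerns(
    link_gbps: float, distance_km: float, dedicated_link: bool
) -> list[str]:
    """List which real-time mirroring rules of thumb a site link misses."""
    concerns = []
    if link_gbps < 1.0:
        concerns.append("link speed below 1 Gbit/s")
    if distance_km >= 200:
        # Beyond this point, data volume, bandwidth and snapshot use
        # need to be weighed rather than assuming real-time mirroring.
        concerns.append("distance of 200 km or more")
    if not dedicated_link:
        concerns.append("link is not dedicated")
    return concerns


if __name__ == "__main__":
    print(mirroring_concerns(link_gbps=2.0, distance_km=50, dedicated_link=True))
    print(mirroring_concerns(link_gbps=0.1, distance_km=350, dedicated_link=False))
```

An empty list means the link meets all three rules of thumb. A non-empty list does not rule out replication; it signals that real-time mirroring deserves a closer conversation with your SAN vendor.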
In terms of eDirectory design, you can have clusters in separate eDirectory trees, but the recommended configuration is to place all clusters in the same eDirectory tree. You can greatly simplify cluster administration by creating a separate Organizational Unit (OU) for each geographical cluster, referred to here as the cluster OU. You should also install all nodes of a cluster into that cluster OU and place the cluster object in this container as well.
One of your main goals for eDirectory design is to avoid a Novell Directory Services (NDS) Sync state by ensuring direct access to cluster configuration information for each cluster node. You can best accomplish this by partitioning the cluster OU and replicating it to eDirectory servers that hold a replica of the parent partition, and to all the cluster nodes. This will help prevent resources from staying in a Novell Directory Services Sync state when modifications are made to their configurations.
When designing clusters in a Novell Business Continuity Clustering environment, you’ll need a unique configuration for each cluster resource. This means making sure all possible IP addresses and volume IDs of a Business Continuity Clustering-enabled resource are unique across all Business Continuity Clustering peer clusters.
Each cluster must also have a unique name, even if the clusters reside in different eDirectory trees. Note that clusters cannot have the same name as any of the eDirectory trees in the business continuity cluster.
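The uniqueness rules above are easy to check against a planning worksheet before you build anything. The Python sketch below assumes a hypothetical data layout (a list of cluster names, a set of tree names, and a dictionary mapping each resource to every IP address and volume ID it could use on any peer cluster) and compares names exactly as written. It does not query eDirectory or any Novell tool.

```python
from collections import Counter, defaultdict


def find_conflicts(
    cluster_names: list[str],
    tree_names: set[str],
    resources: dict[str, dict[str, list[str]]],
) -> list[str]:
    """Return a sorted list of uniqueness violations in a cluster plan."""
    problems = []
    # Rule: every cluster name is unique, even across trees.
    for name, count in Counter(cluster_names).items():
        if count > 1:
            problems.append(f"duplicate cluster name: {name}")
        # Rule: a cluster may not share its name with an eDirectory tree.
        if name in tree_names:
            problems.append(f"cluster name matches a tree name: {name}")
    # Rule: IP addresses and volume IDs are unique across all resources.
    for key, label in (("ips", "IP address"), ("volume_ids", "volume ID")):
        owners: dict[str, set[str]] = defaultdict(set)
        for res_name, res in resources.items():
            for value in res.get(key, []):
                owners[value].add(res_name)
        for value, names in owners.items():
            if len(names) > 1:
                problems.append(
                    f"{label} {value} shared by: {', '.join(sorted(names))}"
                )
    return sorted(problems)


if __name__ == "__main__":
    # Hypothetical plan with one reused IP address.
    plan = {
        "mail": {"ips": ["10.1.1.10", "10.2.1.10"], "volume_ids": ["250"]},
        "home": {"ips": ["10.1.1.11", "10.2.1.10"], "volume_ids": ["251"]},
    }
    for problem in find_conflicts(["site_a", "site_b"], {"CORP_TREE"}, plan):
        print(problem)
```

Listing every address a resource might use on any peer cluster, rather than only its current one, is what catches collisions that would otherwise appear only at failover time.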
Time to Protect Against Downtime
Every organization’s tolerance for downtime is different, but whether it’s thousands or millions of dollars at stake, every second you’re down costs you. If you can’t risk the lost productivity, sales, customers, partnerships and overall business viability that data center downtime creates, it’s time to take a look at the proactive protection Novell Business Continuity Clustering provides against data center-wide failures.