Tech Talk 1 by Ken Baker
How Many Nines Do You Need?
The Site-to-Site High Availability of Novell Business Continuity Clustering
Almost everyone carries some sort of insurance to protect against unforeseen mishaps, but the protection it provides is typically reactive and comes only as cash compensation. If you get in a fender bender, insurance will pay to remove the dents in your car and repaint the damaged area, but it doesn’t cover hidden costs such as the hassle of getting estimates, arranging alternate transportation while the car is in the shop, or driving a car that, even after it is repaired, just isn’t in the condition it used to be.
In the business world those hidden costs can lead to serious consequences for an organization. When a disaster strikes, can you afford to be without mission-critical services for days, hours or even minutes? How will downtime affect your revenue stream? Your customer relationships? Your ability to compete? A cash policy payout provides little, if any, relief in these areas. That’s why more and more organizations invest in the proactive protection that Novell Cluster Services and Novell Business Continuity Clustering provide.
High Availability Basics
If you use Novell Cluster Services (an entitlement for a two-node cluster is included with Novell Open Enterprise Server), you know that if one of your data center’s mission-critical servers fails, the services it hosts will be automatically migrated to another server in the data center within seconds. This provides you and your users with uninterrupted access to critical data and applications. In a Novell Cluster Services cluster, you can have anywhere from two to 32 server nodes participating in a cluster relationship, sharing the same storage resources and working together to ensure uninterrupted service from that data center. Novell Cluster Services keeps the different cluster nodes in that data center constantly aware of the state of other nodes so that in the event of a server failure it can gracefully move services from one node to another.
The reliability of Novell Open Enterprise Server and Novell Cluster Services delivers an unparalleled 99.999 percent uptime to many customers, but there are times when even that five-nines availability isn’t enough. For example, what do you do if disaster strikes your entire data center? That’s where Novell Business Continuity Clustering comes in: it lets you migrate your mission-critical services from one data center to another.
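To put the "nines" in perspective, the short sketch below converts an availability figure into the downtime it allows per year. It is plain arithmetic rather than anything taken from a Novell tool, and the function name is only illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_minutes_per_year(nines: int) -> float:
    """Downtime allowed per year at an availability of `nines` nines (5 -> 99.999%)."""
    unavailability = 10 ** (-nines)  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in range(2, 6):
    availability = 100 * (1 - 10 ** (-n))
    print(f"{n} nines ({availability:.3f}%): "
          f"{downtime_minutes_per_year(n):.2f} minutes of downtime per year")
```

Five nines works out to roughly 5.26 minutes of downtime per year, while three nines allows about 8.76 hours. A single data center outage lasting a day would use up the five-nines budget for centuries, which is why site-to-site failover matters.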
Built on top of Novell Open Enterprise Server and Novell Cluster Services, Business Continuity Clustering provides site-to-site failover of critical workgroup and networking services. Services running on either NetWare or SUSE Linux Enterprise Server in a Novell Open Enterprise Server environment can easily fail over to another cluster in a completely different geographic location. (See Figure 1.) As a result, even if a major catastrophe affects one of your data centers, you can eliminate downtime and ensure that your critical services remain available.
Is Business Continuity Clustering Right for You?
So, when does it make sense to take advantage of the enhanced high availability and disaster recovery that Novell Business Continuity Clustering offers? If you have multiple data centers with shared Storage Area Network (SAN) storage and already use Novell Cluster Services, Novell Business Continuity Clustering is a natural next step in your high-availability setup, one that protects you even when an entire data center is lost.
Novell Business Continuity Clustering is also ideal for government agencies, health care organizations, financial service businesses, and any organization that requires uninterrupted access to mission-critical applications and data. If your organization has data centers in environmentally sensitive locations—such as hurricane, tornado and earthquake zones—you should seriously consider taking advantage of the solution, as should any organization that requires full remote failover in the event of a disaster.
When determining whether Novell Business Continuity Clustering is right for you, you need to first understand the high-availability needs of your business. What data and services are necessary for your business to function? What are the interdependencies for those critical services and data?
Next, you need to calculate the costs of downtime associated with those mission-critical services and data. What’s the financial impact of downtime in terms of lost sales, decreased productivity, IT expenses to restore services and any other direct costs? What are the indirect costs, such as reduced revenue because of negative market, customer or partner perceptions? Once you understand the actual costs of downtime, you can start to recognize the benefits of having better than 99.999 percent uptime with Novell Business Continuity Clustering.
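A rough way to start that calculation is to multiply the expected hours of downtime per year by an hourly cost. The sketch below does exactly that. The hourly figure is a made-up placeholder, and the function name is hypothetical, so substitute the direct and indirect costs you identify for your own business.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours in a non-leap year


def annual_downtime_cost(availability: float, cost_per_hour: float) -> float:
    """Expected yearly cost of downtime at a given availability (0.999 = 99.9%)."""
    expected_downtime_hours = (1 - availability) * HOURS_PER_YEAR
    return expected_downtime_hours * cost_per_hour


# Assumed, illustrative figure: every hour of downtime costs 10,000
# in lost sales, lost productivity and recovery effort combined.
COST_PER_HOUR = 10_000

for availability in (0.999, 0.9999, 0.99999):
    cost = annual_downtime_cost(availability, COST_PER_HOUR)
    print(f"{availability:.3%} uptime: about {cost:,.0f} per year in downtime costs")
```

Comparing the results for your current availability and your target availability gives a first estimate of what additional protection is worth per year.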
How Does Novell Business Continuity Clustering Work?
In an effort to protect their mission-critical data and services against potential natural or man-made disasters, many organizations have built and deployed mirrored data centers that are geographically separated. (See Not Just Disasters.) Unfortunately, setting up and maintaining mirrored data centers is generally a very manual process that takes a great deal of planning and synchronization. Configuration changes have to be carefully planned and replicated. Any mistake in the administration of a redundant site can prevent it from being able to effectively take over in the event of a disaster. By contrast, Novell Business Continuity Clustering works in conjunction with the mirroring capability of your SAN to automate cluster configuration, maintenance and synchronization. (See Figure 2.)
Novell Business Continuity Clustering uses a “cluster of clusters” infrastructure. Each geographically separated data center has its own independent cluster. Each of these independent clusters is treated as a “node” in a larger cluster, allowing a whole site to fail over to a different data center site. Novell Business Continuity Clustering automates this failover process by leveraging Novell eDirectory and policy-based management of cluster resources and storage systems.
For example, say you have a clustered data center in New York and another in Boston. Your data center SAN in New York keeps a mirrored copy of all your business data stored on the Boston SAN, and vice versa. These mirrors are kept in sync by your SAN mirroring software. The role of Novell Business Continuity Clustering is to keep each cluster aware of the status of the others. It uses eDirectory to store the resource and configuration information of each cluster, automatically transferring that information between or within directory trees as needed.
So, because the New York cluster knows everything about the Boston cluster, if the entire Boston cluster goes down, the New York cluster will know how to properly fail over all the Boston resources. When needed, a single click will automatically migrate and load these resources onto your New York cluster. (See Figure 3.) (See Coming Soon: Preferred Clusters.) It’s important to note that this process does not migrate the business data. That data should already exist as mirrored secondary storage on the SAN. Instead, it migrates the cluster resources, such as the applications or services that were being hosted by the cluster in the Boston data center.
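The New York and Boston example can be sketched as a toy model. This is only a conceptual illustration: the class and method names are hypothetical, a plain dictionary stands in for the configuration that the product keeps in eDirectory, and nothing here reflects the real product's interfaces.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One independent site cluster (hypothetical model)."""
    name: str
    online: bool = True
    resources: list = field(default_factory=list)  # services hosted at this site


class ClusterOfClusters:
    """Treats each site cluster as a 'node' and remembers what every site hosts."""

    def __init__(self, clusters):
        self.clusters = {c.name: c for c in clusters}
        # Stand-in for the shared configuration store: each site's resource list.
        self.known_resources = {c.name: list(c.resources) for c in clusters}

    def fail_over(self, failed: str, target: str) -> list:
        """Load the failed site's resources on the target site.

        Only resource definitions move. The business data is assumed to be
        already present at the target through SAN mirroring.
        """
        src, dst = self.clusters[failed], self.clusters[target]
        if not dst.online:
            raise RuntimeError(f"target cluster {target} is not online")
        src.online = False
        moved = self.known_resources[failed]
        dst.resources.extend(moved)
        src.resources.clear()
        self.known_resources[target] = list(dst.resources)
        self.known_resources[failed] = []
        return moved


new_york = Cluster("New York", resources=["mail"])
boston = Cluster("Boston", resources=["file", "print"])
sites = ClusterOfClusters([new_york, boston])

moved = sites.fail_over(failed="Boston", target="New York")
print("Moved to New York:", moved)
print("New York now hosts:", new_york.resources)
```

The point of the model is that the surviving site can act immediately because it already holds the other site's resource configuration. No discovery step is needed at failover time.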
Novell Business Continuity Clustering Competitive Advantages
Rather than taking the “cluster of clusters” approach employed by Novell Business Continuity Clustering, most competing offerings use stretch clusters. A stretch cluster is basically one large cluster whose nodes are spread across different geographic locations. So, instead of having a cluster of clusters made up of a four-node cluster in New York and a separate four-node cluster in Boston, a stretch cluster would simply be one eight-node cluster with four nodes in New York and four nodes in Boston.
While this might appear simpler for migrating resources between geographical locations, it introduces some significant problems due to the inherent latency that will exist between the geographically separated cluster nodes.
The first casualty of stretch clusters is reliability. Because of the latency between clustered nodes, your cluster cannot detect the unavailability of a resource as quickly. The distances involved also make split-brain scenarios more likely. A stretch cluster simply cannot match the reliability of a Novell Cluster Services and Novell Business Continuity Clustering environment. It provides only disaster recovery benefits and cannot deliver productivity enhancements. In fact, in most cases it hampers the performance and reliability of the cluster.
The other major disadvantage associated with stretch cluster latency is the effect on heartbeat responses and maintenance. For example, you might configure your Novell Cluster Services environment to automatically fail over a node’s resources if it goes for eight seconds without issuing a heartbeat. If you configure a stretch cluster with the same eight-second heartbeat tolerance, you’ll have servers frequently and unnecessarily failing over because of the packets that are routinely dropped over WAN connections.
As a result, you’re more likely to lengthen your heartbeat settings to account for the inherent latency. This means that when you do have a failed server, it will take more time for the cluster to recognize it and begin the failover process. Not only does this lengthen your downtime, but the delay can create split-brain situations where more than one node believes it is the primary server, leading to divergent sets of data that require significant effort to correct.
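The heartbeat trade-off described above can be illustrated with a small simulation. It is a simplified sketch, not how Novell Cluster Services is implemented: it assumes one heartbeat per second, a WAN interruption that drops ten heartbeats in a row, and a genuine node crash after second 60. The function name is hypothetical.

```python
def first_failure_declared(heartbeat_times, tolerance, now):
    """Return the time a node is first declared failed, or None if it never is.

    A node is declared failed once `tolerance` seconds pass without a heartbeat.
    """
    last = 0
    for t in heartbeat_times:
        if t - last > tolerance:
            return last + tolerance  # silence lasted too long before this beat
        last = t
    if now - last > tolerance:
        return last + tolerance  # no heartbeat since `last`
    return None


# Heartbeats arrive every second up to second 60, except seconds 21 to 30,
# which are lost to a temporary WAN interruption. The node truly crashes
# right after second 60.
heartbeats = [t for t in range(1, 61) if not 21 <= t <= 30]

for tolerance in (8, 15):
    declared = first_failure_declared(heartbeats, tolerance, now=100)
    kind = "false failover during the WAN glitch" if declared < 60 else "real crash detected"
    print(f"tolerance {tolerance:>2}s: failure declared at t={declared}s ({kind})")
```

With the eight-second tolerance the healthy node is declared failed at second 28, in the middle of the interruption. With a 15-second tolerance the glitch is ignored, but the real crash is not recognized until second 75, seven seconds later than the shorter setting would have caught it.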
When comparing the different approaches, remember that Novell Business Continuity Clustering not only protects your business systems against disaster but also simplifies cluster maintenance and lets you use your mirrored remote-site clusters for regular production work. In other vendors’ solutions, mirrored clusters are used only in the event of a disaster; Novell lets you put that latent processing power to work even if a disaster never occurs. Novell Business Continuity Clustering is also the only solution that provides complete site-to-site failover for your entire Novell workgroup infrastructure.