So what do you do if you're interested in an open source approach to server virtualization but have no direct experience? A common-sense approach is to get your proverbial feet wet as quickly and safely as possible by experimenting offline, at small scale, with key components such as the Xen hypervisor.
This article describes an ideal first project: a demonstration, recently documented by Novell engineers, that provides an accessible introduction to server virtualization by emulating an enterprise-class storage solution. At tabletop scale, it replicates many of the core elements and processes of a Linux-based High Availability Storage Foundation (HASF).
> About HASF
High Availability Storage Foundation is a featured technology in the SUSE Linux Enterprise 10 platform. Built from open source components, it creates a highly available, enterprise-class foundation that protects critical data while lowering costs, simplifying storage management and, most important, keeping your enterprise running.
Figure 1 illustrates a production HASF implementation. The demonstration-scale environment described in this article replicates the core solution elements found inside the dotted line. When we're done, a virtual machine will be running in a cluster, and the cluster will restart it when necessary.
Because of obvious space constraints, the description offered here is a high-level overview of a detailed account from the Novell Linux Technical Library.
To build the demonstration environment, you'll need:
- 2 off-the-shelf PCs, each configured with 1GB RAM
- 1 Gigabit Ethernet (1Gb/s) switch
- Software: SUSE Linux Enterprise Server 10. (Download an evaluation version at novell.com/linux. Alternatively, openSUSE 10.1 can be used, with the understanding that Novell provides no technical support for the free distribution.)
> Step One: Installing SUSE Linux 10 on the Physical Cluster Nodes
To begin, install essentially the same OS configuration on each PC. Booting from the installation disc, create a custom partition layout on each hard drive: a swap partition equal in size to the installed RAM, and a root partition. Leave at least 6GB of free, unpartitioned space on each disk.
In addition to the base system, install the following RPMs on each machine: Novell AppArmor, High Availability (Heartbeat v2), the GNOME desktop environment, the X Window System, Print Server and Xen Virtual Machine Host Server. When the initial installation is complete, follow the system prompts to assign hostnames, passwords and static IP addresses.
To ensure that the cluster nodes will recognize each other, use YaST to configure hostname resolution, providing each node with the hostnames, aliases and IP addresses for both machines. Then synchronize time across the cluster nodes by:
- manually synchronizing the date and time settings
- configuring node 1 as a Network Time Protocol (NTP) server, and
- configuring node 2 to use the new NTP server as its time synchronization source.
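As a rough sketch of what those YaST settings amount to on disk, the Python snippet below renders the /etc/hosts entries and minimal ntp.conf bodies for both nodes. The hostnames, IP addresses and domain are hypothetical placeholders, and the node 1 server entry assumes ntpd's local-clock driver (127.127.1.0) so that node 2 has a time source even when neither machine can reach the Internet.

```python
# Hypothetical node names, addresses and domain: substitute your own.
NODES = {"node1": "192.168.1.101", "node2": "192.168.1.102"}
DOMAIN = "example.local"


def hosts_entries(nodes, domain):
    """Return /etc/hosts lines: IP address, fully qualified name, short alias."""
    return [f"{ip}\t{name}.{domain}\t{name}" for name, ip in sorted(nodes.items())]


def ntp_conf(role, server_ip=None):
    """Return a minimal ntp.conf body for node 1 (server) or node 2 (client)."""
    if role == "server":
        # Use the local clock as a last-resort reference so node 1 can serve time offline.
        return "server 127.127.1.0\nfudge 127.127.1.0 stratum 10\n"
    if role == "client":
        # Node 2 takes its time from node 1.
        return f"server {server_ip} iburst\n"
    raise ValueError(f"unknown role: {role}")


print("\n".join(hosts_entries(NODES, DOMAIN)))
print(ntp_conf("client", NODES["node1"]), end="")
```

The same hosts entries go on both nodes; only the ntp.conf body differs between them.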
> Step Two: Setting Up Heartbeat v2 (High Availability Clustering)
The Heartbeat program is a core component of the High-Availability Linux (Linux-HA) project. It performs death-of-node detection, cluster communications and cluster management in one process.
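Death-of-node detection boils down to a timeout: each node sends periodic heartbeat messages, and a peer that stays silent longer than a configured dead time is declared dead. The toy Python model below illustrates only that idea; it is not Heartbeat's implementation, and the 30-second dead time is an arbitrary example value.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DeadNodeDetector:
    """Toy model of death-of-node detection (illustrative, not Heartbeat code)."""
    deadtime: float                                   # seconds of silence before a peer is declared dead
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, node: str, now: float) -> None:
        """Record a heartbeat message received from `node` at time `now`."""
        self.last_seen[node] = now

    def dead_nodes(self, now: float) -> List[str]:
        """Nodes whose last heartbeat is older than `deadtime` seconds."""
        return sorted(n for n, t in self.last_seen.items() if now - t > self.deadtime)


det = DeadNodeDetector(deadtime=30.0)
det.heartbeat("node1", now=0.0)
det.heartbeat("node2", now=0.0)
det.heartbeat("node2", now=25.0)   # node1 has gone silent since t=0
print(det.dead_nodes(now=31.0))    # ['node1']
```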
Version 2.0 features a layered architecture. A messaging and infrastructure layer contains components that handle heartbeat messaging. A membership layer synchronizes the shared view of which nodes are members of the cluster. And a resource allocation layer coordinates cluster activity through several key components.
- The Cluster Resource Manager (CRM) acts as the master of ceremonies, monitoring communications between resource allocation components and maintaining the Cluster Information Base (CIB).
- The Cluster Information Base is an in-memory XML representation of the whole cluster configuration and status.
- The Policy Engine (PE) and Transition Engine (TE) work as a pair: the PE computes the actions needed to implement a change initiated by the Designated Coordinator (DC), the node elected to coordinate the cluster, and the TE executes those actions. This component pair runs only on the DC node.
- The Local Resource Manager (LRM) is a service that calls local Resource Agents (RAs) on behalf of the CRM, thus performing start/stop/monitor operations and reporting results to the CRM.
- Resource Agents (RAs) are scripts that have been specifically written to start, stop and monitor certain services.
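To make the PE/TE division of labor concrete, here is a toy Python model, not Heartbeat code: a hypothetical `policy_engine` compares where each resource runs now with where it should run and emits stop and start actions, and a `transition_engine` hands each action to a stand-in for the LRM on the target node. The real Policy Engine weighs constraints, scores and failure history; this sketch only diffs two placements.

```python
from typing import Callable, Dict, List, Optional, Tuple

Action = Tuple[str, str, str]   # (operation, resource, node)


def policy_engine(current: Dict[str, Optional[str]],
                  desired: Dict[str, Optional[str]]) -> List[Action]:
    """Toy PE: diff current placement against desired placement.

    All stops are emitted before any start, so a resource being moved is
    never active on two nodes at once.
    """
    stops: List[Action] = []
    starts: List[Action] = []
    for rsc in sorted(set(current) | set(desired)):
        cur, want = current.get(rsc), desired.get(rsc)
        if cur == want:
            continue
        if cur is not None:
            stops.append(("stop", rsc, cur))
        if want is not None:
            starts.append(("start", rsc, want))
    return stops + starts


def transition_engine(actions: List[Action], lrm: Callable[[str, str, str], str]) -> List[str]:
    """Toy TE: hand each action, in order, to the LRM on the target node."""
    return [lrm(op, rsc, node) for op, rsc, node in actions]


def fake_lrm(op: str, rsc: str, node: str) -> str:
    # A real LRM would invoke the resource agent with `op` as its argument on `node`.
    return f"{node}: {op} {rsc} -> ok"


plan = policy_engine({"vm1": "node1"}, {"vm1": "node2"})
print(plan)   # [('stop', 'vm1', 'node1'), ('start', 'vm1', 'node2')]
print(transition_engine(plan, fake_lrm))
```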
Two aspects of the demonstration setup vary from standard production practice:
- only one medium for heartbeat messaging transport is configured
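- a software STONITH device intended for demonstrations is configured instead of a physical power switch. (STONITH, short for "Shoot The Other Node In The Head," is the mechanism that automatically isolates a failed node; in production it typically relies on a remotely controlled power switch.)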
To configure Heartbeat v2, use YaST on node 1 to identify both participating (member) nodes, optionally set authentication keys, select the heartbeat communication medium and start Heartbeat. Then propagate this configuration to node 2 and start Heartbeat there.
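Behind the YaST dialog, Heartbeat reads its settings from /etc/ha.d/ha.cf and /etc/ha.d/authkeys. The sketch below renders minimal versions of both for this two-node, single-medium setup. The interface name, timing values and shared secret are assumptions; `crm yes` is the directive that switches Heartbeat into v2 (CRM) mode, and node names must match each machine's `uname -n` output.

```python
from typing import List


def ha_cf(nodes: List[str], interface: str = "eth0", keepalive: int = 2, deadtime: int = 30) -> str:
    """Render a minimal ha.cf with one broadcast heartbeat medium."""
    lines = [
        f"keepalive {keepalive}",   # seconds between heartbeats
        f"deadtime {deadtime}",     # seconds of silence before a node is declared dead
        f"bcast {interface}",       # the single messaging medium used in this demo
        "crm yes",                  # run the v2 Cluster Resource Manager
    ]
    lines += [f"node {n}" for n in nodes]   # must match each node's `uname -n`
    return "\n".join(lines) + "\n"


def authkeys(secret: str) -> str:
    """Render authkeys using SHA1 signatures; the file must be readable by root only (mode 600)."""
    return f"auth 1\n1 sha1 {secret}\n"


print(ha_cf(["node1", "node2"]), end="")
```

Both files must be identical on the two nodes, which is what propagating the configuration to node 2 achieves.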
> Step Three: Creating a SAN with iSCSI
Because the objective of our demonstration is to enable each node in the cluster to start an identical virtual machine, we'll need a shared storage resource for image and configuration files. By implementing an iSCSI storage device on one node and allowing both nodes to access it we can create a fully functional mini-SAN.
The iSCSI software consists of two parts:
- The target software, which turns a block of disk space into a SAN Lun (logical unit), is sometimes referred to as the iSCSI "server."
- The initiator software logs into the iSCSI target over the network and makes the Lun available to the local system as a disk.
Neither target nor initiator software is installed by default, but YaST will install it for us when needed.
Using YaST, begin by creating a new primary disk partition on node 2, using all remaining free space. Then install the iSCSI target software, configuring it to start on boot, and specifying the path to the Lun. Then install the initiator software on node 1, configuring it to start on boot and to connect with the iSCSI target on node 2.
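On SUSE Linux Enterprise Server 10 the target side is normally the iSCSI Enterprise Target (configured in /etc/ietd.conf) and the initiator side is open-iscsi (driven by `iscsiadm`). Assuming those packages, the sketch below renders a target stanza for node 2 and the discovery and login commands an initiator would run. The IQN, partition path and portal address are hypothetical, and the commands are only built, not executed.

```python
import re
import shlex
from typing import List

# Rough shape of an iSCSI qualified name: iqn.<yyyy-mm>.<reversed domain>[:<identifier>]
IQN_RE = re.compile(r"^iqn\.\d{4}-\d{2}\.[a-z0-9.-]+(:\S+)?$")


def ietd_target(iqn: str, lun_path: str, lun: int = 0) -> str:
    """Render a Target stanza in iSCSI Enterprise Target (ietd.conf) syntax."""
    if not IQN_RE.match(iqn):
        raise ValueError(f"not an iqn-style name: {iqn}")
    return f"Target {iqn}\n\tLun {lun} Path={lun_path},Type=fileio\n"


def initiator_commands(portal_ip: str, iqn: str) -> List[List[str]]:
    """open-iscsi commands an initiator node runs: discover targets, then log in."""
    return [
        ["iscsiadm", "-m", "discovery", "-t", "sendtargets", "-p", portal_ip],
        ["iscsiadm", "-m", "node", "-T", iqn, "-p", portal_ip, "--login"],
    ]


IQN = "iqn.2006-09.local.example:node2.hasf"      # hypothetical target name
print(ietd_target(IQN, "/dev/sda4"), end="")      # hypothetical partition on node 2
for cmd in initiator_commands("192.168.1.102", IQN):
    print(shlex.join(cmd))
```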
Still in YaST on node 1, partition the iSCSI Lun on node 2, creating one partition of at least 5GB for the virtual machine image file, a second partition of 500 MB for the virtual machine configuration file, and a third of approximately 100 MB for dedicated virtual machine storage.
When you click Apply, YaST partitions the iSCSI Lun and creates a Reiser file system in each partition. Now install and configure the initiator software on node 2 as well.
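A quick arithmetic check confirms that the three partitions fit in the 6GB of free space reserved during installation. The sketch below reads GB as GiB (1024 MiB) and MB as MiB, ignores partitioning overhead, and uses hypothetical partition labels.

```python
MIB_PER_GIB = 1024

# Partition sizes from the text, in MiB (labels are hypothetical).
PARTITIONS_MIB = {"imagestore": 5 * MIB_PER_GIB, "configstore": 500, "vmstore": 100}
FREE_SPACE_MIB = 6 * MIB_PER_GIB   # minimum unpartitioned space left on node 2 in Step One


def headroom(partitions, free_space):
    """Return (space used, space left over); raise if the layout does not fit."""
    used = sum(partitions.values())
    if used > free_space:
        raise ValueError(f"layout needs {used} MiB but only {free_space} MiB is free")
    return used, free_space - used


print(headroom(PARTITIONS_MIB, FREE_SPACE_MIB))   # (5720, 424)
```

With decimal units the margin is similar: 5,600 MB used out of 6,000 MB, leaving 400 MB.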
> Step Four: Setting Up Oracle Cluster File System 2
Because ReiserFS, the file system SUSE Linux uses by default, is not cluster-aware, we need an alternate file system that provides cache coherence; otherwise the two cluster nodes could corrupt the VM image and configuration files as each moves portions of them into its cache. Fortunately, SUSE Linux Enterprise Server 10 includes Oracle Cluster File System 2 (OCFS2) in the initial OS installation. However, OCFS2 has its own heartbeat solution, so it must be configured to use membership information coming from Heartbeat v2, and Heartbeat v2 must in turn be configured to pass cluster membership information to OCFS2. Accomplish this by creating, on each node, a new Resource Agent in the resource allocation layer of Heartbeat v2. Under the direction of the LRM, the new agent notifies OCFS2 about changes in cluster membership.
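The contract such a Resource Agent fulfills is simple: the LRM calls it with an action name (start, stop, monitor) as its first argument, passes parameters as `OCF_RESKEY_*` environment variables, and interprets the exit code using the standard OCF return values. The Python skeleton below shows that contract only. It does not talk to OCFS2: a hypothetical state file stands in for "membership has been handed to OCFS2," and a real agent would also implement the `meta-data` action.

```python
import os
import tempfile

# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# "statefile" is a hypothetical agent parameter; the file it names stands in
# for "cluster membership has been handed to OCFS2".
STATE_FILE = os.environ.get(
    "OCF_RESKEY_statefile",
    os.path.join(tempfile.gettempdir(), "ocfs2-membership.demo"),
)


def start() -> int:
    # A real agent would pass the current Heartbeat membership to OCFS2 here.
    try:
        with open(STATE_FILE, "w") as f:
            f.write("active\n")
    except OSError:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


def stop() -> int:
    # Stopping an already-stopped resource must also report success.
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return OCF_SUCCESS


def monitor() -> int:
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


ACTIONS = {"start": start, "stop": stop, "monitor": monitor}


def main(argv) -> int:
    """Dispatch on the action name the LRM passes as the first argument."""
    if len(argv) < 2 or argv[1] not in ACTIONS:
        return OCF_ERR_UNIMPLEMENTED
    return ACTIONS[argv[1]]()
```

Installed as an agent, the script would end with `sys.exit(main(sys.argv))`; that line is left out here so the functions can be exercised directly.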
Both nodes can be configured from the OCFS2 console running on one machine. Using the console, provide identification and address information for each node to configure the OCFS2 cluster, then propagate the configuration to the other node using ssh. Both nodes now have the OCFS2 configuration file; the OCFS2 stack is running on node 1 but will not survive a reboot, and it is not yet running on node 2.
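The file the console writes and copies to node 2 is /etc/ocfs2/cluster.conf, a plain-text list of `node:` stanzas followed by a `cluster:` stanza. The sketch below renders one for two nodes; the addresses are hypothetical, 7777 is the customary OCFS2 port, and each `name` must match the node's hostname.

```python
from typing import List, Tuple


def ocfs2_cluster_conf(nodes: List[Tuple[str, str]], cluster: str = "ocfs2", port: int = 7777) -> str:
    """Render /etc/ocfs2/cluster.conf: one node: stanza per node, then a cluster: stanza."""
    stanzas = []
    for number, (name, ip) in enumerate(nodes):
        stanzas.append(
            "node:\n"
            f"\tip_port = {port}\n"
            f"\tip_address = {ip}\n"
            f"\tnumber = {number}\n"
            f"\tname = {name}\n"        # must match the node's hostname
            f"\tcluster = {cluster}\n"
        )
    stanzas.append(f"cluster:\n\tnode_count = {len(nodes)}\n\tname = {cluster}\n")
    return "\n".join(stanzas)   # stanzas are separated by blank lines


print(ocfs2_cluster_conf([("node1", "192.168.1.101"), ("node2", "192.168.1.102")]), end="")
```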
Using the /etc/init.d/o2cb configure command, configure both OCFS2 stacks to start at boot and switch their heartbeat sources to Heartbeat v2. Finally, use the mkfs.ocfs2 command to create OCFS2 file systems in the two Lun partitions we created for image and configuration files. But before we can mount these file systems on our cluster nodes, we must:
- enable a STONITH device, and
- reconfigure Heartbeat v2 to communicate membership status to OCFS2.
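For the mkfs step, `-L` sets a volume label and `-N` sets the number of node slots, that is, how many nodes can mount the file system at the same time. The sketch below only builds the two commands; run them by hand, once, on one node, since formatting destroys existing data. The device names for the Lun partitions are hypothetical and depend on how the initiator maps the Lun.

```python
import shlex
from typing import Dict, List

# Hypothetical device names for the first two partitions of the imported iSCSI Lun.
FILESYSTEMS = {"imagestore": "/dev/sdb1", "configstore": "/dev/sdb2"}


def mkfs_commands(filesystems: Dict[str, str], node_slots: int = 2) -> List[List[str]]:
    """One mkfs.ocfs2 call per shared partition: -L sets the label, -N the node slots."""
    return [
        ["mkfs.ocfs2", "-L", label, "-N", str(node_slots), device]
        for label, device in sorted(filesystems.items())
    ]


# Print only: double-check the devices before running anything.
for cmd in mkfs_commands(FILESYSTEMS):
    print(shlex.join(cmd))
```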
Setting Up the STONITH Framework
The STONITH framework consists of a set of daemons, one running on each node, which communicate among themselves to coordinate node fencing and restoration processes. Message routing and process execution are dependent on a set of XML values stored in the CIB. Configuring STONITH requires the creation of a set of XML blobs that are then loaded into the CIB. One file defines the restart process for a failed resource; a second configures and starts STONITH clones on each cluster node—one each for the imagestore and configstore partitions. (See the full PDF file for complete instructions; this part of the process is quite detailed.)
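For a feel of what those XML blobs look like, the sketch below uses Python's standard `xml.etree.ElementTree` to build a STONITH clone resource in the Heartbeat 2.0-era CIB style (`instance_attributes`/`attributes`/`nvpair`). The resource IDs, the monitor timing, and the choice of the test-only `ssh` STONITH plugin with its `hostlist` parameter are assumptions; check the element names against the DTD shipped with your Heartbeat version before loading the result with a tool such as `cibadmin`.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def add_attributes(parent: ET.Element, prefix: str, pairs: Dict[str, str]) -> None:
    """Append Heartbeat 2.0-style <instance_attributes><attributes><nvpair/>... markup."""
    ia = ET.SubElement(parent, "instance_attributes", id=f"{prefix}_ia")
    attrs = ET.SubElement(ia, "attributes")
    for name, value in pairs.items():
        ET.SubElement(attrs, "nvpair", id=f"{prefix}_{name}", name=name, value=value)


def stonith_clone(hostlist: str, plugin: str = "ssh") -> ET.Element:
    """A STONITH resource cloned so that one instance runs on each of the two nodes."""
    clone = ET.Element("clone", id="DoFencing")
    add_attributes(clone, "DoFencing", {"clone_max": "2", "clone_node_max": "1"})
    # "class" is a Python keyword, so these attributes go in an explicit dict.
    prim = ET.SubElement(clone, "primitive",
                         {"id": "child_DoFencing", "class": "stonith", "type": plugin})
    ops = ET.SubElement(prim, "operations")
    ET.SubElement(ops, "op", id="DoFencing_monitor", name="monitor", interval="5s", timeout="20s")
    add_attributes(prim, "child_DoFencing", {"hostlist": hostlist})
    return clone


print(ET.tostring(stonith_clone("node1 node2"), encoding="unicode"))
```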
Finally, the two file systems can be mounted, and they should be visible in their respective directories on both nodes. Once "ssh with keys" is set up between the two cluster nodes (allowing either node to log on to the other without a password), the cluster is ready for crash testing. When the heartbeat process is manually stopped on one node, the cluster automatically fences that node: it is shut down, reset and rebooted, and the image and configuration file systems are remounted.
> Step Five: Setting Up a Xen Virtual Machine
The Xen hypervisor runs directly on top of commodity hardware and allows for several virtual operating systems to run on top of it. These VMs are usually called "domains." One domain is the "master" domain—the domain from which all other VMs are managed.
The master domain is usually called "domain 0." Domain 0 behaves very much like a traditional operating system and is required for the other virtual machines to run. It's the domain that will boot on top of Xen when the physical machine is booted.
Xen uses paravirtualization, which means the guest operating systems know they run on virtualized hardware. This design was chosen because it allows for increased performance. The VMs other than domain 0, usually called "user domains," can't access disk and network resources directly, but pass their requests on to domain 0.
Domain 0 handles the requests and sends the results back to the user domain that made them. Specialized (paravirtual) disk and network drivers in the VMs implement this request forwarding. SUSE Linux Enterprise Server 10 ships with Xen and several tools to create and manage VMs. (See Figure 2.)
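The request-forwarding idea can be modeled in a few lines of Python. In this toy, a user domain's "driver" can only enqueue requests, and a domain 0 object owns the disk and answers them. Real Xen split drivers (blkfront/blkback) use shared-memory rings and event channels; the class names here are hypothetical and the queue is just a stand-in.

```python
from collections import deque
from typing import Deque, Dict, Optional, Tuple

Request = Tuple[str, int, Optional[bytes]]   # (operation, block number, data)


class Dom0Backend:
    """Stands in for domain 0: it owns the real 'disk' and services forwarded requests."""

    def __init__(self) -> None:
        self.disk: Dict[int, bytes] = {}

    def handle(self, req: Request):
        op, block, data = req
        if op == "write":
            self.disk[block] = data
            return "ok"
        return self.disk.get(block)


class DomUFrontend:
    """Stands in for a user domain's paravirtual disk driver: it never touches
    the disk, it only queues requests for domain 0."""

    def __init__(self, ring: Deque[Request]) -> None:
        self.ring = ring

    def write(self, block: int, data: bytes) -> None:
        self.ring.append(("write", block, data))

    def read(self, block: int) -> None:
        self.ring.append(("read", block, None))


ring: Deque[Request] = deque()   # toy stand-in for Xen's shared-memory request ring
domu, dom0 = DomUFrontend(ring), Dom0Backend()
domu.write(7, b"hello")
domu.read(7)
responses = [dom0.handle(ring.popleft()) for _ in range(len(ring))]
print(responses)   # ['ok', b'hello']
```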
Because we now want our systems to boot on Xen, we first need to reconfigure the GRUB boot loader configuration file and reboot both nodes simultaneously, checking that both systems have restarted using the Xen kernel. We also need to make sure the loop device sync mode is set to "safe" for maximum data protection during write operations.
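On SUSE Linux Enterprise Server 10 the GRUB configuration lives in /boot/grub/menu.lst, and a Xen entry loads the hypervisor as the kernel and the domain 0 kernel and initrd as modules. The sketch below renders such an entry, switches the `default` line to a chosen entry, and checks a kernel release string after the reboot. The GRUB root, root device and entry index are hypothetical; compare the output against any Xen entry already present in your menu.lst.

```python
import platform
import re


def xen_menu_entry(grub_root: str = "(hd0,1)", root_dev: str = "/dev/sda2", title: str = "XEN") -> str:
    """A menu.lst entry: the hypervisor is the 'kernel'; domain 0's kernel and initrd are modules."""
    return (
        f"title {title}\n"
        f"    root {grub_root}\n"
        "    kernel /boot/xen.gz\n"
        f"    module /boot/vmlinuz-xen root={root_dev}\n"
        "    module /boot/initrd-xen\n"
    )


def set_default(menu_lst: str, index: int) -> str:
    """Point the 'default' line (0-based entry index) at the chosen entry."""
    return re.sub(r"^default\s+\d+", f"default {index}", menu_lst, count=1, flags=re.MULTILINE)


def is_xen_kernel(release: str) -> bool:
    """SUSE's Xen-enabled kernels carry a '-xen' suffix in their release string."""
    return release.endswith("-xen")


print(xen_menu_entry(), end="")
print("running Xen kernel:", is_xen_kernel(platform.release()))
```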
Creating a Virtual Machine on Node 1
To create a virtual machine on node 1, simply insert the SUSE Linux Enterprise Server 10 installation disk, open YaST, and in "System" click on "Virtual Machine Management." Then click "Add." Start the OS installation program and follow the prompts to prepare the new virtual machine environment.
YaST will automatically open a non-graphical interface to allow installation and configuration. Again, follow the prompts through the normal new installation settings selection. When the process is complete, the new VM will let you log in. If you shut down the VM, you can restart it on either node.
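YaST saves the new VM's settings as an xm configuration file, which uses Python syntax. The sketch below renders a minimal file of that kind. The VM name, memory size, image path and bridge name are assumptions, and the boot-loader lines YaST writes for the installed guest are deliberately omitted, so treat the YaST-generated file as authoritative.

```python
def xm_config(name: str, memory_mb: int, image_path: str, bridge: str = "xenbr0") -> str:
    """Render a minimal xm domain configuration; the format is Python assignments."""
    settings = {
        "name": name,
        "memory": memory_mb,                       # MB of RAM for the guest
        "vcpus": 1,
        "disk": [f"file:{image_path},xvda,w"],     # file-backed image exported as xvda, writable
        "vif": [f"bridge={bridge}"],               # one network interface on the Xen bridge
        "on_poweroff": "destroy",
        "on_reboot": "restart",
        "on_crash": "destroy",
    }
    return "".join(f"{key} = {value!r}\n" for key, value in settings.items())


print(xm_config("vm1", 512, "/var/lib/xen/images/vm1/disk0"), end="")
```

Because the file is plain Python, it can be sanity-checked by executing it into a dictionary, as the assertions below do.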
> Step Six: Virtual Machines as Cluster Resources
With our virtual SUSE Linux Enterprise Server and Heartbeat/OCFS2 cluster both installed on our physical server nodes, we can now begin the final integration tasks that will enable high availability of storage services running on the VM.
First, we'll need to load a constraint into the cluster CIB stating that we want the VM to run on node 1 when we start it in the cluster. We choose node 1 because node 2 hosts the iSCSI target: we can't simulate a failure on node 2 without taking down the SAN itself, so the VM has to run on node 1 for the failover test. Basically, we create a text file that assigns a very high affinity (value "INFINITY") to node 1 for cluster resource "vm1" and load that file into the CIB. Then we create a second file that defines the VM as a cluster resource and specifies a monitoring operation, and load it into the CIB as well. Finally, because successful VM startup depends on the availability of the image and configuration stores, we create a last text file that imposes those two ordering constraints and load it into the CIB.
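Those three text files can be sketched with `xml.etree.ElementTree`. The elements below follow Heartbeat 2.0-era CIB conventions (an `rsc_location` rule on the `#uname` node attribute, a primitive using the `ocf:heartbeat:Xen` agent with its `xmfile` parameter, and `rsc_order` constraints in `from`/`type`/`to` form), but every ID, the clone names for the two stores, the xm file path and the monitor timing are hypothetical; validate against your version's CIB DTD before loading anything.

```python
import xml.etree.ElementTree as ET


def location_constraint(rsc: str, node: str) -> ET.Element:
    """Prefer `node` for `rsc` with an INFINITY score (rule on the #uname node attribute)."""
    loc = ET.Element("rsc_location", id=f"{rsc}_prefers_{node}", rsc=rsc)
    rule = ET.SubElement(loc, "rule", id=f"{rsc}_prefers_{node}_rule", score="INFINITY")
    ET.SubElement(rule, "expression", id=f"{rsc}_prefers_{node}_expr",
                  attribute="#uname", operation="eq", value=node)
    return loc


def xen_primitive(rsc: str, xmfile: str) -> ET.Element:
    """The VM as a cluster resource managed by the Xen OCF agent, with a monitor operation."""
    prim = ET.Element("primitive", {"id": rsc, "class": "ocf", "provider": "heartbeat", "type": "Xen"})
    ops = ET.SubElement(prim, "operations")
    ET.SubElement(ops, "op", id=f"{rsc}_monitor", name="monitor", interval="10s", timeout="60s")
    ia = ET.SubElement(prim, "instance_attributes", id=f"{rsc}_ia")
    attrs = ET.SubElement(ia, "attributes")
    ET.SubElement(attrs, "nvpair", id=f"{rsc}_xmfile", name="xmfile", value=xmfile)
    return prim


def order_constraints(rsc: str, prerequisites) -> ET.Element:
    """Start `rsc` only after each prerequisite ("from" is a keyword, hence the dict)."""
    constraints = ET.Element("constraints")
    for pre in prerequisites:
        ET.SubElement(constraints, "rsc_order",
                      {"id": f"{rsc}_after_{pre}", "from": rsc, "type": "after", "to": pre})
    return constraints


for elem in (location_constraint("vm1", "node1"),
             xen_primitive("vm1", "/etc/xen/vm/vm1"),
             order_constraints("vm1", ["imagestore_clone", "configstore_clone"])):
    print(ET.tostring(elem, encoding="unicode"))
```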
> Step Seven: Testing VM High Availability
Now it's time to see all our physical and virtual cluster components in action. By manually killing the heartbeat processes on node 1, we can simulate a system crash. If you've done all the preparations correctly, after a short time node 1 will reboot as a result of a STONITH operation initiated by the DC. The VM will be migrated to node 2. Node 1 will then reboot into the cluster, and after some seconds the imagestore and configstore will be remounted on node 1. The VM will then be migrated back to node 1. At that point, you will have successfully demonstrated high availability of a virtual service on a fully open source solution stack.
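Why does the VM move back to node 1 once the stores are remounted? The toy placement function below, which is not Heartbeat's Policy Engine, captures the two rules from Step Six: the VM may only run where both OCFS2 stores are available, and it prefers node 1 with an INFINITY score. Walking it through the test timeline reproduces the sequence described for Step Seven.

```python
from typing import Dict, Optional, Set

INFINITY = float("inf")
PREFS = {"node1": INFINITY}   # location score from Step Six; other nodes default to 0


def place(prefs: Dict[str, float], online: Set[str], stores_ready: Dict[str, bool]) -> Optional[str]:
    """Toy placement for vm1 (not Heartbeat's Policy Engine): only nodes with both
    OCFS2 stores mounted are eligible; among those, the highest score wins."""
    candidates = [n for n in online if stores_ready.get(n, False)]
    if not candidates:
        return None
    return max(candidates, key=lambda n: (prefs.get(n, 0), n))


TIMELINE = [
    ("normal operation", {"node1", "node2"}, {"node1": True, "node2": True}),
    ("node1 fenced by STONITH", {"node2"}, {"node2": True}),
    ("node1 rejoins, stores not yet mounted", {"node1", "node2"}, {"node1": False, "node2": True}),
    ("stores remounted on node1", {"node1", "node2"}, {"node1": True, "node2": True}),
]

for label, online, ready in TIMELINE:
    print(f"{label:40} vm1 -> {place(PREFS, online, ready)}")
```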
> For Real Insights on Open Source Virtualization, Consult Novell
That completes our high-level overview of this HASF demonstration; we hope it's been informative. Clearly, anyone interested in building and operating this test environment should consult the detailed documentation at novell.com/linux/technical_library/has.pdf. It's just one of the many resources Novell is developing to help organizations make successful transitions to open source technologies. Make Novell your trusted advisor, and let us demonstrate how open source innovation can help make your enterprise more successful, secure, responsive and manageable.