Installing SLES11 SP1 x64 HA extension
Configure shared storage
Configure HA cluster
Configure HA resource
Configure stonith resource
Configure LVM resources
Configure file system resource
Configure IP resource
Install and configure Sentinel
The basic idea of this solution is to install Sentinel onto shared storage (SAN) so that any machine in our cluster can easily mount it as a directory and have everything needed to run that instance of Sentinel. The solution provides a fallback mechanism that allows another machine to take over the Sentinel server in case the machine currently running it goes offline for some reason. Keeping that in mind, we should not expect active operations to be preserved when the current node suddenly goes down. What we get instead is a highly available Sentinel server that will continuously listen for events and perform the tasks it was configured with. The structure of this cluster is similar to another Cool Solutions article written by Jan Kalcic. Many thanks for the excellent write up!
For the first part, we will mostly use Yast to set up the components of the cluster software stack. Since Yast provides an ncurses interface, you won't need access to the graphical UI of your cluster's machines. However, the second part will use a GUI-only application, crm_gui, for configuring the cluster's resources. The equivalent command line application is crm, which can also be used. For more information on crm, please refer to the official Novell HA Extension documentation.
For a production-level Linux HA solution with shared storage, it is recommended to implement a fencing mechanism in the cluster. The idea is that a shared storage device should be written to by only one node at a time. If something causes a communication error and another node tries to write to the shared storage, the data on it will be corrupted. The shared storage needs to be protected/fenced when this happens. There are different ways to implement fencing, and STONITH is one of the methods supported by SLES. In this tutorial, I will cover implementing a STONITH resource using the Split Brain Detector (SBD). Read more about SBD in the documentation.
1. Two machines running SLES11 SP1 x64.
2. SLES11 SP1 x64 High Availability extension iso image file.
3. For shared storage, I will employ another SLES11 SP1 x64 machine to provide iSCSI targets. Again, in a production environment, the shared storage will also have to be highly available, just like the cluster; this HA solution doesn't include an HA storage solution. Please discuss with your storage administrator how to set this up!
4. Four static IPs:
4.1. Two static IPs, one for each node.
4.2. One static IP for the iSCSI machine.
4.3. One static IP for the cluster; this will be assigned dynamically to the node currently running Sentinel.
5. Four host names:
5.1. Two host names, one for each node (for this tutorial, I will use "node01" and "node02")
5.2. One host name for the iSCSI machine (I will use "iSCSI")
5.3. One host name for the cluster (I will use "cluster")
6. Please contact your network administrator about the usage of static IPs and host names in your company's network environment!
Install HA extension:
1. Download from: http://download.novell.com/Download?buildid=9xvsJD...
2. You will need a Novell account.
3. Download iso file to each machine.
4. Go to Yast, Add-on products, Add.
5. Select local ISO Image.
6. Browse to iso image.
7. "Software selection and system tasks" window appear: click on "High Availability" check box.
8. Configure your HA extension subscription if you already have one.
9. Do the same thing for the other machine.
10. Set host name. Go to Yast, Network Settings.
11. Go to Hostname/DNS tab.
12. In "Hostname" text-box: node01.
13. Do the same thing for the other machine and the iSCSI machine as well.
14. Check for connection between our machines:
15. From each machine check to see if you can ping the other two using their host names.
16. If this failed, you may want to contact your network administrator.
Though in the meantime, we can still continue our work by modifying the local /etc/hosts file to resolve our cluster's host names to specific IPs. You can either edit /etc/hosts directly or use Yast -> Hostnames. Either way, your /etc/hosts file should contain these lines:
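For example, assuming node01, node02 and the iSCSI machine sit at 10.0.0.1 through 10.0.0.3 (these three addresses are placeholders; substitute your actual static IPs), plus the cluster IP of 10.0.0.5 used later in this article, the entries would look like:

```
10.0.0.1   node01
10.0.0.2   node02
10.0.0.3   iscsi
10.0.0.5   cluster
```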
Configure shared storage:
The iSCSI machine will provide two block devices. One shared block device will be used as an SBD device; the SBD device requires only 1 MB. The other block device is used to store the Sentinel installation. We can point our iSCSI resource (LUN, or Logical Unit Number) to any file or block device on our machine. The easy way is to create a file of all zeros using the dd command, copying from /dev/zero, which generates an endless stream of zeros. These two commands will create a 1 MB file and a roughly 10 GB file. Change the value of count to create a file of your desired size.
dd if=/dev/zero of=/sbd count=1024 bs=1024
dd if=/dev/zero of=/sharedrive count=10240000 bs=1024
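If you want to sanity-check the size arithmetic first, the same invocation can be pointed at a scratch file (the /tmp path is just for illustration):

```shell
# count=1024 blocks of bs=1024 bytes = 1,048,576 bytes = 1 MB
dd if=/dev/zero of=/tmp/sbd-test count=1024 bs=1024 2>/dev/null
stat -c %s /tmp/sbd-test   # prints 1048576
```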
1. Go to Yast, iSCSI Target.
2. Install the required software.
3. Set service to start on booting.
4. Go to the Global tab and disable authentication. This is because the SLES HA resource agent for iSCSI doesn't support authentication.
5. Go to Target tab. Add a target and accept the auto-generated name.
6. Add a LUN and point it to /sbd
7. Add another LUN and point it to /sharedrive.
Now we will mount it on the same machine and format our LUN as an ext3 partition:
1. Go to Yast, iSCSI Initiator.
2. Install the required software.
3. Set service to start manually.
4. Go to Discovered Targets tab and click Discovery.
5. Enter the IP address of iscsi host and press Next.
6. Click on the iSCSI target and login. Switch to Connected Targets to verify that we have logged in to the shared storage.
7. Go to Yast, Partitioner.
8. Click + on the left of Hard Disks to expand. Select the new iSCSI disk with no partition.
9. Click add to add a new partition. Format the new partition as ext3 but do not mount the partition.
Configure HA cluster:
1. On node01: go to Yast, Cluster.
2. As this is our first time, we will be presented with a wizard window.
3. Bind network address: the network address of your cluster. Ours will be: 10.0.0.0
4. Multicast address: this multicast IP is needed to provide multicast communication for nodes in our cluster. Consult your network administrator about what value you are allowed to use. For our purposes, we'll use a multicast address such as "226.94.1.1" with port "694."
5. Redundant channel: if there is another networking channel available.
6. Check "Auto generate Note ID" then click "Next."
7. Check "Enable Security Auth" and click on "Generate Auth Key File." This will create an authentication key that allow other nodes to join your cluster. The key is store in /etc/corosync/authkey. We will need to copy this file to the other node later.
8. Check "On--Start openais at booting" and click "Start openais Now."
9. Make sure that "Enable mgmtd..." is checked to allow the cluster to be managed by crm_gui.
10. On the sync host panel, we'll add hostnames of the cluster's nodes by clicking add.
11. Click "Generate Pre-Shared-Keys." This key is needed for syncing configuration file between nodes and we will also have to copy it to the other node. The key file is stored in /etc/csync2/key_hagroup.
12. On the sync file panel, click "Add Suggested Files" to automatically generate a list of common files to sync between nodes.
13. Click "Turn csync2 ON" then click "Next."
14. Now, the hacluster user should be created. Go to Yast, User and Group Management.
15. Set Filter to System Users. Click on hacluster user, then click on edit. Change the password and press OK. When configuring node02, we will also use this password for the hacluster user.
16. Now we want to copy configuration files and authentication key to the other node. This can be done using the scp command.
16.1. scp /etc/corosync/corosync.conf node02:/etc/corosync/corosync.conf
16.2. scp /etc/corosync/authkey node02:/etc/corosync/authkey
16.3. scp /etc/csync2/csync2.cfg node02:/etc/csync2/csync2.cfg
16.4. scp /etc/csync2/key_hagroup node02:/etc/csync2/key_hagroup
17. Install the open-iscsi client on this node:
18. Go to Yast, iSCSI Initiator. Install the open-iscsi package. Set "Service Start" as "When Booting."
19. Go to Discovered Targets tab. Click Discovery. Enter iSCSI host's ip address.
20. Login to the iSCSI target. Set Login as automatic. If the login is successful, the login column will report true.
21. We may as well create the directory for mounting the shared storage: mkdir -p /opt/novell/sentinel
1. Now we will go to node02:
2. Go to Yast, Cluster.
3. We won't be presented with the wizard window because the configuration file is already copied over.
4. Click on Service tab. Check On -- Start openais at booting then click on Start openais Now.
5. Click on Configure Csync2 tab. Click on Turn csync2 ON then click Finish.
6. Again, we'll go to Yast, User and Group Management to set the password for hacluster user.
7. Install the open-iscsi client on this node:
8. Go to Yast, iSCSI Initiator. Install the open-iscsi package. Set "Service Start" as "When Booting."
9. Go to Discovered Targets tab. Click Discovery. Enter iSCSI host's ip address.
10. Login to the iSCSI target. Set Login as automatic. If login's successful, the login column will report true.
11. Again, remember to create the shared storage directory: mkdir -p /opt/novell/sentinel.
Our cluster should be up and running now. Enter crm_mon on the command line to check. We should get something like this:
Last updated: Fri Aug 5 16:38:36 2011
Current DC: node01 - partition with quorum
2 Nodes configured, 2 expected votes
0 Resources configured.
Online: [ node01 node02 ]
Configure HA resource:
In this section we'll configure the individual resources for our cluster. A resource is a service/application that is monitored by the cluster. All of our resources will be monitored by the cluster software stack so that if one stops running for any reason, the cluster will notice and start that same resource on the other node, thus providing high availability.
1. From the command line, enter crm_gui.
2. Click the Connection menu, Login. We should be able to login using the IP address of either node, or the cluster's IP after it has been set up.
3. Click CRM Config tab.
4. Change Default Resource Stickiness to a positive value (1). This makes all the resources in the cluster prefer to remain in their current location.
5. Change No Quorum Policy to ignore. Since our cluster consists of two nodes, losing one node means losing quorum. In this case, we want the cluster to keep going instead of shutting down entirely.
6. Click Apply.
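If you prefer the crm shell over crm_gui, the same two settings can be applied from the command line; this is a sketch based on the standard crm syntax shipped with the HA extension:

```shell
# Make resources prefer to stay where they currently run (stickiness = 1)
crm configure rsc_defaults resource-stickiness="1"
# Keep the surviving node running when quorum is lost
crm configure property no-quorum-policy="ignore"
```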
Configure stonith resource:
1. Use the command sbd -d /dev/sbd create to initialize the SBD device. Substitute /dev/sbd with the 1 MB block device provided by the iSCSI host. This can be done on either node.
2. Type sbd -d /dev/sbd dump to check what has been written to the device. Something like this should be displayed:
Header version : 2
Number of slots : 255
Sector size : 512
Timeout (watchdog) : 5
Timeout (allocate) : 2
Timeout (loop) : 1
Timeout (msgwait) : 10
3. The SBD daemon must be started before and stopped after the cluster software stack, because it constantly monitors the state of the cluster. To do this, create the file /etc/sysconfig/sbd with the following content:
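The exact contents of the file are not shown above; based on the SBD device path used throughout this tutorial (substitute your real device), the file would look something like:

```
# /etc/sysconfig/sbd
SBD_DEVICE="/dev/sbd"
# -W enables the watchdog
SBD_OPTS="-W"
```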
4. Copy this file over to node02 using scp /etc/sysconfig/sbd node02:/etc/sysconfig/sbd. Then type sbd -d /dev/sbd allocate node01 to allocate a slot in the SBD device to node01.
5. Type rcopenais restart to restart openais. A message will be displayed saying that SBD is starting.
6. Switch to node02, type sbd -d /dev/sbd allocate node02 to allocate node02.
7. Again, type rcopenais restart.
8. Go to crm_gui, Resource tab. Add a new primitive as follows:
8.1. ID: stonith_sbd
8.2. Class: stonith
8.3. Type: external/sbd
8.4. Attribute sbd_device: /dev/sbd.
9. Go to Management tab and start the stonith_sbd primitive.
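For reference, a crm shell sketch of the same STONITH primitive (assuming the same placeholder device path as above):

```shell
crm configure primitive stonith_sbd stonith:external/sbd \
    params sbd_device="/dev/sbd"
```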
LVM gives great flexibility in managing storage because it allows partitions and block devices to be managed dynamically (resized and replaced). The cLVM (Clustered LVM) extension allows LVM to operate in a cluster environment. First, we will set up a cLVM resource to start clvmd on every node. This requires the resource to be of type clone instead of primitive.
1. Go to crm_gui, Resources tab. Add a new clone resource named base-group-clone.
2. On Group tab, add a new group named base-group.
3. On Primitive tab, add a new primitive as follows:
3.1. ID: control
3.2. Class: ocf
3.3. Provider: pacemaker
3.4. Type: controld
4. Add another primitive:
4.1. ID: clvm
4.2. Class: ocf
4.3. Provider: lvm2
4.4. Type: clvmd
4.5. Instance Attributes: set daemon_timeout to 30
5. Hit Apply then Cancel when asked to add another primitive or group.
6. Go to the Management tab and start the base-group-clone resource. The clone resource will be started on both nodes.
7. Now we need to create an LVM configuration on one node. cLVM will take care of distributing the LVM config to the other node.
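Steps 1 through 6 can be sketched in the crm shell as follows (same IDs as above; treat this as a non-authoritative equivalent of the GUI steps):

```shell
crm configure primitive control ocf:pacemaker:controld
crm configure primitive clvm ocf:lvm2:clvmd params daemon_timeout="30"
crm configure group base-group control clvm
crm configure clone base-group-clone base-group
```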
8. Go to Yast, Partitioner on node01.
9. Go to Volume Management tab. Add a new Volume Group named clustervg.
10. Select the bigger block device after the SBD device to add to the volume group.
11. Expand the Volume Management tab, click on clustervg and add a new Logical Volume named clusterlv.
12. Use all available space and format the logical volume as ext3. Do not mount it.
13. Go back to crm_gui, Resources tab. Add a new group named sentinel.
14. Click OK to add a new primitive:
14.1. ID: LVM
14.2. Class: ocf
14.3. Provider: heartbeat
14.4. Type: LVM
14.5. Instance Attributes: volgrpname is set to clustervg
14.6. Add another attribute: exclusive with the value of true. This will make the volume group available to only one node at a time.
15. Go to the Management tab and start the sentinel group. Whichever node is running the LVM resource will have the block device /dev/clustervg/clusterlv, and this is the partition Sentinel will be installed onto.
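A crm shell sketch of the LVM primitive and its group (non-authoritative; the GUI steps above remain the reference):

```shell
# exclusive="true" activates clustervg on only one node at a time
crm configure primitive LVM ocf:heartbeat:LVM \
    params volgrpname="clustervg" exclusive="true"
crm configure group sentinel LVM
```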
Configure file system resource:
1. Go back to the sentinel group and add another primitive resource:
1.1. ID: sentinelfs
1.2. Class: ocf
1.3. Provider: heartbeat
1.4. Type: Filesystem
1.5. Initial state of resource: default
1.6. Add monitor operation: checked.
2. Instance Attributes: set device to /dev/clustervg/clusterlv, directory to /opt/novell/sentinel, and fstype to ext3.
3. Go to Operations tab add an operation:
3.1. Name: start
3.2. Timeout: 60
3.3. Optional/Start Delay: 5
This causes the resource to wait 5 seconds after the previous resource has started, because the iSCSI drive doesn't seem to appear immediately. Go back to the Management tab and start sentinelfs.
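As a sketch, the same filesystem resource in the crm shell (assuming the logical volume and mount point used throughout this tutorial):

```shell
crm configure primitive sentinelfs ocf:heartbeat:Filesystem \
    params device="/dev/clustervg/clusterlv" \
        directory="/opt/novell/sentinel" fstype="ext3" \
    op start timeout="60" start-delay="5" \
    op monitor interval="20"
```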
Configure IP resource:
1. Click Resources tab, edit sentinel group.
2. On Primitive tab add a new primitive:
3. Create clusterip resource as follows:
3.1. ID: clusterip
3.2. Class: ocf
3.3. Provider: heartbeat
3.4. Type: IPaddr
3.5. Initial state of resource: Default to "Started" or inherit from its parent
3.6. Add monitor operation is checked
4. On Instance Attribute tab, click on ip. Click Edit and type in our cluster IP 10.0.0.5 for Value.
5. Click Apply then click Cancel since we don't need to add another primitive for now.
6. Click Apply again. The sentinel group will appear on the screen now.
7. Click Management tab. We will see sentinel group and clusterip resource listed.
8. Click on the sentinel group and click the start button to run it. The clusterip resource will be shown as running. Try pinging the cluster's IP, 10.0.0.5, or the cluster's hostname, cluster, to double check.
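The clusterip primitive can likewise be sketched in the crm shell:

```shell
crm configure primitive clusterip ocf:heartbeat:IPaddr \
    params ip="10.0.0.5"
```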
Install and configure Sentinel:
In this step, we will install Sentinel onto the shared storage. This should be performed on whichever node is currently running the sentinel group, since the shared storage is mounted there. We'll assume this will be node01.
1. First, download sentinel package onto node01.
2. We will use the --location parameter to tell the installer to set /opt/novell/sentinel as the root(/) directory for the installation. The general directory structure of sentinel is as follows:
2.1. /opt/novell/sentinel: executables and libraries.
2.2. /var/opt/novell/sentinel/data: data files.
2.3. /var/opt/novell/sentinel/log: log files.
2.4. /var/run/sentinel/server.pid: the process ID (PID) file.
2.5. /etc/opt/novell/sentinel: configuration files.
2.6. /usr/bin and /usr/share: other binaries
3. So if we tell the installer to use /opt/novell/sentinel as the root directory (/), it will create all those directories above inside /opt/novell/sentinel. The command to install Sentinel is: ./install-sentinel --location=/opt/novell/sentinel/. After the installation, running ls /opt/novell/sentinel should show: "etc opt usr var".
4. Verify that Sentinel was installed and runs successfully. We also want to set up any collectors/connectors and do any configuration we want right now. Then turn off the Sentinel server (service sentinel stop).
5. From the command line, type in chkconfig --del sentinel. This will delete sentinel server from the list of services that are started when booting. This is because the cluster will take care of starting and stopping sentinel itself.
6. After Sentinel stops, copy over the novell user's home directory from node01. This can be done using the scp command (scp -pr /home/novell/ node02:/home/).
7. Also, copy over the init script (scp /etc/init.d/sentinel node02:/etc/init.d/sentinel)
8. Type into the command line: grep novell /etc/passwd. Note down the output (something like this: novell:x:108:1000::/home/novell:/bin/bash). This line adds the novell user to the system, which we will reuse for node02.
1. Switching to node02:
2. In this step, we will switch the shared storage to node02 and create a novell user so that we can run Sentinel from the shared storage on node02. To do this, append the line novell:x:108:1000::/home/novell:/bin/bash to /etc/passwd.
3. Enter vi /etc/passwd. Press "a" to start editing the text file. Add the line for the novell user to the end of the file, then press "esc," then ":wq" and Enter to save the file.
4. The home directory for novell user was already copied over. Change the owner of the home directory to novell user with this command: chown -R novell: /home/novell.
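Steps 2 through 4 can also be done non-interactively; a sketch (the uid/gid pair 108:1000 is the example output from node01 and will likely differ on your systems):

```shell
# Append the novell user line captured from node01's /etc/passwd
echo 'novell:x:108:1000::/home/novell:/bin/bash' >> /etc/passwd
# Hand the copied home directory back to the novell user
chown -R novell: /home/novell
```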
5. In crm_gui, go to the Management tab, right click on the sentinel group, select Migrate Resource and migrate to node02. Check to see if the shared storage is mounted on node02.
6. From the command line, type /etc/init.d/sentinel start. This will start the sentinel server on node02. Check and make sure everything is running properly and all the configurations are preserved. Stop sentinel server using /etc/init.d/sentinel stop.
7. Now we are ready to add a primitive for sentinel server.
1. Get the Sentinel server resource agent here. (The resource agent script cannot be published at the moment. It will be added to the article at a later date.)
2. Create a directory on node01 for the resource agent (mkdir -p /usr/lib/ocf/resource.d/sentinel)
3. Put this script on node01 as /usr/lib/ocf/resource.d/sentinel/sentinelserver.
4. Give it executable privileges with chmod +x /usr/lib/ocf/resource.d/sentinel/sentinelserver.
5. Create the same directory on node02: mkdir -p /usr/lib/ocf/resource.d/sentinel. Copy resource agent script to node02: scp /usr/lib/ocf/resource.d/sentinel/sentinelserver node02:/usr/lib/ocf/resource.d/sentinel/sentinelserver
6. You may need to disconnect and reconnect in crm_gui for the sentinelserver resource agent to be available.
7. Go to crm_gui again and add a sentinelserver primitive to sentinel group:
7.1. ID: sentinelserver
7.2. Class: ocf
7.3. Provider: sentinel
7.4. Type: sentinelserver
7.5. Initial state of resource: default
7.6. Add monitor operation: checked
8. Go to Operations tab add a start operation:
8.1. Name: start
8.2. Timeout: 300
9. And a stop operation:
9.1. Name: stop
9.2. Timeout: 300
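A crm shell sketch of the sentinelserver primitive (the ocf:sentinel:sentinelserver name follows from the /usr/lib/ocf/resource.d/sentinel/sentinelserver path created above):

```shell
crm configure primitive sentinelserver ocf:sentinel:sentinelserver \
    op start timeout="300" op stop timeout="300"
```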
Now go to the Management tab and start the sentinelserver primitive. After a couple of minutes, it will be shown as running. Right click on the sentinel group and try to migrate it onto the other node to test that fallback is working.
Disclaimer: As with everything else at Cool Solutions, this content is definitely not supported by Novell (so don't even think of calling Support if you try something and it blows up).
It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.