
Software RAID: Beyond YAST

Novell Cool Solutions: Feature
By Kirk Coombs


Posted: 13 Oct 2005
 

Applies To:

  • SUSE Linux Enterprise Server

Purpose

SUSE Linux Enterprise Server (SLES) provides an easy mechanism to create software RAID arrays with YaST. After the initial creation, however, YaST provides no tools to manage the RAID array. Functions such as monitoring the array and managing failed disks must be done manually. This article provides a guide for creating RAID arrays with YaST and managing them with the provided command-line tools.

What is RAID?

Important Note: RAID is not a replacement for an effective backup strategy. It offers, at most, redundant copies of data to protect from hardware failures. It does not offer any protection against accidental deletion, viruses, hackers, etc.

RAID, or Redundant Array of Inexpensive Disks, is a system which allows two or more hard disks to be combined to increase storage space, provide redundancy to protect against hardware failures, or both, often while also improving overall disk performance. Traditionally RAID has been accomplished through custom hardware RAID controllers using SCSI disks. With SLES the same functionality can be achieved through software. These software RAID configurations can use a combination of IDE, SCSI, and SATA disks and perform almost as well as hardware implementations.

A RAID array can be created in several configurations depending on the desired result: increased storage space, redundancy, or both. These basic configurations are called levels. Table 1 outlines the most common levels:

Table 1: RAID Levels

Level Description
RAID-0

RAID-0 is used to combine two or more disks into a single, larger RAID device. Data is written in small clusters to multiple devices. Thus, a file exists in pieces on multiple disks. This means that the disks are used in parallel to read and write a file, which leads to better performance than a single disk could provide.

RAID-0 does not implement any redundancy. If a single disk fails, the whole RAID device is corrupted.

RAID-1

RAID-1 is a purely redundant mode. Two or more disks of roughly the same size are combined to create an array whose capacity equals that of the smallest disk. When data is written to the array, it is written identically to all disks. As long as at least one disk in the array survives, the data remains intact.

Write performance is worse than it would be with a single drive because the same data must be written to each disk, and this data must be sent multiple times over the bus. Read performance is better than a single disk. Data is only read from one disk at a time, but it is always the disk whose head is closest to the data to begin with. This reduces seek time, which is the most costly operation in accessing data on a disk.

RAID-5

RAID-5 combines the previous levels and requires three or more disks of roughly the same size. The size of the array depends on the size of the smallest disk, S. If there are N disks, the overall size is (N-1)*S. The 'wasted' space is used to store parity information. In a RAID-5 configuration a single disk can fail and the array remains intact. If more than one disk fails, however, the entire array is corrupted.

Performance is generally better than a single disk would be, but is hard to predict. It depends largely on the array usage patterns and available memory in the system.

Levels 1 and 5 also allow for spare disks. These disks are not used in the array unless a disk in the array fails. When this happens, the spare disk is automatically used in place of the bad disk. When the bad disk is replaced, it becomes the spare.

The RAID Tools and Configuration

Several tools and files are used to create, monitor, and manage RAID arrays. Some of the most common are listed in Table 2. Note this list is not comprehensive, but it is sufficient to manage most RAID tasks.

Table 2: RAID Tools

Tool Description
YaST

RAID arrays can be created with the Partitioner module in YaST.

/proc/mdstat

The RAID drivers are part of the Linux kernel. The file /proc/mdstat is maintained by these drivers to provide information about currently running arrays. Its contents are shown in the examples later in this article.

/etc/raidtab

This file is used by the mkraid command to specify how a RAID array should be configured. If the array was created through YaST this file is automatically generated.

mkraid

This tool is part of the raidtools package. It is used to create and start a RAID array based on the configuration in /etc/raidtab.

Following is an example of its use:

  • mkraid -R /dev/md0 (Create the array /dev/md0 using the data in /etc/raidtab. The -R flag forces the operation if the array already exists and the operation can potentially destroy data.)

raid[stop|start]

These tools are part of the raidtools package. They are used to stop or start a RAID array.

Following are examples of their use:

  • raidstop /dev/md0 (stop /dev/md0)

  • raidstart /dev/md0 (start /dev/md0)

raidhot[add|remove]

These tools are part of the raidtools package. They are used to add or remove a disk from a RAID array.

Following are examples of their use:

  • raidhotremove /dev/sda1 /dev/md0 (hot remove the partition /dev/sda1 from /dev/md0)

  • raidhotadd /dev/sda1 /dev/md0 (hot add the partition /dev/sda1 to /dev/md0)

mdadm

This tool duplicates the functionality of the tools in raidtools. It has a uniform syntax for all functions, and can be used for monitoring, adding or removing devices, and simulating device failures.

Following are some examples of the most common mdadm commands:

  • mdadm -D /dev/md0 (list the details about /dev/md0)

  • mdadm -f /dev/md0 /dev/sda1 (mark the partition /dev/sda1 as failed in /dev/md0)

  • mdadm -r /dev/md0 /dev/sda1 (hot remove the partition /dev/sda1 from /dev/md0)

  • mdadm -a /dev/md0 /dev/sda1 (hot add the partition /dev/sda1 to /dev/md0)

A Note on Mount Points

The /boot mount point should never be placed within a RAID array. While it is possible that a system will work fine with /boot on a RAID array, chances are good that it will not. All other mount points, including swap, are fine to place in a RAID array.
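
For reference, a hypothetical /etc/fstab fragment for such a layout might look like the following; the device names, file system types, and options are examples only, and YaST writes the real entry when a mount point is assigned in the Partitioner:

/dev/sda1   /boot   ext2       defaults   1 2
/dev/md0    /data   reiserfs   defaults   1 2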

Examples

The sample system used for all examples contains the following five disks:

Table 3: Disks

Device Size  Use
/dev/sda 1.5 GB swap, root
/dev/sdb 900 MB RAID
/dev/sdc 1 GB RAID
/dev/sdd 1 GB RAID
/dev/sde 1 GB RAID

SLES is already installed on /dev/sda and the extra four disks are to be configured for a data storage array. Notice that one of the RAID disks is not the same size as the others. It will be shown how this affects the various RAID levels.

For all RAID configurations, the first step is to partition the disks for RAID (a command-line alternative is sketched after these steps).

  1. Enter the YaST Partitioner module by selecting System > Partitioner, or by entering yast disk as root.

  2. Select Create.

  3. Select the desired disk (in this case /dev/sdb).

  4. Select Primary Partition.

  5. Select Do not format, then change the File system ID to 0xFD Linux RAID.

  6. By default, the entire disk should be selected.

  7. Repeat steps 2-6 for all the RAID disks.
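
The same partitioning can also be done from the command line. A rough sketch using fdisk (the device name and keystrokes are examples only; double-check them before writing a partition table):

# fdisk /dev/sdb
(at the fdisk prompt: n creates a new primary partition spanning the disk,
t changes its type to fd, Linux raid autodetect, and w writes the table and exits)

# fdisk -l /dev/sdb
(verify that the new partition is listed with Id fd)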

In this example, the partition configuration looks like this:

Figure 1: Partition Configuration


Now the system is ready to configure a RAID array.

  1. Select RAID.

  2. Select Create RAID.

  3. The RAID Wizard begins. Select which RAID level to create. Multipathing requires special hardware and is not covered here.

    Figure 2: RAID Level

  4. A new RAID device /dev/md0 (md stands for multiple disk) is created. Step 2 of the RAID Wizard lists the type 0xFD partitions and allows them to be added to the device. The column indicating which devices are part of the array cannot be seen in the ncurses version of YaST. Scroll right to see it. As partitions are added, the total new size of the RAID device is shown in the upper right. For these examples, select all the partitions.

    Figure 3: RAID Partitions

  5. Step 3 of the wizard allows the new device to be formatted, and any extra settings for the array to be configured. 

    • The RAID Type is automatically filled based on the previous selection. 

    • Chunk size in KB is set to a default value based on the RAID Type. This can be customized to increase performance.

    • Parity algorithm is only available for RAID-5 configurations and is set to the optimal algorithm for disks with rotating platters, which covers most devices.

    • Make sure that Persistent superblock is checked. This stores the RAID configuration within the array.

    • Format the array as Reiser, and make sure to change the mount point (in this case /data).

    Figure 4: Expert Settings

  6. After selecting Finish the partition table is listed again. This time a new device exists: /dev/md0. This is the RAID array. To commit the settings, select Apply. The array settings are written to /etc/raidtab, as well as stored in the persistent superblock. The array is formatted and mounted.

At this point the desired RAID array is up and running. From this point on, management of the array falls to the command-line tools.

Example 1: RAID-0

The RAID-0 level is very straightforward: it is simply a combination of the storage space of all its devices. Begin by verifying the size:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             1.3G  437M  842M  35% /
tmpfs                 125M  8.0K  125M   1% /dev/shm
/dev/md0              3.9G   33M  3.9G   1% /data

Notice that the array is 3.9 GB, the combined size of its constituent disks.

Now take a look at the array settings in /etc/raidtab:

# cat /etc/raidtab
# autogenerated /etc/raidtab by YaST2

raiddev /dev/md0
raid-level 0
nr-raid-disks 4
persistent-superblock 1
chunk-size 32
device /dev/sdb1
raid-disk 0
device /dev/sdc1
raid-disk 1
device /dev/sdd1
raid-disk 2
device /dev/sde1
raid-disk 3

Notice that all the RAID settings specified in YaST exist in this file. It lists the raid level, the number of disks, whether to have a persistent superblock, the data chunk size in KB, and the constituent disks. This file gets only slightly more complex as other RAID configurations are used.

Additional information about the array is found by probing the /proc/mdstat file:

# cat /proc/mdstat
Personalities : [raid0]
md0 : active raid0 sde1[3] sdd1[2] sdc1[1] sdb1[0]
4071936 blocks 32k chunks

unused devices: <none>

More detailed information is given by the mdadm command:

# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Wed Oct 12 05:09:10 2005
Raid Level : raid0
Array Size : 4071936 (3.88 GiB 4.17 GB)
Raid Devices : 4
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Oct 12 05:09:10 2005
State : clean, no-errors
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0

Chunk Size : 32K

Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
3 8 65 3 active sync /dev/sde1
UUID : 57e04693:b948ff5f:0d6a8b8a:ee57653b
Events : 0.1

Notice that both commands list all four devices as active, with no apparent problems. There are no spare devices because RAID-0 does not support redundancy. If a device is removed, the whole array is corrupted. Let's test this.

Remove a device from the array by using the raidhotremove command (mdadm could be used as well):

# raidhotremove /dev/md0 /dev/sdc1
/dev/md0: can not hot-remove disk: disk busy!

It makes sense that a disk cannot be removed from a RAID-0 array, as this would corrupt the data. The array as a whole can, however, still be stopped and restarted cleanly, as sketched below. For more interesting failure handling, another RAID level is needed.
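
A minimal sketch of stopping and restarting the whole array with the raidtools commands from Table 2 (unmount first, since the file system lives on the array):

# umount /data

# raidstop /dev/md0
(the array is stopped; /proc/mdstat no longer lists md0)

# raidstart /dev/md0

# mount /data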

Example 2: RAID-1

Recall that RAID-1 simply mirrors the data to all disks in the array, except the spare disks. This means that the size should be equal to the smallest of the disks.

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             1.3G  437M  842M  35% /
tmpfs                 125M  8.0K  125M   1% /dev/shm
/dev/md0              918M   33M  886M   4% /data

Notice that the array is 918 MB, the size of the smallest of the four disks, as expected.

Take a look at the hard drive lights for the RAID disks. They are probably steadily lit. If no data is being accessed on the drives right now, why would they be so active? To find out, look at /proc/mdstat:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sde1[3] sdd1[2] sdc1[1] sdb1[0]
939648 blocks [4/4] [UUUU]
[=>...................] resync = 5.4% (51456/939648) finish=5.1min speed=2858K/sec
unused devices: <none>

Notice that the devices are currently syncing with each other. The mdadm command should show a similar result. This sync is done in the background and should not significantly affect system performance. Feel free to make any file system changes while the sync is running, including formatting. When the sync is complete the output should be:

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sde1[3] sdd1[2] sdc1[1] sdb1[0]
939648 blocks [4/4] [UUUU]

unused devices: <none>

Notice that all four devices are being used ([4/4]), and they are all up ([UUUU]).
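
A sync like this one, or the recovery shown later when a spare takes over, can be followed as it runs. One simple option is to poll /proc/mdstat every couple of seconds, for example:

# watch -n 2 cat /proc/mdstat
(press Ctrl-C to stop watching; the sync itself continues in the background)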

It may seem like overkill to devote three disks to full-time mirrors of this data. Constantly mirroring the data only adds wear to the redundant devices. It is probably better to use at least one of these devices as a spare. Look at the configuration for the array:

# cat /etc/raidtab
# autogenerated /etc/raidtab by YaST2

raiddev /dev/md0
raid-level 1
nr-raid-disks 4
nr-spare-disks 0
persistent-superblock 1
chunk-size 4
device /dev/sdb1
raid-disk 0
device /dev/sdc1
raid-disk 1
device /dev/sdd1
raid-disk 2
device /dev/sde1
raid-disk 3

Two of the devices can easily be made into spares by changing /etc/raidtab and rebuilding the array.

# vi /etc/raidtab
(make changes to file)

# cat /etc/raidtab
# autogenerated /etc/raidtab by YaST2
# Modified later to add spare disks

raiddev /dev/md0
raid-level 1
nr-raid-disks 2 <- Changed
nr-spare-disks 2 <- Changed
persistent-superblock 1
chunk-size 4
device /dev/sdb1
raid-disk 0
device /dev/sdc1
raid-disk 1
device /dev/sdd1
spare-disk 0 <- Changed
device /dev/sde1
spare-disk 1 <- Changed

# umount /data

# raidstop /dev/md0

# mkraid -R /dev/md0
DESTROYING the contents of /dev/md0 in 5 seconds, Ctrl-C if unsure!
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 939771kB, raid superblock at 939648kb
disk 1: /dev/sdc1, 1044193kB, raid superblock at 1044096kB
disk 2: /dev/sdd1, 1044193kB, raid superblock at 1044096kB
disk 3: /dev/sde1, 1044193kB, raid superblock at 1044096kB

# fsck.reiserfs /dev/md0
(make sure file system is okay)

# mount /data

Note that mkraid claims the data will be destroyed, but in this case it is not. However, it is always a good idea to make a backup first!
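
What form the backup takes is up to you. As a minimal sketch, the mounted data could be archived somewhere outside the array before stopping it (the target path is just an example):

# tar czf /root/data-backup.tar.gz -C /data .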

Take another look at /proc/mdstat. Notice that there are four devices listed as available, but only two are used.

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sde1[3] sdd1[2] sdc1[1] sdb1[0]
939648 blocks [2/2] [UU]

unused devices: <none>

The mdadm command gives a bit more information:

# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Wed Oct 12 05:08:03 2005
Raid Level : raid1
Array Size : 939648 (917.63 MiB 962.20 MB)
Device Size : 939648 (917.63 MiB 962.20 MB)
Raid Devices : 2
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Oct 12 05:13:07 2005
State : dirty, no-errors
Active Devices : 2
Working Devices : 4
Failed Devices : 0
Spare Devices : 2


Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 65 -1 spare /dev/sde1
3 8 49 -1 spare /dev/sdd1
UUID : 4c53404d:ab97d3b6:9b5e8a88:360b1304
Events : 0.8

Now the used and spare devices are clearly shown.

With the spare disks configured, see what happens if one of the primary disks fails.

# mdadm -f /dev/md0 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0

# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Wed Oct 12 05:08:03 2005
Raid Level : raid1
Array Size : 939648 (917.63 MiB 962.20 MB)
Device Size : 939648 (917.63 MiB 962.20 MB)
Raid Devices : 2
Total Devices : 4
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Wed Oct 12 05:37:31 2005
State : dirty, no-errors
Active Devices : 1
Working Devices : 3
Failed Devices : 1
Spare Devices : 2


Rebuild Status : 10% complete

Number Major Minor RaidDevice State
0 0 0 -1 removed
1 8 33 1 active sync /dev/sdc1
2 8 65 0 spare /dev/sde1
3 8 49 -1 spare /dev/sdd1
4 8 17 -1 faulty /dev/sdb1
UUID : 4c53404d:ab97d3b6:9b5e8a88:360b1304
Events : 0.9

# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sde1[2] sdd1[3] sdc1[1] sdb1[4](F)
939648 blocks [2/1] [_U]
[==========>..........] recovery = 50.9% (479616/939648) finish=7.3min speed=1045K/sec
unused devices: <none>

The array is rebuilding and /dev/sdb1 has been marked faulty and removed. Notice that /dev/sde1 is now marked as RaidDevice 0 by mdadm. Monitoring the drive lights shows that /dev/sdc1 and /dev/sde1 are syncing.

When the sync is complete, mdadm shows two active devices, one spare, and one faulty device:

# mdadm -D /dev/md0
...
Number Major Minor RaidDevice State
0 8 65 0 active sync /dev/sde1
1 8 33 1 active sync /dev/sdc1
2 8 49 -1 spare /dev/sdd1
3 8 17 -1 faulty /dev/sdb1
...

Replacing faulty devices is easy. First, physically replace the faulty device. Depending on the system's hardware, this may be possible while the system is still powered on. Make sure there is a RAID partition on it, then simply remove the device from the RAID configuration and add it again.

# raidhotremove /dev/md0 /dev/sdb1

# raidhotadd /dev/md0 /dev/sdb1

# mdadm -D /dev/md0
...
Active Devices : 2
Working Devices : 4
Failed Devices : 0
Spare Devices : 2


Number Major Minor RaidDevice State
0 8 65 0 active sync /dev/sde1
1 8 33 1 active sync /dev/sdc1
2 8 17 -1 spare /dev/sdb1
3 8 49 -1 spare /dev/sdd1
...
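
The same hot-remove and hot-add can also be done with mdadm, using the syntax shown in Table 2:

# mdadm -r /dev/md0 /dev/sdb1

# mdadm -a /dev/md0 /dev/sdb1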

Example 3: RAID-5

After working through the RAID-0 and RAID-1 configurations, RAID-5 is easy to handle. It is simply a combination of the two, providing combined disk space and redundancy through parity. First, take a look at how much storage the RAID-5 array has.

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda2             1.3G  437M  842M  35% /
tmpfs                 125M  8.0K  125M   1% /dev/shm
/dev/md0              2.7G   33M  2.7G   2% /data

This size is exactly what is expected. Recall that the array should be (N-1)*S, where N is the number of disks and S is the size of the smallest. In this case N is 4 and S is 900 MB, giving (4-1)*900 MB = 2.7 GB.

The contents of /etc/raidtab are again very straightforward. The only addition is a parameter specifying the parity algorithm.

# cat /etc/raidtab
# autogenerated /etc/raidtab by YaST2

raiddev /dev/md0
raid-level 5
nr-raid-disks 4
nr-spare-disks 0
persistent-superblock 1
parity-algorithm left-symmetric
chunk-size 128
device /dev/sdb1
raid-disk 0
device /dev/sdc1
raid-disk 1
device /dev/sdd1
raid-disk 2
device /dev/sde1
raid-disk 3

Recall that a RAID-5 array can only lose a single disk before it becomes corrupted. Also, in this case, none of the existing disks can be converted to a spare disk. To get a spare, either include fewer disks in the initial array so that others remain available to add as spares later, or add new disks to the system. Then edit /etc/raidtab to reflect the spare disks and re-make the array as was shown with the RAID-1 device. As always, remember to make a backup first!
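
For example, if only /dev/sdb1, /dev/sdc1, and /dev/sdd1 had been placed in the array, /dev/sde1 could be configured as a spare with an /etc/raidtab along these lines (a sketch following the format shown above):

raiddev /dev/md0
raid-level 5
nr-raid-disks 3
nr-spare-disks 1
persistent-superblock 1
parity-algorithm left-symmetric
chunk-size 128
device /dev/sdb1
raid-disk 0
device /dev/sdc1
raid-disk 1
device /dev/sdd1
raid-disk 2
device /dev/sde1
spare-disk 0

This would be followed by the same umount, raidstop, mkraid -R, fsck, and mount sequence used in the RAID-1 example.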

Looking at the array details with mdadm or examining /proc/mdstat does not reveal anything too surprising; all the disks are devoted to the array. Recall that a RAID-5 array only survives losing one disk. See what happens when this is done.

# mdadm -f /dev/md0 /dev/sdb1
mdadm: set /dev/sdb1 faulty in /dev/md0

# umount /data

# fsck.reiserfs /dev/md0
(note that the file system is fine)

At this point the faulty disk could be replaced, then removed and added back to the array with raidhot[remove|add]. Instead of doing that, however, see what happens when another disk is removed.

# mdadm -f /dev/md0 /dev/sdc1
mdadm: set /dev/sdc1 faulty in /dev/md0

# fsck.reiserfs /dev/md0
(note that the file system is corrupt)

Because only one disk can be lost in a RAID-5 array it is highly recommended to have at least one spare disk in the array!

Conclusion

RAID offers a very flexible means to expand storage, increase performance, and ensure redundancy. This article gives a quick overview of the most common configurations and administrative tasks. For more information it is recommended to consult the official Linux RAID documentation at /usr/share/doc/packages/raidtools/Software-RAID.HOWTO.html. It describes more configurations and gives more examples. A topic of special note is running mdadm as a daemon to monitor the array and alert the system administrator if any problems arise.
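
As a starting point, such a monitor can be launched with something like the following; the mail address, polling interval, and device are examples, and the mdadm man page documents the details:

# mdadm --monitor --daemonise --mail=root@localhost --delay=300 /dev/md0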

