Effective Linux Resource Management
Use control groups to manage complexity and performance in SUSE Linux Enterprise systems
Written by Matthias G. Eckermann and Bill Tobey
When Linux servers underperform, particularly multi-purpose systems running multiple applications for multiple user groups, the root cause is frequently resource monopolization by one or more processes or users. Wouldn’t it be wonderful if you could set and enforce some ground rules to govern how much CPU, memory, disk I/O or network I/O each process or user could command?
Well, you can! Control groups (cgroups) are a feature of the Linux kernel that provides mechanisms for partitioning sets of tasks into one or more hierarchical groups, and for associating each group with a set of subsystem resource parameters that affect the execution performance of its tasks. You might use control groups:
- To keep a Web server from using all the memory on a system that’s also running a database
- To keep a backup system from using too much network I/O bandwidth and crashing the business apps running on the same system
- To allocate system resources among user groups of different priority (the faculty, staff and students of a university, for instance)
There are two types of control group subsystems. Isolation and special controls subsystems include five different controls: CPUset, Namespace, Freezer, Device and Checkpoint and Restart. Resource subsystems are a group of four controls: CPU, Memory, Disk and Network. Before we investigate the functions of each subsystem, it’s important to note that all are implemented in exactly the same manner, by mounting one or more subsystems as virtual file systems.
Subsystems can be mounted individually—in this case, the CPUset subsystem—as follows:
- mount -t cgroup -o cpuset none /cpuset
Or, all cgroup subsystems can be mounted at once:
- mount -t cgroup none /cgroup
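Once a hierarchy is mounted, groups are ordinary directories and their settings are ordinary files. As a minimal sketch, assuming the CPUset subsystem is mounted at /cpuset as shown above, the following helpers create a group and move a process into it. The function names, the group name and the CPU and memory node lists are hypothetical; cpuset.cpus, cpuset.mems and tasks are the control files the kernel exposes in a cpuset hierarchy.

```shell
# Sketch only: ROOT is wherever the cpuset hierarchy is mounted (e.g. /cpuset).
# Group names and CPU/node lists are illustrative assumptions.

# make_cpuset_group ROOT NAME CPUS MEMS: create a group and pin it to CPUs/nodes
make_cpuset_group() {
    dir="$1/$2"
    mkdir -p "$dir" || return 1
    # Both cpus and mems must be set before any task can join the group.
    echo "$3" > "$dir/cpuset.cpus" &&
    echo "$4" > "$dir/cpuset.mems"
}

# add_task GROUP_DIR PID: move one process into the group
add_task() {
    echo "$2" > "$1/tasks"
}

# Example (as root, on a mounted hierarchy):
#   make_cpuset_group /cpuset webserver 0-1 0
#   add_task /cpuset/webserver "$$"
```

Keeping the mount point as a parameter lets the same helpers be tried against a scratch directory before they are pointed at a live hierarchy.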
The Isolation and Special Control Subsystems
- The CPUset subsystem ties processes to specific CPU and memory nodes (see Figure 1). In an SMP system, CPUset may restrict a process to a specific set of CPUs, or, in a system with multi-core processors, to a specific set of CPU cores.
- The Namespace subsystem provides a private view of the system to the processes in a cgroup, and is used primarily for OS-level virtualization. It has no special functions other than to track changes in namespace.
- The Freezer subsystem stops all the processes in a cgroup from executing by removing them from the kernel task scheduler. Once you’ve mounted the Freezer subsystem, you can stop any process completely by placing it in the cgroup and writing FROZEN to the group’s freezer.state file:
echo FROZEN > /freezer/freezer.state
When you’re ready, the frozen group of processes can be restarted by writing THAWED to the same file:
echo THAWED > /freezer/freezer.state
The primary application for the Freezer subsystem is backing up write-intensive applications. First you freeze the application, then you freeze the file system. Create your snapshot or backup, then unfreeze the file system. Finally, unfreeze the process and resume normal operation.
- The Device subsystem provides device whitelists for groups of processes, allowing or denying read/write access to listed devices or file systems.
- The Checkpoint / Restart subsystem supports process migration between machines by stopping all the processes in a control group and saving their state information to a dump file for convenient relocation and restart.
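The Freezer backup sequence described above (freeze the application, freeze the file system, take the snapshot, then unfreeze in reverse order) can be sketched as a small shell helper. This is an illustration under stated assumptions, not a tested backup procedure: the function names are hypothetical, the group directory is assumed to already contain the application's processes, and the file system is frozen with the util-linux fsfreeze tool, which may not be installed on every system.

```shell
# Sketch only: GROUP is a hypothetical freezer cgroup directory holding the
# application's PIDs; MNT is the file system the application writes to.
FSFREEZE=${FSFREEZE:-fsfreeze}   # util-linux tool; overridable for dry runs

# set_freezer_state GROUP STATE: request a state, then wait until it is reported
set_freezer_state() {
    state_file="$1/freezer.state"
    wanted=$2
    tries=0
    echo "$wanted" > "$state_file" || return 1
    # A large group may report FREEZING for a while before it reads FROZEN.
    while [ "$(cat "$state_file")" != "$wanted" ]; do
        tries=$((tries + 1))
        [ "$tries" -ge 10 ] && return 1
        sleep 1
    done
}

# frozen_backup GROUP MNT CMD [ARGS...]: run CMD while both are quiescent
frozen_backup() {
    group=$1
    mnt=$2
    shift 2
    rc=0
    set_freezer_state "$group" FROZEN || return 1
    if "$FSFREEZE" -f "$mnt"; then
        "$@" || rc=$?                      # the snapshot or backup command
        "$FSFREEZE" -u "$mnt" || rc=$?     # always unfreeze the file system
    else
        rc=1
    fi
    set_freezer_state "$group" THAWED || rc=$?   # always thaw the application
    return "$rc"
}

# Example (as root; the snapshot script path is hypothetical):
#   frozen_backup /freezer/db /srv/data /usr/local/bin/make-snapshot
```

The command passed in must not write to the frozen file system itself, or it will block until the file system is unfrozen. A block-level snapshot is the usual fit.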
The Resource Control Subsystems
- The CPU control subsystem uses the kernel’s CFS task scheduler to share CPU bandwidth among groups of processes. It’s an effective but somewhat mechanically complicated way to allocate CPU capacity.
- The Memory control subsystem limits memory usage in user-space processes, primarily by discarding the least recently used (LRU) pages to reclaim memory when a group of processes exceeds a preset limit. This subsystem imposes no restrictions on memory use by the Linux kernel.
- The Disk I/O control subsystem allows or denies disk access to groups of tasks. Several approaches to this function have been proposed and are under active consideration by the Linux kernel community. A provisional controller subsystem is included in SUSE Linux Enterprise Server 11 Service Pack 1 that allows specific parameters of the CFQ I/O scheduler to be managed on a per-cgroup basis.
- The Network I/O control subsystem allows or denies network access to groups of tasks. This control is also under continuing development and discussion by the kernel community. A provisional subsystem is included in SUSE Linux Enterprise Server 11 Service Pack 1.
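To make the CPU and Memory controls concrete, here is a minimal sketch that assumes both subsystems are mounted in one hierarchy (for example at /cgroup, as in the mount-everything command earlier). The helper names, group names and values are hypothetical; cpu.shares and memory.limit_in_bytes are the control files these two subsystems expose. CPU shares are relative weights rather than hard caps, so the second helper estimates the portion of CPU time a group receives when all of the listed groups are busy.

```shell
# Sketch only: group directories and values are illustrative assumptions.

# set_group_limits GROUP_DIR CPU_SHARES MEM_BYTES
set_group_limits() {
    mkdir -p "$1" || return 1
    echo "$2" > "$1/cpu.shares" &&
    echo "$3" > "$1/memory.limit_in_bytes"
}

# share_percent MINE OTHER...: integer percentage of CPU time for the first
# group when every listed group is competing for the CPU
share_percent() {
    mine=$1
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    echo $((mine * 100 / total))
}

# Example (as root): weight faculty, staff and students 4:2:1 and cap
# student memory at 1 GiB.
#   set_group_limits /cgroup/faculty  2048 4294967296
#   set_group_limits /cgroup/staff    1024 2147483648
#   set_group_limits /cgroup/students  512 1073741824
#   share_percent 2048 1024 512    # faculty's portion under full contention
```

With these weights, faculty processes would receive roughly four sevenths of the CPU only while staff and student groups are also busy. An idle system lets any group use whatever capacity is free.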