Two Paths to Server Performance
I/O scheduler and file system selection can boost SUSE Linux Enterprise server performance
Written by Matthias G. Eckermann and Bill Tobey
The Novell approach to assembling a SUSE Linux server distribution has always been to provide a wide range of the best packages and tools available from the community. Our goal is to give IT organizations the most flexible and versatile resource set for configuring and optimizing high-performance servers for a complete range of data center applications.
This article will explore two often-overlooked areas where SUSE Linux Enterprise Server provides multiple options that administrators can exploit to enhance server performance: the I/O scheduler and the file system.
Meet Your I/O Scheduler
The I/O scheduler is the part of the kernel that handles read / write access to block storage devices: a USB stick, a local disk, a SAN LUN or any other device that presents its data as blocks. A scheduler queues and sequences read and write requests in order to manage mechanical latency (the seek time spent moving the disk heads) and optimize data delivery performance. Its bag of tricks includes three techniques for manipulating the request queue:
- Request merging – Requests for data in adjacent blocks can be combined to improve throughput by reducing both seek time and the total number of requests the device has to service.
- Directional (elevator) reordering – Requests can be reordered based on location, to maintain head movement in one direction for as long as possible, using the same control methodology as an elevator to avoid service starvation at the disk peripheries.
- Priority reordering – Requests can be sequenced according to various priority schemes, such as a start-of-execution deadline assigned to each request at time of receipt.
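To make directional reordering concrete, here is a toy model in shell (not kernel code). Given the current head position and a list of pending sector numbers, the hypothetical elevator_order function prints the order in which a simple one-directional elevator would serve them, assuming the head is currently sweeping toward higher sector numbers and reverses at the last pending request (the LOOK variant). Real schedulers also merge, batch and time out requests, which this sketch ignores.

```shell
# elevator_order HEAD SECTOR...  (hypothetical, illustrative only)
# Prints pending request sectors in the order a LOOK-style elevator
# would serve them, with the head at HEAD moving toward higher sectors.
elevator_order() {
    head=$1
    shift
    # Upward sweep: requests at or beyond the head, lowest first.
    for s in "$@"; do
        [ "$s" -ge "$head" ] && echo "$s"
    done | sort -n
    # Return sweep: requests behind the head, highest first.
    for s in "$@"; do
        [ "$s" -lt "$head" ] && echo "$s"
    done | sort -rn
}
```

For example, with the head at sector 50 and requests pending at 10, 95, 52, 30 and 70, `elevator_order 50 10 95 52 30 70` prints 52, 70 and 95 on the upward sweep, then 30 and 10 on the way back.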
The Four Types of Linux I/O Schedulers
There are four types of Linux I/O schedulers, each of which implements the basic sequencing techniques in different ways and combinations, providing significant variations in I/O performance with different application workloads.
The NOOP scheduler is the simplest of all Linux I/O schedulers. It merges requests to improve throughput but attempts no other optimization: all requests go into a single unprioritized first-in, first-out queue for execution. It’s ideal for storage environments with extensive caching and for those with their own scheduling mechanisms, such as a storage area network with multipath access through a switched interconnect, or virtual machines, where the hypervisor provides the I/O backend. It’s also a good choice for systems with solid-state storage, where there is no mechanical latency to manage.
To activate the NOOP I/O scheduler for use with all applications and storage devices, edit your boot loader configuration settings to pass the kernel parameter: elevator=noop.
The Deadline scheduler applies a service deadline to each incoming request. This sets a cap on per-request latency and ensures good disk throughput. Service queues are prioritized by deadline expiration, making this a good choice for real-time applications, databases and other disk-intensive applications. To activate the Deadline I/O scheduler for use with all applications and storage devices, edit your boot loader configuration settings to pass the kernel parameter: elevator=deadline.
The Anticipatory scheduler does exactly what its name implies. It anticipates that a completed read request will be followed by additional requests for adjacent blocks. After completing a read, it waits a few milliseconds for subsequent nearby requests before moving on to the next queue item. Service queues are prioritized for proximity, a strategy that can maximize disk throughput at the cost of a slight increase in latency.
The Anticipatory scheduler delivers its best performance with Web and file servers and with desktops that have a single IDE/SATA disk. It was the default scheduler in the mainline Linux kernel until version 2.6.18, when CFQ took over that role, and it can be activated by editing the boot loader configuration file to pass the kernel parameter: elevator=as.
The Completely Fair Queuing (CFQ) scheduler provides a good compromise between throughput and latency by treating all competing processes even-handedly. Each process is given a separate request queue and a dedicated time slice of disk access. CFQ provides the minimal worst-case latency on most reads and writes, making it suitable for a wide range of applications, particularly multi-user systems. Because of our unique desktop-to-data center strategy, CFQ is the default I/O scheduler in SUSE Linux Enterprise Server 11. It can be activated by editing the boot loader configuration file to pass the kernel parameter: elevator=cfq.
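Each of the elevator= settings above ends up as a single word on the kernel command line. The hypothetical set_elevator helper below takes an existing command line, drops any earlier elevator= value and appends the new one, which is a convenient way to preview the change before editing the boot loader entry (on SUSE Linux Enterprise Server 11 for x86 this is typically the kernel line in /boot/grub/menu.lst). It is a sketch that splits on whitespace and does not handle quoted parameters.

```shell
# set_elevator CMDLINE SCHEDULER  (hypothetical helper)
# Prints CMDLINE with any existing elevator= word removed and
# elevator=SCHEDULER appended. The body runs in a subshell so that
# "set -f" (no filename globbing while splitting CMDLINE) stays local.
set_elevator() (
    set -f
    case $2 in
        noop|deadline|as|cfq) ;;
        *) echo "unknown scheduler: $2" >&2; exit 1 ;;
    esac
    out=
    for word in $1; do
        case $word in
            elevator=*) ;;                      # drop the old setting
            *) out="${out:+$out }$word" ;;      # keep everything else
        esac
    done
    echo "${out:+$out }elevator=$2"
)
```

Running `set_elevator "$(cat /proc/cmdline)" deadline` shows what the running kernel's command line would look like with the new scheduler; after rebooting, `cat /proc/cmdline` confirms which parameter was actually passed.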
Making Per-Device I/O Scheduler Assignments
If a server runs multiple applications on different storage devices, you can assign a separate I/O scheduler to each device to optimize the performance of each application-storage pair. These assignments can even be changed on a running production system. To check the I/O scheduler setting for an individual storage device, use the shell command: cat /sys/block/*DEV*/queue/scheduler. The output lists the available schedulers, with the active one shown in square brackets.
If desired, you can then change the I/O scheduler assignment for each device with this command: echo SCHEDNAME > /sys/block/*DEV*/queue/scheduler. A change made this way lasts only until the next reboot.
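The two commands above can be wrapped with a little checking. In this sketch (both function names are hypothetical), active_scheduler extracts the bracketed name from the contents of a scheduler file, and set_scheduler refuses to write a name the device does not list before echoing it into sysfs. Writing requires root. The names accepted are the ones sysfs lists; kernels that include the Anticipatory scheduler typically list it there as anticipatory rather than as.

```shell
# active_scheduler CONTENTS  (hypothetical helper)
# CONTENTS is the text of a scheduler file, e.g. "noop deadline [cfq]".
# Prints the name inside the square brackets; fails if there is none.
active_scheduler() {
    name=$(printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)].*/\1/p')
    [ -n "$name" ] && printf '%s\n' "$name"
}

# set_scheduler DEV SCHEDULER  (hypothetical helper, run as root)
# Writes SCHEDULER to /sys/block/DEV/queue/scheduler only if the device
# offers it, then prints the scheduler that is now active.
set_scheduler() {
    f=/sys/block/$1/queue/scheduler
    available=$(sed 's/[][]//g' "$f") || return 1   # strip the brackets
    case " $available " in
        *" $2 "*) ;;
        *) echo "scheduler '$2' not offered by $1: $available" >&2; return 1 ;;
    esac
    echo "$2" > "$f" || return 1
    active_scheduler "$(cat "$f")"
}
```

For example, `set_scheduler sdb deadline` (with sdb standing in for your device) prints deadline once the switch has taken effect.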
Integrity, Performance and the Barrier In Between
Barriers are a feature the kernel’s block I/O subsystem makes available to journaling file systems to protect data integrity. A barrier request temporarily locks the I/O scheduler’s execution queue, ensuring that a sequence of journal write requests is securely committed to physical media before any subsequent requests are served. Barriers protect metadata and ensure file system integrity in the event of a system crash, but they do so at the expense of a noticeable performance penalty. Novell places a higher value on data integrity than on performance in enterprise computing environments, so barrier support is switched on by default in SUSE Linux Enterprise Server. It can be turned off to improve performance, but only by a knowledgeable administrator prepared to assume the risk.
- With reiserFS you can enable / disable barriers using the mount options: barrier=flush or barrier=none.
- With ext3 you can enable / disable barriers using the mount options: barrier=1 or barrier=0.
- With XFS you can enable / disable barriers using the mount options: barrier or nobarrier.
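Because the option spelling differs between the three file systems, a small lookup such as the hypothetical barrier_opt function below can keep scripts and /etc/fstab edits consistent. It only prints the option string from the list above; it does not mount anything.

```shell
# barrier_opt FSTYPE on|off  (hypothetical helper)
# Prints the mount option that enables or disables write barriers
# for reiserfs, ext3 or xfs; fails for any other combination.
barrier_opt() {
    case $1:$2 in
        reiserfs:on)  echo barrier=flush ;;
        reiserfs:off) echo barrier=none ;;
        ext3:on)      echo barrier=1 ;;
        ext3:off)     echo barrier=0 ;;
        xfs:on)       echo barrier ;;
        xfs:off)      echo nobarrier ;;
        *) echo "no barrier option known for $1 ($2)" >&2; return 1 ;;
    esac
}
```

An administrator who has accepted the risk might then mount with, for example, `mount -o "$(barrier_opt xfs off)" /dev/sdb1 /srv/data`, where the device and mount point are placeholders.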
File System Selection for Server Performance
Another set-up decision that can significantly affect server performance is the choice of file systems. As is the case with I/O schedulers, SUSE Linux Enterprise Server ships with a number of file system alternatives, allowing administrators to match file systems and application workloads for optimum performance. Here are a few guidelines for making the right performance pick.
- Choose ReiserFS for small files – ReiserFS is best suited for applications that generate lots of small files, including mail, NFS and database servers, and for applications that use synchronous I/O.
- Choose ext3 for small file systems – Ext3, the default file system in SUSE Linux Enterprise Server 11, is best suited for file systems of 100 gigabytes or less. If you plan to store large numbers of files in a single directory on ext3, consider enabling hashed b-tree directory indexing (the dir_index feature) when you create the file system, with the shell command: # mkfs.ext3 -O dir_index /dev/*DEV*. Note that dir_index is enabled by default in version 11 SP1.
- Choose XFS for large files and streaming media – XFS is an excellent choice for large files and medium to very large file systems (up to 16 terabytes on 32-bit systems, or a theoretical 8 exabytes on 64-bit systems). Its low latency transfer characteristics also make it an ideal selection for streaming media applications. SUSE Linux Enterprise has supported XFS since version 8, and Novell is working closely with SGI to optimize its performance with future releases of SUSE Linux Enterprise Server. It merits consideration for any file system likely to exceed 100 gigabytes, unless other factors (e.g. many small files) dictate another choice.
XFS also offers a number of special features that can be particularly useful, including xfsdump / xfsrestore for backup and recovery, online file system growth, and online defragmentation.
- Choose OCFS2 for cluster performance or high availability – Oracle Cluster File System 2 (OCFS2) is a POSIX-compliant shared-disk cluster file system for Linux that is developed by the community under the GPL. Because it provides local file system semantics, OCFS2 can be used with any application. Cluster-aware applications can leverage its parallel I/O support for higher performance; other applications can leverage its multi-node support to achieve higher availability through automated failover.
- Consider btrfs for the future – Btrfs is a new copy-on-write file system for Linux aimed at bringing additional enterprise-class file system features to the Linux kernel. Initially developed by Oracle, btrfs is licensed under the GPL and has quickly been adopted by the community. Long-awaited features include integrated volume management, writable snapshots (and snapshots of snapshots), extents, dynamic inode allocation, checksums on data and metadata, online file system check and defragmentation, and integrated multiple device support.
Btrfs is still under intense development, but is included as a technology preview in SUSE Linux Enterprise Server 11 SP1.
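For the ext3 dir_index recommendation above, you can check an existing file system before deciding whether to change it. tune2fs -l prints the superblock, including a "Filesystem features:" line, and the hypothetical helper below looks for dir_index on that line. On an existing file system the feature can be switched on with tune2fs -O dir_index, after which running e2fsck -fD on the unmounted file system indexes the directories that already exist.

```shell
# has_dir_index TUNE2FS_OUTPUT  (hypothetical helper)
# Succeeds if the "Filesystem features:" line of "tune2fs -l" output
# lists dir_index as a whole word.
has_dir_index() {
    printf '%s\n' "$1" |
        grep '^Filesystem features:' |
        tr -s ' \t' '\n\n' |
        grep -qx dir_index
}
```

Typical use, with /dev/sdb1 as a placeholder: `has_dir_index "$(tune2fs -l /dev/sdb1)" || echo "dir_index not enabled"`.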
Measuring I/O Scheduler and File System Performance
Once you’ve made your I/O scheduler and file system selections, there are many tools available to measure your configured system’s performance. A few favorites include:
- Bonnie is a simple but useful tool that provides a variety of benchmarks on the speed of your file system, OS caching, the underlying device and your libc. It is supported in the SUSE Linux Enterprise Server distribution.
- fio is an I/O tool meant for both benchmarking and stress testing / hardware verification. It has support for 13 different types of I/O engines. It is available at: http://freshmeat.net/projects/fio/. Packages for SUSE Linux Enterprise Server 11 are available at: http://download.opensuse.org/repositories/benchmark/SLE_11/.
- IOzone is a file system benchmark tool that generates and measures a variety of file operations. It has been ported to many machines and runs under many operating systems. It is available at: http://www.iozone.org/ and http://download.opensuse.org/repositories/home:/mge1512:/benchmarking/SLE_11/.
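To compare schedulers on your own hardware, one approach is to run the same fio job once per scheduler. The hypothetical bench_plan function below only prints the commands (a dry run) so you can review them before running them as root, for example by piping the output to sh. The fio options used (--name, --filename, --rw, --bs, --size, --direct, --runtime, --time_based, --output) are standard fio options, but the 4 KB random-read workload, 1 GB file size and 60-second runtime are assumptions to adjust. --direct=1 bypasses the page cache so that reads actually reach the I/O scheduler.

```shell
# bench_plan DEV TESTFILE SCHEDULER...  (hypothetical helper, dry run)
# For each SCHEDULER, prints a command that selects it for DEV and a
# fio random-read job against TESTFILE, which should live on DEV.
bench_plan() {
    dev=$1 file=$2
    shift 2
    for sched in "$@"; do
        echo "echo $sched > /sys/block/$dev/queue/scheduler"
        echo "fio --name=randread-$sched --filename=$file --rw=randread" \
             "--bs=4k --size=1g --direct=1 --runtime=60 --time_based" \
             "--output=fio-$dev-$sched.txt"
    done
}
```

For example, `bench_plan sdb /srv/data/fio.test noop deadline cfq > plan.sh` writes a script you can inspect and then run with `sh plan.sh`; each scheduler's results land in its own fio-sdb-*.txt file.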
Experiment to Find Your Optimum Configuration
I/O scheduler and file system selection can have major effects on the performance of SUSE Linux Enterprise Servers. We strongly recommend that you experiment with different configurations to gain firsthand experience. Also watch for our upcoming article on cgroups for kernel resource management.