27.3 Configuring or Tuning Group I/O

Group write is a technique of writing data to the volume at regular intervals in order to reduce the seek time on the drive. It also reduces the number of writes because more changes to the same block are made only to memory.

In OES 1 Linux, NSS writes are done on a block-based timer. A block is written one second after the block becomes dirty (modified by a user or process). This can cause lots of head movement because there is no control over the order of blocks being sent to disk.

In OES 2 Linux, NSS performs group writes in three categories: journal, metadata, and user data. By setting policies for group writes, you can improve the performance of the file system for your particular environment.

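For information, see the following:

  Section 27.3.1, Viewing the Metadata Area Size
  Section 27.3.2, Configuring the Journal Group Write Timer
  Section 27.3.3, Configuring the Metadata Group Write Timer and Limit
  Section 27.3.4, Configuring the User Data Group Write Timer
  Section 27.3.5, Viewing Group Write Policies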

27.3.1 Viewing the Metadata Area Size

NSS for OES 2 Linux provides a logical read-ahead capability. NSS is designed to physically store logically related data, such as files in the same directory, near each other. Reading ahead based on this logical information improves performance: when a block is read, its logically related blocks are also read. The area read is determined by the default area size.

To improve performance for NSS on OES 2 Linux, metadata blocks use an area seed logic to make sure that related metadata blocks are physically stored near each other. The default area size for metadata blocks is 16 blocks that are 4 KB each, or 64 KB total.

For metadata blocks, the seed is set to the block number for the area. When metadata is written, the seed logic determines the closest free block in the area to use next. When the area is full, a new free area is found in a higher area in the pool, and a new seed marks this area. When the search for a free area reaches the end of the pool, it wraps back and continues searching for free areas from the start of the pool. If no free space of sufficient size is found, the size is temporarily halved from 16 to 8, 4, 2, or 1 blocks progressively as needed until the temporary size is 1. A setting of 1 block indicates that the pool is essentially out of space. As space is freed or the pool increases in size, future space allocations use the default area size of 16 blocks.
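The progressive halving described above can be illustrated with a minimal Python sketch. It is not the NSS allocator: the function names and the single `largest_free_run_blocks` input are hypothetical simplifications, and only the sizes (16, 8, 4, 2, 1 blocks of 4 KB each) come from the text.

```python
DEFAULT_AREA_BLOCKS = 16  # default metadata area size, from the text
BLOCK_SIZE_KB = 4         # metadata block size, from the text


def candidate_area_sizes(default_blocks=DEFAULT_AREA_BLOCKS):
    """Yield the area sizes tried in order: 16, 8, 4, 2, 1."""
    size = default_blocks
    while size >= 1:
        yield size
        size //= 2


def choose_area_size(largest_free_run_blocks):
    """Return the first candidate size that fits, or None if nothing fits.

    largest_free_run_blocks is a hypothetical stand-in for the result of
    the pool-wide search for contiguous free space.
    """
    for size in candidate_area_sizes():
        if largest_free_run_blocks >= size:
            return size
    return None


if __name__ == "__main__":
    for free_run in (20, 5, 1, 0):
        size = choose_area_size(free_run)
        label = "no space" if size is None else f"{size * BLOCK_SIZE_KB} KB"
        print(f"largest free run {free_run:>2} blocks -> area size {label}")
```

A result of 1 block (4 KB) corresponds to the condition the text describes as the pool being essentially out of space.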

The maximum number of dirty metadata blocks that are allowed to accumulate is governed by the Metadata Group Write Limit parameter. By default, the limit is 20000 dirty blocks. For information, see Section 27.3.3, Configuring the Metadata Group Write Timer and Limit.

You can view the metadata area size that is currently in use and the number of dirty blocks waiting to be written by viewing the Current Metadata Group Write Size parameter in the NSS status report. The information is reported in the following format:

Current Metadata Group Write Size  = areasize (number_dirty_blocks)

For example, with the default setting of 16 4-KB blocks, the metadata area is 64 KB. If 16000 dirty blocks are waiting to be written, the values are reported as follows:

Current Metadata Group Write Size  = 64K (16000)
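If you capture the status report as text, the two values can be pulled out of this line with a short script. The following Python sketch assumes the line format shown above; the function name is hypothetical, and the script only parses text that you supply rather than querying NSS.

```python
import re

BLOCK_SIZE_KB = 4  # metadata block size, from the text

# Matches lines such as "Current Metadata Group Write Size  = 64K (16000)".
_PATTERN = re.compile(
    r"Current Metadata Group Write Size\s*=\s*(\d+)[kK]\s*\((\d+)\)"
)


def parse_group_write_size(line):
    """Return (area_size_kb, dirty_blocks) parsed from a status line."""
    match = _PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized status line: {line!r}")
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    area_kb, dirty = parse_group_write_size(
        "Current Metadata Group Write Size  = 64K (16000)"
    )
    print(f"area: {area_kb} KB ({area_kb // BLOCK_SIZE_KB} blocks), "
          f"dirty blocks waiting: {dirty}")
```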

To view the Current Metadata Group Write Size information:

  1. Open a terminal console, then log in as the root user.

  2. At the console prompt, open the NSS Console by entering

    nsscon
    
  3. At the nsscon prompt, enter

    nss /status
    
  4. In the NSS status report, look for the Current Metadata Group Write Size parameter to view the current values:

    Current Metadata Group Write Size  = areasize (number_dirty_blocks)
    

27.3.2 Configuring the Journal Group Write Timer

For NSS, the journal keeps metadata consistent up to the time when its blocks are written to the device. The Journal Group Write Timer determines the elapsed time between writes of journal blocks. Thus, the timer policy determines how far back the last consistent point can be when a system crash occurs.

Journal blocks are written by default as a group every second. Journal blocks might be written sooner than the one-second elapsed time if another timer policy triggers a write or if the journal gets full before the time elapses. Writing blocks as a group helps improve performance because it allows fewer writes, while ensuring that data is actually recorded to the device.

Use the following NSS command option to control the group write policy for journal blocks:

/JournalGroupWriteTime=seconds

Use the JournalGroupWriteTime parameter to specify the elapsed time to wait before group writes of journal blocks.

Journal Group Write Timer    Risk for Inconsistent Metadata (Time Elapsed Since Last Consistent Point)    File System Performance
1 second (default)           Minimized                                                                    Optimized for most scenarios
Greater than 1 second        Higher                                                                       Faster

To set the JournalGroupWriteTime parameter, issue the following command as the root user in the NSS Console (nsscon):

nss /JournalGroupWriteTime=seconds

Replace seconds with the maximum number of seconds to elapse before forcing journal blocks to be written to the volume. The default value of seconds is 1.

For example, to group write journal blocks every 2 seconds, enter

nss /JournalGroupWriteTime=2

27.3.3 Configuring the Metadata Group Write Timer and Limit

The metadata blocks are written by default as a group every 40 seconds, or when the MetadataGroupWriteLimit is reached, whichever occurs first. Metadata loss does not occur if the system crashes because all metadata changes are automatically recorded in the journal. However, increasing the timer setting increases the redo/undo time that is required to activate a pool (the mount time) after a crash because there is more unwritten metadata in the journal to be resolved.
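The "whichever occurs first" rule can be expressed as a simple predicate. This Python sketch only models the policy as described in this section; the function and argument names are hypothetical, and the defaults (40 seconds, 20000 blocks) are the documented default values.

```python
def metadata_group_write_due(seconds_since_last_write, dirty_blocks,
                             write_time=40, write_limit=20000):
    """Return True when either the timer or the dirty-block limit is reached.

    write_time mirrors MetadataGroupWriteTime (seconds) and write_limit
    mirrors MetadataGroupWriteLimit (blocks); this is a model of the
    policy described in the text, not NSS code.
    """
    return (seconds_since_last_write >= write_time
            or dirty_blocks >= write_limit)


if __name__ == "__main__":
    print(metadata_group_write_due(10, 500))      # neither threshold reached
    print(metadata_group_write_due(40, 500))      # timer reached
    print(metadata_group_write_due(10, 20000))    # limit reached first
```

In this model, lowering either threshold makes group writes more frequent, which leaves less unwritten metadata for the journal to resolve after a crash.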

IMPORTANT: Within a clustered environment, this means that the time to complete a failover is related to the setting of the MetadataGroupWriteLimit parameter.

You can limit the amount of time it takes for a pool activation after a crash by decreasing the maximum number of metadata blocks that can be dirty in the MetadataGroupWriteLimit parameter. A group write is performed when the limit is reached.

You can increase performance of the file system by increasing the maximum number of metadata blocks that can be dirty.

Use the following NSS command options to control the group write behavior for metadata blocks:

/MetadataGroupWriteTime=seconds

Use the MetadataGroupWriteTime parameter to specify the elapsed time to wait before group writes of metadata blocks. Decreasing the metadata group write timer can help reduce the mount time for the volume after a crash.

Metadata Group Write Timer    Time to Mount After a System Crash    File System Performance
40 seconds (default)          Optimized for most scenarios          Optimized for most scenarios
Less than 40 seconds          Faster                                Slower
More than 40 seconds          Slower                                Faster

To set the MetadataGroupWriteTime parameter, issue the following command as the root user in the NSS Console (nsscon):

nss /MetadataGroupWriteTime=seconds

Replace seconds with the maximum number of seconds to elapse before forcing metadata blocks to be written to the volume. The default value of seconds is 40.

For example, to group write metadata blocks every 30 seconds, enter

nss /MetadataGroupWriteTime=30

/MetadataGroupWriteLimit=blocks

Use the MetadataGroupWriteLimit parameter to specify the maximum number of metadata blocks that can be dirty before a group write is performed. The following describes how the settings affect time to mount and file system performance:

Maximum Number of Dirty Metadata Blocks    Time to Mount After a System Crash    File System Performance
20000 blocks (default)                     Optimized for most scenarios          Optimized for most scenarios
Less than 20000 blocks                     Faster                                Slower
More than 20000 blocks                     Slower                                Faster

To set the MetadataGroupWriteLimit parameter, issue the following command as the root user in the NSS Console (nsscon):

nss /MetadataGroupWriteLimit=blocks

Replace blocks with the maximum number of metadata blocks that can be dirty before forcing them to be written to the volume. The default value of blocks is 20000.

For example, to decrease the maximum number of dirty metadata blocks to 15,000 for the purpose of reducing the mount time, enter

nss /MetadataGroupWriteLimit=15000

For example, to increase the maximum number of dirty metadata blocks to 30,000 for the purpose of increasing the file system performance, enter

nss /MetadataGroupWriteLimit=30000

27.3.4 Configuring the User Data Group Write Timer

The user data blocks are written as a group every 3 seconds. This increases the risk of data loss on a crash compared to previous versions of NSS that write data blocks every 1 second. You can set the user data group write timer (UserDataGroupWriteTime) to 1 second to get the familiar NSS behavior for data writes.

Use the following NSS command option to control the group write behavior for user data blocks:

/UserDataGroupWriteTime=seconds

Use the UserDataGroupWriteTime parameter to specify the elapsed time to wait before group writes of user data blocks. Decreasing the user data group write timer can help reduce the risk of data loss for a volume after a crash.

User Data Group Write Timer    Risk of Data Loss After a Crash                     File System Performance
3 seconds (default)            Optimized for most scenarios                        Optimized for most scenarios
1 second                       Lower, typical of NSS on NetWare and OES 1 Linux    Slower
Greater than 3 seconds         Higher                                              Faster

To set the UserDataGroupWriteTime parameter, issue the following command as the root user in the NSS Console (nsscon):

nss /UserDataGroupWriteTime=seconds

Replace seconds with the maximum number of seconds to elapse before forcing user data blocks to be written to the volume. The default value of seconds is 3.

For example, to group write user data blocks every 1 second, enter

nss /UserDataGroupWriteTime=1

27.3.5 Viewing Group Write Policies

  1. Open a terminal console, then log in as the root user.

  2. At the terminal console prompt, enter

    nsscon
    
  3. In nsscon, enter

    nss /status
    
  4. Look for the following settings in the Current NSS Status report:

      Journal Flush Timer                = 1 second
      Metadata Flush Timer               = 40 seconds
      User Data Flush Timer              = 3 seconds
      Current Metadata Group Write Size  = 64k (16)
      Metadata Block Group Write Limit   = 80000k (20000)
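The two values on the Metadata Block Group Write Limit line are related by the 4 KB block size: 20000 blocks x 4 KB = 80000 KB. The following Python sketch parses a saved copy of the report excerpt above and checks that relationship. The helper names are hypothetical, and the script works only on text you provide; it does not run nsscon.

```python
BLOCK_SIZE_KB = 4  # metadata block size stated in Section 27.3.1

SAMPLE_REPORT = """\
Journal Flush Timer                = 1 second
Metadata Flush Timer               = 40 seconds
User Data Flush Timer              = 3 seconds
Current Metadata Group Write Size  = 64k (16)
Metadata Block Group Write Limit   = 80000k (20000)
"""


def parse_status(text):
    """Map each 'name = value' line of the report to a dict entry."""
    settings = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if sep:  # skip lines that contain no '='
            settings[name.strip()] = value.strip()
    return settings


def split_size_and_count(value):
    """Split a value such as '80000k (20000)' into (80000, 20000)."""
    size_part, count_part = value.split("(")
    return int(size_part.strip().rstrip("kK")), int(count_part.rstrip(") "))


settings = parse_status(SAMPLE_REPORT)
limit_kb, limit_blocks = split_size_and_count(
    settings["Metadata Block Group Write Limit"]
)

if __name__ == "__main__":
    for name, value in settings.items():
        print(f"{name}: {value}")
    print("limit consistent with 4 KB blocks:",
          limit_kb == limit_blocks * BLOCK_SIZE_KB)
```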