Low write performance on SLES 11/12 servers with large RAM

This document (7010287) is provided subject to the disclaimer at the end of this document.

Environment

SUSE Linux Enterprise Server 11 (SLES 11)
SUSE Linux Enterprise Server 11 Service Pack 1 (SLES 11 SP1)
SUSE Linux Enterprise Server 11 Service Pack 2 (SLES 11 SP2)
SUSE Linux Enterprise Server 11 Service Pack 3 (SLES 11 SP3)
SUSE Linux Enterprise Server 11 Service Pack 4 (SLES 11 SP4)

SUSE Linux Enterprise Server 12 (SLES 12)
SUSE Linux Enterprise Server 12 Service Pack 1 (SLES 12 SP1)
SUSE Linux Enterprise Server 12 Service Pack 2 (SLES 12 SP2)
SUSE Linux Enterprise Server 12 Service Pack 3 (SLES 12 SP3)

Situation

Low performance, especially involving writing of data to files over NFS, may occur on SLES servers with large amounts of RAM.

Resolution

For performance reasons, written data goes into a cache before being sent to disk.  The cache of data waiting to be written is called "dirty cache".  There are some tunable settings which influence how the Linux kernel deals with dirty cache.  The defaults for these settings are chosen for average workloads on average servers.  However, technology changes quickly and the amount of RAM in an "average" server is not easily predictable.  More and more modern systems have too much RAM for these settings to be reasonable.
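The relevant tunables all live under vm.dirty* and can be listed with the sysctl utility.  For example (the exact set of settings and their defaults will vary by kernel version):

sysctl -a 2>/dev/null | grep vm.dirty    # list the current dirty-cache related settings and their values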
 
If a server has more than 8 GB of RAM, there may be cases where these values should be decreased.  This may seem counter-intuitive, given that most caches give better performance as you increase their size.  That is often true of read caches, but for write caches there are trade-offs.  Write caches allow you to write (to memory) very quickly, but at some point you have to "pay that debt" and actually get the work done.  Writing out all that data can take considerable time.  This is especially true when an application is writing large amounts of data to a file system which resides over a network.  For example, when an application is writing to an NFS mount point, a large dirty cache can take excessive time to flush to the NFS server.  High-RAM systems which are NFS clients often need to be tuned downward.
 
Of course, it is also possible (but far less common) that NFS *servers* (not just NFS clients), or any typical Linux machine, might need these values tuned lower, if the amount of dirty cache is too large.  For dirty cache, "too large" simply means:  any size that cannot be flushed quickly and efficiently.  Naturally, "quickly and efficiently" will vary depending on the hardware in use, how it is configured, whether it is functioning perfectly or having intermittent errors, etc.  Therefore, it is difficult to give a rule of thumb about when and where tuning is most needed.  The best that can be said is, "If you have problems that involve performance during large writes, try tuning these caches."
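One simple way to see how much dirty cache has actually built up (and whether writeback is keeping up) is to watch the Dirty and Writeback counters in /proc/meminfo while a large write is running.  For example:

grep -E 'Dirty|Writeback' /proc/meminfo                  # current dirty / writeback amounts, in kB
watch -n 1 "grep -E 'Dirty|Writeback' /proc/meminfo"     # watch the counters change once per second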
 
The following tunable settings should be considered.  Most administrators will get the most benefit, most quickly, from the "Alternative method" further below, at least to initially test a dramatic reduction in dirty cache and evaluate the impact.  But it is best to become familiar with this entire discussion:
  
 
vm.dirty_ratio
Maximum percentage of dirty system memory (default on SLES 11 is 40, on SLES 12 the default is 20).
 
When this percentage of memory is hit, processes will not be allowed to write more until some of their cached data is written out.  This ensures that the ratio is enforced.  By itself, that can slow down writes noticeably, but not tremendously.  However, if an application has written a large amount of data which is still in the dirty cache, and then issues a "sync" command to have it all written to disk, this can take a significant amount of time to accomplish.  During that time, some applications may appear stuck or hung.  Some applications which have timers watching those processes may even believe that too much time has passed and the operation needs to be aborted, also known as a "timeout".
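A rough way to see how long a full flush takes on a particular system is to time a manual sync while a large amount of dirty data is cached (for example, during or immediately after a large copy to an NFS mount):

time sync    # reports how long it takes to flush all outstanding dirty data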
 
Therefore, on large memory servers, this setting may need to be reduced in order for the dirty cache to stay smaller.  This will allow a full sync (flush, or commit) without long delays.  A setting of 10% (instead of 40%) may sometimes be appropriate to test, but often it is necessary to go even lower.  A range of experimentation may be enlightening.
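As a rough sanity check, the ceiling implied by a given percentage can be estimated from the installed RAM.  Note that the kernel actually applies the ratio to "dirtyable" memory rather than to total RAM, so treat this only as an approximation.  For example, for a 10% ratio:

awk '/^MemTotal:/ {printf "approximate dirty cache ceiling at 10%%: %.1f GB\n", $2 * 0.10 / 1024 / 1024}' /proc/meminfo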
 
 
vm.dirty_background_ratio
Percentage of dirty system memory at which background writeback will start (default 10).
 
"Background" writes are kicked off to get writing done even when the application isn't forcing a sync, and even if the dirty_ratio has not yet been reached.   The goal of this setting is to keeps the dirty cache from growing too large.  When reducing dirty_ratio to 10, it can be common to reduce dirty_background_ratio to 5 or lower.  Rule of thumb:  dirty_background_ratio = 1/4 to 1/2 of the dirty_ratio.
 
 
These limits can be observed or modified with the sysctl utility (see man pages for sysctl(8), sysctl.conf(5)).  But simply put, these can be set (to come into effect upon boot) in /etc/sysctl.conf, as:
vm.dirty_ratio = 10
vm.dirty_background_ratio = 5
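Assuming the two lines above have been added to /etc/sysctl.conf, they can be applied immediately (without waiting for a reboot) and then verified, for example:

sysctl -p /etc/sysctl.conf                          # re-read sysctl.conf and apply the settings now
sysctl vm.dirty_ratio vm.dirty_background_ratio     # confirm the values currently in effect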
 
If desired, rather than setting the parameters "permanently" in sysctl.conf, they can be changed immediately, remaining in effect only until reboot (or until they are set again), with the following example method:
 
echo 10 > /proc/sys/vm/dirty_ratio
echo 5 > /proc/sys/vm/dirty_background_ratio 
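The same temporary change can also be made with sysctl -w, and the active values can be read back at any time to confirm, for example:

sysctl -w vm.dirty_ratio=10
sysctl -w vm.dirty_background_ratio=5
cat /proc/sys/vm/dirty_ratio /proc/sys/vm/dirty_background_ratio    # read back the values currently in effect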
 
 
Alternative method, and lesser known rules:
 
There is a coded limit to how low dirty_ratio can be set.  Therefore, in dealing with larger amounts of RAM, percentage ratios might not be granular enough.  Some kernels won't allow dirty_ratio to be set below 5%.  When a smaller setting is needed, switch to setting dirty_bytes and dirty_background_bytes instead of the corresponding ratios.  Keep in mind that only one method (bytes or ratios) can be used at a time.  Typically, setting one type will automatically disable the other type by setting it to 0.  It is usually not necessary to have more than a few hundred Megabytes of memory in dirty cache, so good test settings may be:
 
echo 629145600 > /proc/sys/vm/dirty_bytes  #for 600 MB maximum dirty cache
echo 314572800 > /proc/sys/vm/dirty_background_bytes    #to spawn background write threads once the cache holds 300 MB
 
Or in /etc/sysctl.conf:
vm.dirty_bytes = 629145600
vm.dirty_background_bytes = 314572800
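The byte values above are simply 600 MB and 300 MB expressed in bytes; for other sizes the arithmetic can be done directly in the shell, for example:

echo $((600 * 1024 * 1024))    # 629145600 bytes = 600 MB
echo $((300 * 1024 * 1024))    # 314572800 bytes = 300 MB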
 
By the way, if the dirty_background_* setting (either bytes or ratio) is set equal to or greater than the corresponding dirty_* setting, the kernel will automatically use dirty_background_* = 1/2 dirty_*.
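When in doubt about which pair of settings is currently in force (ratios or bytes), all four values can be read back at once; whichever member of each pair is nonzero is the one being applied.  For example:

sysctl vm.dirty_ratio vm.dirty_bytes vm.dirty_background_ratio vm.dirty_background_bytes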

Disclaimer

This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

  • Document ID: 7010287
  • Creation Date: 09-MAR-12
  • Modified Date: 02-NOV-17
    • SUSE Linux Enterprise Server
