Stale File Handle error while using NSS volume shared through NFS

  • 7014909
  • 14-Apr-2014
  • 22-Jun-2016

Environment

Novell Open Enterprise Server 11 (OES 11) Linux
Novell Open Enterprise Server 2015 (OES 2015) Linux

Situation

An OES server with one or more NSS volumes is exporting a path from the NSS volume via NFS.  An NFS client can mount the export, but after about 15 minutes the client begins receiving "Stale File Handle" errors from the NFS server.

Resolution

The issue stems from the fact that many Linux processes are case-sensitive, whereas NSS volumes are not.  (See the "Cause" section of this document for more details.)  To let the Linux NFS export system work with case-insensitive file systems, a change was made in SLES 11 SP3 and later to better support this aspect of NSS.
 
If SLES 11 SP3 is in use, make sure that the nfs-client and nfs-kernel-server packages are updated to 1.2.3-18.37.1 or higher.
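
The version check above can be scripted. The following is a minimal sketch, not an official tool: the helper names version_ge and check_nfs_packages are hypothetical, and the comparison relies on GNU "sort -V", which only approximates RPM's own version ordering (it should be adequate for plain numeric versions such as the one named here).

```shell
#!/bin/sh
# Minimum package version named in this document for SLES 11 SP3.
REQUIRED='1.2.3-18.37.1'

version_ge() {
    # True when $1 >= $2 under GNU "sort -V" ordering.
    # Assumption: this approximates RPM version comparison closely
    # enough for simple dotted numeric version-release strings.
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

check_nfs_packages() {
    # Report the installed version-release of both packages and
    # whether it meets the required minimum.
    for pkg in nfs-client nfs-kernel-server; do
        if ! installed=$(rpm -q --queryformat '%{VERSION}-%{RELEASE}' "$pkg" 2>/dev/null); then
            echo "$pkg: not installed"
        elif version_ge "$installed" "$REQUIRED"; then
            echo "$pkg $installed: OK"
        else
            echo "$pkg $installed: older than $REQUIRED, update needed"
        fi
    done
}

# Only query the RPM database where rpm is actually available.
if command -v rpm >/dev/null 2>&1; then
    check_nfs_packages
fi
```

If either package is reported as older than the required version, apply the update or use the workaround described in this section.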
 
If SLES 11 SP4 is in use, it will have this change already.
 
If an update is not yet possible, or the server is on a SLES 11 support pack earlier than SP3, there is an easy workaround:
 
On the OES server (where the NSS volume physically resides) create a cron job:
 
As root user, execute "crontab -e" and add the following line to the crontab file:
 
*/5 * * * * /usr/sbin/exportfs -f > /dev/null 2>&1
 
This refreshes the export information every 5 minutes, while it is still working correctly.  As a result, the event that causes the information to be lost (which occurs after 15 minutes) never takes place.
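
Where editing the crontab interactively is inconvenient (for example, when the workaround must be applied to several servers), the same line can be added non-interactively. This is a sketch only: add_cron_line is a hypothetical helper name, and the final pipeline is left commented out so that nothing is changed until it has been reviewed.

```shell
#!/bin/sh
# The crontab entry given in this document.
CRON_LINE='*/5 * * * * /usr/sbin/exportfs -f > /dev/null 2>&1'

add_cron_line() {
    # Reads an existing crontab on stdin and writes it to stdout,
    # appending CRON_LINE only if an identical line is not already present.
    existing=$(cat)
    if [ -n "$existing" ]; then
        printf '%s\n' "$existing"
    fi
    if ! printf '%s\n' "$existing" | grep -qxF -- "$CRON_LINE"; then
        printf '%s\n' "$CRON_LINE"
    fi
}

# To apply as root (review first, then uncomment):
# crontab -l 2>/dev/null | add_cron_line | crontab -
```

Because the helper skips the line when it is already present, running it more than once does not create duplicate entries.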

Cause

NSS volumes, in their default configuration, are not fully case sensitive. This is different from most file systems on Linux. Even though the NSS volume itself is not fully case sensitive, the NFS Server and NFS mount daemon might experience some confusion if there is an uppercase/lowercase discrepancy in an NFS export point.
 
For example, consider this situation:
An NSS volume exists at:
/media/nss/DATA
Running "ls" there shows a subdirectory called "users".
 
NSS may not care whether that directory name is accessed by referencing "users" or "USERS" or "uSeRs", but other Linux processes might.  To further complicate matters, certain information about the directories may be stored in memory in an unexpected uppercase/lowercase combination, depending on what case was used the first time the directory object was accessed after boot, by any process (not just NFS processes).  Therefore, even matching the /etc/exports syntax to that shown in "ls" (a directory list) may not fully protect against this issue.
 
This case "confusion" will not happen on the volume name itself (i.e. "DATA") or on "/media/nss", as those are case sensitive.  If an incorrect case is used with one of those components, the path will not export or be mountable in the first place.  This confusion will only happen on subdirectories which exist inside the NSS volume, and which are specifically named within the export syntax in /etc/exports.
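
To see which exports are exposed to this, the entries in /etc/exports can be filtered for paths that name a subdirectory below /media/nss/<volume>. The following is a rough sketch: at_risk_exports is a hypothetical helper name, and it assumes that NSS volumes are mounted under /media/nss and that the export path is the first whitespace-separated field of each entry.

```shell
#!/bin/sh
at_risk_exports() {
    # Reads exports-style lines on stdin and prints every export path
    # that goes deeper than /media/nss/<volume>. Comment lines are skipped.
    # Assumption: the export path is the first field and contains no spaces.
    awk '!/^[[:space:]]*#/ && $1 ~ "^/media/nss/[^/]+/." { print $1 }'
}
```

Running "at_risk_exports < /etc/exports" lists the candidate paths. An export of the volume root (such as /media/nss/DATA) is not listed, because its path components are case sensitive.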
 
When this case confusion is happening, it can typically be confirmed by checking the path case in these two locations:
 
The output of the command:
exportfs
 
The contents of this file:
/proc/net/rpc/nfsd.fh/content
 
If the case of the directory path in those two locations doesn't match, the problem will likely occur.  While this mismatch can be corrected by editing /etc/exports and afterwards restarting the NFS server with "rcnfsserver restart", that correction may only survive until the machine is restarted.  A new mismatch could occur after reboot.
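
The comparison of the two locations can be sketched as a small script. This is an illustration under stated assumptions, not a supported diagnostic: case_mismatches is a hypothetical helper name, and the commented extraction commands assume that export paths begin a line in the "exportfs" output and that the path is the last field of each non-comment line in /proc/net/rpc/nfsd.fh/content.

```shell
#!/bin/sh
case_mismatches() {
    # $1 and $2: files holding one path per line.
    # Prints each pair of paths that are identical when case is ignored
    # but differ when case is taken into account.
    awk '
        NR == FNR { seen[tolower($0)] = $0; next }
        (tolower($0) in seen) && seen[tolower($0)] != $0 {
            print seen[tolower($0)] " <-> " $0
        }
    ' "$1" "$2"
}

# Possible usage as root (file names are arbitrary; output layouts are assumed):
# exportfs | awk '/^\// { print $1 }' > /tmp/exported_paths
# awk '!/^#/ { print $NF }' /proc/net/rpc/nfsd.fh/content > /tmp/cached_paths
# case_mismatches /tmp/exported_paths /tmp/cached_paths
```

No output means that no case-only differences were found between the two lists. Any line printed shows the exported path on the left and the differing path from the kernel's file handle cache on the right.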