16.2 Verifying and Rebuilding an NSS Pool and Its Volumes

16.2.1 Mounting the Volume to Repair Journaled Errors

Volume errors are typically caused by transactions that were left unfinished when the system crashed. The NSS journaling feature repairs this type of error automatically when the volume is mounted.

If errors persist after you mount the volume, or if you cannot mount the volume, first rule out hardware causes for the problems. For information, see Section 16.2.2, Ruling Out Hardware Causes.

16.2.2 Ruling Out Hardware Causes

If a volume cannot be mounted or problems persist after journaling errors are resolved, check the hardware for faulty media or controller problems.

  1. Make sure you have a good backup of the data.

  2. Use the latest diagnostic software and utilities from the manufacturer of your hard drives and controllers to troubleshoot the hard drives without destroying the data.

    For example, verify the integrity of the media and confirm that the devices are operating correctly.

  3. If necessary, repair the media or controllers.

If errors persist after you have ruled out hardware causes, and you do not have a viable backup to restore to the last known good state, you should check the pool for metadata inconsistencies. For information, see Section 16.2.3, Verifying the Pool to Identify Metadata Inconsistencies.

16.2.3 Verifying the Pool to Identify Metadata Inconsistencies

The verify process is a read-only assessment of the pool. The Pool Verify option searches the pool for inconsistent data blocks or other errors in the file system’s metadata and records its findings in the verification log. For information on where to find the verification log and how to interpret any reported errors, see Section 16.2.4, Reviewing Log Files for Metadata Consistency Errors.

  1. For a 32-bit machine, make sure you have enough space available in the Linux kernel cache memory to run a pool verify.

    When running ravsui(8) for a pool verify or a pool rebuild on Linux, the utility needs contiguous space in kernel memory separate from the space allocated to the core NSS process. The larger the pool, the larger the space that is needed. To make space available, you might need to reduce the space used by other processes. You can optionally reduce the minimum number of buffers reserved for the core NSS process to as little as 10,000 4-KB buffers.

    1. Open a terminal console as the root user.

    2. At the console prompt, enter

      nsscon
      
    3. In nsscon, enter

      nss /MinBufferCacheSize=10000
      
  2. Place the pool in maintenance mode.

    1. At a terminal prompt, enter

      nsscon
      
    2. In nsscon, enter

      nss /PoolMaintenance=poolname
      
  3. Start the pool verify by entering the following at the terminal console prompt:

    ravsui verify poolname
    
  4. Use ravview to read the verify log.

    For information about using ravview, see Section B.9, ravview.

  5. Do one of the following:

    • If the log reports no errors with the pool’s metadata, it is safe to activate the pool and mount the volumes.

    • If the log reports no errors with the pool’s metadata, but you still cannot create files or directories, run a Pool Rebuild with the ReZID option. For information, see Section 16.3, ReZIDing Volumes in an NSS Pool.

    • If the log reports errors with the pool’s metadata, the affected volumes remain in maintenance mode. Decide whether to rebuild the pool based on the type of error and its potential outcomes. For information about rebuilding the pool, see Section 16.2.5, Rebuilding NSS Pools to Repair Metadata Consistency.

  6. For a 32-bit machine, if you modified the MinBufferCacheSize setting in Step 1, you can change it back to its original setting now, unless you are continuing with a pool rebuild.

    1. Open a terminal console as the root user.

    2. At the console prompt, enter

      nsscon
      
    3. In nsscon, enter

      nss /MinBufferCacheSize=value
      

      Replace value with the desired minimum number of 4-KB buffers. The default value is 30000.
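The 32-bit check in Steps 1 and 6 and the buffer arithmetic behind them can be sketched in shell. This is a minimal sketch, not a Novell utility: the function names needs_buffer_tuning, buffers_to_kb, and reserved_kb_released are hypothetical, and the script assumes that `uname -m` reports a name such as i386 or i686 on a 32-bit x86 kernel. It only computes sizes; it does not change any NSS setting.

```shell
# Hypothetical helpers (not part of NSS or ravsui).

# Succeeds (returns 0) if the architecture looks like 32-bit x86.
# Assumption: `uname -m` prints i386, i486, i586, or i686 on such kernels.
needs_buffer_tuning() {
    arch=${1:-$(uname -m)}
    case "$arch" in
        i[3-6]86) return 0 ;;
        *) return 1 ;;
    esac
}

# NSS cache buffers are 4 KB each, so size in KB = buffers * 4.
buffers_to_kb() {
    echo $(( $1 * 4 ))
}

# KB by which the reserved minimum shrinks when MinBufferCacheSize
# is lowered from CURRENT buffers to NEW buffers.
reserved_kb_released() {
    current=$1
    new=$2
    echo $(( (current - new) * 4 ))
}
```

For example, lowering the setting from the default of 30000 buffers (120000 KB) to 10000 buffers (40000 KB) shrinks the reserved minimum by 80000 KB, about 78 MB.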

16.2.4 Reviewing Log Files for Metadata Consistency Errors

Make sure to check the error log whenever an NSS volume does not come up in active mode after a verify or rebuild.

Log Files and On-Screen Display

Messages are written to the following logs:

Table 16-1 Location of Log Files for the NSS Pool Verify and Pool Rebuild Utilities

Log: /var/opt/novell/log/nss/rav/filename.vbf

  This is the default location, but you can specify the location and the filename.

  Purpose: Log of the pool verify process using ravsui verify. If a volume has errors, the errors are displayed on the screen and written to this log file of errors and transactions. On Linux, use the ravview utility to read logs. For information, see Section B.9, ravview.

Log: /var/opt/novell/log/nss/rav/filename.rtf

  Purpose: Log of the pool rebuild process using ravsui rebuild. This log contains information about data that has been lost during a rebuild by the pruning of leaves in the data structure.

Whenever you verify or rebuild a pool, the new information is appended at the end of the log file. If you want to keep old log files intact, rename the log file or move it to another location before you start the verify or rebuild process.
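Because a new run is appended to an existing log, one way to keep each run separate is to move the old file aside first. The following is a minimal sketch; archive_rav_log is a hypothetical function name, and it assumes the default log directory listed in Table 16-1 unless you pass a different directory as the second argument.

```shell
# Hypothetical helper: move an existing verify (.vbf) or rebuild (.rtf)
# log aside with a timestamp suffix so that the next run starts a new file.
# Prints the new path on success; returns 1 if the log does not exist.
# Usage: archive_rav_log FILENAME [DIRECTORY]
archive_rav_log() {
    name=$1
    dir=${2:-/var/opt/novell/log/nss/rav}
    src="$dir/$name"
    if [ ! -f "$src" ]; then
        echo "no existing log: $src" >&2
        return 1
    fi
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$src.$stamp"
    mv -- "$src" "$dest" && echo "$dest"
}
```

For example, run `archive_rav_log filename.vbf` before you start the next `ravsui verify poolname`.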

Warnings Reported

Warnings indicate that there are problems with the metadata, but that there is no threat of data corruption. Restoring data from a backup tape or rebuilding the pool’s metadata is optional. However, rebuilding a pool’s metadata typically results in some data loss.

Errors Reported

Errors indicate that there are physical integrity problems with the pool’s metadata. If you continue to use the pool as it is, data corruption will occur, or will continue to occur.

If you decide to rebuild the pool, use the Pool Rebuild utility. For information, see Section 16.2.5, Rebuilding NSS Pools to Repair Metadata Consistency.

No Errors Reported, but Cannot Create Files or Directories

If the verify log does not report errors, but you still cannot create files or directories on volumes in the pool, the file ID numbers might have exceeded the maximum size of the file-numbering field. You might need to rebuild the pool with the ReZID option. For information about how to decide whether a ReZID is needed, see Section 16.3, ReZIDing Volumes in an NSS Pool.
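The decision rules in this section, together with Step 5 of the verify procedure, can be summarized as a small lookup. This sketch does not parse the log; you supply the outcome you observed in ravview. The function name next_action and its outcome keywords are hypothetical.

```shell
# Hypothetical helper: map a verify outcome (as you read it in ravview)
# to the next step recommended in this section.
# Usage: next_action none|none-cannot-create|warnings|errors
next_action() {
    case "$1" in
        none)
            echo "Activate the pool and mount the volumes." ;;
        none-cannot-create)
            echo "Run a pool rebuild with the ReZID option (Section 16.3)." ;;
        warnings)
            echo "No corruption threat; restore or rebuild is optional (a rebuild usually loses some data)." ;;
        errors)
            echo "Pool stays in maintenance mode; consider a pool rebuild (Section 16.2.5)." ;;
        *)
            echo "unknown outcome: $1" >&2
            return 2 ;;
    esac
}
```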

16.2.5 Rebuilding NSS Pools to Repair Metadata Consistency

The purpose of a pool rebuild is to repair the metadata consistency of the file system. Rebuild uses the existing leaves of an object tree to rebuild all the other trees in the system to restore visibility of files and directories. It checks all blocks in the system. Afterwards, the NSS volume remains in maintenance mode if there are still errors in the data structure; otherwise, it reverts to the active state.

WARNING: Data will be lost during the rebuild.

A pool rebuild depends on many variables, so it is difficult to estimate how long it might take. The number of storage objects in a pool, such as volumes, directories, and files, is the primary consideration in determining the rebuild time, not the size of the pool. This is because a pool rebuild is reconstructing the metadata for the pool, not its data. For example, it would take longer to rebuild the metadata for a 200 GB pool with many files than for a 1 TB pool with only a few files. Other key variables are the number of processors, the speed of the processors, and the size of the memory available in the server.

You do not need to bring down the server to rebuild a pool. NSS allows you to temporarily place an individual storage pool in maintenance mode while you verify or rebuild it. While the pool is in maintenance mode, users do not have access to any of the volumes in that pool.

If you do not place the pool in maintenance mode before you issue the rebuild or verify command, you receive NSS error 21726:

NSS error: PoolVerify results
   Status: 21726
     Name: zERR_RAV_STATE_MAINTENANCE_REQUIRED
   Source: nXML.cpp[1289]
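If you script the verify or rebuild, you can watch for this status in whatever output you capture. This is a minimal sketch: it assumes the message text appears as shown above, the function name check_maintenance_error is hypothetical, and it only inspects text; it does not run ravsui or nsscon.

```shell
# Hypothetical helper: read captured output on stdin and return 1 if it
# contains the "maintenance mode required" error (status 21726).
check_maintenance_error() {
    if grep -Eq 'zERR_RAV_STATE_MAINTENANCE_REQUIRED|Status:[[:space:]]*21726'; then
        echo "Pool is not in maintenance mode. In nsscon, enter: nss /PoolMaintenance=poolname" >&2
        return 1
    fi
    return 0
}
```

For example, assuming ravsui writes the message to its output, `ravsui verify poolname 2>&1 | check_maintenance_error` reports the problem instead of leaving it in a scrolled-off console.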

To rebuild the pool:

  1. Depending on the nature of the reported errors, you might want to open a call with Novell Support before you begin the rebuild process.

  2. For a 32-bit machine, make sure you have enough space available in the Linux kernel cache memory to run a pool rebuild.

    When running ravsui(8) for a pool verify or a pool rebuild on Linux, the utility needs contiguous space in kernel memory separate from the space allocated to the core NSS process. The larger the pool, the larger the space that is needed. To make space available, you might need to reduce the space used by other processes. You can optionally reduce the minimum number of buffers reserved for the core NSS process to as little as 10,000 4-KB buffers.

    1. Open a terminal console as the root user.

    2. At the console prompt, enter

      nsscon
      
    3. In nsscon, enter

      nss /MinBufferCacheSize=10000
      
  3. Place the pool in maintenance mode.

    1. At a terminal prompt, enter

      nsscon
      
    2. In nsscon, enter

      nss /PoolMaintenance=poolname
      
  4. Start the pool rebuild. At the terminal console prompt, enter

    ravsui rebuild poolname
    

    For information about options that set the pruning parameters for the rebuild, see Section B.8, ravsui.

    Rebuilding can take several minutes to several hours, depending on the number of storage objects in the pool.

  5. Review the log on-screen or in the filename.rtf file to learn what data has been lost during the rebuild.

    For information, see Section 16.2.4, Reviewing Log Files for Metadata Consistency Errors.

  6. Do one of the following:

    • No Errors: If errors do not exist at the end of the rebuild, the pool’s volumes are available for mounting.

    • Errors: If errors still exist, the pool remains in the maintenance state. Repeat the pool verify to determine the nature of the errors, then call Novell Support for assistance.

  7. For a 32-bit machine, if you modified the MinBufferCacheSize setting in Step 2, you can change it back to its original setting.

    1. Open a terminal console as the root user.

    2. At the console prompt, enter

      nsscon
      
    3. In nsscon, enter

      nss /MinBufferCacheSize=value
      

      Replace value with the desired minimum number of 4-KB buffers. The default value is 30000.
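The order of the rebuild steps can be captured in a dry-run script that prints the commands instead of running them, so you can review the sequence before you start. This is a sketch only: print_rebuild_plan is a hypothetical name, it executes nothing, and it uses only the commands shown in the procedure above. Lines prefixed with "nsscon>" are entered at the nsscon prompt, not at the shell.

```shell
# Hypothetical dry-run helper: print, in order, the commands from the
# rebuild procedure for one pool. Nothing is executed.
# Usage: print_rebuild_plan POOLNAME [ARCH]
print_rebuild_plan() {
    pool=$1
    arch=${2:-$(uname -m)}
    if [ -z "$pool" ]; then
        echo "usage: print_rebuild_plan POOLNAME [ARCH]" >&2
        return 2
    fi
    # Assumption: i386 through i686 identify a 32-bit x86 machine.
    case "$arch" in
        i[3-6]86) is32=yes ;;
        *) is32=no ;;
    esac
    # Step 2 (32-bit only): lower the NSS minimum buffer cache.
    [ "$is32" = yes ] && echo "nsscon> nss /MinBufferCacheSize=10000"
    # Step 3: place the pool in maintenance mode.
    echo "nsscon> nss /PoolMaintenance=$pool"
    # Step 4: start the rebuild from the shell.
    echo "shell> ravsui rebuild $pool"
    # Step 7 (32-bit only): restore the setting; 30000 is the default,
    # so substitute your original value if it was different.
    [ "$is32" = yes ] && echo "nsscon> nss /MinBufferCacheSize=30000"
    return 0
}
```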