NLM Memory Troubleshooting

  • 7003770
  • 06-Jul-2009
  • 19-Nov-2012

Environment

Novell NetWare 6.5 Support Pack 5
Novell NetWare 6.5 Support Pack 6
Novell NetWare 6.5 Support Pack 7
Novell NetWare 6.5 Support Pack 8

Situation

NLM Memory problems can normally be divided into three areas:
1)  NLM Memory Hogs:  NLMs that consume a great deal of memory either by configuration or by design.
2)  NLM Memory Leaks:  NLMs that consume a great deal of memory because of a coding defect.
3)  NLM Allocation Issues:  NLMs that try to allocate very large chunks of memory, to the point of failure on the system.
 
These problems can go unnoticed by administrators and users for a very long time.  However, customers can see the effects of these problems when the following messages are displayed on the server's console:
  • Cache memory allocator out of available memory
  • Short term memory allocator is out of memory

These messages can also be followed by text stating how many total allocation failures there have been, the size of the last failure, and the NLM that requested that size.

If the NLM having a problem is known, please check the main support web site (https://support.novell.com) for technical documents and troubleshooting specific to that NLM.
 
Disclaimer:  This is a generalized document describing memory problems with NLMs on NetWare 6.5 servers.  As such it cannot include all instances of problems or all troubleshooting methods available.

Resolution

Tools that can help in troubleshooting these types of issues:
 
1)  Seg.nlm
 
When SEG.NLM is used on a NetWare 6.5 server, memory statistics are automatically recorded to log files once every 30 minutes (by default).  By reviewing these log files (sys:\system\seg.csv and sys:\system\segstats.txt), customers can quickly identify problematic NLMs or areas within the server memory where there are issues.
 
Analysis of the segstats.txt file can be found in this document:
 
SEG.NLM is free and available for download at this site:  https://www.novell.com/coolsolutions/tools/14445.html
 
 
2)  Novell Remote Manager
 
Using NRM (Novell Remote Manager), log in as admin or an admin equivalent.  In the left pane, under Manage Applications, click the item "List Modules."  A sorted list of the NLMs running on the server is displayed.  Make sure the NLMs are sorted according to NLM Total (click the red arrow at the top of the column).  While there is no log file available for this, the list and sorting are helpful in determining problems with NLM memory consumption.
 
 
General rule for NLMs to investigate:
Any NLM that consumes more than 25% of the server's RAM or more than 500 MB total should be explained.  NLMs that fit this profile do not automatically qualify as problematic.  Rather, these NLMs just need an explanation of why they use so much memory.
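
The general rule above can be sketched as a small check.  This is only an illustration: the function names and the example figures are hypothetical, and the NLM totals are assumed to have been copied by hand from the NRM "List Modules" page (NRM provides no log file for this).

```python
MB = 1024 * 1024

def needs_explanation(nlm_bytes, server_ram_bytes,
                      ratio_limit=0.25, absolute_limit=500 * MB):
    """Return True if the NLM exceeds either threshold of the general rule."""
    over_ratio = nlm_bytes > ratio_limit * server_ram_bytes
    over_absolute = nlm_bytes > absolute_limit
    return over_ratio or over_absolute

def flag_nlms(nlm_totals, server_ram_bytes):
    """Return the names of NLMs to investigate, largest consumer first."""
    flagged = [name for name, size in nlm_totals.items()
               if needs_explanation(size, server_ram_bytes)]
    return sorted(flagged, key=lambda name: nlm_totals[name], reverse=True)

# Hypothetical figures, typed in by hand for illustration only.
example_totals = {
    "DS.NLM": 700 * MB,
    "NSS.NLM": 300 * MB,
    "NWMKDE.NLM": 40 * MB,
}

# On a server with 1 GB of RAM the 25% threshold is 256 MB.
print(flag_nlms(example_totals, server_ram_bytes=1024 * MB))
```

An NLM flagged this way is not necessarily defective; it is simply one whose memory use should be explained.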
 
 
 
Troubleshooting NLM Memory Hogs
 
Memory hogs are NLMs that by design or by a configuration parameter consume a great deal of memory. 
 
A few of the NLMs that we've seen in this category are:
 
1.  DS.NLM
 
eDirectory can consume an extremely large amount of RAM if the database on the server is large, or if the configuration for the server allows this to happen.  To modify the eDirectory memory consumption, please see the following document:
 
 
Step #3 in this document details specifics for setting a limit to the memory for DS.NLM.
 
 
2.  NWMKDE.NLM
 
NWMKDE (NetWare Micro-Kernel Database Engine) is an NLM that ships with NetWare by default, but is not maintained by Novell.  This NLM belongs to Pervasive Software, Inc. (the old Btrieve company).  If this NLM is consuming a large amount of RAM, this can easily be changed with a configuration change in the BTI.CFG file.  Please see the following documents for further information: 
 
 
 
Be aware that some backup solutions have been known to change the cache setting in the BTI.CFG file.
 
 
3.  NSS.NLM
 
Memory consumed by NSS.NLM is typically confined to one of two configuration settings for NSS:  1)  Closed File Cache Size and 2)  Name Cache Size.  With ordinary usage, files are opened and closed constantly on the server.  NSS will cache these files and save their information in RAM so that the next time the same file is requested, the response from the server can be quicker.
 
We have seen instances where extremely large NSS implementations can cause the amount of RAM used for caching to increase to levels that are not desirable.  Modifying the two settings mentioned above can and will help control NSS memory.  For more information about these settings, how to change them, what the effects of changing them are, etc., see the following AppNote document:
 
Novell Storage Services (NSS) Performance Monitoring and Tuning
 
 
 
Troubleshooting NLM Memory Leaks
 
NLMs that have memory leaks will typically be identified through one of the applications listed above (SEG or NRM) or from the messages listed on the console screen ("cache memory allocator"), and will show one or more NLMs that constantly grow in memory.  These types of problems are normally attributed to a coding defect where over a period of time memory allocations are not released back to the kernel before new allocations are requested.  Depending upon the severity of the defect, the time to failure can be anywhere from hours to months.
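
As a rough illustration of "constantly grow in memory", the sketch below checks a series of memory readings for one NLM.  The readings are hypothetical and are assumed to have been collected by hand at regular intervals (for example from successive SEG log entries or NRM views); no SEG file layout is assumed, and a strictly increasing series is only a crude heuristic, not a confirmed diagnosis of a leak.

```python
def grows_constantly(samples, min_samples=4):
    """Return True if every reading is larger than the one before it.

    With fewer than min_samples readings there is too little data to
    call it a trend, so the function returns False.
    """
    if len(samples) < min_samples:
        return False
    return all(later > earlier
               for earlier, later in zip(samples, samples[1:]))

def growth_per_interval(samples):
    """Average growth between consecutive readings (same unit as samples)."""
    return (samples[-1] - samples[0]) / (len(samples) - 1)

# Hypothetical readings in MB, one per sampling interval.
suspect = [100, 120, 150, 190]
steady = [100, 120, 110, 115]

print(grows_constantly(suspect), growth_per_interval(suspect))
print(grows_constantly(steady))
```

An NLM that grows and then levels off is more likely a memory hog than a leak, which is why the check requires growth at every interval.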
 
Once an NLM is identified as a potential memory leak candidate, the next course of action is to make sure that the code running on the server is the very latest offered by the vendor, be it Novell or a 3rd party vendor.  For Novell products, please ensure that you have applied the latest available code updates.  After this, contacting the vendor to report the problem is the next step.  Further troubleshooting beyond this point may require coredumps, LAN traces, SEG log files, etc., to help narrow down the problem.
 
 
 
Troubleshooting NLM Allocation Issues
 
There are some issues where an NLM doesn't display "hog" activity or "leak" activity, but it still acts abnormally.  For instance, if an NLM requests a memory size of 50 MB, NetWare will try to grant that (in a single allocation response), and the possibility of failure is much higher than for requests of normal size (say around 10 KB or less).  If there is a memory allocation failure, the resultant action on the server depends upon the robustness of the NLM and how it handles memory failures.
 
In addition to the messages seen on the console screen reporting allocation failures, customers should check the sys:\system\sys$log.err file for a pattern of failures.  Search the file for "cache memory allocator" and "short term memory allocator".  Analyzing this file can help determine if the problem is with one or more NLMs.  The file can also help to determine if the allocation size is extremely large (thus the reason for the failure) or the size is small.  With an extremely large allocation size, please contact the vendor of that NLM.  For very small sizes, it is possible that memory fragmentation has caused this issue.
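
The search described above can be sketched as follows.  This is a minimal example that only counts lines containing the two phrases, ignoring case; it assumes the log has been copied off the server as plain text and makes no assumption about the layout of sys$log.err entries.  The function name and the sample text are hypothetical.

```python
PHRASES = ("cache memory allocator", "short term memory allocator")

def count_allocator_messages(log_text):
    """Count the lines that mention each allocator phrase (case-insensitive)."""
    counts = {phrase: 0 for phrase in PHRASES}
    for line in log_text.splitlines():
        lowered = line.lower()
        for phrase in PHRASES:
            if phrase in lowered:
                counts[phrase] += 1
    return counts

# Sample text built from the console messages quoted in this document;
# the lines of a real log file will look different.
sample = "\n".join([
    "Cache memory allocator out of available memory",
    "some unrelated entry",
    "Short term memory allocator is out of memory",
    "Cache memory allocator out of available memory",
])

print(count_allocator_messages(sample))
```

To run this against a real file, read a local copy of the log into a string (for example with open(path, errors="replace").read(), where path is wherever the copy was saved) and pass it to the function.  A high count for one phrase indicates a pattern worth investigating; the surrounding lines still need to be read to see which NLM and which allocation size were involved.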
 
For any type of memory fragmentation issue, please review the following document: