Troubleshooting High Utilization On A NetWare Server

  • 3790791
  • 08-Nov-2006
  • 26-Apr-2012

Environment

Novell NetWare 6.5
Novell NetWare 6.0
Novell NetWare 5.1
Novell NetWare 4.12
Novell NetWare Remote Manager

Resolution

Gather information:
What has changed?
Were patches applied recently?
Was the server rebooted, or did it lose power?
When did the problem start?
Does the utilization show up immediately or over time?

Narrow the possible causes for the utilization:
Unplug the server from the LAN.
Does the utilization stay high?
The troubleshooting process has just been cut in half!

Utilization HIGH when disconnected from the LAN:
  1. Use MONITOR.NLM or NetWare Remote Manager (NoRM) to look at the server threads and see which one is using the CPU time. NetWare Remote Manager gives more information about each process than MONITOR.NLM. If a thread belonging to a particular NLM is consistently the busiest, that NLM's associated program is the likely cause. Unload the suspected NLM and see what effect that has on the utilization.
    • MONITOR: Load MONITOR.NLM > Kernel > Busiest Threads
    • NoRM: https://:8009 > PROFILE/DEBUG (under Diagnose Server) > Watch the threads. Click the THREAD NAME(s) that appear busiest to look at the stack. Check the stack for NLMs you can unload. If the NLMs appear generic (such as SERVER.NLM), continue to the next step.

  2. If neither MONITOR.NLM nor NoRM (Portal) gives relevant information, "Vanilla" the server by commenting out all third-party NLMs and programs that are not essential to server operation (GroupWise, iFolder, virus scanners, backup solutions, etc.).

  3. Take the server down and bring it back up with -NS (server -ns). Load STARTUP.NCF, then loadstage 1 and loadstage 2. Load MONITOR.NLM and view the utilization. If utilization is low, load the next stage and check utilization again, and so on. This tells you specifically which stage the utilization occurs in. For example, suppose loadstage 4 causes the utilization to climb to 100%. Take the server back down, bring it up with -NS, and load stages 1-3. Then do a list stage 4 and load each line one at a time, checking the utilization after each line is loaded.

  4. If the server fully loads in the "Vanilla" state with low utilization, add the removed NLMs back one at a time, watching utilization after each one is loaded.

  5. Some instances may require you to load the entire AUTOEXEC.NCF line by line to determine the cause of the high utilization.
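
The stage-by-stage and line-by-line isolation in steps 3 to 5 is a simple linear search: load one item, read the utilization, and stop at the first item that pushes it up. The following is only a minimal sketch of that bookkeeping, not a tool that talks to the server. The function name `first_offender`, the 90% threshold, and the readings in `SIMULATED_READINGS` are all assumptions made up for illustration; in practice each reading is taken by hand from MONITOR.NLM after loading the stage or line at the console.

```python
from typing import Callable, Optional, Sequence


def first_offender(
    items: Sequence[str],
    load_and_measure: Callable[[str], float],
    threshold: float = 90.0,  # assumed cut-off for "high" utilization, in percent
) -> Optional[str]:
    """Load items in order and return the first one after which
    utilization exceeds the threshold, or None if none does."""
    for item in items:
        utilization = load_and_measure(item)
        if utilization > threshold:
            return item
    return None


# Made-up readings standing in for values read off MONITOR.NLM by hand.
SIMULATED_READINGS = {
    "loadstage 1": 2.0,
    "loadstage 2": 3.0,
    "loadstage 3": 5.0,
    "loadstage 4": 100.0,
}

STAGES = list(SIMULATED_READINGS)

if __name__ == "__main__":
    # With the simulated readings above, the search stops at loadstage 4.
    print(first_offender(STAGES, SIMULATED_READINGS.__getitem__))
```

The same function applies unchanged to the individual lines of the offending stage or of AUTOEXEC.NCF: pass the lines as `items` and record the utilization after each one.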

Utilization LOW when disconnected from the LAN:

Troubleshooting NDS High Utilization

  1. Check whether DS synchronization is causing the high utilization. To do so, unload DS.NLM and reload DS.NLM *NDB, then SET DSTRACE = !D and open the database. (Users currently connected to the server WILL NOT lose their connections by doing this.) If utilization stays low, heavy DS synchronization traffic is the cause.

  2. Troubleshoot bindery requests. SET DSTRACE = +EMU (Emulate Bindery) to track down bindery calls to NDS, which are known to cause high utilization. If a lot of information scrolls quickly across the trace screen, there may be a bindery issue.

  3. Check for high-valued attributes in DS. Run DSBROWSE -CV[num], where [num] is the minimum number of multi-values to display, e.g. DSBROWSE -CV1000. Use the search request form and do a normal search. (To find everything with high values, enter * as the name of the object and press F10.) The log file is written to the SYS:\SYSTEM directory and is named valcnt[num].log.

  4. Troubleshoot SVC TYPEID. SET DSTRACE = ON, SET TTF = ON, SET DSTRACE = +DSA, SET DSTRACE = +BUFFERS. Wait 5-10 minutes, then SET DSTRACE = OFF, SET TTF = OFF. Next, search the DSTRACE.DBG file for any -603 errors. Generally the -603 error will also report the SVC TYPEID. If you are seeing this problem, the SVC TYPEID will have many instances in the log with the same connection number. You will need to track down that connection to work around or resolve the issue.
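
Searching DSTRACE.DBG for -603 errors and spotting a repeated connection number can be done by hand, but a short script makes the repetition easier to see once the file has been copied to a workstation. The sketch below is an assumption-laden illustration: the layout of DSTRACE.DBG lines is not described in this document, so the regular expression that extracts the connection number is a parameter you must write after looking at your own file. The function name `count_603_by_key` is hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable


def count_603_by_key(lines: Iterable[str], key_pattern: str) -> Counter:
    """Count lines that mention -603, grouped by the first capture
    group of key_pattern (for example, a connection number)."""
    key_re = re.compile(key_pattern)
    counts: Counter = Counter()
    for line in lines:
        # Plain substring check; adapt it if your trace formats errors differently.
        if "-603" not in line:
            continue
        match = key_re.search(line)
        # Lines without a recognizable key are still counted, under a placeholder.
        key = match.group(1) if match else "(no match)"
        counts[key] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python script.py <path to a copy of DSTRACE.DBG> <regex with one group>
    path, pattern = sys.argv[1], sys.argv[2]
    with open(path, errors="replace") as trace:
        for key, hits in count_603_by_key(trace, pattern).most_common(10):
            print(f"{key}: {hits}")
```

A key that accounts for most of the -603 lines is the connection to track down.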

Further troubleshooting steps to be added in the future.