Environment
Situation
When troubleshooting issues with the ndsd process, there are more things to consider than just eDirectory.
Here is a list of some of the pieces that could also have an effect on eDirectory:
- Novell Core Protocol (NCP)
-
Novell Storage Services (NSS)
-
Linux User Management (LUM)
- ZenWorks
- Novell Modular Authentication Service (NMAS)
- LDAP
- Novell Auditing (lcache)
Symptoms:
Can't login to server
Slow login times to server
Can't map drives to the server
Ndsd daemon is running but no activity in ndstrace.
Ndsd process pegged at high cpu utilization displayed in top.
Resolution
1. Apply the latest patches from the channel.
First step is to make sure the server has the latest patches. Tid 3426981 is a great reference which lists all fixes that have been included in each major and minor version of eDirectory. Check it to see if your issue has already been addressed.
2. Use supportconfig
You can use supportconfig to upload the config bundle and this data will automatically be processed by Novell's Support Advisor server. The report will then be automatically attached to the Service Request.
This guide provides information on tuning your eDirectory environment for improved performance.
This guide is broken up into the following sections:
eDirectory Subsystems - describes primary subsystems that influence eDirectory performance.
Analyzing System Bottlenecks - describes various system resources and their influence on eDirectory performance.
Tuning eDirectory Subsystems - describes how to analyze and tune eDirectory under various conditions and environments.
The eDirectory Configuration - describes how to configure various tunable parameters.
Gstack will dump out stack traces of each thread under the ndsd process. This can provide insight on what might be causing the utilization. Please refer to Tid 70071838 for more information.
ps -C ndsd -L -o pid,tid,nlwp,pcpu,pmem,vsz,stat
-C allows you to specify process by command name.
-L will include the threads under ndsd (LWP = light weight process = child threads)
-o allows custom display of desired info.
pid = process id
tid = thread id
nlwp = number of threads under the process
pcpu = percent of the cpu the process is using
pmem = percent of the total memory the process is using
vsz = virtual memory size of the process in kilobytes
stat = current state of the process (for more info, “man ps” and search for “PROCESS STATE”)
See additional information section below for more information and a sample script that could be used to gather this and other relevant data..
5. Force a core dump of the ndsd process for NTS to analyze.
Along the same lines as the stack traces, it may be useful to force a core of the ndsd process. The core contains more information that can be useful to NTS. The gcore script is a part of gdb on Suse and can be used to accomplish this. Gcore will not halt ndsd or cause ndsd to become less responsive than it already is. The gcore utility can be used as follows:
gcore `pgrep ndsd` (or you can find the pid # for ndsd using ps –eaf | grep ndsd
This will generate a core file in the current directory.
Note: There have been some cases with certain versions of gdb and the kernel that may not provide a useful core. See Tid 7004715&sliceId=1 for more information. At the very least grab the stack traces as the gstack utility does note have this issue.
6. Try to narrow the scope.
You can prevent some components of ndsd from auto loading on startup by modifying the ndsmodules.conf. This file is typically located at /etc/opt/novell/eDirectory/conf/ndsmodules.conf. Some of the components can not be remarked out as they are necessary for ndsd to function. These ones are labeled with system,fail in the second column. The following are some modules that you do not want to remark out:
dhlog auto,fail #DHost logger
ncpengine auto,system,fail #Core NCP Services
dsloader auto,system,fail #Loader
masv auto,system,fail #Modular Authentication Services
nds auto,system,fail #Core DS Services
Here are some components which could be remarked out in order to see if utilization or hang persists (and still retain basic eDirectory functionality.
#httpstk auto #DHost HTTP Stack
#hconserv auto #HConServ
#nldap auto #LDAP Server
#imon auto #iMon
#embox auto #eMBox
#pkiserver auto #PKI server
#ssncp auto #SecretStore
This methodogy would typically be used when the hang / utilization happens pretty quickly after startup. If after making these changes utilization / hang does not occur, start adding the modules back one-by-one until you know the culprit.
7. Bump up the log level and see what's going on in the ndsd.log.
The ndsd.log is typically located in /var/opt/novell/eDirectory/log/ndsd.log. The debug level for the ndsd.log can be increased by modifying / adding the following variable in the nds.conf file: (typically located in /etc/opt/novell/eDirectory/conf/nds.conf)
n4u.server.log-levels=Logxxxx
The default level is LogError. During troubleshooting, it would be a good idea to at least set this level to "LogWarn" This may help identify problem that wouldn't normally show up in the ndsd.log. This would then require a restart of ndsd for this change to take effect.
For more information on log levels see Novell's Documentation.
8. NCP debugging
The ncpengine on OES linux is divided up between filesystem and eDirectory threads. There are a finite number of threads defined for each by default. A new enhancement available in ncpcon can be used to identify the number of threads in use for each. If either thread pool has hit it's ceiling it could lead to a hang or high utilization. Based on this output, the ceiling could be increased to possibly help the situation. For more information see: Tid 7007134.
In addition to the thread information it would be a good idea to turn on ncp logging. This can be done by typing "ncpcon log everything". This will write to multiple logs in the /var/opt/novell/log/* directory that could be zipped up and sent to NTS.
If nss volumes are being used, you may want to consider tuning as sometimes the default settings are not optimal for all environments and could lead to the same symptoms. Please refer to TID
7004888 for more information on how to do this.
9. Linux User Management (LUM/namcd)
Depending on how LUM is configured, it could lead to issues with utilization, especially if many different OES boxes are pointing to the same preferred server. Then /etc/nam.conf is where this variable is stored. Typically it is best to point “preferred-server=” to it's own IP Address. It may also be a good idea to set “persistent-search from "yes" to "no" as has also causes high utilization in some cases. This variable is also in the /etc/nam.conf . For more information see: Tid 7006086
Check that persistent search is disabled on the ldap server object as well.
Use iManager or ldapconfig to check. For ldapconfig do:
ldapconfig get ldapEnablePSearch
At the prompt enter the FQN for admin in .x500 example: admin.novell
To set the persistent search to no do the following:
ldapconfig set "ldapEnabledPSearch=no"
10. Novell Modular Authentication Services (NMAS)
NMAS is heavily integrated with eDirectory these days. It is used to enforce/enable various login methods and provide enhanced password features such as Universal Password. There have been cases where nmas has caused utilization issues. This will likely effect Netware servers utilization, but worth mentioning. For more info see: Tid 7002047
11. Audit (lcache)
Novell Auditing has been known to cause ndsd to become unresponsive at times. If auditing is running consider taking it out of the picture to see if the problem goes away.
Additional Information
. /opt/novell/eDirectory/bin/ndspath &> /dev/null
## Variables
LOGDIR=/var/log
VARDIR=$(ndsconfig get n4u.server.vardir | awk -F"vardir=" '{print $2}')
NDSD_PID=`cat $VARDIR/ndsd.pid`
LOG_OUTPUT=$LOGDIR/ndsd_util.txt
NDSD_PS_COMM="/bin/ps -C ndsd -L -o pid,tid,nlwp,pcpu,pmem,vsz,stat"
NDSD_THREADS_COMM="ndstrace -c threads"
NCP_THREADS_COMM="ncpcon threads"
GSTACK_COMM="/usr/bin/gstack $NDSD_PID"
NCPCONBIN=/sbin/ncpcon
NCP_CONN_SUMMARY="$NCPCONBIN connections"
NCP_CONN="$NCPCONBIN connections list"
HEADER="-------------------------------------------------------------------"
FOOTER="==================================================================="
addspace() {
echo "">> $LOG_OUTPUT
}
addheader() {
echo $HEADER >> $LOG_OUTPUT
addspace
}
addfooter() {
addspace
echo $FOOTER >> $LOG_OUTPUT
}
##Grab ps output from ndsd
ndsd_ps() {
echo "BEGIN PS_OUTPUT at $(date)">> $LOG_OUTPUT
addheader
echo "Command: $NDSD_PS_COMM" 1>> $LOG_OUTPUT
addspace
$NDSD_PS_COMM >> $LOG_OUTPUT
addspace
echo "END PS_OUTPUT at $(date)">> $LOG_OUTPUT
addfooter
}
##Grab output from ncpcon threads
ncp_threads() {
echo "BEGIN NCP_THREADS at $(date)">> $LOG_OUTPUT
addheader
echo "Command: $NCP_THREADS_COMM" 2>/dev/null 1>> $LOG_OUTPUT
$NCP_THREADS_COMM 2>/dev/null 1>> $LOG_OUTPUT
addspace
echo "END NCP_THREADS at $(date)">> $LOG_OUTPUT
addfooter
}
##Grab output from ndstrace -c threads
ndsd_threads() {
echo "BEGIN NDSD_THREADS at $(date)">> $LOG_OUTPUT
addheader
echo "Command: $NDSD_THREADS_COMM" 2>/dev/null 1>> $LOG_OUTPUT
addspace
$NDSD_THREADS_COMM | grep -v Instance 2>/dev/null 1>> $LOG_OUTPUT
addspace
echo "END NDSD_THREADS at $(date)">> $LOG_OUTPUT
addfooter
}
##GSTACK
ndsd_gstack() {
echo "BEGIN GSTACK at $(date)">> $LOG_OUTPUT
addheader
echo "Command: $GSTACK_COMM" 1>> $LOG_OUTPUT
addspace
$GSTACK_COMM 1>> $LOG_OUTPUT
addspace
echo "END GSTACK at $(date)">> $LOG_OUTPUT
addfooter
}
##Grab NCP connection summary and connection information
ndsd_connections() {
echo "BEGIN CONNECTIONS at $(date)">> $LOG_OUTPUT
addheader
echo "Command: $NCP_CONN_SUMMARY">> $LOG_OUTPUT
addspace
$NCP_CONN_SUMMARY 2>/dev/null 1>> $LOG_OUTPUT
addspace
echo "Command: $NCP_CONN" 1>> $LOG_OUTPUT
$NCP_CONN 2>/dev/null 1>> $LOG_OUTPUT
addspace
echo "END CONNECTIONS at $(date)">> $LOG_OUTPUT
addfooter
}
## Grab output from each defined function ensuring each one completes before going on
ndsd_ps && ndsd_gstack && ndsd_threads && ncp_threads && ndsd_connections
echo "############### Finished at $(date) ##############">> $LOG_OUTPUT
### End of Script