NCP/NDSD becomes unresponsive

  • 7014354
  • 26-Dec-2013
  • 26-Dec-2013

Environment

Novell Open Enterprise Server 11 SP1 (OES11SP1)
eDirectory 8.8.7
NCP

Situation

ndsd becomes unresponsive.
Several NCP engine threads are stuck waiting on a mutex lock.
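
To find the blocked threads in a backtrace of all threads (for example, one captured with gdb's "thread apply all bt"), the top frame of each thread can be checked for a lock-wait function. The following is a minimal sketch, not a supported tool. The function name blocked_threads, the set of wait functions, and the "Thread N (Thread M):" header format (taken from the backtraces under Additional Information) are assumptions.

```python
import re

# Assumption: top-frame functions that indicate a thread is waiting on a lock.
WAIT_FUNCS = {
    "pthread_rwlock_wrlock",
    "pthread_rwlock_rdlock",
    "pthread_mutex_lock",
    "__lll_lock_wait",
}

# Assumption: headers look like "Thread 92 (Thread 19613):" as in this article.
THREAD_RE = re.compile(r"^Thread (\d+) \(Thread (\d+)\):")
FRAME0_RE = re.compile(r"^#0\s+0x[0-9a-f]+ in (\S+) \(")


def blocked_threads(backtrace_text):
    """Return (gdb thread number, thread ID, top-frame function) for each
    thread whose frame #0 is one of WAIT_FUNCS."""
    blocked = []
    current = None
    for raw_line in backtrace_text.splitlines():
        line = raw_line.strip()
        header = THREAD_RE.match(line)
        if header:
            current = (int(header.group(1)), int(header.group(2)))
            continue
        frame0 = FRAME0_RE.match(line)
        if frame0 and current is not None and frame0.group(1) in WAIT_FUNCS:
            blocked.append((current[0], current[1], frame0.group(1)))
    return blocked
```

A thread that shows up here is only a candidate. The frames below #0 still have to be read to see which lock it already holds.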

Resolution

Resolved with the November 2013 Maintenance Patch.

Cause

One thread holds the mutex "NCPStreamGroupMutex" and waits for a write lock on a cache entry. Another thread holds a read lock on that cache entry (acquired for an entryID) and waits for the mutex "NCPStreamGroupMutex".

Each thread waits for a lock that the other holds, so the two threads deadlock and ndsd hangs.
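
The lock-order inversion described above can be reproduced in miniature. The sketch below is illustrative only and is not the NCP engine code: two plain threading.Lock objects stand in for "NCPStreamGroupMutex" and the cache entry read/write lock, and the names simulate_lock_order_inversion, "close-path" and "open-path" are hypothetical. Acquire timeouts are used so that the demonstration ends instead of hanging the way ndsd does.

```python
import threading


def simulate_lock_order_inversion(timeout=0.5):
    """Return the names of the threads that could not get their second lock."""
    stream_group_mutex = threading.Lock()  # stand-in for "NCPStreamGroupMutex"
    cache_entry_lock = threading.Lock()    # stand-in for the cache entry rwlock
    both_hold_first = threading.Barrier(2)
    both_finished = threading.Barrier(2)
    stuck = set()
    stuck_guard = threading.Lock()

    def worker(name, first, second):
        with first:
            # Wait until each thread holds its first lock.
            both_hold_first.wait()
            # Each thread now asks for the lock the other one holds.
            if second.acquire(timeout=timeout):
                second.release()
            else:
                with stuck_guard:
                    stuck.add(name)
            # Keep holding the first lock until the other attempt has ended.
            both_finished.wait()

    threads = [
        # Mutex first, then cache entry lock (the connection close path).
        threading.Thread(target=worker,
                         args=("close-path", stream_group_mutex, cache_entry_lock)),
        # Cache entry lock first, then mutex (the file open path).
        threading.Thread(target=worker,
                         args=("open-path", cache_entry_lock, stream_group_mutex)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return stuck
```

Both names are returned because neither thread can make progress. Without the timeouts, both threads would block indefinitely.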

Additional Information

Thread 92 (Thread 19613):
#0 0x00007fee79dd4eb0 in pthread_rwlock_wrlock () from /root/Prasad/svn/828044/lib64/libpthread.so.0
#1 0x00007fee76d674cb in RemoveLockFromDirCacheEntry(NLockHandle*, int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#2 0x00007fee76d6872e in CloseAllFileHandles(int, int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#3 0x00007fee76d47579 in NCPResetConnection(int, int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#4 0x00007fee76d475bf in NCPServFreeConnection(unsigned int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#5 0x00007fee76d80ce3 in NCPEngine_DestroyConn () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#6 0x00007fee76d430c6 in AddressManager::removeSocket(unsigned int, int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#7 0x00007fee76d6fdcf in INCP::ServiceStreamGroupConnections(StreamGroupStruct*) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#8 0x00007fee76d7069a in NCPPollerThread(StreamGroupStruct*) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#9 0x0000000000416e68 in ?? ()
#10 0x00007fee79dd17b6 in start_thread () from /root/Prasad/svn/828044/lib64/libpthread.so.0
#11 0x00007fee79393c5d in clone () from /root/Prasad/svn/828044/lib64/libc.so.6
#12 0x0000000000000000 in ?? ()
In removeSocket(), this thread calls LockStreamGroup(), which acquires the mutex "NCPStreamGroupMutex". Later, RemoveLockFromDirCacheEntry() calls WriteLockCacheEntry() to acquire the write lock on a cache entry, and the thread blocks in pthread_rwlock_wrlock() (frame #0).

Thread 4531 has the following call stack:
Thread 9 (Thread 4531):
#0 0x00007fee79dd8294 in __lll_lock_wait () from /root/Prasad/svn/828044/lib64/libpthread.so.0
#1 0x00007fee79dd3619 in _L_lock_1008 () from /root/Prasad/svn/828044/lib64/libpthread.so.0
#2 0x00007fee79dd342e in pthread_mutex_lock () from /root/Prasad/svn/828044/lib64/libpthread.so.0
#3 0x00007fee7a85f2e7 in SAL_MutexAcquire () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/libsal.so.1
#4 0x00007fee76d4402c in NCPServer::KillConnection(int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#5 0x00007fee76d44152 in NCPKillConnection () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#6 0x00007fee76d456fb in NCPServer::SendBroadcastPing(int, char, unsigned int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#7 0x00007fee76d4f541 in BreakL2OpenCallBackByEntry(CacheEntry*) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#8 0x00007fee76d64f15 in LockDirCacheEntry(int, int, int, unsigned int, int, unsigned char*, int, LockEntryInfo*) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#9 0x00007fee76d7864b in CreateOrOpenFile(unsigned int, int, int, unsigned int, char*, int, int, int, int, int, int, int, int*, unsigned int*, unsigned int*, pseudo_netware_direntry*, CacheEntryInfo*, stat*) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#10 0x00007fee76d86c9d in Case89(unsigned int, int, svc_request*, int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#11 0x00007fee76d9c377 in ExecuteNCPPacket(unsigned int, svc_request*, int) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#12 0x00007fee76d6d3a3 in INCP::HandleNCPFileServiceRequest() () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#13 0x00007fee76d6f025 in INCP::Process(int, void (*)(void*, int, int, unsigned long, void const*, int (*)(void*, int, unsigned char, unsigned int, ...))) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#14 0x00007fee76d6f31b in INCP::HandleNCPRequest(ReceiveBufferStruct*, int, int*) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#15 0x00007fee76d6fffb in INCP::ServiceStreamGroupConnections(StreamGroupStruct*) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#16 0x00007fee76d7069a in NCPPollerThread(StreamGroupStruct*) () from /root/Prasad/svn/828044/opt/novell/eDirectory/lib64/nds-modules/libncpengine.so
#17 0x0000000000416e68 in ?? ()
#18 0x00007fee79dd17b6 in start_thread () from /root/Prasad/svn/828044/lib64/libpthread.so.0
#19 0x00007fee79393c5d in clone () from /root/Prasad/svn/828044/lib64/libc.so.6
#20 0x0000000000000000 in ?? ()
BreakL2OpenCallBackByEntry() calls ReadLockCacheEntry() and successfully acquires a read lock on the cache entry. Later in the same call chain, KillConnection() calls LockStreamGroup(), which in turn calls SAL_MutexAcquire() to acquire the mutex "NCPStreamGroupMutex". The thread blocks there (frames #0 to #3) because thread 19613 already holds that mutex.
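
This article does not describe how the maintenance patch removes the deadlock. A common general remedy for this class of problem is to make every code path take the two locks in the same order. The sketch below shows that idea only, under that assumption. The locks are plain threading.Lock stand-ins, and run_with_consistent_order and the "close"/"open" labels are hypothetical names.

```python
import threading


def run_with_consistent_order(iterations=1000):
    """Run two workers that always lock the mutex before the cache entry lock.

    Returns (all_finished, counts) where counts records the completed
    iterations per worker.
    """
    stream_group_mutex = threading.Lock()  # stand-in for "NCPStreamGroupMutex"
    cache_entry_lock = threading.Lock()    # stand-in for the cache entry rwlock
    counts = {"close": 0, "open": 0}

    def worker(key):
        for _ in range(iterations):
            # Same order on every path: mutex first, cache entry lock second.
            with stream_group_mutex:
                with cache_entry_lock:
                    counts[key] += 1

    threads = [threading.Thread(target=worker, args=(key,)) for key in counts]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=30)
    all_finished = all(not t.is_alive() for t in threads)
    return all_finished, counts
```

With a single agreed order, a thread that holds the second lock never waits for the first, so the circular wait seen in the two backtraces cannot form.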