DSfW: xadsd and rpcd hang when a cthread is canceled, causing the server to become unresponsive

  • 7013412
  • 04-Oct-2013
  • 04-Oct-2013

Environment

Novell Open Enterprise Server 11 SP1 (OES11SP1)
Domain Services for Windows
DSfW

Situation

xadsd and rpcd hang when a cthread is canceled
xadsd becomes unresponsive
DSfW becomes unresponsive

Resolution

Cause

Two daemons in DSfW, rpcd and xadsd, link to the dcerpc libraries.
As a result, each daemon creates call threads (cthreads).
By default, the number of cthreads in a daemon is fixed: 10 in xadsd and 5 in rpcd.
In some environments, both xadsd and rpcd have been observed to hang.

When a hang occurs, the cthread count of the affected daemon drops by one. The cthread that went away was holding a mutex, cthread_mutex, when it was canceled, so cthread_mutex remains locked forever.
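
The effect described above can be reproduced in miniature. The following is a minimal Python sketch, not the dcerpc implementation: the pool size of 5 mirrors the rpcd default, cancellation is simulated by a thread that ends without releasing the lock, and the names `run_pool_demo`, `cthread` and `shutdown` are hypothetical.

```python
import threading

POOL_SIZE = 5  # rpcd's default cthread count, per this article


def run_pool_demo(pool_size: int = POOL_SIZE) -> tuple[int, bool]:
    """Return (live cthread count, whether cthread_mutex is still locked)."""
    cthread_mutex = threading.Lock()  # stand-in for the real cthread_mutex
    shutdown = threading.Event()      # hypothetical: lets the demo end cleanly

    def cthread(canceled: bool) -> None:
        if canceled:
            # Assumption: cancellation is modeled as the thread ending
            # inside its critical section without releasing the mutex.
            cthread_mutex.acquire()
            return
        shutdown.wait()  # healthy cthreads idle until the demo stops them

    threads = [
        threading.Thread(target=cthread, args=(i == 0,))
        for i in range(pool_size)
    ]
    for t in threads:
        t.start()
    threads[0].join()  # wait until the "canceled" cthread is gone

    alive = sum(t.is_alive() for t in threads)
    still_locked = cthread_mutex.locked()

    shutdown.set()
    for t in threads:
        t.join()
    return alive, still_locked


if __name__ == "__main__":
    print(run_pool_demo())
```

In this sketch the live thread count drops from 5 to 4 while the lock stays held, which matches the symptom pair reported for the hung daemon.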

Two mutexes are important here: cthread_mutex and rpc_g_global_mutex (the global mutex).
After receiving all of the queued data, a receiver thread attempts to trigger a cthread to execute the RPC. To do this, the receiver thread, which already holds the global mutex (rpc_g_global_mutex), attempts to lock cthread_mutex. Because cthread_mutex is locked forever, the receiver thread waits on it indefinitely and never releases the global mutex.
With both mutexes locked indefinitely, the respective daemon hangs.
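
The lock chain described above can be sketched as follows. This is an illustrative Python model under stated assumptions, not the daemon's code: the two lock names are taken from this article, while `run_chain_demo`, the thread functions and the timeouts are hypothetical. The real threads block indefinitely; the timeouts exist only so that the demo terminates.

```python
import threading


def run_chain_demo(receiver_timeout: float = 0.5,
                   probe_timeout: float = 0.1) -> dict:
    """Model the hang: a dead cthread, a stuck receiver, a blocked bystander."""
    rpc_g_global_mutex = threading.Lock()
    cthread_mutex = threading.Lock()
    receiver_has_global = threading.Event()
    result = {}

    def canceled_cthread() -> None:
        # Assumption: the canceled cthread ends while holding cthread_mutex.
        cthread_mutex.acquire()

    def receiver() -> None:
        # The receiver already holds the global mutex when it tries to
        # lock cthread_mutex in order to trigger a cthread.
        with rpc_g_global_mutex:
            receiver_has_global.set()
            got = cthread_mutex.acquire(timeout=receiver_timeout)
            result["receiver_got_cthread_mutex"] = got
            if got:
                cthread_mutex.release()

    def other_thread() -> None:
        # Any other thread that needs the global mutex is blocked as well.
        receiver_has_global.wait()
        got = rpc_g_global_mutex.acquire(timeout=probe_timeout)
        result["other_got_global_mutex"] = got
        if got:
            rpc_g_global_mutex.release()

    dead = threading.Thread(target=canceled_cthread)
    dead.start()
    dead.join()  # cthread_mutex is now locked with no thread left to unlock it

    workers = [threading.Thread(target=receiver),
               threading.Thread(target=other_thread)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return result


if __name__ == "__main__":
    print(run_chain_demo())
```

Both entries in the returned dictionary are False: the receiver never obtains cthread_mutex, and while it waits no other thread can obtain rpc_g_global_mutex. Without the timeouts, neither call would ever return, which corresponds to the daemon hang.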