DSfW: Winbind crashes in dcerpc_lsa_lookup_sids () in ADC

  • 7014259
  • 10-Dec-2013
  • 10-Dec-2013

Environment

Novell Open Enterprise Server 11 SP1 (OES11SP1)
Domain Services for Windows
DSfW

Situation

Winbind crashes in dcerpc_lsa_lookup_sids () in ADC
Slow WAN links between the PDC and ADC servers
RPC Endpoint Mapper Service fails
WINBIND daemon cores
The backtrace from the winbind core shows the following:
#0  0x00007f8c7b282b35 in raise () from /lib64/libc.so.6
#1  0x00007f8c7b284111 in abort () from /lib64/libc.so.6
#2  0x00007f8c7e3c6adb in dump_core ()
#3  0x00007f8c7e3d6845 in smb_panic ()
#4  0x00007f8c7e3c7070 in ?? ()
#5  <signal handler called>
#6  0x00007f8c7e715499 in dcerpc_lsa_lookup_sids_generic ()
#7  0x00007f8c7e31464e in winbindd_lookup_sids ()
#8  0x00007f8c7e317ce2 in ?? ()
#9  0x00007f8c7e304200 in ?? ()
#10 0x00007f8c7e3222a3 in _wbint_LookupGroupMembers ()
#11 0x00007f8c7e32bb16 in ?? ()
#12 0x00007f8c7e320fe1 in winbindd_dual_ndrcmd ()
#13 0x00007f8c7e31fd42 in ?? ()
#14 0x00007f8c7e320865 in ?? ()
#15 0x00007f8c7bc003fa in tevent_common_loop_immediate () from /usr/lib64/libtevent.so.0
#16 0x00007f8c7e3e577b in run_events_poll ()
#17 0x00007f8c7e3e5eb2 in ?? ()
#18 0x00007f8c7bbff190 in _tevent_loop_once () from /usr/lib64/libtevent.so.0
#19 0x00007f8c7e2f802b in main ()

Resolution

The planned change is to make the winbind RPC timeout configurable in smb.conf.

For example, in the [Global] section of /etc/samba/smb.conf (the value is in milliseconds, so this sets a 100-second timeout):
winbind rpc timeout = 100000
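
To confirm which timeout a given smb.conf would apply once the fix is installed, a small script can read the setting and convert it to seconds. This is a minimal sketch, not a Novell or Samba tool: the function name winbind_rpc_timeout_ms and the sample configuration are hypothetical, and the fallback of 35000 ms is the hard-coded value described under Cause. It assumes the parameter is spelled exactly as in the example above.

```python
# Hypothetical helper: report the effective 'winbind rpc timeout' for an smb.conf.
# Assumes the value is in milliseconds, as implied by 35000 = 35 seconds.

DEFAULT_RPC_TIMEOUT_MS = 35000  # hard-coded value described in the Cause section


def winbind_rpc_timeout_ms(smb_conf_text):
    """Return the 'winbind rpc timeout' value (ms) from the [global] section.

    Falls back to DEFAULT_RPC_TIMEOUT_MS when the parameter is not set.
    """
    section = None
    timeout = DEFAULT_RPC_TIMEOUT_MS
    for raw in smb_conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and smb.conf comments (# or ;).
        if not line or line[0] in "#;":
            continue
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "global" and "=" in line:
            key, value = line.split("=", 1)
            # Compare the parameter name ignoring case and extra whitespace.
            if " ".join(key.split()).lower() == "winbind rpc timeout":
                timeout = int(value.strip())
    return timeout


# Sample configuration for illustration only.
sample = """
[Global]
    workgroup = EXAMPLE
    winbind rpc timeout = 100000
"""

print(winbind_rpc_timeout_ms(sample) / 1000, "seconds")
```

On a server, the same function could be given the contents of /etc/samba/smb.conf instead of the sample string.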

Contact Novell Support for a possible FTF until the fix is available in a Maintenance Patch.

Cause

The core dump occurs only when 'domain users' is in the response to a SID-to-name conversion request.  The SIDs in question belong to users who are members of the domain users group.  As seen in the core, winbindd_lookup_sids sends the request to dcerpc_lsa_lookup_sids_generic.  The logs report that all SIDs are resolved correctly and that xadsd converts every SID to the proper name.  The problem is that the RPC times out after 35 seconds.
The RPC timeout is hard-coded.  The current value in the code is 35000 (35 seconds).
When several requests are made over a slow WAN link or to a busy server (xadsd and ndsd are slow to respond), winbind might receive a response for only some of the requested SIDs before it times out.  Because winbind then sees the request as only partially fulfilled, it dumps core.
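
The effect of the timeout value on a slow link can be illustrated with a toy model. This is only a sketch and not the winbind code: it assumes responses arrive one after another, and the SIDs, the 15-second latencies, and the function name resolved_before_timeout are made up for illustration.

```python
# Toy model (not Samba code): which SID lookups complete before the RPC timeout?
# Assumes responses arrive sequentially; all values below are hypothetical.

def resolved_before_timeout(latencies_ms, timeout_ms):
    """Return the SIDs whose cumulative response time fits within timeout_ms."""
    elapsed = 0
    done = []
    for sid, latency in latencies_ms:
        elapsed += latency
        if elapsed > timeout_ms:
            break  # the RPC timed out before this response arrived
        done.append(sid)
    return done


# Three lookups that each take 15 seconds over a slow WAN link.
lookups = [
    ("S-1-5-21-1-2-3-1001", 15000),
    ("S-1-5-21-1-2-3-1002", 15000),
    ("S-1-5-21-1-2-3-1003", 15000),
]

partial = resolved_before_timeout(lookups, 35000)   # hard-coded 35-second timeout
full = resolved_before_timeout(lookups, 100000)     # example value from Resolution

print(len(partial), "of", len(lookups), "resolved with a 35000 ms timeout")
print(len(full), "of", len(lookups), "resolved with a 100000 ms timeout")
```

In this model the 35000 ms timeout leaves the third lookup unanswered, which corresponds to the partially fulfilled request described above, while the longer timeout lets all three complete.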

Status

Reported to Engineering