
Troubleshooting and Debugging CIFS on Open Enterprise Server

This document (7008956) is provided subject to the disclaimer at the end of this document.

Environment

Novell Open Enterprise Server 2 (OES 2) Linux
Novell Open Enterprise Server 11 (OES 11) Linux
Novell Cluster Services
CIFS

Situation

Cannot access CIFS shares
CIFS stops listening or communicating
CIFS becomes unresponsive or hangs
rcnovell-cifs stop | restart | start fails
Clustered CIFS resources cannot fail over
General CIFS troubleshooting and debugging tips

Resolution

Carry out as many of the following steps as possible to establish where the problem lies.  It is useful to run through these steps on a working system first, so that its output can be compared with that of the failing system.
  • 1.  Note that the output of most commands can be redirected to a file with > filename (add 2>&1 after it to capture error messages as well)
    • e.g.  rcnovell-cifs status > /tmp/cifs_status.out

  • 2.  Collect and examine a Support Config health check report
    • supportconfig

  • 3.  If possible, ensure that cifsd debugging and info are enabled before the problem occurs
    • novcifs -b yes
    • novcifs -f yes

  • 4.  Make a note of approximately when the problem occurred to simplify reading very large log files

  • 5.  Collect and examine the following log files
    • /var/log/messages
      • e.g.  grep segfault /var/log/messages
    • /var/opt/novell/log/cifs.log
    • /var/opt/novell/log/ncpserv.log

  • 6.  Check for core files
    • The default location is /usr/sbin

  • 7.  Collect all CIFS configuration details
    • Configuration files from /etc/opt/novell/cifs
    • The values of all nfapCIFS* attributes on the NCP Server object

  • 8.  Run and examine the output from the following commands
    • Check that cifsd is running
      • rcnovell-cifs status
    • Check that there are two instances of /usr/sbin/cifsd (the brackets in the pattern stop grep from listing its own process)
      • ps -eaf | grep '[c]ifsd'

  • 9.  If cifsd is not actually running, continuing with the remaining steps in this TID may not be necessary

  • 10.  Establish the scale and scope of the problem
    • Can the file system be accessed via other enabled protocols such as NCP, AFP or FTP?
    • Can users with existing connections still access the file system?
    • Can users make new connections to the file system by name?
    • Can users make new connections to the file system by IP address?
    • If DFS (Distributed File Services) is involved, can a resource be accessed directly as opposed to via a DFS junction?
    • Does the problem affect all types of client?
      • Windows 7
      • Windows 2000/XP (it is often useful to have an older version of Windows to compare to)
      • Linux e.g.  smbclient //MYSERVER/MYSERVER_W/VOLUME1 -U myuser

  • 11.  Check whether the cifsd processes are extremely busy (high CPU usage) or are consuming large amounts of memory
    • top

  • 12.  Collect and examine the output from the following commands
    • smbstatus
    • novcifs -o
    • novcifs -sl
    • top -n 1 -b

  • 13.  Collect and examine the output from the following commands
    • Check that cifsd is listening on ports 137 and 138 (UDP) and 139 (TCP) on all physical IP addresses and, if running on a cluster, on all virtual server IP addresses
      • netstat -anp | grep cifs
    • Check that the cifsd NetBIOS Name Service still responds (where AA.BB.CC.DD is the IP address of the server)
      • nbtstat -A AA.BB.CC.DD (from a Windows workstation)
      • nmblookup -A AA.BB.CC.DD (from a Linux workstation with the samba-client rpm package installed)

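  • 14.  Collect gstacks and core dumps of each cifsd process using gdb, and package the core dumps with novell-getcore
    • Prepare the server (this includes making sure gdb and gstack are installed and creating the /diags directory used below)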
    • Get the Process ID (PID) for all cifsd processes
      • pgrep cifsd
    • Get a gstack for each PID from the previous step (it may sometimes be necessary to take 6 to 8 gstacks a few seconds apart)
      • cd /diags
      • gstack [PID] > [PID].gstack[1,2...x]
    • Get a core dump for each PID
      • cd /diags
      • gdb
      • attach [PID]
      • gcore
      • detach
      • quit
    • Package core dumps with novell-getcore
      • novell-getcore -b /diags/core.[PID] /usr/sbin/cifsd

  • 15.  Collect and examine a LAN trace of all traffic after the problem has occurred
    • e.g.  Capture all traffic while a client tries to connect to the CIFS server by IP address and by NetBIOS Name
      • tcpdump -i any -s0 -w /tmp/capture.cap

  • 16.  Collect and examine a LAN trace of NetBIOS traffic before the problem has occurred and while it is occurring
    • e.g.  Capture all NetBIOS traffic in 20 trace files of 128 MB each (tcpdump -C counts file size in millions of bytes).  When the 20th file is full, tcpdump starts storing captured packets in the first file again, rotating storage over the 20 files, so the total disk space used by tcpdump will not exceed about 2.5 GB (20 × 128 MB = 2,560 MB) in this case
      • tcpdump -s 0 -i any -w /tmp/cifs.pcap -C 128 -W 20 port 137 or port 138 or port 139

  • 17.  After all of the necessary steps have been carried out, try to stop and start the CIFS daemon, making sure that it has fully shut down before starting it again
    • rcnovell-cifs stop
    • rcnovell-cifs status
    • If cifsd is still shown as running, attempt to stop it again; if it still will not stop, kill the process
      • pgrep cifsd
      • kill [PIDs from the previous step]
    • In some cases it may be necessary to restart eDirectory before restarting CIFS
      • rcndsd restart
    • Restart CIFS
      • rcnovell-cifs start

  • 18.  If CIFS is clustered, it may be necessary to take the resource offline, stop and start cifsd, and then bring the resource online again
    • Alternatively, fail over the CIFS resource to another node
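
Steps 4 and 5 are easier when /var/log/messages is cut down to the time window around the problem. The bash function below is a minimal sketch that assumes the traditional syslog timestamp format (for example "Jul  7 10:15:32") and a window that does not cross midnight; the name log_window is hypothetical.

```shell
# Hypothetical helper: print syslog lines logged on MON DAY whose time
# (HH:MM:SS, third field) lies between START and END inclusive.
# Assumes traditional syslog timestamps such as "Jul  7 10:15:32".
# Zero-padded HH:MM:SS values compare correctly as plain strings.
log_window() {
  local mon=$1 day=$2 start=$3 end=$4 file=${5:-/var/log/messages}
  awk -v m="$mon" -v d="$day" -v s="$start" -v e="$end" \
    '$1 == m && $2 == d && $3 >= s && $3 <= e' "$file"
}

# Example (step 5): CIFS-related lines and segfaults between 10:00 and 10:30
#   log_window Jul 7 10:00:00 10:30:00 | grep -i -e cifsd -e segfault
```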
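
Step 6 gives /usr/sbin as the default location for core files, but the kernel writes cores wherever kernel.core_pattern (see /proc/sys/kernel/core_pattern) points, so it is worth checking that setting first. The bash function below is a minimal sketch that lists files whose names start with "core" in one directory, newest first; it relies on GNU find's -printf, and the name list_cores is hypothetical.

```shell
# Hypothetical helper: list regular files named core* directly inside DIR
# (default /usr/sbin), newest first, one path per line.
list_cores() {
  local dir=${1:-/usr/sbin}
  # %T@ is the modification time in seconds since the epoch.
  find "$dir" -maxdepth 1 -type f -name 'core*' -printf '%T@ %p\n' |
    sort -rn | cut -d' ' -f2-
}

# Example (step 6):
#   cat /proc/sys/kernel/core_pattern
#   list_cores /usr/sbin
```

Running file on a core (for example file /usr/sbin/core.[PID]) normally shows which program produced it.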
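
In step 8, a plain ps -eaf | grep cifsd also prints the grep command itself, which makes it easy to miscount. The bash function below is a minimal sketch that counts only processes whose command column is exactly /usr/sbin/cifsd. It assumes the standard ps -ef layout, in which the command is the eighth column, and that cifsd is started by its full path as shown in this TID; the name count_cifsd is hypothetical.

```shell
# Hypothetical helper: read `ps -eaf` output on stdin and print how many
# processes are running /usr/sbin/cifsd. Column 8 is the command (CMD);
# matching it exactly skips the header and any "grep cifsd" line.
count_cifsd() {
  awk '$8 == "/usr/sbin/cifsd" { n++ } END { print n + 0 }'
}

# Example (step 8):
#   [ "$(ps -eaf | count_cifsd)" -eq 2 ] || echo "expected two cifsd processes"
```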
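
Steps 1 and 12 both come down to saving command output to files. The bash function below is a minimal sketch that runs one command and saves its standard output and standard error to OUTDIR/NAME.out, so that all outputs land in one directory that can be archived together; the name save_output and the /tmp/cifs_diag directory in the example are hypothetical.

```shell
# Hypothetical helper: run COMMAND [ARGS...] and save its stdout and
# stderr to OUTDIR/NAME.out. Returns the command's own exit status.
save_output() {
  local outdir=$1 name=$2
  shift 2
  mkdir -p "$outdir" || return
  "$@" > "$outdir/$name.out" 2>&1
}

# Example (step 12):
#   save_output /tmp/cifs_diag smbstatus  smbstatus
#   save_output /tmp/cifs_diag novcifs_o  novcifs -o
#   save_output /tmp/cifs_diag novcifs_sl novcifs -sl
#   save_output /tmp/cifs_diag top        top -n 1 -b
```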
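
For step 13, checking every address by eye in netstat output is error-prone on a cluster node with several virtual server IP addresses. The bash function below is a minimal sketch that reads netstat -anp output on stdin and reports each address and port (137, 138, 139) that has no cifsd socket. It assumes the net-tools netstat layout (local address in column 4, PID/program name in the last column), handles IPv4 only, and treats a socket bound to 0.0.0.0 as covering every address; the name check_cifs_ports is hypothetical.

```shell
# Hypothetical helper: read `netstat -anp` output on stdin and print
# "MISSING ip:port" for each given IPv4 address lacking a cifsd socket
# on port 137, 138 or 139. A socket bound to 0.0.0.0 counts for all
# addresses. Returns 1 if anything is missing, 0 otherwise.
check_cifs_ports() {
  local listening ip port rc=0
  # Column 4 is the local address; the last column is "PID/program".
  listening=$(awk '$NF ~ /\/cifsd$/ { print $4 }')
  for ip in "$@"; do
    for port in 137 138 139; do
      if ! grep -qxF -e "$ip:$port" -e "0.0.0.0:$port" <<<"$listening"; then
        echo "MISSING $ip:$port"
        rc=1
      fi
    done
  done
  return "$rc"
}

# Example (step 13):
#   netstat -anp | check_cifs_ports AA.BB.CC.DD
```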
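
Step 14 notes that 6 to 8 gstacks taken a few seconds apart may be needed. The bash function below is a minimal sketch that takes COUNT gstacks of each PID at a fixed interval and names them [PID].gstack1, [PID].gstack2 and so on, matching the naming in step 14. The name take_gstacks, the 5-second interval in the example and the GSTACK override variable (used only for dry runs) are all hypothetical.

```shell
# Hypothetical helper: take COUNT gstacks of every PID, INTERVAL seconds
# apart, saving them as OUTDIR/PID.gstack1 ... OUTDIR/PID.gstackCOUNT.
# GSTACK is a hypothetical override (default: gstack), e.g. GSTACK=echo
# for a dry run without touching any process.
take_gstacks() {
  local outdir=$1 count=$2 interval=$3 i pid
  shift 3
  mkdir -p "$outdir" || return
  for (( i = 1; i <= count; i++ )); do
    for pid in "$@"; do
      "${GSTACK:-gstack}" "$pid" > "$outdir/$pid.gstack$i" 2>&1
    done
    # No need to wait after the final round.
    if (( i < count )); then
      sleep "$interval"
    fi
  done
}

# Example (step 14): eight gstacks of every cifsd process, 5 seconds apart
#   take_gstacks /diags 8 5 $(pgrep cifsd)
```

The interactive gdb session in step 14 can likewise be scripted with gdb's -p, -batch and -ex options, for example cd /diags && gdb -p [PID] -batch -ex gcore, which should write core.[PID] to the current directory just as typing gcore at the (gdb) prompt does.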
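
The disk budget in step 16 is simple arithmetic, but easy to get wrong: tcpdump's -C file size is counted in units of 1,000,000 bytes, and with -W the capture never keeps more than that many files. The bash function below (name hypothetical) is a minimal sketch of the worst-case total.

```shell
# Worst-case disk usage, in bytes, of a tcpdump ring buffer started with
# -C SIZE -W FILES. tcpdump's -C size is in units of 1,000,000 bytes.
ring_buffer_bytes() {
  local size=$1 files=$2
  echo $(( size * 1000000 * files ))
}

# Example (step 16): -C 128 -W 20
#   ring_buffer_bytes 128 20    # 2560000000 bytes, about 2.56 GB
```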
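
Step 17 asks for cifsd to be fully shut down before it is started again. The bash function below is a minimal sketch of that check: it polls the PIDs collected before rcnovell-cifs stop until they have all exited or a timeout passes, after which the caller can fall back to kill as described in step 17. The name wait_for_exit and the 30-second timeout in the example are assumptions, not values from this TID.

```shell
# Hypothetical helper: wait up to TIMEOUT seconds for every PID to exit.
# Returns 0 once none of them is running, 1 if any is still alive at the
# deadline. kill -0 only tests whether a process exists; it sends no
# signal. Run as root, otherwise kill -0 fails on other users' processes
# and they would wrongly look as if they had exited.
wait_for_exit() {
  local deadline=$(( SECONDS + $1 )) pid alive
  shift
  while :; do
    alive=0
    for pid in "$@"; do
      if kill -0 "$pid" 2>/dev/null; then alive=1; fi
    done
    if (( alive == 0 )); then return 0; fi
    if (( SECONDS >= deadline )); then return 1; fi
    sleep 1
  done
}

# Example (step 17):
#   pids=$(pgrep cifsd)
#   rcnovell-cifs stop
#   wait_for_exit 30 $pids || kill $pids
```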

    Disclaimer

    This Support Knowledgebase provides a valuable tool for NetIQ/Novell/SUSE customers and parties interested in our products and solutions to acquire information, ideas and learn from one another. Materials are provided for informational, personal or non-commercial use within your organization and are presented "AS IS" WITHOUT WARRANTY OF ANY KIND.

    • Document ID:7008956
    • Creation Date:07-JUL-11
    • Modified Date:03-NOV-14
      • Novell Open Enterprise Server
