Troubleshooting and Debugging CIFS on Open Enterprise Server

  • 7008956
  • 07-Jul-2011
  • 09-Aug-2017

Environment

Open Enterprise Server 2015 (OES 2015) Linux
Novell Open Enterprise Server 2 (OES 2) Linux
Novell Open Enterprise Server 11 (OES 11) Linux
Novell Cluster Services
CIFS

Situation

Cannot access CIFS shares
CIFS stops listening or communicating
CIFS becomes unresponsive or hangs
rcnovell-cifs stop | restart | start fails
Clustered CIFS resources cannot fail over
General CIFS troubleshooting and debugging tips

Resolution

Carry out as many of the following steps as possible to establish where the problem lies. It is useful to first follow these steps on a working system so that its output can be compared with that of the failing system.
  • 1.  Note that the output of most commands can be sent to a file with > filename
    • e.g.  rcnovell-cifs status > /tmp/cifs_status.out

  • 2.  Collect and examine a Support Config health check report
    • supportconfig

  • 3.  If possible, ensure that cifsd debugging and info are enabled before the problem occurs
    • novcifs -b yes
    • novcifs -f yes

  • 4.  Make a note of approximately when the problem occurred to simplify reading very large log files

  • 5.  Collect and examine the following log files
    • /var/log/messages
      • e.g.  grep segfault /var/log/messages
    • /var/opt/novell/log/cifs.log
    • /var/opt/novell/log/ncpserv.log

  • 6.  Check for core files
    • The default location is /usr/sbin

  • 7.  Collect all CIFS configuration details
    • Configuration files from /etc/opt/novell/cifs
    • All values of nfapCIFS* attributes of the NCP server object

  • 8.  Run and examine the output from the following commands
    • Check that cifsd is running
      • rcnovell-cifs status
    • Check that there are two instances of /usr/sbin/cifsd
      • ps -eaf | grep cifsd

  • 9.  Can CIFS be managed with iManager?
    • iManager -> Roles and Tasks
    • File Protocols -> CIFS
    • Select the relevant NCP Server object
      • Are there any error messages?

  • 10.  Ensure that the NCP Server object has the correct Object Extensions
    • iManager -> Roles and Tasks
    • Schema -> Object Extensions
    • Select the relevant NCP Server object
    • Check/add the following Auxiliary Class
      • nfapCIFSConfigInfo

  • 11.  Ensure that the NCP Server object has the correct Attributes
    • iManager -> Roles and Tasks
    • Directory Administration -> Modify Object
    • Select the relevant NCP Server object
    • General -> Other
    • Check/add the following Attribute and ensure it has a value of 4
      • nfapCIFSDialect
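
Steps 1 to 8 can be gathered in one pass. The following is a minimal bash sketch, assuming it is run as root on the OES server; the function names run_to_file and collect_cifs_diag, the output file names and the default /tmp/cifs_diag_* directory are hypothetical, and it complements rather than replaces supportconfig. It only lists core files, since they can be very large.

```shell
#!/bin/bash
# Sketch only: gathers the data from steps 1-8 into one directory.
# run_to_file and collect_cifs_diag are hypothetical helper names.

# Run a command and save its stdout and stderr to a file (step 1).
# Never fails, so one missing tool or file does not stop the collection.
run_to_file() {
  local out=$1
  shift
  "$@" >"$out" 2>&1 || echo "[exit status $?]" >>"$out"
}

collect_cifs_diag() {
  local dir=${1:-/tmp/cifs_diag_$(date +%Y%m%d_%H%M%S)}
  mkdir -p "$dir" || return 1

  # Step 8: service state and full process list
  run_to_file "$dir/cifs_status.out" rcnovell-cifs status
  run_to_file "$dir/ps.out" ps -eaf

  # Step 5: segfaults recorded in /var/log/messages, plus copies of the logs
  run_to_file "$dir/segfaults.out" grep segfault /var/log/messages
  local f
  for f in /var/log/messages /var/opt/novell/log/cifs.log /var/opt/novell/log/ncpserv.log; do
    cp -p "$f" "$dir/" 2>>"$dir/copy_errors.out" || true
  done

  # Step 7: CIFS configuration files
  cp -rp /etc/opt/novell/cifs "$dir/etc_opt_novell_cifs" 2>>"$dir/copy_errors.out" || true

  # Step 6: list (not copy) any core files in the default location
  run_to_file "$dir/core_files.out" find /usr/sbin -maxdepth 1 -name 'core*'

  echo "$dir"
}
```

For example, collect_cifs_diag /tmp/cifs_before_restart writes into that directory and prints its path. The nfapCIFS* attribute values from step 7 still need to be gathered separately, for example through iManager.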

If cifsd is not actually running, continuing with the remaining steps in this TID may not be necessary.
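
The check in step 8 can be scripted. Note that ps -eaf | grep cifsd normally lists the grep command itself as well, so the sketch below counts only lines whose command field is exactly /usr/sbin/cifsd. It assumes bash and the standard ps -eaf column layout, in which the eighth field is the command; count_cifsd and check_cifsd are hypothetical names.

```shell
#!/bin/bash
# Count lines of `ps -eaf` output (read from stdin) whose command field
# (field 8: UID PID PPID C STIME TTY TIME CMD) is /usr/sbin/cifsd.
# The grep process used in step 8 is therefore not counted.
count_cifsd() {
  awk '$8 == "/usr/sbin/cifsd" { n++ } END { print n + 0 }'
}

# Hypothetical wrapper: report whether the expected two instances exist.
check_cifsd() {
  local n
  n=$(ps -eaf | count_cifsd)
  if [ "$n" -eq 2 ]; then
    echo "OK: $n cifsd processes"
  elif [ "$n" -eq 0 ]; then
    echo "cifsd is not running" >&2
    return 1
  else
    echo "WARNING: expected 2 cifsd processes, found $n" >&2
    return 2
  fi
}
```

check_cifsd returns 0 when two instances are found, 1 when cifsd is not running and 2 for any other count.
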
  • 12.  Establish the scale and scope of the problem
    • Can the file system be accessed via other enabled protocols such as NCP, AFP or FTP?
    • Can users with existing connections still access the file system?
    • Can users make new connections to the file system by name?
    • Can users make new connections to the file system by IP address?
    • If DFS (Distributed File Services) is involved, can a resource be accessed directly as opposed to via a DFS junction?
    • Does the problem affect all types of client?
      • Windows 7
      • Windows 2000/XP (it is often useful to have an older version of Windows to compare to)
      • Linux e.g.  smbclient //MYSERVER/MYSERVER_W/VOLUME1 -U myuser

  • 13.  Check whether the cifsd process is extremely busy or is consuming large amounts of memory
    • top

  • 14.  Collect and examine the output from the following commands
    • smbstatus
    • novcifs -o
    • novcifs -sl
    • top -n 1 -b

  • 15.  Collect and examine the output from the following commands
    • Check that cifsd is listening on ports 137, 138, and 139 on all physical and, if running on a cluster, virtual server IP addresses
      • netstat -anp | grep cifs
    • Check that the cifsd NetBIOS Name Service still responds (where AA.BB.CC.DD is the IP address of the server)
      • nbtstat -A AA.BB.CC.DD (from a Windows workstation)
      • nmblookup -A AA.BB.CC.DD (from a Linux workstation with the samba-client rpm package installed)

  • 16.  Collect a gstack (stack trace) and a core dump of each cifsd process using gdb, and package the core dumps with getcore
  • 17.  Collect and examine a LAN trace of all traffic after the problem has occurred
    • e.g.  Capture all traffic while a client tries to connect to the CIFS server by IP address and by NetBIOS Name
      • tcpdump -i any -s0 -w /tmp/capture.cap

  • 18.  Collect and examine a LAN trace of NetBIOS traffic before the problem has occurred and while it is occurring
    • e.g.  Capture all NetBIOS traffic in 20 trace files of 128MB each. When the 20th file is full, tcpdump starts storing captured packets in the first file again, rotating through the 20 files, so the total disk space used by tcpdump will not exceed about 2.5GB in this case (20 × 128MB, where the -C unit is 1,000,000 bytes)
      • tcpdump -s 0 -i any -w /tmp/cifs.pcap -C 128 -W 20 port 137 or port 138 or port 139

  • 19.  After all of the necessary steps have been carried out, attempt to stop and start the cifs daemon, ensuring that it has fully shut down
    • rcnovell-cifs stop
    • rcnovell-cifs status
    • If cifsd is still shown as running, attempt to stop it again; if it still will not stop, kill the process
      • pgrep cifsd
      • kill [PIDs in previous step]
    • In some cases it may be necessary to restart eDirectory before restarting CIFS
      • rcndsd restart
    • Restart CIFS
      • rcnovell-cifs start

  • 20.  If CIFS is clustered, it may be necessary to offline the resource, stop/start cifsd and online the resource
    • Alternatively, fail over the CIFS resource to another node
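
The netstat check in step 15 can be summarised per IP address. A minimal sketch, assuming the Linux net-tools netstat -anp layout (local address in the fourth column, PID/Program name in the last); check_cifsd_ports is a hypothetical name. It reports only which of ports 137, 138 and 139 cifsd is bound to, not whether clients can actually reach them.

```shell
#!/bin/bash
# Sketch only: read `netstat -anp` output on stdin and, for each local IP
# address used by cifsd, report which NetBIOS ports (137/138/139) are missing.
# check_cifsd_ports is a hypothetical helper name.
check_cifsd_ports() {
  awk '
    # Only TCP/UDP socket lines whose last field looks like "PID/cifsd"
    $1 ~ /^(tcp|udp)/ && $NF ~ /\/cifsd$/ {
      n = split($4, part, ":")
      port = part[n]
      ip = substr($4, 1, length($4) - length(port) - 1)
      seen[ip] = 1
      have[ip "|" port] = 1
    }
    END {
      for (ip in seen) {
        missing = ""
        if (!((ip "|137") in have)) missing = missing " 137"
        if (!((ip "|138") in have)) missing = missing " 138"
        if (!((ip "|139") in have)) missing = missing " 139"
        if (missing == "") print ip ": ok"
        else print ip ": missing" missing
      }
    }'
}
```

For example, netstat -anp | check_cifsd_ports prints one line per address; on a cluster node, each virtual server IP address hosted there should appear as well.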

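The stop, verify, kill and start sequence in step 19 can also be scripted. A minimal bash sketch, assuming it runs as root on a non-clustered server; wait_for_exit and restart_cifs are hypothetical names, and the 30-second grace period is an arbitrary choice. It uses pgrep -x so that only processes named exactly cifsd are matched.

```shell
#!/bin/bash
# Sketch only: step 19 as a script. Names and timeouts are assumptions.

# Wait up to $1 seconds for every PID given after it to disappear.
# Returns 0 when none are left, 1 on timeout. Run as root so that
# kill -0 is permitted to probe root-owned processes.
wait_for_exit() {
  local timeout=$1
  shift
  local deadline=$((SECONDS + timeout))
  local pid alive
  while :; do
    alive=0
    for pid in "$@"; do
      if kill -0 "$pid" 2>/dev/null; then alive=1; fi
    done
    if [ "$alive" -eq 0 ]; then return 0; fi
    if [ "$SECONDS" -ge "$deadline" ]; then return 1; fi
    sleep 1
  done
}

restart_cifs() {
  local pids
  rcnovell-cifs stop
  # $pids is left unquoted below on purpose: each PID becomes one argument
  pids=$(pgrep -x cifsd)
  if [ -n "$pids" ] && ! wait_for_exit 30 $pids; then
    # Still running: try a second stop, then fall back to kill (SIGTERM)
    rcnovell-cifs stop
    pids=$(pgrep -x cifsd)
    if [ -n "$pids" ] && ! wait_for_exit 30 $pids; then
      echo "cifsd did not stop; sending SIGTERM to: $pids" >&2
      kill $pids
      if ! wait_for_exit 30 $pids; then
        echo "cifsd is still running; not restarting" >&2
        return 1
      fi
    fi
  fi
  rcnovell-cifs start
  rcnovell-cifs status
}
```

If eDirectory also needs to be restarted (rcndsd restart), do so before the final rcnovell-cifs start; for a clustered CIFS resource, follow step 20 instead.
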
Additional Information

CIFS and Samba provide some of the same functionality and may both be present on the same network (but not on the same server), especially in environments where Domain Services for Windows (DSfW) has been deployed. Ensure that any server providing Windows shares is actually supposed to be running CIFS in the first place, rather than Samba.
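
One quick way to see which of the two a server is actually running is to compare process names. A minimal sketch, assuming the Samba daemons use their standard names smbd and nmbd and the OES CIFS daemon runs as cifsd; which_smb_server is a hypothetical name. A result of "both" would point to exactly the kind of conflict described above.

```shell
#!/bin/bash
# Sketch only: read process names (one per line) on stdin and report
# whether OES CIFS (cifsd), Samba (smbd/nmbd), both, or neither is running.
# which_smb_server is a hypothetical helper name.
which_smb_server() {
  awk '
    $1 == "cifsd" { cifs = 1 }
    $1 == "smbd" || $1 == "nmbd" { samba = 1 }
    END {
      if (cifs && samba) print "both"
      else if (cifs) print "cifs"
      else if (samba) print "samba"
      else print "none"
    }'
}
```

For example: ps -eo comm= | which_smb_server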