iManager errors while trying to manage Cluster, Storage, User Quotas, CIFS or AFP

  • 7010295
  • 12-Mar-2012
  • 22-Oct-2013

Environment

Novell Cluster Services
Novell Open Enterprise Server 11 (OES 11) Linux
Novell Open Enterprise Server 11 (OES 11) Linux Support Pack 1
Novell iManager 2.7.4

Situation

iManager reports a variety of errors, depending on which plug-in is being accessed and the nature of the underlying problem.

Errors shown on the iManager page:
  • Server error! Error 500 The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there was an error in a CGI script.
  • Error 503 Service unavailable!  The server is temporarily unable to service your request due to maintenance downtime or capacity problems.  Please try again later.  
  • Error: CIMOM error occurred: could not seek in the management file 
  • Error: CIMOM cannot read from given file
  • Error: CIMOM cannot write to given file

Resolution

The following steps can be taken to troubleshoot the problem and resolve the issue.
  • Make sure the sfcb service is running and listening on port 5989.
    • Command: rcsfcb status  -- should report "running".
      • Results:  Checking for Small Footprint CIM Broker (SFCB):       running
    • Command: netstat -taupen | grep 5989   -- should show the sfcbd daemon in the LISTEN state
      • Results:    tcp        0      0 :::5989                 :::*                    LISTEN      0          37018706   6776/sfcbd 
Note: You may also see sfcb connections in other states. This can indicate a stuck sfcb process; if so, restart the sfcb service and continue with the troubleshooting steps below.
                       sysctl -w net.ipv4.conf.all.rp_filter=0
                            Note: To make this permanent, edit the /etc/sysctl.conf file
                                      and run the command "sysctl -p" to read the settings from /etc/sysctl.conf.
  •  Restart novell-tomcat6.  Command:  rcnovell-tomcat6 restart
  •  Restart sfcb.  Command: rcsfcb restart
  •  Test the iManager plug-in again.
  • Verify LUM and LDAP are functioning properly on the node being accessed.  
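
The listening check in the first step above can be scripted. The following is a minimal sketch, not a supported tool: the function name sfcb_listening is hypothetical, and it assumes the netstat column layout shown in the sample result above (local address in column 4, state in column 6, PID/program name in the last column).

```shell
#!/bin/sh
# Hypothetical helper: reads "netstat -taupen" output on stdin and returns
# success (exit status 0) only if an sfcbd socket is in LISTEN state on
# port 5989. Assumes the column layout shown in the sample result above.
sfcb_listening() {
    awk '
        # $4 = local address, $6 = socket state, $NF = PID/program name
        $1 ~ /^tcp/ && $4 ~ /:5989$/ && $6 == "LISTEN" && $NF ~ /sfcbd/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}

# Usage on a live server:
#   netstat -taupen | sfcb_listening && echo "sfcbd is listening on 5989"
```

Because the function only parses text, it can also be run against saved netstat output, for example from a supportconfig archive.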

Note: The user logged in to iManager must be LUM enabled on the server to which iManager makes the CIMOM / SFCB connection.

  • Verify the server certificates on the server, along with the certificates used by LUM. You can use iManager --> Novell Certificate Access --> Server Certificates to validate certificates. If a certificate is recreated or modified and the /etc/nam.conf file points to this server, also run "namconfig -k" to recreate the certificate file that LUM uses in /var/lib/novell-lum/*.der. See TID 7011790 - Various authentication problems with iManager, openwbem, CIMOM, LUM or LDAP.
  • Run "id <username>" for the username that is used to log in to iManager. The user must be LUM enabled.
  • The sfcb service must also be LUM enabled. See TID 3417215 - Section: Check to make sure OpenWBEM is a LUM enabled service.
  • Make sure the "preferred-server=" setting in /etc/nam.conf points to an eDirectory replica server that holds copies of the objects being referenced.
    • Depending on the iManager plug-in request, there can be many LUM --> LDAP authentication requests in progress, which can increase the time each request takes to return.
    • If the user is a member of many LUM groups, this can add to the delay.
  • Check how many SFCB threads are currently running.
    • Use the command "pgrep sfcb | wc -l". If the result is higher than 13 on an OES server, there is a high possibility that one of these threads is hung.
    • Restart the sfcb service and test.  Command: "rcsfcb restart"
    • If the problem persists at this point, temporarily unload Novell Remote Manager (NRM), restart sfcb, and test.  Commands: rcnovell-httpstkd stop, then rcsfcb restart
    • If the server is patched with the November 2012 OES Maintenance update, use the instructions in the Note below to disable NRM's health gathering processes.
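
The thread-count check above can be wrapped in a small script for repeated use. This is only an illustrative sketch: the function name sfcb_count_verdict and the variable SFCB_MAX are hypothetical, and the threshold of 13 is taken directly from the step above.

```shell
#!/bin/sh
# Threshold from the step above: more than 13 sfcb processes on an OES
# server suggests that one of them may be hung.
SFCB_MAX=13

# Hypothetical helper: takes a process count and prints a one-line verdict.
sfcb_count_verdict() {
    count=$1
    if [ "$count" -gt "$SFCB_MAX" ]; then
        echo "WARN: $count sfcb processes (more than $SFCB_MAX); one may be hung"
    else
        echo "OK: $count sfcb processes"
    fi
}

# Usage on a live server (pgrep prints one PID per line):
#   sfcb_count_verdict "$(pgrep sfcb | wc -l)"
```

A WARN result is only a hint that a restart of sfcb is worth trying; it does not identify which process is hung.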
Note regarding Novell Remote Manager (NRM): 
  • NRM makes several calls into different health providers every 60 seconds.   Eliminating NRM will drastically reduce the number of calls being made into the core sfcb processes.
  • Eliminating NRM seems to have stabilized some customer environments.  You can unload / load NRM as needed until a more permanent solution is provided via the update channel.

  • Note: With the release of the November 2012 OES Maintenance update, a debug flag option can be used to disable the health information gathering shown in the NRM interface.
  • Required versions for the flag:
    • OES11 - novell-nrm-2.0.2-297.114.2
    • OES11 SP1 - novell-nrm-2.0.2-297.300.2
  • To disable the health information that Novell Remote Manager (NRM) gathers through SFCB providers, follow these steps:
    1. Make sure the required novell-nrm version from the OES November 2012 Maintenance update has been applied to the server.
    2. Stop the service:  rcnovell-httpstkd stop
    3. Edit the configuration file "/etc/opt/novell/httpstkd.conf" and add the following option to the bottom of the file, without the quotes:
       "DaemonDebugFlags hlthoff"
    4. Start the service:  rcnovell-httpstkd start
    5. Test NRM by accessing https://<localIPAddress>:8009 or http://<localIPAddress>:8008

  • Note: This disables gathering of the health monitoring information, so the values on the main "Health Monitor" page will be zero and the Health traffic light will be green.  The Services section, which provides the status of each service and the ability to start|stop|restart it through NRM, will also be missing.  If these values are still populated with numbers, the options in the /etc/opt/novell/httpstkd.conf file are incorrect.
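
The configuration edit in step 3 can be made repeatable with a short script. The following is a sketch under stated assumptions rather than a supported procedure: the function name add_hlthoff_flag is hypothetical, and it assumes the option must appear on a line of its own exactly as quoted in step 3.

```shell
#!/bin/sh
# Hypothetical helper: appends the flag line from step 3 to the given file
# only if an identical line is not already present, so running it twice
# does not duplicate the option.
add_hlthoff_flag() {
    conf=$1
    line='DaemonDebugFlags hlthoff'
    if grep -qxF "$line" "$conf"; then
        echo "already present"
    else
        # If the file does not end in a newline, add one first so the flag
        # is not glued onto the last existing line.
        if [ -n "$(tail -c 1 "$conf")" ]; then
            echo >> "$conf"
        fi
        printf '%s\n' "$line" >> "$conf"
        echo "added"
    fi
}

# Usage on a live server, with the service stopped as in step 2:
#   add_hlthoff_flag /etc/opt/novell/httpstkd.conf
```

Keeping a backup copy of httpstkd.conf before running any script against it is advisable.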
Note regarding different iManager plug-ins which use a CIMOM / SFCB connection: 
  • Cluster:  The plug-in always connects via CIMOM (sfcb) to the cluster node hosting the Master IP resource.  Troubleshoot sfcb on this node.
  • Storage:  The NCP server object selected to manage the pool | volume | user quotas is the node on which sfcb troubleshooting needs to be done.
  • File Protocols (CIFS / AFP / Samba):  The NCP server object that is selected is the node on which sfcb troubleshooting needs to be done.
  • Archive versioning:  The NCP server object that is selected is the node on which sfcb troubleshooting needs to be done.
If these steps do not resolve the issue, please contact Novell Support to continue troubleshooting.
  • Please gather debug.html ( /var/opt/novell/tomcat6/webapps/nps/WEB-INF/logs/debug.html) from iManager after changing the Logging level in configure iManager to "
  • Obtain a current supportconfig of the offending node.

Additional Information