Automatically Restart a Service if it Crashes
Novell Cool Solutions: Tip
By Daniel Hedblom
Posted: 9 Feb 2007
Sometimes we all experience services that die randomly. The ideal fix, such as a patch, a server rebuild, or waiting for a service window, can take some time. Being able to quickly put a watchdog on that service makes our life as admins much easier. The following solution is simple and quick, and it works in most cases. I have it in production use right now with very good results.
Solution: The solution I use isn't my own invention, but I really like its simplicity. It's basically just a shell script called from cron. The script watches the service and restarts it if it has crashed, which saves the users on our network loads of grief.
This is what a sample script for LUM on SLED10 looks like:
    #!/bin/bash
    MYPROC=namcd   # The name of the process
    INITS=namcd    # The name of the /etc/init.d/ file

    # Count the running instances of $MYPROC. The result is 0 if it is not running.
    COUNT=$(UNIX95=1 ps -C "$MYPROC" -o pid= -o args= | wc -l)

    # If no instance is running, start the service.
    if [ "$COUNT" -lt 1 ]
    then
        /etc/init.d/$INITS start
    fi
If we want to check for an open port, we get a script that looks like this:
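    #!/bin/bash
    PORT=:445     # The port. The leading ":" makes it easy to match only ports and not other numbers in the output.
    INITS=samba   # The name of the service in /etc/init.d/

    # Count the listening sockets whose address contains $PORT.
    COUNT=$(netstat -lpn | grep "$PORT" | wc -l)

    # If nothing is listening on the port, start the service.
    if [ "$COUNT" -lt 1 ]
    then
        /etc/init.d/$INITS start
    fi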
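A plain grep for ":445" also matches ports such as ":4450" and addresses in the foreign-address column. As a minimal sketch of a stricter check, the hypothetical helper below (count_listeners is not part of the original tip) reads netstat -ln style output on standard input and counts only TCP/UDP lines whose local address ends in exactly the given port. It assumes the usual Linux netstat layout, where the local address is the fourth column.

```shell
#!/bin/bash
# Sketch only: count_listeners is a hypothetical helper, not from the original tip.
# It counts TCP/UDP sockets whose local address ends in ":<port>".
# Input: `netstat -ln` style output on standard input.
count_listeners() {
    awk -v port="$1" '
        $1 ~ /^(tcp|udp)/ {
            n = split($4, parts, ":")   # column 4 is the local address
            if (parts[n] == port) count++
        }
        END { print count + 0 }         # print 0 when nothing matched
    '
}

# Intended use inside the watchdog script:
#   COUNT=$(netstat -ln | count_listeners 445)
```

Splitting on ":" and taking the last field handles both IPv4 entries like 0.0.0.0:445 and IPv6 entries like :::445.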
We can also change the actions taken when we find out the service isn't running. For example, with GroupWise we probably want to add a command after "then" to remove the leftover pid file:
Where pidfile.pid is the pid file of the service that has crashed. Otherwise the agent won't restart.
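One cautious way to clear a leftover pid file can be sketched as follows. This is an assumption about how one might do it, not necessarily the command the tip intended: the helper name remove_stale_pid and the example path are hypothetical. It deletes the file only when the process recorded in it is no longer alive, so a healthy agent is never touched.

```shell
#!/bin/bash
# Sketch only: remove_stale_pid is a hypothetical helper, not from the original tip.
# Removes a pid file only if the process it names is no longer running.
# Returns 0 if the file was removed, 1 if it was left alone.
# Meant to run as root (as a cron watchdog would), so kill -0 can see any process.
remove_stale_pid() {
    pidfile=$1
    [ -f "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only tests whether the pid exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        return 1    # process is alive, keep the file
    fi
    rm -f "$pidfile"
}

# Intended use after "then" in the watchdog script (placeholder path):
#   remove_stale_pid /var/run/pidfile.pid
```

Replace the placeholder path with the real pid file of the agent before using it.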
These scripts should work on any SUSE version.
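To run a watchdog like the ones above from cron, a crontab entry is needed. The sketch below prints one, assuming the script is saved as /usr/local/bin/watchdog.sh and that a five-minute interval is acceptable; both values and the helper name cron_line are illustrative assumptions, not taken from the original tip.

```shell
#!/bin/bash
# Sketch only: the script path and the interval are assumptions.

# Print a crontab line that runs a script every N minutes.
#   $1 = interval in minutes (1-59)
#   $2 = absolute path to the watchdog script
cron_line() {
    printf '*/%s * * * * %s\n' "$1" "$2"
}

cron_line 5 /usr/local/bin/watchdog.sh
```

The printed line can be pasted into root's crontab with crontab -e. The script itself must be executable (chmod +x).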