Problem:
We all run into services that die at random. The proper fix, such as a patch, a server rebuild or waiting for a service window, can take some time. Being able to quickly put a watchdog on that service makes life as an admin much easier. The following solution is simple, quick and works in most cases. I have it in production use right now with very good results.
Solution:
The solution I use isn't my own invention, but I really like its simplicity. It's basically just a shell script called from cron. The script checks whether the service is running and starts it again if it has crashed. This saves the users on our network from loads of grief.
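For the cron side, a single entry is enough. The sketch below only prints a line in the /etc/cron.d format (schedule, user, command); the function name cron_entry, the interval and the script path are assumptions for illustration, not values from my setup.

```shell
#!/bin/bash
# Sketch only: the interval and script path passed in are assumptions,
# not values taken from the article.

# Print one /etc/cron.d-style line: schedule, user, then the command to run.
cron_entry() {
    local minutes=$1 script=$2
    printf '*/%s * * * * root %s\n' "$minutes" "$script"
}
```

For example, `cron_entry 5 /usr/local/sbin/watchdog-namcd.sh > /etc/cron.d/watchdog-namcd` (a hypothetical path and file name) would run the check every five minutes. Pick an interval shorter than the downtime your users will tolerate.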
Example:
This is what a sample script for LUM on SLED10 looks like:
#!/bin/bash
MYPROC=namcd   # The name of the process
INITS=namcd    # The name of the /etc/init.d/ file
# Count the running instances of $MYPROC. The result is 0 if it is not running.
COUNT=$(UNIX95=1 ps -C "$MYPROC" -o pid= -o args= | wc -l)
# Start the service if no instance was found.
if [ "$COUNT" -lt 1 ]
then
    /etc/init.d/$INITS start   # The command to start the service
fi
If we want to check for an open port, we get a script that looks like this:
#!/bin/bash
PORT=:445     # The port. The leading : makes it easy to match only ports and not other numbers in the output.
INITS=samba   # The name of the service in /etc/init.d/
# Count the listening sockets whose netstat line contains $PORT.
COUNT=$(netstat -lpn | grep "$PORT" | wc -l)
if [ "$COUNT" -lt 1 ]
then
    /etc/init.d/$INITS start
fi
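One caveat with a plain grep for the port: :445 also matches :4450, so another listener could hide a dead service. A stricter match is possible by comparing only the end of the local address column. The following is a minimal sketch, assuming the usual netstat -ln column layout where the local address is the fourth field; the function name count_listeners is made up for this example.

```shell
#!/bin/bash
# Sketch: count listeners on an exact port.
# Reads `netstat -ln` style output on stdin; assumes the local address is column 4.
count_listeners() {
    local port=$1
    # Match only when the local address ends in ":<port>", so :445 does not match :4450.
    awk -v p=":$port" '$4 ~ (p "$") { n++ } END { print n + 0 }'
}
```

In the watchdog this would replace the grep and wc pair, for example `COUNT=$(netstat -ln | count_listeners 445)`.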
We can also change the actions taken when we find out the service isn't running. For example with GroupWise we probably want to add a command after "then" to remove the leftover pid file:
rm /var/run/novell/groupwise/pidfile.pid
Here pidfile.pid stands for the PID file of the agent that has crashed. If the file is left in place, the agent won't restart.
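Since only the check and the recovery action change between these variants, they can be folded into one helper. This is a sketch under assumptions, not the script I run: the function name watchdog and its argument order are invented here, and the check can be any command that exits 0 while the service is healthy.

```shell
#!/bin/bash
# Generic watchdog sketch: run a check; if it fails, clean up and restart.
# The function name and argument order are illustrative assumptions.
#   $1  check command, must exit 0 while the service is up
#   $2  PID file to remove before restarting (may be empty)
#   $3  command that starts the service
watchdog() {
    local check=$1 pidfile=$2 start=$3
    if ! eval "$check" >/dev/null 2>&1; then
        # Remove a leftover PID file, if one was given, so the agent can start again.
        [ -n "$pidfile" ] && rm -f "$pidfile"
        eval "$start"
    fi
}

# Example call (commented out so that sourcing this file has no side effects):
# watchdog "ps -C namcd -o pid= | grep -q ." "" "/etc/init.d/namcd start"
```

The check and start commands go through eval so that pipelines like the one in the example work. That also means they should only ever come from the script itself, never from untrusted input.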
Environment:
These scripts should work on any SUSE version, as long as ps, netstat and the service's init script under /etc/init.d/ are available.
Disclaimer: As with everything else at Cool Solutions, this content is definitely not supported by Novell (so don't even think of calling Support if you try something and it blows up).
It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.