Making Sure Self-healing Apps Work
Novell Cool Solutions: Tip
By Timothy Forde
Posted: 25 Jan 2001
Frequently people ask how self-healing apps work. They often get stymied when a program like Word has a problem distributing the Spellchecker files, but NAL doesn't catch the problem and heal the app. Here's an excellent explanation that will help you understand what's going on, from Novell Consultant Timothy Forde.
When you launch a Windows task you get one of two things:
- A task handle, which is a largish 16- or 32-bit integer used to track the process, allocate resources to its execution, etc.
- An error code, which is typically a small integer (1 - 31?). These correspond to "Bad command or file name," "Out of memory," etc.
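As a rough illustration of the two outcomes above, the sketch below sorts a launch result into "handle" or "error". It assumes the classic WinExec-style convention that anything above 31 means success; the cut-off, the function name and the small table of meanings are assumptions for illustration, not something taken from NAL itself.

```python
# Assumption: results above 31 count as a task handle, following the
# classic WinExec-style convention; the article only guesses at "1 - 31?".
HIGHEST_ERROR_CODE = 31

# Illustrative subset of error meanings (assumed, not exhaustive).
ERROR_MEANINGS = {
    0: "out of memory or resources",
    2: "file not found",
    3: "path not found",
}


def describe_launch_result(result: int) -> str:
    """Say whether a launch result looks like a handle or an error code."""
    if result > HIGHEST_ERROR_CODE:
        return f"task handle {result}: launch succeeded"
    meaning = ERROR_MEANINGS.get(result, "unrecognised error")
    return f"error code {result}: {meaning}"


print(describe_launch_result(2))
print(describe_launch_result(4660))
```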
The beauty of using NAL to launch things is that NAL checks the return code and self-heals if it gets one of the error codes. If it gets no error code, it watches the task list like some hungry cat, waiting until the task no longer appears in the list. This is the cue to run any post-launch script, clean up network resources, etc.
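The launch, heal, watch and clean-up cycle described above can be sketched roughly as follows. This is not NAL's code: it uses Python's subprocess module as a stand-in for the Windows launch call, and launch_and_track, heal and cleanup are hypothetical names chosen for the example.

```python
import subprocess
import sys


def launch_and_track(command, heal, cleanup):
    """Launch command, heal and retry once if the launch fails, then
    wait for the task to end before running the post-launch cleanup."""
    try:
        task = subprocess.Popen(command)
    except OSError:
        # The launch itself failed (for example, the executable is missing).
        # This is the only kind of failure the launcher can see.
        heal()
        task = subprocess.Popen(command)
    exit_code = task.wait()  # stands in for watching the task list
    cleanup()                # post-launch script, network cleanup, etc.
    return exit_code


if __name__ == "__main__":
    code = launch_and_track(
        [sys.executable, "-c", "pass"],
        heal=lambda: print("redistributing application files"),
        cleanup=lambda: print("cleaning up network resources"),
    )
    print("task ended with exit code", code)
```

Note that heal is only reached when the launch call itself reports a failure, which is the limitation the rest of this tip is about.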
The trap is this - NAL can only track what it launches.
If the primary executable gets away OK (for example, WinWord.exe), establishes itself in a Windows VM, gets its own memory heap, finds the other crucial files and DLLs it needs, and starts executing, NAL thinks everything is OK. Incidentally, many programmes will abort (tipping NAL off) if some other crucial DLL cannot be found.
If, however, the missing files are not required to get a successful launch on the primary executable, things aren't so simple. For instance, if a component of Word like Spellchecker or MS Graph is broken, NAL will not self-heal. NAL will only track the primary EXE and not any other OLE server or EXE it launches in turn.
The classic trap is when you use NAL to launch a wrapper programme (a small loader programme that shells straight out to another programme). NAL.exe is itself an example. If the main programme files are damaged, the wrapper may not return an error, in which case NAL will not self-heal. That is why the Verify option is so important: it allows the user to force NAL to check and redistribute.
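The wrapper trap can be reproduced in miniature. In the sketch below, a short Python one-liner stands in for a damaged main programme (it exits with code 3), and a second one-liner stands in for a wrapper that starts it and exits at once. All the names are hypothetical, and exit codes are used in place of Windows launch return values.

```python
import subprocess
import sys

# Stand-in for a damaged main programme: it starts, then fails.
BROKEN_MAIN = [sys.executable, "-c", "import sys; sys.exit(3)"]

# Stand-in for a wrapper: it starts the main programme and exits at once,
# without waiting for it or passing its exit code along.
WRAPPER_SOURCE = (
    "import subprocess\n"
    f"subprocess.Popen({BROKEN_MAIN!r})\n"
)
WRAPPER = [sys.executable, "-c", WRAPPER_SOURCE]


def exit_code_seen_by_launcher(command):
    """Return the only result a launcher gets: that of what it launched."""
    return subprocess.run(command).returncode


direct = exit_code_seen_by_launcher(BROKEN_MAIN)
wrapped = exit_code_seen_by_launcher(WRAPPER)
print("launched directly, the launcher sees:", direct)
print("launched via the wrapper, the launcher sees:", wrapped)
```

Launched directly, the failure reaches the launcher. Launched through the wrapper, the launcher only sees the wrapper's own clean exit, so nothing prompts it to heal.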
The other irritating phenomenon you might see is that when the task NAL launched finishes, NAL executes the post-launch tasks such as cleaning up network resources, maybe deleting mapped drives and queues, etc., possibly while the programme (from the user's perspective) is still going. Doh!
Novell Cool Solutions (corporate web communities) are produced by WebWise Solutions. www.webwiseone.com