Novell Cool Solutions

Anatomy of an Abend.log



August 31, 2006 1:40 pm

Obviously not being an OS engineer I don’t know it all, but we get enough questions along the lines of “hey that FTF says it fixed ‘THE’ GWIA abend, but mine is still abending” that I thought I should share with you what we look for in an abend.log.

Yeah, abend.logs are not real easy reading and they normally don’t contain enough information for us to write a code fix (though I have seen it done, and I am in awe of that) but they can be very useful nonetheless.  There are a few things that we look for to get us closer to the source of the problem, or to identify if we have seen the problem before:

The first part of the log:

Server AEVANSOES halted Monday, February 27, 2006   3:14:34.434 pm
Abend 1 on P00: Server-5.70.04: Page Fault Processor Exception (Error code 00000000)

Important parts here:

Abend 1 – we only care about Abend 1 as any subsequent abends can be caused by the first abend.  So, if yours is Abend 8 it is probably not worth reporting to us.
The time – maybe you can correlate this with something in the logs, or some process that runs at that time

The abend type – Page Fault Processor Exception in this case – means that it is a hardware-detected abend, which is a pointer to (though not proof of) a bug.
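
As a rough illustration of pulling those three items out of the first part of the log, here is a minimal Python sketch. The regular expressions and the parse_header name are my own assumptions based only on the two header lines shown above, so other abend.log variants may need adjusting.

```python
import re
from datetime import datetime

# The two header lines quoted above, copied from the sample abend.log.
HEADER = """Server AEVANSOES halted Monday, February 27, 2006   3:14:34.434 pm
Abend 1 on P00: Server-5.70.04: Page Fault Processor Exception (Error code 00000000)"""

# Assumed patterns, derived only from the sample lines above.
HALT_RE = re.compile(r"^Server (?P<server>\S+) halted (?P<when>.+)$")
ABEND_RE = re.compile(
    r"^Abend (?P<number>\d+) on P(?P<cpu>\d+): (?P<os>[^:]+): "
    r"(?P<kind>.+?)(?: \(Error code (?P<code>[0-9A-Fa-f]+)\))?$"
)


def parse_header(text):
    """Return the server name, halt time, abend number and abend type."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        m = HALT_RE.match(line)
        if m:
            info["server"] = m.group("server")
            # Collapse the run of spaces between the date and the time.
            when = " ".join(m.group("when").split())
            info["halted"] = datetime.strptime(
                when, "%A, %B %d, %Y %I:%M:%S.%f %p"
            )
            continue
        m = ABEND_RE.match(line)
        if m:
            info["abend_number"] = int(m.group("number"))
            info["processor"] = int(m.group("cpu"))
            info["os_version"] = m.group("os")
            info["abend_type"] = m.group("kind")
            info["error_code"] = m.group("code")
    return info


info = parse_header(HEADER)
# Only the first abend is worth chasing; later ones may be fallout from it.
worth_reporting = info["abend_number"] == 1
```

The parsed halted value is an ordinary datetime, which makes it easier to line the abend up against other logs or scheduled jobs.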

Then:

Registers:
CS = 0060 DS = 0068 ES = 0068 FS = 007B GS = 007B SS = 0068
EAX = 4E53D6A0 EBX = 4E530120 ECX = 4E530120 EDX = 00000001
ESI = 00000000 EDI = 00000000 EBP = 4E74B684 ESP = 4E74B670
EIP = 61B11B4B FLAGS = 00010202
61B11B4B 837E6900       CMP     [ESI+69]=?, 00000000
EIP in GWIA.NLM at code start +0002EB4Bh
Access Location: 0x00000069 

This is what was stored in the CPU registers at the time of the abend.  EIP (Extended Instruction Pointer) is the point at which we abended.  The value of EIP can be different on different servers; however, the actual instruction should be the same, e.g.  CMP     [ESI+69]=?, 00000000  and the offset from code start should be the same too, provided the exact same module is loaded on the servers.   What’s the "code start +" value?  It is the hex offset of the line of code that we abended on, counted from the beginning of that module's code.  In the above example it is in GWIA.NLM at +0002EB4B. It’s important to compare the exact same module versions/dates because, as we make changes in the code, the offset for the same line of code can move: if you imagine that we add 10 lines of code somewhere earlier in the module for something else, then the point at which we abend moves 10 lines further down, or (loosely speaking) old offset + 10.
So far, if I was looking for an existing TID or an existing bug, I would be searching on abend, page fault, gwia, and 0002EB4B (sometimes you need to include the leading + and/or the trailing h).
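
To make the arithmetic concrete, here is a small Python sketch that rebuilds the offset from this log's own numbers: the EIP above and the Code Address that the Loaded Modules section of the same log reports for GWIA.NLM. The helper names and the second server's load address are hypothetical, there only to show that the offset survives a different load address.

```python
def code_offset(eip, code_start):
    """Distance from the start of a module's code to the faulting instruction."""
    if eip < code_start:
        raise ValueError("EIP lies below the module's code start")
    return eip - code_start


def search_terms(abend_type, module, offset):
    """Build the kind of terms you might search a TID or bug database for."""
    hex_offset = f"{offset:08X}"
    return ["abend", abend_type.lower(), module.lower(), hex_offset, f"+{hex_offset}h"]


# Values taken from the sample abend.log.
EIP = 0x61B11B4B
GWIA_CODE_ADDRESS = 0x61AE3000  # "Code Address" for GWIA.NLM under Loaded Modules

offset = code_offset(EIP, GWIA_CODE_ADDRESS)
formatted = f"+{offset:08X}h"  # "+0002EB4Bh", matching the log

# Hypothetical second server: same GWIA.NLM build loaded at a different address.
OTHER_CODE_ADDRESS = 0x5F000000  # assumption, not from the log
other_eip = OTHER_CODE_ADDRESS + offset
same_abend = code_offset(other_eip, OTHER_CODE_ADDRESS) == offset

terms = search_terms("Page Fault", "GWIA", offset)
```

The EIP values differ between the two servers, but the offset is identical, which is why the offset (and not the raw EIP) is the thing to search on.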

This is going to be a long post :)

The violation occurred while processing the following instruction:
61B11B4B 837E6900       CMP     [ESI+69], 00000000   this is where we abended; it should match if it’s the same abend
61B11B4F 7429           JZ      61B11B7A
61B11B58 6A0D           PUSH    0D
61B11B5A E831700800     CALL    GWIA.NLM|MMSSubmitCommand
61B11B5F 59             POP     ECX
61B11B60 31FF           XOR     EDI, EDI
61B11B62 EB0F           JMP     61B11B73
61B11B64 837E6D00       CMP     [ESI+6D], 00000000
61B11B68 7417           JZ      61B11B81 
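
The first instruction in that listing compares whatever sits at [ESI+69] with zero, and the log reports Access Location: 0x00000069. A short Python sketch (the parse_registers helper is my own, not something the log or the OS provides) shows how those two facts tie back to the register dump: ESI was 0, so the address being read was simply 0 + 69h. An address that close to zero is often what a null pointer looks like, though the log alone doesn't prove that.

```python
import re

# Register lines copied from the sample abend.log.
REGISTER_DUMP = """\
EAX = 4E53D6A0 EBX = 4E530120 ECX = 4E530120 EDX = 00000001
ESI = 00000000 EDI = 00000000 EBP = 4E74B684 ESP = 4E74B670
EIP = 61B11B4B FLAGS = 00010202"""


def parse_registers(text):
    """Turn 'NAME = HEXVALUE' pairs into a dict of integers."""
    pairs = re.findall(r"\b([A-Z]+) = ([0-9A-F]+)", text)
    return {name: int(value, 16) for name, value in pairs}


regs = parse_registers(REGISTER_DUMP)

# CMP [ESI+69], 0 reads memory at ESI + 0x69 (wrapped to 32 bits).
DISPLACEMENT = 0x69
access_location = (regs["ESI"] + DISPLACEMENT) & 0xFFFFFFFF
formatted = f"0x{access_location:08X}"  # "0x00000069", as in the log
```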

Next comes the ‘stack’ :

Running process: GWIA-Main Process                            This is the name of the thread that abended.  It should match if the abend is the same
Thread Owned by NLM: GWIA.NLM
Stack pointer: 4E74B2BC
OS Stack limit: 4E743A60
Scheduling priority: 67371008
Wait state: 3030070  Yielded CPU
Stack: –4E53DB44  ?
–00000000  ?
–00000000  ?
–4E530120  ?
–4E53D6A0  ?
–4E74B90C  ?
61AE73F9  (GWIA.NLM|GweMainForNLM+1CB)        This bit is complicated to explain – pop to the bottom of the stack for the rest
–4E530120  ?
–440379CA  ?
–4D591880  ?
–0000000F  ?
–440377F8  ?
–49DD76A0  ?
–00000000  ?
–00000000  ?
–00000000  ?
–00000002  ?
–00000002  ?
–00000001  ?
–4E74B6E0  ?
62572810  (GWENN5.NLM|GWENN5@NgwThrdCreate+170)
–00000010  ?
–4DC4CDA0  ?
62572910  (GWENN5.NLM|GWENN5@NgwThrdCreate+270)
–00000000  ?
–00000000  ?
–4D591880  ?
–4E53D6A0  ?
–4E53D6A0  ?
–4E53D6A0  ?
–4E53D6A0  ?
–FFFFFFFF  ?
–4E53D6A0  ?
–00000246  ?
61AE367C  (GWIA.NLM|RegisterToIPMgmt+0)
–00000000  ?
BF4A5144  (THREADS.NLM|getcmd+5C)
–447C0580  ?
–00000000  ?
BF4A513C  (THREADS.NLM|getcmd+54)
–447C0580  ?
–00000000  ?
–4A2C72E0  ?
–00000000  ?
–4A2C72E0  ?
-BF4C0750  (THREADS.NLM|(Data Start)+2750)
00223CC8  (SERVER.NLM|TcoNewSystemThreadEntryPoint+40)
–4A2C72E0  ?
–00000000  ?
–00000000  ?
–00000000  ?
–00000000  ?
–05525245  ?
–0B000000  ?
–5540000D  ?
–7940000D  ?
–C5400066  ?
40400067  ?
–0F40006D  ?
–34343434  ?
–6F436E65  ?
–6E6F706D  ?
–02746E65  ?
–4D000000  ?
–93400032  ?
–10400032  ?
–6C434146  ?
–4365736F  ?
–6F706D6F  ?
–746E656E  ?
–00000002  ?
–40003400  ?
–40006D56  ?
–47414612  ?
–6F437465  ?
–6E6F706D  ?
–53746E65  ?
–01636570  ?
–70000000  ?
–14400066  ?
–65534146  ?
–6D6F4374  ?
–656E6F70  ?
–754F746E  ?
–1B0107D6  ?
–30060F01  ?
–40000000  ?
–47414613  ?
624F7465  ?
–7463656A  ?
–61636F4C  ?
–6E6F6974  ?
–00000001  ?
–40005E14  ?
–44414614  (LBURP.NLM|lburpExtensionHandler+4594)
–6D6F436F  ?
–656E6F70  ?
–7053746E  ?
–61696365  ?
–0000026C  ? 

Everywhere that you see (MODULE.NLM|FunctionName+###) is a place where the value in memory matches a point in code. Let me expand: everything stored in memory is either code or data.  If we start at memory address 0 and load a module that is 100 KB then (and I am oversimplifying this) memory addresses 0 through 99 are occupied by this module, and the OS tracks this.  This is code space.

As a program executes, it writes the data it needs, along with the addresses of code it has to get back to (e.g. somewhere in 0 to 99 as above), onto the stack; this is data space.  When we abend we write out that part of memory as the stack, like above, and the abend.log tries to help by telling you when it finds a value that matches an address where it knows code is stored in memory (0 to 99 in my example).  The problem is that it’s not always accurate, as the value stored may actually be data that just happens to match a code address.
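
Here is a rough Python sketch of that matching step, using the real GWIA.NLM code address and length from the Loaded Modules section of this log. The Module class and annotate function are hypothetical names of mine. Without the module's symbol table the sketch can only report an offset from code start, not a function name such as GweMainForNLM+1CB.

```python
from dataclasses import dataclass


@dataclass
class Module:
    name: str
    code_start: int
    code_length: int

    def contains(self, address):
        """True if the address falls inside this module's code space."""
        return self.code_start <= address < self.code_start + self.code_length


# Code Address and Length for GWIA.NLM, as listed under Loaded Modules.
MODULES = [Module("GWIA.NLM", 0x61AE3000, 0x002007EA)]


def annotate(value, modules):
    """Label a stack value the way the log does: a code match or '?'."""
    for module in modules:
        if module.contains(value):
            offset = value - module.code_start
            return f"{value:08X}  ({module.name} code start +{offset:X})"
    return f"{value:08X}  ?"


# Three values taken from the stack dump above.
lines = [annotate(v, MODULES) for v in (0x4E530120, 0x61AE73F9, 0x61AE367C)]
```

Note that the check only asks whether a number lies inside a code range. Any piece of data that happens to fall between 61AE3000h and 61AE3000h + 002007EAh would be labelled as GWIA.NLM code too, which is exactly the inaccuracy described above.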

At this point, if I was searching for TIDs or bugs I would possibly also be searching on some of the function names above, as they can get you to a relevant hit quicker – though the rest of the abend should match somewhat closely too.
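
If you want to collect those function names without reading the stack by eye, a minimal Python sketch might look like this. The pattern is an assumption based on the (MODULE.NLM|Name+offset) entries in this one log, and treating the part before "@" as a prefix to drop is my guess at what is useful to search on.

```python
import re

# A few stack lines copied from the dump above.
STACK_LINES = """\
61AE73F9  (GWIA.NLM|GweMainForNLM+1CB)
–4E530120  ?
62572810  (GWENN5.NLM|GWENN5@NgwThrdCreate+170)
62572910  (GWENN5.NLM|GWENN5@NgwThrdCreate+270)
61AE367C  (GWIA.NLM|RegisterToIPMgmt+0)
BF4A5144  (THREADS.NLM|getcmd+5C)
-BF4C0750  (THREADS.NLM|(Data Start)+2750)
00223CC8  (SERVER.NLM|TcoNewSystemThreadEntryPoint+40)"""

# Assumed shape of an annotated entry: (MODULE|symbol+hexoffset)
SYMBOL_RE = re.compile(r"\(([\w.]+)\|(.+)\+([0-9A-Fa-f]+)\)")


def function_names(stack_text):
    """Return the distinct function names, in the order they appear."""
    names = []
    for line in stack_text.splitlines():
        match = SYMBOL_RE.search(line)
        if not match:
            continue  # a plain '?' line with no symbol match
        symbol = match.group(2)
        if symbol.startswith("("):
            continue  # e.g. "(Data Start)", a data region rather than a function
        name = symbol.split("@")[-1]  # drop a leading "GWENN5@"-style prefix
        if name not in names:
            names.append(name)
    return names


names = function_names(STACK_LINES)
```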

And now the last bits:

Additional Information:
The CPU encountered a problem executing code in GWIA.NLM.  The problem may be in that module or in data passed to that module by a process owned by GWIA.NLM.


This is the module that abended and what passed the data to that module.  This one was definitely a GWIA abend :)

Loaded Modules:
GWIA.NLM         GroupWise Internet Agent (Beta release version)
Version 7.00.01   February 8, 2006
Code Address: 61AE3000h  Length: 002007EAh
Data Address: 5024C000h  Length: 00062B03h

The loaded modules section tells us two things: the version and build date of the modules, and the order in which they were loaded, with the most recent at the top of the list and going backwards.  On my server the last module loaded was GWIA.NLM and it was abending on startup.  I don’t remember the specific abend, but I know it was on startup because the function names on the stack (NgwThrdCreate, TcoNewSystemThreadEntryPoint and RegisterToIPMgmt) are all things that a module does on startup.

If you are experiencing an abend that you can’t find anything about elsewhere then what we are going to need is a core dump.  Another pointer: if you look in your abend.log and the abends are all over the place, it is often a sign of a corrupt memory module.  And, as you can see, ‘THE’ GWIA abend doesn’t really cut the mustard as a problem description.
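
For comparing module versions and dates between servers, a hedged Python sketch of reading that section follows. It assumes every entry has the same four-line shape as the GWIA.NLM entry above, and parse_modules is a name I made up. The list it returns keeps the log's order, so the first item is the most recently loaded module.

```python
import re

# The single entry quoted above, copied from the sample abend.log.
LOADED_MODULES = """\
GWIA.NLM         GroupWise Internet Agent (Beta release version)
Version 7.00.01   February 8, 2006
Code Address: 61AE3000h  Length: 002007EAh
Data Address: 5024C000h  Length: 00062B03h"""

ADDRESS_RE = re.compile(r"(Code|Data) Address: ([0-9A-F]+)h\s+Length: ([0-9A-F]+)h")
VERSION_RE = re.compile(r"Version (\S+)\s+(.+)")
NAME_RE = re.compile(r"(\S+\.\w+)\s+(.*)")


def parse_modules(text):
    """Parse Loaded Modules entries into dicts, most recently loaded first."""
    modules = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = ADDRESS_RE.match(line)
        if m and current is not None:
            kind = m.group(1).lower()  # "code" or "data"
            current[kind + "_address"] = int(m.group(2), 16)
            current[kind + "_length"] = int(m.group(3), 16)
            continue
        m = VERSION_RE.match(line)
        if m and current is not None:
            current["version"] = m.group(1)
            current["build_date"] = m.group(2)
            continue
        m = NAME_RE.match(line)
        if m:
            # A new module entry starts with "NAME.EXT  description".
            current = {"name": m.group(1), "description": m.group(2)}
            modules.append(current)
    return modules


modules = parse_modules(LOADED_MODULES)
last_loaded = modules[0]
```

Comparing name, version and build_date across two servers is a quick way to confirm that you really are looking at the exact same module before comparing offsets.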

Disclaimer: This content is not supported by Novell. It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test it thoroughly before using it in a production environment.
