
Set Up Linux Kernel Crash Dump on SLES

Novell Cool Solutions: Feature
By Aaron Gresko


Posted: 3 May 2005
 

Linux Kernel Crash Dump (LKCD) is a utility for gathering, saving, and analyzing information about the kernel when the system dies due to a software failure. LKCD is a SourceForge project and can be found at http://lkcd.sourceforge.net.

SLES8 and SLES9 come with the LKCD software package and the kernel compiled to support crash dumps. Getting LKCD configured on SLES requires the following:

  • Understand how LKCD works
  • Install the LKCD software
  • Configure LKCD
  • Test LKCD with a kernel panic
  • Verify LKCD dump files

Understand How LKCD Works

Once LKCD is installed and configured, a crash dump is created whenever a kernel panic or kernel oops occurs. A dump can also be initiated through the use of magic keys.

Once the panic or oops occurs, the following happens:

  1. System memory is saved to the dump device.
  2. The system is rebooted.
  3. LKCD configuration is run, which prepares the system for the next system failure.
  4. Dump files are created on the permanent dump storage area from the dump device.

The dump can then be analyzed with the lcrash tool or sent to developers or support representatives for analysis.

Note: Instructions given here will be based on SLES9. Differences in SLES8 will be pointed out wherever applicable.

Install The LKCD Software

Before installing the LKCD software on SLES8, make sure the system is patched to at least Service Pack 3. Significant changes occurred between the base release of SLES8 and Service Pack 3, and the instructions given here will be insufficient if the system is not updated. SLES9 works either in its base form or patched with SP1.

LKCD should be installed using the YaST Software module. To install LKCD, first make sure any desired service pack is listed as an installation source in YaST and moved up the priority list, then complete the following:

  1. Start YaST by selecting SUSE > System > YaST.
  2. Select Software > Install And Remove Software.
  3. Change the Filter drop-down selection to Search and enter lkcd as the search string. Select Search to perform the search. Two packages are returned: lkcdutils and lkcdutils-netdump-server. lkcdutils-netdump-server provides the ability to receive dump files across a network; configuring network dump capability is beyond the scope of this document.
  4. From the result list select lkcdutils for installation and then select Accept.

LKCD is now installed on the system. The principal files installed by lkcdutils include the following:

  • /sbin/lkcd---is the script used to configure or save a crash dump. lkcd takes either config or save as parameters.
  • /sbin/lkcd_config---is the binary file called by lkcd to configure crash parameters and set up the crash dump. Pass the -q parameter to lkcd_config to see the current crash setup.
  • /etc/sysconfig/dump---is the configuration file for LKCD. All configuration of the dump is done through this file.
  • /sbin/lcrash---is used to verify and analyze saved dump files.
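
A quick way to confirm the installation is to check that the files listed above exist. The following is a minimal sketch; the function name and its optional root-prefix argument are illustrative and not part of lkcdutils.

```shell
# Report which of the principal lkcdutils files named above are present.
# The optional first argument is a hypothetical root prefix (default "/"),
# useful for pointing the check at a chroot or a test tree.
check_lkcd_files() {
    root="${1:-/}"
    missing=0
    for f in sbin/lkcd sbin/lkcd_config etc/sysconfig/dump sbin/lcrash; do
        if [ -e "${root%/}/$f" ]; then
            echo "found   /$f"
        else
            echo "missing /$f"
            missing=1
        fi
    done
    # Exit status 0 means every file was found, 1 means at least one is missing.
    return "$missing"
}
```

Running check_lkcd_files with no arguments checks the live system. Querying the package database with rpm -q lkcdutils is another way to confirm the package itself is installed.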

Configure LKCD

Configuring LKCD requires the following steps:

  1. Set LKCD variables in /etc/sysconfig/dump
  2. Load and verify configuration options
  3. Set up boot loading options

Set LKCD variables in /etc/sysconfig/dump

All configurable options for LKCD are set in /etc/sysconfig/dump. Edit the configuration file in a text editor as root. For example:

#vi /etc/sysconfig/dump

The dump configuration file is well commented. The configurable options include the following:

  • DUMP_ACTIVE---0 turns dump off; 1 turns dump on.
  • DUMPDEV---sets the partition where the dump will be placed when the crash occurs or the network interface the dump will be sent on. On SLES9, the default is eth0. On SLES8, it is /dev/vmdump.

Normally, /dev/vmdump is the first available swap partition. If not changed from the default, /dev/vmdump should be symbolically linked to the swap partition.

If using /dev/vmdump, verify that /dev/vmdump is linked to a partition. On some systems, /dev/vmdump is a dead link when placed on the system. To see if /dev/vmdump is dead, enter ls -l /dev/vmdump at the console prompt.

If the link is dead, remove /dev/vmdump (i.e. rm /dev/vmdump) and then recreate the symbolic link so it points to the desired swap partition (e.g. ln -s /dev/sda1 /dev/vmdump). Another option is to specify the swap partition directly as the dump device.
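
Reading ls -l output by eye works, but the dead-link test can also be scripted. This is a minimal sketch with an illustrative function name; it relies only on the standard -L and -e file tests.

```shell
# Classify a path as a live symlink, a dead symlink, or something else.
# Prints one word: "ok", "dead", "notlink" or "absent".
link_state() {
    path="$1"
    if [ -L "$path" ]; then
        # -e follows the link, so it fails when the target does not exist.
        if [ -e "$path" ]; then echo ok; else echo dead; fi
    elif [ -e "$path" ]; then
        echo notlink
    else
        echo absent
    fi
}
```

For example, link_state /dev/vmdump prints dead when the link needs to be removed and recreated as described above.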

If the swap partition is smaller than the size of memory, the crash dump may not fit. In this situation, consider creating a dedicated swap partition to receive the crash dump. Use YaST to configure the partition as swap and make sure the partition will not be loaded by fstab. The dedicated swap partition can then be listed as the dump device. For example, DUMPDEV=/dev/sdc1.

DUMPDIR---is the file system location where dump files will be created by lkcd save. The default is /var/log/dump.

DUMP_LEVEL---determines the verbosity of the dump. Valid options include:

  • 0---do nothing, just return
  • 1---dump the dump header and first 128 bytes
  • 2---dump the dump header and only the kernel pages (only SLES9)
  • 4---everything except free kernel pages (only SLES9)
  • 8---everything

The default in SLES8 is 8. The default in SLES9 is 2.

DUMP_COMPRESS---sets the type of compression to use (1 for RLE; 2 for GZIP), or none (0).

DUMP_FLAGS---allow configuration of dump behavior, such as not rebooting after dumping and dumping to a network device. See the comments in the dump file for the valid flags. The default in SLES8 is 0x80000000, which is a local device dump. The default in SLES9 is 0x40000000, which is a network dump.

DUMP_SAVE---determines whether the dump is saved to the DUMPDIR. Setting DUMP_SAVE to 1 saves the dump there. Any other value will result in a crash report being created on the dump device, but the crash dump files will not be created in the dump directory.

PANIC_TIMEOUT---is the time in seconds to wait before rebooting after a kernel panic.

BOUNDS_LIMIT (SLES9 only)---specifies the number of dumps that can be retained in the dump directory. When the limit is reached, the first dump directory, 0, is overwritten during the next dump. The default is 10. Setting the limit to 0 will allow unlimited dumping.

KEXEC_IMAGE (2.6 kernels only)---specifies the boot image to boot after the dump.

KEXEC_CMDLINE---specifies the boot options to pass with the KEXEC_IMAGE boot image. The default value will not be valid. To get the default command line for the desired boot image, boot the image and enter cat /proc/cmdline at the prompt.

The other available options are for configuring network dumps, which is not covered here.

The examples given in the rest of this article will have SLES9 set to do a local dump to the first swap partition. The only changes from defaults in /etc/sysconfig/dump are as follows:

  • DUMPDEV="/dev/vmdump"
  • DUMP_FLAGS="0x80000000"
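
Typos in these values are easy to make; a letter O in place of the digit 0 in DUMP_FLAGS is a typical one. The sketch below checks the values against the ranges listed in this article. It assumes the file consists of plain shell variable assignments, as SUSE sysconfig files conventionally do, and the function name is illustrative.

```shell
# Sanity-check the values discussed above in a sysconfig-style dump file.
# The file is sourced in a subshell, so it must be trusted shell syntax.
check_dump_config() (
    file="${1:-/etc/sysconfig/dump}"
    [ -r "$file" ] || { echo "cannot read $file"; exit 2; }
    # A bare file name would be looked up in PATH by ".", so anchor it.
    case "$file" in */*) ;; *) file="./$file" ;; esac
    . "$file"
    rc=0
    case "$DUMP_ACTIVE" in 0|1) ;; *) echo "DUMP_ACTIVE should be 0 or 1"; rc=1 ;; esac
    case "$DUMP_LEVEL" in 0|1|2|4|8) ;; *) echo "DUMP_LEVEL should be 0, 1, 2, 4 or 8"; rc=1 ;; esac
    case "$DUMP_COMPRESS" in 0|1|2) ;; *) echo "DUMP_COMPRESS should be 0, 1 or 2"; rc=1 ;; esac
    # Flags must be hexadecimal and start with the digits "0x".
    case "$DUMP_FLAGS" in
        0[xX]|0[xX]*[!0-9a-fA-F]*) echo "DUMP_FLAGS is not valid hex: $DUMP_FLAGS"; rc=1 ;;
        0[xX]*) ;;
        *) echo "DUMP_FLAGS should start with 0x: $DUMP_FLAGS"; rc=1 ;;
    esac
    [ -n "$DUMPDEV" ] || { echo "DUMPDEV is empty"; rc=1; }
    exit "$rc"
)
```

The function prints one line per problem and returns non-zero if any were found. It checks syntax only and does not confirm that the chosen flag value matches the chosen dump device.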

Load and verify configuration options

With the configurable options set, run the configuration script lkcd:

#lkcd config

Note: If doing a local dump using /dev/vmdump, make sure that /dev/vmdump is linked to the desired swap partition and not dead before running lkcd config.
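
The earlier caution about a swap partition smaller than memory can also be checked before running lkcd config. The sketch below compares MemTotal from /proc/meminfo with the size reported for a device in /proc/swaps, both in kB. The function name is illustrative, and the two file arguments exist only so the function can be tried against sample files. It only sees active swap devices, so a dedicated dump partition kept out of fstab will not be listed.

```shell
# Compare total RAM with the size of one active swap device, both in kB.
# Succeeds when the swap device is at least as large as memory.
swap_holds_memory() {
    dev="$1"
    meminfo="${2:-/proc/meminfo}"
    swaps="${3:-/proc/swaps}"
    mem_kb=$(awk '$1 == "MemTotal:" { print $2 }' "$meminfo")
    swap_kb=$(awk -v d="$dev" '$1 == d { print $3 }' "$swaps")
    [ -n "$mem_kb" ] || { echo "MemTotal not found in $meminfo"; return 2; }
    if [ -z "$swap_kb" ]; then
        echo "$dev is not listed as an active swap device"
        return 2
    fi
    echo "memory: ${mem_kb} kB, swap on $dev: ${swap_kb} kB"
    [ "$swap_kb" -ge "$mem_kb" ]
}
```

Pass the real partition rather than the /dev/vmdump link, since /proc/swaps lists the underlying device. The comparison is conservative: a compressed dump or a lower DUMP_LEVEL may fit in less space than the full memory size.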

The configuration file is read and the dump options are set by lkcd_config. Verify the dump configuration by running:

#lkcd_config -q
Configured dump device: 0x801
Configured dump flags: KL_DUMP_FLAGS_DISKDUMP
Configured dump level: KL_DUMP_LEVEL_HEADER|KL_DUMP_LEVEL_KERN
Configured dump compression method: KL_DUMP_COMPRESS_GZIP
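
The dump device in this output is printed as a device number rather than a path. The sketch below splits it into major and minor numbers. It assumes the value packs the major number in the high byte and the minor number in the low byte, the traditional Linux layout; the lkcd_config output does not state its encoding, so treat this as an assumption.

```shell
# Split a device number such as 0x801 into major:minor.
# Assumption: traditional 16-bit layout, major in the high byte and
# minor in the low byte.
decode_dev() {
    dev=$(( $1 ))
    echo "$(( (dev >> 8) & 0xff )):$(( dev & 0xff ))"
}
```

For the sample output, decode_dev 0x801 prints 8:1. Major 8 is the SCSI disk driver and minor 1 is the first partition of the first disk, which matches the /dev/sda1 used in the earlier ln -s example. The result can be compared with the major and minor numbers that ls -lL /dev/vmdump shows for the linked partition.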

Additionally, verify the dump modules are present:

#lsmod | grep dump
dump_blockdev	10752	0
dump_gzip		7308	0
zlib_deflate		26776	1 dump_gzip
dump			30994	2 dump_blockdev,dump_gzip
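
The same check can be scripted so that a missing module is reported explicitly. This minimal sketch reads lsmod-style output on standard input; the function name is illustrative. Which modules are actually needed depends on the configuration. The list in the usage note is taken from the sample output above, for a local disk dump with gzip compression.

```shell
# Read lsmod-style output on stdin and confirm that each module named as an
# argument appears in the first column. Prints any that are missing and
# returns non-zero if at least one is.
modules_loaded() {
    awk -v want="$*" '
        BEGIN { n = split(want, w, " ") }
        { seen[$1] = 1 }
        END {
            bad = 0
            for (i = 1; i <= n; i++)
                if (!(w[i] in seen)) { print "not loaded: " w[i]; bad = 1 }
            exit bad
        }'
}
```

A typical call would be lsmod | modules_loaded dump dump_blockdev dump_gzip.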

Set up boot loading options

LKCD's default behavior after a panic is to dump to the dump device and then reboot the system. Saving the dump to the dump directory and reconfiguring have to be done manually when the system comes back up by doing the following:

  1. Open a terminal and su to root.
  2. Save the dump to the dump directory by entering lkcd save.
  3. Reconfigure LKCD by entering lkcd config.

LKCD comes with a boot script that will do the save and config when the system boots.

To use the boot script, first verify the boot.lkcd file exists, for example:

#ls /etc/init.d/boot.lkcd
/etc/init.d/boot.lkcd

A message indicating no file or directory means the file is not installed.

The boot.lkcd script will not run at boot unless the service is inserted, so enter the following at the prompt:

#insserv /etc/init.d/boot.lkcd

With boot.lkcd configured, lkcd runs at boot and checks for a dump on the dump device. If one is found, lkcd saves it to the dump directory and then runs the configuration.
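
To confirm that insserv registered the script, look for a start link pointing at it. The sketch below assumes that links for boot.* scripts are placed in /etc/init.d/boot.d and named S, two digits, then the script name. That location and naming are assumptions about the SUSE init layout, and the function name is illustrative.

```shell
# Succeeds if DIR contains a start link (S<nn><name>) for the given script.
# DIR defaults to /etc/init.d/boot.d, assumed here to be where insserv
# places links for boot.* scripts.
boot_script_enabled() {
    name="$1"
    dir="${2:-/etc/init.d/boot.d}"
    for link in "$dir"/S[0-9][0-9]"$name"; do
        [ -L "$link" ] && return 0
    done
    return 1
}
```

For example, boot_script_enabled boot.lkcd returns 0 once the service has been inserted.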

Test LKCD With A Kernel Panic

Creating a kernel panic without magic keys enabled can be difficult. Fortunately, SLES comes with magic key support built in and it just needs to be enabled.

To enable magic keys and create a kernel panic, do the following:

  1. Open a terminal and su to root.
  2. Enable magic keys by entering echo 1 > /proc/sys/kernel/sysrq.
  3. Verify the operation by entering cat /proc/sys/kernel/sysrq. The output should be 1.
  4. Remount all file systems as read only by pressing Alt-SysRq-u. This saves the system from running fsck when it reboots.
  5. Create a panic by pressing Alt-SysRq-d. On SLES8, the key combination is Alt-SysRq-c.
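
Steps 2 and 3 can be wrapped in a small check that is safe to run at any time, since it only reads the setting. This is a minimal sketch with an illustrative function name; it treats any non-zero value as enabled, and the file argument exists only to allow testing.

```shell
# Report whether magic SysRq keys are enabled by reading the same file the
# steps above write to.
sysrq_state() {
    file="${1:-/proc/sys/kernel/sysrq}"
    value=$(cat "$file") || return 2
    if [ "$value" -eq 0 ]; then
        echo "disabled"
        return 1
    fi
    echo "enabled ($value)"
}
```

The setting written with echo 1 does not persist across reboots, so running sysrq_state again after the test reboot shows whether magic keys need to be re-enabled.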

The system will freeze. lkcd will copy the configured portion of memory to the dump device and then reboot. If lkcd configuration did not complete, the system will remain frozen until rebooted manually.

Don't be alarmed if the GRUB menu is blank or has only one boot option listed. The kexec parameters allow for the boot options to be limited.

As the system comes back up, the status of lkcd save will be shown on the screen. If no dump image is found on the dump device lkcd save will fail. Either way, lkcd config will run and prepare the system for a future kernel panic.

On SLES9, you must log in as root the first time the system comes back up.

Verify LKCD Dump Files

When the system comes back up, log in and verify lkcd was successful.

Complete the following:

  1. Verify the dump was saved to the dump directory.
  2. Verify the dump validity with lcrash.
  3. Ensure lkcd configuration is complete.

Verify the dump was saved to the dump directory

Examine the contents of the dump directory to verify a crash dump was saved. Assuming the default dump directory, /var/log/dump, complete the following to find the dump files:

  1. Open a terminal and su to root.
  2. Enter cd /var/log/dump.
  3. Enter ls.

Each consecutive dump is given its own directory, signified by a single integer. The first dump will be placed in directory /var/log/dump/0. The second goes in /var/log/dump/1 and so on.

  4. Use cd to change into the desired dump directory (e.g. cd 0).
  5. Enter ls to list the dump files, as shown in Figure 2.


Figure 2: Dump file listing
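
When several dumps have accumulated, a short function can pick out the highest-numbered directory. This is a minimal sketch with an illustrative function name, relying only on the numbering scheme described above. Note that once BOUNDS_LIMIT is reached on SLES9 and directory 0 is overwritten, the highest number is no longer necessarily the most recent dump.

```shell
# Print the highest-numbered subdirectory of a dump directory.
# Returns non-zero if no numbered subdirectory exists.
latest_dump() {
    base="${1:-/var/log/dump}"
    latest=""
    for d in "$base"/*/; do
        n=$(basename "$d")
        # Skip anything whose name is not purely numeric.
        case "$n" in ''|*[!0-9]*) continue ;; esac
        if [ -z "$latest" ] || [ "$n" -gt "$latest" ]; then latest="$n"; fi
    done
    [ -n "$latest" ] && echo "$base/$latest"
}
```

For example, cd "$(latest_dump)" changes into that directory under the default /var/log/dump.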

Verify dump validity with lcrash

The easiest way to verify the dump's validity is to load it into lcrash. Using lcrash to analyze dumps is beyond the scope of this article, but loading the dump is covered.

To load a dump file in lcrash, cd to the dump's directory and then start lcrash.

One caveat of using lcrash on SLES8 concerns the /boot/System.map file, which lcrash needs in order to load the dump. This file is created on systems where lkcd is compiled instead of installed from a package. System.map is a symbolic link to the kernel-versioned System.map-kernelversion file in /boot. When passing the system map parameter to lcrash, substitute the kernel-versioned System.map filename.

To run lcrash on /var/log/dump/0 on SLES8, do the following:

  1. Open a terminal and su to root.
  2. Enter cd /var/log/dump/0.
  3. Enter lcrash -d dump.0 -t kerntypes.0 -m /boot/System.map-2.4.21-138-smp.

Be sure to substitute the System.map file on your system. Use tab completion to avoid spelling mistakes.
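
The command line in step 3 can also be assembled from the dump number and the kernel version, which avoids typing the map file name by hand. The sketch below only prints the command and does not run lcrash. It assumes the crashed kernel is the one now running, so that uname -r names the matching System.map file; the function name and its arguments are illustrative.

```shell
# Print the lcrash command line for dump number N, following the file naming
# used in the steps above (dump.N and kerntypes.N).
# Assumption: the running kernel is the one that crashed, so
# /boot/System.map-$(uname -r) is the matching map file.
lcrash_command() {
    n="$1"
    kver="${2:-$(uname -r)}"
    dir="${3:-/var/log/dump}"
    echo "cd $dir/$n && lcrash -d dump.$n -t kerntypes.$n -m /boot/System.map-$kver"
}
```

Calling lcrash_command 0 2.4.21-138-smp reproduces the commands from steps 2 and 3 above. If the system was rebooted into a different kernel after the crash, pass the crashed kernel's version explicitly as the second argument.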

If the dump is valid, lcrash loads and displays Figure 3.


Figure 3: lcrash

Ensure lkcd configuration is complete

The final step is to verify lkcd configuration ran successfully at boot. Enter lkcd_config -q to verify the configuration.

