HOWTO: Configure lkcd to capture a kernel core dump

  • 3044267
  • 07-Jan-2008
  • 30-Apr-2012

Environment

Novell SUSE Linux Enterprise Server 9
Novell Open Enterprise Server (Linux based)
lkcdutils
lcrash

Situation

The kernel is crashing or otherwise misbehaving ("Panic" message, "Oops" message, "BUG" message) and a kernel core dump needs to be captured for analysis using the lkcd (Linux Kernel Crash Dump) tools suite.

Resolution

Local disk dump configuration

Configuration
  1. Install the lkcdutils RPM.
  2. Make sure boot.lkcd is active at boot time:
    chkconfig boot.lkcd on
  3. Modify the /etc/sysconfig/dump configuration file with the following settings:
    DUMP_ACTIVE="1"
    DUMPDEV="/dev/vmdump"
    DUMPDIR="/var/log/dump"
    DUMP_LEVEL="2"
    DUMP_COMPRESS="2"
    DUMP_FLAGS="0x80000000"
    DUMP_SAVE="1"
    PANIC_TIMEOUT="5"
    BOUNDS_LIMIT="10"
    KEXEC_IMAGE="/boot/vmlinuz"
    KEXEC_CMDLINE="root console=tty0"
    DUMP_MAX_CONCURRENT=4
  4. Make sure the DUMPDEV /dev/vmdump is linked to the correct dump device (usually this will be a swap partition, but it can be any block device whose contents may be overwritten):
    ls -l /dev/vmdump
    lrwxrwxrwx 1 root root 9 Aug 10 08:33 /dev/vmdump -> /dev/sda1
    NOTE: When you run the lkcd command in the next step, it will create a /dev/vmdump symbolic link to the first swap partition found in /etc/fstab. If you do not want to use the first swap partition, then you must manually create the DUMPDEV symbolic link.
  5. Enable core dump capturing on the server:
    lkcd config
    lkcd query
    Configured dump device: 0x801
    Configured dump flags: KL_DUMP_FLAGS_DISKDUMP
    Configured dump level: KL_DUMP_LEVEL_HEADER|KL_DUMP_LEVEL_KERN
    Configured dump compression method: KL_DUMP_COMPRESS_GZIP
  6. Enable Magic Keys (This is useful in the event of a server hang. Only a kernel oops or panic will automatically create a core dump. However, for security reasons, you will want to disable Magic SysRq keys when there is no longer a need.):
    • Persistent across reboots:
      yast | Security and Users | Security settings | Miscellaneous Settings (Page 6), select Enable Magic SysRq Keys, select Finish, Quit
    • Temporary change:
      echo 1 > /proc/sys/kernel/sysrq
  7. Following a system crash, create a tarball of the current kernel image:
    tar jcvf kernelcore.tar.bz2 /var/log/dump/n/* # where n represents the current core image
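
Steps 3 and 4 can be double-checked with a small script before a crash occurs. The following is a minimal sketch, assuming /etc/sysconfig/dump contains plain KEY="value" shell assignments as shown above. The names get_dump_setting and resolve_dump_device are hypothetical helpers and are not part of lkcdutils.

```shell
#!/bin/sh
# Hypothetical helpers for reviewing the dump configuration; not part of lkcdutils.

# Print the value of one variable from a sysconfig-style file.
# Assumes the file contains plain KEY="value" shell assignments.
get_dump_setting() {
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$1" && eval "printf '%s\n' \"\${$2}\"" )
}

# Print the path that a dump device symlink finally resolves to.
resolve_dump_device() {
    readlink -f "$1"
}

# Report the settings only when the configuration file is present.
if [ -r /etc/sysconfig/dump ]; then
    for key in DUMP_ACTIVE DUMPDEV DUMP_FLAGS; do
        printf '%s=%s\n' "$key" "$(get_dump_setting /etc/sysconfig/dump "$key")"
    done
    resolve_dump_device "$(get_dump_setting /etc/sysconfig/dump DUMPDEV)"
fi
```

If the last line of output is not the partition you intend to overwrite, create the /dev/vmdump symbolic link manually as described in the NOTE under step 4.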

Note
Disk-based dumping may not always succeed in all panic situations. Dumping from interrupt context or on hung systems, for example, is a best-effort attempt. In situations where disk-based dumping fails, you may need to use network-based dumping or fall back to other debugging mechanisms.
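
Which of the two methods is active is selected by DUMP_FLAGS in /etc/sysconfig/dump. As a quick reference, the sketch below maps the two values used in this document to the flag names that lkcd query reports for them. The function name describe_dump_flags is hypothetical, and other flag values that lkcd may accept are deliberately not covered.

```shell
#!/bin/sh
# Map the DUMP_FLAGS values used in this document to the dump method they select.
# describe_dump_flags is a hypothetical helper; only the two values shown in
# this document are recognised.
describe_dump_flags() {
    case $1 in
        0x80000000) echo "disk dump (KL_DUMP_FLAGS_DISKDUMP)" ;;
        0x40000000) echo "network dump (KL_DUMP_FLAGS_NETDUMP)" ;;
        *)          echo "unknown DUMP_FLAGS value: $1"; return 1 ;;
    esac
}
```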

Network server dump configuration

Example environment
  • Server name = core, Server IP = 10.0.0.10, Description = This is the server that will initiate a kernel core dump
  • Server name = netdump, Server IP = 10.0.0.20, MAC Addr = aa:bb:cc:dd:ee:ff, Description = This server is the netdump-server and will save the kernel image received from the core server

netdump Server Configuration
  1. Install the lkcdutils and lkcdutils-netdump-server RPMs
  2. Make sure netdump-server is active at boot time:
    chkconfig netdump-server on
  3. Modify the /etc/sysconfig/dump configuration file with the following settings:
    DUMPDIR="/var/log/dump"
    DUMP_FLAGS="0x40000000"
    SOURCE_PORT="6688"
    NETDUMP_VERBOSE="yes"
    NOTE: NETDUMP_VERBOSE="yes" is only needed for troubleshooting purposes in the event your netdumps are failing, otherwise set to "no". Verbose logging is recorded in the /var/log/messages file on the netdump server.
  4. Create a /var/log/dump directory, writable for the netdump user (which the netdump-server runs under by default):
    install -o netdump -g dump -m 777 -d /var/log/dump
  5. Start the netdump-server service:
    /etc/init.d/netdump-server start
  6. Following a system crash from the core server, create a tarball with the necessary image files. The vmcore will be on the netdump server, and the Kerntypes and System.map files will be on the core server. The kernel image file cannot be read without the Kerntypes and System.map files.
    core:/boot/System.map-$(uname -r)
    core:/boot/Kerntypes-$(uname -r)
    netdump:/var/log/dump/ipaddr-date-time/vmcore # where ipaddr-date-time is a specific directory with the values of the core server's IP address and the date and time of the crash
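
Because the three files come from two different machines, it is easy to build a tarball that is missing one of them. The following is a minimal sketch of a completeness check, assuming System.map-* and Kerntypes-* have already been copied from the core server into the same directory as vmcore. The function name check_netdump_files is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: verify that a directory holds the three files lcrash
# needs for a network dump. Assumes System.map-* and Kerntypes-* were already
# copied from the core server into the same directory as vmcore.
check_netdump_files() {
    dir=$1
    missing=0
    for pattern in vmcore 'System.map-*' 'Kerntypes-*'; do
        # Expand the pattern inside the directory; an unmatched glob stays literal.
        set -- "$dir"/$pattern
        if [ ! -e "$1" ]; then
            echo "missing: $pattern" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

The function returns a non-zero status and names each missing file on standard error, so it can be run on the dump directory before the tar command.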

Further details about the lkcd configuration are available in the README files of the lkcdutils package under /usr/share/doc/packages/lkcdutils.

core Server Configuration
  1. Install the lkcdutils RPM.
  2. Make sure lkcd-netdump is active at boot time:
    chkconfig lkcd-netdump on
  3. Modify the /etc/sysconfig/dump configuration file with the following settings -- modify to fit your environment:
    DUMP_ACTIVE="1"
    DUMPDEV="eth0"
    DUMPDIR="/var/log/dump"
    DUMP_LEVEL="2"
    DUMP_COMPRESS="2"
    DUMP_FLAGS="0x40000000"
    DUMP_SAVE="1"
    PANIC_TIMEOUT="5"
    BOUNDS_LIMIT="10"
    KEXEC_IMAGE="/boot/vmlinuz"
    KEXEC_CMDLINE="root console=tty0"
    TARGET_HOST="10.0.0.20"
    TARGET_PORT="6688"
    SOURCE_PORT="6688"
    ETH_ADDRESS="aa:bb:cc:dd:ee:ff"
    DUMP_MAX_CONCURRENT=4
  4. Enable core dump capturing on the server:
    lkcd config
    lkcd query
    Configured dump device: 0xbffff617
    Configured dump flags: KL_DUMP_FLAGS_NETDUMP
    Configured dump level: KL_DUMP_LEVEL_HEADER|KL_DUMP_LEVEL_KERN
    Configured dump compression method: KL_DUMP_COMPRESS_GZIP
  5. Enable Magic Keys (This is useful in the event of a server hang. Only a kernel oops or panic will automatically create a core dump. However, for security reasons, you may want to disable Magic SysRq keys when there is no longer a need.)
    • Persistent across reboots:
      yast | Security and Users | Security settings | Miscellaneous Settings (Page 6), select Enable Magic SysRq Keys, select Finish, Quit
    • Temporary change:
      echo 1 > /proc/sys/kernel/sysrq
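
Since Magic SysRq should be switched off again once it is no longer needed, it helps to be able to check and change its state quickly. The following is a minimal sketch; sysrq_state and set_sysrq are hypothetical helper names, and the optional file argument exists only so the functions can be tried on a scratch file instead of /proc/sys/kernel/sysrq. Any non-zero value is reported as enabled, which is a simplification.

```shell
#!/bin/sh
# Hypothetical helpers around the Magic SysRq switch. The file argument exists
# so the functions can be tried on a scratch file; it defaults to the real path.

# Print "enabled" or "disabled" depending on the current value.
# Any non-zero value is treated as enabled.
sysrq_state() {
    value=$(cat "${1:-/proc/sys/kernel/sysrq}") || return 1
    if [ "$value" = "0" ]; then
        echo disabled
    else
        echo enabled
    fi
}

# Write a new value (1 to enable, 0 to disable). The spaces around ">" matter:
# "echo 1>file" redirects an empty line into the file instead of writing "1".
set_sysrq() {
    echo "$1" > "${2:-/proc/sys/kernel/sysrq}"
}
```

For example, running set_sysrq 0 as root turns the keys off again after the dump has been captured.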

Validating the kernel core file

Before submitting a core image to Novell Technical Services (NTS) for evaluation, please confirm that the image is a valid one. To validate the core image, do the following:

  1. After the server has rebooted, log in as root.
  2. Change to the directory that contains the kernel image:
    cd /var/log/dump/0
  3. Load the image with lcrash:
    lcrash map.0 dump.0 kerntypes.0
  4. For network core images, the files would probably be:
    lcrash System.map-* vmcore Kerntypes-*
  5. If you are taken to the ">>" lcrash prompt, the kernel image is valid.
  6. Press "q" to quit.
  7. Tar up all the files and submit them to NTS.
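
For local disk dumps, the file names passed to lcrash follow the number of the dump directory. The following is a minimal sketch that checks the three files and prints the matching command line, assuming they are named map.N, dump.N and kerntypes.N, where N is the directory name, as in the /var/log/dump/0 example above. The function name lcrash_command is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the lcrash command line for a local dump directory
# such as /var/log/dump/0, after checking that the three files exist.
# Assumes the files are named map.N, dump.N and kerntypes.N, where N is the
# directory name.
lcrash_command() {
    dir=${1%/}                 # drop one trailing slash, if any
    n=${dir##*/}               # directory name, e.g. "0"
    for f in "map.$n" "dump.$n" "kerntypes.$n"; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            return 1
        fi
    done
    echo "cd $dir && lcrash map.$n dump.$n kerntypes.$n"
}
```

The function only prints the command; lcrash itself still has to be started by hand so that the ">>" prompt can be confirmed.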

Configuring a S390 Linux server for a coredump

The following package needs to be installed: s390-tools

Follow IBM's documentation, Linux on System z - Using the Dump Tools,
available at http://download.boulder.ibm.com/ibmdl/pub/software/dw/linux390/docu/l26cdt01.pdf
or via http://www.ibm.com/developerworks/linux/linux390/history.html

Additional Information

Support status per NIC driver for netdump

As of lkcdutils-4.2-193.57, the following devices have been tested and confirmed to work with LKCD netdump: 3c59x, e100, e1000, eepro100, smc-ultra, tg3, tlan, tulip


The following devices are supported by netpoll, but have not been tested with LKCD netdump: 8139too, amd8111e, pcnet32, via-rhine
8390-based Ethernet adapters: 3c503, ac3200, apne, e2100, es3210, hp, hp-plus, hydra, lne390, mac8390, ne, ne2, ne2k-pci, ne2k_cbus, ne3210, smc-mca, smc-ultra32, stnic, wd, zorro8390, oaknet

Further reading

TID 3456486, Configuring a Remote Serial Console for SLES. A serial console may be necessary if errors during early system startup need to be captured, or if the magic SysRq functionality needs to be used and the regular Linux keyboard driver or console driver has become unresponsive.

SLE 10: kdump rather than lkcd

The process to capture kernel crash dumps has changed significantly for SUSE Linux Enterprise Server 10 and SUSE Linux Enterprise Desktop 10 on most architectures. See KB 3374462, Configure kernel core dump capture for details on taking kernel crash dumps on SLE 10.

TID history and keywords

keywords: coredump


Formerly known as TID# 10099561