System fails to boot after kernel update

  • 3494481
  • 07-Feb-2008
  • 30-Apr-2012

Environment

Novell SUSE Linux Enterprise Server 9 Service Pack 4
Novell Linux Desktop 9 Service Pack 4
Novell Open Enterprise Server 1 (Linux based) Support Pack 2

Situation

Changes
  • SUSE Linux Enterprise Server 9 Service Pack 4 was applied, or
  • Novell Linux Desktop 9 Service Pack 4 was applied, or
  • the kernel was updated through an online update (YaST Online Update, or Red Carpet/rug) to version 2.6.5-7.308 or newer.
Symptoms

After a reboot,
  • The system does not boot properly anymore. The boot process ends before the root filesystem is mounted, with messages similar to the following:
    Waiting for device /dev/sda6 to appear....
    [...]
    No root device found; exiting to /bin/sh
    sh: can't access tty; job control turned off
    and a subsequent "#" prompt from which no commands (other than shell builtins) are available.
or
  • The system boots, but some filesystems are not mounted. The affected filesystems are listed in /etc/fstab as residing on sdX devices.

Resolution

Solution

If the kernel was updated through an online update to version 2.6.5-7.311 (from patch-12106) or newer, boot the new kernel with the parameter

piix.ign_ich5_native_sata=1

Once the system is up, update the bootloader configuration to use this parameter permanently.
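On SLES9 the boot loader is GRUB (legacy), so making the parameter permanent means appending it to the kernel line(s) in /boot/grub/menu.lst. The sketch below demonstrates this against a sample copy in /tmp; the partition and kernel paths in the sample are hypothetical, and the real file should be backed up before editing.

```shell
# Sample menu.lst standing in for /boot/grub/menu.lst
# (hypothetical partition and kernel paths):
cat > /tmp/menu.lst <<'EOF'
title Linux
    kernel (hd0,5)/boot/vmlinuz root=/dev/sda6 splash=silent
    initrd (hd0,5)/boot/initrd
EOF

# Keep a backup, then append the workaround parameter to every
# "kernel" line; "initrd" lines are left untouched:
cp /tmp/menu.lst /tmp/menu.lst.bak
sed -i 's/^\([[:space:]]*kernel .*\)$/\1 piix.ign_ich5_native_sata=1/' /tmp/menu.lst
grep kernel /tmp/menu.lst
```

Editing a copy first and diffing it against the backup before moving it into place is a safe way to apply the change on the live system.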

Workaround

This issue can also be addressed by switching to persistent device names for the affected storage. This workaround also applies to kernel version 2.6.5-7.308 as included on the SP4 media.

  1. If the system did not boot properly, follow these steps to regain access to the installed system:
    1. Locate media for the previous service pack (e.g. SLES9 SP3 CD1) as well as for the original product release (e.g. SLES9 GA CD1).
    2. Boot using the kernel of the previous service pack (e.g., boot from SLES9 SP3 CD1) and follow the "boot installed system" procedure documented in KB 3864925 - Troubleshooting Common Boot Issues to regain access to the installed system.
  2. From inside the installed system, switch to persistent device names for the problematic filesystems:
    1. Examine the boot loader configuration file /boot/grub/menu.lst and the filesystem table /etc/fstab. Identify the occurrences of /dev/sdX devices for the problematic filesystems (including the root filesystem, if the system has stopped booting properly).
    2. Store the mapping of /dev/disk/by-id symlinks to their targets in a scratch file:
      ls -l /dev/disk/by-id > /tmp/scratchpad.txt
    3. Remove all entries that involve storage that is not local to the system (such as SAN or iSCSI volumes) from the scratch file.
    4. Edit /boot/grub/menu.lst and /etc/fstab, replacing the problematic /dev/sd* references with the /dev/disk/by-id names pointing to them (as captured in the scratch file).
    5. Eject CDs/DVDs and reboot the system. The system should now boot up properly.
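The editing in steps 2.2 through 2.4 can be sketched as follows, run here against a sample copy in /tmp rather than the live /etc/fstab. The by-id name below is a hypothetical example; on a real system it must be taken from the scratch file produced by "ls -l /dev/disk/by-id > /tmp/scratchpad.txt".

```shell
# Sample fstab standing in for /etc/fstab, with problematic
# /dev/sdX references (hypothetical filesystems):
cat > /tmp/fstab <<'EOF'
/dev/sda6  /      reiserfs  acl,user_xattr  1 1
/dev/sda7  /home  reiserfs  acl,user_xattr  1 2
EOF

# Hypothetical scratchpad entry mapping a by-id symlink to sda6:
#   scsi-SATA_WDC_WD800JD_WD-WMAM9X123456-part6 -> ../../sda6
byid='/dev/disk/by-id/scsi-SATA_WDC_WD800JD_WD-WMAM9X123456-part6'

# Keep a backup, then replace the /dev/sda6 reference with the
# persistent by-id name; the sda7 line is left as-is here:
cp /tmp/fstab /tmp/fstab.bak
sed -i "s|^/dev/sda6 |$byid |" /tmp/fstab
cat /tmp/fstab
```

The same substitution applies to the root= parameter on the kernel lines in /boot/grub/menu.lst; every problematic /dev/sdX reference identified in step 2.1 needs its own replacement.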

Status

Top Issue

Additional Information

Root cause

As of the SLES9 SP4 kernel, IDE drivers are built into the kernel while libata drivers are built as modules. Built-in drivers are initialised before modules, so IDE drivers always take priority: if a controller can be driven by both an IDE and a libata driver, the IDE driver claims it. On affected systems, the libata driver for the storage controller uses the SCSI device naming scheme (/dev/sd*) for the disks, which differs from the IDE device naming scheme (/dev/hd*). The parameter piix.ign_ich5_native_sata=1 instructs the IDE driver to ignore the HBA so that the libata driver can drive it as before.

Affected hardware

This issue is known to affect Dell PowerEdge servers equipped with a "Dell 82801EB (ICH5) Serial ATA 150 Storage Controller" (PCI vendor ID 0x8086 ("Intel Corporation"); PCI device ID 0x24d). Based on the root cause, it is believed the issue may also occur with a limited set of systems and storage controllers from other vendors.
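Whether a system carries the affected controller can be checked by searching the lspci output for the ICH5 SATA controller. The excerpt below is a hypothetical sample of what lspci might print on an affected machine; on a real system, pipe lspci itself into the same grep.

```shell
# Hypothetical sample of `lspci` output on an affected system:
cat > /tmp/lspci-sample.txt <<'EOF'
00:1f.1 IDE interface: Intel Corporation 82801EB Ultra ATA Storage Controllers (rev 02)
00:1f.2 IDE interface: Intel Corporation 82801EB (ICH5) SATA Controller (rev 02)
EOF

# A matching line indicates the controller is present; on a live
# system the equivalent check is: lspci | grep -i 'ICH5.*SATA'
grep -i 'ICH5.*SATA' /tmp/lspci-sample.txt
```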


Note

Although the symptoms are quite similar, this issue is different from the one documented in KB 3357498 - SLES9 SP4 or NLD9 SP4 guest in VMware fails to boot. That issue affects virtualised systems rather than physical systems and is caused by hard disks not being detected, not by renamed devices.