Troubleshooting the SLES10 Boot Process

Novell Cool Solutions: Feature
By Jason Record

Posted: 21 Jun 2007
 

Overview

When a server fails to boot, a critical situation is at hand. The purpose of this document is to provide a quick reference guide to narrow down the cause of a failed boot and get the server back up as quickly as possible. It is based on SUSE Linux Enterprise Server 10 (SLES10).

Troubleshooting Procedure

  1. The primary troubleshooting objective is to narrow down where in the boot process the failure occurred.

  2. The boot process is summarized below. For more details, refer to the Troubleshooting Table.
     BIOS -> MBR/stage1 -> stage2 -> kernel/initrd -> init -> boot -> rc -> login

  3. Look at the failed server's screen for the last on-screen landmark that matches an entry in the troubleshooting table's "On-Screen Landmarks".

  4. Once you determine how far into the boot process the server got before failing, look at the troubleshooting table's associated files and troubleshooting/potential fixes for that line.

  5. The two most identifiable on-screen landmarks are:
    1. The grub boot menu screen (Troubleshooting Table, Line 3)
    2. The word "done" scrolling up the screen (Troubleshooting Table, Lines 8 and 11)

  6. The purpose of the Boot Installed System, run level 1, and chroot Installed System procedures is to get the server into an operational maintenance state, so that further problem resolution can be completed.

  7. Boot Installed System (BIS) Procedure
    1. If this procedure works, then the problem is most likely on lines 1-6 of the troubleshooting table.
    2. Boot from CD1
    3. Select "Installation"
    4. Select your Language
    5. Accept the License Agreement
    6. Click "Other"
    7. Select "Boot Installed System"
    8. Click "OK"

  8. Boot to Run Level 1
    1. Run level 1 is very similar to the chroot Installed System (CIS) procedure, except that the installed system boots itself, so you do not have to mount its file systems by hand. You also have access to yast and the proc filesystem, so run level 1 is preferred over CIS.
    2. Append "init 1" to the boot options line of the default boot entry (i.e. SUSE Linux Enterprise Server 10)
    3. Type root's password
    4. If you need network access, use yast to configure it:
       yast lan > Next > Edit > Next > Next/Finish

  9. chroot Installed System (CIS) Procedure
    1. Used mostly in lines 7-14 of the troubleshooting table.
    2. Boot from CD1
    3. Select "Rescue System" and log in as root at the "Rescue login:" prompt
    4. Your first goal is to find and mount the root "/" partition, so you can read /etc/fstab
      1. Run cat /proc/partitions to find the disk devices the OS sees
      2. For each device, display the partition table
      ls-boot:~ # parted -s /dev/sda print
      Disk geometry for /dev/sda: 0kB - 2147MB
      Disk label type: msdos
      Number  Start   End     Size    Type      File system  Flags
      1       32kB    214MB   214MB   primary   ext2         boot, type=83
      2       214MB   535MB   321MB   primary   linux-swap   type=82
      3       535MB   2147MB  1612MB  extended               lba, type=0f
      5       535MB   1012MB  477MB   logical   reiserfs     type=83
      6       1012MB  1596MB  584MB   logical   reiserfs     type=83
      7       1596MB  2147MB  551MB   logical   reiserfs     type=83
      
      3. You can ignore type 82 swap and type 0f extended partitions
      4. To find the root partition, you may need to try each remaining partition in turn. For example,
        1. mount /dev/sda1 /mnt
        2. ls -l /mnt
        3. If the /mnt directory listing shows the etc and root directories, then it's the root partition
        4. If not, run umount /mnt and repeat these steps for the next partition until you find root. In this case, the root device is /dev/sda6
        5. mount /dev/sda6 /mnt

    5. Mount all additional file systems relative to /mnt
      1. Run cat /mnt/etc/fstab
      Rescue# cat /mnt/etc/fstab
      /dev/sda6            /                    reiserfs   acl,user_xattr        1 1
      /dev/sda1            /boot                ext2       acl,user_xattr        1 2
      /dev/sda7            /usr                 reiserfs   acl,user_xattr        1 2
      /dev/sda5            /var                 reiserfs   acl,user_xattr        1 2
      /dev/sda2            swap                 swap       defaults              0 0
      proc                 /proc                proc       defaults              0 0
      sysfs                /sys                 sysfs      noauto                0 0
      debugfs              /sys/kernel/debug    debugfs    noauto                0 0
      devpts               /dev/pts             devpts     mode=0620,gid=5       0 0
      /dev/fd0             /media/floppy        auto       noauto,user,sync      0 0
      2. This shows the system devices and their mount points.
      3. Mount all additional file systems, for example:
      mount /dev/sda1 /mnt/boot
      mount /dev/sda5 /mnt/var
      mount /dev/sda7 /mnt/usr
      
    6. chroot to the mounted installed system. The chroot command remaps /mnt as root "/".
      chroot /mnt
      
      1. If this command fails, then you need to confirm that /mnt/bin/bash and glibc on the installed system are valid.
      2. To return to the rescue system, type exit.
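
The mount steps of the CIS procedure can be partly scripted from the rescue shell. The following is only a sketch under stated assumptions: the helper names (looks_like_root, find_root, fstab_mount_cmds) are hypothetical and not part of SLES, and the block only defines functions. find_root must be run as root because it calls mount; fstab_mount_cmds only prints commands so you can review them before running anything.

```shell
#!/bin/sh
# Sketch of the CIS mount steps. All helper names are illustrative.

# looks_like_root DIR: succeed if DIR contains the etc and root directories
# that the procedure uses to recognize the root partition.
looks_like_root() {
    [ -d "$1/etc" ] && [ -d "$1/root" ]
}

# find_root DEV...: mount each candidate read-only on /mnt, print the first
# one that looks like the root file system, and unmount it again.
# Must be run as root from the rescue system.
find_root() {
    for dev in "$@"; do
        mount -o ro "$dev" /mnt 2>/dev/null || continue
        if looks_like_root /mnt; then
            umount /mnt
            echo "$dev"
            return 0
        fi
        umount /mnt
    done
    return 1
}

# fstab_mount_cmds FSTAB PREFIX: print one mount command per on-disk file
# system in FSTAB, relative to PREFIX. Skips "/" (mounted by hand already),
# swap, noauto entries, and entries whose device is not under /dev/.
fstab_mount_cmds() {
    awk -v prefix="$2" '
        $1 ~ /^#/ || NF < 4        { next }   # comments and short lines
        $1 !~ /^\/dev\//           { next }   # proc, sysfs, devpts, ...
        $2 == "/" || $3 == "swap"  { next }
        $4 ~ /(^|,)noauto(,|$)/    { next }
        { print "mount " $1 " " prefix $2 }
    ' "$1"
}
```

With the example disk above, find_root /dev/sda1 /dev/sda5 /dev/sda6 /dev/sda7 would be expected to report the root device; after mount /dev/sda6 /mnt, running fstab_mount_cmds /mnt/etc/fstab /mnt prints the mount commands for /boot, /usr and /var, which can be reviewed and then piped to sh.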

Troubleshooting Table

BIS = Boot Installed System Procedure
CIS = chroot Installed System Procedure

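Each numbered line below gives the Boot Process stage, its Associated File(s), the On-Screen Landmarks, and the Troubleshooting/Potential Fixes.

Line 1: BIOS
  Associated File(s): N/A
  On-Screen Landmarks: BIOS Messages
  Troubleshooting/Potential Fixes:
    Update the firmware
    Make sure a disk device is marked bootable

Line 2: MBR
  Associated File(s): /boot/grub/stage1
  On-Screen Landmarks:
    GRUB
    loading stage2...
  Troubleshooting/Potential Fixes:
    BIS
    grub-install /dev/<disk> or lilo -v

Line 3: GRUB
  Associated File(s):
    /boot/grub/stage2
    /boot/grub/menu.lst
  On-Screen Landmarks: GRUB menu or grub> prompt
  Troubleshooting/Potential Fixes:
    BIS
    grub-install /dev/<disk> or lilo -v
    Check /boot/grub/menu.lst

Line 4: kernel
  Associated File(s): /boot/vmlinuz
  On-Screen Landmarks:
    Hardware info scrolling
    RAMDISK driver initialized:
  Troubleshooting/Potential Fixes:
    BIS
    Reinstall kernel rpm

Line 5: initrd
  Associated File(s):
    /boot/initrd
    /etc/sysconfig/kernel
  On-Screen Landmarks: RAMDISK: <relevant message>
  Troubleshooting/Potential Fixes:
    BIS
    mkdir -p /tmp/ramdisk; cd /tmp/ramdisk; zcat /boot/initrd | cpio -ivd
    mkinitrd
    lilo -v

Line 6: ramdisk:init
  Associated File(s):
    /init in /boot/initrd
    /etc/sysconfig/kernel
  On-Screen Landmarks:
    Starting udevd
    Creating devices
    Loading <module_name>
    There will be a "Loading" statement for each module defined in the /etc/sysconfig/kernel INITRD_MODULES variable.
  Troubleshooting/Potential Fixes:
    BIS
    mkinitrd creates the ramdisk:init file.

Line 7: sbin:init
  Associated File(s):
    /sbin/init
    /etc/inittab
  On-Screen Landmarks: INIT: version 2.85 booting
  Troubleshooting/Potential Fixes:
    init 1, then CIS
    Use boot options init=/bin/bash or init=/bin/sash to bypass running /sbin/init.

Line 8: sbin:init:boot
  Associated File(s):
    /bin/bash
    /etc/init.d/boot
    /etc/init.d/boot.d/*
  On-Screen Landmarks:
    System Boot Control: Running /etc/init.d/boot
    Each service shows: done, failed or skipped
    System Boot Control: The system has been setup
  Troubleshooting/Potential Fixes:
    init s or init 1 starts the minimum services
    CIS starts no services
    To step through or stop the boot process from this point on, edit /etc/sysconfig/boot and change to:
    PROMPT_FOR_CONFIRM="yes"
    RUN_PARALLEL="no"
    FLOW_CONTROL="yes"
      (Ctrl-S stops, Ctrl-Q resumes)

Line 9: sbin:init:boot
  Associated File(s): /etc/init.d/boot.local
  On-Screen Landmarks: System Boot Control: Running /etc/init.d/boot.local
  Troubleshooting/Potential Fixes: init 1, then CIS

Line 10: sbin:init
  Associated File(s): /etc/inittab
  On-Screen Landmarks: INIT: Entering runlevel: 3
  Troubleshooting/Potential Fixes: init 1, then CIS

Line 11: sbin:init:rc
  Associated File(s):
    /bin/bash
    /etc/init.d/rc
    /etc/init.d/rc?.d/*
  On-Screen Landmarks:
    Master Resource Control: previous runlevel:N, switching to runlevel: 3
    Each service shows: done, failed or skipped
    Master Resource Control: runlevel 3 has been reached
    Skipped services in runlevel 3:
  Troubleshooting/Potential Fixes: init s or init 1, then CIS

Line 12: sbin:init
  Associated File(s): /etc/inittab
  On-Screen Landmarks: N/A
  Troubleshooting/Potential Fixes:
    init 1, then CIS
    init uses /etc/inittab to know how to run the login programs.

Line 13: sbin:init:mingetty
  Associated File(s):
    /etc/issue
    /sbin/mingetty
  On-Screen Landmarks:
    Welcome to SUSE LINUX...
    login:
  Troubleshooting/Potential Fixes:
    init 1 bypasses mingetty
    CIS

Line 14: sbin:init:X
  Associated File(s): (none listed)
  On-Screen Landmarks: Graphical login screen
  Troubleshooting/Potential Fixes:
    init 1 bypasses X login
    CIS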
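
The /etc/sysconfig/boot change described on line 8 of the table can be applied with a small script instead of an editor. This is a minimal sketch, assuming the file uses plain NAME="value" assignments; the function names set_sysconfig_var and enable_boot_stepping are hypothetical, and the ".orig" backup suffix is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: switch on step-through booting in a sysconfig-style file.
# Function names are illustrative, not part of SLES.

# set_sysconfig_var FILE NAME VALUE: set NAME="VALUE" in FILE, replacing an
# existing assignment or appending one if NAME is not set yet.
set_sysconfig_var() {
    if grep -q "^$2=" "$1"; then
        # Rewrite through a temporary copy, then write back into FILE so
        # its ownership and permissions are left untouched.
        sed "s|^$2=.*|$2=\"$3\"|" "$1" > "$1.tmp" &&
            cat "$1.tmp" > "$1" &&
            rm -f "$1.tmp"
    else
        printf '%s="%s"\n' "$2" "$3" >> "$1"
    fi
}

# enable_boot_stepping FILE: keep a backup, then apply the three settings
# listed on line 8 of the troubleshooting table.
enable_boot_stepping() {
    cp -p "$1" "$1.orig" &&
        set_sysconfig_var "$1" PROMPT_FOR_CONFIRM yes &&
        set_sysconfig_var "$1" RUN_PARALLEL no &&
        set_sysconfig_var "$1" FLOW_CONTROL yes
}
```

From run level 1 this would be called as enable_boot_stepping /etc/sysconfig/boot; from the rescue system, before chroot, as enable_boot_stepping /mnt/etc/sysconfig/boot. Copying the ".orig" file back restores the previous settings once the problem is resolved.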

If you don't know what to do next, and BIS or CIS work, you can always run

rpm -Vf </path/to/file>

for each file listed in the "Associated File(s)" column.
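
Running the check file by file can be wrapped in a loop. The sketch below is an assumption-laden convenience, not part of the original procedure: the BOOT_FILES list is copied from the "Associated File(s)" column (glob entries and "N/A" are left out), and verify_boot_files is a hypothetical helper name. It defaults to rpm -Vf but accepts another command, which is useful for a dry run.

```shell
#!/bin/sh
# Sketch: verify every file from the "Associated File(s)" column.
# The list and the function name are illustrative.

BOOT_FILES="/boot/grub/stage1 /boot/grub/stage2 /boot/grub/menu.lst
/boot/vmlinuz /boot/initrd /etc/sysconfig/kernel /sbin/init /etc/inittab
/bin/bash /etc/init.d/boot /etc/init.d/boot.local /etc/init.d/rc
/etc/issue /sbin/mingetty"

# verify_boot_files [CMD...]: run CMD (default: rpm -Vf) on each listed file
# that exists, and report the ones that are missing.
verify_boot_files() {
    [ $# -gt 0 ] || set -- rpm -Vf
    for f in $BOOT_FILES; do
        if [ -e "$f" ]; then
            echo "== $f"
            "$@" "$f"
        else
            echo "== $f: missing"
        fi
    done
}
```

Calling verify_boot_files with no arguments runs rpm -Vf on each file; verify_boot_files echo would-check only lists what would be checked. Files generated on the system rather than shipped in a package, which may include the initrd, are likely to be reported by rpm as not owned by any package, so that message alone does not indicate damage.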

