Sunday, August 17, 2008

How I repaired a corrupted grub menu.lst config file

Imagine my shock when I discovered that my Debian Etch machine would not boot after I ran a "routine" apt-get upgrade. The upgrade involved quite a number of packages, including the kernel image.

GRUB, the default boot loader, came to an abrupt stop with the error message:
Error 15: File not found

The culprit was the file /vmlinuz-2.6.18-6-k7.

The weird-looking file name strongly suggested that the GRUB config file, /boot/grub/menu.lst, had been corrupted by the upgrade.

At this point, I had the following options:
  • Use a rescue CD/DVD like Knoppix to boot into the system, correct the menu.lst file, and reboot.

  • While in GRUB, repair the corrupted pre-set GRUB commands, boot up the system, then correct the menu.lst file, and reboot.

I chose the second option. Below is my experience, followed by some suggestions based on the hard lessons I learned.

I rebooted the system. At the GRUB menu, I selected the corrupted OS entry, and typed e to edit this entry's associated pre-set boot commands. (Note that the OS entries and their associated commands are taken from the menu.lst file.)

The boot commands associated with the OS entry were:
root  (hd0,0)
kernel /vmlinuz-2.6.18-6-k7 root=/dev/mapper/tiger-root ro
savedefault

That did not look right: the kernel command contained funny-looking characters, and the entry had previously consisted of the root, kernel, and initrd commands, NOT savedefault.

ROOT

The root command specifies and mounts GRUB's root device: the drive and partition where the boot directory (/boot) is located. This is usually (hd0,0), which means the first partition of the first hard disk. Note that GRUB numbers drives and partitions starting from 0, not 1.
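To make the zero-based numbering concrete, here is a small shell sketch that turns a GRUB device name into everyday one-based disk and partition numbers. The function name grub_device_to_ordinal is made up for this illustration and is not part of GRUB, and it only understands the simple (hdX,Y) form used in this article.

```shell
#!/bin/sh
# Hypothetical helper (not part of GRUB): translate a GRUB legacy device
# name such as "(hd0,0)" into one-based disk and partition numbers.
grub_device_to_ordinal() {
    spec=$1
    # Accept only the simple "(hdX,Y)" form.
    case $spec in
        "(hd"*","*")") ;;
        *) echo "unrecognised GRUB device: $spec" >&2; return 1 ;;
    esac
    inner=${spec#"(hd"}     # "(hd0,0)" -> "0,0)"
    inner=${inner%")"}      # "0,0)"    -> "0,0"
    disk=${inner%%,*}       # text before the first comma
    part=${inner#*,}        # text after the first comma
    # Both parts must be plain decimal numbers.
    case $disk in ''|*[!0-9]*) echo "bad disk number: $disk" >&2; return 1 ;; esac
    case $part in ''|*[!0-9]*) echo "bad partition number: $part" >&2; return 1 ;; esac
    # GRUB counts from 0, so add 1 to get the everyday numbering.
    echo "disk $((disk + 1)), partition $((part + 1))"
}

grub_device_to_ordinal "(hd0,0)"    # prints: disk 1, partition 1
```

Which Linux device name that corresponds to (for example /dev/hda1 or /dev/sda1) depends on the machine, so the sketch deliberately stops at the numbers.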

If you are not sure what to set the root drive to, press c at the GRUB menu to get a GRUB command prompt, and enter the following find command.
grub> find /grub/stage1
(hd0,0)

The find command searches every partition for the file /grub/stage1 and displays the drive and partition that contain it. Note that if the drive does not have a separate partition for /boot, you need to prepend /boot to the command argument (find /boot/grub/stage1).
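Whether the /boot prefix is needed can be worked out ahead of time, while the system still boots. The sketch below is one rough way to do that from a normal shell; it cannot run inside GRUB. The function name grub_find_argument is hypothetical, and the check simply compares the device that df reports for the two directories, which is an assumption that may not hold for unusual setups.

```shell
#!/bin/sh
# Hypothetical helper: print the argument to give GRUB's find command,
# depending on whether the boot directory has a filesystem of its own.
grub_find_argument() {
    bootdir=${1:-/boot}
    rootdir=${2:-/}
    [ -d "$bootdir" ] || { echo "no such directory: $bootdir" >&2; return 1; }
    [ -d "$rootdir" ] || { echo "no such directory: $rootdir" >&2; return 1; }
    # df -P prints a header line, then one line whose first field is the device.
    bootdev=$(df -P "$bootdir" | awk 'NR == 2 { print $1 }') || return 1
    rootdev=$(df -P "$rootdir" | awk 'NR == 2 { print $1 }') || return 1
    if [ "$bootdev" != "$rootdev" ]; then
        echo "/grub/stage1"         # the boot directory is its own partition
    else
        echo "/boot/grub/stage1"    # the boot directory lives on the root partition
    fi
}
```

Calling grub_find_argument with no arguments checks /boot against /. It is worth writing the answer down next to the printout of menu.lst.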

So far so good: I did not have to modify the root drive.

KERNEL

The kernel command specifies and loads the Linux kernel image. The image file name in the entry was corrupted. To correct it, I selected the kernel command and pressed e to edit the line.

What should the file name be? Different Linux distributions name the kernel image file differently. Don't fret if you can't remember its name: the GRUB command line offers file name completion. Just enter kernel / and then press Tab.
grub> kernel /
Possible files are: System.map-2.6.18-6-k7 config-2.6.18-5-k7 config-2.6.18-6-k7 initrd.img-2.6.18-5-k7 vmlinuz-2.6.18-5-k7 grub initrd.img-2.6.18-6-k7 System.map-2.6.18-5-k7 vmlinuz-2.6.18-6-k7

For Debian, Ubuntu, Fedora, and Mandriva, the kernel image file is named vmlinuz followed by the kernel release number and the machine architecture (e.g., vmlinuz-2.6.18-6-k7). From the options returned by the file name completion feature, choose the kernel image with the latest release number.
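On a running system, the same "pick the latest release" choice can be made with a short script. This is only a sketch: latest_kernel_image is a made-up name, and it assumes GNU sort, whose -V option orders version-like strings (so that, for instance, 2.6.18-10 sorts after 2.6.18-6).

```shell
#!/bin/sh
# Hypothetical helper: print the newest vmlinuz-* file name in a directory.
# Assumes GNU sort, whose -V option orders version-like strings.
latest_kernel_image() {
    dir=${1:-/boot}
    for f in "$dir"/vmlinuz-*; do
        # Skip the unexpanded pattern when nothing matches.
        [ -e "$f" ] && printf '%s\n' "${f##*/}"
    done | sort -V | tail -n 1
}
```

With the two images listed by the file name completion above, latest_kernel_image would print vmlinuz-2.6.18-6-k7.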

The rest of the kernel parameters looked OK, and required no change.
grub> kernel /vmlinuz-2.6.18-6-k7 root=/dev/mapper/tiger-root ro

INITRD

Next, I had to replace the savedefault command with the initrd command.

The initrd command specifies the initial RAM disk image file. The initial RAM disk holds the modules required to access the root filesystem.

I first selected the savedefault command and pressed e to edit the line, then replaced its contents with the initrd command. Again, you can use the Tab key to help you complete the file name of the RAM disk image.
grub> initrd /initrd.img-2.6.18-6-k7


Boot & Edit menu.lst

After I made the above changes, I went back to the GRUB main menu, and pressed b to boot.

This time, the machine booted up successfully, and everything worked just fine for me.

I was not done, however. Unless I fixed the source of the problem (the corrupted menu.lst), the machine would come up with the same boot error on the next reboot. So, as root, I opened the file /boot/grub/menu.lst and edited the commands.

Before I could modify the corrupted commands, I needed to first locate the corresponding OS entry in the file. Each OS entry occupies a separate section in the file, beginning with its own title line. So, I scrolled down the file until I reached the target title line.
title Debian GNU/Linux, kernel -2.6.18-6-k7
root (hd0,0)
kernel /vmlinuz-2.6.18-6-k7 root=/dev/mapper/tiger-root ro
savedefault


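After correcting the commands (fixing the kernel line and replacing savedefault with the initrd line, just as I had done at the GRUB menu), I saved the file and rebooted the machine.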
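Before rebooting, it can be reassuring to confirm that every kernel and initrd file named in menu.lst really exists. The following is a minimal sketch of such a check, not an official tool. The function name check_menu_lst is hypothetical, and the default prefix assumes, as on the machine described here, that /boot is its own partition so that paths in menu.lst are relative to /boot. Pass an empty string as the second argument if /boot is just a directory on the root partition.

```shell
#!/bin/sh
# Hypothetical helper: report kernel/initrd files named in a GRUB legacy
# menu.lst that do not exist on disk.
# $1: path to menu.lst; $2: directory that GRUB sees as "/" (default /boot).
check_menu_lst() {
    menu=${1:-/boot/grub/menu.lst}
    prefix=${2-/boot}
    status=0
    # On "kernel" and "initrd" lines the second field is the file GRUB loads.
    # Commented-out lines start with "#", so they never match.
    paths=$(awk '$1 == "kernel" || $1 == "initrd" { print $2 }' "$menu") || return 2
    for p in $paths; do
        if [ ! -f "$prefix$p" ]; then
            echo "missing: $p"
            status=1
        fi
    done
    return $status
}
```

Run as check_menu_lst with no arguments, it prints one "missing:" line per file it cannot find and returns a non-zero status if there are any. A corrupted file name like the one that caused Error 15 would show up here.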

Lessons Learned

The GRUB config file (menu.lst) can get corrupted, and when it does, it spells real trouble.

The commands to boot the OS are not something anyone tends to remember. So, it makes sense to keep a printout of the menu.lst file or a backup copy.

What I do is back up the menu.lst file to a copy in the same directory (say, menu.lst.bak).

The advantage of saving it in the same directory (as opposed to somewhere over the network) is that if the menu.lst file ever gets corrupted again, you can still display the backup copy during boot by entering the following at the GRUB command prompt.
grub> cat /grub/menu.lst.bak


Armed with the knowledge of the correct commands to use, you can then edit the commands as shown in this article. (As with find, use /boot/grub/menu.lst.bak if /boot is not on its own partition.)
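Making the backup itself takes a single cp command. Wrapped in a small function, it might look like the sketch below; the name backup_menu_lst is made up for illustration, and it needs to be run as root when pointed at the real file.

```shell
#!/bin/sh
# Hypothetical helper: keep a copy of menu.lst next to the original.
backup_menu_lst() {
    src=${1:-/boot/grub/menu.lst}
    if [ ! -f "$src" ]; then
        echo "no such file: $src" >&2
        return 1
    fi
    # -p preserves the mode and timestamps of the original.
    cp -p "$src" "$src.bak"
}
```

Called with no arguments, it copies /boot/grub/menu.lst to /boot/grub/menu.lst.bak. Take the backup while the file is known to be good, for example right after a successful reboot, so that a corrupted file never overwrites a good copy.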


2 comments:

Anonymous said...

Just what I needed. Thanks for the info.

Rockwolf

Anonymous said...

Big thank you and god bless !!

I was just gonna tear my hair out before I read your post ..:)