Sunday, August 17, 2008

How I repaired a corrupted grub menu.lst config file

Imagine the shock when I discovered my Debian Etch machine would not boot after I ran a "routine" apt-get upgrade. The upgrade involved quite a number of packages, including the kernel image.

GRUB, the default boot loader, stopped abruptly with this error message:
Error 15: File not found.

The culprit was the file / [01;31mvmlinuz- [00m2.6.18-6-k7.

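The weird-looking file name strongly suggested that the GRUB config file, /boot/grub/menu.lst, had been corrupted by the upgrade. The bracketed fragments ([01;31m and [00m) look like ANSI terminal color codes, each normally preceded by an invisible escape character. They are the kind of codes grep --color uses to highlight matches.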

At this point, I had the following options:
  • Use a rescue CD/DVD like Knoppix to boot into the system, correct the menu.lst file, and reboot.

  • While in GRUB, repair the corrupted pre-set GRUB commands, boot up the system, then correct the menu.lst file, and reboot.

I chose the second option. Below is my experience, followed by some suggestions based on the hard lessons I learned.

I rebooted the system. At the GRUB menu, I selected the corrupted OS entry and pressed e to edit the entry's pre-set boot commands. (Note that the OS entries and their commands are taken from the menu.lst file.)

The boot commands associated with the OS entry were:
root  (hd0,0)
kernel / [01;31mvmlinuz- [00m2.6.18-6-k7 root=/dev/mapper/tiger-root ro
savedefault

That did not look right, for two reasons. First, the kernel command contained funny-looking characters. Second, the entry used to consist of root, kernel, and initrd commands, with no savedefault.

ROOT

The root command specifies and mounts GRUB's root drive and partition, where the boot directory (/boot) is located. This is usually (hd0,0), which means the first partition of the first hard disk. Note that GRUB numbers drives and partitions starting from 0, not 1.

If you are not sure what to set the root drive to, press c to enter the GRUB command line and run the following find command.
grub> find /grub/stage1
(hd0,0)

The find command searches for the file grub/stage1 and displays the drive and partition that contain it. If /boot is not on a partition of its own, prepend /boot to the argument (find /boot/grub/stage1).

So far so good: I did not have to modify the root drive.

KERNEL

The kernel command specifies and loads the Linux kernel image. The default image file name was corrupted. To correct it, I selected the kernel command and pressed e to edit the line.

What should the file name be? Different Linux distributions name the kernel image file differently. Don't fret if you can't remember its name, because the GRUB command line offers file name completion. Just type kernel / and press Tab.
grub> kernel /
Possible files are: System.map-2.6.18-6-k7 config-2.6.18-5-k7 config-2.6.18-6-k7 initrd.img-2.6.18-5-k7 vmlinuz-2.6.18-5-k7 grub initrd.img-2.6.18-6-k7 System.map-2.6.18-5-k7 vmlinuz-2.6.18-6-k7

For Debian, Ubuntu, Fedora, and Mandriva, the kernel image file is named vmlinuz followed by the kernel release number and the machine architecture (e.g., vmlinuz-2.6.18-6-k7). From the options returned by the file name completion feature, choose the kernel image with the latest release number.
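Once the system is up, you can make the same choice from a regular shell. The sketch below is a minimal one that assumes the vmlinuz-<release> naming described above and relies on GNU sort's -V (version sort) option. The function name latest_kernel and its optional directory argument are hypothetical, for illustration only.

```shell
#!/bin/sh
# latest_kernel [DIR]: print the file name of the newest vmlinuz-* image
# in DIR (default /boot). Hypothetical helper; assumes GNU sort -V.
latest_kernel() {
    dir=${1:-/boot}
    # Emit one bare file name per line, version-sort them so that
    # 2.6.18 sorts after 2.6.9, and keep the last (highest) one.
    for f in "$dir"/vmlinuz-*; do
        [ -e "$f" ] || continue    # the glob matched nothing
        basename "$f"
    done | sort -V | tail -n 1
}
```

For example, latest_kernel would print vmlinuz-2.6.18-6-k7 given the /boot contents listed above. Version sort matters here because a plain alphabetical sort would put 2.6.9 after 2.6.18.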

The rest of the kernel parameters looked OK and required no change.
grub> kernel /vmlinuz-2.6.18-6-k7 root=/dev/mapper/tiger-root ro

INITRD

Next, I had to replace the savedefault command with the initrd command.

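The initrd command specifies the initial RAM disk image file. The RAM disk is used to load the modules required to access the root filesystem. On this machine, the root filesystem is on a device-mapper volume (/dev/mapper/tiger-root), so the kernel would most likely fail to mount it without the RAM disk.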

I first selected the savedefault command and pressed e to edit the line. Again, you can use the Tab key to complete the file name of the RAM disk image.
grub> initrd /initrd.img-2.6.18-6-k7
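Once the system is booted, a quick way to confirm that every kernel image in /boot has a matching RAM disk image is to pair the file names by release string. This is a minimal sketch that assumes the Debian-style vmlinuz-<release> and initrd.img-<release> names shown above. Other distributions may name the initrd differently. check_initrds is a hypothetical helper.

```shell
#!/bin/sh
# check_initrds [DIR]: for each vmlinuz-<release> in DIR (default /boot),
# report whether the matching initrd.img-<release> exists.
# Assumes Debian-style file names.
check_initrds() {
    dir=${1:-/boot}
    for k in "$dir"/vmlinuz-*; do
        [ -e "$k" ] || continue              # no kernel images found
        release=${k##*/vmlinuz-}             # e.g. 2.6.18-6-k7
        if [ -e "$dir/initrd.img-$release" ]; then
            echo "$release: ok"
        else
            echo "$release: missing initrd.img-$release"
        fi
    done
}
```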


Boot & Edit menu.lst

After making the above changes, I returned to the entry's list of boot commands and pressed b to boot.

This time, the machine booted up successfully, and everything worked just fine for me.

I was not done, however. Unless I fixed the source of the problem (the corrupted menu.lst), the machine would fail with the same boot error on the next reboot. So, as root, I opened /boot/grub/menu.lst and edited the commands.

Before I could modify the corrupted commands, I needed to first locate the corresponding OS entry in the file. Each OS entry occupies a separate section in the file, beginning with its own title line. So, I scrolled down the file until I reached the target title line.
title Debian GNU/Linux, kernel  [01;31m- [00m2.6.18-6-k7
root (hd0,0)
kernel / [01;31mvmlinuz- [00m2.6.18-6-k7 root=/dev/mapper/tiger-root ro
savedefault
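For reference, here is the same entry with the fixes from the GRUB prompt carried over. The color codes are removed, the kernel file name is corrected, and savedefault is replaced by initrd. This entry is reconstructed from the commands above rather than copied from a real file, so check the file names against your own /boot before using it.

```text
title Debian GNU/Linux, kernel 2.6.18-6-k7
root (hd0,0)
kernel /vmlinuz-2.6.18-6-k7 root=/dev/mapper/tiger-root ro
initrd /initrd.img-2.6.18-6-k7
```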


After correcting the commands, I saved the file and rebooted the machine.
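If you would rather not hunt for the stray characters by hand, sed can strip them. This is a minimal sketch that makes two assumptions. First, it assumes the junk consists only of ANSI color sequences of the form ESC [ digits-and-semicolons m, which is what the bracketed fragments above look like. Second, it assumes GNU sed, for -i with a backup suffix. The function name strip_ansi is hypothetical. It keeps the untouched file as menu.lst.orig.

```shell
#!/bin/sh
# strip_ansi FILE: remove ANSI color escape sequences (ESC [ ... m) from
# FILE in place, keeping the original as FILE.orig.
# Assumes GNU sed (for -i with a backup suffix).
strip_ansi() {
    esc=$(printf '\033')    # the invisible ESC character
    # Show the affected lines first; cat -v makes ESC visible as ^[
    grep -n "$esc" "$1" | cat -v
    # Delete every ESC [ <digits and semicolons> m sequence
    sed -i.orig "s/${esc}\[[0-9;]*m//g" "$1"
}
```

Run it as root against /boot/grub/menu.lst. Before rebooting, compare the result with menu.lst.orig (for example, with diff).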

Lessons Learned

The GRUB config file (menu.lst) can get corrupted, and when it does, it spells real trouble.

The commands to boot the OS are not something anyone tends to remember, so it makes sense to keep a printout of the menu.lst file or a backup copy.

What I do is back up the menu.lst file to the same directory (say, as menu.lst.bak).

The advantage of keeping it in the same directory (as opposed to somewhere on the network) is that if menu.lst ever gets corrupted again, you can still display the backup copy from the GRUB command prompt during boot. Simply enter the following (prefix the path with /boot if /boot is not on its own partition):
grub> cat /grub/menu.lst.bak


The above cat command displays the backup menu.lst file. Armed with the knowledge of the correct commands to use, you can then edit the commands as shown in this article.
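Making the backup is a one-liner, but it is worth guarding against one trap. If you refresh the backup right after a bad upgrade, you copy the corruption over your good copy. The hypothetical helper below is a minimal sketch that copies menu.lst to menu.lst.bak in the same directory. It refuses to do so if the file contains escape characters like the ones in this article. Run it before an upgrade, once you know the system boots.

```shell
#!/bin/sh
# backup_menu_lst [FILE]: copy FILE (default /boot/grub/menu.lst) to
# FILE.bak in the same directory, preserving mode and timestamps.
# Refuses if FILE contains ESC characters (the corruption described here).
backup_menu_lst() {
    src=${1:-/boot/grub/menu.lst}
    if grep -q "$(printf '\033')" "$src"; then
        echo "refusing to back up $src: it contains escape characters" >&2
        return 1
    fi
    cp -p "$src" "$src.bak"
}
```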

Saturday, August 9, 2008

How to show apt log history

Users of Debian-based distributions (myself included) often brag about Debian's supposedly superior package management toolset, which offers a choice of several excellent package managers such as dpkg, apt, synaptic, and aptitude. Each is first class at what it is designed to do.

In my opinion, one major gap is the lack of a command-line interface for viewing the apt change log. After a recent routine Etch package upgrade, I discovered that the GRUB menu.lst file had been corrupted. So I wanted to check recent apt activity to find out which packages were possible suspects.

A Google search revealed that viewing change history is not that simple. Aptitude writes its change log to /var/log/aptitude, and Synaptic users can get the same information through its graphical user interface. But is there a standard change log file that all the common Debian package managers write to? And if so, is there a command-line tool for reading it?

It turns out that such a log exists: /var/log/dpkg.log. Because apt-get, synaptic, and aptitude all call dpkg to do the actual installing and removing, this single file records package activity, such as installs and upgrades, no matter which of these package managers was used.
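Since every line in dpkg.log starts with a date and time, finding the suspects for a particular upgrade mostly comes down to filtering on the date. This is a minimal sketch. It assumes each line begins with the date (YYYY-MM-DD), then the time, then the action (install, upgrade, remove, status, ...). The function name dpkg_changes_on and its optional log-file argument are hypothetical.

```shell
#!/bin/sh
# dpkg_changes_on DATE [LOG]: print the install, upgrade, and remove
# entries recorded on DATE (YYYY-MM-DD) in LOG (default /var/log/dpkg.log).
dpkg_changes_on() {
    log=${2:-/var/log/dpkg.log}
    # Field 1 is the date and field 3 the action; "status" and other
    # bookkeeping lines are skipped.
    awk -v day="$1" \
        '$1 == day && ($3 == "install" || $3 == "upgrade" || $3 == "remove")' \
        "$log"
}
```

For example, dpkg_changes_on 2008-08-09 lists everything installed, upgraded, or removed that day.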

As for a command-line tool, there used to be a package named apt-history for viewing apt-get activity. Its web site says that work on the project has been discontinued because dpkg now does its own logging. It instead recommends a simple bash function (also named apt-history) for reading the log file (/var/log/dpkg.log).

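Below is the bash function from that web site, with my comments added.
function apt-history(){
  case "$1" in
    install)
      # Lines containing "install " (note the trailing space): new installs
      cat /var/log/dpkg.log | grep 'install '
      ;;
    upgrade|remove)
      # Any line mentioning the requested action
      cat /var/log/dpkg.log | grep $1
      ;;
    rollback)
      # apt-history rollback START END:
      # upgrade lines from the first one matching START through the last
      # one matching END, printed as package=previous-version pairs
      cat /var/log/dpkg.log | grep upgrade | \
        grep "$2" -A10000000 | \
        grep "$3" -B10000000 | \
        awk '{print $4"="$5}'
      ;;
    *)
      # No (or unrecognized) argument: dump the whole log
      cat /var/log/dpkg.log
      ;;
  esac
}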


I've tried it, and it works.

To set it up, insert the above code into /root/.bashrc.

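To run apt-history, you need to become root (for example, with sudo -i), since the function is defined in root's .bashrc. Entering apt-history with no parameter simply dumps the log file. To select the activities you want to see, pass install, upgrade, or remove as a single parameter. The rollback option takes two more arguments, a start pattern and an end pattern (such as two dates). It prints package=version pairs for the versions that the upgrades in that range replaced.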

$ sudo -i
Password:
# apt-history install
2008-08-01 21:37:00 install googlizer 0.3-3
2008-08-09 08:20:54 install agrep 4.17-3
2008-08-09 08:28:26 install abuse-frabs 2.10-7
2008-08-09 08:28:27 install abuse 1:0.7.0-5
#
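One limitation is that apt-history only reads the current /var/log/dpkg.log. On systems where logrotate rotates that file (into dpkg.log.1, dpkg.log.2.gz, and so on), older history lives in the rotated copies. The sketch below is a variation, not part of the original function. It assumes GNU gzip's zcat and timestamped lines. It concatenates all the copies, decompressing the .gz ones, and sorts the result by timestamp. The name dpkg_log_all and its directory argument are hypothetical.

```shell
#!/bin/sh
# dpkg_log_all [DIR]: print every dpkg.log* file in DIR (default /var/log),
# compressed or not, as one chronologically sorted stream.
dpkg_log_all() {
    dir=${1:-/var/log}
    # zcat -f passes uncompressed files through unchanged. Lines start
    # with "YYYY-MM-DD HH:MM:SS", so a stable sort on the first two
    # fields (in the C locale) puts them in time order.
    zcat -f "$dir"/dpkg.log* | LC_ALL=C sort -s -k1,2
}
```

For example, dpkg_log_all | grep ' upgrade ' shows upgrades as far back as the rotated logs go.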

Monday, August 4, 2008

Learn more about a command when no man info page is available

When we want documentation for a command, the first thing we usually do is read its manual (man) page.
$ man cd
No manual entry for cd


But what if no man page is available for that command?

There could be a number of reasons. For example, the Linux system may have been installed without any man pages at all. I worked with a Linux-based mobile gateway device that had no man pages pre-installed, to save space on its 1 GB solid-state drive (SSD). On most typical desktops and servers, however, man pages are pre-installed.

A man page can also be missing for a particular command because the administrator, for whatever reason, did not install it. In that case, Google is your best friend. The challenge is to narrow down the search so you get to the man page quickly.

There is yet another reason a command may have no man page: if it is a bash shell built-in command, it does NOT have its own man page. Its documentation is still available, but a bit harder to get to, because information about bash built-in commands lives inside the bash man page. Simply enter man bash and search for the SHELL BUILTIN COMMANDS section (using the command /^SHELL BUILTIN). Then scroll down until you reach the built-in command you are looking for.

$ man bash


How do you know if something is a bash built-in command in the first place?

Use the type command (another bash built-in).

$ type cd
cd is a shell builtin
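The two steps can be combined. The bash sketch below uses a hypothetical function named doc. It asks type -t what kind of command a name is. For a builtin or keyword, it calls bash's own help builtin, which prints a short summary of that command's documentation without leaving the shell. For anything else, it falls back to man.

```shell
#!/usr/bin/env bash
# doc NAME: show documentation for NAME.
# Builtins and keywords are documented by bash itself (the help builtin);
# everything else goes to man. Requires bash: type -t and help are bash features.
doc() {
    case "$(type -t "$1")" in
        builtin|keyword)
            help "$1"
            ;;
        *)
            man "$1"
            ;;
    esac
}
```

For example, doc cd prints the cd summary even on a system without any man pages installed.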


Note that not all Linux distributions behave the same way when you run man on a bash built-in command. Debian Etch, the distro I use at home, reports no manual entry when you enter man cd. Some distributions, like the Red Hat-based CentOS, display the bash man page instead. On those systems, you are one step ahead of the rest.