My pager went off Saturday morning with the following alert: “/dev/hda6 is read-only!”
I can’t remember the last time I received this alert. The filesystem in question hosts this web site, as well as a few others. Being read-only isn’t a major problem, since most of the data on it only needs to be read (images, static HTML, etc.); anything dynamic runs out of a database on another filesystem.
Since Saturday is generally a slow day for the site, it seemed like a good time to track down the problem and fix it rather than wait for a late-night maintenance window. The first thing I did was check /var/log/messages for any logged errors. Here’s what I found:
kernel: EXT3-fs error (device hda6): ext3_free_blocks: Freeing blocks in system zones - Block = 65536, count = 1
kernel: Aborting journal on device hda6.
kernel: ext3_abort called.
kernel: EXT3-fs error (device hda6): ext3_journal_start_sb: Detected aborted journal
kernel: Remounting filesystem read-only
kernel: EXT3-fs error (device hda6) in ext3_free_blocks_sb: Journal has aborted
kernel: EXT3-fs error (device hda6) in ext3_reserve_inode_write: Journal has aborted
kernel: EXT3-fs error (device hda6) in ext3_truncate: Journal has aborted
kernel: EXT3-fs error (device hda6) in ext3_reserve_inode_write: Journal has aborted
kernel: EXT3-fs error (device hda6) in ext3_orphan_del: Journal has aborted
kernel: EXT3-fs error (device hda6) in ext3_reserve_inode_write: Journal has aborted
kernel: __journal_remove_journal_head: freeing b_committed_data
kernel: __journal_remove_journal_head: freeing b_committed_data
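In a nutshell, the kernel hit an error on the filesystem (an attempt to free a block in a “system zone,” i.e. blocks ext3 uses for its own metadata, such as bitmaps and inode tables) and aborted the journal as a result. It then remounted the filesystem read-only so that no further writes could make matters worse.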
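On a busy server, /var/log/messages holds plenty of unrelated noise. A small POSIX sh function along these lines could pull out just one device’s ext3 error and journal-abort lines. It is a sketch: the name fs_journal_errors is hypothetical, and it assumes the device appears in the log exactly as above (hda6). The “Remounting filesystem read-only” line carries no device name, so it is not matched.

```shell
# fs_journal_errors LOGFILE DEVICE   (hypothetical helper, not from the post)
# Print the kernel's ext3 error and journal-abort lines for one device.
# DEVICE is the short kernel name as it appears in the log, e.g. hda6.
fs_journal_errors() {
    log=$1
    dev=$2
    # -F treats both patterns as fixed strings. The closing ")" and "."
    # keep "hda6" from also matching a device such as "hda60".
    grep -F -e "(device $dev)" -e "on device $dev." "$log"
}

# Example (not run here):
#   fs_journal_errors /var/log/messages hda6
```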
First, I shut down everything that was using the read-only filesystem. Since it was mounted on /home, I used lsof to find what was using it:
# lsof | grep home
This showed that Apache and one user were using the affected filesystem. I logged the user out first and then shut down Apache, which is where the site downtime came from. I then unmounted the filesystem and ran fsck on it:
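Piping lsof through grep home matches any line that merely contains the string “home” (a path like /homework, for instance). Handing lsof the mount point directly (lsof /home) restricts the listing to files open on that filesystem. If you would rather filter the full listing, a sketch like the one below works on lsof’s machine-readable field output (lsof -F pcn), where each line starts with a field letter: p for PID, c for command, n for file name. The function name users_of_mount is hypothetical, and it assumes the mount point is given without a trailing slash.

```shell
# users_of_mount MOUNTPOINT < lsof-field-output   (hypothetical helper)
# Reads `lsof -F pcn` output on stdin and prints "PID COMMAND" once for
# each process holding a file at or below MOUNTPOINT.
users_of_mount() {
    awk -v mp="$1" '
        /^p/ { pid = substr($0, 2) }   # start of a process set: remember its PID
        /^c/ { cmd = substr($0, 2) }   # command name for that PID
        /^n/ {
            name = substr($0, 2)
            # Match the mount point itself or anything under "MOUNTPOINT/",
            # so /home does not match /homework. Print each PID only once.
            if ((name == mp || index(name, mp "/") == 1) && !seen[pid]++)
                print pid, cmd
        }
    '
}

# Example (not run here):
#   lsof -F pcn | users_of_mount /home
```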
# umount /home
# fsck /dev/hda6
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/home: recovering journal
/home contains a file system with errors, check forced.
<snip for brevity>
/home: ***** FILE SYSTEM WAS MODIFIED *****
/home: 158016/4788672 files (12.6% non-contiguous), 3631550/4785354 blocks
#
# mount /home
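Besides its printed summary, fsck reports its result through its exit status, which is a bitwise OR of the values documented in the fsck(8) man page. The helper below (the name describe_fsck_status is hypothetical) decodes that status. In a script, one possible policy is to run mount only when the status is 0 or 1, so a filesystem with uncorrected errors is never put back into service automatically.

```shell
# describe_fsck_status CODE   (hypothetical helper)
# Decode fsck's exit status; each documented bit means one condition.
describe_fsck_status() {
    code=$1
    if [ "$code" -eq 0 ]; then
        echo "no errors"
        return 0
    fi
    # Check every documented bit and report each one that is set.
    for bit in 1 2 4 8 16 32 128; do
        if [ $(( code & bit )) -ne 0 ]; then
            case $bit in
                1)   echo "filesystem errors corrected" ;;
                2)   echo "system should be rebooted" ;;
                4)   echo "filesystem errors left uncorrected" ;;
                8)   echo "operational error" ;;
                16)  echo "usage or syntax error" ;;
                32)  echo "checking canceled by user request" ;;
                128) echo "shared-library error" ;;
            esac
        fi
    done
}

# Example (not run here):
#   fsck /dev/hda6
#   describe_fsck_status $?
```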
After remounting the filesystem, I checked /var/log/messages again to make sure everything was fine. The kernel had mounted the filesystem properly and reported no errors. I then restarted Apache.
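Besides the log, the kernel’s mount table shows directly whether a filesystem is currently read-only. The sketch below, using the hypothetical name readonly_mounts, prints every entry in /proc/mounts whose options include ro; once /home is back to read-write it should no longer appear. It matches ro only as a whole option, so an option like errors=remount-ro is not mistaken for a read-only mount, and it does not decode the octal escapes (such as \040 for a space) that /proc/mounts uses in paths.

```shell
# readonly_mounts [MOUNTS_FILE]   (hypothetical helper)
# Print "DEVICE MOUNTPOINT" for each filesystem mounted with the "ro" option.
# Fields in /proc/mounts: device, mount point, type, options, dump, pass.
readonly_mounts() {
    # "ro" must be a whole comma-separated option, not part of "remount-ro".
    awk '$4 ~ /(^|,)ro(,|$)/ { print $1, $2 }' "${1:-/proc/mounts}"
}

# Example (not run here):
#   readonly_mounts
```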
Total downtime for the few sites running on this server was around 10 minutes.
Photo by channah.