Saturday, January 29, 2011

How do I easily repair a single unreadable block on a Linux disk?

My Linux system has started throwing SMART errors in the syslog. I tracked it down and believe the problem is a single block on the disk. How do I go about easily getting the disk to reallocate that one block? I'd like to know what file got destroyed in the process. (I'm aware that if one block fails on a disk others are likely to follow; I have a good ongoing backup and just want to try to keep this disk working.)

Searching the web leads to the Bad block HOWTO, which describes a manual process on an unmounted disk. It seems complicated and error-prone. Is there a tool to automate this process in Linux? My only other option is the manufacturer's diagnostic tool, but I presume that'll clobber the bad block without any reporting on what got destroyed. Worst case, it might be filesystem metadata.

The disk in question holds the primary system partition, which uses ext3 on LVM. Here's the error log from syslog and the relevant bit from smartctl.

smartd[5226]: Device: /dev/hda, 1 Currently unreadable (pending) sectors

Error 1 occurred at disk power-on lifetime: 17449 hours (727 days + 1 hours)
... Error: UNC at LBA = 0x00d39eee = 13868782

There's a full smartctl dump on pastebin.

  • I think all you have to do is:

    e2fsck -c /dev/hda1
    

    assuming /dev/hda1 is the (unmounted) partition. Or:

    e2fsck -c -c /dev/hda1
    

    to do a (slower) non-destructive read-write test. It will still have to be unmounted. I don't think this will give you details on any lost data, though.

  • If the disk is going bad, replace it. It's not worth the risk that it will fall apart more.

    Nelson : I was explicit about knowing the disk is bad and having backups to avoid the risk.
    Michael Graff : That just means you're willing to gamble. I don't think that means it should not be replaced, just that you're willing to ignore that advice. I doubt any backups can save your system from itself as the disk falls apart, and things will just get very flaky as things degrade.
  • Michael has it correct, and in most cases I would say just replace the drive; they are cheap. However, if you don't have backups and can't get important data off the drive, or just want to attempt to repair the drive, then you may want to try running SpinRite at its highest level.

    I had a laptop drive that started making some noises a few years ago. Badblocks showed that the drive had 118 or so bad blocks visible to the end user. Since I already had a copy of SpinRite, I decided to give it a try before buying a new drive. After running SpinRite on the drive, badblocks showed 0 bad blocks and the noises stopped. The drive has been working for over two years since then.

    3dinfluence : Nelson, are you just going to downvote every answer that isn't what you want to hear? A healthy drive will automatically remap a bad block. If you have to go out of your way to force this, the drive is no longer healthy and should be replaced.
    Nelson : No, I only downvoted one response because it didn't answer my question. You suggested SpinRite, thanks! My understanding is that a healthy drive will *not* remap a bad sector until it's written to. I'm trying to find the simplest way to force a write. Going to try Matthew's suggestion and see if fsck is smart enough to do it.
    3dinfluence : Sorry, I jumped to conclusions there. After seeing 2 answers voted down quickly and you responding to the other answer, I assumed that was you.
    3dinfluence : You are correct that the bad sector remap happens when a write to a block fails. If you just have a corrupted block as far as the filesystem is concerned, then fsck may sort out your issue if the block in question is a metadata block. fsck really just scans and corrects errors in the metadata, so it makes no guarantees about the data itself. Next-gen filesystems like Btrfs and ZFS can detect data errors and, if you have redundancy, correct them. SpinRite would also force a remap, since as part of its scan it reads every block, writes the inverted data, rereads it, and then inverts the data back.
  • You could try hdparm --write-sector <LBA> /dev/ice (hdparm also insists on the --yes-i-know-what-i-am-doing flag before it will perform the write).

    I don't know any other way of doing this. To find out what the write will affect, you need to manually convert the LBA into filesystem blocks (as you've already found).

    Nelson : Ooh, that's a new flag! That will definitely take care of reallocating the bad block. Now all I need is an easy way to find what it will clobber.
    Avery Payne : Having used this method to fix a disk, I can say this is the correct method. Forcing a write to the sector in question will force the drive to face up to the sector and either (a) obtain a successful write, or (b) end up with a permanent bad sector along with a remap.
    From James
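
The manual LBA-to-filesystem-block conversion mentioned in the last answer can be sketched in shell, using the formula from the Bad block HOWTO: b = (L - S) * 512 / B, rounded down. This is only a rough sketch. The function name lba_to_fs_block is made up for illustration, and the partition start (63) and block size (4096) are assumed values, not taken from the question. The LBA is the one in the smartctl log.

```shell
#!/bin/sh
# Convert a disk LBA (512-byte sectors) into a filesystem block number.
# Formula from the Bad block HOWTO: b = (L - S) * 512 / B, integer division.
#   $1 = L: LBA of the bad sector, as reported by smartctl
#   $2 = S: first sector of the area holding the filesystem (ASSUMED below)
#   $3 = B: filesystem block size in bytes (ASSUMED below)
lba_to_fs_block() {
    echo $(( ($1 - $2) * 512 / $3 ))
}

# LBA 13868782 comes from the error log; 63 and 4096 are hypothetical values
# that must be replaced with the real partition start and block size.
lba_to_fs_block 13868782 63 4096
```

With LVM in the picture, S is not simply the partition start: it has to be the sector where the logical volume's data begins on the disk, and the formula only holds if the volume is laid out contiguously. Once the block number is known, the HOWTO uses debugfs (icheck to map the block to an inode, then ncheck to map the inode to a path) to find out which file, if any, would be clobbered.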
