Disk Survey

Surveying the disks of the world

Tales From an Adventure With Failing Disks


I was going through some old archive disks of mine, trying to recover lost treasures from a past life and load them onto a new storage system I installed at home. Along the way I hit the exact issues I’ve been thinking about and working on when dealing with disks.

Every one of these disks exhibited one or more problems, all of them media issues, and from every one I recovered all of the important data I needed. There may have been areas of the disks that were not recoverable, but that didn’t impact my work.

The first few disks were old IDE disks (from 2000 through 2003) that I connected through an IDE-to-USB adapter. They had trouble reading the data: they reported unrecovered read errors and initially failed to fetch the data. Based on my knowledge of disk error recovery, I increased the timeout of the disks to 5 minutes by tweaking the appropriate setting on the block device in Linux. A retry of the reads then brought the data back. It took time to get everything, since some reads took up to a minute or two, but the data was recovered, which was the main concern.
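For reference, the setting in question is the per-device timeout attribute in sysfs, expressed in seconds. Here is a minimal sketch of the tweak, assuming the disk shows up as sdb (a placeholder, substitute your own device) and that it runs as root. The set_disk_timeout helper and the SYSFS_ROOT override are names I made up for illustration, not part of any tool:

```shell
#!/bin/sh
# Hypothetical helper: raise the command timeout (in seconds) of one disk.
# SYSFS_ROOT is an illustrative override so the function can be tried against
# a scratch directory; on a real system it defaults to /sys.
set_disk_timeout() {
    dev="$1"
    secs="$2"
    attr="${SYSFS_ROOT:-/sys}/block/$dev/device/timeout"

    # The attribute exists for SCSI-style disks (sd*), which includes
    # drives attached through a USB adapter.
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr (wrong device name, or not root?)" >&2
        return 1
    fi

    echo "$secs" > "$attr"
}

# Example: allow each command up to 5 minutes on /dev/sdb.
# set_disk_timeout sdb 300
```

A value set this way does not survive a reboot or a replug of the disk, so it has to be reapplied each time the device appears.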

The last disk was a relatively new (2010) 2TB Western Digital Green drive that I used as a sort of archival data store, and it had a lot more problems. Its latencies were off the chart, so this time I increased the timeout to 10 minutes. There were also some unrecoverable read errors, but those few were in unimportant spots; they only showed up when I ran diskscan, and I corrected them with diskscan’s --fix option. After a diskscan sweep, every spot that previously had high latency came back down to normal and the entire drive became usable again. The disk self-test, which had previously failed, now passed, and I lost the chance to experience an RMA process.

All in all, diskscan did its work and increasing the allowed disk timeouts got me my data.

I’m now using a udev rule to set the timeout on all my disks to 5 minutes by default:

ACTION=="add", SUBSYSTEM=="scsi", DRIVER=="sd", PROGRAM="/bin/sh -c 'echo 300 > /sys/$devpath/timeout'"
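To check that the rule took effect after a reboot or a replug, the timeout of every disk can be read back from sysfs. A small sketch follows; the list_disk_timeouts name and the SYSFS_ROOT override are again made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: print "<device> <timeout in seconds>" for every block
# device that exposes a timeout attribute. SYSFS_ROOT is an illustrative
# override for trying this against a scratch directory; it defaults to /sys.
list_disk_timeouts() {
    root="${SYSFS_ROOT:-/sys}"
    for attr in "$root"/block/*/device/timeout; do
        # Skip the unexpanded glob when no disk exposes the attribute.
        [ -r "$attr" ] || continue
        dev="${attr#"$root"/block/}"   # strip the leading path
        dev="${dev%%/*}"               # keep only the device name
        echo "$dev $(cat "$attr")"
    done
}

# Example:
# list_disk_timeouts
```

With the rule above active, each disk it covers should report 300.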