A recent entry on Hard Drives and UREs paints the right picture of how a disk may fail to read, but I feel that the author misses the punchline. Admittedly, it took me a few days to bring it to the front of my mind as well.
All disks give a spec for their Unrecoverable Read Error (URE) rate, normally $10^{-15}$ errors per bit read for consumer drives and $10^{-16}$ for enterprise drives. Many take it to be the overall chance of getting a read error, and I’m pretty much convinced that this is wrong. It seems to me that this specification is more about the random chance that a disk fails to read data that was supposed to have been written to it beforehand. This covers many possible errors during both write and read: the head may not be able to lock onto the right place, the data may have been overwritten by a later error, or any of a dozen other possible failures may occur. Many of these failures are checked for during reads and writes, and there are definitely attempts to correct them. A normal SAS drive reports millions of minor corrective actions taken during its operation, most of them not worth noting.
The HDD mechatronics and the SSD physics are complex and hard to get right in all cases, and that’s where the URE spec comes from: these random failures to read data every now and then.
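To put the spec numbers in perspective, here is a rough sketch of what they would imply if each bit read failed independently at the quoted rate. That independence is an assumption made for illustration, not something the spec sheets promise, and the function name and the 10 TB read size are made up for this example. It only models the random failures described above.

```python
import math

def p_at_least_one_ure(bytes_read: float, ure_rate_per_bit: float) -> float:
    """Chance of hitting at least one URE while reading `bytes_read` bytes.

    Assumes every bit fails independently with probability `ure_rate_per_bit`
    (an illustrative model, not a vendor guarantee).
    """
    bits = bytes_read * 8
    # 1 - (1 - p)^n, computed via log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-ure_rate_per_bit))

TB = 10**12  # decimal terabyte, as used on drive labels

# Rates quoted above: 1e-15 for consumer drives, 1e-16 for enterprise drives.
for label, rate in (("consumer", 1e-15), ("enterprise", 1e-16)):
    p = p_at_least_one_ure(10 * TB, rate)
    print(f"{label}: {p:.2%} chance of a URE while reading 10 TB")
```

Under this model, reading 10 TB end to end gives roughly an 8% chance of a URE at the consumer rate and under 1% at the enterprise rate.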
There is a whole other class of problems where the failure is of a larger scope: the head crashed into the platter, contamination from outside is wreaking havoc, the manufacturing process introduced some contamination or other defect, or some other external force is the source of the problem. These are not covered by the URE spec.