Disk Survey

Surveying the disks of the world

SATA Handling of Medium Errors: Log_info(0x31080000)

When running SATA disks behind an LSI SAS controller, you may encounter an obscure error report in the kernel log that says “mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)”. If you have more than one SAS controller, the prefix may instead be mpt2sas1, mpt2sas2 and so on. Most errors of this form that the SAS controller emits are about bad cables or bad ports, but this specific one is not about bad hardware, at least not bad SAS or SATA hardware.

This particular error is a side effect of the inability of the SATA NCQ protocol to report an error against only the specific IO that had a problem. In a proper SCSI environment, such as with SAS disks, the disk can and does report an error for the specific IO that failed, and it continues to handle the other outstanding IOs normally. SATA NCQ, however, cannot do that: once there is any error, most commonly a Medium Error, the disk aborts all the other IOs pending on it, and the OS has to reissue them after it has handled the failed IO request.
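The difference can be pictured with a toy model. This is only a sketch of the behaviour described above, not of any real driver or firmware: the function name `complete_queue`, the integer tags and the `ncq` flag are all made up for illustration, and `outstanding` stands for the commands still pending on the disk at the moment the error hits.

```python
def complete_queue(outstanding, bad_tag, ncq):
    """Toy model: split pending command tags into (completed, failed, aborted).

    Hypothetical helper for illustration only; real error handling lives in
    the disk firmware, the controller and the OS storage stack.
    """
    failed = [tag for tag in outstanding if tag == bad_tag]
    others = [tag for tag in outstanding if tag != bad_tag]
    if ncq:
        # SATA NCQ: one error takes down every other pending command,
        # so the OS must reissue all of them.
        return [], failed, others
    # SAS/SCSI: only the failing command reports an error,
    # the rest of the queue completes normally.
    return others, failed, []


# Four commands pending, the one with tag 2 hits a medium error.
print(complete_queue([0, 1, 2, 3], bad_tag=2, ncq=True))
print(complete_queue([0, 1, 2, 3], bad_tag=2, ncq=False))
```

With NCQ, the three healthy commands end up in the aborted list and have to be sent again; in the SCSI case they simply complete.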

The result is that this error recovery pattern has a severe performance impact when using NCQ. Not only did the user wait a long time to learn about the medium error (a medium error is invariably the result of some internal timeout), but all the other pending requests were aborted and need to be reissued, which wastes time the disk could have spent handling more requests.

If the disk is in a proper RAID system, the RAID logic will regenerate the data from parity and rewrite the offending location to correct it. If the RAID is not that smart, you may want to consider removing the disk from the RAID group to force a rebuild and then reinserting it, preferably after rewriting the entire disk surface and making sure the disk is still fine. Most often it will still work just fine after a proper scrub.

It is rather unfortunate that the LSI log_info decoding guide is not freely available, but some hints can be gleaned from the source of the mptbase, mpt2sas and mpt3sas drivers.
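As a rough illustration of what those driver sources suggest, the 32-bit log_info value appears to be split into four bit fields: a 4-bit bus type at the top, then a 4-bit originator, an 8-bit code and a 16-bit sub-code. The sketch below decodes a value under that assumption. The function name `decode_log_info` and the `ORIGINATORS` table are illustrative, and the originator names (IOP, PL, IR) are as read from the driver code rather than from any official LSI documentation.

```python
# Originator names as they appear in the driver's log_info printing code
# (an assumption based on reading the source, not on an official guide).
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}


def decode_log_info(value):
    """Split a 32-bit log_info value into its assumed bit fields."""
    originator = (value >> 24) & 0xF
    return {
        "bus_type": (value >> 28) & 0xF,   # top 4 bits
        "originator": ORIGINATORS.get(originator, "unknown"),
        "code": (value >> 16) & 0xFF,      # next 8 bits
        "sub_code": value & 0xFFFF,        # low 16 bits
    }


fields = decode_log_info(0x31080000)
print(fields)
# → {'bus_type': 3, 'originator': 'PL', 'code': 8, 'sub_code': 0}
```

For 0x31080000 this yields originator PL, code 0x08 and sub_code 0x0000, which matches what the kernel message itself prints.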
