RAID Best Practice: Background Media Scan Monitoring

The BMS entry describes in detail what the Background Media Scan does. A corollary of that discussion is that a RAID device should monitor the BMS status of all of its SCSI disks (SAS and FC) and act on the contents of that log page.

  • If there is a recovered error, it should be noted and, with a certain probability, the affected location should be included in the disk scrub to verify that the data the disk recovered is correct,
  • If there is an unrecovered error, the RAID device should scrub the affected RAID stripe as soon as possible to recover the data at that location,
  • If a large number of BMS entries are generated, the disk should become a candidate for replacement before real trouble occurs.
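
As a rough illustration, the three rules above could be sketched as follows. This is only a sketch under assumptions: it presumes the BMS log page has already been fetched and parsed into entries carrying an LBA and a SCSI sense key, and the names BmsEntry and plan_actions, the action labels, the scrub probability and the replacement threshold are all hypothetical placeholders rather than values taken from any standard or product.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# SCSI sense keys used to classify an entry.
RECOVERED_ERROR = 0x1
MEDIUM_ERROR = 0x3  # treated here as "unrecovered" (an assumption of this sketch)


@dataclass(frozen=True)
class BmsEntry:
    """One already-parsed BMS log entry (hypothetical, simplified shape)."""
    lba: int
    sense_key: int


Action = Tuple[str, Optional[int]]


def plan_actions(
    entries: List[BmsEntry],
    scrub_probability: float = 0.1,   # assumed value, tune per system
    replace_threshold: int = 16,      # assumed value, tune per system
    rng: Callable[[], float] = random.random,
) -> List[Action]:
    """Map BMS entries to RAID-level actions following the three rules."""
    actions: List[Action] = []
    for entry in entries:
        if entry.sense_key == MEDIUM_ERROR:
            # Unrecovered error: rebuild this location from the stripe now.
            actions.append(("scrub_now", entry.lba))
        elif entry.sense_key == RECOVERED_ERROR:
            # Recovered error: note it, and sometimes verify it in the scrub.
            actions.append(("log", entry.lba))
            if rng() < scrub_probability:
                actions.append(("scrub_later", entry.lba))
    if len(entries) >= replace_threshold:
        # Many entries: flag the disk before real trouble occurs.
        actions.append(("replace_candidate", None))
    return actions
```

The random source is passed in as a parameter so that the probabilistic rule can be made deterministic when testing the policy.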

The BMS log page has a fixed size. If entries are added faster than the monitoring can read them, entries can be overwritten and lost without being noticed. There are options that may be used to stop the BMS from rolling over until the monitoring has handled the new entries, but so far I have not seen a disk that implements these controls. The developer should be aware of this possibility and may want to alert the user that the disk is too troublesome to be maintainable.
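
One possible way to at least notice a likely rollover is to compare successive snapshots of the log page. The sketch below assumes that each entry can be identified by a (timestamp, LBA) pair and that the page capacity is known to the monitor. Both are assumptions of this example, and RolloverDetector is a hypothetical helper. It is deliberately conservative: it flags a possible loss whenever the page is full and shares nothing with the previous snapshot, which includes a full page seen on the very first poll.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical identity of a BMS entry: (timestamp in minutes, LBA).
EntryKey = Tuple[int, int]


class RolloverDetector:
    """Flags polls where BMS entries may have been overwritten unseen."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity          # assumed known page capacity
        self.seen: Set[EntryKey] = set()  # snapshot from the previous poll

    def poll(self, current: Iterable[EntryKey]) -> Tuple[List[EntryKey], bool]:
        """Return (new entries, possibly_lost) for the current snapshot."""
        snapshot = set(current)
        new_entries = sorted(snapshot - self.seen)
        # A full page with no overlap with the last snapshot means every
        # entry was replaced between polls, so more may have come and gone.
        possibly_lost = (
            len(snapshot) >= self.capacity and snapshot.isdisjoint(self.seen)
        )
        self.seen = snapshot
        return new_entries, possibly_lost
```

When possibly_lost is reported, the monitor cannot tell how many entries were missed, so a reasonable reaction is to poll more often or to raise the alert to the user described above.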