PFC500 Preventive Maintenance

Purpose: to reduce the high drive failure rate on PFC500 RAID systems.

SCSI hard disks, although very reliable, suffer from physical defects like any other magnetic media. These defects cause soft errors, where data loss is recoverable, and hard errors, where data loss is irrecoverable. Soft errors can be ignored unless the log is being filled with them. There are two types of hard errors: hard read errors and hard write errors. Hard write errors are extremely rare; if one ever occurs, we recommend replacing that drive. Hard read errors are the more common type and usually indicate a possible bad block. The PFC500 RAID controller counts hard read errors, and when a drive reaches the limit of thirty, the controller automatically takes that drive offline and marks it failed. The PFC500 RAID controller cannot remap bad blocks on the fly; it requires a reboot to remap them. We therefore recommend rebooting the RAID system before any drive reaches this limit. Rebooting the RAID system remaps all bad blocks and resets the count to zero.

The PFC500 RAID system has a higher rate of drive failure, and most of the drives returned from the field also show a higher rate of no problem found. We recommend using this preventive maintenance procedure to lower the drive failure rate. As preventive maintenance, we recommend rebooting the RAID system, which also requires rebooting the Profile XP, before any drive reaches the limit of thirty hard read errors.

This document explains how to read the Profile log or Net Central log, which messages to look for, when to replace a drive, and when to reboot the RAID system.

a) Messages – Profile logs a tremendous amount of information that our service staff and engineers can use to track down problems. Not all messages indicate a problem. Expect to see informational messages on an occasional (but ongoing) basis. Hard drives are not deterministic, even though in the video realm we attempt to use them as if they were.

It is the nature of a hard drive that access times vary with multiple factors such as cylinder location on the media, fragmentation, background tasks of the drive, read vs. write functions, head movement, age, etc. Profile notes any transient discrepancies of the drives as informational messages even though these errors are not harmful. Such messages do not mean that a drive is problematic. Our approach to drive utilization is to inspect for and adjust to SCSI errors. Often an absolute failure comes many months or even years after the initial SCSI errors are reported.

Net Central Lite provides a tremendous amount of information and can help you troubleshoot a problem. Sometimes, however, it triggers a false alarm and recommends replacing a drive even though that drive has only a few soft SCSI errors. This issue will be fixed in a new version of the Net Central software. Until then, we recommend contacting customer service before ordering a replacement drive.

b) SCSI Errors –

1) Soft SCSI errors - All drives will produce soft SCSI errors. Typically these can be ignored unless the log is being filled with this error. On PFC500, a soft SCSI error means the drive has repaired the error.

2) Hard (unrecoverable) errors - Unrecoverable write errors are extremely rare. Read errors are more typical and usually denote a potential bad block on a drive.

On the PFC500, after 30 consecutive bad blocks the system will automatically take the drive offline. Rebooting before a drive reaches 30 bad blocks remaps the bad blocks.

Occasional bad blocks will be found in the log. Customers should only be concerned if they see repeated bad blocks from the same drive over time. On the PFC500, if this is not addressed, the system will eventually take the drive offline.
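The per-drive check described above can be sketched in a few lines of Python. This is only an illustration: the (drive, error kind) tuples, the "hard_read" label, the function name and the warning margin are all assumptions, not the Profile or Net Central log format. The events should cover only the period since the last RAID reboot, because a reboot resets the count.

```python
from collections import Counter

HARD_READ_LIMIT = 30  # from the text: the controller fails a drive at 30 hard read errors
WARN_MARGIN = 10      # assumption: how early to warn; not a documented value


def drives_needing_reboot(events, limit=HARD_READ_LIMIT, margin=WARN_MARGIN):
    """Return {drive_id: count} for drives close to the hard read error limit.

    `events` is an iterable of (drive_id, error_kind) tuples already extracted
    from the log by hand or by another tool (hypothetical format).
    """
    # Tally hard read errors per drive; every other error kind is ignored here.
    counts = Counter(drive for drive, kind in events if kind == "hard_read")
    return {drive: n for drive, n in counts.items() if n >= limit - margin}
```

With the assumed margin of 10, any drive returned by this function has 20 or more hard read errors and is a candidate for a scheduled RAID reboot.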

c) When to replace a drive - If drives are left unchecked or unreplaced, at some point they will fail. Drives are mechanical in nature and have parts that wear, so errors will occur. A bad block is often characterized by a group of hard SCSI errors occurring in a row within the Profile logs.
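As a rough illustration of spotting "a group of hard SCSI errors occurring in a row", the sketch below groups consecutive identical entries in log order. The (drive, error kind) tuples, the "hard" label, the function name and the minimum run length of 2 are assumptions made for the example; they are not taken from the Profile log format.

```python
from itertools import groupby


def hard_error_runs(events, min_run=2):
    """Return [(drive_id, run_length)] for runs of consecutive hard errors.

    `events` is a list of (drive_id, error_kind) tuples in log order
    (hypothetical format). `min_run` is an assumed cut-off for what counts
    as a "group" of errors.
    """
    runs = []
    # groupby collapses adjacent identical (drive, kind) entries into one group.
    for (drive, kind), group in groupby(events):
        length = sum(1 for _ in group)
        if kind == "hard" and length >= min_run:
            runs.append((drive, length))
    return runs
```

A drive that appears in several such runs over time is the pattern this section describes as a likely bad block.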

Error / PFC500
Hard Write Error / Always replace the drive.
Hard Read Error / Reboot the RAID system before the drive reaches 30 bad blocks. Otherwise the RAID controller will take that drive offline and mark it failed.
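The table above, together with the soft error guidance in section b), can be written as a small lookup. This is a minimal sketch; the function name, the error kind labels and the returned strings are invented for illustration and do not correspond to any Profile or Net Central interface.

```python
def recommended_action(error_kind, bad_blocks=0, limit=30):
    """Map an error kind to the action this document recommends.

    `error_kind` uses hypothetical labels: "soft", "hard_write", "hard_read".
    `bad_blocks` is the drive's bad block count since the last RAID reboot.
    """
    if error_kind == "soft":
        return "ignore unless the log is being filled"
    if error_kind == "hard_write":
        return "replace the drive"
    if error_kind == "hard_read":
        if bad_blocks < limit:
            return "reboot the RAID system"
        # At the limit the controller has already acted on its own.
        return "drive taken offline and marked failed"
    raise ValueError(f"unknown error kind: {error_kind!r}")
```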

d) Before replacing a bad drive: Open the Profile log or Net Central log and check whether more than one drive in a LUN has hard media errors or bad blocks. If more than one drive in a LUN has hard media errors or bad blocks, you must reboot the RAID system before you replace any of those drives. Otherwise you could end up rebuilding the new drive with bad data, which could lead to a corrupt file system.
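The pre-replacement check in d) can be sketched as follows. The (LUN, drive) pairs are an assumed, hand-extracted form of the log entries, and the function name is hypothetical; the logic only illustrates the rule that a LUN with hard media errors or bad blocks on more than one drive needs a reboot first.

```python
from collections import defaultdict


def luns_requiring_reboot(hard_errors):
    """Return the LUNs where more than one drive shows hard media errors.

    `hard_errors` is an iterable of (lun, drive_id) pairs, one per hard media
    error or bad block found in the log (hypothetical format).
    """
    drives_by_lun = defaultdict(set)
    for lun, drive in hard_errors:
        drives_by_lun[lun].add(drive)  # a set ignores repeat errors on one drive
    return sorted(lun for lun, drives in drives_by_lun.items() if len(drives) > 1)
```

If the LUN that holds the drive to be replaced appears in the result, reboot the RAID system before swapping the drive.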