HP Smart Array Cluster Storage System

Troubleshooting
HP Smart Array Cluster Storage System User Guide
HP CONFIDENTIAL
Writer: Rob Weaver File Name: j-appd Troubleshooting
Codename: Aurora Part Number: 240333-003 Last Saved On: 11/6/02 1:07 PM
Compromised Fault Tolerance
Compromised fault tolerance commonly occurs when more physical drives have
failed than the fault tolerance method can endure. In this case, the controller fails the
logical volume and returns unrecoverable disk error messages to the host. Data loss
is likely to occur.
An example of this situation is a RAID 5 logical drive in which one drive in an
array fails while the controller is still rebuilding another drive in the same array.
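The rule described above can be sketched as a simple count: fault tolerance is compromised when the number of unavailable drives in an array exceeds what the fault tolerance method can absorb. The following is a minimal illustration only. The function name, the table, and the failure limits are assumptions based on general RAID behavior; they are not read from the controller or taken from its firmware.

```python
# Illustrative limits only (assumed from general RAID behavior):
# how many simultaneous drive losses each method can absorb in one array.
MAX_TOLERATED_FAILURES = {
    "RAID 0": 0,  # no redundancy
    "RAID 5": 1,  # single distributed parity
    "RAID 6": 2,  # dual parity
}


def is_fault_tolerance_compromised(raid_level, failed_drives, rebuilding_drives=0):
    """Return True if the array has lost more drives than it can endure.

    A drive that is still rebuilding does not yet hold complete data,
    so this sketch counts it as unavailable.
    """
    unavailable = failed_drives + rebuilding_drives
    return unavailable > MAX_TOLERATED_FAILURES[raid_level]


# One failed drive in a RAID 5 array: degraded, but data is still available.
print(is_fault_tolerance_compromised("RAID 5", failed_drives=1))

# A second drive fails while the first replacement is still rebuilding.
print(is_fault_tolerance_compromised("RAID 5", failed_drives=1, rebuilding_drives=1))
```

The first call reports that fault tolerance is intact, and the second reports that it is compromised, which matches the RAID 5 example above.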
Fault tolerance may also be compromised by non-drive problems, such as a
faulty cable, a faulty storage system power supply, or a user accidentally turning off an
external storage system while the host system is powered on. In such cases,
physical drive replacement is not needed. However, data loss may have occurred,
especially if the system was busy when the problem occurred.
IMPORTANT: To minimize the risk of data loss from compromised fault tolerance, make
frequent backups of all logical volumes.
Procedure to Attempt Recovery
Inserting replacement drives when fault tolerance has been compromised does not
improve the condition of the logical volume. Instead, if unrecoverable error messages
are displayed on the screen, try the following procedure to recover data.
1. Check for loose, dirty, broken, or bent cabling and connectors on all devices.
2. Power down and remove power from the entire system. Remove and then reinsert
all hard drives and controllers.
CAUTION: Data can be lost if the drives are not firmly reseated.
3. Power up the system. In some cases, a marginal drive might work again
long enough for you to make copies of important files.
4. If a #02 or #04 message appears on the controller display, press the right button to
re-enable the logical volumes. Remember that data loss has probably occurred
and that any data on the logical volume is suspect.
5. Make copies of important data, if possible.