HOW TO REBUILD A FAILED DRIVE WITH PERC2, 2/si, 3/si, and 3/di RAID CONTROLLERS.

The first command to run before rebuilding any failed drive is container list. A failed drive will either show as a missing member or have an exclamation mark next to its id. All SCSI ids use the syntax (bus[channel]:scsi id:lun[always zero]). The end state necessary for a drive to rebuild is MISSING MEMBER (remember this).

Original Drive Rebuilding

a. Quicker method (but more difficult) :

If the drive is showing as a missing member, skip the next step. If there is an exclamation mark next to the drive's SCSI id instead:

  1. Use disk remove dead_partitions (bus, id, lun) with the id of the drive identified by the container list command. NEVER pull a drive that is not showing as a MISSING MEMBER. Instructions for preparing and removing a drive that is in an array and not failed or missing (i.e. SMART ALERTS) are given further down. A failed drive is still a member of its container, whether or not the container is degraded or critical, and if it is PULLED or INITIALIZED it will DROP the CONTAINER and DATA LOSS will occur.

  2. Next run controller rescan and then do another container list.

  3. If the drive is still not showing as a missing member, repeat the controller rescan.

  4. Next run container set failover x (bb,ii,ll) [x is the container number found in container list; bb,ii,ll is bus, id, lun]. If the drive is part of more than 1 container, start with the lowest-numbered container and proceed numerically up through all the containers the drive is part of.

For example: container set failover 0 (0,3,0) [SCSI id 3 for container 0]

You should hear the drive array being hammered. The command to check the status of the rebuild is task list.
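
When a drive belongs to more than one container, the ordering rule in step 4 can be sketched as below. This is only an illustration in Python that prints the command lines to type at the CLI prompt; the function name failover_commands and the sample container numbers are hypothetical and not part of the controller utility.

```python
def failover_commands(container_ids, bus, scsi_id, lun=0):
    """Return one 'container set failover' line per container, lowest number first.

    The command text follows the example in this document; the helper itself
    is a hypothetical convenience and does not talk to the controller.
    """
    drive = f"({bus},{scsi_id},{lun})"  # lun is always zero per this document
    ordered = sorted(set(container_ids))  # lowest container first, no repeats
    return [f"container set failover {c} {drive}" for c in ordered]


if __name__ == "__main__":
    # Hypothetical case: SCSI id 3 on bus 0 is a member of containers 2, 0 and 1.
    for line in failover_commands([2, 0, 1], bus=0, scsi_id=3):
        print(line)
```

Run task list after entering the commands to confirm that the rebuild has started.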

b. Easier method but requires reboot:

You have to have 2.X firmware on the controller and 2.5 or higher is preferred. Do NOT flash firmware on controller while drive is failed!

To check whether the controller's autorebuild feature is enabled, run controller show automatic_failover. If it is disabled, use controller set automatic_failover to turn autorebuild on.

  1. Run container list and identify which drive is failed (marked with an exclamation mark).

  2. Reboot the system and, while it is going through POST and before the raid controller initializes, pull the failed drive.

  3. The drive will then come up as a missing member when the raid controller initializes. After the system has booted, insert the drive; the autorebuild will kick in, reinitialize the drive, and start the rebuild. The autorebuild only works when the drive is in missing member status.

  4. task list -- will give status of rebuild

New Drive Rebuilding

  1. Follow steps 1 through 3 of section a until the drive is showing as a missing member, or follow the easier procedure in section b.

  2. Insert the new drive into the system once the old drive shows as a missing member. The raid controller should scan the bus and autospin the drive, and the autorebuild function will kick in on the controller.

  3. Use the task list command to monitor the rebuild progress.

Non-Failed Drive (SMART Alert)

  1. container list to see what drive is failed with exclamation mark.

  2. enclosure show slot -- to show slot versus scsi id

  3. enclosure prepare slot X (X is the slot number)

  4. enclosure show slot X again to see if the slot is deactivated

  5. Remove the drive and run enclosure prepare slot X again to reactivate the slot; the drive should then show as a missing member.

  6. controller rescan

  7. Insert the new drive; it should auto initialize and start the rebuild.

  8. task list to follow progress of rebuild

HOW TO RECONFIGURE CONTAINER ON PERC2, PERC2/si, PERC3/si, AND PERC3/di RAID CONTROLLERS WITH COMMAND LINE INTERFACE UTILITY.

Help for Container Reconfiguration Syntax and Problems Getting Command to Run

This document provides the information necessary to perform a successful reconfiguration of a container on Dell Perc2, 2/si, 3/si, and 3/di raid controllers. The most common errors and issues are listed below, followed by the syntax needed to solve them.

  1. A Raid5 container is reconfigured to add drives, but the container is the same size as before the reconfiguration.

  2. An error about needing an integer while trying to specify the 'partition_size=xxG' switch.

  3. A resource conflict error while trying a reconfigure on a Unix/Linux based system.

Issue Number 1

Container same size after adding drive to Raid5.

This error is a result of not specifying the free space available on the new drive that you want to use for the container. Use the container reconfiguration command syntax in the CLI Reference Guide to specify the drives you are pulling into the array and the free space that you need to add. If the container is reconfigured incorrectly, it will be very difficult to get the remaining free space into the drive array.

Example:

You have a 3 drive Raid5 array of 36GB drives and are adding a 4th drive. The original size of the Raid5 is 2 X 36GB = 72GB (one drive's worth of space is lost to parity, and the exact figure depends on format size). If reconfigured without specifying the free space available, the array will wind up with 72GB spread across 4 drives instead of the 108GB that is available.
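
The arithmetic in this example can be checked with a short Python sketch. It uses the nominal 36GB drive size from the example and the usual Raid5 rule that one drive's worth of space goes to parity; the function name raid5_usable_gb is made up for illustration, and real figures will be somewhat lower after formatting.

```python
def raid5_usable_gb(drive_count, per_drive_gb):
    """Nominal Raid5 capacity: one drive's worth of space is used for parity."""
    if drive_count < 3:
        raise ValueError("Raid5 needs at least 3 drives")
    return (drive_count - 1) * per_drive_gb


if __name__ == "__main__":
    before = raid5_usable_gb(3, 36)  # original 3 drive array
    after = raid5_usable_gb(4, 36)   # after adding the 4th drive correctly
    print(f"before: {before}GB, after: {after}GB, gained: {after - before}GB")
    # prints "before: 72GB, after: 108GB, gained: 36GB"
```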

Issue Number 2

Syntax Error of Expecting Integer on partition_size Switch While Reconfiguring Raid5

This is an error you can see while trying to specify the free space available on the new drive that is being added to the container, as mentioned in issue 1. It is a result of the controller not being able to process non-integer numbers (decimals). Most 36GB drives show 33.6GB of available free space under disk list. You have to use the syntax that is found under the Syntax and Conventions for CLI Commands link of the CLI Reference Guide. Since you can't use a decimal, you have to use a divided number to pass the correct size to the controller.

Example:

afa0>container reconfigure /partition_size=336G/10 /raid5=TRUE X (bb,ii,ll) [X is the container number; bb,ii,ll is bus, SCSI id, lun]

This specifies that the drive size is 33.6GB (336G divided by 10), which gets rid of the integer error and also uses all the free space available on the drive, avoiding issue 1.
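
Working out the divided number by hand is easy to get wrong, so here is a small Python sketch that turns a decimal size from disk list into the integer-and-divisor form shown above. The helper name partition_size_switch is hypothetical, and it assumes the CLI accepts any power-of-ten divisor the same way it accepts the /10 in the example; check the CLI Reference Guide before relying on anything other than /10.

```python
from decimal import Decimal


def partition_size_switch(size_gb):
    """Turn a size such as '33.6' into '/partition_size=336G/10'.

    size_gb is given as a string so that no floating point rounding creeps in.
    """
    value = Decimal(size_gb)
    exponent = value.as_tuple().exponent  # -1 for '33.6', 0 for '36'
    if exponent >= 0:
        # Already a whole number, so no divisor is needed.
        return f"/partition_size={int(value)}G"
    digits = -exponent                    # number of decimal places
    numerator = int(value.scaleb(digits)) # shift the decimal point right
    divisor = 10 ** digits
    return f"/partition_size={numerator}G/{divisor}"


if __name__ == "__main__":
    print(partition_size_switch("33.6"))  # prints "/partition_size=336G/10"
```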

Issue Number 3

Error on Unix/Linux System referring to Resource Conflict

This can be caused by the drive size not being specified correctly, as in issue number 2 above, but it can also occur when the drive being added does not have enough space available to bring into the container at its full size. That specific case can only be resolved by getting a drive that is large enough, which you can verify by comparing the sectors and actual space available on the drive.

Do NOT mix 7200rpm, 10000rpm, or 15000rpm drives in the same Raid array. They can be on the same controller if they are in separate containers, but cannot be in the same container.