How to replace an LSI RAID disk with MegaCli

The MegaCli64 command has a lot of command-line switches, and its syntax is cryptic. If you have identified a failed or failing disk, you can replace it using the MegaCli utility. I found the following commands useful when physically identifying a failed disk and replacing it.

IMPORTANT NOTE: Make sure the array is "Optimal" first!

/opt/megaraid/bin/MegaCli64 -LDInfo -Lall -aALL | grep State
State : Optimal
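If you want to script this check, here is a minimal sketch of a bash function that reads the -LDInfo output on stdin and succeeds only when every State line says Optimal. The name all_optimal is my own, and it assumes the "State : value" layout shown above.

```shell
#!/usr/bin/env bash
# all_optimal: read MegaCli -LDInfo output on stdin.
# Exit 0 only if at least one "State" line was seen and all of them are "Optimal".
all_optimal() {
  awk -F': *' '
    /^State[ \t]*:/ {
      seen = 1
      state = $2
      sub(/[ \t\r]+$/, "", state)   # drop trailing whitespace
      if (state != "Optimal") bad = 1
    }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}

# Example, using the path from this post:
# /opt/megaraid/bin/MegaCli64 -LDInfo -Lall -aALL | all_optimal || echo "Array is NOT optimal, stop here"
```

It deliberately fails when it sees no State line at all, so an empty or failed MegaCli run is not mistaken for a healthy array.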

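List physical drive info. Note the Enclosure Device ID and Slot Number of the disk you want to replace; these are the E and S in the [E:S] arguments below: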

/opt/megaraid/bin/MegaCli64 -PDList -a0
/opt/megaraid/bin/MegaCli64 -PDList -aALL | grep -e '^$' -e Slot -e Count -e Enclosure
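The full -PDList output is long. The hypothetical pd_summary function below condenses it to one line per drive with the [E:S] address, the Device Id and the firmware state. It assumes each drive's block lists Enclosure Device ID, Slot Number and Device Id before Firmware state, which is the usual MegaCli order, but verify it against your own output.

```shell
#!/usr/bin/env bash
# pd_summary: read MegaCli -PDList output on stdin and print one line per drive:
#   [ENCLOSURE:SLOT] devid=DEVICE_ID state=FIRMWARE_STATE
pd_summary() {
  awk -F': *' '
    function trim(s) { sub(/[ \t\r]+$/, "", s); return s }
    /^Enclosure Device ID:/ { enc = trim($2) }
    /^Slot Number:/         { slot = trim($2) }
    /^Device Id:/           { dev = trim($2) }
    # Firmware state follows the fields above, so print the drive here.
    /^Firmware state:/      { printf "[%s:%s] devid=%s state=%s\n", enc, slot, dev, trim($2) }
  '
}

# Example:
# /opt/megaraid/bin/MegaCli64 -PDList -aALL | pd_summary
```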

Here is the magic command to silence the alarm:

/opt/megaraid/bin/MegaCli64 -AdpSetProp AlarmSilence -aALL

Blink the LED on drive:

# to start
/opt/megaraid/bin/MegaCli64 -PdLocate -start -PhysDrv '[E:S]' -aALL
# to stop
/opt/megaraid/bin/MegaCli64 -PdLocate -stop -PhysDrv '[E:S]' -aALL
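It is easy to forget to switch the LED off again. Here is a small sketch that starts the locate LED, waits, and stops it. blink_drive and the MEGACLI variable are my own names, MEGACLI defaults to the path used in this post, and the function assumes MegaCli returns a zero exit status on success.

```shell
#!/usr/bin/env bash
# Path used throughout this post; override MEGACLI if your binary lives elsewhere.
MEGACLI="${MEGACLI:-/opt/megaraid/bin/MegaCli64}"

# blink_drive ENCLOSURE:SLOT [SECONDS]
# Turns the locate LED on, waits SECONDS (default 60), then turns it off.
blink_drive() {
  local drive="$1" seconds="${2:-60}"
  if ! [[ "$drive" =~ ^[0-9]+:[0-9]+$ ]]; then
    echo "usage: blink_drive ENCLOSURE:SLOT [SECONDS]" >&2
    return 2
  fi
  # Quote the [E:S] argument so the shell never treats it as a glob pattern.
  "$MEGACLI" -PdLocate -start -PhysDrv "[$drive]" -aALL || return 1
  sleep "$seconds"
  "$MEGACLI" -PdLocate -stop -PhysDrv "[$drive]" -aALL
}
```

For example, `blink_drive 32:2 120` blinks slot 2 in enclosure 32 for two minutes. If you interrupt it with Ctrl-C, run the -stop command by hand.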

Take the disk offline:

/opt/megaraid/bin/MegaCli64 -PDOffline -PhysDrv '[E:S]' -a0

Mark the disk as missing:

/opt/megaraid/bin/MegaCli64 -PDMarkMissing -PhysDrv '[E:S]' -a0

Prepare the disk for removal:

/opt/megaraid/bin/MegaCli64 -PDPrpRmv -PhysDrv '[E:S]' -a0
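The three steps above always run in the same order, so they can be wrapped in one function that stops at the first failure. This is a sketch, not a tested tool: retire_drive is a hypothetical name, the adapter defaults to 0 like the commands above, it assumes MegaCli exits non-zero when a step fails, and it does not check the array state for you, so do the Optimal check first.

```shell
#!/usr/bin/env bash
MEGACLI="${MEGACLI:-/opt/megaraid/bin/MegaCli64}"

# retire_drive ENCLOSURE:SLOT [ADAPTER]
# Runs offline -> mark missing -> prepare for removal, stopping at the first failure.
retire_drive() {
  local drive="$1" adapter="${2:-0}"
  if ! [[ "$drive" =~ ^[0-9]+:[0-9]+$ ]]; then
    echo "usage: retire_drive ENCLOSURE:SLOT [ADAPTER]" >&2
    return 2
  fi
  local step
  for step in -PDOffline -PDMarkMissing -PDPrpRmv; do
    echo "Running: $MEGACLI $step -PhysDrv [$drive] -a$adapter" >&2
    "$MEGACLI" "$step" -PhysDrv "[$drive]" "-a$adapter" || {
      echo "Step $step failed; not continuing." >&2
      return 1
    }
  done
}
```

For example, `retire_drive 32:2` runs the three commands above for slot 2 in enclosure 32 on adapter 0.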

Now physically replace the defective disk.

Check the status. While the new disk is being rebuilt, it shows up as "Rebuild":

/opt/megaraid/bin/MegaCli64 -PDList -aALL | grep "Firmware state"
Firmware state: Rebuild
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
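Rebuilding a large disk can take hours. The sketch below polls -PDList until no drive reports "Firmware state: Rebuild". Both function names are hypothetical, and the 300-second default interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
MEGACLI="${MEGACLI:-/opt/megaraid/bin/MegaCli64}"

# rebuild_in_progress: read -PDList output on stdin; succeed if any drive is rebuilding.
rebuild_in_progress() {
  grep -q '^Firmware state: *Rebuild'
}

# wait_for_rebuild [INTERVAL]: poll every INTERVAL seconds until no drive reports Rebuild.
wait_for_rebuild() {
  local interval="${1:-300}"
  while "$MEGACLI" -PDList -aALL | rebuild_in_progress; do
    sleep "$interval"
  done
}
```

Because the loop only looks at grep's exit status, a MegaCli run that fails or prints nothing also ends the wait, so re-run the Firmware state check afterwards to confirm every drive is Online.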

Checking drives using smartctl (the number after megaraid, is the drive's Device Id from the -PDList output):

/usr/sbin/smartctl -a -d megaraid,2 -H /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE
Product:              ST3600057SS
Revision:             ES66
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5005ef315eb
Serial number:        6SL5QTLM
Device type:          disk
Transport protocol:   SAS
Local Time is:        Wed May 20 05:56:05 2015 UTC
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     44 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 553885253
  Blocks received from initiator = 2109947912
  Blocks read from cache and sent to initiator = 774081696
  Number of read and write commands whose size <= segment size = 727594903
  Number of read and write commands whose size > segment size = 144
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 19261.48
  number of minutes until next internal SMART test = 11

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:   235631546        0         0  235631546   235631546       8429.942           0
write:         0        0         0         0          0      27589.036           0
verify: 1419827501        9         0  1419827510   1419827511      64352.695           1

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                  32       2                 - [-   -    -]
# 2  Background long   Completed                  32       2                 - [-   -    -]
# 3  Background short  Completed                  32       1                 - [-   -    -]

Long (extended) Self Test duration: 6400 seconds [106.7 minutes]
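To pick the key values out of output like the above, a hypothetical smart_summary function could look like this. It only knows the SAS output format shown here (ATA drives report health differently) and prints the health status, temperature, grown defect count and the last column (total uncorrected errors) of each error counter row.

```shell
#!/usr/bin/env bash
# smart_summary: read smartctl output for a SAS drive on stdin and print key=value lines.
smart_summary() {
  awk -F': *' '
    function trim(s) { sub(/[ \t\r]+$/, "", s); return s }
    /^SMART Health Status:/           { print "health=" trim($2) }
    /^Current Drive Temperature:/     { print "temperature=" trim($2) }
    /^Elements in grown defect list:/ { print "grown_defects=" trim($2) }
    # Error counter rows: the last column is "Total uncorrected errors".
    /^(read|write|verify):/ {
      n = split($0, f, " ")
      label = f[1]
      sub(/:$/, "", label)
      print label "_uncorrected=" f[n]
    }
  '
}

# Example:
# /usr/sbin/smartctl -a -d megaraid,2 /dev/sda | smart_summary
```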
