The MegaCli64 command has a large number of command-line switches, and its syntax is cryptic. Once you have identified a failed or failing disk, you can replace it using the MegaCli utility. I found the following commands useful when physically locating a failed disk and replacing it.
IMPORTANT NOTE: Make sure the array is “Optimal” first!
/opt/megaraid/bin/MegaCli64 -LDInfo -Lall -aALL | grep State
State : Optimal
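The check above can be turned into a guard for scripts. This is only a sketch: `all_optimal` is a hypothetical helper name, and it assumes the `State : Optimal` line format shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MegaCli): reads the output of
#   MegaCli64 -LDInfo -Lall -aALL
# on stdin and succeeds only if at least one "State" line was seen
# and every one of them reports Optimal.
all_optimal() {
    awk '
        BEGIN { total = 0; bad = 0 }
        /^State[ \t]*:/ {
            total++
            if ($0 !~ /:[ \t]*Optimal[ \t\r]*$/) bad++
        }
        END {
            if (total > 0 && bad == 0) exit 0
            exit 1
        }
    '
}

# Example use on a real host (commented out so the file can be sourced safely):
# /opt/megaraid/bin/MegaCli64 -LDInfo -Lall -aALL | all_optimal \
#     && echo "All logical drives are Optimal"
```

An empty input counts as a failure, so a MegaCli call that prints nothing is not mistaken for a healthy array.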
List physical drives info:
/opt/megaraid/bin/MegaCli64 -PDList -a0
/opt/megaraid/bin/MegaCli64 -PDList -aALL | grep -e '^$' -e Slot -e Count -e Enclosure
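To get the [E:S] pair for each drive in one glance, the listing can be condensed with awk. A minimal sketch, assuming the `Enclosure Device ID`, `Slot Number` and `Firmware state` lines that -PDList prints; `pd_summary` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: condenses `MegaCli64 -PDList -aALL` output (stdin)
# into one line per drive, in the form "[E:S] firmware state".
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc = $2 }
        /^Slot Number/         { slot = $2 }
        /^Firmware state/      { printf "[%s:%s] %s\n", enc, slot, $2 }
    '
}

# Example use on a real host (commented out so the file can be sourced safely):
# /opt/megaraid/bin/MegaCli64 -PDList -aALL | pd_summary
```

The bracketed value in each line is what the -PhysDrv commands below expect.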
Here is the magic command to silence the alarm:
/opt/megaraid/bin/MegaCli64 -AdpSetProp -AlarmSilence -aALL
Blink the LED on a drive (E is the enclosure device ID, S is the slot number):
# to start
/opt/megaraid/bin/MegaCli64 -PdLocate -start -physdrv[E:S] -aALL
# to stop
/opt/megaraid/bin/MegaCli64 -PdLocate -stop -physdrv[E:S] -aALL
Take the disk offline:
/opt/megaraid/bin/MegaCli64 -PDOffline -PhysDrv '[E:S]' -a0
Mark the disk as missing:
/opt/megaraid/bin/MegaCli64 -PDMarkMissing -PhysDrv '[E:S]' -a0
Prepare the disk for removal:
/opt/megaraid/bin/MegaCli64 -PDPrpRmv -PhysDrv '[E:S]' -a0
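The three steps above could be wrapped so they always run in the same order against the same drive. This is a sketch under my own assumptions: `prepare_removal`, the `RUN` switch and the `MEGACLI` variable are hypothetical names, and by default the commands are only printed, not executed.

```shell
#!/bin/sh
MEGACLI=${MEGACLI:-/opt/megaraid/bin/MegaCli64}

# Hypothetical wrapper around the offline / mark-missing / prepare-removal steps.
#   $1 = enclosure:slot, e.g. 32:1
#   $2 = adapter number (defaults to 0)
# Dry run by default: set RUN=1 to actually execute the commands.
prepare_removal() {
    drive=$1
    adapter=${2:-0}
    case $drive in
        [0-9]*:[0-9]*) ;;
        *) echo "usage: prepare_removal E:S [adapter]" >&2; return 2 ;;
    esac
    for step in -PDOffline -PDMarkMissing -PDPrpRmv; do
        if [ "${RUN:-0}" = 1 ]; then
            # Stop at the first failing step rather than carrying on.
            "$MEGACLI" "$step" -PhysDrv "[$drive]" "-a$adapter" || return 1
        else
            echo "$MEGACLI $step -PhysDrv '[$drive]' -a$adapter"
        fi
    done
}

# Example: print the commands for enclosure 32, slot 1 on adapter 0.
# prepare_removal 32:1 0
```

Printing first and executing only with RUN=1 gives a chance to double-check the [E:S] pair before anything is taken offline.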
Now you should replace the defective disk!
Check the status:
/opt/megaraid/bin/MegaCli64 -PDList -aALL | grep "Firmware state"
Firmware state: Rebuild
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
Firmware state: Online, Spun Up
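If you would rather not re-run that command by hand, a small polling loop can wait until no drive reports Rebuild. A sketch only: `rebuilding_count` and `wait_for_rebuild` are hypothetical names, and the loop relies on the `Firmware state: Rebuild` line shown above.

```shell
#!/bin/sh
MEGACLI=${MEGACLI:-/opt/megaraid/bin/MegaCli64}

# Count drives in the Rebuild state, given -PDList output on stdin.
# grep -c prints 0 but exits non-zero when nothing matches, hence "|| true".
rebuilding_count() {
    grep -c '^Firmware state: Rebuild' || true
}

# Hypothetical polling loop: sleeps until no drive reports Rebuild.
#   $1 = seconds between checks (defaults to 60)
# Note: if MegaCli itself fails and prints nothing, the count is 0 and
# the loop ends, so check the final state by hand afterwards.
wait_for_rebuild() {
    interval=${1:-60}
    while [ "$("$MEGACLI" -PDList -aALL | rebuilding_count)" -gt 0 ]; do
        sleep "$interval"
    done
}

# Example use on a real host (commented out so the file can be sourced safely):
# wait_for_rebuild 300 && echo "Rebuild finished"
```

For a per-drive progress figure, MegaCli also offers -PDRbld -ShowProg -PhysDrv '[E:S]' -a0.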
Checking Drives Using SMARTCTL (the number after "megaraid," is the drive's Device Id as reported by -PDList):
/usr/sbin/smartctl -a -d megaraid,2 -H /dev/sda
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-4-amd64] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

Vendor:               SEAGATE
Product:              ST3600057SS
Revision:             ES66
User Capacity:        600,127,266,816 bytes [600 GB]
Logical block size:   512 bytes
Logical Unit id:      0x5000c5005ef315eb
Serial number:        6SL5QTLM
Device type:          disk
Transport protocol:   SAS
Local Time is:        Wed May 20 05:56:05 2015 UTC
Device supports SMART and is Enabled
Temperature Warning Disabled or Not Supported
SMART Health Status: OK

Current Drive Temperature:     44 C
Drive Trip Temperature:        68 C
Elements in grown defect list: 0
Vendor (Seagate) cache information
  Blocks sent to initiator = 553885253
  Blocks received from initiator = 2109947912
  Blocks read from cache and sent to initiator = 774081696
  Number of read and write commands whose size <= segment size = 727594903
  Number of read and write commands whose size > segment size = 144
Vendor (Seagate/Hitachi) factory information
  number of hours powered up = 19261.48
  number of minutes until next internal SMART test = 11

Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:    235631546        0         0   235631546    235631546       8429.942           0
write:           0        0         0           0            0      27589.036           0
verify: 1419827501        9         0  1419827510   1419827511      64352.695           1

Non-medium error count:        0

SMART Self-test log
Num  Test              Status                 segment  LifeTime  LBA_first_err [SK ASC ASQ]
     Description                              number   (hours)
# 1  Background short  Completed                   32         2                - [-   -    -]
# 2  Background long   Completed                   32         2                - [-   -    -]
# 3  Background short  Completed                   32         1                - [-   -    -]

Long (extended) Self Test duration: 6400 seconds [106.7 minutes]
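To run the same health check across every drive behind the controller, the Device Id values from -PDList can be fed to smartctl one by one. A rough sketch: `smart_health` and `check_all_drives` are hypothetical names, and /dev/sda is assumed to be the block device exposed by the controller, as in the command above.

```shell
#!/bin/sh
MEGACLI=${MEGACLI:-/opt/megaraid/bin/MegaCli64}
SMARTCTL=${SMARTCTL:-/usr/sbin/smartctl}

# Extract the health verdict from smartctl output on stdin.
# SAS drives print "SMART Health Status: ..."; ATA drives print
# "SMART overall-health self-assessment test result: ...".
smart_health() {
    sed -n \
        -e 's/^SMART Health Status: *//p' \
        -e 's/^SMART overall-health self-assessment test result: *//p'
}

# Hypothetical loop: one health line per Device Id reported by -PDList.
#   $1 = block device behind the controller (defaults to /dev/sda)
check_all_drives() {
    dev=${1:-/dev/sda}
    "$MEGACLI" -PDList -aALL | awk -F': *' '/^Device Id/ { print $2 }' |
    while read -r id; do
        status=$("$SMARTCTL" -H -d "megaraid,$id" "$dev" </dev/null | smart_health)
        echo "megaraid,$id: ${status:-unknown}"
    done
}

# Example use on a real host (commented out so the file can be sourced safely):
# check_all_drives /dev/sda
```

A drive that prints neither health line is reported as "unknown" instead of being silently skipped.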