Determining when Disks should be replaced on Oracle Exadata Database Machine
(Doc ID 1452325.1)
Last updated on AUGUST 30, 2023
Applies to:
Exadata X6-2 Hardware
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Exadata Database Machine X2-2 Half Rack - Version All Versions and later
Exadata Database Machine V2 - Version All Versions and later
Information in this document applies to any platform.
Purpose
This document explains which I/O errors require disk replacement, which do not, and which should be investigated further. I/O errors can be reported in different places for different reasons, and not every I/O error is due to a physical hard disk problem that requires replacement.
Troubleshooting Steps
In this Document
Purpose
Troubleshooting Steps
About Disk Error Handling:
Errors for which Disk Replacement is Recommended:
Case R1. Cell's alerthistory reports the drive has changed its S.M.A.R.T. status to "Predictive Failure":
Case R2. Cell's alerthistory reports the drive LUN has experienced a critical error from which it cannot recover:
Case R4. DB nodes where the "Predictive Failure Count" is >0, even if the drive status shows as "Online".
Case R5. Storage Cells where the drive cell status is "Warning" and the MegaCli status is "Firmware State: (Unconfigured Bad)". The Cell's alerthistory may report the drive with a "not present" alert.
Case R6. Storage Cells where the drive cell status is "Warning - Poor Performance" even though the MegaCli status is "Firmware State: Online" and there do not appear to be any error counts.
Errors for which Disk Replacement is NOT Recommended:
Case N1. The Media Error counters reported by MegaCli in PdList or LdPdInfo outputs in a sundiag. On Storage Servers, these are also reported by Cellsrv in the physical disk view:
Case N2. The Other Error counters reported by MegaCli in PdList or LdPdInfo outputs in a sundiag. On Storage Servers, these are also reported by Cellsrv in the physical disk view:
Case N3. ASM logs on the DB node show I/O error messages in *.trc files similar to:
Case N4. Oracle Enterprise Manager users of the Exadata plug-ins may see alerts marked "Critical" for all I/O errors.
Case N5. A disk with Firmware status "Unconfigured(good)".
Case N6. A Storage Server disk reported with an alert as "Status: Warning - Confined Offline" followed by a second alert of "Status: Normal".
Conclusion:
References
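
As a rough illustration only, the case headings listed above can be summarized as a small lookup. The sketch below is not an Oracle tool: the function name `classify_disk`, its parameters, and the `Action` enum are hypothetical, and the inputs are assumed to be status strings already copied by hand from the cell's alerthistory or from MegaCli output. It covers only the cases whose headings quote a status string (R1, R4, R5, R6, N5). The "Normal plus Online" rule is an added assumption standing in for N1 and N2, where error counters alone do not call for replacement. Anything else falls through to "investigate further".

```python
import re
from enum import Enum


class Action(Enum):
    REPLACE = "replace"
    DO_NOT_REPLACE = "do not replace"
    INVESTIGATE = "investigate further"


def _norm(text: str) -> str:
    """Lowercase and turn punctuation into spaces, so that
    'Unconfigured(good)' and '(Unconfigured Bad)' compare cleanly."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", text.lower()).split())


def classify_disk(cell_status: str = "",
                  firmware_state: str = "",
                  predictive_failure_count: int = 0) -> Action:
    """Hypothetical helper: map hand-collected status strings to an action."""
    status = _norm(cell_status)
    firmware = _norm(firmware_state)

    # Cases R1 and R4: predictive failure status, or a non-zero
    # Predictive Failure Count even when the drive shows as Online.
    if "predictive failure" in status or predictive_failure_count > 0:
        return Action.REPLACE

    # Case R6: poor performance, even with firmware state Online.
    if "poor performance" in status:
        return Action.REPLACE

    # Case R5: cell status Warning together with Unconfigured Bad.
    if status.startswith("warning") and "unconfigured bad" in firmware:
        return Action.REPLACE

    # Case N5: Unconfigured(good) is not a reason to replace.
    if "unconfigured good" in firmware:
        return Action.DO_NOT_REPLACE

    # Assumption (stands in for N1/N2): a Normal, Online disk is not
    # replaced on the basis of media or other error counters alone.
    if status == "normal" and "online" in firmware.split():
        return Action.DO_NOT_REPLACE

    # Everything else needs a closer look before deciding.
    return Action.INVESTIGATE
```

For example, `classify_disk("Warning - Poor Performance", "Online")` returns `Action.REPLACE`, while `classify_disk("Warning", "Online")` returns `Action.INVESTIGATE`. The detailed criteria in each case section take precedence over this summary.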