Determining when Disks should be replaced on Oracle Exadata Database Machine (Doc ID 1452325.1)

Last updated on AUGUST 30, 2023

Applies to:

Exadata X6-2 Hardware
Exadata Database Machine X2-2 Hardware - Version All Versions and later
Exadata X3-2 Hardware - Version All Versions and later
Exadata Database Machine X2-2 Half Rack - Version All Versions and later
Exadata Database Machine V2 - Version All Versions and later
Information in this document applies to any platform.

Purpose

This document explains which I/O errors require disk replacement, which do not, and which should be investigated further. I/O errors can be reported in different places for different reasons, and not every I/O error is due to a physical hard disk problem that requires replacement.

Troubleshooting Steps

In this Document
Purpose
Troubleshooting Steps
 About Disk Error Handling:
 Errors for which Disk Replacement is Recommended:
 Case R1. Cell's alerthistory reports the drive has changed its S.M.A.R.T. status to "Predictive Failure":
 Case R2. Cell's alerthistory reports the drive LUN has experienced a critical error from which it cannot recover:
 Case R3. DB nodes where the MegaCli status is shown as "Firmware state: (Unconfigured Bad)" or "Firmware state: Failed", preceded by logged errors indicating the drive was Failed or Predictive Failed:
 Case R4. DB nodes where the "Predictive Failure Count" is > 0, even if the drive status shows as "Online".
 Case R5. Storage Cells where the drive cell status is "Warning" and the MegaCli status is "Firmware State: (Unconfigured Bad)". The Cell's alerthistory may report the drive with a "not present" alert.
 Case R6. Storage Cells where the drive cell status is "Warning - Poor Performance" even though the MegaCli status is "Firmware State: Online" and there do not appear to be any error counts.
 Errors for which Disk Replacement is NOT Recommended:
 Case N1. The Media Error counters reported by MegaCli in PdList or LdPdInfo outputs in a sundiag. On Storage Servers, these are also reported by Cellsrv in the physical disk view:
 Case N2. The Other Error counters reported by MegaCli in PdList or LdPdInfo outputs in a sundiag. On Storage Servers, these are also reported by Cellsrv in the physical disk view:
 Case N3. ASM logs on the DB node show I/O error messages in *.trc files similar to:
 Case N4. Oracle Enterprise Manager users of the Exadata plug-ins may see alerts marked "Critical" for all I/O errors.
 Case N5. A disk with Firmware status "Unconfigured(good)".
 Case N6. A Storage Server disk reported with an alert of "Status: Warning - Confined Offline" followed by a second alert of "Status: Normal".
 Conclusion:
References
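
The case list above can be read as a small decision table. The following Python sketch is only an illustration of that reading, not Oracle tooling: the DiskObservation class, its field names, and the classify function are hypothetical, and the values are assumed to have been collected by hand from MegaCli output, the cell's physical disk view, and alerthistory. Only the status strings quoted in the case titles are used. Cases N3 and N4 are omitted to keep the sketch short, and anything that matches no listed case falls through to "investigate".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiskObservation:
    """Hypothetical container; field names are illustrative, not Oracle's."""
    node_type: str                      # "cell" (storage server) or "db"
    firmware_state: str = ""            # e.g. "Online", "(Unconfigured Bad)"
    cell_status: str = ""               # e.g. "Warning - Poor Performance"
    predictive_failure_count: int = 0
    media_error_count: int = 0
    other_error_count: int = 0
    prior_failure_logged: bool = False  # logs show Failed / Predictive Failed
    lun_critical_error: bool = False    # alerthistory: unrecoverable LUN error
    alerts: List[str] = field(default_factory=list)  # statuses, oldest first


def _norm(text: str) -> str:
    """Lower-case and keep only letters and digits, so that
    "(Unconfigured Bad)" and "Unconfigured Bad" compare equal."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def _confined_then_normal(alerts: List[str]) -> bool:
    """True if a "Confined Offline" alert is followed by a "Normal" one."""
    for i, alert in enumerate(alerts):
        if "confinedoffline" in alert:
            return any("normal" in later for later in alerts[i + 1:])
    return False


def classify(obs: DiskObservation) -> str:
    """Return "replace", "no_replace", or "investigate"."""
    fw = _norm(obs.firmware_state)
    status = _norm(obs.cell_status)
    alerts = [_norm(a) for a in obs.alerts]
    is_cell = obs.node_type == "cell"
    is_db = obs.node_type == "db"

    # Cases R1-R6: replacement recommended.
    if is_cell and any("predictivefailure" in a for a in alerts):      # R1
        return "replace"
    if is_cell and obs.lun_critical_error:                             # R2
        return "replace"
    if is_db and fw in ("unconfiguredbad", "failed") and obs.prior_failure_logged:  # R3
        return "replace"
    if is_db and obs.predictive_failure_count > 0:                     # R4
        return "replace"
    if is_cell and status == "warning" and fw == "unconfiguredbad":    # R5
        return "replace"
    if is_cell and status == "warningpoorperformance":                 # R6
        return "replace"

    # Cases N1, N2, N5, N6: replacement not recommended on this evidence alone.
    if fw == "unconfiguredgood":                                       # N5
        return "no_replace"
    if is_cell and _confined_then_normal(alerts):                      # N6
        return "no_replace"
    if obs.media_error_count > 0 or obs.other_error_count > 0:         # N1, N2
        return "no_replace"

    # No listed case matched: fall back to further investigation.
    return "investigate"
```

For example, classify(DiskObservation(node_type="db", firmware_state="Online", predictive_failure_count=1)) returns "replace" under Case R4, while a DB node drive whose only symptom is a non-zero media error count returns "no_replace" under Case N1. The replacement checks run first so that a drive matching any R case is never masked by an N case.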
