Flashcache missing, in status critical after multiple "Flash disk removed" alerts (Doc ID 1383267.1)

Last updated on JULY 08, 2014

Applies to:

Oracle Exadata Storage Server Software - Version 11.2.2.3.2 to 11.2.2.4.0 [Release 11.2]
Information in this document applies to any platform.
Versions affected: 11.2.2.3.0 to 11.2.2.4.0


Symptoms

One or more storage cells generate multiple critical "Flash disk removed" alerts, one for each of the cell's 16 flash disks. The alerts are also logged in the output of "cellcli -e list alerthistory". Sample:

     25_1     2012-12-01T20:12:24-05:00     critical     "Flash disk removed.  Status        : NOT PRESENT  Manufacturer  : MARVELL  Model Number  : SD88SA02
Size          : 23G  Serial Number : 1048M052XE  Firmware      : D20Y  Slot Number   : PCI
Slot: 5; FDOM: 2  Cell Disk     : FD_14_dmorlx8cel01  Grid Disk     : No grid disks exist
Flash Cache   : Present  Error Count   : 0  Last Failure  : unknown"
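
As a quick check of the symptom above, the alerts can be counted from the alert history. This is a minimal sketch: the helper name count_flash_removed is hypothetical, and it assumes each alert contains the literal text "Flash disk removed" exactly once, as in the sample.

```shell
# Count "Flash disk removed" alerts in alert-history text read from stdin.
# Hypothetical helper; assumes one occurrence of the phrase per alert.
count_flash_removed() {
    grep -c 'Flash disk removed'
}

# On a storage cell (not executed here):
#   cellcli -e list alerthistory | count_flash_removed
```

A count of 16 on a single cell would match the one-alert-per-flash-disk pattern described above.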



Running "cellcli -e list flashcache detail" on the affected cell shows a flash cache whose status is critical, with effectiveCacheSize: 0 and 16 degradedCelldisks. Sample:

     name:                   SAGEcel01_FLASHCACHE
     cellDisk:               
     creationTime:           2010-06-14T16:43:34+02:00
     degradedCelldisks:      FD_05_SAGEcel01,FD_07_SAGEcel01,FD_06_SAGEcel01,FD_11_SAGEcel01,FD_08_SAGEcel01,
 FD_09_SAGEcel01,FD_15_SAGEcel01,FD_04_SAGEcel01,FD_13_SAGEcel01,FD_02_SAGEcel01,
 FD_10_SAGEcel01,FD_03_SAGEcel01,FD_14_SAGEcel01,FD_00_SAGEcel01,FD_12_SAGEcel01,
 FD_01_SAGEcel01
     effectiveCacheSize:     0
     id:                     b1cb903f-b04c-4abe-94f9-7181447662b8
     size:                   365.25G
     status:                 critical
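
The three values of interest can be pulled out of that output with a small filter. This is a sketch only: the function name flashcache_summary is hypothetical, and it assumes the "attribute: value" layout shown in the sample, with degradedCelldisks wrapping onto continuation lines that do not begin with an attribute name.

```shell
# Summarize flashcache detail output read from stdin.
# Hypothetical helper; assumes the "attribute: value" layout of the sample.
flashcache_summary() {
    awk '
        # A line whose first field ends in ":" starts a new attribute.
        $1 ~ /:$/                   { in_deg = ($1 == "degradedCelldisks:") }
        # Count FD_nn_ names only inside the degradedCelldisks attribute.
        in_deg                      { n += gsub(/FD_[0-9]+_/, "&") }
        $1 == "status:"             { status = $2 }
        $1 == "effectiveCacheSize:" { size = $2 }
        END { printf "status=%s effectiveCacheSize=%s degraded=%d\n", status, size, n }
    '
}

# On a storage cell (not executed here):
#   cellcli -e list flashcache detail | flashcache_summary
```

For the sample above, the values to look for are status=critical, effectiveCacheSize=0 and degraded=16.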



While the above symptoms can have other causes, the presence of the following error message in /var/log/oracle/diag/asm/cell/`hostname -s`/trace/ms-odl.trc uniquely identifies the issue as the one described here:

[2011-11-29T19:31:33.192+01:00] [ossmgmt] [NOTIFICATION] [CELL-08517]
[ms.hwadapter.util.ExecCmdStream] [tid: 550]
[ecid: 10.31.220.74:73419:1322591493192:6147,0]
Error java.lang.Exception: Failed to parse the command. Output of the command
is not as expected while parsing line 00.0-sas-0x5080020000c88aa0:1 in the
output of command type ls -l /dev/disk/by-path/pci-0000*sas* | cut -d: -f3,4,7.
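
A search of the trace file for this message could look like the following. It is a sketch under assumptions: the function name has_parse_failure is hypothetical, and because the exact line wrapping inside ms-odl.trc is not known here, the file is flattened to a single line before matching. On a very large trace file that flattening may be slow.

```shell
# Return success if the trace file ($1) contains the parse-failure message.
# Hypothetical helper; newlines and repeated spaces are collapsed first so
# that the match does not depend on how the message is wrapped in the file.
has_parse_failure() {
    tr '\n' ' ' < "$1" | tr -s ' ' |
        grep -qF 'Failed to parse the command. Output of the command is not as expected'
}

# On a storage cell (not executed here):
#   has_parse_failure "/var/log/oracle/diag/asm/cell/$(hostname -s)/trace/ms-odl.trc" \
#       && echo "parse-failure message found"
```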

Changes

The storage cell's uptime has exceeded 180 days. This is the key precursor to the issue described in this document. Uptime can be obtained by running the w or uptime command.
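
The 180-day condition can also be tested in a script. The sketch below is not part of the original note: it assumes a Linux cell where /proc/uptime reports uptime in seconds as its first field, and the function name uptime_exceeds_180_days is hypothetical.

```shell
# Return success if uptime is greater than 180 days.
# Hypothetical helper; reads a file in /proc/uptime format
# (first field = uptime in seconds), defaulting to /proc/uptime.
uptime_exceeds_180_days() {
    awk 'NR == 1 { over = ($1 > 180 * 86400) } END { exit !over }' \
        "${1:-/proc/uptime}"
}

# On a storage cell (not executed here):
#   uptime_exceeds_180_days && echo "uptime is over 180 days"
```

The comparison is done in seconds rather than whole days so that an uptime of, say, 180 days and 12 hours is still reported as exceeding the threshold.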

Cause
