Exadata cell has I/O errors and resync of some md devices never completes (Doc ID 1328727.1)

Last updated on MAY 06, 2015

Applies to:

Oracle Exadata Storage Server Software - Version 11.2.2.2.0 to 11.2.2.2.0 [Release 11.2]
Information in this document applies to any platform.

Symptoms

This issue is only known to affect storage cells running old Exadata Storage Server Software versions (prior to 11.2.2.3.0). However, the information presented here may also apply in part to issues that do not share the same cause.

The /var/log/messages file may report recurring I/O errors similar to:

scsi 0:2:0:0: rejecting I/O to dead device
scsi 0:2:0:0: rejecting I/O to dead device
raid1: sda: unrecoverable I/O read error for block 128
scsi 0:2:0:0: rejecting I/O to dead device
raid1: sda: unrecoverable I/O read error for block 256
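
To gauge how often these messages recur, the log can be summarized with a short script. The following is a minimal sketch, not an Oracle-supplied tool: the function name summarize_io_errors is hypothetical, and the two patterns are assumptions derived only from the sample lines above, so other kernel versions may word the messages differently.

```python
import re
from collections import Counter

# Patterns copied from the sample messages above (assumption: the wording
# on the affected cell matches these excerpts).
DEAD_DEVICE = re.compile(r"scsi (\d+:\d+:\d+:\d+): rejecting I/O to dead device")
READ_ERROR = re.compile(r"raid1: (\w+): unrecoverable I/O read error for block (\d+)")


def summarize_io_errors(lines):
    """Count dead-device rejections per SCSI address and read errors per disk."""
    dead = Counter()
    read_errors = Counter()
    for line in lines:
        match = DEAD_DEVICE.search(line)
        if match:
            dead[match.group(1)] += 1
            continue
        match = READ_ERROR.search(line)
        if match:
            read_errors[match.group(1)] += 1
    return dead, read_errors
```

On a cell, the function could be fed the lines of /var/log/messages, for example with `summarize_io_errors(open("/var/log/messages"))`; reading that file typically requires root privileges.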



The /proc/mdstat dynamic file, which gives an overview of the status of the RAID 1 arrays managed by the Multiple Device (MD) kernel driver, exhibits the following characteristics. The numbered entries marked with >>> and <<< are explained further below.

Personalities : [raid1]
1.>>> md4 : active raid1 sdad1[2] sda1[3](F) sdb1[1] <<<
120384 blocks [2/1] [_U]
2.>>>>> resync=DELAYED <<<

md5 : active raid1 sdad5[2] sda5[0] sdb5[3](F)
10482304 blocks [2/1] [U_]
resync=DELAYED

3.>>> md6 : active raid1 sdae6[1] sdad6[0] sda6[2](F) sdb6[3](F)<<<
10482304 blocks [2/2] [UU]

md7 : active raid1 sdad7[2] sda7[3](F) sdb7[1]
2096384 blocks [2/1] [_U]
4.>>> [>....................] recovery = 0.0% (192/2096384) finish=174.6min speed=192K/sec  <<<

md8 : active raid1 sdae8[1] sdad8[0] sda8[2](F) sdb8[3](F)
2096384 blocks [2/2] [UU]

md1 : active raid1 sdad10[2] sda10[0] sdb10[3](F)
714752 blocks [2/1] [U_]
resync=DELAYED

md11 : active raid1 sdae11[1] sdad11[0] sda11[2](F) sdb11[3](F)
2433728 blocks [2/2] [UU]

md2 : active raid1 sdad9[2] sda9[0] sdb9[3](F)
2096384 blocks [2/1] [U_]
resync=DELAYED

unused devices: <none>


1. All existing md devices are made up of 3 or 4 member disks, such as sda, sdb, sdad, and sdae. For this issue to apply, certain md devices must have 4 member disks.
2. For certain devices, there is a "resync=DELAYED" flag, indicating a degraded or faulty device array.
3. The devices made up of 4 member disks do not have the "resync=DELAYED" flag. Instead, "[2/2] [UU]" is present, indicating a healthy, clean, synced device.
4. A recovery (resync) is being attempted for one of the devices. However, both the speed and the completed percentage are very low, and they remain unchanged over time.
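
The four characteristics above can be checked programmatically. The following is a minimal sketch under stated assumptions: it expects unannotated /proc/mdstat text (without the >>> markers used in this note), handles only arrays reported as "active", and the names parse_mdstat and stalled_recoveries are hypothetical helpers, not part of any Oracle or mdadm tooling.

```python
import re

# Assumed line shapes, based on the /proc/mdstat excerpt in this note.
ARRAY_LINE = re.compile(r"^(md\d+) : active (\w+) (.+)$")
MEMBER = re.compile(r"(\w+)\[\d+\](\(F\))?")
STATUS = re.compile(r"\[\d+/\d+\] \[([U_]+)\]")
RECOVERY = re.compile(r"recovery\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\)")


def parse_mdstat(text):
    """Return {array name: details} for each active array in mdstat text."""
    arrays, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        header = ARRAY_LINE.match(line)
        if header:
            found = MEMBER.findall(header.group(3))
            current = {
                "members": [name for name, _ in found],
                "failed": [name for name, flag in found if flag],  # marked (F)
                "state": None,             # e.g. "UU", "_U", "U_"
                "delayed": False,          # resync=DELAYED seen
                "recovered_blocks": None,  # progress of an ongoing recovery
            }
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue
        status = STATUS.search(line)
        if status:
            current["state"] = status.group(1)
        if "resync=DELAYED" in line:
            current["delayed"] = True
        recovery = RECOVERY.search(line)
        if recovery:
            current["recovered_blocks"] = int(recovery.group(1))
    return arrays


def stalled_recoveries(earlier, later):
    """Names of arrays whose recovery did not advance between two samples."""
    stalled = []
    for name, before in earlier.items():
        after = later.get(name)
        if after is None or before["recovered_blocks"] is None:
            continue
        if (after["recovered_blocks"] is not None
                and after["recovered_blocks"] <= before["recovered_blocks"]):
            stalled.append(name)
    return stalled
```

A possible usage is to read /proc/mdstat twice, some minutes apart, parse both samples, and pass them to stalled_recoveries. Arrays with 3 members and "delayed" set correspond to points 1 and 2, arrays with 4 members and state "UU" correspond to point 3, and any name returned by stalled_recoveries corresponds to point 4.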
 

Changes

Both system disks have been replaced, either in quick succession or some time apart.

Cause
