Oracle Linux: Failure in Activating Replaced Disk in RAID 1 "md1: recovery interrupted." "Spare Devices" (Doc ID 2678495.1)

Last updated on JUNE 18, 2020

Applies to:

Linux OS - Version Oracle Linux 6.8 with Unbreakable Enterprise Kernel [4.1.12] and later
Linux x86-64

Symptoms

Environment details:

The system has a RAID 1 array built from disks sdax and sdaw, which holds the OS partition. sdaw was replaced because of hardware errors.

After the failed disk was replaced, the new disk is shown with state "(S)" (spare) in the output of cat /proc/mdstat:

#mdadm --detail /dev/md1;cat /proc/mdstat
/dev/md1:
Version : 0.90
Creation Time : Tue Feb 21 19:35:03 2012
Raid Level : raid1
Array Size : 488279488 (465.66 GiB 500.00 GB)
Used Dev Size : 488279488 (465.66 GiB 500.00 GB)
Raid Devices : 1
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Mon Jun 1 14:40:35 2020
State : clean
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1 <<<<<<=======
UUID : d7cfbdf2:05bd07fb:0e2dcd30:aabf8c4d
Events : 0.39655343
Number   Major   Minor   RaidDevice   State
   0       67      18        0        active sync   /dev/sdax2
   1       67       2        -        spare         /dev/sdaw2   <<<<<<======= Not active sync.
Personalities : [raid1]
md1 : active raid1 sdaw2[1](S)(R) sdax2[0] <<<<<<===== Device state "(S)"
488279488 blocks [1/1] [U]
md0 : active raid1 sdaw1[0] sdax1[1]
104320 blocks [2/2] [UU]
unused devices: <none>
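
The "(S)" flag in the output above can also be detected with a script. The following is a minimal sketch, not part of the original note: the function name list_md_spares is hypothetical, and it assumes the /proc/mdstat layout shown above, where each array line has the form "mdN : active raid1 member[n](flags) ...".

```shell
# list_md_spares: read /proc/mdstat-style text on stdin and print
# "<array> <member>" for every member carrying the (S) spare flag.
# (Hypothetical helper; assumes the mdstat layout shown in this note.)
list_md_spares() {
    awk '
        $1 ~ /^md/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                bracket = index($i, "[")
                # A member looks like "sdaw2[1](S)(R)": name, "[slot]", flags.
                if (bracket > 1 && index($i, "(S)") > bracket) {
                    print $1, substr($i, 1, bracket - 1)
                }
            }
        }
    '
}

# Example with the md1 and md0 lines from the output above:
printf '%s\n' \
    'md1 : active raid1 sdaw2[1](S)(R) sdax2[0]' \
    'md0 : active raid1 sdaw1[0] sdax1[1]' | list_md_spares
# prints: md1 sdaw2
```

On a live system the same function could be fed directly from the kernel, for example: list_md_spares < /proc/mdstat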

Adding the new disk to the array with the command below reports no error on the command line:

#mdadm --manage /dev/md1 --add /dev/sdaw2

However, the following errors are logged in /var/log/messages for the sdax device:

Jun 1 14:25:43 <hostname> kernel: [23545.623935] ata2.00: exception Emask 0x0 SAct 0x600e2803 SErr 0x0 action 0x0
Jun 1 14:25:43 <hostname> kernel: [23545.631107] ata2.00: irq_stat 0x40000008
Jun 1 14:25:43 <hostname> kernel: [23545.635149] ata2.00: failed command: READ FPDMA QUEUED
Jun 1 14:25:43 <hostname> kernel: [23545.640406] ata2.00: cmd 60/00:e8:4d:4e:13/05:00:10:00:00/40 tag 29 ncq 655360 in
Jun 1 14:25:43 <hostname> kernel: [23545.640406] res 41/40:00:2f:52:13/00:00:10:00:00/40 Emask 0x409 (media error)
Jun 1 14:25:43 <hostname> kernel: [23545.656576] ata2.00: status: { DRDY ERR }
Jun 1 14:25:43 <hostname> kernel: [23545.660699] ata2.00: error: { UNC }
Jun 1 14:25:43 <hostname> kernel: [23545.670856] ata2.00: configured for UDMA/133
Jun 1 14:25:43 <hostname> kernel: [23545.675290] sd 4:0:0:0: [sdax] tag#29 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 1 14:25:43 <hostname> kernel: [23545.683952] sd 4:0:0:0: [sdax] tag#29 Sense Key : Medium Error [current] [descriptor]
Jun 1 14:25:43 <hostname> kernel: [23545.692095] sd 4:0:0:0: [sdax] tag#29 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 1 14:25:43 <hostname> kernel: [23545.701193] sd 4:0:0:0: [sdax] tag#29 CDB: Read(10) 28 00 10 13 4e 4d 00 05 00 00
Jun 1 14:25:43 <hostname> kernel: [23545.708900] blk_update_request: I/O error, dev sdax, sector 269701679
Jun 1 14:25:43 <hostname> kernel: [23545.715476] ata2: EH complete
Jun 1 14:25:44 <hostname> kernel: [23547.082941] ata2.00: exception Emask 0x0 SAct 0x1a0 SErr 0x0 action 0x0
Jun 1 14:25:44 <hostname> kernel: [23547.089680] ata2.00: irq_stat 0x40000008
Jun 1 14:25:44 <hostname> kernel: [23547.093720] ata2.00: failed command: READ FPDMA QUEUED
Jun 1 14:25:44 <hostname> kernel: [23547.101356] ata2.00: cmd 60/08:28:2d:52:13/00:00:10:00:00/40 tag 5 ncq 4096 in
Jun 1 14:25:44 <hostname> kernel: [23547.101356] res 41/40:00:2f:52:13/00:00:10:00:00/40 Emask 0x409 (media error)
Jun 1 14:25:44 <hostname> kernel: [23547.117271] ata2.00: status: { DRDY ERR }
Jun 1 14:25:44 <hostname> kernel: [23547.121397] ata2.00: error: { UNC }
Jun 1 14:25:44 <hostname> kernel: [23547.130899] ata2.00: configured for UDMA/133
Jun 1 14:25:44 <hostname> kernel: [23547.135297] sd 4:0:0:0: [sdax] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 1 14:25:44 <hostname> kernel: [23547.143874] sd 4:0:0:0: [sdax] tag#5 Sense Key : Medium Error [current] [descriptor]
Jun 1 14:25:44 <hostname> kernel: [23547.151925] sd 4:0:0:0: [sdax] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 1 14:25:44 <hostname> kernel: [23547.160933] sd 4:0:0:0: [sdax] tag#5 CDB: Read(10) 28 00 10 13 52 2d 00 00 08 00
Jun 1 14:25:44 <hostname> kernel: [23547.168545] blk_update_request: I/O error, dev sdax, sector 269701679
Jun 1 14:25:44 <hostname> kernel: [23547.175106] ata2: EH complete
Jun 1 14:25:44 <hostname> kernel: [23547.175110] md/raid1:md1: sdax: unrecoverable I/O read error for block 269492736
Jun 1 14:25:44 <hostname> kernel: [23547.175137] md: md1: recovery interrupted.
Jun 1 14:26:34 <hostname> kernel: [23597.148569] sd 0:0:10:0: Mode parameters changed

The log shows input/output (media) errors on the existing disk sdax while the array is resyncing, after which the recovery of md1 is interrupted.
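
To relate the "CDB: Read(10)" lines to the failing sector reported by blk_update_request, the CDB bytes can be decoded by hand. The sketch below is an illustration only: the function name read10_decode is hypothetical, and it relies on the standard Read(10) layout (byte 0 opcode 0x28, bytes 2-5 starting LBA, bytes 7-8 transfer length, all big-endian).

```shell
# read10_decode: print "<start LBA> <length in sectors>" for a SCSI Read(10)
# CDB passed as ten hex bytes. (Hypothetical helper for illustration.)
read10_decode() {
    [ "$#" -eq 10 ] || { echo "expected 10 CDB bytes" >&2; return 1; }
    [ "$1" = "28" ] || { echo "not a Read(10) CDB" >&2; return 1; }
    # Arguments 3-6 are CDB bytes 2-5 (LBA); arguments 8-9 are bytes 7-8 (length).
    echo "$(( 0x$3$4$5$6 )) $(( 0x$8$9 ))"
}

# First failed command in the log above:
read10_decode 28 00 10 13 4e 4d 00 05 00 00
# prints: 269700685 1280

# Second failed command in the log above:
read10_decode 28 00 10 13 52 2d 00 00 08 00
# prints: 269701677 8
```

Both requests cover sector 269701679, the sector named in the "I/O error, dev sdax" lines: the first spans sectors 269700685 to 269701964 and the second spans 269701677 to 269701684. The lengths also agree with the "ncq 655360 in" and "ncq 4096 in" fields, assuming 512-byte sectors.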

 
Copying files to a new location with rsync also fails with read errors:

#rsync -a /opt /test

rsync: read errors mapping "/opt/oracle/oak/orachk/CollectionManager_App.sql": Input/output error (5)
rsync: read errors mapping "/opt/oracle/oak/orachk/CollectionManager_App.sql": Input/output error (5)
ERROR: opt/oracle/oak/orachk/CollectionManager_App.sql failed verification -- update discarded.
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]

sdaw was then removed from the array and a raw copy from sdax to sdaw was attempted with dd, but dd also reported read errors on sdax:

#dd if=/dev/sdax of=/dev/sdaw conv=noerror bs=1M
dd: reading '/dev/sdax': Input/output error
131690+1 records in
131690+1 records out
138087256064 bytes (138 GB) copied, 1175.41 s, 117 MB/s
dd: reading '/dev/sdax': Input/output error
131691+2 records in
131691+2 records out
138088931328 bytes (138 GB) copied, 1180.47 s, 117 MB/s
dd: reading '/dev/sdax': Input/output error
155361+3 records in
155361+3 records out
162909208576 bytes (163 GB) copied, 1619.62 s, 101 MB/s
dd: reading '/dev/sdax': Input/output error
164067+4 records in
164067+4 records out
172039131136 bytes (172 GB) copied, 1816.29 s, 94.7 MB/s
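
The byte count that dd prints at its first read error can be converted to a sector number and compared with the kernel log. This is a small sketch under stated assumptions: the helper name bytes_to_sector is hypothetical, and a 512-byte logical sector size is assumed.

```shell
# bytes_to_sector: convert a byte offset to a sector number.
# The second argument is the sector size; 512 bytes is an assumed default.
bytes_to_sector() {
    echo $(( $1 / ${2:-512} ))
}

# Byte count reported by dd at the first "Input/output error" above:
bytes_to_sector 138087256064
# prints: 269701672
```

Sector 269701672 is the start of the 8-sector (4 KiB) block that ends at sector 269701679, so the first dd failure appears to line up with the sector reported in /var/log/messages. Because the command uses conv=noerror without sync, the byte counts printed at the later errors may no longer correspond directly to offsets on sdax.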

Cause
