Oracle Linux: Failure in Activating Replaced Disk in RAID 1 "md1: recovery interrupted." "Spare Devices"
(Doc ID 2678495.1)
Last updated on JUNE 18, 2020
Applies to:
Linux OS - Version Oracle Linux 6.8 with Unbreakable Enterprise Kernel [4.1.12] and later
Linux x86-64
Symptoms
Environment details:
RAID 1 array built on disks sdax and sdaw. sdaw was replaced due to hardware errors. This array holds the OS partition.
After the failed disk was replaced, the new disk is shown with status "(S)" (spare) in the output of cat /proc/mdstat:
#mdadm --detail /dev/md1;cat /proc/mdstat
/dev/md1:
        Version : 0.90
  Creation Time : Tue Feb 21 19:35:03 2012
     Raid Level : raid1
     Array Size : 488279488 (465.66 GiB 500.00 GB)
  Used Dev Size : 488279488 (465.66 GiB 500.00 GB)
   Raid Devices : 1
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Mon Jun 1 14:40:35 2020
          State : clean
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1   <<<<<<=======

           UUID : d7cfbdf2:05bd07fb:0e2dcd30:aabf8c4d
         Events : 0.39655343

    Number   Major   Minor   RaidDevice State
       0      67       18        0      active sync   /dev/sdax2
       1      67        2        -      spare   /dev/sdaw2   <<<<<<======= Not active sync.

Personalities : [raid1]
md1 : active raid1 sdaw2[1](S)(R) sdax2[0]   <<<<<<===== Device state "(S)"
      488279488 blocks [1/1] [U]

md0 : active raid1 sdaw1[0] sdax1[1]
      104320 blocks [2/2] [UU]

unused devices: <none>
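The "(S)" flag shown above can also be picked out of the mdstat text mechanically. The following is only a minimal sketch: the function name list_md_spares is hypothetical (it is not part of mdadm), and it assumes the usual /proc/mdstat layout in which each array line starts with "mdN :" and each member appears as name[index] followed by its flags.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): read /proc/mdstat-style text on
# stdin and print "<array> <member>" for every member flagged "(S)" (spare).
list_md_spares() {
    awk '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                # Match name[index] followed by optional flags and then (S).
                if ($i ~ /\[[0-9]+\](\([A-Z]\))*\(S\)/) {
                    dev = $i
                    sub(/\[.*/, "", dev)   # strip "[index]" and the flags
                    print $1, dev
                }
            }
        }
    '
}
```

On a live system it could be run as list_md_spares < /proc/mdstat. With the output shown above it would report md1 sdaw2 and nothing for md0.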
Adding the new disk to the array with the command below reports no error on the command line.
#mdadm --manage /dev/md1 --add /dev/sdaw2
However, the errors below are reported in /var/log/messages for the sdax device.
Jun 1 14:25:43 <hostname> kernel: [23545.623935] ata2.00: exception Emask 0x0 SAct 0x600e2803 SErr 0x0 action 0x0
Jun 1 14:25:43 <hostname> kernel: [23545.631107] ata2.00: irq_stat 0x40000008
Jun 1 14:25:43 <hostname> kernel: [23545.635149] ata2.00: failed command: READ FPDMA QUEUED
Jun 1 14:25:43 <hostname> kernel: [23545.640406] ata2.00: cmd 60/00:e8:4d:4e:13/05:00:10:00:00/40 tag 29 ncq 655360 in
Jun 1 14:25:43 <hostname> kernel: [23545.640406] res 41/40:00:2f:52:13/00:00:10:00:00/40 Emask 0x409 (media error)
Jun 1 14:25:43 <hostname> kernel: [23545.656576] ata2.00: status: { DRDY ERR }
Jun 1 14:25:43 <hostname> kernel: [23545.660699] ata2.00: error: { UNC }
Jun 1 14:25:43 <hostname> kernel: [23545.670856] ata2.00: configured for UDMA/133
Jun 1 14:25:43 <hostname> kernel: [23545.675290] sd 4:0:0:0: [sdax] tag#29 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 1 14:25:43 <hostname> kernel: [23545.683952] sd 4:0:0:0: [sdax] tag#29 Sense Key : Medium Error [current] [descriptor]
Jun 1 14:25:43 <hostname> kernel: [23545.692095] sd 4:0:0:0: [sdax] tag#29 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 1 14:25:43 <hostname> kernel: [23545.701193] sd 4:0:0:0: [sdax] tag#29 CDB: Read(10) 28 00 10 13 4e 4d 00 05 00 00
Jun 1 14:25:43 <hostname> kernel: [23545.708900] blk_update_request: I/O error, dev sdax, sector 269701679
Jun 1 14:25:43 <hostname> kernel: [23545.715476] ata2: EH complete
Jun 1 14:25:44 <hostname> kernel: [23547.082941] ata2.00: exception Emask 0x0 SAct 0x1a0 SErr 0x0 action 0x0
Jun 1 14:25:44 <hostname> kernel: [23547.089680] ata2.00: irq_stat 0x40000008
Jun 1 14:25:44 <hostname> kernel: [23547.093720] ata2.00: failed command: READ FPDMA QUEUED
Jun 1 14:25:44 <hostname> kernel: [23547.101356] ata2.00: cmd 60/08:28:2d:52:13/00:00:10:00:00/40 tag 5 ncq 4096 in
Jun 1 14:25:44 <hostname> kernel: [23547.101356] res 41/40:00:2f:52:13/00:00:10:00:00/40 Emask 0x409 (media error)
Jun 1 14:25:44 <hostname> kernel: [23547.117271] ata2.00: status: { DRDY ERR }
Jun 1 14:25:44 <hostname> kernel: [23547.121397] ata2.00: error: { UNC }
Jun 1 14:25:44 <hostname> kernel: [23547.130899] ata2.00: configured for UDMA/133
Jun 1 14:25:44 <hostname> kernel: [23547.135297] sd 4:0:0:0: [sdax] tag#5 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Jun 1 14:25:44 <hostname> kernel: [23547.143874] sd 4:0:0:0: [sdax] tag#5 Sense Key : Medium Error [current] [descriptor]
Jun 1 14:25:44 <hostname> kernel: [23547.151925] sd 4:0:0:0: [sdax] tag#5 Add. Sense: Unrecovered read error - auto reallocate failed
Jun 1 14:25:44 <hostname> kernel: [23547.160933] sd 4:0:0:0: [sdax] tag#5 CDB: Read(10) 28 00 10 13 52 2d 00 00 08 00
Jun 1 14:25:44 <hostname> kernel: [23547.168545] blk_update_request: I/O error, dev sdax, sector 269701679
Jun 1 14:25:44 <hostname> kernel: [23547.175106] ata2: EH complete
Jun 1 14:25:44 <hostname> kernel: [23547.175110] md/raid1:md1: sdax: unrecoverable I/O read error for block 269492736
Jun 1 14:25:44 <hostname> kernel: [23547.175137] md: md1: recovery interrupted.
Jun 1 14:26:34 <hostname> kernel: [23597.148569] sd 0:0:10:0: Mode parameters changed

Input/output errors are reported on the existing disk sdax during the sync.
While trying to copy files to a new location using rsync, the following errors were reported.
#rsync -a /opt /test
rsync: read errors mapping "/opt/oracle/oak/orachk/CollectionManager_App.sql": Input/output error (5)
rsync: read errors mapping "/opt/oracle/oak/orachk/CollectionManager_App.sql": Input/output error (5)
ERROR: opt/oracle/oak/orachk/CollectionManager_App.sql failed verification -- update discarded.
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
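Both the interrupted recovery and the rsync failures point at read errors on sdax, so it can help to summarize which sectors the kernel reported as unreadable. The following is a rough sketch only: the function name failed_sectors is hypothetical, and it assumes the log lines use the "I/O error, dev <device>, sector <number>" wording seen in the excerpt above, which may differ on other kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log text on stdin and print
# "<device> <sector> <count>" for each distinct failing sector.
# Assumes the "I/O error, dev X, sector N" wording from the excerpt above.
failed_sectors() {
    sed -n 's|.*I/O error, dev \([^,]*\), sector \([0-9][0-9]*\).*|\1 \2|p' |
        sort |
        uniq -c |
        awk '{ print $2, $3, $1 }'   # reorder "count dev sector" to "dev sector count"
}
```

It could be run as failed_sectors < /var/log/messages. For the excerpt above it would print a single line, sdax 269701679 2, because both failed reads hit the same sector.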
sdaw was then removed from the array and a copy was attempted with the dd command, but dd also reported errors.
#dd if=/dev/sdax of=/dev/sdaw conv=noerror bs=1M
dd: reading '/dev/sdax': Input/output error
131690+1 records in
131690+1 records out
138087256064 bytes (138 GB) copied, 1175.41 s, 117 MB/s
dd: reading '/dev/sdax': Input/output error
131691+2 records in
131691+2 records out
138088931328 bytes (138 GB) copied, 1180.47 s, 117 MB/s
dd: reading '/dev/sdax': Input/output error
155361+3 records in
155361+3 records out
162909208576 bytes (163 GB) copied, 1619.62 s, 101 MB/s
dd: reading '/dev/sdax': Input/output error
164067+4 records in
164067+4 records out
172039131136 bytes (172 GB) copied, 1816.29 s, 94.7 MB/s
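The byte count at the first dd error can be compared with the sector reported in /var/log/messages. The following is a small worked calculation, assuming 512-byte logical sectors on sdax (the sector size is not stated in this document) and using a hypothetical helper name, offset_to_sector.

```shell
#!/bin/sh
# Hypothetical helper: convert a byte offset on the disk to a sector number.
# The 512-byte sector size used below is an assumption, not taken from the source.
offset_to_sector() {
    # $1 = byte offset, $2 = sector size in bytes
    echo $(( $1 / $2 ))
}

first_error_bytes=138087256064   # bytes copied at the first dd error
kernel_sector=269701679          # "I/O error, dev sdax, sector ..." in the log

dd_sector=$(offset_to_sector "$first_error_bytes" 512)
echo "dd stopped at sector $dd_sector"
echo "distance to kernel-reported sector: $(( kernel_sector - dd_sector ))"
```

Under that assumption dd stopped at sector 269701672, which is 7 sectors before the sector 269701679 reported by the kernel and within the same aligned group of 8 sectors (4 KiB). This would be consistent with dd failing on the same bad region of sdax that interrupted the md1 recovery.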
Cause