Exadata Compute Nodes Crashed with many IO errors in CELL alert logs
(Doc ID 1531673.1)
Last updated on JUNE 05, 2019
Applies to:
Oracle Exadata Hardware - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
Oracle Exadata Storage Server Software - Version 11.2.3.2.0 to 11.2.3.2.0 [Release 11.2]
Oracle Database - Enterprise Edition - Version 11.2.0.3 to 11.2.0.3 [Release 11.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Information in this document applies to any platform.
Symptoms
The DB instance crashed while running a high IO load.
The cell alert logs show many IO errors, for both flash (FD) and hard disk (CD) cell disks:
IO Error: dev=/dev/sdz cdisk=FD_04_XXXXX3c02 op=RD 32768 bytes at sector 124214016 failed with errno=-2
IO Error: dev=/dev/sdy cdisk=FD_03_XXXXX3c02 op=RD 65536 bytes at sector 120966784 failed with errno=-2
IO Error: dev=/dev/sdz cdisk=FD_04_XXXXX3c02 op=RD 65536 bytes at sector 75332736 failed with errno=-2
IO Error: dev=/dev/sdq cdisk=FD_15_XXXXX3c02 op=RD 65536 bytes at sector 122211584 failed with errno=-2
IO Error: dev=/dev/sdaa cdisk=FD_05_XXXXX3c02 op=RD 65536 bytes at sector 26423168 failed with errno=-2
IO Error: dev=/dev/sdab cdisk=FD_06_XXXXX3c02 op=RD 24576 bytes at sector 123508608 failed with errno=-2
IO Error: dev=/dev/sdv cdisk=FD_00_XXXXX3c02 op=RD 65536 bytes at sector 123316352 failed with errno=-2
IO Error: dev=/dev/sdw cdisk=FD_01_XXXXX3c02 op=RD 16384 bytes at sector 124288432 failed with errno=-2
IO Error: dev=/dev/sdb3 cdisk=CD_01_XXXXX3c02 op=RD 90112 bytes at sector 184342352 failed with errno=-2
IO Error: dev=/dev/sdaa cdisk=FD_05_XXXXX3c02 op=RD 65536 bytes at sector 74984576 failed with errno=-2
IO Error: dev=/dev/sdx cdisk=FD_02_XXXXX3c02 op=RD 65536 bytes at sector 72410112 failed with errno=-2
IO Error: dev=/dev/sdaa cdisk=FD_05_XXXXX3c02 op=RD 65536 bytes at sector 75525376 failed with errno=-2
IO Error: dev=/dev/sdaa cdisk=FD_05_XXXXX3c02 op=RD 65536 bytes at sector 75536512 failed with errno=-2
IO Error: dev=/dev/sdy cdisk=FD_03_XXXXX3c02 op=RD 65536 bytes at sector 24994304 failed with errno=-2
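A quick way to see which cell disks are affected is to count the IO errors per cdisk. The sketch below runs against a small hypothetical sample file; on a real cell, point it at the cell alert.log instead (the path varies by release, typically under the cell trace directory):

```shell
# Hypothetical sample of cell alert.log lines for illustration only.
cat > /tmp/cell_alert_sample.log <<'EOF'
IO Error: dev=/dev/sdz cdisk=FD_04_XXXXX3c02 op=RD 32768 bytes at sector 124214016 failed with errno=-2
IO Error: dev=/dev/sdy cdisk=FD_03_XXXXX3c02 op=RD 65536 bytes at sector 120966784 failed with errno=-2
IO Error: dev=/dev/sdz cdisk=FD_04_XXXXX3c02 op=RD 65536 bytes at sector 75332736 failed with errno=-2
IO Error: dev=/dev/sdb3 cdisk=CD_01_XXXXX3c02 op=RD 90112 bytes at sector 184342352 failed with errno=-2
EOF
# Count IO errors per cell disk; FD_* entries are flash disks,
# CD_* entries are hard disk based cell disks.
grep 'IO Error' /tmp/cell_alert_sample.log \
  | sed 's/.*cdisk=\([^ ]*\).*/\1/' \
  | sort | uniq -c | sort -rn
```

If both FD_* and CD_* disks appear, the errors are not isolated to a single failing device, which matches the symptom described here.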
The ASM alert log shows disk offline messages, but no disk is actually offline in the ASM metadata: there are many warnings of the same type, yet no output confirms that any disk went offline.
Thu Jan 24 17:20:16 2013
NOTE: process _user44167_+asm1 (44167) initiating offline of disk 14.3915937104 (DATA_XXXXX3_CD_02_XXXXX3C02) with mask 0x7e[0x7f] in group 1
WARNING: Disk 14 (DATA_XXXXX3_CD_02_XXXXX3C02) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
NOTE: initiating PST update: grp = 1, dsk = 14/0xe9687550, mask = 0x6a, op = clear
Thu Jan 24 17:20:16 2013
GMON updating disk modes for group 1 at 10 for pid 54, osid 44167
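One way to confirm this symptom is to count the offline warnings in the ASM alert log and then cross-check the actual disk state in V$ASM_DISK. This is a sketch using a hypothetical sample file; the real ASM alert log path depends on your diagnostic destination:

```shell
# Hypothetical sample of ASM alert.log content for illustration only.
cat > /tmp/asm_alert_sample.log <<'EOF'
NOTE: process _user44167_+asm1 (44167) initiating offline of disk 14.3915937104 (DATA_XXXXX3_CD_02_XXXXX3C02) with mask 0x7e[0x7f] in group 1
WARNING: Disk 14 (DATA_XXXXX3_CD_02_XXXXX3C02) in group 1 in mode 0x7f is now being taken offline on ASM inst 1
EOF
# Count the "taken offline" warnings seen in the log.
grep -c 'being taken offline' /tmp/asm_alert_sample.log
# On a live system, cross-check the real disk state, e.g.:
#   SELECT name, mode_status FROM v$asm_disk;
# In this scenario MODE_STATUS still shows ONLINE for all disks,
# despite the repeated warnings.
```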
Cell top output shows the cellsrv process consuming high CPU: cellsrv %CPU is at 304%.
top - 17:21:41 up 49 days, 6:00, 0 users, load average: 0.89, 0.85, 0.57
Tasks: 425 total, 1 running, 424 sleeping, 0 stopped, 0 zombie
Cpu(s): 7.7%us, 4.1%sy, 0.0%ni, 86.3%id, 0.7%wa, 0.0%hi, 1.1%si, 0.0%st
Mem: 65963336k total, 25384020k used, 40579316k free, 55848k buffers
Swap: 2097080k total, 0k used, 2097080k free, 4223448k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29640 root 20 0 22.6g 7.9g 12m S 304.4 12.5 3294:25 /opt/oracle/cell11.2.3.2.0_LINUX.X64_120713/cellsrv/bin/cellsrv 100 5000 9 5042
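A runaway multi-threaded process like cellsrv can show %CPU well above 100% because top sums usage across cores. A generic way to spot the top CPU consumers (run as root on the storage cell) is:

```shell
# List the five busiest processes by CPU; on an affected cell,
# cellsrv would appear at the top with %CPU in the hundreds.
ps -eo pid,pcpu,comm --sort=-pcpu | head -5
```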
Cell IO statistics show high utilization: in the following output %util is 100% and r/s is around 400, which is very high. In this case High Capacity disks are used.
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 26.60 0.20 406.00 0.20 97923.20 12.80 241.10 521.24 1341.08 2.46 100.00
sdd 25.20 0.00 414.00 0.00 101968.00 0.00 246.30 349.02 1155.30 2.42 100.00
sde 26.20 0.00 428.80 0.40 98128.00 12.80 228.66 892.45 2771.19 2.33 100.00
sdf 35.60 0.00 412.40 0.20 104924.80 6.40 254.32 142.74 362.83 2.42 100.00
sdg 33.60 0.00 408.80 0.00 99046.40 0.00 242.29 496.82 1450.82 2.45 100.00
sdh 21.20 0.00 399.00 0.00 97641.60 0.00 244.72 388.33 1262.35 2.51 100.00
sdi 22.20 0.40 420.20 0.60 96688.00 14.40 229.81 603.19 1833.91 2.38 100.00
sdj 34.80 0.00 404.20 0.00 98083.20 0.00 242.66 386.81 1028.80 2.47 100.00
sdk 33.40 0.00 383.80 0.00 98142.40 0.00 255.71 144.67 456.54 2.61 100.00
sdl 31.40 0.00 445.60 0.00 102550.40 0.00 230.14 775.37 2251.88 2.24 100.00
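Saturated devices can be picked out of iostat -x output mechanically: flag rows where %util is 100 and the read rate is high. The sketch below filters a small hypothetical sample of the output above (column positions assume this iostat format: Device, rrqm/s, wrqm/s, r/s, w/s, ..., %util as the 12th field):

```shell
# Hypothetical excerpt of `iostat -x` output for illustration only.
cat > /tmp/iostat_sample.txt <<'EOF'
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sdc 26.60 0.20 406.00 0.20 97923.20 12.80 241.10 521.24 1341.08 2.46 100.00
sdd 25.20 0.00 414.00 0.00 101968.00 0.00 246.30 349.02 1155.30 2.42 100.00
sdk 33.40 0.00 383.80 0.00 98142.40 0.00 255.71 144.67 456.54 2.61 100.00
EOF
# Flag fully utilized devices with a high read rate
# ($4 = r/s, $12 = %util in this layout).
awk 'NR>1 && $12==100.00 && $4>300 {print $1, "r/s="$4, "util="$12}' \
    /tmp/iostat_sample.txt
```

Sustained 100% utilization at ~400 reads/s per spindle indicates the High Capacity disks are driven well past their comfortable throughput.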
Changes
There was no change in the setup. The instance faced a highly IO-intensive load.
Cause