During ASM Rebalance/Add or Drop an Oracle VM Guest Hangs and Oracle PIDs Go into D-State
(Doc ID 2457881.1)
Last updated on OCTOBER 10, 2022
Applies to:
Oracle VM - Version 3.4.1 to 3.4.3 [Release OVM34]
Linux x86
Linux x86-64
Symptoms
An ASM rebalance, disk add, or disk drop operation causes Oracle VM guests to hang at the I/O layer, leaving processes (PIDs) in D state (uninterruptible sleep):
# ps axl | awk '$10 ~ /D/'
0 0 521 23827 20 0 105996 888 sleep_ D+ pts/0 0:00 awk $10 ~ /D/
1 0 5880 2 20 0 0 0 KsWait D ? 0:00 [oks_plog]
0 501 6431 1 20 0 1365040 22880 dio_aw Ds ? 0:11 asm_rbal_+ASM1
0 501 8369 1 20 0 1341924 18612 dio_aw Ds ? 0:00 oracle+ASM1 (LOCAL=NO)
0 501 13158 1 20 0 1338844 17580 dio_aw Ds ? 0:00 oracle+ASM1 (LOCAL=NO)
0 501 16999 1 20 0 10730584 12300 dio_aw Ds ? 0:00 ora_m000_BGC0TT1
0 501 18262 1 20 0 105232 444 sleep_ D ? 0:00 dd if=/dev/xvdbo1 of=xvdbo1output
0 501 23466 1 20 0 1337620 15788 dio_aw Ds ? 0:00 asm_pz99_+ASM1
0 501 23889 1 20 0 1338824 21372 dio_aw Ds ? 0:00 oracle+ASM1 (LOCAL=NO)
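The check above can also be written so that it matches only a state field that begins with D and prints the full kernel wait channel, which makes the dio_await_completion waiters easier to spot. This is a minimal sketch, assuming a procps-style ps on the guest; the function names filter_dstate and list_dstate are illustrative and do not come from this note.

```shell
# Keep the header line plus every row whose STAT column (field 2) starts with "D".
# Reads "ps -eo pid,stat,wchan:32,args" style output on stdin.
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state processes with a 32-character wide wait-channel column.
# (Hypothetical helper name; the ps format specifiers are standard procps ones.)
list_dstate() {
    ps -eo pid,stat,wchan:32,args | filter_dstate
}

# Usage: list_dstate
```

Running list_dstate a few times, some seconds apart, helps distinguish processes that are permanently stuck from ones that pass through D state briefly during normal I/O.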
The stack of the blocked task is visible in the kernel log on the affected guest:
Sep 13 11:19:07 <HOST> kernel: INFO: task oracle:8520 blocked for more than 120 seconds.
Sep 13 11:19:07 <HOST> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 13 11:19:07 <HOST> kernel: oracle D ffff8803fe494040 0 8520 1 0x00000080
Sep 13 11:19:07 <HOST> kernel: ffff880019cf3a28 0000000000000286 ffff880019cf3fd8 0000000000014040
Sep 13 11:19:07 <HOST> kernel: ffff880019cf2010 0000000000014040 0000000000014040 0000000000014040
Sep 13 11:19:07 <HOST> kernel: ffff880019cf3fd8 0000000000014040 ffff8803e5de8300 ffff880007128140
Sep 13 11:19:07 <HOST> kernel: Call Trace:
Sep 13 11:19:07 <HOST> kernel: [<ffffffff81596f39>] schedule+0x29/0x70
Sep 13 11:19:07 <HOST> kernel: [<ffffffff8159700c>] io_schedule+0x8c/0xd0
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811ccbd4>] dio_await_completion+0x54/0xd0
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811cee2a>] do_blockdev_direct_IO+0x5fa/0x860
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811ca2e0>] ? I_BDEV+0x10/0x10
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811cf0dc>] __blockdev_direct_IO+0x4c/0x50
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811ca2e0>] ? I_BDEV+0x10/0x10
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811cb2bf>] blkdev_direct_IO+0x4f/0x60
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811ca2e0>] ? I_BDEV+0x10/0x10
Sep 13 11:19:07 <HOST> kernel: [<ffffffff81130fb2>] generic_file_read_iter+0x122/0x130
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811ca51b>] blkdev_read_iter+0x4b/0x70
Sep 13 11:19:07 <HOST> kernel: [<ffffffff81191d2c>] do_aio_read+0xbc/0xd0
Sep 13 11:19:07 <HOST> kernel: [<ffffffff81191de7>] do_sync_read+0xa7/0xe0
Sep 13 11:19:07 <HOST> kernel: [<ffffffff811920f1>] vfs_read+0xb1/0x130
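To see how many tasks are affected, the hung-task warnings shown above can be summarized from the guest's kernel log. This is a minimal sketch that relies only on the message format in the excerpt; the function name hung_tasks is illustrative, and the log location /var/log/messages is an assumption that may differ on a given guest.

```shell
# Print "PID NAME SECONDS" for every hung-task warning read from stdin.
# Matches lines of the form:
#   ... kernel: INFO: task NAME:PID blocked for more than N seconds.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\2 \1 \3/p'
}

# Usage (log path is an assumption):
#   hung_tasks < /var/log/messages | sort -u
```

Each PID reported this way can be cross-checked against the D-state process list from ps to confirm that the same Oracle processes are blocked in direct I/O.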
While the issue is occurring, the secondary RAC node also sees the disks in D state, and the rebalance task is also hung.
Cause