
IO Stalls and the Multipath Command Hangs with "blk_cleanup_queue" in Its Call Stack (Doc ID 2533137.1)

Last updated on JUNE 09, 2021

Applies to:

Oracle VM - Version 3.4.5 and later
Linux x86-64

Symptoms

Due to a possible HBA firmware bug or another issue, IO stalls on multipath devices and even the multipath command itself hangs, as in the kernel messages below; a quick way to confirm this state from the shell is sketched after the excerpt.

 

Apr 6 14:13:35 <Host Name>kernel: [16482949.447623] INFO: task multipath:24670 blocked for more than 120 seconds.
Apr 6 14:13:35 <Host Name>kernel: [16482949.447892] Not tainted 4.1.12-124.14.5.el6uek.x86_64 #2
Apr 6 14:13:35 <Host Name>kernel: [16482949.448150] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 6 14:13:35 <Host Name>kernel: [16482949.448666] multipath D ffff88108db58440 0 24670 24661 0x00000080
Apr 6 14:13:35 <Host Name>kernel: [16482949.448670] ffff880f4e6a3898 0000000000000286 ffff880f4e6a3868 ffff880f69962a00
Apr 6 14:13:35 <Host Name>kernel: [16482949.448673] ffff880f4e6a3888 ffff880f4e6a4000 ffff880f4e6a3a00 7fffffffffffffff
Apr 6 14:13:35 <Host Name>kernel: [16482949.448675] ffff880f69962a00 ffff880f69962a00 ffff880f4e6a38b8 ffffffff816e7df7
Apr 6 14:13:35 <Host Name>kernel: [16482949.448678] Call Trace:
Apr 6 14:13:35 <Host Name>kernel: [16482949.448688] [<ffffffff816e7df7>] schedule+0x37/0x90
Apr 6 14:13:35 <Host Name>kernel: [16482949.448692] [<ffffffff816eb63c>] schedule_timeout+0x24c/0x2d0
Apr 6 14:13:35 <Host Name>kernel: [16482949.448695] [<ffffffff816ecf9d>] ? _raw_spin_unlock_irqrestore+0x2d/0x60
Apr 6 14:13:35 <Host Name>kernel: [16482949.448701] [<ffffffff810b29c7>] ? try_to_wake_up+0x47/0x3b0
Apr 6 14:13:35 <Host Name>kernel: [16482949.448703] [<ffffffff816e954c>] wait_for_completion+0x11c/0x180
Apr 6 14:13:35 <Host Name>kernel: [16482949.448712] [<ffffffff810b2da0>] ? wake_up_state+0x20/0x20
Apr 6 14:13:35 <Host Name>kernel: [16482949.448720] [<ffffffff8109e95f>] flush_work+0xff/0x1f0
Apr 6 14:13:35 <Host Name>kernel: [16482949.448727] [<ffffffff8109b480>] ? destroy_worker+0xa0/0xa0
Apr 6 14:13:35 <Host Name>kernel: [16482949.448734] [<ffffffff8109eb58>] __cancel_work_timer+0x98/0x1b0
Apr 6 14:13:35 <Host Name>kernel: [16482949.448739] [<ffffffff8147ace8>] ? kobj_lookup+0x118/0x180
Apr 6 14:13:35 <Host Name>kernel: [16482949.448743] [<ffffffff81307530>] ? disk_map_sector_rcu+0x80/0x80
Apr 6 14:13:35 <Host Name>kernel: [16482949.448745] [<ffffffff8109eca3>] cancel_delayed_work_sync+0x13/0x20
Apr 6 14:13:35 <Host Name>kernel: [16482949.448747] [<ffffffff81309581>] disk_block_events+0x81/0x90
Apr 6 14:13:35 <Host Name>kernel: [16482949.448756] [<ffffffff8124368b>] __blkdev_get+0x5b/0x490
Apr 6 14:13:35 <Host Name>kernel: [16482949.448758] [<ffffffff81243b01>] blkdev_get+0x41/0x390
Apr 6 14:13:35 <Host Name>kernel: [16482949.448760] [<ffffffff81243f0f>] blkdev_open+0x5f/0x90
Apr 6 14:13:35 <Host Name>kernel: [16482949.448765] [<ffffffff8120557a>] do_dentry_open+0x1ea/0x340
Apr 6 14:13:35 <Host Name>kernel: [16482949.448767] [<ffffffff81243eb0>] ? blkdev_get_by_dev+0x60/0x60
Apr 6 14:13:35 <Host Name>kernel: [16482949.448769] [<ffffffff81206a47>] vfs_open+0x57/0x80
Apr 6 14:13:35 <Host Name>kernel: [16482949.448772] [<ffffffff81212b68>] ? may_open+0x68/0x110
Apr 6 14:13:35 <Host Name>kernel: [16482949.448775] [<ffffffff81217a3a>] do_last+0x21a/0x1210
Apr 6 14:13:35 <Host Name>kernel: [16482949.448778] [<ffffffff81218ab0>] path_openat+0x80/0x600
Apr 6 14:13:35 <Host Name>kernel: [16482949.448782] [<ffffffff8122d9fe>] ? seq_read+0xae/0x3f0
Apr 6 14:13:35 <Host Name>kernel: [16482949.448790] [<ffffffff8121a5ca>] do_filp_open+0x3a/0xb0
Apr 6 14:13:35 <Host Name>kernel: [16482949.448798] [<ffffffff81227577>] ? __alloc_fd+0xa7/0x130
Apr 6 14:13:35 <Host Name>kernel: [16482949.448805] [<ffffffff81206e0c>] do_sys_open+0x12c/0x220
Apr 6 14:13:35 <Host Name>kernel: [16482949.448813] [<ffffffff8102698b>] ? syscall_trace_enter_phase1+0x11b/0x180
Apr 6 14:13:35 <Host Name>kernel: [16482949.448816] [<ffffffff816ed45f>] ? system_call_after_swapgs+0xe9/0x18c
Apr 6 14:13:35 <Host Name>kernel: [16482949.448818] [<ffffffff81206f1e>] SyS_open+0x1e/0x20
Apr 6 14:13:35 <Host Name>kernel: [16482949.448820] [<ffffffff816ed51a>] system_call_fastpath+0x18/0xd4
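
A quick way to confirm this state on a live system is to look for hung-task reports and for processes stuck in uninterruptible (D) sleep. The commands below are a generic sketch and are not specific to any particular HBA or storage array:

# Hung-task reports in the kernel log
dmesg | grep -i "blocked for more than"

# Processes in uninterruptible (D) sleep and the kernel function they wait in
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'

# Run multipath with a timeout so a hang does not block the shell
timeout 30 multipath -ll || echo "multipath -ll did not return within 30 seconds"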

If a vmcore was captured, it shows the call stack below for the process that owns the mutex on which the "multipath" process above is blocked:

#0 [ffff880f2ac73ba8] __schedule at ffffffff816e77cb
#1 [ffff880f2ac73c00] schedule at ffffffff816e7df7
#2 [ffff880f2ac73c20] schedule_timeout at ffffffff816eb551
#3 [ffff880f2ac73cd0] msleep at ffffffff810efae7
#4 [ffff880f2ac73cf0] __blk_drain_queue at ffffffff812f57ee
#5 [ffff880f2ac73d20] blk_cleanup_queue at ffffffff812f5bf5
#6 [ffff880f2ac73d50] __scsi_remove_device at ffffffff814cfcfa
#7 [ffff880f2ac73d70] scsi_remove_device at ffffffff814cfe06
#8 [ffff880f2ac73d90] scsi_remove_target at ffffffff814cffe2
#9 [ffff880f2ac73e00] process_one_work at ffffffff8109e529
#10 [ffff880f2ac73e50] worker_thread at ffffffff8109ed5b
#11 [ffff880f2ac73ec0] kthread at ffffffff810a486b
#12 [ffff880f2ac73f50] ret_from_fork at ffffffff816ed968
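
The stack above is as reported by the crash utility against the vmcore. As a generic sketch (the task list and exact symbols will vary with the kernel and crash versions), the blocked tasks and the worker thread stuck in blk_cleanup_queue can be located with commands such as:

crash> ps | grep UN        # tasks in uninterruptible (UN) sleep
crash> bt 24670            # back-trace of the hung multipath task from the log above
crash> foreach UN bt       # back-traces of all UN tasks; look for blk_cleanup_queue
                           # and __blk_drain_queue in a kworker stack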

 

Changes

No changes were made.

Cause




