xen-blkfront: IO Hung When Global Free Grant Used Up

(Doc ID 2318961.1)

Last updated on OCTOBER 29, 2017

Applies to:

Oracle VM - Version 3.0.1 and later
Linux OS - Version Oracle Linux 7.2 with Unbreakable Enterprise Kerne [4.1.12] and later
Linux x86-64

Symptoms

Oracle Linux 7 Oracle VM guests with UEK4 frequently freezes or stops responding.

The following call traces have been found in /var/log/messages prior to the issue on several guests.

kernel: INFO: task ora_xx:3971 blocked for more than 120 seconds.
kernel: Not tainted 4.1.12-61.1.34.el7uek.x86_64 #2
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: ora_xx D ffff88028e8d7840 0 3971 1 0x00000080
kernel: ffff88025f8c3a68 0000000000000082 ffff880282a84600 ffff880281722a00
kernel: 0000000000000000 ffff88025f8c4000 0000000000000000 7fffffffffffffff
kernel: 0000000000000002 ffffffff8171e1a0 ffff88025f8c3a88 ffffffff8171d8b7
kernel: Call Trace:
kernel: [<ffffffff8171e1a0>] ? bit_wait+0x50/0x50
kernel: [<ffffffff8171d8b7>] schedule+0x37/0x90
kernel: [<ffffffff8172091c>] schedule_timeout+0x24c/0x2c0
kernel: [<ffffffff8100bc65>] ? xen_clocksource_get_cycles+0x15/0x20
kernel: [<ffffffff810f95ae>] ? ktime_get+0x3e/0xa0
kernel: [<ffffffff8171ce66>] io_schedule_timeout+0xa6/0x110
kernel: [<ffffffff8119c7f0>] ? global_dirtyable_memory+0x50/0x50
kernel: [<ffffffff8171e1d6>] bit_wait_io+0x36/0x50
kernel: [<ffffffff8171dd55>] __wait_on_bit+0x65/0x90
kernel: [<ffffffff81190cfb>] wait_on_page_bit+0xcb/0xf0
kernel: [<ffffffff810cb710>] ? autoremove_wake_function+0x40/0x40
kernel: [<ffffffff81190e29>] filemap_fdatawait_range+0x109/0x1c0
kernel: [<ffffffff81192eef>] filemap_write_and_wait_range+0x3f/0x70
kernel: [<ffffffffa02b26be>] xfs_file_fsync+0x6e/0x220 [xfs]
kernel: [<ffffffff812419bd>] vfs_fsync_range+0x3d/0xb0
kernel: [<ffffffffa02b40e0>] xfs_file_write_iter+0xe0/0x120 [xfs]
kernel: [<ffffffff8120e2ee>] __vfs_write+0xce/0x120
kernel: [<ffffffff8120e999>] vfs_write+0xa9/0x1b0
kernel: [<ffffffff81026a2c>] ? do_audit_syscall_entry+0x6c/0x70
kernel: [<ffffffff8120fa4a>] SyS_pwrite64+0x8a/0xc0
kernel: [<ffffffff81721aae>] system_call_fastpath+0x12/0x71

kernel: INFO: task xfsaild/dm-9:9510 blocked for more than 120 seconds.
kernel: Not tainted 4.1.12-61.1.34.el7uek.x86_64 #2
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: xfsaild/dm-9 D 0000000000000001 0 9510 2 0x00000080
kernel: ffff880112493d18 0000000000000046 ffff880282a3d400 ffff88024dba7000
kernel: ffff880112493d18 ffff880112494000 ffff88024dba7000 0000000000000000
kernel: ffff880280238400 ffff88025bdcf400 ffff880112493d38 ffffffff8171d8b7
kernel: Call Trace:
kernel: [<ffffffff8171d8b7>] schedule+0x37/0x90
kernel: [<ffffffffa02bfc51>] _xfs_log_force+0x191/0x280 [xfs]
kernel: [<ffffffff810b49b0>] ? wake_up_state+0x20/0x20
kernel: [<ffffffff810efb10>] ? internal_add_timer+0xb0/0xb0
kernel: [<ffffffffa02bfd71>] xfs_log_force+0x31/0xb0 [xfs]
kernel: [<ffffffffa02cb2a0>] xfsaild+0x150/0x6a0 [xfs]
kernel: [<ffffffffa02cb150>] ? xfs_trans_ail_cursor_first+0xb0/0xb0 [xfs]
kernel: [<ffffffff810a5708>] kthread+0xd8/0xf0
kernel: [<ffffffff810a5630>] ? kthread_create_on_node+0x1b0/0x1b0
kernel: [<ffffffff81721ea2>] ret_from_fork+0x42/0x70
kernel: [<ffffffff810a5630>] ? kthread_create_on_node+0x1b0/0x1b0

kernel: INFO: task kworker/0:1H:5617 blocked for more than 120 seconds.
kernel: Not tainted 4.1.12-61.1.34.el7uek.x86_64 #2
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kworker/0:1H D 0000000000000000 0 5617 2 0x00000080
kernel: Workqueue: xfs-log/dm-2 xfs_log_worker [xfs]
kernel: ffff88000d1a7cb8 0000000000000046 ffff88024facaa00 ffff880108487000
kernel: ffff8800e7bef528 ffff88000d1a8000 ffff8800e7bef528 0000000000000000
kernel: 0000000000000000 ffff880108487000 ffff88000d1a7cd8 ffffffff8171d8b7
kernel: Call Trace:
kernel: [<ffffffff8171d8b7>] schedule+0x37/0x90
kernel: [<ffffffffa02c00df>] _xfs_log_force_lsn+0x29f/0x2f0 [xfs]
kernel: [<ffffffff810b49b0>] ? wake_up_state+0x20/0x20
kernel: [<ffffffffa02bb903>] xfs_trans_commit+0x1f3/0x240 [xfs]
kernel: [<ffffffffa0292aec>] xfs_sync_sb+0x5c/0xa0 [xfs]
kernel: [<ffffffffa02bfe3d>] xfs_log_worker+0x4d/0x50 [xfs]
kernel: [<ffffffff8109f135>] process_one_work+0x155/0x460
kernel: [<ffffffff8109f932>] worker_thread+0x112/0x520
kernel: [<ffffffff8109f820>] ? rescuer_thread+0x3e0/0x3e0
kernel: [<ffffffff810a5708>] kthread+0xd8/0xf0
kernel: [<ffffffff810a5630>] ? kthread_create_on_node+0x1b0/0x1b0
kernel: [<ffffffff81721ea2>] ret_from_fork+0x42/0x70
kernel: [<ffffffff810a5630>] ? kthread_create_on_node+0x1b0/0x1b0

kernel: INFO: task kworker/u30:0:6575 blocked for more than 120 seconds.
kernel: Not tainted 4.1.12-61.1.28.el7uek.x86_64 #2
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kernel: kworker/u30:0 D ffff88028e897840 0 6575 2 0x00000080
kernel: Workqueue: writeback bdi_writeback_workfn (flush-251:8)
kernel: ffff88021b48b468 0000000000000046 ffff880282a83800 ffff88027fe19c00
kernel: 0000003700000034 ffff88021b48c000 0000000000000000 7fffffffffffffff
kernel: ffff88027fafaec8 ffff88028e8a0700 ffff88021b48b488 ffffffff8171d617
kernel: Call Trace:
kernel: [<ffffffff8171d617>] schedule+0x37/0x90
kernel: [<ffffffff8172067c>] schedule_timeout+0x24c/0x2c0
kernel: [<ffffffffa02cece7>] ? xfs_mod_fdblocks+0x47/0x1a0 [xfs]
kernel: [<ffffffff8100bc65>] ? xen_clocksource_get_cycles+0x15/0x20
kernel: [<ffffffff810f95be>] ? ktime_get+0x3e/0xa0
kernel: [<ffffffff8171cbc6>] io_schedule_timeout+0xa6/0x110
kernel: [<ffffffff81314d27>] bt_get+0x157/0x1e0
kernel: [<ffffffff810cb6e0>] ? prepare_to_wait_event+0xf0/0xf0
kernel: [<ffffffff8131527f>] blk_mq_get_tag+0xbf/0xf0
kernel: [<ffffffff8131051b>] __blk_mq_alloc_request+0x1b/0x200
kernel: [<ffffffff81312aff>] blk_mq_map_request+0x1cf/0x250
kernel: [<ffffffff81313c6d>] blk_mq_make_request+0x7d/0x2d0
kernel: [<ffffffff81303b60>] generic_make_request+0xe0/0x130
kernel: [<ffffffff81303c27>] submit_bio+0x77/0x150
kernel: [<ffffffffa02aeb73>] xfs_submit_ioend_bio.isra.14+0x33/0x40 [xfs]
kernel: [<ffffffffa02aec4a>] xfs_submit_ioend+0xca/0x130 [xfs]
kernel: [<ffffffffa02af608>] xfs_vm_writepage+0x2c8/0x630 [xfs]
kernel: [<ffffffff8119c797>] __writepage+0x17/0x40

The Sar data in the VM guest and OSWatcher in the dom0 show no resource issue, Linux OS seems mostly idle.

No other issues found in message files on dom0 and domU.

Enabling hung_task_panic in the kernel line in one of the node generated a vmcore in the next occurrence of the issue.

The vmcore shows nr_grant_frames reached 64.

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms