
System Shows High Load Average with Many Threads in "D" State with waitchannel on mlx4_table_put and mlx4_table_get (Doc ID 2488613.1)

Last updated on AUGUST 23, 2020

Applies to:

Linux OS - Version Oracle Linux 6.4 and later
Linux x86
Linux x86-64

Symptoms

The system shows a high load average, with many threads in the "D" (uninterruptible sleep) state whose wait channel is mlx4_table_put or mlx4_table_get.
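One way to confirm this symptom is to count D-state threads per wait channel. The following is a minimal sketch, not part of the original note: it assumes the input text has the layout produced by `ps -eLo pid,tid,stat,wchan:32,comm`, and the function name and the sample rows are hypothetical.

```python
from collections import Counter


def count_d_state_by_wchan(ps_output: str) -> Counter:
    """Count threads in uninterruptible sleep (state D) per wait channel.

    Assumes five whitespace-separated columns: PID, TID, STAT, WCHAN, COMMAND,
    as printed by `ps -eLo pid,tid,stat,wchan:32,comm`.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 4)
        # Skip the header row and any malformed or blank lines.
        if len(fields) < 5 or not fields[0].isdigit():
            continue
        _pid, _tid, stat, wchan, _comm = fields
        # STAT may carry extra flags (for example "D<" or "Ds"); the first
        # character is the process state.
        if stat.startswith("D"):
            counts[wchan] += 1
    return counts


# Hypothetical sample rows, for illustration only.
sample = """\
  PID   TID STAT WCHAN                            COMMAND
    1     1 Ss   poll_schedule_timeout            init
 3024  3024 D    mlx4_table_get                   kworker/u144:2
 3025  3025 D    mlx4_table_put                   kworker/u144:5
 3026  3026 D    mlx4_table_get                   kworker/u144:7
"""

if __name__ == "__main__":
    for wchan, n in count_d_state_by_wchan(sample).most_common():
        print(wchan, n)
```

On an affected system, most D-state threads would be expected to report mlx4_table_get or mlx4_table_put as their wait channel.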

Hung task stack traces appear as shown below:

Sep 28 09:15:11 xxxhostname kernel: [2883791.536166] Tainted: P O 4.1.12-61.47.1.el6uek.x86_64 #2
Sep 28 09:15:11 xxxhostname kernel: [2883791.544429] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Sep 28 09:15:11 xxxhostname kernel: [2883791.553797] kworker/u144:2 D ffff885eff5976c0 0 302433 2 0x00000080
Sep 28 09:15:11 xxxhostname kernel: [2883791.553813] Workqueue: ipoib_wq ipoib_cm_tx_start [ib_ipoib]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553815] ffff8860be0ab948 0000000000000046 ffff887bfded0000 ffff885ecda18000
Sep 28 09:15:11 xxxhostname kernel: [2883791.553817] ffff887f51638e00 ffff8860be0a8008 ffff887bfded0000 ffff8821bc403204
Sep 28 09:15:11 xxxhostname kernel: [2883791.553819] ffff8821bc403208 00000000ffffffff ffff8860be0ab968 ffffffff8169993e
Sep 28 09:15:11 xxxhostname kernel: [2883791.553822] Call Trace:
Sep 28 09:15:11 xxxhostname kernel: [2883791.553829] [<ffffffff8169993e>] schedule+0x3e/0x90
Sep 28 09:15:11 xxxhostname kernel: [2883791.553832] [<ffffffff81699bae>] schedule_preempt_disabled+0xe/0x10
Sep 28 09:15:11 xxxhostname kernel: [2883791.553835] [<ffffffff8169b6e5>] __mutex_lock_slowpath+0x95/0x110
Sep 28 09:15:11 xxxhostname kernel: [2883791.553837] [<ffffffff8169b783>] mutex_lock+0x23/0x40
Sep 28 09:15:11 xxxhostname kernel: [2883791.553848] [<ffffffffa0cabc0e>] mlx4_table_get+0x5e/0x130 [mlx4_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553858] [<ffffffffa0cbc348>] __mlx4_qp_alloc_icm+0xb8/0x140 [mlx4_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553863] [<ffffffffa0c9bca3>] ? mlx4_zone_alloc_entries+0x83/0xb0 [mlx4_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553870] [<ffffffffa0cbc424>] mlx4_qp_alloc+0x54/0x180 [mlx4_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553876] [<ffffffffa0cbc769>] ? mlx4_qp_reserve_range+0x79/0x90 [mlx4_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553884] [<ffffffffa0e0ce47>] create_qp_common+0x5b7/0xc00 [mlx4_ib]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553889] [<ffffffffa0e0d5e3>] mlx4_ib_create_qp+0x153/0x2b0 [mlx4_ib]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553893] [<ffffffff811c935b>] ? __vmalloc_node_range+0x8b/0xd0
Sep 28 09:15:11 xxxhostname kernel: [2883791.553904] [<ffffffffa0b7caaf>] ib_create_qp_ex+0x3f/0x1c0 [ib_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553909] [<ffffffffa0b7cc40>] ib_create_qp+0x10/0x20 [ib_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553913] [<ffffffffa0ea047b>] ipoib_cm_tx_init+0xfb/0x2d0 [ib_ipoib]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553917] [<ffffffffa0ea1e79>] ipoib_cm_tx_start+0x229/0x3d0 [ib_ipoib]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553921] [<ffffffff810c0246>] ? idle_balance+0x256/0x2b0
Sep 28 09:15:11 xxxhostname kernel: [2883791.553926] [<ffffffff8109f05e>] process_one_work+0x14e/0x4b0
Sep 28 09:15:11 xxxhostname kernel: [2883791.553929] [<ffffffff8109f4e0>] worker_thread+0x120/0x480
Sep 28 09:15:11 xxxhostname kernel: [2883791.553931] [<ffffffff816992a9>] ? __schedule+0x309/0x890
Sep 28 09:15:11 xxxhostname kernel: [2883791.553934] [<ffffffff8109f3c0>] ? process_one_work+0x4b0/0x4b0
Sep 28 09:15:11 xxxhostname kernel: [2883791.553937] [<ffffffff8109f3c0>] ? process_one_work+0x4b0/0x4b0
Sep 28 09:15:11 xxxhostname kernel: [2883791.553940] [<ffffffff810a46be>] kthread+0xce/0xf0
Sep 28 09:15:11 xxxhostname kernel: [2883791.553943] [<ffffffff8102587c>] ? do_audit_syscall_entry+0x6c/0x70
Sep 28 09:15:11 xxxhostname kernel: [2883791.553945] [<ffffffff810a45f0>] ? kthread_freezable_should_stop+0x70/0x70
Sep 28 09:15:11 xxxhostname kernel: [2883791.553948] [<ffffffff8169df62>] ret_from_fork+0x42/0x70
Sep 28 09:15:11 xxxhostname kernel: [2883791.553949] [<ffffffff810a45f0>] ? kthread_freezable_should_stop+0x70/0x70
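To check whether other hung task reports share the same signature, the call trace frames can be extracted from the log text. The sketch below is illustrative only: the regular expression assumes the `[<address>] function+0xoff/0xsize [module]` frame layout seen in the trace above, and the function names `parse_frames` and `first_module_frame` are hypothetical. Frames prefixed with `?` are addresses found on the stack that the kernel could not confirm as part of the call chain, so they are flagged as unreliable.

```python
import re

# One call trace frame: "[<addr>] [? ]func+0xoff/0xsize [module]".
FRAME_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unreliable>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_frames(log_text):
    """Return (function, module_or_None, is_reliable) for each frame line."""
    frames = []
    for line in log_text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            reliable = m.group("unreliable") is None
            frames.append((m.group("func"), m.group("module"), reliable))
    return frames


def first_module_frame(frames, module="mlx4_core"):
    """Return the innermost reliable frame that belongs to `module`."""
    for func, mod, reliable in frames:
        if reliable and mod == module:
            return func
    return None


# Lines copied from the trace above.
sample = """\
Sep 28 09:15:11 xxxhostname kernel: [2883791.553829] [<ffffffff8169993e>] schedule+0x3e/0x90
Sep 28 09:15:11 xxxhostname kernel: [2883791.553837] [<ffffffff8169b783>] mutex_lock+0x23/0x40
Sep 28 09:15:11 xxxhostname kernel: [2883791.553848] [<ffffffffa0cabc0e>] mlx4_table_get+0x5e/0x130 [mlx4_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553858] [<ffffffffa0cbc348>] __mlx4_qp_alloc_icm+0xb8/0x140 [mlx4_core]
Sep 28 09:15:11 xxxhostname kernel: [2883791.553863] [<ffffffffa0c9bca3>] ? mlx4_zone_alloc_entries+0x83/0xb0 [mlx4_core]
"""

if __name__ == "__main__":
    print(first_module_frame(parse_frames(sample)))
```

In the trace above, the thread is blocked in mutex_lock, called from mlx4_table_get while ipoib_cm_tx_start creates a queue pair.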

 

Cause




