System hangs with high CPU load due to spinlock with [exception RIP: __ticket_spin_lock+25]

(Doc ID 2404387.1)

Last updated on MAY 30, 2018

Applies to:

Linux OS - Version Oracle Linux 6.8 with Unbreakable Enterprise Kernel [2.6.39] to Oracle Linux 6.9 with Unbreakable Enterprise Kernel [2.6.39] [Release OL6U8 to OL6U9]
Linux x86-64

Symptoms

 The system hangs with high CPU utlization due to the CPU being stuck in a spinlock. vmstat will show large number of runnable tasks and mpstat shows high %sys CPU utilization

e.g.

vmstat:

time-- procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st

10:05:34 188 3 1488 1265500 265212 122424232 0 0 25 228 69107 21287 1 60 40 0 0
10:05:40 198 3 1488 1261676 265256 122424280 0 0 0 106 73722 19611 0 68 31 0 0
10:05:45 202 3 1488 1289648 265300 122424448 0 0 12 98 74835 19402 0 70 29 0 0
10:05:50 211 3 1488 1316488 265368 122424544 0 0 2 206 71913 16482 0 72 27 0 0
10:05:55 214 3 1488 1391300 265444 122424640 0 0 0 298 75660 18041 0 75 25 0 0
10:06:00 226 3 1488 1384620 265500 122424768 0 0 0 131 80243 18840 0 79 21 0 0
10:06:05 279 3 1488 1384928 265548 122424856 0 0 0 152 77076 25588 2 81 17 0 0
10:06:10 315 3 1488 1380084 265572 122424928 0 0 0 146 79785 141938 19 81 0 0 0

 

mpstat:

10:05:36 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
10:05:41 all 0.45 0.00 63.24 0.00 0.00 0.04 0.00 0.00 36.27
10:05:41 0 0.00 0.00 99.80 0.00 0.00 0.20 0.00 0.00 0.00
10:05:41 1 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
10:05:41 2 0.00 0.00 99.80 0.00 0.00 0.20 0.00 0.00 0.00
10:05:41 3 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
10:05:41 4 0.00 0.00 99.80 0.00 0.00 0.20 0.00 0.00 0.00
10:05:41 5 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
10:05:41 6 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
10:05:41 7 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
10:05:41 8 14.81 0.00 6.79 0.00 0.00 0.21 0.00 0.00 78.19
10:05:41 9 0.20 0.00 88.76 0.00 0.00 0.00 0.00 0.00 11.04
10:05:41 10 0.00 0.00 99.80 0.00 0.00 0.20 0.00 0.00 0.00
10:05:41 11 2.02 0.00 1.42 0.00 0.00 0.20 0.00 0.00 96.36
10:05:41 12 0.00 0.00 99.80 0.00 0.00 0.20 0.00 0.00 0.00

An analysis of a vmcore obtained from a crash dump when the system hung shows two deadlock threads:

 PID: 38445 TASK: ffff881072a1c600 CPU: 19 COMMAND: "ggcmd"
#0 [ffff88407f026e70] crash_nmi_callback at ffffffff810326c6
#1 [ffff88407f026e80] notifier_call_chain at ffffffff81513115
#2 [ffff88407f026ec0] atomic_notifier_call_chain at ffffffff8151317a
#3 [ffff88407f026ed0] notify_die at ffffffff815131ae
#4 [ffff88407f026f00] default_do_nmi at ffffffff815106b9
#5 [ffff88407f026f30] do_nmi at ffffffff81510840
#6 [ffff88407f026f50] nmi at ffffffff8150fc10
[exception RIP: __ticket_spin_lock+28]
RIP: ffffffff81040fec RSP: ffff881070b8fb48 RFLAGS: 00000297
RAX: 000000000000a41c RBX: ffff881fbd8e2cc4 RCX: 0000000000051000
RDX: 000000000000a41b RSI: ffff8811edefac50 RDI: ffff881fbd8e2cc4
RBP: ffff881070b8fb48 R8: ffff8811edefac58 R9: 0000000000000003
R10: ffff88407ffd8e00 R11: 000000000000007d R12: ffff881fbd8e2cc0
R13: ffff8811edefac50 R14: ffff8811edefac58 R15: ffff8811edefac50
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#7 [ffff881070b8fb48] __ticket_spin_lock at ffffffff81040fec
#8 [ffff881070b8fb50] _raw_spin_lock at ffffffff8150f16e
#9 [ffff881070b8fb60] fuse_request_send_background_locked at
ffffffffa00ffa97 [fuse]
#10 [ffff881070b8fb90] fuse_send_writepage at ffffffffa0108301 [fuse]
#11 [ffff881070b8fbc0] fuse_flush_writepages at ffffffffa01083f3 [fuse]
#12 [ffff881070b8fc00] fuse_writepage_locked at ffffffffa0108683 [fuse]
#13 [ffff881070b8fc60] fuse_writepage at ffffffffa010875e [fuse]
#14 [ffff881070b8fc80] __writepage at ffffffff8111a8a7
#15 [ffff881070b8fca0] write_cache_pages at ffffffff8111bc06
#16 [ffff881070b8fdd0] generic_writepages at ffffffff8111bf31
#17 [ffff881070b8fe30] do_writepages at ffffffff8111bf95
#18 [ffff881070b8fe40] __filemap_fdatawrite_range at ffffffff8111166b
#19 [ffff881070b8fe90] filemap_fdatawrite at ffffffff8111193f
#20 [ffff881070b8fea0] filemap_write_and_wait at ffffffff81111985
#21 [ffff881070b8fec0] fuse_vma_close at ffffffffa010662c [fuse]
#22 [ffff881070b8fed0] remove_vma at ffffffff8113c8b3
#23 [ffff881070b8fef0] do_munmap at ffffffff8113e8cf
#24 [ffff881070b8ff50] sys_munmap at ffffffff8113e9e6
#25 [ffff881070b8ff80] system_call_fastpath at ffffffff81517622
PID: 378084 TASK: ffff8825421942c0 CPU: 2 COMMAND: "dbfs_client"
#0 [ffff88207f846e70] crash_nmi_callback at ffffffff810326c6
#1 [ffff88207f846e80] notifier_call_chain at ffffffff81513115
#2 [ffff88207f846ec0] atomic_notifier_call_chain at ffffffff8151317a
#3 [ffff88207f846ed0] notify_die at ffffffff815131ae
#4 [ffff88207f846f00] default_do_nmi at ffffffff815106b9
#5 [ffff88207f846f30] do_nmi at ffffffff81510840
#6 [ffff88207f846f50] nmi at ffffffff8150fc10
[exception RIP: __ticket_spin_lock+25]
RIP: ffffffff81040fe9 RSP: ffff8801f6d3b8e8 RFLAGS: 00000297
RAX: 00000000000068f8 RBX: 0000000000021000 RCX: ffff881fbd8e2d50
RDX: 00000000000068f7 RSI: ffff8801f6d3ba78 RDI: ffff883127828000
RBP: ffff8801f6d3b8e8 R8: ffff8801f6d3ba20 R9: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: ffff883127828000
R13: ffff8801f6d3ba78 R14: ffff881fbd8e2cc4 R15: ffff881fbd8e2cc0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#7 [ffff8801f6d3b8e8] __ticket_spin_lock at ffffffff81040fe9
#8 [ffff8801f6d3b8f0] _raw_spin_lock at ffffffff8150f16e
#9 [ffff8801f6d3b900] fuse_get_unique at ffffffffa00fe2ce [fuse]
#10 [ffff8801f6d3b920] fuse_read_batch_forget at ffffffffa00fe820 [fuse]
#11 [ffff8801f6d3b9a0] fuse_dev_do_read at ffffffffa010052c [fuse]
#12 [ffff8801f6d3ba70] fuse_dev_read at ffffffffa0100984 [fuse]
#13 [ffff8801f6d3baf0] do_sync_read at ffffffff8116da52
#14 [ffff8801f6d3bc00] vfs_read at ffffffff8116e195
#15 [ffff8801f6d3bc30] sys_read at ffffffff8116e361
#16 [ffff8801f6d3bc80] _read_orig at ffffffffa05f411d
[krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#17 [ffff8801f6d3bce0] syscall_wrappers_generic_flow_with_param at
ffffffffa05f0cc6 [krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#18 [ffff8801f6d3bdb0] syscall_wrappers_generic_read.clone.2 at
ffffffffa05f136b [krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#19 [ffff8801f6d3bee0] SYS_read_common_wrap at ffffffffa05f6085
[krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#20 [ffff8801f6d3bf70] SYS_read_wrap64 at ffffffffa05f617e
[krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#21 [ffff8801f6d3bf80] system_call_fastpath at ffffffff81517622

and:

PID: 378084 TASK: ffff8825421942c0 CPU: 2 COMMAND: "dbfs_client"
#0 [ffff88207f846e70] crash_nmi_callback at ffffffff810326c6
#1 [ffff88207f846e80] notifier_call_chain at ffffffff81513115
#2 [ffff88207f846ec0] atomic_notifier_call_chain at ffffffff8151317a
#3 [ffff88207f846ed0] notify_die at ffffffff815131ae
#4 [ffff88207f846f00] default_do_nmi at ffffffff815106b9
#5 [ffff88207f846f30] do_nmi at ffffffff81510840
#6 [ffff88207f846f50] nmi at ffffffff8150fc10
[exception RIP: __ticket_spin_lock+25]
RIP: ffffffff81040fe9 RSP: ffff8801f6d3b8e8 RFLAGS: 00000297
RAX: 00000000000068f8 RBX: 0000000000021000 RCX: ffff881fbd8e2d50
RDX: 00000000000068f7 RSI: ffff8801f6d3ba78 RDI: ffff883127828000
RBP: ffff8801f6d3b8e8 R8: ffff8801f6d3ba20 R9: 0000000000000001
R10: 0000000000000001 R11: 0000000000000001 R12: ffff883127828000
R13: ffff8801f6d3ba78 R14: ffff881fbd8e2cc4 R15: ffff881fbd8e2cc0
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#7 [ffff8801f6d3b8e8] __ticket_spin_lock at ffffffff81040fe9
#8 [ffff8801f6d3b8f0] _raw_spin_lock at ffffffff8150f16e
#9 [ffff8801f6d3b900] fuse_get_unique at ffffffffa00fe2ce [fuse]
#10 [ffff8801f6d3b920] fuse_read_batch_forget at ffffffffa00fe820 [fuse]
#11 [ffff8801f6d3b9a0] fuse_dev_do_read at ffffffffa010052c [fuse]
#12 [ffff8801f6d3ba70] fuse_dev_read at ffffffffa0100984 [fuse]
#13 [ffff8801f6d3baf0] do_sync_read at ffffffff8116da52
#14 [ffff8801f6d3bc00] vfs_read at ffffffff8116e195
#15 [ffff8801f6d3bc30] sys_read at ffffffff8116e361
#16 [ffff8801f6d3bc80] _read_orig at ffffffffa05f411d
[krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#17 [ffff8801f6d3bce0] syscall_wrappers_generic_flow_with_param at
ffffffffa05f0cc6 [krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#18 [ffff8801f6d3bdb0] syscall_wrappers_generic_read.clone.2 at
ffffffffa05f136b [krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#19 [ffff8801f6d3bee0] SYS_read_common_wrap at ffffffffa05f6085
[krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#20 [ffff8801f6d3bf70] SYS_read_wrap64 at ffffffffa05f617e
[krg_10_5_0_3021_impOEL6-UEK4-smp-x86_64]
#21 [ffff8801f6d3bf80] system_call_fastpath at ffffffff81517622

 

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms