Kernel Panic in ueknano kernel - exception RIP: multi_cpu_stop

(Doc ID 2322622.1)

Last updated on DECEMBER 03, 2017

Applies to:

Linux OS - Version Oracle Linux 6.9 with Unbreakable Enterprise Kernel [4.1.12] and later
Linux x86-64

Symptoms

With kernel linux-4.1.12-61.43.1.el6uek, RCU CPU stalls are reported, the soft-lockup watchdog panics the kernel, and the node reboots.

@ vmcore
-----------------------

KERNEL: ./4.1.12-61.33.1.el6uek.x86_64/vmlinux
DUMPFILE: hostZ.vmcore [PARTIAL DUMP]
...
TASKS: 4757
NODENAME: hostZ
RELEASE: 4.1.12-61.33.1.el6uek.x86_64
VERSION: #2 SMP Tue Mar 14 13:16:51 PDT 2017
MACHINE: x86_64 (2294 Mhz)
...
PANIC: "Kernel panic - not syncing: softlockup: hung tasks"
PID: 163
COMMAND: "migration/22"
TASK: ffff887f64cac600 [THREAD_INFO: ffff887f64cc0000]
CPU: 22
STATE: TASK_RUNNING (PANIC)

crash64> bt -l

PID: 163 TASK: ffff887f64cac600 CPU: 22 COMMAND: "migration/22"
#0 [ffff887f7db03c90] machine_kexec at ffffffff8105de40
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/arch/x86/kernel/machine_kexec_64.c: 320
#1 [ffff887f7db03d00] crash_kexec at ffffffff811102b8
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/kernel/kexec.c: 1503
#2 [ffff887f7db03dd0] panic at ffffffff81698a2c
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/arch/x86/include/asm/smp.h: 95
#3 [ffff887f7db03e50] watchdog_timer_fn at ffffffff81135d29
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/kernel/watchdog.c: 410
#4 [ffff887f7db03e90] __run_hrtimer at ffffffff810efc87
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/include/trace/events/timer.h: 247
#5 [ffff887f7db03ee0] hrtimer_interrupt at ffffffff810f0072
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/include/linux/timerqueue.h: 37
#6 [ffff887f7db03f70] local_apic_timer_interrupt at ffffffff81054629
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/arch/x86/kernel/apic/apic.c: 900
#7 [ffff887f7db03f90] smp_apic_timer_interrupt at ffffffff816a0931
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/arch/x86/include/asm/apic.h: 651
#8 [ffff887f7db03fb0] apic_timer_interrupt at ffffffff8169e87e
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/arch/x86/kernel/entry_64.S: 921
--- <IRQ stack> ---
#9 [ffff887f64cc3cc8] apic_timer_interrupt at ffffffff8169e87e
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/arch/x86/kernel/entry_64.S: 921
[exception RIP: multi_cpu_stop+106]
RIP: ffffffff81122f4a RSP: ffff887f64cc3d78 RFLAGS: 00000293
RAX: 0000000000000001 RBX: 0000000000000020 RCX: dead000000000200
RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff8850ee04fa88
RBP: ffff887f64cc3da8 R8: ffff8850ee04fae8 R9: 0000000000007fe9
R10: 0000000000000000 R11: 0000000000003f00 R12: ffff88807e801040
R13: 0000000000000000 R14: 0000000200000000 R15: 0000000000000000
ORIG_RAX: ffffffffffffff10 CS: 0010 SS: 0000
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/kernel/stop_machine.c: 208
#10 [ffff887f64cc3db0] cpu_stopper_thread at ffffffff81122ba2
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/kernel/stop_machine.c: 474
#11 [ffff887f64cc3e80] smpboot_thread_fn at ffffffff810a7f46
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/kernel/smpboot.c: 162
#12 [ffff887f64cc3ec0] kthread at ffffffff810a465e
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/kernel/kthread.c: 207
#13 [ffff887f64cc3f50] ret_from_fork at ffffffff8169dda2
/usr/src/debug/kernel-4.1.12/linux-4.1.12-61.33.1.el6uek/arch/x86/kernel/entry_64.S: 640
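crash prints the faulting location with a decimal offset (multi_cpu_stop+106), while the kernel log further below uses a hex offset/size pair (multi_cpu_stop+0x6a/0xf0). The short Python sketch below uses only addresses copied from this note to confirm that the two notations agree and to derive where multi_cpu_stop starts. The helper name function_base is hypothetical and is used for illustration only.

```python
# Values copied from the crash "bt -l" output and the kernel log in this note.
RIP = 0xFFFFFFFF81122F4A           # exception RIP
CRASH_OFFSET = 106                  # crash prints "multi_cpu_stop+106" (decimal)
LOG_OFFSET, LOG_SIZE = 0x6A, 0xF0   # kernel log prints "+0x6a/0xf0" (hex offset/size)


def function_base(rip: int, offset: int) -> int:
    """Hypothetical helper: start address of the function containing rip."""
    return rip - offset


# Both notations describe the same instruction.
assert CRASH_OFFSET == LOG_OFFSET

base = function_base(RIP, LOG_OFFSET)
end = base + LOG_SIZE
print(f"multi_cpu_stop spans {base:#x}-{end:#x}")
print(f"RIP is {RIP - base:#x} bytes into a {LOG_SIZE:#x}-byte function")
```

The computed start address, 0xffffffff81122ee0, is the same word that appears in the stack dump and is labelled "? irq_cpu_stop_queue_work+0x30/0x30" in the call traces. That label is consistent with the kernel symbolizing stack words as return addresses (resolving the address minus one, which lands at the end of the preceding 0x30-byte function). The entry therefore most likely points at the start of multi_cpu_stop rather than into irq_cpu_stop_queue_work.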

crash64> p saved_command_line

saved_command_line = $1 = 0xffff88807ffa8d80 "root=LABEL=DBSYS bootarea=dbsys bootfrom=BOOT ro loglevel=7 panic=60 debug pci=noaer log_buf_len=1m nmi_watchdog=0 transparent_hugepage=never rd_NO_PLYMOUTH audit=1 console=tty1 console=ttyS0,115200n8 crashkernel=448M@128M numa=on"

Note that NUMA is explicitly enabled on the kernel command line (numa=on).
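When several hosts are being compared, options such as numa=on can be checked programmatically by splitting the command line into key/value pairs. The Python sketch below is illustrative only: parse_cmdline is a hypothetical helper, and it assumes a whitespace-separated command line with no quoted values containing spaces (true for the command line shown above).

```python
from __future__ import annotations

# Kernel command line copied verbatim from "p saved_command_line" above.
CMDLINE = (
    "root=LABEL=DBSYS bootarea=dbsys bootfrom=BOOT ro loglevel=7 panic=60 "
    "debug pci=noaer log_buf_len=1m nmi_watchdog=0 transparent_hugepage=never "
    "rd_NO_PLYMOUTH audit=1 console=tty1 console=ttyS0,115200n8 "
    "crashkernel=448M@128M numa=on"
)


def parse_cmdline(cmdline: str) -> dict[str, list[str | None]]:
    """Hypothetical helper: map each option to every value it was given.

    Bare flags such as "ro" map to [None]. Only the first "=" separates
    key from value, so "root=LABEL=DBSYS" yields key "root".
    Assumes no quoted values containing spaces.
    """
    opts: dict[str, list[str | None]] = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        opts.setdefault(key, []).append(value if sep else None)
    return opts


opts = parse_cmdline(CMDLINE)
print("numa:", opts["numa"])        # the option highlighted in this note
print("console:", opts["console"])  # repeated option, both values kept
```

Options such as console= legitimately appear more than once, which is why each key maps to a list of values rather than a single string.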

crash64> log

...
[8126534.855068] INFO: rcu_sched self-detected stall on CPU
[8126534.856066] INFO: rcu_sched self-detected stall on CPU
[8126534.856067] INFO: rcu_sched self-detected stall on CPU
[8126534.856068] INFO: rcu_sched self-detected stall on CPU
[8126534.856070] INFO: rcu_sched self-detected stall on CPU
...
[8126607.440615] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 67s! [migration/2:22]
[8126607.441245] NMI watchdog: BUG: soft lockup - CPU#22 stuck for 67s! [migration/22:163]
...
[8126607.441296] CPU: 22 PID: 163 Comm: migration/22 Tainted: P O 4.1.12-61.33.1.el6uek.x86_64 #2
[8126607.441297] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30110000 03/03/2017
[8126607.441299] task: ffff887f64cac600 ti: ffff887f64cc0000 task.ti: ffff887f64cc0000
[8126607.441323] RIP: 0010:[<ffffffff81122f4a>] [<ffffffff81122f4a>] multi_cpu_stop+0x6a/0xf0
[8126607.441323] RSP: 0000:ffff887f64cc3d78 EFLAGS: 00000293
[8126607.441324] RAX: 0000000000000001 RBX: 0000000000000020 RCX: dead000000000200
[8126607.441324] RDX: 0000000000000000 RSI: 0000000000000286 RDI: ffff8850ee04fa88
[8126607.441325] RBP: ffff887f64cc3da8 R08: ffff8850ee04fae8 R09: 0000000000007fe9
[8126607.441325] R10: 0000000000000000 R11: 0000000000003f00 R12: ffff88807e801040
[8126607.441326] R13: 0000000000000000 R14: 0000000200000000 R15: 0000000000000000
[8126607.441326] FS: 0000000000000000(0000) GS:ffff887f7db00000(0000) knlGS:0000000000000000
[8126607.441327] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[8126607.441328] CR2: 00007fd7e9615880 CR3: 0000000001a42000 CR4: 00000000001406e0
[8126607.441328] Stack:
[8126607.441329] ffff887f64cc3d88 ffff8850ee04fa58 ffff887f7db110e0 ffffffff81122ee0
[8126607.441330] ffff8850ee04fa88 ffff887f7db110e8 ffff887f64cc3e78 ffffffff81122ba2
[8126607.441331] 0000000000000000 ffff887f64cacf50 ffffffff816ca820 ffff887f7db176c0
[8126607.441332] Call Trace:
[8126607.441335] [<ffffffff81122ee0>] ? irq_cpu_stop_queue_work+0x30/0x30
[8126607.441337] [<ffffffff81122ba2>] cpu_stopper_thread+0x52/0x160
[8126607.441342] [<ffffffff816990f9>] ? __schedule+0x309/0x890
[8126607.441359] [<ffffffff810a7e30>] ? smpboot_create_threads+0x80/0x80
[8126607.441361] [<ffffffff810a7f46>] smpboot_thread_fn+0x116/0x170
[8126607.441363] [<ffffffff810a465e>] kthread+0xce/0xf0
[8126607.441364] [<ffffffff810a4590>] ? kthread_freezable_should_stop+0x70/0x70
[8126607.441367] [<ffffffff8169dda2>] ret_from_fork+0x42/0x70
[8126607.441380] [<ffffffff810a4590>] ? kthread_freezable_should_stop+0x70/0x70
[8126607.441403] Code: 45 19 ed 45 85 ed 41 0f 95 c7 4c 8d 73 24 31 d2 31 c0 eb 1a 0f 1f 44 00 00 41 83 fd 03 74 62 f0 41 ff 0e 74 2c 41 83 fd 04 74 3b <44> 89 e8 f3 90 44 8b 6b 20 41 39 c5 74 ec 41 83 fd 02 75 da fa
[8126607.441404] Kernel panic - not syncing: softlockup: hung tasks
[8126607.441406] CPU: 22 PID: 163 Comm: migration/22 Tainted: P O L 4.1.12-61.33.1.el6uek.x86_64 #2
[8126607.441406] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30110000 03/03/2017
[8126607.441407] 0000000000000000 ffff887f7db03dc8 ffffffff81698cc0 ffffffff8195a0fc
[8126607.441408] ffff887f64cc3cc8 ffff887f7db03e48 ffffffff81698a1c ffff887f00000008
[8126607.441409] ffff887f7db03e58 ffff887f7db03df8 0000000000000086 000000000000fffe
[8126607.441409] Call Trace:
[8126607.441411] <IRQ> [<ffffffff81698cc0>] dump_stack+0x63/0x83
[8126607.441412] [<ffffffff81698a1c>] panic+0xc1/0x210
[8126607.441428] [<ffffffff81135d29>] watchdog_timer_fn+0x1f9/0x200
[8126607.441431] [<ffffffff810efc87>] __run_hrtimer+0x87/0x240
[8126607.441432] [<ffffffff81135b30>] ? watchdog+0x50/0x50
[8126607.441434] [<ffffffff810f0072>] hrtimer_interrupt+0x102/0x240
[8126607.441450] [<ffffffff810887c6>] ? __do_softirq+0x1b6/0x350
[8126607.441453] [<ffffffff81054629>] local_apic_timer_interrupt+0x39/0x60
[8126607.441455] [<ffffffff8105b9d4>] ? native_apic_msr_eoi_write+0x14/0x20
[8126607.441472] [<ffffffff816a0931>] smp_apic_timer_interrupt+0x41/0x60
[8126607.441473] [<ffffffff8169e87e>] apic_timer_interrupt+0x6e/0x80
[8126607.441475] <EOI> [<ffffffff81122f4a>] ? multi_cpu_stop+0x6a/0xf0
[8126607.441476] [<ffffffff81122ee0>] ? irq_cpu_stop_queue_work+0x30/0x30
[8126607.441477] [<ffffffff81122ba2>] cpu_stopper_thread+0x52/0x160
[8126607.441479] [<ffffffff816990f9>] ? __schedule+0x309/0x890
[8126607.441480] [<ffffffff810a7e30>] ? smpboot_create_threads+0x80/0x80
[8126607.441482] [<ffffffff810a7f46>] smpboot_thread_fn+0x116/0x170
[8126607.441483] [<ffffffff810a465e>] kthread+0xce/0xf0
[8126607.441484] [<ffffffff810a4590>] ? kthread_freezable_should_stop+0x70/0x70
[8126607.441485] [<ffffffff8169dda2>] ret_from_fork+0x42/0x70
[8126607.441486] [<ffffffff810a4590>] ? kthread_freezable_should_stop+0x70/0x70
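The log shows RCU stall reports at about 8126534 s, followed roughly 73 seconds later by soft-lockup reports for the migration/2 and migration/22 stopper threads. When triaging a longer log, a small filter can pull these reports out. The Python sketch below is a hypothetical helper (soft_lockups is not part of crash or any Oracle tool); its regular expression is written against the message format in this excerpt and may need adjusting for other kernel versions.

```python
from __future__ import annotations

import re

# Lines copied from the "log" output above (other lines elided).
LOG = """\
[8126534.855068] INFO: rcu_sched self-detected stall on CPU
[8126534.856066] INFO: rcu_sched self-detected stall on CPU
[8126534.856067] INFO: rcu_sched self-detected stall on CPU
[8126534.856068] INFO: rcu_sched self-detected stall on CPU
[8126534.856070] INFO: rcu_sched self-detected stall on CPU
[8126607.440615] NMI watchdog: BUG: soft lockup - CPU#2 stuck for 67s! [migration/2:22]
[8126607.441245] NMI watchdog: BUG: soft lockup - CPU#22 stuck for 67s! [migration/22:163]
[8126607.441404] Kernel panic - not syncing: softlockup: hung tasks
"""

# Assumed pattern, matching the soft-lockup message format seen in this log.
SOFTLOCKUP_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\] .*soft lockup - CPU#(?P<cpu>\d+) "
    r"stuck for (?P<secs>\d+)s! \[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def soft_lockups(log: str) -> list[dict]:
    """Hypothetical helper: extract timestamp, CPU, seconds, command, PID."""
    hits = []
    for line in log.splitlines():
        m = SOFTLOCKUP_RE.match(line)
        if m:
            hits.append({
                "ts": float(m["ts"]),
                "cpu": int(m["cpu"]),
                "secs": int(m["secs"]),
                "comm": m["comm"],
                "pid": int(m["pid"]),
            })
    return hits


# Count RCU stall reports separately; they carry no CPU number in this format.
stalls = sum("self-detected stall on CPU" in line for line in LOG.splitlines())
for hit in soft_lockups(LOG):
    print(f"CPU {hit['cpu']}: {hit['comm']} (pid {hit['pid']}) stuck {hit['secs']}s")
print("RCU stall reports in excerpt:", stalls)
```

Run against this excerpt, it lists CPU 2 (migration/2, pid 22) and CPU 22 (migration/22, pid 163), each stuck for 67 seconds, and counts five RCU stall reports.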

 

Changes

None.

Cause
