Oracle Linux: BDA Node Unexpected Reboot as Process cmfagent Causing High Load with "cpu_rt_runtime_write at ffffffff810affdb"
(Doc ID 2637676.1)
Last updated on JANUARY 18, 2023
Applies to:
Linux OS - Version Oracle Linux 7.0 and later
Linux x86-64
Symptoms
The node rebooted unexpectedly. The crash dump shows the following system information and backtrace:
crash7latest> sys
KERNEL: /share/linuxrpm/vmlinux_repo/64/4.1.12-103.9.7.el6uek.x86_64/vmlinux
DUMPFILE: vmcore [PARTIAL DUMP]
CPUS: 96
DATE: Sun Jan 19 06:39:51 2020
UPTIME: 219 days, 18:39:06
LOAD AVERAGE: 1212.71, 904.63, 666.79
TASKS: 5871
NODENAME: <hostname>
RELEASE: 4.1.12-103.9.7.el6uek.x86_64
VERSION: #2 SMP Mon Nov 20 18:00:08 PST 2017
MACHINE: x86_64 (2100 Mhz)
MEMORY: 766.7 GB
PANIC: "BUG: unable to handle kernel NULL pointer dereference at (null)"
crash7latest> bt
PID: 2 TASK: ffff885e8e870e00 CPU: 42 COMMAND: "kthreadd"
#0 [ffff88bebda868b8] machine_kexec at ffffffff8105e290
#1 [ffff88bebda86928] crash_kexec at ffffffff81111e08
#2 [ffff88bebda869f8] panic at ffffffff816e45ae
#3 [ffff88bebda86a78] nmi_panic at ffffffff81086f5f
#4 [ffff88bebda86a88] watchdog_overflow_callback at ffffffff81137eb6
#5 [ffff88bebda86aa8] __perf_event_overflow at ffffffff8117cf27
#6 [ffff88bebda86b28] perf_event_overflow at ffffffff8117d804
#7 [ffff88bebda86b38] intel_pmu_handle_irq at ffffffff81037b64
#8 [ffff88bebda86de0] perf_event_nmi_handler at ffffffff8102ded4
#9 [ffff88bebda86e10] nmi_handle at ffffffff8101b067
#10 [ffff88bebda86e90] default_do_nmi at ffffffff8101b37e
#11 [ffff88bebda86ec0] do_nmi at ffffffff8101b545
#12 [ffff88bebda86ef0] end_repeat_nmi at ffffffff816ebcb5
[exception RIP: queued_write_lock_slowpath+205]
RIP: ffffffff810d0a1d RSP: ffff885e8e87fd48 RFLAGS: 00000002
RAX: 000000000000fd1a RBX: 0000000000800711 RCX: ffffffff81ab5080
RDX: ffffffff81ab5084 RSI: 000000000000fd1e RDI: 000000000000474d
RBP: ffff885e8e87fd68 R8: 000000000000fd1e R9: ffff885e8e87fd58
R10: 000000000000fd1a R11: 0000000000000004 R12: ffff88606c277000
R13: ffff88606c277808 R14: ffffffff810a6fd0 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#13 [ffff885e8e87fd48] queued_write_lock_slowpath at ffffffff810d0a1d
#14 [ffff885e8e87fd70] _raw_write_lock_irq at ffffffff816e8cbd
#15 [ffff885e8e87fd80] copy_process at ffffffff81085b05
#16 [ffff885e8e87fe10] do_fork at ffffffff810866c9
#17 [ffff885e8e87fe80] kernel_thread at ffffffff81086956
#18 [ffff885e8e87fe90] kthreadd at ffffffff810a735c
#19 [ffff885e8e87ff50] ret_from_fork at ffffffff816e9962
!! The watchdog detected the hang and issued an NMI to trigger the crash.
crash7latest> ps | grep java | wc -l
3912 <<< almost 4000 java processes
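The `ps | grep java | wc -l` count above can be broken down by command name to see which commands dominate the task list. The following is a minimal sketch, assuming the crash `ps` output has been saved as text with the command name in the last column. The function name `count_by_command` and the sample rows are invented for illustration and are not taken from this vmcore.

```python
from collections import Counter


def count_by_command(ps_text):
    """Count tasks per command name in saved `ps` output (last column = COMM)."""
    counts = Counter()
    for line in ps_text.splitlines():
        fields = line.split()
        # Skip blank lines and the header row.
        if not fields or fields[-1] == "COMM":
            continue
        counts[fields[-1]] += 1
    return counts


# Hypothetical sample rows, not taken from the original dump.
sample = """\
   PID    PPID  CPU       TASK        ST  %MEM     VSZ    RSS  COMM
      2      0   42  ffff885e8e870e00  RU   0.0       0      0  [kthreadd]
   1001      1    3  ffff880000000001  IN   0.1  100000  20000  java
   1002      1    4  ffff880000000002  IN   0.1  100000  20000  java
"""

counts = count_by_command(sample)
print(counts.most_common(5))
```

Sorting by count makes it easy to confirm whether a single command, such as java here, accounts for most of the tasks.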
Load average:
LOAD AVERAGE: 1212.71, 904.63, 666.79
The load average is very high, with almost 4000 Java processes on the node.
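To put the load average in context, it can be divided by the CPU count reported by `sys` (96). On Linux the load average counts runnable tasks plus tasks in uninterruptible sleep, so a per-CPU value well above 1 is a rough sign that demand exceeds capacity. A minimal sketch of that arithmetic, using the values from the dump above; the helper name `load_per_cpu` is hypothetical:

```python
def load_per_cpu(load_averages, cpus):
    """Return each load average divided by the CPU count, rounded to 2 places."""
    return [round(load / cpus, 2) for load in load_averages]


# Values taken from the `sys` output: CPUS and LOAD AVERAGE (1, 5, 15 minutes).
cpus = 96
load_averages = [1212.71, 904.63, 666.79]

per_cpu = load_per_cpu(load_averages, cpus)
print(per_cpu)  # roughly 12.6, 9.4 and 6.9 tasks per CPU
```

The rising sequence from the 15-minute to the 1-minute value also shows that the load was still climbing when the node crashed.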
Cause