[PCA 2.x] Compute Node Kernel Panic - "not syncing: MLX4 device reset due to unrecoverable catastrophic failure"
(Doc ID 2376718.1)
Last updated on JANUARY 10, 2025
Applies to:
Private Cloud Appliance - Version 2.3.1 and later
Linux x86-64
Symptoms
A compute node hits a kernel panic during boot while loading the InfiniBand driver (mlx4_core), and then restarts.
The ILOM console shows the following stack trace:
detecting hardware...
waiting for hardware to initialize...
[ 149.754120] Kernel panic - not syncing: MLX4 device reset due to unrecoverable catastrophic failure
[ 149.754120]
[ 149.880131] CPU: 61 PID: 1379 Comm: modprobe Not tainted 4.1.12-103.3.8.el6uek.x86_64 #2
[ 149.976979] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30120100 05/09/2017
[ 150.095674] 0000000000000000 ffff8801eeb4f418 ffffffff816e4313 ffffffffa0137608
[ 150.184645] ffff880004900000 ffff8801eeb4f498 ffffffff816e406f ffff880100000008
[ 150.273613] ffff8801eeb4f4a8 ffff8801eeb4f448 ffff8801eeb4f4a8 ffff8801eeb4f468
[ 150.362582] Call Trace:
[ 150.391821] [] dump_stack+0x63/0x88
[ 150.453305] [] panic+0xcc/0x21b
[ 150.510632] [] mlx4_enter_error_state+0xba/0xf0 [mlx4_core]
[ 150.597086] [] mlx4_cmd_reset_flow+0x38/0x60 [mlx4_core]
[ 150.680416] [] mlx4_cmd_poll+0xc1/0x2e0 [mlx4_core]
[ 150.758549] [] __mlx4_cmd+0xb0/0x160 [mlx4_core]
[ 150.833587] [] mlx4_SENSE_PORT+0x54/0xd0 [mlx4_core]
[ 150.912763] [] mlx4_dev_cap+0x4a4/0xb50 [mlx4_core]
[ 150.990891] [] mlx4_init_hca+0x4f/0x9f0 [mlx4_core]
[ 151.069021] [] ? pick_next_entity+0x80/0x140
[ 151.139869] [] ? pick_next_task_fair+0x9a/0x200
[ 151.213838] [] ? up+0x2f/0x50
[ 151.269086] [] ? mlx4_cmd_poll+0x136/0x2e0 [mlx4_core]
[ 151.350334] [] ? mlx4_cmd_poll+0xcc/0x2e0 [mlx4_core]
[ 151.430545] [] ? _raw_spin_lock_irqsave+0x1e/0xa0
[ 151.506592] [] ? _raw_spin_unlock_irqrestore+0x20/0x50
[ 151.587841] [] ? dma_pool_free+0xa9/0xd0
[ 151.654531] [] ? mlx4_free_cmd_mailbox+0x31/0x40 [mlx4_core]
[ 151.742022] [] ? mlx4_MOD_STAT_CFG+0x7e/0xa0 [mlx4_core]
[ 151.825378] [] ? mlx4_MAP_FA+0x1d/0x20 [mlx4_core]
[ 151.902468] [] mlx4_load_one+0x217/0xcd0 [mlx4_core]
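When a compute node keeps rebooting, it can help to confirm that a captured console log matches this note and to read the call trace without its speculative frames. The following is a minimal Python sketch, not an Oracle-provided tool. It assumes the ILOM console output has been saved as plain text in the format shown above, where the frame address inside the second pair of brackets may be empty. The names reliable_frames and matches_this_note are hypothetical. Frames prefixed with "?" are skipped because the kernel prints "?" for addresses it found on the stack but could not confirm as part of the active call chain.

```python
import re

# Panic message that identifies this issue (copied from the console output).
PANIC_SIGNATURE = ("Kernel panic - not syncing: MLX4 device reset due to "
                   "unrecoverable catastrophic failure")

# One call-trace frame, for example:
#   [ 150.453305] [] panic+0xcc/0x21b
#   [ 151.069021] [] ? pick_next_entity+0x80/0x140
#   [ 150.510632] [] mlx4_enter_error_state+0xba/0xf0 [mlx4_core]
# The address inside the second pair of brackets may be empty, so it is
# matched but not captured. Group 1 is the optional "?" marker, group 2 the
# function name, group 3 the owning module (absent for core kernel code).
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+\[[^\]]*\]\s+(\?\s+)?"
    r"(\S+?)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[([^\]\s]+)\])?"
)


def matches_this_note(console_text):
    """Return True if the log contains the MLX4 catastrophic-failure panic."""
    return PANIC_SIGNATURE in console_text


def reliable_frames(console_text):
    """Return (function, module) pairs for call-trace frames not marked '?'.

    Frames are returned in the order printed (innermost call first).
    The module is None for frames in the core kernel.
    """
    frames = []
    for line in console_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and not match.group(1):
            frames.append((match.group(2), match.group(3)))
    return frames
```

Applied to the full trace above, the frames kept are, from the bottom up: mlx4_load_one, mlx4_init_hca, mlx4_dev_cap, mlx4_SENSE_PORT, __mlx4_cmd, mlx4_cmd_poll, mlx4_cmd_reset_flow, mlx4_enter_error_state, panic, dump_stack. In other words, the panic was raised by the mlx4_core error handling while a SENSE_PORT command was being polled during HCA initialization.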
Changes
No changes.
Cause