
[PCA 2.x] Compute Node Kernel Panic - "not syncing: MLX4 device reset due to unrecoverable catastrophic failure" (Doc ID 2376718.1)

Last updated on JANUARY 10, 2025

Applies to:

Private Cloud Appliance - Version 2.3.1 and later
Linux x86-64

Symptoms

The compute node hits a kernel panic during boot while loading the InfiniBand driver (mlx4_core), and then restarts.

The ILOM console shows the following stack trace:


detecting hardware...
waiting for hardware to initialize...
[  149.754120] Kernel panic - not syncing: MLX4 device reset due to unrecoverable catastrophic failure
[  149.754120]
[  149.880131] CPU: 61 PID: 1379 Comm: modprobe Not tainted 4.1.12-103.3.8.el6uek.x86_64 #2
[  149.976979] Hardware name: Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U, BIOS 30120100 05/09/2017
[  150.095674]  0000000000000000 ffff8801eeb4f418 ffffffff816e4313 ffffffffa0137608
[  150.184645]  ffff880004900000 ffff8801eeb4f498 ffffffff816e406f ffff880100000008
[  150.273613]  ffff8801eeb4f4a8 ffff8801eeb4f448 ffff8801eeb4f4a8 ffff8801eeb4f468
[  150.362582] Call Trace:
[  150.391821]  [] dump_stack+0x63/0x88
[  150.453305]  [] panic+0xcc/0x21b
[  150.510632]  [] mlx4_enter_error_state+0xba/0xf0 [mlx4_core]
[  150.597086]  [] mlx4_cmd_reset_flow+0x38/0x60 [mlx4_core]
[  150.680416]  [] mlx4_cmd_poll+0xc1/0x2e0 [mlx4_core]
[  150.758549]  [] __mlx4_cmd+0xb0/0x160 [mlx4_core]
[  150.833587]  [] mlx4_SENSE_PORT+0x54/0xd0 [mlx4_core]
[  150.912763]  [] mlx4_dev_cap+0x4a4/0xb50 [mlx4_core]
[  150.990891]  [] mlx4_init_hca+0x4f/0x9f0 [mlx4_core]
[  151.069021]  [] ? pick_next_entity+0x80/0x140
[  151.139869]  [] ? pick_next_task_fair+0x9a/0x200
[  151.213838]  [] ? up+0x2f/0x50
[  151.269086]  [] ? mlx4_cmd_poll+0x136/0x2e0 [mlx4_core]
[  151.350334]  [] ? mlx4_cmd_poll+0xcc/0x2e0 [mlx4_core]
[  151.430545]  [] ? _raw_spin_lock_irqsave+0x1e/0xa0
[  151.506592]  [] ? _raw_spin_unlock_irqrestore+0x20/0x50
[  151.587841]  [] ? dma_pool_free+0xa9/0xd0
[  151.654531]  [] ? mlx4_free_cmd_mailbox+0x31/0x40 [mlx4_core]
[  151.742022]  [] ? mlx4_MOD_STAT_CFG+0x7e/0xa0 [mlx4_core]
[  151.825378]  [] ? mlx4_MAP_FA+0x1d/0x20 [mlx4_core]
[  151.902468]  [] mlx4_load_one+0x217/0xcd0 [mlx4_core]
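
To check a saved copy of the ILOM console output for this signature, a short script along the following lines may help. It is a minimal sketch, not an Oracle-provided tool: the function name find_mlx4_panic is hypothetical, and it assumes the console output has been captured as plain text in the format shown above.

```python
import re

# Panic message copied from the console output in the Symptoms section.
PANIC_RE = re.compile(
    r"Kernel panic - not syncing: MLX4 device reset due to "
    r"unrecoverable catastrophic failure"
)

# Call-trace frames look like "mlx4_init_hca+0x4f/0x9f0 [mlx4_core]".
# A leading "?" marks a frame the kernel considers unreliable.
FRAME_RE = re.compile(
    r"\]\s+(\?\s+)?(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?:\s+\[(\w+)\])?"
)


def find_mlx4_panic(console_text):
    """Hypothetical helper: return (matched, frames).

    matched is True if the panic message is present; frames lists the
    reliable mlx4_core function names found after the panic line.
    """
    lines = console_text.splitlines()
    for i, line in enumerate(lines):
        if PANIC_RE.search(line):
            frames = []
            for later in lines[i + 1:]:
                m = FRAME_RE.search(later)
                # Keep only reliable frames that belong to mlx4_core.
                if m and m.group(1) is None and m.group(3) == "mlx4_core":
                    frames.append(m.group(2))
            return True, frames
    return False, []
```

A match that includes frames such as mlx4_SENSE_PORT, mlx4_dev_cap and mlx4_init_hca indicates the same failure point as the trace in this document.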

Changes

No changes.

Cause




