
Oracle Linux: Kernel Panic with "MLX4 device reset due to unrecoverable catastrophic failure" (Doc ID 2370779.1)

Last updated on JUNE 24, 2020

Applies to:

Linux OS - Version Oracle Linux 6.7 with Unbreakable Enterprise Kernel [4.1.12] to Oracle Linux 7.3 with Unbreakable Enterprise Kernel [4.1.12] [Release OL6U7 to OL7U3]
Oracle Cloud Infrastructure - Version N/A and later
Information in this document applies to any platform.

Symptoms

A kernel panic occurs with the following message:


[9100873.191478] RDS/IB: connection <##.##.##.##,##.##.##.##,#> dropped due to 'rds_rdma module unload'
[9100875.223951] mlx4_core 0000:03:00.0: command 0xff9 failed: fw status = 0x1
[9100875.231754] mlx4_core 0000:03:00.0: device is going to be reset
[9100875.740675] mlx4_core 0000:03:00.0: device was reset successfully
[9100875.747706] Kernel panic - not syncing: MLX4 device reset due to unrecoverable catastrophic failure
[9100875.759885] CPU: 3 PID: 269445 Comm: reboot Tainted: P OE 4.1.12-61.47.1.el6uek.x86_64 #2
[9100875.770408] Hardware name: Oracle Corporation ORACLE SERVER X6-2/ASM,MOTHERBOARD,1U, BIOS 38080000 05/08/2017
[9100875.781711] 0000000000000000 ffff881ff1b33aa8 ffffffff81698e70 ffffffffa0f7e088
[9100875.790236] ffff88117cf40000 ffff881ff1b33b28 ffffffff81698bcc ffff881f00000008
[9100875.798752] ffff881ff1b33b38 ffff881ff1b33ad8 ffff881ff1b33b38 ffff881ff1b33af8
[9100875.807631] Call Trace:
[9100875.810745] [] dump_stack+0x63/0x83
[9100875.816889] [] ? CSWTCH.1708+0x468/0xffffffffffffdec7 [mlx4_core]
[9100875.826110] [] panic+0xc1/0x210
[9100875.831851] [] mlx4_enter_error_state+0xba/0xf0 [mlx4_core]
[9100875.840489] [] mlx4_cmd_reset_flow+0x38/0x60 [mlx4_core]
[9100875.848662] [] mlx4_cmd_poll+0xc2/0x2e0 [mlx4_core]
[9100875.856349] [] __mlx4_cmd+0xb0/0x160 [mlx4_core]
[9100875.863744] [] mlx4_cleanup_icm_table+0x73/0xb0 [mlx4_core]
[9100875.872385] [] mlx4_free_icms+0x4a/0x100 [mlx4_core]
[9100875.880161] [] mlx4_close_hca+0x4b/0x70 [mlx4_core]
[9100875.887847] [] mlx4_unload_one+0x17b/0x2d0 [mlx4_core]
[9100875.895825] [] mlx4_shutdown+0x62/0x80 [mlx4_core]
[9100875.903415] [] ? msi_x+0x8/0xffffffffffff3f2f [mlx4_core]
[9100875.911859] [] pci_device_shutdown+0x41/0x90
[9100875.918868] [] device_shutdown+0x1d/0x180
[9100875.925581] [] kernel_restart_prepare+0x36/0x40
[9100875.932875] [] kernel_restart+0x16/0x60
[9100875.939389] [] SYSC_reboot+0x1b1/0x260
[9100875.945807] [] ? dentry_free+0x5c/0xa0
[9100875.952226] [] ? __dentry_kill+0xe0/0x120
[9100875.958940] [] ? __fput+0x170/0x250
[9100875.965069] [] ? __audit_syscall_entry+0xac/0x110
[9100875.972560] [] ? do_audit_syscall_entry+0x6c/0x70
[9100875.980051] [] ? syscall_trace_enter_phase1+0x153/0x180
[9100875.988142] [] SyS_reboot+0xe/0x10
[9100875.994174] [] system_call_fastpath+0x12/0x71
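
When checking whether a saved console or vmcore-dmesg log shows this same signature, a small script can help. The following is a minimal sketch only, not an Oracle-provided tool: the function name `find_mlx4_panic` and the returned dictionary keys are hypothetical, and the patterns are taken directly from the log lines shown above.

```python
import re

# Panic string copied from the log above.
PANIC_RE = re.compile(
    r"Kernel panic - not syncing: MLX4 device reset due to "
    r"unrecoverable catastrophic failure"
)

# Firmware command failure line that precedes the reset in the log above.
CMD_FAIL_RE = re.compile(
    r"mlx4_core (?P<pci>[0-9a-fA-F:.]+): command (?P<cmd>0x[0-9a-fA-F]+) "
    r"failed: fw status = (?P<status>0x[0-9a-fA-F]+)"
)


def find_mlx4_panic(log_text):
    """Return details of the MLX4 panic found in log_text, or None.

    Hypothetical helper: it only matches the message text and does not
    confirm the root cause.
    """
    if not PANIC_RE.search(log_text):
        return None

    result = {
        "pci": None,
        "cmd": None,
        "fw_status": None,
        # mlx4_shutdown in the call trace suggests the reboot/shutdown path.
        "during_shutdown": "mlx4_shutdown" in log_text,
    }
    match = CMD_FAIL_RE.search(log_text)
    if match:
        result["pci"] = match.group("pci")
        result["cmd"] = match.group("cmd")
        result["fw_status"] = match.group("status")
    return result


if __name__ == "__main__":
    import sys

    # Usage: python check_mlx4_panic.py < saved_console.log
    print(find_mlx4_panic(sys.stdin.read()))
```

A result of `None` means the panic string was not found in the supplied text. Any other result reports the PCI address, the failed command, and the firmware status from the preceding `mlx4_core` line, if that line is present.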

Changes

The panic occurs mostly while rebooting or shutting down the OS, or under high load.

Cause




