
[PCA 2.x] When Upgrading a Node, Kernel Panic with "NULL pointer dereference in RIP ib_mad_recv_done_handler" (Doc ID 2511518.1)

Last updated on OCTOBER 01, 2023

Applies to:

Private Cloud Appliance - Version 2.3.1 to 2.4.3 [Release 2.0]
Linux x86-64
This issue is seen only on InfiniBand-based PCA racks.





Symptoms

Upgrading the passive management node causes the active management node to crash with a kernel panic.

The issue has also been seen on compute nodes.

A stack trace similar to the following is seen:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000020

Call Trace:
[<ffffffffa03bf273>] ? mlx4_ib_poll_cq+0xb3/0x2a0 [mlx4_ib]
[<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
[<ffffffff810a1a41>] process_one_work+0x151/0x4b0
[<ffffffff810a1ec0>] worker_thread+0x120/0x480
[<ffffffff816e4c7b>] ? __schedule+0x30b/0x890
[<ffffffff810a1da0>] ? process_one_work+0x4b0/0x4b0
[<ffffffff810a1da0>] ? process_one_work+0x4b0/0x4b0
[<ffffffff810a709e>] kthread+0xce/0xf0
[<ffffffff810a6fd0>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff816e9962>] ret_from_fork+0x42/0x70
[<ffffffff810a6fd0>] ? kthread_freezable_should_stop+0x70/0x70

RIP [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
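To check whether a saved console log or dmesg capture shows the same signature, the two distinguishing lines of the trace above can be matched programmatically. The following is a minimal sketch, not an Oracle-provided tool: the function name `matches_signature` is hypothetical, and it assumes the log has already been read into a string. Addresses and offsets differ between kernel builds, so only the message text, symbol name, and module name are matched.

```python
import re

# Patterns derived from the trace in this note. Addresses and offsets vary
# between kernels, so they are matched generically rather than literally.
NULL_DEREF = re.compile(r"BUG: unable to handle kernel NULL pointer dereference")
RIP_LINE = re.compile(
    r"RIP\b.*\bib_mad_recv_done_handler\+0x[0-9a-f]+/0x[0-9a-f]+ \[ib_mad\]"
)


def matches_signature(log_text: str) -> bool:
    """Return True if the log contains both the NULL-dereference BUG line
    and a RIP line pointing into ib_mad_recv_done_handler (hypothetical helper)."""
    return bool(NULL_DEREF.search(log_text)) and bool(RIP_LINE.search(log_text))
```

A match only indicates that the panic resembles the one described here; it does not confirm the root cause.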

Cause



