[PCA 2.x] When Upgrading a Node, Kernel Panic with "NULL pointer dereference in RIP ib_mad_recv_done_handler"
(Doc ID 2511518.1)
Last updated on DECEMBER 19, 2022
Applies to:
Private Cloud Appliance - Version 2.3.1 to 2.4.3 [Release 2.0]
Linux x86-64
This issue is seen only on InfiniBand-based PCA racks.
Symptoms
Upgrading the passive management node causes the active management node to crash with a kernel panic.
The issue is also seen on compute nodes.
A stack trace similar to the following is seen:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
Call Trace:
[<ffffffffa03bf273>] ? mlx4_ib_poll_cq+0xb3/0x2a0 [mlx4_ib]
[<ffffffffa039ff8d>] ib_mad_completion_handler+0x7d/0xa0 [ib_mad]
[<ffffffff810a1a41>] process_one_work+0x151/0x4b0
[<ffffffff810a1ec0>] worker_thread+0x120/0x480
[<ffffffff816e4c7b>] ? __schedule+0x30b/0x890
[<ffffffff810a1da0>] ? process_one_work+0x4b0/0x4b0
[<ffffffff810a1da0>] ? process_one_work+0x4b0/0x4b0
[<ffffffff810a709e>] kthread+0xce/0xf0
[<ffffffff810a6fd0>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff816e9962>] ret_from_fork+0x42/0x70
[<ffffffff810a6fd0>] ? kthread_freezable_should_stop+0x70/0x70
RIP [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
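To check whether a captured console or crash log shows the same signature as the trace above, a minimal sketch along the following lines may help. It assumes the log is available as plain text; the function name `matches_signature` and the matching rules are illustrative assumptions, not part of any Oracle tooling.

```python
import re

# Matches the faulting-instruction line of an oops, for example:
#   RIP [<ffffffffa039f926>] ib_mad_recv_done_handler+0x26/0x610 [ib_mad]
# Group 1 captures the function name that follows the bracketed address.
RIP_RE = re.compile(r"RIP\b.*?\]\s+(\w+)\+0x")

# Text printed by the kernel for this class of fault (taken from the trace above).
NULL_DEREF = "unable to handle kernel NULL pointer dereference"


def matches_signature(log_text, function="ib_mad_recv_done_handler"):
    """Return True if log_text contains a NULL pointer dereference
    and a RIP line that points into `function`.

    `matches_signature` is a hypothetical helper written for this note.
    """
    if NULL_DEREF not in log_text:
        return False
    # Check every RIP line in the log, since a log may hold several oopses.
    return any(m.group(1) == function for m in RIP_RE.finditer(log_text))
```

The function could be called on the contents of a saved console log, for example `matches_signature(open(path).read())`, where `path` is whatever file the log was saved to. A match only indicates that the trace resembles the one in this note; it does not confirm the root cause.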
Cause