ODA X7-2: Node hangs/reboots when executing big workloads
(Doc ID 2429712.1)
Last updated on JULY 20, 2024
Applies to:
Oracle Database Appliance Software - Version 12.2.1.2 to 12.2.1.4 [Release 12.2]
Information in this document applies to any platform.
Symptoms
One of the ODA nodes hangs or crashes, and the following errors are logged in /var/log/messages:
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421312] Hardware name: Oracle Corporation ORACLE SERVER X7-2/ASM, MB, X7-2, BIOS 41021300 02/05/2018
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421314] 0000000000000000 ffff8801936c3d38 ffffffff816e01cf ffff8801936c3d88
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421315] 0000000000000132 ffff8801936c3d78 ffffffff810868a5 ffff8801936c3d58
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421317] ffff880005480000 ffff8800054803e0 ffff88000548ed80 000000000000004a
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421318] Call Trace:
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421320] <IRQ> [<ffffffff816e01cf>] dump_stack+0x63/0x84
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421328] [<ffffffff810868a5>] warn_slowpath_common+0x95/0xe0
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421330] [<ffffffff810869a6>] warn_slowpath_fmt+0x46/0x50
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421334] [<ffffffff8162ea30>] dev_watchdog+0x240/0x250
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421335] [<ffffffff8162e7f0>] ? __netdev_watchdog_up+0x80/0x80
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421338] [<ffffffff810ef3b7>] call_timer_fn+0x47/0x160
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421339] [<ffffffff810ef6d0>] run_timer_softirq+0x200/0x380
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421340] [<ffffffff8162e7f0>] ? __netdev_watchdog_up+0x80/0x80
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421342] [<ffffffff8108aa9a>] __do_softirq+0x10a/0x350
Jun 12 11:52:32 ###oda0-dom0 kernel: [ 4776.421343] [<ffffffff8108ae55>] irq_exit+0x125/0x130
Please note that this issue affects only X7-2 ODA hardware.
On affected systems, the firmware version reported for the network interface is lower than 20.08.01.14, as shown in the output of: ethtool -i <eth_name>
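The firmware check can be scripted. The following is a minimal sketch, not an Oracle-provided tool: it assumes that ethtool -i prints a "firmware-version:" line whose value begins with a dotted numeric version (the exact format varies by driver), and the function names parse_firmware_version, is_affected and check_interface are hypothetical helpers introduced for illustration.

```python
import re
import subprocess

# Threshold from this note: firmware versions lower than this are affected.
MIN_FIRMWARE = (20, 8, 1, 14)


def parse_firmware_version(ethtool_output):
    """Return the leading dotted-numeric firmware version as a tuple of ints.

    Returns None if no "firmware-version:" line with a numeric value is found.
    Assumption: the value starts with digits separated by dots.
    """
    for line in ethtool_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "firmware-version":
            match = re.match(r"\s*(\d+(?:\.\d+)*)", value)
            if match:
                return tuple(int(part) for part in match.group(1).split("."))
    return None


def is_affected(ethtool_output, minimum=MIN_FIRMWARE):
    """Return True if the firmware is lower than the threshold, False if not,
    and None if the version could not be read."""
    version = parse_firmware_version(ethtool_output)
    if version is None:
        return None
    return version < minimum


def check_interface(eth_name):
    """Run `ethtool -i <eth_name>` and evaluate its output."""
    result = subprocess.run(
        ["ethtool", "-i", eth_name],
        capture_output=True,
        text=True,
        check=True,
    )
    return is_affected(result.stdout)
```

Calling check_interface with the interface name substituted for <eth_name> runs the same ethtool -i command shown above. A result of None means the version string did not match the assumed format and should be compared manually.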
Cause