Infiniband Switch rebooted as a part of Patching can cause multiple node evictions in the Cluster
(Doc ID 2703311.1)
Last updated on AUGUST 07, 2023
Applies to:
Exadata X5-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exalogic Elastic Cloud X3-2 Hardware - Version X6 to X6 [Release X6]
Exadata X6-2 Hardware - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster M7 Hardware - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster M8 Hardware - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
Symptoms
The clock on the first Infiniband switch jumped backwards or forwards in time during the reboot performed as part of the patching process.
If the clock jumps forward, you could have a split Infiniband subnet, with two Subnet Masters on the fabric for a short period of time.
If the clock jumps backwards, the Subnet Manager on the switch that was patched remains in DISCOVER state until the clock catches up to the time it showed before the jump backwards:
[root@xtu16sw-iba01]# uptime;date;sminfo
17:11:18 up 23 min, 1 user, load average: 0.15, 0.17, 0.15
Jul 19 10:42:57 UTC 2020
sminfo: sm lid 0 sm guid 0x0, activity count 184 priority 5 state 1 SMINFO_DISCOVER
OR
[root@xtu16sw-iba01]# getmaster
Local SM enabled and running, state DISCOVER
BOOT IN PROGRESS
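The state shown in the sminfo output above can also be checked programmatically. The following is a minimal sketch, not an Oracle-supplied tool: the helper names sm_state_from_sminfo and sm_in_discover are hypothetical, and the sketch assumes the state name is always the last field of the "sminfo:" line, as it is in the sample output.

```shell
#!/bin/sh
# Hypothetical helper: print the SM state name from one line of sminfo output.
# Assumes the state name is the last whitespace-separated field of the line.
sm_state_from_sminfo() {
    printf '%s\n' "$1" | awk '/^sminfo:/ { print $NF }'
}

# Hypothetical helper: succeed (exit 0) only if the SM is in DISCOVER state.
sm_in_discover() {
    [ "$(sm_state_from_sminfo "$1")" = "SMINFO_DISCOVER" ]
}

# Sample line copied from the output shown above.
sample='sminfo: sm lid 0 sm guid 0x0, activity count 184 priority 5 state 1 SMINFO_DISCOVER'

if sm_in_discover "$sample"; then
    echo "Subnet Manager is still in DISCOVER state"
else
    echo "Subnet Manager is not in DISCOVER state"
fi
```

On a switch, the sample string would be replaced by the live output, for example sm_in_discover "$(sminfo)".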
If the second switch is patched and rebooted before the first switch's clock catches up to the time it jumped backwards from (10:55:23 as seen in the example below), there could be NO Subnet Master on the fabric, or possibly a split fabric with two Subnet Masters for a short period of time, which causes the issue.
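As a worked illustration of the waiting period, the gap between the switch's current clock reading and the time it jumped back from can be computed directly. This is a sketch under stated assumptions: the function names to_seconds and seconds_until_caught_up are hypothetical, both times are taken to fall on the same day, and the inputs are the 10:42:57 reading from the date output above and the 10:55:23 pre-jump time quoted in the text.

```shell
#!/bin/sh
# Hypothetical helper: convert HH:MM:SS to seconds since midnight.
# awk treats each field as a decimal number, so leading zeros are harmless.
to_seconds() {
    printf '%s\n' "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Hypothetical helper: seconds until the clock (arg 1) reaches the pre-jump
# time (arg 2). Assumes both times are on the same day; never goes below 0.
seconds_until_caught_up() {
    _now=$(to_seconds "$1")
    _target=$(to_seconds "$2")
    _diff=$(( _target - _now ))
    if [ "$_diff" -lt 0 ]; then
        _diff=0
    fi
    echo "$_diff"
}

# Current switch time 10:42:57, pre-jump time 10:55:23.
seconds_until_caught_up 10:42:57 10:55:23   # prints 746
```

With these values the first switch's clock needs 746 seconds (12 minutes 26 seconds) to catch up, which is the window in which rebooting the second switch carries the risk described above.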
Changes
Cause
In this Document
Symptoms
Changes
Cause
Solution