Infiniband Switch rebooted as a part of Patching can cause multiple node evictions in the Cluster
(Doc ID 2703311.1)
Last updated on AUGUST 24, 2020
Applies to:
Exadata Database Machine V2 - Version All Versions to All Versions [Release All Releases]
Exadata X5-2 Hardware - Version All Versions to All Versions [Release All Releases]
Exalogic Elastic Cloud X3-2 Hardware - Version X6 to X6 [Release X6]
Exadata X6-2 Hardware - Version All Versions to All Versions [Release All Releases]
Sun Datacenter InfiniBand Switch 36 - Version All Versions to All Versions [Release All Releases]
Information in this document applies to any platform.
The clock on the first Infiniband switch jumped backwards or forwards in time when the switch was rebooted during the patching process.
If the clock jumps forward, you could have a split Infiniband subnet, with a short period of time during which there are two Subnet Masters on the fabric.
If the clock jumps backwards, the Subnet Manager on the switch that was patched remains in DISCOVER state until the clock catches up to the time it showed before the backwards jump:
17:11:18 up 23 min, 1 user, load average: 0.15, 0.17, 0.15
Jul 19 10:42:57 UTC 2020
sminfo: sm lid 0 sm guid 0x0, activity count 184 priority 5 state 1 SMINFO_DISCOVER
Local SM enabled and running, state DISCOVER
BOOT IN PROGRESS
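The sminfo line in the output above can be checked programmatically before moving on to the next switch. The following is a minimal sketch, not an Oracle-provided tool: the regular expression is modeled only on the single sminfo line shown in this note (other firmware versions may print it differently), and the function names parse_sminfo and sm_still_discovering are hypothetical.

```python
import re

# Pattern modeled on the one sminfo line shown in this note (assumption:
# the field order and wording are the same on the switch being checked).
SMINFO_RE = re.compile(
    r"sminfo: sm lid (?P<lid>\d+) sm guid (?P<guid>0x[0-9a-fA-F]+), "
    r"activity count (?P<activity>\d+) priority (?P<priority>\d+) "
    r"state (?P<state_num>\d+) SMINFO_(?P<state>\w+)"
)


def parse_sminfo(line):
    """Split one sminfo output line into its fields (hypothetical helper)."""
    match = SMINFO_RE.search(line)
    if match is None:
        raise ValueError("unrecognized sminfo output: %r" % line)
    return {
        "lid": int(match["lid"]),
        "guid": match["guid"],
        "activity": int(match["activity"]),
        "priority": int(match["priority"]),
        "state": match["state"],
    }


def sm_still_discovering(line):
    """Return True while the Subnet Manager reports the DISCOVER state."""
    return parse_sminfo(line)["state"] == "DISCOVER"


if __name__ == "__main__":
    sample = ("sminfo: sm lid 0 sm guid 0x0, activity count 184 "
              "priority 5 state 1 SMINFO_DISCOVER")
    print(parse_sminfo(sample))
    print(sm_still_discovering(sample))
```

A patching script could call sm_still_discovering on fresh sminfo output and hold off on the next switch while it returns True. Whether that check alone is sufficient is not stated in this note.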
If the second switch is patched and rebooted before the first switch's clock has caught up to the time it jumped backwards from (10:55:23 as seen in the example below), there may be NO Subnet Master on the fabric, or possibly a split fabric with two Subnet Masters for a short period of time, causing the issue.
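The waiting period described above can be estimated with simple timestamp arithmetic. The sketch below is illustrative only: the function name seconds_until_caught_up is hypothetical, and it assumes that 10:55:23 falls on the same day (Jul 19 2020, UTC) as the 10:42:57 reading shown in the switch output, which this note does not state explicitly.

```python
from datetime import datetime


def seconds_until_caught_up(time_before_jump, switch_clock_now):
    """Seconds left until the switch clock reaches its pre-jump time.

    Returns 0 when the clock has already caught up (hypothetical helper).
    """
    remaining = (time_before_jump - switch_clock_now).total_seconds()
    return max(0, int(remaining))


if __name__ == "__main__":
    # Sample values taken from this note; the shared date is an assumption.
    before_jump = datetime(2020, 7, 19, 10, 55, 23)
    clock_now = datetime(2020, 7, 19, 10, 42, 57)
    print(seconds_until_caught_up(before_jump, clock_now))  # prints 746
```

With these sample values the first switch would need about 12.5 more minutes before its clock passes the pre-jump time, so rebooting the second switch within that window matches the failure condition described above.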