RDS Did Not Failover After Infiniband Switch Failure
(Doc ID 2722681.1)
Last updated on DECEMBER 02, 2022
Applies to:
Linux OS - Version Oracle Linux 7.0 and later
Oracle Cloud Infrastructure - Exadata Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Version N/A and later
Linux x86-64
Symptoms
After a failure in one InfiniBand switch, one Exadata node went down and failed to come back online.
Node2 is unable to communicate with any of the cell nodes.
Rebooting the domU and dom0 did not resolve the issue.
# ibstat
CA 'mlx4_0'
    CA type: MT4100
    Number of ports: 2
    Firmware version: 2.35.6312
    Hardware version: 1
    Node GUID: <NODE_GUID>
    System image GUID: <SYS_IMG_GUID>
    Port 1:
        State: Down <<<<<<<<<<<<<<<<<<<<<<<<<<<
        Physical state: LinkUp
        Rate: 10
        Base lid: 7
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: <PORT_GUID_1>
        Link layer: InfiniBand
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Base lid: 8
        LMC: 0
        SM lid: 1
        Capability mask: 0x02514868
        Port GUID: <PORT_GUID_2>
        Link layer: InfiniBand
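The symptom in the ibstat output above (a port whose State is not Active even though its Physical state is LinkUp) can be checked for programmatically. The following is a minimal Python sketch, not an Oracle-supplied tool: the function name ports_not_active and the SAMPLE text are illustrative assumptions, and the parser relies only on the field labels visible in the output above.

```python
import re

# Trimmed sample modeled on the ibstat output shown in this note (illustrative only).
SAMPLE = """\
CA 'mlx4_0'
    Number of ports: 2
    Port 1:
        State: Down <<<<<<<<<<<<<<<<<<<<<<<<<<<
        Physical state: LinkUp
        Rate: 10
        Port GUID: <PORT_GUID_1>
    Port 2:
        State: Active
        Physical state: LinkUp
        Rate: 40
        Port GUID: <PORT_GUID_2>
"""


def ports_not_active(ibstat_text):
    """Return (ca, port, state, physical_state) for each port whose State is not Active.

    Hypothetical helper: assumes each port section lists "State:" before
    "Physical state:", as in the output above.
    """
    results = []
    ca = None
    port = None
    state = None
    for raw in ibstat_text.splitlines():
        line = raw.strip()
        m = re.match(r"CA '(.+)'", line)
        if m:
            ca, port, state = m.group(1), None, None
            continue
        m = re.match(r"Port (\d+):", line)  # does not match "Port GUID: ..."
        if m:
            port, state = int(m.group(1)), None
            continue
        if port is None:
            continue  # still in the CA header fields
        if line.startswith("State:"):
            # Keep only the first word so trailing annotations are ignored.
            state = line.split(":", 1)[1].split()[0]
        elif line.startswith("Physical state:"):
            physical = line.split(":", 1)[1].strip()
            if state != "Active":
                results.append((ca, port, state, physical))
    return results


if __name__ == "__main__":
    for ca, port, state, physical in ports_not_active(SAMPLE):
        print(f"{ca} port {port}: State={state}, Physical state={physical}")
```

On a live node the same function could be fed the text captured from the ibstat command instead of SAMPLE. It only reports the port state and does not diagnose why RDS did not fail over.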
Cause