
RDS Did Not Failover After Infiniband Switch Failure (Doc ID 2722681.1)

Last updated on DECEMBER 02, 2022

Applies to:

Linux OS - Version Oracle Linux 7.0 and later
Oracle Cloud Infrastructure - Exadata Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Version N/A and later
Linux x86-64

Symptoms

After one InfiniBand switch failed, one Exadata node went down and failed to come back online.

Node2 is unable to communicate with any of the cell nodes.

Rebooting the domU and dom0 did not resolve the issue.

 

# ibstat
CA 'mlx4_0'
        CA type: MT4100
        Number of ports: 2
        Firmware version: 2.35.6312
        Hardware version: 1
        Node GUID: <NODE_GUID>
        System image GUID: <SYS_IMG_GUID>
        Port 1:
                State: Down                <<<<<<<<<<<<<<<<<<<<<<<<<<<
                Physical state: LinkUp
                Rate: 10
                Base lid: 7
                LMC: 0
                SM lid: 1
                Capability mask: 0x02514868
                Port GUID: <PORT_GUID_1>
                Link layer: InfiniBand
        Port 2:
                State: Active
                Physical state: LinkUp
                Rate: 40
                Base lid: 8
                LMC: 0
                SM lid: 1
                Capability mask: 0x02514868
                Port GUID: <PORT_GUID_2>
                Link layer: InfiniBand
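
The check above can be scripted when several nodes need to be inspected. The following is a minimal sketch, not part of the original note: the function name report_inactive_ib_ports is hypothetical, and it assumes the ibstat output layout shown above (a "CA '<name>'" line, then "Port N:" blocks that each contain a "State:" line).

```shell
# Hypothetical helper (not from the original note): reads `ibstat` output on
# stdin and prints one line for every port whose logical state is not Active.
report_inactive_ib_ports() {
    awk '
        # "CA <name>" line: remember the adapter name, stripping the quotes.
        /^CA / { ca = $2; gsub(/[^A-Za-z0-9_]/, "", ca) }

        # "Port N:" line: remember the port number ("Port GUID:" does not match).
        /^[[:space:]]*Port [0-9]+:/ { port = $2; sub(/:$/, "", port) }

        # "State:" line (the logical state, not "Physical state:").
        /^[[:space:]]*State:/ {
            if ($2 != "Active") print ca, "port", port, "state", $2
        }
    '
}
```

Typical use would be `ibstat | report_inactive_ib_ports`. The helper only reports the logical port state. It does not explain why a port is Down while its physical state is LinkUp.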
 

Cause

To view full details, sign in with your My Oracle Support account.




