Extended cluster instance crashes if one interconnect network fails
(Doc ID 2236102.1)
Last updated on SEPTEMBER 16, 2024
Applies to:
Oracle Database - Enterprise Edition - Version 12.1.0.2 to 12.2.0.1 [Release 12.1 to 12.2]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
Symptoms
o Extended RAC configuration with two sites (Site A and Site B) and two nodes on each site.
o Two network interfaces are registered for private communication; 'oifcfg getif' shows:
<interface1> <subnet1> global public
<interface2> <subnet2> global cluster_interconnect
<interface3> <subnet3> global cluster_interconnect
o A communication issue occurs on one of the private networks.
o Several communication errors were reported by the database and ASM instances around the time the network communication was affected, such as:
- At the ASM instances:
- At database instances:
Receiver: inst 3 binc 636266236 ospid <PID>
IPC Send timeout detected. Sender: ospid <PID>...
Receiver: inst 4 binc 636568242 ospid <PID>
...
LMON (ospid: <PID>) detects hung instances during IMR reconfiguration
LMON (ospid: <PID>) tries to kill the instance 3 in 10 seconds.
Please check instance 3's alert log and LMON trace file for more details.
LMON (ospid: 137826) aborts 1 previously scheduled instance kills
...
Evicting instance 3 from cluster
Evicting instance 4 from cluster
Waiting for instances to leave: 3 4
o When the issue happens, GIPC RANK remains at 99 even though the number of messages decreases:
- Note that the values decreased to around 61, down from 196 at an earlier time.
o If one node is stopped to make it a 3-node cluster, the issue does not happen.
o In the 3-node configuration, GIPC RANK goes to 0 on the node when the issue happens, which causes HAIP to move the HAIP address 169.254.*.* to the other interface.
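As a rough aid for checking these symptoms against a system, the following Python sketch parses saved 'oifcfg getif' output to list the interfaces registered as cluster_interconnect, and scans alert log text for the "Evicting instance N from cluster" messages shown above. It is only an illustration, not an Oracle-provided tool: the function names, the sample interface names (eth0, eth1, eth2), and the sample subnets are hypothetical, and it assumes the output layout matches the lines quoted in this note.

```python
import re

def cluster_interconnects(getif_output):
    """Return (interface, subnet) pairs registered as cluster_interconnect.

    Assumes each line has the layout shown in this note:
    <interface> <subnet> global <type>
    """
    pairs = []
    for line in getif_output.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[2] != "global":
            continue  # skip blank or unexpected lines
        # The type column may hold a comma-separated list of roles.
        if "cluster_interconnect" in fields[3].split(","):
            pairs.append((fields[0], fields[1]))
    return pairs

# Matches the eviction message format quoted in the Symptoms section.
EVICT_RE = re.compile(r"Evicting instance (\d+) from cluster")

def evicted_instances(alert_log_text):
    """Return the sorted instance numbers named in eviction messages."""
    return sorted({int(m.group(1)) for m in EVICT_RE.finditer(alert_log_text)})

# Hypothetical sample data; interface names and subnets are placeholders.
sample_getif = """\
eth0 10.0.0.0 global public
eth1 192.168.1.0 global cluster_interconnect
eth2 192.168.2.0 global cluster_interconnect
"""

sample_alert = """\
Evicting instance 3 from cluster
Evicting instance 4 from cluster
Waiting for instances to leave: 3 4
"""

if __name__ == "__main__":
    print(cluster_interconnects(sample_getif))
    print(evicted_instances(sample_alert))
```

If two cluster_interconnect entries are listed and the alert log names both instances of one site as evicted, the configuration matches the scenario described in this note; the sketch does not diagnose the cause.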
Changes
This is an Extended RAC implementation.
Cause