Extended cluster instance crashes if one interconnect network fails
Last updated on MAY 12, 2017
Applies to: Oracle Database - Enterprise Edition - Version 188.8.131.52 to 184.108.40.206 [Release 12.1 to 12.2]
Information in this document applies to any platform.
o Extended RAC configuration with two sites (Site A and Site B) and two nodes on each site.
o Two network interfaces are registered for private communication; 'oifcfg getif' shows:
<interface1> <subnet1> global public
<interface2> <subnet2> global cluster_interconnect
<interface3> <subnet3> global cluster_interconnect
o A communication issue occurs on one of the private networks.
o While network communication was affected, the database and ASM instances reported several communication errors, such as:
- At the ASM instances:
- At database instances:
Receiver: inst 3 binc 636266236 ospid <PID>
IPC Send timeout detected. Sender: ospid <PID>...
Receiver: inst 4 binc 636568242 ospid <PID>
LMON (ospid: <PID>) detects hung instances during IMR reconfiguration
LMON (ospid: <PID>) tries to kill the instance 3 in 10 seconds.
Please check instance 3's alert log and LMON trace file for more details.
LMON (ospid: 137826) aborts 1 previously scheduled instance kills
Evicting instance 3 from cluster
Evicting instance 4 from cluster
Waiting for instances to leave: 3 4
o When the issue happens, GIPC RANK remains at 99 even though the number of messages decreases:
- Note that the values decreased to around 61, from 900 at an earlier time.
o If one node is stopped to make it a 3-node cluster, the issue does not happen.
o In the 3-node configuration, GIPC RANK goes to 0 on the node when the issue happens, which causes HAIP to move the HAIP address 169.254.*.* to the other interface.
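When comparing a system against the symptoms above, it can help to extract the relevant facts from captured text rather than reading it by eye. The following is a minimal Python sketch, not an Oracle-provided tool. It assumes that each 'oifcfg getif' line has the shape "<interface> <subnet> global <role>", that the role field may be a comma-separated list, and that the alert log contains lines worded exactly like the "Evicting instance N from cluster" messages quoted above. The function names interconnect_interfaces and evicted_instances are hypothetical and chosen for illustration only.

```python
import re


def interconnect_interfaces(getif_output):
    """Return (interface, subnet) pairs registered as cluster_interconnect.

    Assumes each line looks like: <interface> <subnet> global <role>
    where <role> may be a comma-separated list (an assumption).
    """
    pairs = []
    for line in getif_output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and "cluster_interconnect" in fields[3].split(","):
            pairs.append((fields[0], fields[1]))
    return pairs


# Matches the eviction message wording quoted in the symptoms above.
EVICT_RE = re.compile(r"Evicting instance (\d+) from cluster")


def evicted_instances(alert_log_lines):
    """Return instance numbers named in eviction messages, in first-seen order."""
    found = []
    for line in alert_log_lines:
        match = EVICT_RE.search(line)
        if match:
            inst = int(match.group(1))
            if inst not in found:
                found.append(inst)
    return found


if __name__ == "__main__":
    # Sample input copied from the placeholders used in this document.
    getif = (
        "<interface1> <subnet1> global public\n"
        "<interface2> <subnet2> global cluster_interconnect\n"
        "<interface3> <subnet3> global cluster_interconnect\n"
    )
    print(interconnect_interfaces(getif))
    print(evicted_instances([
        "Evicting instance 3 from cluster",
        "Evicting instance 4 from cluster",
    ]))
```

Two cluster_interconnect entries together with eviction messages for the instances on one site would be consistent with the configuration and symptoms described here, but the sketch only summarizes text and does not diagnose the cause.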
This is a RAC Extended implementation