RAC Non-First Instance Cannot Start After Relinking With RDS if Two or More InfiniBand Interfaces Are Used for Private Network

(Doc ID 1907441.1)

Last updated on APRIL 17, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1.0 and later
Oracle Solaris on SPARC (64-bit)
IBM AIX on POWER Systems (64-bit)

Symptoms

In a RAC cluster that uses two InfiniBand interfaces for the private network, the cluster works fine with the UDP protocol. After the oracle binary for ASM is relinked with the RDS protocol, the ASM instance does not start on the non-first node.

alert_+ASM2.log shows:

Wed Jun 11 02:04:21 2014
Starting ORACLE instance (normal)

Private Interface 'grid0:1' configured from GPnP for use as a private interconnect.
  [name='grid0:1', type=1, ip=169.254.127.30, mac=00-00-00-00-00-00, net=169.254.0.0/17, mask=255.255.128.0, use=haip:cluster_interconnect/62]
Private Interface 'grid1:1' configured from GPnP for use as a private interconnect.
  [name='grid1:1', type=1, ip=169.254.142.194, mac=00-00-00-00-00-00, net=169.254.128.0/17, mask=255.255.128.0, use=haip:cluster_interconnect/62]
...
Cluster communication is configured to use the following interface(s) for this instance
  169.254.127.30
  169.254.142.194
cluster interconnect IPC version:Oracle RDS/IP (generic)
IPC Vendor 1 proto 3
  Version 4.1
...
Wed Jun 11 02:04:23 2014
MMNL started with pid=21, OS id=28082
DISM started, OS id=28084
lmon registered with NM - instance number 2 (internal mem no 1)
Wed Jun 11 02:06:23 2014
PMON (ospid: 28057): terminating the instance due to error 481
Wed Jun 11 02:06:23 2014
ORA-1092 : opitsk aborting process
Wed Jun 11 02:06:23 2014
System state dump requested by (instance=2, osid=28057 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /app/grid_base/diag/asm/+asm/+ASM2/trace/+ASM2_diag_28063_20140611020623.trc
Dumping diagnostic data in directory=[cdmp_20140611020623], requested by (instance=2, osid=28057 (PMON)), summary=[abnormal instance termination].
Instance terminated by PMON, pid = 28057
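
When triaging a log like the one above, the two lines that matter most are the interconnect protocol line and the PMON termination line. The following is a minimal sketch, not an Oracle-supplied tool: the function name summarize_alert and the embedded excerpt are illustrative assumptions, and the patterns are taken only from the log text shown in this note.

```python
import re

# Excerpt copied from the alert_+ASM2.log lines shown in this note.
ALERT_EXCERPT = """\
cluster interconnect IPC version:Oracle RDS/IP (generic)
IPC Vendor 1 proto 3
  Version 4.1
Wed Jun 11 02:06:23 2014
PMON (ospid: 28057): terminating the instance due to error 481
"""

def summarize_alert(text):
    """Return (ipc_version, termination_error) found in the text.

    Either element is None when the corresponding line is absent.
    (Hypothetical helper; patterns assume the wording seen in this note.)
    """
    ipc = re.search(r"cluster interconnect IPC version:\s*(.+)", text)
    term = re.search(r"terminating the instance due to error (\d+)", text)
    return (
        ipc.group(1).strip() if ipc else None,
        int(term.group(1)) if term else None,
    )

print(summarize_alert(ALERT_EXCERPT))
```

For the excerpt above this reports the RDS/IP interconnect together with error 481, which is the combination described in this note.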


alert_+ASM1.log of the running ASM instance shows:

Wed Jun 11 02:05:45 2014
LMON (ospid: 29052) detects hung instances during IMR reconfiguration
LMON (ospid: 29052) tries to kill the instance 2 in 37 seconds.
Please check instance 2's alert log and LMON trace file for more details.
Wed Jun 11 02:06:22 2014
Remote instance kill is issued with system inc 20
Remote instance kill map (size 1) : 2
LMON received an instance eviction notification from instance 1
The instance eviction reason is 0x20000000
The instance eviction map is 2
Reconfiguration started (old inc 20, new inc 22)
List of instances:
 1 (myinst: 1)
...

Reconfiguration complete
Wed Jun 11 02:06:23 2014
Dumping diagnostic data in directory=[cdmp_20140611020623], requested by (instance=2, osid=28057 (PMON)), summary=[abnormal instance termination].
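
The timestamps in the two alert logs can be lined up to confirm that they describe the same event. The sketch below is only an illustration: the helper name parse_ts is hypothetical, the timestamps are copied from the excerpts in this note, and it assumes the default C/English locale for day and month names.

```python
from datetime import datetime

# Timestamp layout used by the alert log lines in this note.
FMT = "%a %b %d %H:%M:%S %Y"

def parse_ts(line):
    """Parse an alert log timestamp line such as 'Wed Jun 11 02:05:45 2014'."""
    return datetime.strptime(line.strip(), FMT)

# From alert_+ASM2.log (the instance that fails to start).
started = parse_ts("Wed Jun 11 02:04:21 2014")
terminated = parse_ts("Wed Jun 11 02:06:23 2014")

# From alert_+ASM1.log (the running instance).
hang_detected = parse_ts("Wed Jun 11 02:05:45 2014")
kill_issued = parse_ts("Wed Jun 11 02:06:22 2014")

# Seconds between hang detection and the remote kill on instance 1.
kill_delay = (kill_issued - hang_detected).total_seconds()
# Seconds instance 2 stayed up before PMON terminated it.
uptime = (terminated - started).total_seconds()

print(kill_delay, uptime)
```

The 37-second gap matches the "tries to kill the instance 2 in 37 seconds" message, and PMON on instance 2 terminates one second after the remote kill is issued.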




Changes

The oracle binary was relinked with the RDS protocol.

Cause
