Grid Infrastructure 11gR2 ROOT.SH Fails on Second Node due to Firewall (Doc ID 1103313.1)

Last updated on JULY 31, 2013

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

Symptoms

During a fresh installation of 11gR2 Grid Infrastructure, runcluvfy.sh reports no errors and root.sh completes successfully on the first node, but root.sh times out and fails on the other node(s):

$GRID_HOME/cfgtoollogs/crsconfig/rootcrs_$HOST.log on other nodes:

2010-04-23 13:45:11: CRS-2676: Start of 'ora.diskmon' on 'node2' succeeded
2010-04-23 13:45:11: CRS-2676: Start of 'ora.cssd' on 'node2' succeeded
2010-04-23 13:45:12: Start of resource "ora.ctssd -init -env USR_ORA_ENV=CTSS_REBOOT=TRUE" Succeeded
2010-04-23 13:50:14: Start of resource "ora.asm -init" Succeeded     >>>> Note the 5-minute gap
2010-04-23 13:50:16: Start of resource "ora.crsd -init" Succeeded
2010-04-23 13:50:17: Start of resource "ora.evmd -init" Succeeded
2010-04-23 13:50:17: Successfully started Oracle clusterware stack
2010-04-23 13:50:17: Waiting for Oracle CRSD and EVMD to start
2010-04-23 13:50:22: Waiting for Oracle CRSD and EVMD to start
..
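
The telltale sign in this excerpt is the five-minute pause before "ora.asm -init" completes. One way to spot such pauses is to scan the log for unusually large gaps between consecutive timestamps. The sketch below is a hypothetical helper, not part of the Grid Infrastructure tooling; it assumes lines begin with a "YYYY-MM-DD HH:MM:SS" timestamp, as in the rootcrs and ohasd logs shown here.

#!/usr/bin/env python
# find_gaps.py - hypothetical helper: report large gaps between consecutive
# timestamps in a rootcrs/ohasd style log (lines starting with
# "YYYY-MM-DD HH:MM:SS").
import re
import sys
from datetime import datetime

TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')

def find_gaps(path, threshold_seconds=60):
    """Yield (previous_line, current_line, gap_seconds) for gaps above the threshold."""
    prev_ts, prev_line = None, None
    with open(path) as f:
        for line in f:
            m = TS_RE.match(line)
            if not m:
                continue                      # skip lines without a leading timestamp
            ts = datetime.strptime(m.group(1), '%Y-%m-%d %H:%M:%S')
            if prev_ts is not None:
                gap = (ts - prev_ts).total_seconds()
                if gap >= threshold_seconds:
                    yield prev_line.rstrip(), line.rstrip(), gap
            prev_ts, prev_line = ts, line

if __name__ == '__main__':
    # Usage: python find_gaps.py rootcrs_node2.log [threshold_seconds]
    logfile = sys.argv[1]
    threshold = int(sys.argv[2]) if len(sys.argv) > 2 else 60
    for before, after, gap in find_gaps(logfile, threshold):
        print('%.0f second gap:\n  %s\n  %s\n' % (gap, before, after))

Running this against the rootcrs or ohasd log of the failing node highlights the same gap annotated in the excerpts above and below.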

$GRID_HOME/log/$HOST/ohasd/ohasd.log on other nodes:

2010-04-23 13:45:12.685: [   CRSPE][1347090752] CRS-2672: Attempting to start 'ora.asm' on 'node2'
..                              >>>> Note the 5-minute gap
2010-04-23 13:50:14.756: [   CRSPE][1347090752] CRS-2676: Start of 'ora.asm' on 'node2' succeeded
..
2010-04-23 13:50:14.914: [   CRSPE][1347090752] CRS-2672: Attempting to start 'ora.crsd' on 'node2'

$GRID_HOME/log/$HOST/crsd/crsd.log on other nodes:

2010-04-23 13:50:16.008: [  OCRASM][4169789936]proprasmo: Error in open/create file in dg [DATA]
[  OCRASM][4169789936]SLOS : SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup

2010-04-23 13:50:16.010: [  OCRASM][4169789936]proprasmo: kgfoCheckMount returned [7]
2010-04-23 13:50:16.010: [  OCRASM][4169789936]proprasmo: The ASM instance is down
2010-04-23 13:50:16.010: [  OCRRAW][4169789936]proprioo: Failed to open [+DATA]. Returned proprasmo() with [26]. Marking location as UNAVAILABLE.
2010-04-23 13:50:16.010: [  OCRRAW][4169789936]proprioo: No OCR/OLR devices are usable
2010-04-23 13:50:16.010: [  OCRASM][4169789936]proprasmcl: asmhandle is NULL
2010-04-23 13:50:16.010: [  OCRRAW][4169789936]proprinit: Could not open raw device
2010-04-23 13:50:16.010: [  OCRASM][4169789936]proprasmcl: asmhandle is NULL
2010-04-23 13:50:16.010: [  OCRAPI][4169789936]a_init:16!: Backend init unsuccessful : [26]
2010-04-23 13:50:16.010: [  CRSOCR][4169789936] OCR context init failure.  Error: PROC-26: Error while accessing the physical storage ASM error [SLOS: cat=7, opn=kgfoAl06, dep=15077, loc=kgfokge
ORA-15077: could not locate ASM instance serving a required diskgroup
] [7]
2010-04-23 13:50:16.010: [    CRSD][4169789936][PANIC] CRSD exiting: Could not init OCR, code: 26
2010-04-23 13:50:16.011: [    CRSD][4169789936] Done.

alert_+ASM1.log on first node:

Fri Apr 23 13:45:19 2010
Reconfiguration started (old inc 2, new inc 4)
List of instances:
 1 2 (myinst: 1)
 Global Resource Directory frozen
 Communication channels reestablished
 Master broadcasted resource hash value bitmaps
 Non-local Process blocks cleaned out
Fri Apr 23 13:50:40 2010
IPC Send timeout detected. Sender: ospid 20999 [oracle@node1 (PING)]
Receiver: inst 2 binc 462889467 ospid 13885
Fri Apr 23 13:50:57 2010
IPC Send timeout detected. Sender: ospid 21015 [oracle@node1 (LMD0)]
Receiver: inst 2 binc 462889757 ospid 13901
IPC Send timeout to 2.0 inc 4 for msg type 53 from opid 10
Fri Apr 23 13:50:59 2010
Communications reconfiguration: instance_number 2
Evicting instance 2 from cluster
Waiting for instances to leave:
2
Fri Apr 23 13:50:59 2010
Trace dumping is performing id=[cdmp_20100423135059]
Reconfiguration started (old inc 4, new inc 8)
List of instances:
 1 (myinst: 1)
 Nested reconfiguration detected.

alert_+ASMn.log on other nodes:

Fri Apr 23 13:45:18 2010
MMNL started with pid=21, OS id=13947
lmon registered with NM - instance number 2 (internal mem no 1)
Reconfiguration started (old inc 0, new inc 4)
ASM instance
List of instances:
 1 2 (myinst: 2)
 Global Resource Directory frozen
* allocate domain 0, invalid = TRUE
 Communication channels reestablished
Fri Apr 23 13:50:53 2010
IPC Send timeout detected. Sender: ospid 13885 [oracle@node2 (PING)]
Receiver: inst 1 binc 462852653 ospid 20999
Fri Apr 23 13:50:59 2010
Received an instance abort message from instance 1
Please check instance 1 alert and LMON trace files for detail.
LMS0 (ospid: 13905): terminating the instance due to error 481
Fri Apr 23 13:50:59 2010
System state dump is made for local instance
Trace dumping is performing id=[cdmp_20100423135059]
Instance terminated by LMS0, pid = 13905

Cause

A firewall on the private interconnect (for example, iptables enabled on the cluster interconnect network on Linux) is blocking network traffic between the nodes. The ASM instance on the second node starts but cannot exchange packets with the ASM instance on the first node, which produces the IPC send timeouts and the eviction of instance 2 shown above. With its local ASM instance terminated, CRSD on the second node cannot access the OCR stored in diskgroup +DATA (ORA-15077 / PROC-26), and root.sh times out.
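
Because the interconnect traffic between the cluster processes is UDP-based, a quick way to confirm whether a firewall is dropping packets on the private network is to run a small UDP echo test between the nodes. The sketch below is a hypothetical check, not an Oracle utility; the host, address, and port values are placeholders for your private interconnect IPs and any free port.

#!/usr/bin/env python
# udp_check.py - hypothetical check: verify UDP packets pass between the
# private interconnect addresses (host/port below are placeholders).
# On node1:  python udp_check.py listen 0.0.0.0 5555
# On node2:  python udp_check.py send   192.168.1.101 5555
import socket
import sys

def listen(host, port):
    """Echo back every datagram received; run this on the first node."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    print('listening on %s:%d ...' % (host, port))
    while True:
        data, addr = sock.recvfrom(1024)
        print('received %r from %s' % (data, addr))
        sock.sendto(data, addr)               # echo back so the sender can confirm both directions

def send(host, port):
    """Send a probe and wait for the echo; run this on the other node."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(5)
    sock.sendto(b'interconnect-probe', (host, port))
    try:
        data, addr = sock.recvfrom(1024)
        print('OK: echo %r from %s - UDP traffic is not blocked' % (data, addr))
    except socket.timeout:
        print('FAIL: no echo within 5 seconds - a firewall may be dropping UDP packets')

if __name__ == '__main__':
    mode, host, port = sys.argv[1], sys.argv[2], int(sys.argv[3])
    listen(host, port) if mode == 'listen' else send(host, port)

If the probe times out in both directions over the private interconnect addresses while the nodes respond to each other over the public network, a firewall on the interconnect is the likely culprit.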