Grid Infrastructure Startup During Patching, Install or Upgrade May Fail Due to Multicasting Requirement (Doc ID 1212703.1)

Last updated on APRIL 17, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.2 to 12.1.0.1 [Release 11.2 to 12.1]
Information in this document applies to any platform.
This issue impacts environments that do not have multicast enabled for the private network in the following situations:

New installations of Oracle Grid Infrastructure 11.2.0.2 where multicast is not enabled on 230.0.1.0
Upgrades to Oracle Grid Infrastructure 11.2.0.2 from a pre-11.2.0.2 release where multicast is not enabled on 230.0.1.0 or 224.0.0.251
Installation of GI PSU 11.2.0.3.5, 11.2.0.3.6, 11.2.0.3.7 where multicast is not enabled on 230.0.1.0 or 224.0.0.251
Installation or upgrade to 12.1.0.1.0 where multicast is not enabled on 230.0.1.0 or 224.0.0.251


Symptoms

If multicast based communication is not enabled as required either on the nodes of the cluster or on the network switches used for the private interconnect, the root.sh, which is called as part of a fresh installation of Oracle Grid Infrastructure 11.2.0.2, or the rootupgrade.sh (called as part of an upgrade to Oracle Grid Infrastructure 11.2.0.2) will only succeed on the first node of the cluster, but will fail on subsequent nodes with the symptoms shown below:

CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node node1, number 1, and is terminating
An active cluster was found during exclusive startup, restarting to join the cluster
Failed to start Oracle Clusterware stack
Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/crs/11.2.0.2/crs/install/crsconfig_lib.pm line 1016.
/u01/app/crs/11.2.0.2/perl/bin/perl -I/u01/app/crs/11.2.0.2/perl/lib -I/u01/app/crs/11.2.0.2/crs/install /u01/app/crs/11.2.0.2/crs/install/rootcrs.pl execution failed

 

Note:  The symptoms will be the same whether an upgrade or a fresh installation of Oracle Grid Infrastructure 11.2.0.2 is performed; so will be the required diagnostics. This issue is also documented in the Oracle Database Readme 11g Release 2 Section 2.39 - "Open Bugs" under <BUG: 9974223>. 

 

Note: This issue also impacts the following 11.2.0.3 PSUs: 11.2.0.3.5, 11.2.0.3.6, 11.2.0.3.7 as well as 12.1.0.1 installations if multicast is not enabled on the 230.0.1.0 or 224.0.0.251 multicast addresses (one of the 2 must be enabled/functional). With 11.2.0.3 GI was enhanced to utilize broadcast or multicast to bootstrap. However the 11.2.0.3.5 GI PSU introduced a new issue that effectively disables the broadcast functionality (<Bug 16547309>). 



Symptom verification 
To verify that Oracle CSS daemon fails to start in clustered mode due to a multicasting issue on the network, the ocssd.log file (located under $GI_HOME/log/<nodename>/cssd/ocssd.log) must be reviewed. In case, joining the cluster fails because of such an issue, the following can be observed:

1. When CSS starts in clustered mode to join an existing cluster, we will see an entry in the CSSD log indicating that CSS will attempt to establish communication with a peer in the cluster. For this analysis, we see in the CSSD log for node2 that communication is attempted with node1, which looks similar to:  

 

2010-09-16 23:13:14.862: [GIPCHGEN][1107937600] gipchaNodeCreate: adding new node 0x2aaab408d4a0 { host 'node1', haName 'CSS_ttoprf10cluster', srcLuid 54d7bb0e-ef4a0c7e, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 9563084, flags 0x0 }

2.  Shortly after the above log entry we will see an attempt to establish communication to node1 from node2 via multicast address 230.0.1.0, port 42424 on the private interconnect (here: 192.168.1.2):

2010-09-16 23:13:14.862: [GIPCHTHR][1106360640] gipchaWorkerUpdateInterface: created remote interface for node 'node1', haName 'CSS_mycluster', inf 'mcast://230.0.1.0:42424/192.168.1.2'


3.  If the communication can be established successfully, we will see a log entry on node2 containing "gipchaLowerProcessAcks: ESTABLISH finished" for the peer node (node1). If the communication cannot be established, we will not see this log entry. Instead, we will see an entry indicating that the network communication cannot be established. This entry will look similar to the one shown below:

2010-09-16 23:13:15.839: [ CSSD][1087465792]clssnmvDHBValidateNCopy: node 1, node1, has a disk HB, but no network HB, DHB has rcfg 180134562, wrtcnt, 8627, LATS 9564064, lastSeqNo 8624, uniqueness 1284701023, timestamp 1284703995/10564774


The above log entry indicates that CSSD is unable to establish network communication on the interface used for the private interconnect. In this particular case, the issue was that multicast communication on the 230.0.1.0 IP was blocked on the network used as the private interconnect.

 

 

Changes

Note: 11.2.0.4 Grid Infrastructure is not impacted by this issue.

 

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms