CRSD Fails to Start due to GIPC Communication Failure with Master (Doc ID 1337730.1)

Last updated on JULY 04, 2016

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.2 to 11.2.0.2 [Release 11.2]
Information in this document applies to any platform.

Symptoms

11gR2 Grid Infrastructure CRSD fails to join the cluster with the following error even though there's no network issue:

2011-04-10 22:22:39.677: [GIPCHALO][14636816] gipchaLowerDropMsg: dropping because of node failure msg 0x7fefec041c08 { len 1160, seq 181, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 180, flags 0x1, srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000, msgId 179 }, node 0x7fefe801dd40 { host 'racnode1', haName 'e2cc-aa06-08f2-2397', srcLuid fc67760b-7a856aaa, dstLuid 11d91062-ea2cdc97 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [90 : 90], createTime 4294237860, flags 0xa0 }
...
2011-04-10 22:23:09.759: [GIPCHALO][14636816] gipchaLowerProcessNode: no valid interfaces found to node for 4294357960 ms, node 0x7fefe801d970 { host 'racnode1', haName 'e2cc-aa06-08f2-2397', srcLuid fc67760b-e83bdc10, dstLuid 11d91062-ea2cdc97 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [31 : 31], createTime 4294327890, flags 0x4 }
2011-04-10 22:23:10.707: [GIPCHALO][14636816] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30010, msg 0x7fefec046d28 { len 1160, seq 183, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 182, flags 0x1, srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000, msgId 181 }, node 0x7fefe801d970 { host 'racnode1', haName 'e2cc-aa06-08f2-2397', srcLuid fc67760b-e83bdc10, dstLuid 11d91062-ea2cdc97 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [31 : 31], createTime 4294327890, flags 0x4 }
...
2011-04-10 22:23:21.317: [ CRSMAIN][31647504] Policy Engine is not initialized yet!

 

2011-04-10 22:19:37.582: [  OCRSRV][2157287184]s_u_r_c_i: Connecting to node with host name [racnode2]
2011-04-10 22:19:37.582: [  OCRSRV][2157287184]s_update_remote_cache_int: Connecting to node [1] at address [racnode2:d633-1505-1365-7a21] for invaldiation.
2011-04-10 22:19:37.583: [GIPCXCPT][2157287184] gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'racnode1', port 'b015-632e-0f93-4247', hctx 0x19f0dd0 [0000000000000010] { gipchaContext : host 'racnode1', name 'e2cc-aa06-08f2-2397', luid '11d91062-00000000', numNode 2, numInf 2, usrFlags 0x0, flags 0x1 }, ret gipcretKeyNotFound (36)
2011-04-10 22:19:37.583: [GIPCHGEN][2157287184] gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 795]: EXCEPTION[ ret gipcretKeyNotFound (36) ]  failed to resolve ctx 0x19f0dd0 [0000000000000010] { gipchaContext : host 'racnode1', name 'e2cc-aa06-08f2-2397', luid '11d91062-00000000', numNode 2, numInf 2, usrFlags 0x0, flags 0x1 }, host 'racnode1', port 'b015-632e-0f93-4247', flags 0x0
2011-04-10 22:19:37.588: [GIPCHGEN][1946154768] gipchaNodeCreate: adding new node 0x7fa3680207d0 { host 'racnode2', haName 'cc4b-6b7e-e26a-1e05', srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 8356304, flags 0x0 }
2011-04-10 22:19:37.589: [GIPCHALO][2178299664] gipchaLowerSend: deffering startup of hdr 0x7fa36c021db8 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 0x7fa3680207d0 { host 'racnode2', haName 'cc4b-6b7e-e26a-1e05', srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 8356304, flags 0x0 }
2011-04-10 22:19:37.589: [GIPCHALO][2178299664] gipchaLowerProcessNode: no valid interfaces found to node for 8356304 ms, node 0x7fa3680207d0 { host 'racnode2', haName 'cc4b-6b7e-e26a-1e05', srcLuid 11d91062-ea2cdc97, dstLuid 00000000-00000000 numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [0 : 0], createTime 8356304, flags 0x4 }
...
2011-04-10 22:21:41.349: [  OCRSRV][2157287184]proas_u_r_c: No reconfig received for node [1]after waiting for 93000 msec
2011-04-10 22:21:41.351: [  OCRSRV][2157287184]proas_u_r_c: Invoking conn clean on node [1]. Recovery state [0].
2011-04-10 22:21:41.351: [  OCRSRV][2157287184]proas_u_r_c: Issued a reconnect on node [1] host [racnode2]. Recovery state changed to [1]
...
2011-04-10 22:23:45.173: [  OCRSRV][2157287184]proas_u_r_c: No reconfig received for node [1]after waiting for 93000 msec
2011-04-10 22:23:45.175: [  OCRSRV][2157287184]proas_u_r_c: Invoking member kill on node [1]. Recovery state [1].
2011-04-10 22:23:45.175: [ CSSCLNT][2157287184]clssgsmbrkill: Member kill request: Members map 0x00000002
2011-04-10 22:23:45.177: [ CSSCLNT][2157287184]clssgsmbrkill: Success from kill call rc 0
[  OCRSRV][2157287184](:PROCR00000:)proas_u_r_c: Successfully issued clssgsmbrkill() node [1] host [racnode2].  Recovery state changed to [2]
..
2011-04-10 22:23:45.197: [  OCRMSG][2161489680]prom_send: Failed to send [13]
2011-04-10 22:23:45.198: [  OCRMSG][2161489680]GIPC error [13] msg [gipcretConnectionInvalid]
2011-04-10 22:23:45.199: [  OCRSRV][2161489680]s_update_remote_cache_int: FAILED TO SEND MESSAGE to node 1 retcode 203
2011-04-10 22:23:45.200: [  OCRSRV][2161489680]s_update_remote_cache_int: returning con FAILURE for nodenum [1]

 

Other likely symptoms is OCR slave restarts after OCR master node is terminated.

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms