CRS can not Start After Node Reboot (Doc ID 733260.1)

Last updated on MAY 22, 2013

Applies to:

Oracle Database - Enterprise Edition - Version 10.1.0.2 to 11.1.0.7 [Release 10.1 to 11.1]
Information in this document applies to any platform.
***Checked for relevance on 23-Apr-2013***

Symptoms

On a 2-node RAC cluster, CRS may be running on one node but fail to come up on the other node after that node is rebooted. Rebooting again, even several times, does not resolve the problem. The same symptom can occur on clusters with more than two nodes.

ocssd.log for 10g (located in $CRS_HOME/log/<hostname>/cssd) shows repeated messages like:

[ CSSD]2008-07-28 16:30:42.369 [1126189408] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(4) wrtcnt(2969549) LATS(576133094) Disk lastSeqNo(2969549)
[ CSSD]2008-07-28 16:30:42.909 [1136679264] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(4) wrtcnt(2969549) LATS(576133634) Disk lastSeqNo(2969549)
[ CSSD]2008-07-28 16:30:43.172 [1262557536] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2008-07-28 16:30:43.172 [1262557536] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[ CSSD]2008-07-28 16:30:43.371 [1115699552] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(4) wrtcnt(2969550) LATS(576134094) Disk lastSeqNo(2969550)
.... << repeated messages
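
When the log is long, it can help to summarize these messages rather than read them one by one. The following is a minimal sketch, not an Oracle-supplied tool: it counts the "takeover aborted" warnings and collects the wrtcnt values reported per node in the 10g-style heartbeat messages. The function name summarize_ocssd and the regular expression are assumptions derived only from the sample lines above; other releases format these messages differently.

```python
import re

# Pattern derived from the 10g sample messages; an assumption, not an official format.
DOWN_RE = re.compile(r"clssnmReadDskHeartbeat: node\((\d+)\) is down\..*wrtcnt\((\d+)\)")
TAKEOVER_MSG = "takeover aborted due to ALIVE node on Disk"


def summarize_ocssd(lines):
    """Return (number of takeover warnings, {node: sorted distinct wrtcnt values})."""
    aborts = 0
    wrtcnts = {}
    for line in lines:
        if TAKEOVER_MSG in line:
            aborts += 1
        match = DOWN_RE.search(line)
        if match:
            node, wrtcnt = int(match.group(1)), int(match.group(2))
            wrtcnts.setdefault(node, set()).add(wrtcnt)
    return aborts, {node: sorted(vals) for node, vals in wrtcnts.items()}


# Three of the sample lines shown above, used here as test input.
SAMPLE = """\
[ CSSD]2008-07-28 16:30:42.369 [1126189408] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(4) wrtcnt(2969549) LATS(576133094) Disk lastSeqNo(2969549)
[ CSSD]2008-07-28 16:30:43.172 [1262557536] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk
[ CSSD]2008-07-28 16:30:43.371 [1115699552] >TRACE: clssnmReadDskHeartbeat: node(1) is down. rcfg(4) wrtcnt(2969550) LATS(576134094) Disk lastSeqNo(2969550)
""".splitlines()

aborts, wrtcnts = summarize_ocssd(SAMPLE)
print(aborts, wrtcnts)
```

To run it against a real log, pass an open file object for ocssd.log instead of SAMPLE; the function only iterates over lines.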

 

ocssd.log for 11gR1 (located in $CRS_HOME/log/<hostname>/cssd) shows repeated messages like:

[ CSSD]2008-10-01 01:03:36.658 [62843792] >TRACE: clssnmReadDskHeartbeat: node 1, ndb-01, has a disk HB, but no network HB, DHB has rcfg 111839697, wrtcnt, 6238187, LATS 4479404, lastSeqNo 6238187, timestamp 1222823015/258149934
[ CSSD]2008-10-01 01:03:37.661 [62843792] >TRACE: clssnmReadDskHeartbeat: node 1, ndb-01, has a disk HB, but no network HB, DHB has rcfg 111839697, wrtcnt, 6238188, LATS 4480404, lastSeqNo 6238188, timestamp 1222823016/258150944
[ CSSD]2008-10-01 01:03:38.504 [3007802256] >TRACE: clssnmRcfgMgrThread: Local Join
[ CSSD]2008-10-01 01:03:38.504 [3007802256] >WARNING: clssnmLocalJoinEvent: takeover aborted due to ALIVE node on Disk

OR

There are no messages in ocssd.log at all.

ps -ef | grep init shows that "/etc/init.d/init.cssd startcheck" is running constantly.

cat /tmp/crsctl.xxxx shows:

OCR initialization failed with invalid format: PROC-22: The OCR backend has an invalid format
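
The check for this last symptom can be scripted. The sketch below is illustrative only: it assumes the crsctl output files match the glob /tmp/crsctl.* (the actual suffix is not given here), and the helper name proc22_lines is hypothetical. It prints every line that mentions PROC-22.

```python
import glob


def proc22_lines(text):
    """Return the lines of text that mention the PROC-22 error code."""
    return [line.strip() for line in text.splitlines() if "PROC-22" in line]


if __name__ == "__main__":
    # Assumed file pattern; adjust it to the actual file names on the node.
    for path in sorted(glob.glob("/tmp/crsctl.*")):
        try:
            with open(path, errors="replace") as handle:
                for hit in proc22_lines(handle.read()):
                    print(f"{path}: {hit}")
        except OSError as exc:
            print(f"{path}: unreadable ({exc})")
```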

Changes

This can happen in an environment where a node is shut down for any reason and then restarted.

Cause
