Pre-11.2: Node Fails to Reboot after Node Eviction or CRS Can't Rejoin the Cluster after Node reboot as diagwait has Wrong Value (Doc ID 1277538.1)

Last updated on JANUARY 15, 2014

Applies to:

Oracle Database - Enterprise Edition - Version 10.2.0.4 to 11.1.0.7 [Release 10.2 to 11.1]
Information in this document applies to any platform.

Symptoms

In a 2-node RAC cluster, a network outage caused node 2 to be evicted, but the server failed to reboot and CRS cannot be restarted on that node.

alert<host>.log on node 1 shows:

2010-12-04 12:05:12.463
[cssd(3721)]CRS-1612:node node2 (0) at 50% heartbeat fatal, eviction in 14.860 seconds
...
2010-12-05 05:10:30.304
[cssd(3721)]CRS-1610:node node2 (0) at 90% heartbeat fatal, eviction in 1.400 seconds
2010-12-05 05:10:30.677
[cssd(3721)]CRS-1607:CSSD evicting node node2. Details in /oracle/crs/product/10/crs/log/node2/cssd/ocssd.log.
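
When an alert log is long, the heartbeat warnings and the eviction message can be pulled out with a short script. The following is a minimal sketch in Python, assuming the layout shown in the excerpt above (a timestamp line followed by a `[cssd(pid)]CRS-nnnn:` message line). The names `css_events`, `evicted_nodes` and `SAMPLE` are illustrative and are not part of any Oracle tool.

```python
import re
from datetime import datetime

# Timestamp lines look like "2010-12-05 05:10:30.677"; the message is on the next line.
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}$")
# Message lines look like "[cssd(3721)]CRS-1607:CSSD evicting node node2. ..."
MSG_RE = re.compile(r"^\[cssd\(\d+\)\](CRS-\d+):(.*)$")
# Node name in a CRS-1607 message, up to the period that ends the sentence.
EVICT_RE = re.compile(r"evicting node (\S+?)\.(?:\s|$)")

def css_events(lines):
    """Yield (timestamp, code, text) for each CSSD message preceded by a timestamp line."""
    ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            ts = datetime.strptime(line, "%Y-%m-%d %H:%M:%S.%f")
            continue
        m = MSG_RE.match(line)
        if m and ts is not None:
            yield ts, m.group(1), m.group(2).strip()
        ts = None  # a timestamp only applies to the line directly after it

def evicted_nodes(events):
    """Return the node names mentioned in CRS-1607 eviction messages."""
    names = []
    for _ts, code, text in events:
        m = EVICT_RE.search(text)
        if code == "CRS-1607" and m:
            names.append(m.group(1))
    return names

# Lines copied from the node 1 excerpt above.
SAMPLE = [
    "2010-12-04 12:05:12.463",
    "[cssd(3721)]CRS-1612:node node2 (0) at 50% heartbeat fatal, eviction in 14.860 seconds",
    "...",
    "2010-12-05 05:10:30.304",
    "[cssd(3721)]CRS-1610:node node2 (0) at 90% heartbeat fatal, eviction in 1.400 seconds",
    "2010-12-05 05:10:30.677",
    "[cssd(3721)]CRS-1607:CSSD evicting node node2. Details in /oracle/crs/product/10/crs/log/node2/cssd/ocssd.log.",
]

if __name__ == "__main__":
    events = list(css_events(SAMPLE))
    for ts, code, text in events:
        print(ts.isoformat(sep=" "), code, text)
    print("evicted:", evicted_nodes(events))
```

To use it on a real log, pass the open file object to `css_events` in place of `SAMPLE`.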


ocssd.log on node 2 shows:

[ CSSD]2010-12-05 05:10:14.665 [18] >WARNING: clssnmPollingThread: node node1 (1) at 50 2.123764e-314artbeat fatal, eviction in 14.880 seconds
...
[ CSSD]2010-12-05 05:10:27.665 [18] >WARNING: clssnmPollingThread: node node1 (1) at 90 2.123764e-314artbeat fatal, eviction in 1.880 seconds
[ CSSD]2010-12-05 05:10:29.555 [18] >TRACE: clssnmPollingThread: Eviction started for node node1 (1), flags 0x040f, state 3, wt4c 0

[ CSSD]2010-12-05 05:10:29.557 [20] >TRACE: clssnmCheckDskInfo: node 1 has active disk heartbeat
[ CSSD]2010-12-05 05:10:29.557 [1] >TRACE: clssgmSuspendAllGrocks: Issue grock ORA_CLSRD_1_OPEWS event SUSPEND
[ CSSD]2010-12-05 05:10:29.557 [20] >ERROR: clssnmCheckDskInfo: Aborting local node to avoid splitbrain.
[ CSSD]2010-12-05 05:10:29.557 [1] >TRACE: clssgmSuspendAllGrocks: Issue grock ORA_CLSRD_1_OPEWS event SUSPEND
[ CSSD]2010-12-05 05:10:29.557 [1] >TRACE: clssgmQueueGrockEvent: skipping remote member 0
[ CSSD]2010-12-05 05:10:29.557 [20] >ERROR: : my node(2), Leader(2), Size(1) VS Node(1), Leader(1), Size(1)
[ CSSD]2010-12-05 05:10:29.557 [1] >TRACE: clssgmSuspendAllGrocks: Issue grock ORA_CLSRD_2_OPEWS event SUSPEND
[ CSSD]2010-12-05 05:10:29.557 [20] >ERROR:###################################
[ CSSD]2010-12-05 05:10:29.557 [20] >ERROR: clssscExit: CSSD aborting
[ CSSD]2010-12-05 05:10:29.557 [20] >ERROR:###################################


'messages' on node 2:

Dec 5 05:10:31 node2 root: [ID 702911 user.alert] Oracle CSSD failure 134.
Dec 5 05:10:31 node2 root: [ID 702911 user.alert] Oracle CRS failure. Rebooting for cluster integrity.
Dec 5 05:10:32 node2 root: [ID 702911 user.error] Oracle clsomon failed with fatal status 12.
Dec 5 05:10:32 node2 root: [ID 702911 user.alert] Oracle CRS failure. Rebooting for cluster integrity.
Dec 5 05:10:36 node2 root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.
Dec 5 05:10:36 node2 last message repeated 1 time
Dec 5 05:10:36 node2 root: [ID 702911 user.error] Running CRSD with TZ = GMT
Dec 5 05:18:49 node2 root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.
<< the CRSD messages above keep repeating
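
The symptom in the 'messages' excerpt is that "Rebooting for cluster integrity" is logged, yet the same host keeps writing entries a few seconds later, so the reboot evidently did not take place. The sketch below, in Python, flags that pattern. It is a heuristic under stated assumptions, not an Oracle diagnostic: the 30-second threshold `min_gap_seconds` is a hypothetical value, the year is assumed to be 2010 because syslog timestamps carry none, and the function names are illustrative.

```python
import re
from datetime import datetime

# Syslog entries look like "Dec 5 05:10:31 node2 root: [ID ...] text".
ENTRY_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (\S+) (.*)$")
REBOOT_TEXT = "Rebooting for cluster integrity"

def parse_entries(lines, year=2010):
    """Return (timestamp, host, text) tuples; the year is an assumption."""
    entries = []
    for raw in lines:
        m = ENTRY_RE.match(raw.strip())
        if not m:
            continue  # skip annotations and anything that is not a syslog entry
        stamp = " ".join(m.group(1).split())  # normalize "Dec  5" to "Dec 5"
        ts = datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")
        entries.append((ts, m.group(2), m.group(3)))
    return entries

def unhonored_reboots(entries, min_gap_seconds=30):
    """Flag reboot announcements followed too soon by another entry from the same host.

    Returns (announced_at, next_entry_at, gap_seconds) tuples. A real reboot
    is assumed to leave a gap of at least min_gap_seconds in the log.
    """
    suspects = []
    for i, (ts, host, text) in enumerate(entries):
        if REBOOT_TEXT not in text:
            continue
        for later_ts, later_host, later_text in entries[i + 1:]:
            if later_host != host or REBOOT_TEXT in later_text:
                continue  # ignore other hosts and repeated announcements
            gap = (later_ts - ts).total_seconds()
            if gap < min_gap_seconds:
                suspects.append((ts, later_ts, gap))
            break  # only the first following entry matters
    return suspects

# Entries copied from the node 2 excerpt above, one per line.
SAMPLE = [
    "Dec 5 05:10:31 node2 root: [ID 702911 user.alert] Oracle CSSD failure 134.",
    "Dec 5 05:10:31 node2 root: [ID 702911 user.alert] Oracle CRS failure. Rebooting for cluster integrity.",
    "Dec 5 05:10:32 node2 root: [ID 702911 user.error] Oracle clsomon failed with fatal status 12.",
    "Dec 5 05:10:32 node2 root: [ID 702911 user.alert] Oracle CRS failure. Rebooting for cluster integrity.",
    "Dec 5 05:10:36 node2 root: [ID 702911 user.error] Cluster Ready Services completed waiting on dependencies.",
]

if __name__ == "__main__":
    for announced, resumed, gap in unhonored_reboots(parse_entries(SAMPLE)):
        print(f"reboot announced {announced}, log continues {resumed} ({gap:.0f}s later)")
```

A hit only shows that the log continued without a pause; confirming that the server stayed up still requires checking the system's own boot records.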

Cause
