Grid Infrastructure Does not Start after Node Reboot as Master octssd.bin Stuck (Doc ID 1215893.1)

Last updated on SEPTEMBER 06, 2012

Applies to:

Oracle Server - Enterprise Edition - Version 11.2.0.1 and later
Information in this document applies to any platform.

Symptoms

A 3-node 11.2.0.1 Grid Infrastructure cluster. After node 3 is rebooted, nodes 1 and 2 continue to run normally, but CRS will not start on node 3.

alert<host>.log shows:

2010-09-19 15:27:26.197
[ctssd(16149)]CRS-2403:The Cluster Time Synchronization Service on host prod03 is in observer mode.
2010-09-19 15:27:26.206
[ctssd(16149)]CRS-2407:The new Cluster Time Synchronization Service reference node is host prod01.
2010-09-19 15:27:26.208
[ctssd(16149)]CRS-2406:The Cluster Time Synchronization Service timed out on host prod03. Details in /u01/app/11.2.0/grid/log/prod03/ctssd/octssd.log.
2010-09-19 15:27:27.059
[ctssd(16149)]CRS-2401:The Cluster Time Synchronization Service started on host prod03.
...
<< 10 minutes later:
2010-09-19 15:37:26.126
[/u01/app/11.2.0/grid/bin/orarootagent.bin(15076)]CRS-5818:Aborted command 'start for resource: ora.ctssd 1 1' for resource 'ora.ctssd'. Details at (:CRSAGF00113:) in /u01/app/11.2.0/grid/log/prod03/agent/ohasd/orarootagent_root/orarootagent_root.log.
2010-09-19 15:37:30.135
[ohasd(11583)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.ctssd'. Details at (:CRSPE00111:) in /u01/app/11.2.0/grid/log/prod03/ohasd/ohasd.log.
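
The gap between the CTSS start message (CRS-2401) and the aborted start command (CRS-5818) can be checked directly from the alert log timestamps. The following is a minimal Python sketch, not an Oracle-supplied tool. It assumes the alert log layout shown above (a timestamp line followed by a message line); the function names and the embedded excerpt are illustrative only.

```python
import re
from datetime import datetime

# Excerpt copied from the alert log above; the CRS-5818 line is shortened for brevity.
ALERT_EXCERPT = """\
2010-09-19 15:27:27.059
[ctssd(16149)]CRS-2401:The Cluster Time Synchronization Service started on host prod03.
...
2010-09-19 15:37:26.126
[/u01/app/11.2.0/grid/bin/orarootagent.bin(15076)]CRS-5818:Aborted command 'start for resource: ora.ctssd 1 1' for resource 'ora.ctssd'.
"""

TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
CODE_RE = re.compile(r"(CRS-\d+):")


def parse_alert_events(text):
    """Return (timestamp, CRS code) pairs, assuming timestamp lines precede message lines."""
    events = []
    current_ts = None
    for raw in text.splitlines():
        line = raw.strip()
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue  # this line was a timestamp; the message follows
        except ValueError:
            pass
        match = CODE_RE.search(line)
        if match and current_ts is not None:
            events.append((current_ts, match.group(1)))
    return events


def seconds_between(events, first_code, second_code):
    """Seconds from the first first_code event to the next second_code event."""
    first = next(ts for ts, code in events if code == first_code)
    second = next(ts for ts, code in events if code == second_code and ts >= first)
    return (second - first).total_seconds()


events = parse_alert_events(ALERT_EXCERPT)
gap = seconds_between(events, "CRS-2401", "CRS-5818")
print(f"ora.ctssd start aborted {gap:.3f} s after CTSS reported started")
```

For the excerpt above the gap is just under 600 seconds, which matches the "10 minutes later" annotation in the log.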


octssd.log shows:

2010-09-19 15:27:26.206: [ CTSS][1157343552]ctsselect_mmg8: Host [prod01] Node num [2] is the master
2010-09-19 15:27:26.206: [ CTSS][1157343552]ctsselect_sm2: Node [2] is the CTSS master
2010-09-19 15:27:26.206: [ CTSS][1157343552]ctssslave_meh1: Master private node name [prod01]
2010-09-19 15:27:26.206: [ CTSS][1157343552]ctssslave_msh: Connect String is (ADDRESS=(PROTOCOL=tcp)(HOST=prod01)(PORT=49317))
2010-09-19 15:27:26.207: [ CTSS][1157343552]ctssslave_msh: Forming connection with CTSS master node [2]
2010-09-19 15:27:26.208: [ COMMCRS][1092983104]clsc_send_msg: (0x1131e6c0) NS err (12541, 12541), transport (0, 0, 0)

2010-09-19 15:27:26.208: [ CTSS][1157343552]ctssslave_meh: Failed connecting to master [9]
2010-09-19 15:27:26.208: [ CTSS][1157343552]ctsselect_mmg9_1: Failed in clsctsselect_select_mode [5]: newly elected master left, wait for another reconfig
2010-09-19 15:27:27.059: [ CTSS][1167833408]ctss_checkcb: clsdm requested check alive. Returns [62]
2010-09-19 15:27:27.059: [ CTSS][3660357376]ctss_init: Spawn completed. Waiting for threads to join
2010-09-19 15:27:28.061: [ CTSS][1167833408]ctss_checkcb: clsdm requested check alive. Returns [62]
......
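
The entries of interest in the trace above are the master connect string, the NS error, the failed connection, and the repeating check-alive return value. The following minimal Python sketch pulls those fields out of an octssd.log excerpt. It is an illustration under assumptions, not an Oracle utility: the regular expressions are written against the lines shown above only, and the function and key names are hypothetical. The NS error value 12541 has the same number as the Oracle Net error TNS-12541 (no listener); treating the two as equivalent is an assumption.

```python
import re

# Lines copied from the octssd.log excerpt above.
CTSSD_EXCERPT = """\
2010-09-19 15:27:26.206: [ CTSS][1157343552]ctsselect_mmg8: Host [prod01] Node num [2] is the master
2010-09-19 15:27:26.206: [ CTSS][1157343552]ctssslave_msh: Connect String is (ADDRESS=(PROTOCOL=tcp)(HOST=prod01)(PORT=49317))
2010-09-19 15:27:26.208: [ COMMCRS][1092983104]clsc_send_msg: (0x1131e6c0) NS err (12541, 12541), transport (0, 0, 0)
2010-09-19 15:27:26.208: [ CTSS][1157343552]ctssslave_meh: Failed connecting to master [9]
2010-09-19 15:27:27.059: [ CTSS][1167833408]ctss_checkcb: clsdm requested check alive. Returns [62]
2010-09-19 15:27:28.061: [ CTSS][1167833408]ctss_checkcb: clsdm requested check alive. Returns [62]
"""

CONNECT_RE = re.compile(r"Connect String is \(ADDRESS=\(PROTOCOL=\w+\)\(HOST=([^)]+)\)\(PORT=(\d+)\)\)")
NS_ERR_RE = re.compile(r"NS err \((\d+), \d+\)")
FAILED_RE = re.compile(r"Failed connecting to master")
CHECK_RE = re.compile(r"check alive\. Returns \[(\d+)\]")


def summarize_ctss_trace(text):
    """Collect the master endpoint, NS errors, and check-alive codes from a trace excerpt."""
    summary = {"master": None, "ns_errors": [], "connect_failed": False, "check_returns": []}
    for line in text.splitlines():
        connect = CONNECT_RE.search(line)
        if connect:
            summary["master"] = (connect.group(1), int(connect.group(2)))
        ns_err = NS_ERR_RE.search(line)
        if ns_err:
            summary["ns_errors"].append(int(ns_err.group(1)))
        if FAILED_RE.search(line):
            summary["connect_failed"] = True
        check = CHECK_RE.search(line)
        if check:
            summary["check_returns"].append(int(check.group(1)))
    return summary


summary = summarize_ctss_trace(CTSSD_EXCERPT)
print(summary)
```

On the excerpt, the summary shows a connection attempt to prod01 on port 49317 that failed with NS error 12541, followed by check-alive calls that keep returning 62.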

Changes

Node 3 rebooted

Cause
