Node 1 was rebooted and after some time, the surviving node 2 also got rebooted (Doc ID 2078186.1)

Last updated on NOVEMBER 30, 2015

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.3 and later
Information in this document applies to any platform.

Symptoms

Node1 and node2 were unpingable. Node1 was rebooted at 18:26 but few minutes later, Node2 were rebooted by itself. We expect the surviving Node2 to be available.

No OSWatcher logs were captured on Node1 after 15:30 as:

kulpltaxdb01.kul.apac.dell.com_vmstat_15.10.12.1500.dat
=======================================================
Linux OSW v3.0.2 kulpltaxdb01.kul.apac.dell.com
zzz ***Mon Oct 12 15:02:58 MYT 2015
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
5  0  52288 886880 1783388 6641572    0    0    24    10    0    0  1  1 99  0  0
0  0  52288 887988 1783392 6641556    0    0    32   317 8461 11237  1  2 97  1  0
0  0  52288 887732 1783392 6641648    0    0    16   476 5984 9693  0  1 99  0  0
zzz ***Mon Oct 12 15:07:58 MYT 2015
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
5  0  52288 885200 1783400 6642140    0    0    24    10    0    0  1  1 99  0  0
0  0  52288 881860 1783400 6642092    0    0    32   448 7886 10926  1  2 97  1  0
1  0  52288 881580 1783400 6642184    0    0    16    65 6213 9921  1  1 99  0  0
zzz ***Mon Oct 12 15:12:58 MYT 2015
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
6  0  52288 879708 1783400 6645152    0    0    24    10    0    0  1  1 99  0  0
1  0  52288 876252 1783404 6645088    0    0    32    90 8705 11175  1  1 97  0  0
1  0  52288 879700 1783404 6645208    0    0    16   545 6046 9757  1  1 98  1  0
zzz ***Mon Oct 12 15:17:58 MYT 2015
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
6  0  52288 904944 1783404 6632228    0    0    24    10    0    0  1  1 99  0  0
0  0  52288 900976 1783404 6632392    0    0    16   512 9394 11386  1  2 97  0  0
1  0  52288 901576 1783404 6632484    0    0    32   121 6890 10292  1  1 98  0  0

<<<<<<<<<<<< No logs from Oct 12 15:30 and onwards when the issue actually happened >>>>>>>>>>>>

 

We had the ora.asm and diskgroup resources check failures as below and later ora.crsd went offline:

alertkulpltaxdb01.log
=====================
2015-10-12 15:20:52.151
[/u01/app/grid/product/11.2.0.3/grid_1/bin/oraagent.bin(28068)]CRS-5818:Aborted command 'check' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:40:2} in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb01/agent/crsd/oraagent_oracle/oraagent_oracle.log.
.
.
2015-10-12 15:41:35.395
[/u01/app/grid/product/11.2.0.3/grid_1/bin/orarootagent.bin(1315)]CRS-5822:Agent '/u01/app/grid/product/11.2.0.3/grid_1/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:3:2044} in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb01/agent/crsd/orarootagent_root/orarootagent_root.log.
2015-10-12 15:44:32.539
[client(9261)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb01/client/crsctl_processoemagent.log.
2015-10-12 15:51:36.825
[/u01/app/grid/product/11.2.0.3/grid_1/bin/orarootagent.bin(48915)]CRS-5818:Aborted command 'start' for resource 'ora.crsd'. Details at (:CRSAGF00113:) {0:6:206} in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb01/agent/ohasd/orarootagent_root/orarootagent_root.log.
2015-10-12 15:51:40.830
[ohasd(48699)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.crsd'. Details at (:CRSPE00111:) {0:6:206} in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb01/ohasd/ohasd.log.
.......
2015-10-12 18:14:31.186
[client(38235)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb01/client/crsctl_processoemagent.log.
2015-10-12 18:25:56.036
[ohasd(14633)]CRS-2112:The OLR service started on node kulpltaxdb01.
2015-10-12 18:25:56.079
[ohasd(14633)]CRS-1301:Oracle High Availability Service started on node kulpltaxdb01.
ohasd - orarootagent_root.l04
=============================
2015-10-12 15:41:36.546: [ COMMCRS][1797601024]clsc_connect: (0x7f8130047e30) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=kulpltaxdb01DBG_CRSD))

[  clsdmc][1780475648]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=kulpltaxdb01DBG_CRSD)) with status 9
2015-10-12 15:41:36.547: [ora.crsd][1780475648] {0:6:206} [check] Error = error 9 encountered when connecting to CRSD
2015-10-12 15:41:36.555: [ora.crsd][1780475648] {0:6:206} [check] Calling PID check for daemon
2015-10-12 15:41:36.555: [ora.crsd][1780475648] {0:6:206} [check] Process id 1155 translated to
2015-10-12 15:41:36.720: [ COMMCRS][1797601024]clsc_connect: (0x7f8130291800) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=kulpltaxdb01DBG_CRSD))

[  clsdmc][1780475648]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=kulpltaxdb01DBG_CRSD)) with status 9
2015-10-12 15:41:36.721: [ora.crsd][1780475648] {0:6:206} [check] Error = error 9 encountered when connecting to CRSD
2015-10-12 15:41:36.721: [ora.crsd][1780475648] {0:6:206} [check] DaemonAgent::check returned 1
2015-10-12 15:41:36.721: [    AGFW][2284398336] {0:6:206} ora.crsd 1 1 state changed from: CLEANING to: OFFLINE
.
.
[  clsdmc][1780475648]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=kulpltaxdb01DBG_CRSD)) with status 9
2015-10-12 15:41:37.913: [ora.crsd][1780475648] {0:6:206} [start] Error = error 9 encountered when connecting to CRSD
2015-10-12 15:41:38.913: [ora.crsd][1780475648] {0:6:206} [start] without returnbuf
2015-10-12 15:41:39.078: [ COMMCRS][1785206528]clsc_connect: (0x7f813018d190) no listener at (ADDRESS=(PROTOCOL=ipc)(KEY=kulpltaxdb01DBG_CRSD))

<<<<< Repeated messages >>>>>

[  clsdmc][1811937024]Fail to connect (ADDRESS=(PROTOCOL=ipc)(KEY=kulpltaxdb01DBG_CRSD)) with status 9
2015-10-12 15:51:42.506: [ora.crsd][1811937024] {0:6:206} [check] Error = error 9 encountered when connecting to CRSD
2015-10-12 15:51:42.506: [ora.crsd][1811937024] {0:6:206} [check] DaemonAgent::check returned 1
2015-10-12 15:51:42.506: [    AGFW][2284398336] {0:6:206} ora.crsd 1 1 state changed from: CLEANING to: OFFLINE  <<<< crsd went offline

 

Node 2 OS logs show there was a core dump by the ocssd process by 18:38 just before the node reboot time:

messages-20151013
==================
Oct 12 18:33:55 kulpltaxdb02 abrt[46752]: Saved core dump of pid 39416 (/u01/app/grid/product/11.2.0.3/grid_1/bin/ocssd.bin) to /var/spool/abrt/ccpp-2015-10-12-18:33:53-39416 (64794624 bytes) <<<<<<<<<<<<<
Oct 12 18:33:55 kulpltaxdb02 abrtd: Directory 'ccpp-2015-10-12-18:33:53-39416' creation detected
Oct 12 18:33:55 kulpltaxdb02 abrtd: Executable '/u01/app/grid/product/11.2.0.3/grid_1/bin/ocssd.bin' doesn't belong to any package and ProcessUnpackaged is set to 'no'
Oct 12 18:33:55 kulpltaxdb02 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2015-10-12-18:33:53-39416' exited with 1
Oct 12 18:33:55 kulpltaxdb02 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2015-10-12-18:33:53-39416'
Oct 12 18:33:57 kulpltaxdb02 kernel: SysRq : Resetting
Oct 12 18:38:49 kulpltaxdb02 kernel: imklog 5.8.10, log source = /proc/kmsg started.
Oct 12 18:38:49 kulpltaxdb02 rsyslogd: [origin software="rsyslogd" swVersion="5.8.10" x-pid="12740" x-info="http://www.rsyslog.com"] start <<< restart time

 

 The node 2 was rebooted by the ocssd process as below:

alertkulpltaxdb02.log
=====================
2015-10-12 18:21:16.835
[cssd(39416)]CRS-1632:Node kulpltaxdb01 is being removed from the cluster in cluster incarnation 324253241
2015-10-12 18:21:16.851
[cssd(39416)]CRS-1601:CSSD Reconfiguration complete. Active nodes are kulpltaxdb02 .
.
.
2015-10-12 18:32:36.716
[client(45793)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb02/client/crsctl_processoemagent.log.
2015-10-12 18:32:47.701
[client(45881)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb02/client/crsctl_processoemagent.log.
.
.
2015-10-12 18:33:52.437
[cssd(39416)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb02/cssd/ocssd.log.
2015-10-12 18:33:52.437
[cssd(39416)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb02/cssd/ocssd.log
2015-10-12 18:33:52.517
[cssd(39416)]CRS-1652:Starting clean up of CRSD resources.
2015-10-12 18:33:52.518
[cssd(39416)]CRS-1608:This node was evicted by node 1, kulpltaxdb01; details at (:CSSNM00005:) in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb02/cssd/ocssd.log.
2015-10-12 18:33:52.685
[cssd(39416)]CRS-1653:The clean up of the CRSD resources failed.
2015-10-12 18:33:55.177
[ohasd(39125)]CRS-2765:Resource 'ora.ctssd' has failed on server 'kulpltaxdb02'.
2015-10-12 18:33:55.284
[ohasd(39125)]CRS-2765:Resource 'ora.evmd' has failed on server 'kulpltaxdb02'.
2015-10-12 18:33:56.217
[ctssd(46761)]CRS-2402:The Cluster Time Synchronization Service aborted on host kulpltaxdb02. Details at (:ctss_css_init1:) in /u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb02/ctssd/octssd.log.
2015-10-12 18:33:56.255
[/u01/app/grid/product/11.2.0.3/grid_1/bin/oraagent.bin(39299)]CRS-5011:Check of resource "+ASM" failed: details at "(:CLSN00006:)" in "/u01/app/grid/product/11.2.0.3/grid_1/log/kulpltaxdb02/agent/ohasd/oraagent_oracle/oraagent_oracle.log"
2015-10-12 18:33:56.432
[ohasd(39125)]CRS-2765:Resource 'ora.asm' has failed on server 'kulpltaxdb02'.
2015-10-12 18:33:57.887
[ohasd(39125)]CRS-8011:reboot advisory message from host: kulpltaxdb02, component: cssagent, with time stamp: L-2015-10-12-18:33:57.827
[ohasd(39125)]CRS-8013:reboot advisory message text: clsnomon_status: need to reboot, unexpected failure 8 received from CSS <<<<<<<

 

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms