CRSD goes down on one node when remote node is started due to OCR diskgroup dismount

(Doc ID 2333474.1)

Last updated on NOVEMBER 29, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 12.1.0.2 to 12.1.0.2 [Release 12.1]
Information in this document applies to any platform.

Symptoms

When Clusterware is started on one node, CRSD on the other (remote) node goes down.

For example:


Starting Grid Infrastructure on node 1 while node 2 is already up:

alert.log in <ORACLE_BASE>/diag/crs/<node>/crs/trace on node 1:

2017-11-19 22:56:16.266 [OHASD(32654)]CRS-2112: The OLR service started on node rachost1.
2017-11-19 22:56:16.288 [OHASD(32654)]CRS-1301: Oracle High Availability Service started on node rachost1.
2017-11-19 22:56:16.290 [OHASD(32654)]CRS-8011: reboot advisory message from host: rachost1, component: cssmonit, with time stamp: L-2017-11-19-13:37:28.942
2017-11-19 22:56:16.290 [OHASD(32654)]CRS-8013: reboot advisory message text: Rebooting node due to connection problems with CSS
2017-11-19 22:56:16.290 [OHASD(32654)]CRS-8017: location: /etc/oracle/lastgasp has 2 reboot advisory log files, 1 were announced and 0 errors occurred
.
.
2017-11-19 22:56:45.660 [OCSSD(693)]CRS-1605: CSSD voting file is online: ORCL:OCR03; details in /u01/app/grid/diag/crs/rachost1/crs/trace/ocssd.trc.
2017-11-19 22:56:45.665 [OCSSD(693)]CRS-1605: CSSD voting file is online: ORCL:OCR02; details in /u01/app/grid/diag/crs/rachost1/crs/trace/ocssd.trc.
2017-11-19 22:56:45.675 [OCSSD(693)]CRS-1605: CSSD voting file is online: ORCL:OCR01; details in /u01/app/grid/diag/crs/rachost1/crs/trace/ocssd.trc.
2017-11-19 22:56:47.082 [OCSSD(693)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rachost2 rachost1 . <<<<<<<<<< Both nodes in cluster 
.
2017-11-19 22:57:13.237 [CRSD(1368)]CRS-1201: CRSD started on node rachost1. 

alert.log in <ORACLE_BASE>/diag/crs/<node>/crs/trace on node 2:

2017-11-19 22:56:47.115 [OCSSD(13102)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rachost2 rachost1 . <<<<< Both nodes joined cluster 
2017-11-19 22:57:08.395 [CRSD(14071)]CRS-1006: The OCR location is inaccessible. Details in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd.trc.
2017-11-19 22:57:08.441 [ORAAGENT(14169)]CRS-5822: Agent '/u01/app/12.1.0.2/grid/bin/oraagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:1:10} in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd_oraagent_grid.trc.
2017-11-19 22:57:08.442 [ORAROOTAGENT(14173)]CRS-5822: Agent '/u01/app/12.1.0.2/grid/bin/orarootagent_root' disconnected from server. Details at (:CRSAGF00117:) {0:2:7} in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd_orarootagent_root.trc.
2017-11-19 22:57:08.442 [ORAAGENT(15308)]CRS-5822: Agent '/u01/app/12.1.0.2/grid/bin/oraagent_oracle' disconnected from server. Details at (:CRSAGF00117:) {0:7:11} in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd_oraagent_oracle.trc.
2017-11-19 22:57:08.443 [SCRIPTAGENT(14481)]CRS-5822: Agent '/u01/app/12.1.0.2/grid/bin/scriptagent_grid' disconnected from server. Details at (:CRSAGF00117:) {0:6:41} in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd_scriptagent_grid.trc.
2017-11-19 22:57:08.517 [CRSD(10201)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 10201
2017-11-19 22:57:09.912 [CRSD(10201)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd.trc.
2017-11-19 22:57:09.920 [CRSD(10201)]CRS-0804: Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage Storage layer error [Insufficient quorum to open OCR devices] [0]]. Details at (:CRSD00111:) in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd.trc.
2017-11-19 22:57:10.003 [CRSD(10219)]CRS-8500: Oracle Clusterware CRSD process is starting with operating system process ID 10219

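When checking a node for this symptom, it can help to filter the Clusterware alert.log for the OCR-related messages shown above. The Python sketch below is an illustration only, not an Oracle-supplied tool: the regular expression assumes the 12.1 alert.log line layout seen in this note, the set of codes (CRS-1006, CRS-1013, CRS-0804) is limited to those in the node 2 excerpt, and the function names are hypothetical.

```python
import re
from typing import Iterable, List, NamedTuple

# Matches Clusterware alert.log lines such as:
# 2017-11-19 22:57:08.395 [CRSD(14071)]CRS-1006: The OCR location is inaccessible.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"\[(?P<proc>[A-Z]+)\((?P<pid>\d+)\)\](?P<code>CRS-\d{4}): (?P<text>.*)$"
)

# Codes from the node 2 excerpt that indicate CRSD lost access to the OCR.
OCR_LOSS_CODES = {"CRS-1006", "CRS-1013", "CRS-0804"}


class CrsEvent(NamedTuple):
    ts: str
    proc: str
    pid: int
    code: str
    text: str


def parse_crs_alert(lines: Iterable[str]) -> List[CrsEvent]:
    """Parse every line that matches the alert.log layout; skip the rest."""
    events = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            events.append(
                CrsEvent(m["ts"], m["proc"], int(m["pid"]), m["code"], m["text"])
            )
    return events


def ocr_loss_events(lines: Iterable[str]) -> List[CrsEvent]:
    """Keep only the events whose CRS code signals OCR inaccessibility."""
    return [e for e in parse_crs_alert(lines) if e.code in OCR_LOSS_CODES]


sample = [
    "2017-11-19 22:56:47.115 [OCSSD(13102)]CRS-1601: CSSD Reconfiguration complete. Active nodes are rachost2 rachost1 .",
    "2017-11-19 22:57:08.395 [CRSD(14071)]CRS-1006: The OCR location is inaccessible. Details in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd.trc.",
    "2017-11-19 22:57:09.912 [CRSD(10201)]CRS-1013: The OCR location in an ASM disk group is inaccessible. Details in /u01/app/grid/diag/crs/rachost2/crs/trace/crsd.trc.",
]
for e in ocr_loss_events(sample):
    print(e.ts, e.proc, e.pid, e.code)
```

The CSSD reconfiguration line is parsed but filtered out, so only the CRS-1006 and CRS-1013 lines are reported.
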
At the same time, the +ASM1 alert log in <ORACLE_BASE>/diag/asm/+asm/+ASM1/trace on node 1 shows the +ASM1 instance starting up and mounting diskgroup OCR:

alert_+ASM1.log
=============
Sun Nov 19 22:57:00 2017
MEMORY_TARGET defaulting to 2281701376.
* instance_number obtained from CSS = 2, checking for the existence of node 0...
* node 0 does not exist. instance_number = 2
Starting ORACLE instance (normal) (OS id: 1267)
.
Sun Nov 19 22:57:06 2017
Reconfiguration started (old inc 0, new inc 16)
ASM instance
List of instances (total 2) :
1 2.
.
Sun Nov 19 22:57:07 2017
SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:5:3} */
Sun Nov 19 22:57:07 2017
NOTE: Diskgroup used for Voting files is:
OCR
Diskgroup with spfile:OCR
NOTE: Diskgroup used for OCR is:OCR
NOTE: Diskgroups listed in ASM_DISKGROUP are
DATA
FRA
.
.
Sun Nov 19 22:57:09 2017
SUCCESS: diskgroup OCR was mounted

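The ASM alert log uses a different layout from the Clusterware alert.log: each timestamp sits on its own line and applies to the messages below it. A minimal Python sketch, assuming that layout and the English-locale timestamp format shown above, that pairs each diskgroup mount or dismount message with the timestamp preceding it (the function names are hypothetical):

```python
import re
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

# Timestamp lines look like "Sun Nov 19 22:57:09 2017" (English locale assumed).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
DG_RE = re.compile(r"^SUCCESS: diskgroup (?P<dg>\S+) was (?P<action>mounted|dismounted)")


def parse_ts(line: str) -> Optional[datetime]:
    """Return a datetime if the line is a bare timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def diskgroup_events(lines: Iterable[str]) -> List[Tuple[Optional[datetime], str, str]]:
    """Pair each diskgroup mount/dismount message with the last timestamp seen."""
    current: Optional[datetime] = None
    events = []
    for line in lines:
        ts = parse_ts(line)
        if ts is not None:
            current = ts
            continue
        m = DG_RE.match(line.strip())
        if m:
            events.append((current, m["dg"], m["action"]))
    return events


node1_excerpt = [
    "Sun Nov 19 22:57:07 2017",
    "SQL> ALTER DISKGROUP ALL MOUNT /* asm agent call crs *//* {0:5:3} */",
    "Sun Nov 19 22:57:09 2017",
    "SUCCESS: diskgroup OCR was mounted",
]
print(diskgroup_events(node1_excerpt))
```
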
During the same interval, alert_+ASM2.log in <ORACLE_BASE>/diag/asm/+asm/+ASM2/trace on node 2 shows the OCR diskgroup being force-dismounted after "ERROR: no read quorum in group":

Sun Nov 19 22:57:06 2017
Reconfiguration started (old inc 14, new inc 16)
List of instances (total 2) :
1 2
.
Sun Nov 19 22:57:06 2017
Reconfiguration complete (total time 0.2 secs)
Sun Nov 19 22:57:07 2017
ERROR: no read quorum in group: required 2, found 1 disks <<<<<<<<<<<<<<<<<<< No read quorum reported
Sun Nov 19 22:57:07 2017
ERROR: Could not read PST for grp 3. Force dismounting the disk group. <<<<<<<<<<<<<<< dismounting diskgroup further
Sun Nov 19 22:57:07 2017
NOTE: cache dismounting (not clean) group 3/0xD1285CC0 (OCR)
NOTE: messaging CKPT to quiesce pins Unix process pid: 10187, image: oracle@rachost2 (B000)
Sun Nov 19 22:57:07 2017
NOTE: halting all I/Os to diskgroup 3 (OCR)
Sun Nov 19 22:57:07 2017
NOTE: LGWR doing non-clean dismount of group 3 (OCR) thread 1
NOTE: LGWR sync ABA=28.854 last written ABA 28.854
.
Sun Nov 19 22:57:08 2017
SQL> alter diskgroup OCR dismount force /* ASM SERVER:3509083328 */
.
Sun Nov 19 22:57:08 2017
SUCCESS: diskgroup OCR was dismounted <<<<<<<<<<<< Diskgroup dismounted, hence crsd goes down on node 2

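On node 2 the key messages are the read-quorum error, the PST read failure and the forced dismount. The Python sketch below extracts those three from an ASM alert log. It assumes the message wording matches the 12.1 text above exactly; the function name and result keys are hypothetical.

```python
import re
from typing import Dict, Iterable, List

QUORUM_RE = re.compile(r"ERROR: no read quorum in group: required (\d+), found (\d+) disks")
PST_RE = re.compile(r"ERROR: Could not read PST for grp (\d+)\.")
DISMOUNT_RE = re.compile(r"SUCCESS: diskgroup (\S+) was dismounted")


def quorum_failures(lines: Iterable[str]) -> Dict[str, List]:
    """Collect read-quorum shortfalls, PST read failures and forced dismounts."""
    result: Dict[str, List] = {"shortfalls": [], "pst_groups": [], "dismounted": []}
    for line in lines:
        if m := QUORUM_RE.search(line):
            required, found = int(m[1]), int(m[2])
            result["shortfalls"].append(
                {"required": required, "found": found, "missing": required - found}
            )
        elif m := PST_RE.search(line):
            result["pst_groups"].append(int(m[1]))
        elif m := DISMOUNT_RE.search(line):
            result["dismounted"].append(m[1])
    return result


node2_excerpt = [
    "ERROR: no read quorum in group: required 2, found 1 disks",
    "ERROR: Could not read PST for grp 3. Force dismounting the disk group.",
    "SQL> alter diskgroup OCR dismount force /* ASM SERVER:3509083328 */",
    "SUCCESS: diskgroup OCR was dismounted",
]
print(quorum_failures(node2_excerpt))
```

For the excerpt above this reports one missing disk (required 2, found 1), a PST read failure for group 3, and the dismount of OCR.
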
The same behaviour occurs when Clusterware is started on node 2 while node 1 is already up.

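Assuming both nodes' clocks are synchronized and in the same time zone, the excerpts can be lined up by timestamp to show that the node 2 failures fall within a few seconds of node 1 running ALTER DISKGROUP ALL MOUNT. The sketch below hard-codes events copied from the excerpts in this note; the helper names and the 5-second window are illustrative assumptions. It only shows timing and does not by itself establish the cause.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Event = Tuple[datetime, str, str, str]  # (time, node, log, message)


def ts(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS' with optional milliseconds."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)


# Copied from the excerpts in this note; ASM times have 1-second resolution.
EVENTS: List[Event] = [
    (ts("2017-11-19 22:57:07"), "rachost1", "ASM", "ALTER DISKGROUP ALL MOUNT"),
    (ts("2017-11-19 22:57:07"), "rachost2", "ASM", "ERROR: no read quorum in group: required 2, found 1 disks"),
    (ts("2017-11-19 22:57:08"), "rachost2", "ASM", "SUCCESS: diskgroup OCR was dismounted"),
    (ts("2017-11-19 22:57:08.395"), "rachost2", "CRS", "CRS-1006: The OCR location is inaccessible."),
    (ts("2017-11-19 22:57:09"), "rachost1", "ASM", "SUCCESS: diskgroup OCR was mounted"),
    (ts("2017-11-19 22:57:09.920"), "rachost2", "CRS", "CRS-0804: Cluster Ready Service aborted"),
]


def remote_effects(events: List[Event], anchor_text: str, window_s: float = 5.0) -> List[Event]:
    """Events on other nodes within window_s seconds after the first anchor match."""
    anchor = next(e for e in sorted(events) if anchor_text in e[3])
    limit = anchor[0] + timedelta(seconds=window_s)
    return [
        e for e in sorted(events)
        if e[1] != anchor[1] and anchor[0] <= e[0] <= limit
    ]


for t, node, log, msg in remote_effects(EVENTS, "ALTER DISKGROUP ALL MOUNT"):
    print(t.time(), node, log, msg)
```

With the 5-second window all four node 2 events are returned; narrowing it to 0.5 seconds leaves only the read-quorum error.
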
 

Cause
