CLUSTERWARE DOESN'T RESTART A DB INSTANCE WHEN INSTANCE IS KILLED ('KILL -9 <PMON OF THE INSTANCE>') (Doc ID 2223404.1)

Last updated on JANUARY 21, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.4 to 12.1.0.2 [Release 11.2 to 12.1]
Information in this document applies to any platform.

Symptoms

Clusterware does not always restart a database instance after the instance is killed with 'kill -9 <pmon of the instance>'.

Example: the DB instance was killed at Fri Jul 01 15:26:06 2016 and was manually restarted at Fri Jul 01 15:43:46 2016.

Most of the time Clusterware restarts the instance immediately. However, on a couple of occasions the instance was not restarted even after waiting more than 15 minutes.

Instance name: ORCL1

alert_ORCL1.log (instance alert log)

=======================

Fri Jul 01 15:26:01 2016
CJQ0 (ospid: 35994): terminating the instance due to error 472 <----PMON killed using 'kill -9' command
Fri Jul 01 15:26:01 2016
System state dump requested by (instance=1, osid=35994 (CJQ0)), summary=[abnormal instance termination].
System State dumped to trace file /ora01/app/oracle/diag/rdbms/ORCL/ORCL1/trace/ORCL1_diag_35354_20160701152601.trc
Fri Jul 01 15:26:06 2016
Instance terminated by CJQ0, pid = 35994
Fri Jul 01 15:43:46 2016
Starting ORACLE instance (normal) (OS id: 6436) <----Manually started
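
The gap between the termination and the manual restart can be read from the timestamps in the instance alert log. The following is a minimal Python sketch, assuming the alert log layout shown above (a timestamp on its own line, followed by message lines) and an English locale. The helper name `downtime_from_alert_log` and the matched message prefixes are illustrative choices, not part of any Oracle tool.

```python
from datetime import datetime

# Timestamp format used on its own line in the alert log excerpt above,
# e.g. "Fri Jul 01 15:26:06 2016" (assumes an English locale).
TS_FORMAT = "%a %b %d %H:%M:%S %Y"


def downtime_from_alert_log(lines):
    """Return (terminated_at, started_at, gap) for the first terminate/start pair.

    Hypothetical helper: looks for "Instance terminated by" and the next
    "Starting ORACLE instance" message, using the most recent timestamp line.
    Returns None if no such pair is found.
    """
    current_ts = None
    terminated_at = None
    for raw in lines:
        line = raw.strip()
        try:
            current_ts = datetime.strptime(line, TS_FORMAT)
            continue
        except ValueError:
            pass  # not a timestamp line, so treat it as a message
        if line.startswith("Instance terminated by"):
            terminated_at = current_ts
        elif line.startswith("Starting ORACLE instance") and terminated_at:
            return terminated_at, current_ts, current_ts - terminated_at
    return None


sample = """\
Fri Jul 01 15:26:06 2016
Instance terminated by CJQ0, pid = 35994
Fri Jul 01 15:43:46 2016
Starting ORACLE instance (normal) (OS id: 6436)
""".splitlines()

result = downtime_from_alert_log(sample)
if result is not None:
    print("instance was down for", result[2])
```

For the excerpt above, the gap between the two timestamps is 17 minutes 40 seconds.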

alert.log (Clusterware alert log)

=====================

2016-07-01 15:26:03.873 [ORAAGENT(49349)]CRS-5011: Check of resource "ORCL" failed: details at "(:CLSN00007:)" in "/ora01/app/oracle/diag/crs/rachost1/crs/trace/crsd_oraagent_oracle.trc"
2016-07-01 15:26:06.964 [ORAAGENT(49349)]CRS-5017: The resource action "ora.ORCL.db start" encountered the following error:
2016-07-01 15:26:06.964+ORA-29701: unable to connect to Cluster Synchronization Service
ORA-29702: error occurred in Cluster Group Service operation
ORA-29701: unable to connect to Cluster Synchronization Service
ORA-29702: error occurred in Cluster Group Service operation
. For details refer to "(:CLSN00107:)" in "/ora01/app/oracle/diag/crs/rachost1/crs/trace/crsd_oraagent_oracle.trc".
2016-07-01 15:43:45.175 [ORAAGENT(49349)]CRS-5011: Check of resource "ORCL" failed: details at "(:CLSN00007:)" in "/ora01/app/oracle/diag/crs/rachost1/crs/trace/crsd_oraagent_oracle.trc"
2016-07-01 15:55:52.780 [ORAAGENT(49349)]CRS-5011: Check of resource "ORCL" failed: details at "(:CLSN00007:)" in "/ora01/app/oracle/diag/crs/rachost1/crs/trace/crsd_oraagent_oracle.trc"
2016-07-01 15:55:55.998 [ORAAGENT(49349)]CRS-5011: Check of resource "ORCL" failed: details at "(:CLSN00007:)" in "/ora01/app/oracle/diag/crs/rachost1/crs/trace/crsd_oraagent_oracle.trc"
2016-07-01 15:55:59.095 [ORAAGENT(49349)]CRS-5017: The resource action "ora.ORCL.db start" encountered the following error:
2016-07-01 15:55:59.095+ORA-29701: unable to connect to Cluster Synchronization Service
ORA-29702: error occurred in Cluster Group Service operation
ORA-29701: unable to connect to Cluster Synchronization Service
ORA-29702: error occurred in Cluster Group Service operation
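
To find every failed automatic start attempt in a longer Clusterware alert log, the CRS-5017 entries and the ORA- errors that follow them can be collected programmatically. The following is a rough Python sketch, assuming the line format shown in the excerpt above. The function name `failed_start_actions` and the rule that another CRS- message ends an error block are assumptions made for illustration.

```python
import re

# Matches the CRS-5017 line written by ORAAGENT, as in the excerpt above.
CRS_5017 = re.compile(
    r'^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) '
    r'\[ORAAGENT\(\d+\)\]CRS-5017: The resource action "(?P<action>[^"]+)"'
)
ORA_CODE = re.compile(r"ORA-\d{5}")


def failed_start_actions(lines):
    """Collect each CRS-5017 event with the distinct ORA- codes that follow it."""
    events = []
    current = None
    for line in lines:
        m = CRS_5017.match(line)
        if m:
            current = {"ts": m.group("ts"), "action": m.group("action"), "ora": []}
            events.append(current)
            continue
        codes = ORA_CODE.findall(line)
        if codes and current is not None:
            for code in codes:
                if code not in current["ora"]:
                    current["ora"].append(code)
        elif "CRS-" in line:
            current = None  # assumption: another CRS message ends the error block
    return events


sample = '''\
2016-07-01 15:26:06.964 [ORAAGENT(49349)]CRS-5017: The resource action "ora.ORCL.db start" encountered the following error:
2016-07-01 15:26:06.964+ORA-29701: unable to connect to Cluster Synchronization Service
ORA-29702: error occurred in Cluster Group Service operation
ORA-29701: unable to connect to Cluster Synchronization Service
ORA-29702: error occurred in Cluster Group Service operation
2016-07-01 15:43:45.175 [ORAAGENT(49349)]CRS-5011: Check of resource "ORCL" failed: details at "(:CLSN00007:)" in "/ora01/app/oracle/diag/crs/rachost1/crs/trace/crsd_oraagent_oracle.trc"
'''.splitlines()

events = failed_start_actions(sample)
for event in events:
    print(event["ts"], event["action"], ",".join(event["ora"]))
```
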

 

ocssd_1.trc (ocssd daemon trace log)
========================
2016-07-01 15:26:01.179710 : CSSD:3944572672: clssgmProcessFenceClient: Client clientID 1:2907:1 with pid(8715), proc(0x7fffc493dc70), client(0x7fffc4919940)
2016-07-01 15:26:01.179714 : CSSD:3944572672: clssgmQueueFenceForCheck: (0x7fffc488b540) Death check for object type 5, pid 8715
2016-07-01 15:26:01.179754 :GIPCXCPT:3944572672: gipcInternalDissociate: obj 0x7fffc56d7490 [0000000000841260] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rachost1_)(GIPCID=5a84d318-035ce3ae-47736))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rachost1_)(GIPCID=035ce3ae-5a84d318-8715))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 8715, readyRef (nil), ready 1, wobj (nil), sendp (nil) status 0flags 0x2013061e, flags-2 0x0, usrFlags 0x10010 } not associated with any container, ret gipcretFail (1)
2016-07-01 15:26:01.179794 :GIPCXCPT:3944572672: gipcDissociateF [clssgmDeadClient : clssgm1.c : 2858]: EXCEPTION[ ret gipcretFail (1) ] failed to dissociate obj 0x7fffc56d7490 [0000000000841260] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rachost1_)(GIPCID=5a84d318-035ce3ae-47736))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rachost1_)(GIPCID=035ce3ae-5a84d318-8715))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 8715, readyRef (nil), ready 1, wobj (nil), sendp (nil) status 0flags 0x2013061e, flags-2 0x0, usrFlags 0x10010 }, flags 0x0
2016-07-01 15:26:01.179805 : CSSD:3944572672: clssgmFenceClient: Initiating fence type(1) for clientID 1:2907:2 (0x7fffc5050870), same-group share of memberID 88:2:0 group(DBORCL)

It appears that CSS fenced the client at the time the kill -9 was issued, as shown above.
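
To check whether a fence was initiated around the time of the kill in a larger ocssd trace, the clssgmFenceClient entries can be extracted with a regular expression. The following is a minimal Python sketch, assuming the trace line format shown in the excerpt above; trace formats may differ between versions, and the helper name `find_fence_events` is hypothetical.

```python
import re

# Pattern modeled on the clssgmFenceClient line in the ocssd trace excerpt above.
FENCE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) :\s*CSSD:\d+: "
    r"clssgmFenceClient: Initiating fence type\((?P<type>\d+)\) "
    r"for clientID (?P<client>[\d:]+) .*group\((?P<group>[^)]+)\)"
)


def find_fence_events(lines):
    """Return one dict per fence initiation found in the given trace lines."""
    events = []
    for line in lines:
        m = FENCE.search(line)  # search() tolerates a leading grep-style prefix
        if m:
            events.append(m.groupdict())
    return events


sample = [
    "2016-07-01 15:26:01.179714 : CSSD:3944572672: clssgmQueueFenceForCheck: "
    "(0x7fffc488b540) Death check for object type 5, pid 8715",
    "2016-07-01 15:26:01.179805 : CSSD:3944572672: clssgmFenceClient: "
    "Initiating fence type(1) for clientID 1:2907:2 (0x7fffc5050870), "
    "same-group share of memberID 88:2:0 group(DBORCL)",
]

fences = find_fence_events(sample)
for fence in fences:
    print(fence["ts"], "fence type", fence["type"], "group", fence["group"])
```

Comparing the fence timestamp with the termination time in the instance alert log shows whether the two events coincide, as they do in this case.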

Cause
