ocssd.bin High CPU Usage, Instance Crashes With ORA-29702 or ORA-29770 or ORA-29701 With "gipcWait failed with 16" (Doc ID 1287709.1)

Last updated on NOVEMBER 23, 2016

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 to 11.2.0.2 [Release 11.2]
Information in this document applies to any platform.

Symptoms

 

Symptom 1:

The instance terminates with messages such as the following:

LMON (ospid: 26356) has not moved for 25 sec .. : waiting for event 'CSS operation: data update' for 23 secs with wait_id 6238251.

LMON (ospid: 26356) waits for event 'CSS operation: data update' for 176 secs.

..
ORA-29770: global enqueue process LMON (OSID 26356) is hung for more than 150 seconds
..
ERROR: Some process(s) is not making progress.
LMHB (ospid: 26364) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process(s) is not making progress.
LMHB (ospid: 26364): terminating the instance due to error 29770

Call stack

sgipcwWaitHelper sgipcwWait gipcWaitOsd gipcInternalWait gipcWaitF clsssRecvMsg clsssServerRPC clssgsUpdateData clssgsupdateprivate kgxgnupdpridata kjxgrc_hb_write kjxgrDD_hb_write kjxgrrcfgchk kjxggpoll kjfmact kjfcln ksbrdp
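
As a rough aid for spotting Symptom 1 in an alert log, the following Python sketch scans lines for the LMON wait message shown above and reports waits longer than the 150-second limit named in the ORA-29770 text. It is not an Oracle tool: the function name find_lmon_waits and the regular expression are assumptions based only on the message format in this excerpt, and other releases may word the message differently.

```python
import re

# Pattern modeled on the alert log line in this note (an assumption, not an
# official format specification):
#   LMON (ospid: 26356) waits for event 'CSS operation: data update' for 176 secs.
LMON_WAIT = re.compile(
    r"LMON \(ospid: (?P<ospid>\d+)\) waits for event "
    r"'(?P<event>[^']+)' for (?P<secs>\d+) secs"
)


def find_lmon_waits(lines, threshold_secs=150):
    """Return (ospid, event, seconds) for LMON waits longer than threshold_secs.

    The default of 150 mirrors the limit quoted in the ORA-29770 message.
    """
    hits = []
    for line in lines:
        match = LMON_WAIT.search(line)
        if match and int(match.group("secs")) > threshold_secs:
            hits.append(
                (int(match.group("ospid")), match.group("event"), int(match.group("secs")))
            )
    return hits


if __name__ == "__main__":
    sample = [
        "LMON (ospid: 26356) has not moved for 25 sec .. : waiting for event "
        "'CSS operation: data update' for 23 secs with wait_id 6238251.",
        "LMON (ospid: 26356) waits for event 'CSS operation: data update' for 176 secs.",
    ]
    for ospid, event, secs in find_lmon_waits(sample):
        print(f"LMON ospid {ospid} waited {secs}s on '{event}'")
```

Only the second sample line matches, because the first uses the "has not moved" wording and reports a wait below the threshold.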

Symptom 2:

A server process (foreground process, parallel process, etc.) reports error ORA-29701:

..
ERROR: unrecoverable error ORA-29701 raised in ASM I/O path; terminating process 1911 .. 
..
2011-01-24 11:22:49.041: [ CSSCLNT]clssscConnect: gipcWait failed with 16 (12)
2011-01-24 11:22:52.498: [ CSSCLNT]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode1_)) failed, rc 16
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
NOTE: kfmsInit: ASM failed to initialize group services
Error ORA-29701 signaled at
..

Call stack

.. kfmsInit kfmsSlvReg kfmdSlvOpPriv kfmdWriteSubmitted kfk_process_an_ioq kfk_submit_io ..
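
The CSS client lines in Symptom 2 carry the return code that gives this note its title ("gipcWait failed with 16"). The Python sketch below pulls that code and the connect address out of trace lines. It is a hedged illustration only: the function name find_css_connect_failures and both regular expressions are assumptions derived from the two CSSCLNT lines quoted above, not from any documented trace format.

```python
import re

# Patterns modeled on the two CSSCLNT lines quoted in this note (assumptions).
GIPC_WAIT = re.compile(r"clssscConnect: gipcWait failed with (?P<rc>\d+)")
CONNECT_FAIL = re.compile(
    r"clsssInitNative: connect to (?P<address>\(.*\)) failed, rc (?P<rc>\d+)"
)


def find_css_connect_failures(lines):
    """Return one dict per CSS client connect failure found in lines."""
    failures = []
    for line in lines:
        match = GIPC_WAIT.search(line)
        if match:
            failures.append(
                {"kind": "gipcWait", "rc": int(match.group("rc")), "address": None}
            )
            continue
        match = CONNECT_FAIL.search(line)
        if match:
            failures.append(
                {
                    "kind": "connect",
                    "rc": int(match.group("rc")),
                    "address": match.group("address"),
                }
            )
    return failures


if __name__ == "__main__":
    sample = [
        "2011-01-24 11:22:49.041: [ CSSCLNT]clssscConnect: gipcWait failed with 16 (12)",
        "2011-01-24 11:22:52.498: [ CSSCLNT]clsssInitNative: connect to "
        "(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_racnode1_)) failed, rc 16",
    ]
    for failure in find_css_connect_failures(sample):
        print(failure)
```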

 

Symptom 3:

LMON reports error ORA-29702:

ORA-29702: error occurred in Cluster Group Service operation
LMON (ospid: 25581): terminating the instance due to error 29702

The LMON trace shows a reconfiguration just before the crash:

*** 2011-10-12 01:17:27.418
kjxgmrcfg: Reconfiguration started, type 1
CGS/IMR TIMEOUTS:
CSS recovery timeout = 31 sec (Total CSS waittime = 65)
IMR Reconfig timeout = 75 sec
CGS rcfg timeout = 85 sec
kjxgmcs: Setting state to 6 0.

*** 2011-10-12 01:17:27.446
Name Service frozen
kjxgmcs: Setting state to 6 1.
kjxgmmeminfo: instance 2 info has a wrong size (112 120)
kjxggpoll: substate 1 action failed
kjxggpoll: change poll time to 50 ms
Error: Cluster Group Service encounters an error (2)
LMON caught an error 29702 in the main loop
error 29702 detected in background process
ORA-29702: error occurred in Cluster Group Service operation
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+461<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+53<-ksuitm()+1325<-ksbrdp  ..
----- End of Abridged Call Stack Trace -----
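
When triaging a log excerpt, it can help to map the ORA error codes back to the three symptoms described in this note. The Python sketch below does only that mapping (ORA-29770 to Symptom 1, ORA-29701 to Symptom 2, ORA-29702 to Symptom 3). The function name classify_symptoms is hypothetical, and a match indicates only that the error code appears in the text, not that this note's cause applies.

```python
import re

# Mapping taken from the symptom descriptions in this note.
ORA_TO_SYMPTOM = {
    "ORA-29770": 1,  # LMON hung, LMHB terminates the instance
    "ORA-29701": 2,  # server process cannot reach Cluster Group Services
    "ORA-29702": 3,  # LMON hits an error in a Cluster Group Service operation
}

ORA_CODE = re.compile(r"ORA-\d{5}")


def classify_symptoms(text):
    """Return the sorted symptom numbers whose ORA codes appear in text."""
    found = set()
    for match in ORA_CODE.finditer(text):
        symptom = ORA_TO_SYMPTOM.get(match.group(0))
        if symptom is not None:
            found.add(symptom)
    return sorted(found)


if __name__ == "__main__":
    excerpt = "ORA-29702: error occurred in Cluster Group Service operation"
    print(classify_symptoms(excerpt))
```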

 

Common Symptoms

Cause
