LMON Hang With "CSS operation: data query" and Instance Was Terminated By LMHB Due To Error 29770

(Doc ID 2320077.1)

Last updated on OCTOBER 30, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 to 11.2.0.4 [Release 11.2]
Information in this document applies to any platform.

Symptoms

1. LMHB terminated the database instance due to error 29770 (alert_orcl1.log)

Wed Sep 20 23:08:34 2017
LMON (ospid: 5008) waits for event 'CSS operation: data query' for 8 secs.
Wed Sep 20 23:08:38 2017
Thread 1 advanced to log sequence 66354 (LGWR switch)
Current log# 4 seq# 66354 mem# 0: +LOG1/ORCL/onlinelog/group_4.260.864051729
Current log# 4 seq# 66354 mem# 1: +LOG2/ORCL/onlinelog/group_4.260.864051743
Errors in file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_lmhb_5027.trc (incident=480165):
ORA-29770: global enqueue process LMON (OSID 5008) is hung for more than 70 seconds
Incident details in: /oracle/diag/rdbms/orcl/orcl1/incident/incdir_480165/orcl1_lmhb_5027_i480165.trc
ERROR: Some process(s) is not making progress.
LMHB (ospid: 5027) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process(s) is not making progress.
Wed Sep 20 23:08:39 2017
System state dump requested by (instance=1, osid=5027 (LMHB)), summary=[abnormal instance termination].
System State dumped to trace file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_diag_4996.trc
LMHB (ospid: 5027): terminating the instance due to error 29770 <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

 2. From systemstate dump, LMON is waiting for "CSS operation: data query" (orcl1_lmhb_5027_i480165.trc):

=== LMON (ospid: 5008) Heartbeat Report
==================================================
LMON (ospid: 5008) has no heartbeats for 99 sec. (threshold 70 sec) <<<<<<<<<<<<<<<<<<
: waiting for event 'CSS operation: data query' for 13 secs with wait_id 173783400.
===[ Wait Chain ]===
Wait chain is empty.
==============================
Dumping PROCESS LMON (ospid: 5008) States
==============================
===[ System Load State ]===
CPU Total 32 Core 16 Socket 2
Load normal: Cur 4881 Highmark 40960 (19.06 160.00)
===[ Latch State ]===
Not in Latch Get
===[ Session State Object ]===
----------------------------------------
SO: 0x1fc1583260, type: 4, owner: 0x1fd125efe0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0x1fd125efe0, name=session, file=ksu.h LINE:12624, pg=0
(session) sid: 1046 ser: 1 trans: (nil), creator: 0x1fd125efe0
flags: (0x51) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x409) -/-/INC
DID: , short-term DID:
txn branch: (nil)
oct: 0, prv: 0, sql: (nil), psql: (nil), user: 0/SYS
ksuxds FALSE at location: 0
service name: SYS$BACKGROUND
Current Wait Stack:
0: waiting for 'CSS operation: data query'
function_id=0x3, =0x0, =0x0
wait_id=173783400 seq_num=23067 snap_id=1
wait times: snap=13.550468 sec, exc=13.550468 sec, total=13.550468 sec <<<<<<<<<<<<<<<<<<
wait times: max=infinite, heur=1 min 15 sec
wait counts: calls=0 os=0
in_wait=1 iflags=0x5a0
Wait State:
fixed_waits=0 flags=0x22 boundary=(nil)/-1
Session Wait History:
elapsed time of 0.000004 sec since current wait
0: waited for 'CSS operation: data query'
function_id=0x3, =0x0, =0x0
wait_id=173783399 seq_num=23066 snap_id=1
wait times: snap=15.610047 sec, exc=15.610047 sec, total=15.610047 sec <<<<<<<<<<<<<
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000005 sec of elapsed time
1: waited for 'CSS operation: data query'
function_id=0x3, =0x0, =0x0
wait_id=173783398 seq_num=23065 snap_id=1
wait times: snap=15.606784 sec, exc=15.606784 sec, total=15.606784 sec <<<<<<<<<<<<<<
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000004 sec of elapsed time
2: waited for 'CSS operation: data query'
function_id=0x3, =0x0, =0x0
wait_id=173783397 seq_num=23064 snap_id=1
wait times: snap=15.602135 sec, exc=15.602135 sec, total=15.602135 sec <<<<<<<<<<<<<<<
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000004 sec of elapsed time
3: waited for 'CSS operation: data query'
function_id=0x3, =0x0, =0x0
wait_id=173783396 seq_num=23063 snap_id=1
wait times: snap=15.602871 sec, exc=15.602871 sec, total=15.602871 sec <<<<<<<<<<<<<<<
wait times: max=infinite
wait counts: calls=0 os=0
occurred after 0.000006 sec of elapsed time

 3. "ORA-29701: unable to connect to Cluster Synchronization Service" has been observed nearby time (alert_orcl1.log):

Wed Sep 20 23:05:18 2017
Errors in file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_10843.trc:
ORA-01114: IO error writing block to file (block # )
ORA-01114: IO error writing block to file 20001 (block # 722305)
ORA-29701: unable to connect to Cluster Synchronization Service
ORA-29701: unable to connect to Cluster Synchronization Service
Wed Sep 20 23:05:30 2017
LMON (ospid: 5008) waits for event 'CSS operation: data query' for 11 secs.
Wed Sep 20 23:05:34 2017
Errors in file /oracle/diag/rdbms/orcl/orcl1/trace/orcl1_ora_846.trc:
ORA-01114: IO error writing block to file (block # )
ORA-01114: IO error writing block to file 20002 (block # 78977)
ORA-29701: unable to connect to Cluster Synchronization Service
ORA-29701: unable to connect to Cluster Synchronization Service

 4. Trace file having "gipcretAuthFail (22)" messages logged (orcl1_ora_7312.trc):

2017-09-20 23:08:26.233: [ GIPCMUX] gipcmodMuxCallbackRecv: EXCEPTION[ ret gipcretAuthFail (22) ] error during recv on endp 0x196b5570
2017-09-20 23:08:26.233: [GIPCXCPT] gipcmodMuxCallbackRecv: internal receive request failed req 0x196d6f30 [0000000000000035] { gipcReceiveRequest : peerName '', data (nil), len 0, olen 0, off 0, parentEndp 0x196b73d0, ret gipcretAuthFail (22), objFlags 0x0, reqFlags 0x4 }, ret gipcretAuthFail (22) <<<<<<<<
2017-09-20 23:08:26.233: [ GIPCMUX] gipcmodMuxCallbackRecv: EXCEPTION[ ret gipcretAuthFail (22) ] error during recv on endp 0x196b5570
2017-09-20 23:08:26.233: [GIPCXCPT] gipcmodMuxCallbackRecv: internal receive request failed req 0x196cc060 [0000000000000036] { gipcReceiveRequest : peerName '', data (nil), len 0, olen 0, off 0, parentEndp 0x196b73d0, ret gipcretAuthFail (22), objFlags 0x0, reqFlags 0x4 }, ret gipcretAuthFail (22) <<<<<<<<

 5. CSSD Log also shows the "gipcretAuthFail (22)" error:

2017-09-20 22:55:35.755: [GIPCXCPT][3824294208] gipcmodClsaAuthStart: failuring during clsaauthmsg ret clsaretPROTERR (4), endp 0x2ac1e4392650 [000000001fa9c9f9] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_rachost1a_)(GIPCID=fc049e15-66ed076f-28437))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_talracsdw1a_)(GIPCID=66ed076f-fc049e15-15531))', numPend 4, numReady 1, numDone 3, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 15531, flags 0x603710, usrFlags 0x14000 }
2017-09-20 22:55:35.755: [GIPCXCPT][3824294208] gipcmodMuxTransferAccept: internal accept request failed endp 0x1af74a40, child 0x2ac1e4392650, ret gipcretAuthFail (22)
2017-09-20 22:55:35.755: [ GIPCMUX][3824294208] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretAuthFail (22) ] error during accept on endp 0x1af74a40 <<<<<<<<<<<<

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms