LMON unresponsive executing a cross instance call to offline an ASM disk (Doc ID 2090742.1)

Last updated on DECEMBER 22, 2015

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.4 to 12.1.0.2 [Release 11.2 to 12.1]
Information in this document applies to any platform.

Symptoms

1. This is an Extended RAC configuration with two sites (Site A and B).

2. Customer is testing a site loss scenario, where the machines and disk array in the remote site are powered off.

3. In one of the instances, LMON is executing a Cross Instance Call (CIC) to offline a disk, which leaves LMON unresponsive. An LMHB trace shows:

  

*** 2015-10-18 15:43:42.664
loadavg : 2.94 3.50 2.54
System user time: 0.01 sys time: 0.00 context switch: 46580
Memory (Avail / Total) = 205120.65M / 257569.06M
Swap (Avail / Total) = 16896.00M /  16896.00M
F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME    CMD
0 S o_edsdp  14967     1  0  80   0 - 7795331 poll_s 14:58 ?      00:00:11 ora_lmon_INST1
...
Short stack dump:
ksedsts()+244<-ksdxfstk()+58<-ksdxcb()+918<-sspuser()+224<-__sighandler()
<-__poll()+24<-sskgxp_selectex()+423<-skgxpiwait()+3894<-skgxpwaiti()+1900
<-skgxpwait()+178<-ksxpwait()+2585<-ksliwat()+9356<-kslwaitctx()+161
<-kjusuc()+8387<-ksigeti()+5778<-ksbcic_int()+4379<-ksbcic()+14
<-kfdOfflinePriv()+6692<-kfioDiskOffline()+748
<-kfioGenRqSet_complete()+738<-kfioRqSetComplete()+1970
<-kfioCompleteIO()+224<-kfioWaitIO()+1017<-kfioRequestPriv()+234
<-kfioRequest()+685<-ksfdafRequest()+708<-ksfdafBatchIO()+397
<-ksfdbio()+1484<-ksfdss_bio()+94<-ksfdbio()+167<-kccwbp()+934
<-kccpcp()+237<-kjxgrf_vr_write_i()+1163<-kjxgrf_vr_write()+63
<-kjxgrDD_vr_write()+107<-kjxgrs0h()+3105<-kjxgmcs()+404
<-kjxgmni()+177<-kjxgmrcfg()+464<-kjxggpoll()+562<-kjfmact()+317
<-kjfdact()+192<-kjfcln()+6023<-ksbrdp()+1068<-opirip()+1488<-opidrv()+616
<-sou2o()+145<-opimai_real()+270<-ssthrdmain()+412<-main()+236
<-__libc_start_main()+253

  

 - Note that LMON is executing ksbcic(), which was called from kfdOfflinePriv() under kfioDiskOffline().

4. Since LMON was unresponsive, this instance was terminated with ORA-29770:

Sun Oct 18 15:43:02 2015
CKPT (ospid: 15064) waits for event 'enq: XR - database force logging' for 208 secs.
Sun Oct 18 15:43:22 2015
LGWR (ospid: 15050) waits for event 'log file parallel write' for 209 secs.
Sun Oct 18 15:43:40 2015
LMON (ospid: 14967) waits for event 'control file parallel write' for 202 secs.
Errors in file <PATH>/trace/INST1_lmhb_15018.trc (incident=1016746):
ORA-29770: global enqueue process LMON (OSID 14967) is hung for more than 200 seconds
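
The two checks described in the symptoms above can be scripted when comparing your own traces against this note. The following is a minimal sketch, not Oracle tooling: the helper names (parse_short_stack, called_from, find_long_waits) are hypothetical, the line formats are assumed to match the excerpts shown in this note, and the default 200-second threshold is taken only from the ORA-29770 message above.

```python
import re

# Matches "waits for event" lines as they appear in the alert log excerpt.
# \s+ before "secs" tolerates a message wrapped across two lines.
WAIT_RE = re.compile(
    r"(?P<proc>\w+) \(ospid: (?P<ospid>\d+)\) waits for event "
    r"'(?P<event>[^']+)' for (?P<secs>\d+)\s+secs"
)


def parse_short_stack(text):
    """Return function names from a short stack dump, innermost frame first."""
    frames = []
    for token in text.replace("\n", "").split("<-"):
        match = re.match(r"\s*(\w+)\(\)", token)
        if match:
            frames.append(match.group(1))
    return frames


def called_from(frames, callee, caller):
    """True if `callee` sits closer to the top of the stack than `caller`."""
    if callee not in frames or caller not in frames:
        return False
    return frames.index(callee) < frames.index(caller)


def find_long_waits(log_text, threshold_secs=200):
    """Return (process, ospid, event, seconds) for waits at or above the threshold."""
    return [
        (m["proc"], int(m["ospid"]), m["event"], int(m["secs"]))
        for m in WAIT_RE.finditer(log_text)
        if int(m["secs"]) >= threshold_secs
    ]


# Excerpts copied from the LMHB trace and alert log in this note.
stack_excerpt = (
    "<-kjusuc()+8387<-ksigeti()+5778<-ksbcic_int()+4379<-ksbcic()+14\n"
    "<-kfdOfflinePriv()+6692<-kfioDiskOffline()+748\n"
)
alert_excerpt = (
    "CKPT (ospid: 15064) waits for event "
    "'enq: XR - database force logging' for 208 secs.\n"
    "LGWR (ospid: 15050) waits for event 'log file parallel write' for 209 secs.\n"
    "LMON (ospid: 14967) waits for event 'control file parallel write' for 202\n"
    "secs.\n"
)

frames = parse_short_stack(stack_excerpt)
waits = find_long_waits(alert_excerpt)

print("CIC issued under disk offline:",
      called_from(frames, "ksbcic", "kfioDiskOffline"))
for wait in waits:
    print(wait)
```

A match on both checks only indicates that a trace resembles the symptoms in this note; it does not by itself confirm the same root cause.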

    

Changes

 Customer is making the disk array in the remote site unavailable, in order to test a site loss scenario.

Cause
