Database Open waits indefinitely on "parallel recovery coord wait for reply"

(Doc ID 1440522.1)

Last updated on MARCH 26, 2012

Applies to:

Oracle Server - Enterprise Edition - Version: 11.2.0.1 to 11.2.0.1 - Release: 11.2 to 11.2
Oracle Exadata Storage Server Software - Version: 11.1.0.3.0 to 11.2.2.1.1   [Release: 11.1 to 11.2]
Information in this document applies to any platform.
Affected Database: Oracle

Symptoms

1) Hanganalyze output for the database shows that most of the parallel sessions are waiting on the CKPT process.
All 15 parallel recovery sessions are blocked by the CKPT process.

===================================================================
Chains most likely to have caused the hang:
[a] Chain 1 Signature: 'rdbms ipc message'<='checkpoint completed'
Chain 1 Signature Hash: 0x3d61ea2d
[b] Chain 2 Signature: 'rdbms ipc message'<='checkpoint completed'
Chain 2 Signature Hash: 0x3d61ea2d
[c] Chain 3 Signature: 'rdbms ipc message'<='checkpoint completed'
Chain 3 Signature Hash: 0x3d61ea2d

Chain 1:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 2 (dbm.dbm2)
os id: 25367
process id: 48, oracle@exa1db02.obs.outback.com (P012)
session id: 3
session serial #: 1
}
is waiting for 'checkpoint completed' with wait info:
{
time in wait: 4.311249 sec
heur. time in wait: 21 min 58 sec
timeout after: 0.688751 sec
wait id: 616
blocking: 0 sessions
current sql: <none>
short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-__poll()+47<-sskgxp_select()+279<-skgxpiwait()+2623<-skgxpwaiti()+1420<-skgxpwait()+135<-ksxpwait()+2354<-ksliwat()+10828<-kslwaitctx()+162<-kslwait()+141<-kcbzck1()+518<-kcbrcp()+326<-kcrpcf()+186<-kcrpdv()+2064<-kxfprdp()+1076<-opirip()+1304<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
wait history:
* time between current wait and wait #1: 0.000081 sec
1. event: 'checkpoint completed'
time waited: 5.130105 sec
wait id: 615
* time between wait #1 and #2: 0.000074 sec
2. event: 'checkpoint completed'
time waited: 5.117093 sec
wait id: 614
* time between wait #2 and #3: 0.000068 sec
3. event: 'checkpoint completed'
time waited: 6.220294 sec
wait id: 613
}
and is blocked by
=> Oracle session identified by:
{
instance: 2 (dbm.dbm2)
os id: 24567
process id: 21, oracle@exa1db02.obs.outback.com (CKPT)
session id: 492
session serial #: 1
}
which is waiting for 'rdbms ipc message' with wait info:
short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10<-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-kslwait()+141<-ksarcv()+207<-ksbabs()+323<-ksbrdp()+923<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+266<-ssthrdmain()+214<-main()+201<-__libc_start_main()+244<-_start()+36
wait history:
* time between current wait and wait #1: 0.000024 sec
1. event: 'control file parallel write'
time waited: 0.000235 sec
wait id: 1037 p1: 'files'=0x2
p2: 'block#'=0x4
p3: 'requests'=0x2
* time between wait #1 and #2: 0.000319 sec
2. event: 'rdbms ipc message'
time waited: 3.000489 sec
wait id: 1036 p1: 'timeout'=0x12c
* time between wait #2 and #3: 0.000021 sec
3. event: 'control file parallel write'
time waited: 0.000220 sec
wait id: 1035 p1: 'files'=0x2
p2: 'block#'=0x4
p3: 'requests'=0x2
}
.
.
.

Chain 16:
-------------------------------------------------------------------------------
Oracle session identified by:
{
instance: 2 (dbm.dbm2)
os id: 24599
process id: 33, oracle@exa1db02.obs.outback.com (TNS V1-V3)
session id: 100
session serial #: 1
}
is waiting for 'parallel recovery coord wait for reply' with wait info:
{
p1: ''=0x10020000
p2: ''=0x3d3
time in wait: 1.380203 sec
timeout after: 0.619797 sec
wait id: 1991
blocking: 0 sessions
current sql: alter database open
short stack: ksedsts()+461<-ksdxfstk()+32<-ksdxcb()+1782<-sspuser()+112<-__restore_rt()<-semtimedop()+10<-skgpwwait()+156<-ksliwat()+1821<-kslwaitctx()+162<-kslwait()+141<-kxfpqidqr()+2015<-kxfpqdqr()+236<-kcrprd()+110<-kcrpcf()+304<-kcratr_apply()+7234<-kcratr()+3075<-kctrec()+9509<-kcvcrv()+6232<-kcfopd()+1019<-adbdrv()+9883<-opiexe()+18300<-opiosq0()+6458<-kpooprx()+295<-kpoal8()+961<-opiodr()+1149<-ttcpip()+1251<-opitsk()+1633<-opiino()+958<-opiodr()+1149<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+214<-mai
wait history:
* time between current wait and wait #1: 0.000086 sec
1. event: 'parallel recovery coord wait for reply'
time waited: 2.001025 sec
wait id: 1990 p1: ''=0x10020000
p2: ''=0x3d2
* time between wait #1 and #2: 0.000031 sec
2. event: 'parallel recovery coord wait for reply'
time waited: 0.010932 sec
wait id: 1989 p1: ''=0x10020000
p2: ''=0x3d1
* time between wait #2 and #3: 0.000067 sec
3. event: 'parallel recovery coord wait for reply'
time waited: 0.997976 sec
wait id: 1988 p1: ''=0x10020000
p2: ''=0x3d0
}
==========================================================================
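As a reading aid for the hanganalyze output above, the following is a minimal Python sketch, not an Oracle-supplied tool. It tallies the "Chain N Signature" lines and splits each signature into the final blocker's wait event (left of `<=`) and the blocked session's wait event (right of `<=`), which matches Chain 1, where P012 waits on 'checkpoint completed' and is blocked by CKPT waiting on 'rdbms ipc message'. The function names are hypothetical. The conversion of `'timeout'=0x12c` assumes the parameter is expressed in centiseconds, which is consistent with the 3.000489 sec wait recorded for that event.

```python
import re
from collections import Counter

# Excerpt copied from the hanganalyze output in this note.
EXCERPT = """\
[a] Chain 1 Signature: 'rdbms ipc message'<='checkpoint completed'
Chain 1 Signature Hash: 0x3d61ea2d
[b] Chain 2 Signature: 'rdbms ipc message'<='checkpoint completed'
Chain 2 Signature Hash: 0x3d61ea2d
[c] Chain 3 Signature: 'rdbms ipc message'<='checkpoint completed'
Chain 3 Signature Hash: 0x3d61ea2d
"""

# Matches "Chain 1 Signature: ..." but not "Chain 1 Signature Hash: ...".
SIG_RE = re.compile(r"Chain (\d+) Signature: (.+)$")


def summarize_chains(text):
    """Count how many chains share each signature string."""
    counts = Counter()
    for line in text.splitlines():
        match = SIG_RE.search(line)
        if match:
            counts[match.group(2).strip()] += 1
    return counts


def split_signature(signature):
    """Return (final blocker's wait event, blocked session's wait event)."""
    events = [part.strip().strip("'") for part in signature.split("<=")]
    return events[0], events[-1]


def centiseconds_to_seconds(hex_value):
    """Convert a hex wait parameter, assumed to be in centiseconds, to seconds."""
    return int(hex_value, 16) / 100


if __name__ == "__main__":
    for signature, count in summarize_chains(EXCERPT).items():
        blocker, waiter = split_signature(signature)
        print(f"{count} chain(s): '{waiter}' blocked by a session in '{blocker}'")
    print("CKPT timeout:", centiseconds_to_seconds("0x12c"), "sec")
```

Run against the full trace file, the same tally would show at a glance whether every parallel recovery slave shares the one signature ending at CKPT.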

2) The following stack appears in the pstack output of the cellsrv process on all 3 cell servers:
#0 0x00000031a340ddcb in recvmsg () from /lib64/libpthread.so.0
#1 0x00002b4c516fb8f3 in ssskgxp_rcvmsg () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so
#2 0x00002b4c516f6910 in sskgxp_rcvmsg () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so
#3 0x00002b4c516a68d8 in skgxpfragrcv () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so
#4 0x00002b4c516a3813 in skgxplostack () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so
#5 0x00002b4c516705a3 in skgxp_recv_next_fragment () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so
#6 0x00002b4c516c5b61 in skgxprusr () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so
#7 0x00002b4c5166b726 in skgxpiwait () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so
#8 0x00002b4c51669dd6 in skgxpwaiti () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so
#9 0x00002b4c5166977a in skgxpwait () from /opt/oracle/cell/cellsrv/lib/libskgxp11.so

3) Free memory on all cell servers is below 4 GB.
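The free-memory symptom can be checked with a small script. The sketch below is illustrative only: this note does not say how free memory was measured, so reading the `MemFree` field of the Linux `/proc/meminfo` file is an assumption, the function names are hypothetical, and the sample values are invented for the example. Only the 4 GB threshold comes from this note.

```python
# Hypothetical sample in /proc/meminfo format; values are invented for illustration.
SAMPLE_MEMINFO = """\
MemTotal:       24675300 kB
MemFree:         3145728 kB
Buffers:          204800 kB
"""


def mem_free_gb(meminfo_text):
    """Return the MemFree value from /proc/meminfo-style text, in GB (1 GB = 1024*1024 kB)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            kilobytes = int(line.split()[1])
            return kilobytes / (1024 * 1024)
    raise ValueError("MemFree not found")


def is_free_memory_low(meminfo_text, threshold_gb=4.0):
    """True when free memory is below the threshold (4 GB, as in this note)."""
    return mem_free_gb(meminfo_text) < threshold_gb


def read_local_meminfo(path="/proc/meminfo"):
    """Read the live file on a Linux host (assumption: run on each cell server)."""
    with open(path) as handle:
        return handle.read()


if __name__ == "__main__":
    print("Free GB:", mem_free_gb(SAMPLE_MEMINFO))
    print("Below 4 GB:", is_free_memory_low(SAMPLE_MEMINFO))
```

On a live system, `is_free_memory_low(read_local_meminfo())` would be run on each cell server in turn.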

Cause
