Intermittent Hang and High CPU With Parallel Execution on Solaris

(Doc ID 1363601.1)

Last updated on MAY 17, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.1 and later
Oracle Solaris on SPARC (64-bit)

Symptoms

Note: This issue is specific to the Solaris operating system.

High CPU usage and intermittent hangs during parallel execution were seen on a server running Solaris. The top 5 timed events in an AWR report showed "CPU + wait for CPU."

Monitoring the parallel query showed every slave but one in an idle wait ("PX Deq: Execution Msg"). A single slave, p030, showed "NOT WAIT."

Username     QC/Slave SlaveSet SID    Slave INS STATE    WAIT_EVENT                     QC SID QC INS Req. DOP Actual DOP
------------ -------- -------- ------ --------- -------- ------------------------------ ------ ------ -------- ----------
RPTOWNER     QC                243    1         WAIT     PX Deq: Execute Reply          243
- p030       (Slave)  1        251    1         NOT WAIT                                243    1      6        6
- p033       (Slave)  1        423    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p032       (Slave)  1        366    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p031       (Slave)  1        308    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p023       (Slave)  1        306    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p022       (Slave)  1        244    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p034       (Slave)  2        11     1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p038       (Slave)  2        245    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p039       (Slave)  2        304    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p037       (Slave)  2        188    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p036       (Slave)  2        128    1         WAIT     PX Deq: Execution Msg          243    1      6        6
- p035       (Slave)  2        67     1         WAIT     PX Deq: Execution Msg          243    1      6        6


Running pstack several times (pstack <ospid>) against the slave showing "NOT WAIT" listed the following functions:

----------------- lwp# 258 / thread# 258 --------------------
ffffffff7add6254 lwp_park (0, 0, 0) <---------- 258 threads and pstack is the same among all
ffffffff7adcf9fc cond_wait_queue (10b6db690, 10b6db6a0, 0, 0, ffffffff7af49e88, 0) + 4c
ffffffff7adcff80 cond_wait (10b6db690, 10b6db6a0, 0, 0, 0, ffffffff7af48100) + 10
ffffffff7ab03a60 _aio_idle (10b6db620, 1, 10b6db6a0, 0, ffffffff7ac08000, 0) + 28
ffffffff7ab031a8 _aio_do_request (10b6db620, 10000, 0, 0, ffffffff7ac08000, 13b) + c4
ffffffff7add61b4 _lwp_start (0, 0, 0, 0, 0, 0)
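
Collecting the repeated pstack samples described above can be scripted. The following is a minimal sketch for a POSIX shell, not a procedure taken from this note: the function name sample_stacks, the STACK_CMD override, and the default sample count and interval are all illustrative assumptions. Only the pstack <ospid> invocation comes from the text.

```shell
# Take several stack samples of one process so they can be compared.
# STACK_CMD defaults to pstack; the override is a hypothetical convenience
# for trying the loop on a system where pstack is not installed.
sample_stacks() {
    pid=$1            # OS process id of the slave (the <ospid> placeholder)
    count=${2:-3}     # number of samples to take (assumed default)
    interval=${3:-5}  # seconds to sleep between samples (assumed default)
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i of $count for pid $pid ==="
        ${STACK_CMD:-pstack} "$pid"
        # Sleep between samples, but not after the last one.
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
        i=$((i + 1))
    done
}
```

For example, sample_stacks <ospid> 5 10 > pstack_samples.out would write five samples taken ten seconds apart to one file, which makes it easier to check whether the stacks change between samples.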


When a truss was taken on the slave that was not in a wait (truss -fae -E -o truss.out -p <ospid>), it showed the following:

16662/1: 0.0002 mmap(0xFFFFFFFF79EA0000, 65536, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_NORESERVE, 19, 3801088) = 0xFFFFFFFF79EA0000
16662/1: 0.0000 times(0xFFFFFFFF7FFF25F0) = 636045188
16662/1: 0.0000 times(0xFFFFFFFF7FFF25F0) = 636045188
16662/1: 0.0000 semctl(989855779, 43, SETVAL, 1) = 0
16662/1: 0.0002 kaio(AIOREAD, 267, 0xFFFFFFFF79E0F000, 262144, 0x5F60000079DF4210) = 0
16662/1: 0.0000 kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) = -2250291176
16662/1: 0.0000 kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) = 0
16662/1: 0.0000 semctl(989855779, 43, SETVAL, 1) = 0
16662/1: 0.0001 kaio(AIOREAD, 267, 0xFFFFFFFF79E5F000, 262144, 0x5F64000079DF4818) = 0
16662/1: 0.0000 kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) = 0
16662/1: 0.0000 kaio(AIOWAIT, 0xFFFFFFFF7FFF19D0) = -2250292720
16662/1: 0.0000 kaio(AIOREAD, 267, 0xFFFFFFFF79E0F000, 262144, 0x5F68000079DF4210) = 0
16662/1: 0.0000 kaio(AIOWAIT, 0xFFFFFFFFFFFFFFFF) = 0
16662/1: 0.0000 kaio(AIOWAIT, 0xFFFFFFFF7FFF19D0) = -2250291176
16662/1: 0.0000 semctl(989855779, 43, SETVAL, 1) = 0
. . .
16662/1: semtimedop(989855779, 0xFFFFFFFF7FFFC994, 1, 0xFFFFFFFF7FFFC980) (sleeping...)
16662/1: 0.0001 semtimedop(989855779, 0xFFFFFFFF7FFFC994, 1, 0xFFFFFFFF7FFFC980) Err#11 EAGAIN
16662/1: semtimedop(989855779, 0xFFFFFFFF7FFFC994, 1, 0xFFFFFFFF7FFFC980) (sleeping...)
16662/1: 0.0000 semtimedop(989855779, 0xFFFFFFFF7FFFC994, 1, 0xFFFFFFFF7FFFC980) Err#11 EAGAIN
16662/1: semtimedop(989855779, 0xFFFFFFFF7FFFC994, 1, 0xFFFFFFFF7FFFC980) (sleeping...)
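
A long truss.out file is easier to read once the calls are counted by name. The sketch below is one possible way to do that in a POSIX shell with awk; the function name summarize_truss is hypothetical, and the parsing assumes lines shaped like the excerpt above (a pid/lwp prefix, an optional elapsed-time field, then the system call name followed by an opening parenthesis).

```shell
# Count system calls by name in a truss output file, most frequent first.
summarize_truss() {
    # $1: path to the truss output file (for example truss.out)
    awk '
        # Skip the "(sleeping...)" entry lines so that a call that sleeps
        # is counted once, when its completion line is printed.
        index($0, "(sleeping...)") > 0 { next }
        {
            # Field 1 is the pid/lwp prefix; the call name is the first
            # later field that contains an opening parenthesis.
            for (i = 2; i <= NF; i++) {
                p = index($i, "(")
                if (p > 1) {
                    calls[substr($i, 1, p - 1)]++
                    break
                }
            }
        }
        END { for (name in calls) print calls[name], name }
    ' "$1" | sort -rn
}
```

Running summarize_truss truss.out prints one line per system call with its count, so a loop of kaio and semtimedop calls such as the one in the excerpt stands out without reading the whole trace.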


The fact that the truss broke the hang loose and let the parallel execution complete was a major clue that this was an OS-related issue.

Changes

Upgrade of the database from 10gR1 to 11gR2.

Cause
