High CPU Utilization by cellsrv on Physical Standby on Exadata X8M (Doc ID 2761968.1)

Last updated on NOVEMBER 29, 2021

Applies to:

Oracle Exadata Storage Server Software - Version 19.3.0.0.0 to 20.1.8.0.0 [Release 12.2 to 20.0]
Information in this document applies to any platform.

Symptoms

1. When the standby database applies archived logs, CPU utilization on all cell nodes rises to 100%.

Many cellsrv threads are running, most of the CPU time is spent in user mode, and the load average is high.

 

e.g.

top - 10:02:15 up 31 days, 22:04,  0 users,  load average: 82.43, 77.86, 78.51
%Cpu(s): 97.2 us,  2.3 sy,  0.0 ni,  0.3 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
19160 root      20   0  116.7t  12.2g   5.8g R 54.5  6.5   1004:05 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19163 root      20   0  116.7t  12.2g   5.8g S 42.4  6.5 766:24.83 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19186 root      20   0  116.7t  12.2g   5.8g R 42.4  6.5   1132:08 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19295 root      20   0  116.7t  12.2g   5.8g R 42.4  6.5   1100:41 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19241 root      20   0  116.7t  12.2g   5.8g R 36.4  6.5   1089:40 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19256 root      20   0  116.7t  12.2g   5.8g R 36.4  6.5   1221:54 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19271 root      20   0  116.7t  12.2g   5.8g R 36.4  6.5   1188:40 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19307 root      20   0  116.7t  12.2g   5.8g R 36.4  6.5   1097:03 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19315 root      20   0  116.7t  12.2g   5.8g R 36.4  6.5   1106:27 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19206 root      20   0  116.7t  12.2g   5.8g R 33.3  6.5   1092:17 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19224 root      20   0  116.7t  12.2g   5.8g R 33.3  6.5   1153:19 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19275 root      20   0  116.7t  12.2g   5.8g R 33.3  6.5   1149:12 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19286 root      20   0  116.7t  12.2g   5.8g R 33.3  6.5   1143:49 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19314 root      20   0  116.7t  12.2g   5.8g R 33.3  6.5   1129:21 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19148 root      20   0  116.7t  12.2g   5.8g R 30.3  6.5   1258:32 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042

...

top - 10:02:20 up 31 days, 22:04,  0 users,  load average: 76.23, 76.65, 78.11
%Cpu(s): 10.1 us,  2.0 sy,  0.0 ni, 88.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

top - 10:02:26 up 31 days, 22:04,  0 users,  load average: 78.06, 77.02, 78.22
%Cpu(s): 97.3 us,  2.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st

top - 10:02:31 up 31 days, 22:04,  0 users,  load average: 79.57, 77.35, 78.33
%Cpu(s): 97.1 us,  2.5 sy,  0.1 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st

top - 10:02:37 up 31 days, 22:05,  0 users,  load average: 80.89, 77.66, 78.42
%Cpu(s): 97.3 us,  2.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st

top - 10:02:43 up 31 days, 22:05,  0 users,  load average: 82.26, 78.00, 78.53
%Cpu(s): 96.7 us,  2.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st

top - 10:02:48 up 31 days, 22:05,  0 users,  load average: 81.84, 77.98, 78.52
%Cpu(s): 97.0 us,  2.6 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st

top - 10:02:54 up 31 days, 22:05,  0 users,  load average: 83.05, 78.30, 78.62
%Cpu(s): 96.3 us,  3.2 sy,  0.0 ni,  0.2 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st

top - 10:03:00 up 31 days, 22:05,  0 users,  load average: 77.99, 77.42, 78.33
%Cpu(s): 18.3 us,  6.7 sy,  0.0 ni, 75.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

top - 10:03:05 up 31 days, 22:05,  0 users,  load average: 79.59, 77.76, 78.44
%Cpu(s): 96.6 us,  3.0 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st

top - 10:03:11 up 31 days, 22:05,  0 users,  load average: 80.91, 78.07, 78.53
%Cpu(s): 97.4 us,  2.3 sy,  0.0 ni,  0.2 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st

top - 10:03:16 up 31 days, 22:05,  0 users,  load average: 82.12, 78.36, 78.63
%Cpu(s): 96.8 us,  2.5 sy,  0.0 ni,  0.3 id,  0.0 wa,  0.0 hi,  0.5 si,  0.0 st

top - 10:03:22 up 31 days, 22:05,  0 users,  load average: 83.15, 78.64, 78.72
%Cpu(s): 17.7 us,  4.8 sy,  0.0 ni, 77.4 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

top - 10:03:27 up 31 days, 22:05,  0 users,  load average: 76.57, 77.35, 78.30
%Cpu(s): 13.3 us,  4.4 sy,  0.0 ni, 82.3 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st

top - 10:03:32 up 31 days, 22:05,  0 users,  load average: 77.89, 77.61, 78.38
%Cpu(s): 96.8 us,  2.9 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st

top - 10:03:38 up 31 days, 22:06,  0 users,  load average: 79.10, 77.87, 78.46
%Cpu(s): 96.7 us,  2.7 sy,  0.3 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.3 si,  0.0 st

top - 10:03:44 up 31 days, 22:06,  0 users,  load average: 80.29, 78.14, 78.54
%Cpu(s): 96.9 us,  2.7 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.4 si,  0.0 st

top - 10:03:49 up 31 days, 22:06,  0 users,  load average: 82.87, 78.76, 78.74
%Cpu(s): 97.4 us,  2.3 sy,  0.0 ni,  0.0 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
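
The check above can be repeated on a saved capture. The following is a minimal sketch, assuming the samples were saved as plain text in the same format as the top output shown here. The function names and the 90% cutoff are hypothetical choices for illustration and are not part of any Oracle tool.

```python
import re

# Matches the summary line printed by top, for example:
# "%Cpu(s): 97.2 us,  2.3 sy,  0.0 ni,  0.3 id,  0.0 wa, ..."
CPU_LINE = re.compile(
    r"^%Cpu\(s\):\s*([\d.]+)\s+us,\s*([\d.]+)\s+sy,.*?([\d.]+)\s+id"
)


def parse_cpu_samples(text):
    """Return one (user, system, idle) tuple per %Cpu(s) line in the text."""
    samples = []
    for line in text.splitlines():
        match = CPU_LINE.match(line.strip())
        if match:
            samples.append(tuple(float(value) for value in match.groups()))
    return samples


def saturated_fraction(samples, user_threshold=90.0):
    """Fraction of samples whose user CPU is at or above the threshold.

    The 90.0 default is an assumed cutoff, not a documented limit.
    """
    if not samples:
        return 0.0
    hot = sum(1 for user, _system, _idle in samples if user >= user_threshold)
    return hot / len(samples)
```

Passing the contents of a capture file to `parse_cpu_samples` and the result to `saturated_fraction` gives the share of intervals in which user CPU was saturated, which makes it easier to compare cells or to compare periods with and without log apply.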

 


2. The call stacks of the cellsrv threads show the following call sequence:

Cache::issueIO() -> PMemLog::checkIOOverlap() -> PMemLogStore::findMatchingRequests()

e.g.

Thread 166 (Thread 0xc5daffff700 (LWP 19149)):
#0  0x00000000013398cb in PMemLogStore::findMatchingRequests(GridDisk*, PMemLogGDiskState**, PMemLogStoreClientInfo*, unsigned int, PMemLogWriteLocation*, PMemLogStore_findMatchingRequests_Reason, Cacheable*) () at PMemLog.cpp:8147
#1  0x0000000001338751 in PMemLog::checkIOOverlap(IoType, Cacheable*, IOContext*, Job*) () at PMemLog.cpp:3435
#2  0x000000000112aa81 in Cache::issueIO(IoType, Cacheable**, IOContext&, ScanBlkStats*, int*, IOClientType, unsigned int, unsigned int*, int, int&) () at Cache.cpp:2148
#3  0x000000000112827a in Cache::put(Cacheable**, int&, Job&, oss_iorm*, int*, IOClientType, int) () at Cache.cpp:2610
#4  0x000000000113838c in CachePut::process() () at CachePut.cpp:398
#5  0x0000000001497cba in UserThread::mainLoop(unsigned int) () at UserThread.cpp:849
#6  0x00000000014966ab in UserThread::run() () at UserThread.cpp:1024
#7  0x0000000001451c68 in Scheduler::schedule() () at Scheduler.cpp:1722
#8  0x0000000001451a4e in _INTERNAL_13_Scheduler_cpp_7ff7ab03::kernelThreadMain(void*) () at Scheduler.cpp:1496
#9  0x000000000273f8da in oracle_fp_thread_main ()
#10 0x00007f5e1fe17ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5e1f70c96d in clone () from /lib64/libc.so.6
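
To see how many threads are in this code path, a saved stack dump can be scanned for the frames named above. The following is a minimal sketch, assuming the dump uses the same thread header and frame layout as the example in this document. The helper names are hypothetical, and how the dump is collected is outside the scope of this sketch.

```python
import re

# Thread header as in the example: "Thread 166 (Thread 0x... (LWP 19149)):"
THREAD_HEADER = re.compile(r"^Thread\s+(\d+)\s+\(.*LWP\s+(\d+)\)")
# Frame line as in the example: "#0  0x... in Class::method(args) () at file:line"
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^\s(]+)")


def split_threads(dump):
    """Map each LWP id to its list of function names, innermost frame first."""
    threads = {}
    current = None
    for line in dump.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(2))
            threads[current] = []
            continue
        frame = FRAME.match(line)
        if frame and current is not None:
            threads[current].append(frame.group(2))
    return threads


def threads_with_frame(threads, function_name):
    """Return the sorted LWP ids whose stack contains the given function."""
    return sorted(
        lwp for lwp, frames in threads.items() if function_name in frames
    )
```

For example, `threads_with_frame(split_threads(dump), "PMemLogStore::findMatchingRequests")` lists the LWP ids whose stacks include that frame. These ids can then be compared with the PID column of the top output.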

 

Changes

 

Cause


