High CPU Utilization by cellsrv on Physical Standby on Exadata X8M
(Doc ID 2761968.1)
Last updated on NOVEMBER 29, 2021
Applies to:
Oracle Exadata Storage Server Software - Version 19.3.0.0.0 to 20.1.8.0.0 [Release 12.2 to 20.0]
Information in this document applies to any platform.
Symptoms
1. When the standby database applies archived redo logs, CPU utilization on all cell nodes rises to nearly 100%.
Many cellsrv threads are running, most of the CPU time is user CPU, and the load average is high.
For example:
top - 10:02:15 up 31 days, 22:04, 0 users, load average: 82.43, 77.86, 78.51
%Cpu(s): 97.2 us, 2.3 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19160 root 20 0 116.7t 12.2g 5.8g R 54.5 6.5 1004:05 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19163 root 20 0 116.7t 12.2g 5.8g S 42.4 6.5 766:24.83 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19186 root 20 0 116.7t 12.2g 5.8g R 42.4 6.5 1132:08 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19295 root 20 0 116.7t 12.2g 5.8g R 42.4 6.5 1100:41 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19241 root 20 0 116.7t 12.2g 5.8g R 36.4 6.5 1089:40 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19256 root 20 0 116.7t 12.2g 5.8g R 36.4 6.5 1221:54 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19271 root 20 0 116.7t 12.2g 5.8g R 36.4 6.5 1188:40 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19307 root 20 0 116.7t 12.2g 5.8g R 36.4 6.5 1097:03 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19315 root 20 0 116.7t 12.2g 5.8g R 36.4 6.5 1106:27 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19206 root 20 0 116.7t 12.2g 5.8g R 33.3 6.5 1092:17 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19224 root 20 0 116.7t 12.2g 5.8g R 33.3 6.5 1153:19 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19275 root 20 0 116.7t 12.2g 5.8g R 33.3 6.5 1149:12 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19286 root 20 0 116.7t 12.2g 5.8g R 33.3 6.5 1143:49 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19314 root 20 0 116.7t 12.2g 5.8g R 33.3 6.5 1129:21 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
19148 root 20 0 116.7t 12.2g 5.8g R 30.3 6.5 1258:32 /opt/oracle/cell/cellsrv/bin/cellsrv 100 5000 9 5042
...
top - 10:02:20 up 31 days, 22:04, 0 users, load average: 76.23, 76.65, 78.11
%Cpu(s): 10.1 us, 2.0 sy, 0.0 ni, 88.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
top - 10:02:26 up 31 days, 22:04, 0 users, load average: 78.06, 77.02, 78.22
%Cpu(s): 97.3 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.4 si, 0.0 st
top - 10:02:31 up 31 days, 22:04, 0 users, load average: 79.57, 77.35, 78.33
%Cpu(s): 97.1 us, 2.5 sy, 0.1 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
top - 10:02:37 up 31 days, 22:05, 0 users, load average: 80.89, 77.66, 78.42
%Cpu(s): 97.3 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
top - 10:02:43 up 31 days, 22:05, 0 users, load average: 82.26, 78.00, 78.53
%Cpu(s): 96.7 us, 2.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.4 si, 0.0 st
top - 10:02:48 up 31 days, 22:05, 0 users, load average: 81.84, 77.98, 78.52
%Cpu(s): 97.0 us, 2.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.4 si, 0.0 st
top - 10:02:54 up 31 days, 22:05, 0 users, load average: 83.05, 78.30, 78.62
%Cpu(s): 96.3 us, 3.2 sy, 0.0 ni, 0.2 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
top - 10:03:00 up 31 days, 22:05, 0 users, load average: 77.99, 77.42, 78.33
%Cpu(s): 18.3 us, 6.7 sy, 0.0 ni, 75.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
top - 10:03:05 up 31 days, 22:05, 0 users, load average: 79.59, 77.76, 78.44
%Cpu(s): 96.6 us, 3.0 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.4 si, 0.0 st
top - 10:03:11 up 31 days, 22:05, 0 users, load average: 80.91, 78.07, 78.53
%Cpu(s): 97.4 us, 2.3 sy, 0.0 ni, 0.2 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
top - 10:03:16 up 31 days, 22:05, 0 users, load average: 82.12, 78.36, 78.63
%Cpu(s): 96.8 us, 2.5 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.5 si, 0.0 st
top - 10:03:22 up 31 days, 22:05, 0 users, load average: 83.15, 78.64, 78.72
%Cpu(s): 17.7 us, 4.8 sy, 0.0 ni, 77.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
top - 10:03:27 up 31 days, 22:05, 0 users, load average: 76.57, 77.35, 78.30
%Cpu(s): 13.3 us, 4.4 sy, 0.0 ni, 82.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
top - 10:03:32 up 31 days, 22:05, 0 users, load average: 77.89, 77.61, 78.38
%Cpu(s): 96.8 us, 2.9 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
top - 10:03:38 up 31 days, 22:06, 0 users, load average: 79.10, 77.87, 78.46
%Cpu(s): 96.7 us, 2.7 sy, 0.3 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.3 si, 0.0 st
top - 10:03:44 up 31 days, 22:06, 0 users, load average: 80.29, 78.14, 78.54
%Cpu(s): 96.9 us, 2.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.4 si, 0.0 st
top - 10:03:49 up 31 days, 22:06, 0 users, load average: 82.87, 78.76, 78.74
%Cpu(s): 97.4 us, 2.3 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.2 si, 0.0 st
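To check whether a captured `top` log shows the same pattern, the summary lines can be tallied with a short script. The following is a minimal sketch, assuming the output was saved as text in the format shown above; the function name `summarize_cpu` and the 90% threshold are illustrative choices, not part of this note.

```python
import re

# Matches the "%Cpu(s):" summary line of top, capturing user, system and idle.
CPU_LINE = re.compile(
    r"%Cpu\(s\):\s*([\d.]+)\s+us,\s*([\d.]+)\s+sy,.*?([\d.]+)\s+id"
)


def summarize_cpu(top_text, busy_threshold=90.0):
    """Count top samples whose user CPU is at or above busy_threshold.

    busy_threshold is an assumed cut-off chosen for illustration only.
    """
    samples = []
    for line in top_text.splitlines():
        match = CPU_LINE.search(line)
        if match:
            user, system, idle = (float(g) for g in match.groups())
            samples.append((user, system, idle))
    busy = [s for s in samples if s[0] >= busy_threshold]
    return {
        "samples": len(samples),
        "busy_samples": len(busy),
        "max_user": max((s[0] for s in samples), default=0.0),
    }


if __name__ == "__main__":
    import sys

    # Usage: python summarize_top.py < top_output.txt
    print(summarize_cpu(sys.stdin.read()))
```

A high share of busy samples with low system and I/O-wait time is consistent with the symptom described here, but it does not by itself identify the cause.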
2. The call stacks of the cellsrv threads show the following call sequence:
Cache::issueIO() -> PMemLog::checkIOOverlap() -> PMemLogStore::findMatchingRequests()
For example:
Thread 166 (Thread 0xc5daffff700 (LWP 19149)):
#0 0x00000000013398cb in PMemLogStore::findMatchingRequests(GridDisk*, PMemLogGDiskState**, PMemLogStoreClientInfo*, unsigned int, PMemLogWriteLocation*, PMemLogStore_findMatchingRequests_Reason, Cacheable*) () at PMemLog.cpp:8147
#1 0x0000000001338751 in PMemLog::checkIOOverlap(IoType, Cacheable*, IOContext*, Job*) () at PMemLog.cpp:3435
#2 0x000000000112aa81 in Cache::issueIO(IoType, Cacheable**, IOContext&, ScanBlkStats*, int*, IOClientType, unsigned int, unsigned int*, int, int&) () at Cache.cpp:2148
#3 0x000000000112827a in Cache::put(Cacheable**, int&, Job&, oss_iorm*, int*, IOClientType, int) () at Cache.cpp:2610
#4 0x000000000113838c in CachePut::process() () at CachePut.cpp:398
#5 0x0000000001497cba in UserThread::mainLoop(unsigned int) () at UserThread.cpp:849
#6 0x00000000014966ab in UserThread::run() () at UserThread.cpp:1024
#7 0x0000000001451c68 in Scheduler::schedule() () at Scheduler.cpp:1722
#8 0x0000000001451a4e in _INTERNAL_13_Scheduler_cpp_7ff7ab03::kernelThreadMain(void*) () at Scheduler.cpp:1496
#9 0x000000000273f8da in oracle_fp_thread_main ()
#10 0x00007f5e1fe17ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007f5e1f70c96d in clone () from /lib64/libc.so.6
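When many threads are dumped, it can help to count how many of them contain the call sequence above. The following is a minimal sketch, assuming a gdb-style text dump in the format shown (a `Thread ...` header followed by `#N` frame lines); the helper names `split_threads`, `matches_signature`, and `threads_with_signature` are hypothetical and not part of any Oracle tool.

```python
import re

# Innermost frame first, as it appears in the dump above.
SIGNATURE = [
    "PMemLogStore::findMatchingRequests",
    "PMemLog::checkIOOverlap",
    "Cache::issueIO",
]

# "#0 0x... in Class::method(args) () at File.cpp:123" -> "Class::method"
FRAME = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([\w:~]+)")


def split_threads(dump_text):
    """Map each 'Thread ...' header line to its list of function names."""
    threads = {}
    current = None
    for raw in dump_text.splitlines():
        line = raw.strip()
        if line.startswith("Thread "):
            current = line
            threads[current] = []
        elif current is not None:
            match = FRAME.match(line)
            if match:
                threads[current].append(match.group(2))
    return threads


def matches_signature(frames, signature=SIGNATURE):
    """True if the signature appears as consecutive frames."""
    n = len(signature)
    return any(frames[i:i + n] == signature for i in range(len(frames) - n + 1))


def threads_with_signature(dump_text):
    """Return the headers of all threads whose stack contains the signature."""
    return [
        header
        for header, frames in split_threads(dump_text).items()
        if matches_signature(frames)
    ]
```

Calling `threads_with_signature(open("stack_dump.txt").read())` on a saved dump (the file name is a placeholder) returns the matching thread headers; a large number of matches would be consistent with the symptom described in this note.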
Changes
Cause