High System CPU Usage Noticed on Oracle Big Data Appliance when Running HBase Region Server on a Critical Node
(Doc ID 1594521.1)
Last updated on OCTOBER 18, 2019
Applies to:
Big Data Appliance Integrated Software - Version 2.2.1 and later
Linux x86-64
Symptoms
The HBase Region Servers running on one or more critical BDA nodes go down. The HBase Region Server residing on the same node as the HBase Master goes down more frequently than the others. Attempts to restart the HBase Region Servers from Cloudera Manager fail.
System CPU usage of 64.5% or higher is observed, and the kswapd0 and kswapd1 processes each use 100% of one core. /var/log/messages reports 'page allocation' failures.
An OS reboot did not succeed, and the server had to be power cycled through ILOM.
1. In /var/log/messages the following error may be seen:
Oct 15 15:36:16 <HOSTNAME3> kernel: java: page allocation failure. order:5, mode:0xd0
Oct 15 15:36:16 <HOSTNAME3> kernel: Pid: 2551, comm: java Not tainted 2.6.32-200.21.1.el5uek #1
Oct 15 15:36:16 <HOSTNAME3> kernel: Call Trace:
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff810dd82f>] __alloc_pages_nodemask+0x524/0x595
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8110cf67>] kmem_getpages+0x4f/0xf4
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8110d13a>] fallback_alloc+0x12e/0x1ce
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8110d2fb>] ____cache_alloc_node+0x121/0x134
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8110d91b>] kmem_cache_alloc_node_notrace+0x84/0xb9
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8110d996>] __kmalloc_node+0x46/0x73
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff813b6bf0>] ? __alloc_skb+0x72/0x13d
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff813b6bf0>] __alloc_skb+0x72/0x13d
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff813ef154>] sk_stream_alloc_skb+0x3d/0xaf
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff813f03a5>] tcp_sendmsg+0x176/0x6cf
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff813ade9f>] __sock_sendmsg+0x5e/0x67
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff813adf68>] sock_aio_write+0xc0/0xd4
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8104b72a>] ? finish_task_switch+0x88/0xab
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8111990f>] do_sync_write+0xe7/0x12b
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8110c22e>] ? virt_to_head_page+0x29/0x2b
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff81076db0>] ? autoremove_wake_function+0x0/0x3d
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff811e9fc8>] ? security_file_permission+0x16/0x18
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8111a087>] vfs_write+0xc3/0x10a
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff8111adad>] sys_write+0x4c/0x72
Oct 15 15:36:16 <HOSTNAME3> kernel: [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
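To check a log for this symptom, the 'page allocation failure' lines can be picked out and their order field decoded. The following is a minimal Python sketch, not an Oracle-supplied tool. The function name find_page_allocation_failures and the 4 KiB page size are assumptions made for illustration. An order of 5 means a request for 2^5 = 32 contiguous pages, which is 128 KiB if base pages are 4 KiB.

```python
import re

# Matches kernel lines such as:
#   kernel: java: page allocation failure. order:5, mode:0xd0
FAILURE_RE = re.compile(
    r"kernel: (?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

# Assumption: 4 KiB base pages, which is typical on Linux x86-64.
PAGE_SIZE_BYTES = 4096


def find_page_allocation_failures(lines):
    """Return one dict per 'page allocation failure' line found in lines."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match is None:
            continue  # call-trace lines and unrelated messages are skipped
        order = int(match.group("order"))
        failures.append({
            "comm": match.group("comm"),
            "order": order,
            "mode": match.group("mode"),
            # An order-N request asks for 2**N physically contiguous pages.
            "request_bytes": (1 << order) * PAGE_SIZE_BYTES,
        })
    return failures


if __name__ == "__main__":
    # Reading /var/log/messages usually requires root privileges.
    with open("/var/log/messages", errors="replace") as log:
        for failure in find_page_allocation_failures(log):
            print(failure)
```

Run against the excerpt above, the sketch would report a single failure from the java process with order 5, that is, a 131072-byte contiguous request under the 4 KiB page assumption.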
2. The top command may show output similar to the following:
top - 11:36:01 up 5 days, 18:54, 4 users, load average: 666.00, 651.94, 482.60
Tasks: 943 total, 65 running, 868 sleeping, 0 stopped, 10 zombie
Cpu(s): 0.0%us, 64.5%sy, 0.0%ni, 29.0%id, 6.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 65831148k total, 21281748k used, 44549400k free, 59848k buffers
Swap: 0k total, 0k used, 0k free, 2533784k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11429 hdfs 20 0 2644m 1.5g 2568 S 1197.1 2.3 2176:48 java
274 root 20 0 0 0 0 R 99.9 0.0 29:57.80 kswapd1
686 root 20 0 0 0 0 R 99.9 0.0 28:30.80 flush-9:2
11521 root 20 0 8 4 0 R 99.9 0.0 21:38.85 OSWatcherFM.sh
11726 root 20 0 8 4 0 R 99.9 0.0 19:18.95 oswsub.sh
32217 <APPLICATION> 20 0 17.1g 6.1g 2056 S 99.9 9.8 2556:15 java
273 root 20 0 0 0 0 R 99.5 0.0 29:24.26 kswapd0
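The same check can be applied to captured top output. The following is a minimal Python sketch under stated assumptions: the function names parse_cpu_line and busy_kswapd are hypothetical, the 90% per-process threshold is an illustrative choice, and the column layout is assumed to match the header shown above (PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND).

```python
import re

# Captures pairs such as ("64.5", "sy") from the "Cpu(s):" summary line.
CPU_FIELD_RE = re.compile(r"(\d+(?:\.\d+)?)%(\w+)")


def parse_cpu_line(line):
    """Parse a top 'Cpu(s):' summary line into {field_name: percent}."""
    if not line.startswith("Cpu(s):"):
        raise ValueError("not a top Cpu(s) summary line")
    return {name: float(value) for value, name in CPU_FIELD_RE.findall(line)}


def busy_kswapd(process_lines, threshold=90.0):
    """Return (command, cpu_percent) for kswapd rows at or above threshold.

    The threshold default is an illustrative assumption, not an Oracle value.
    """
    busy = []
    for line in process_lines:
        fields = line.split()
        if len(fields) < 12:
            continue  # not a full process row
        command = fields[11]
        if not command.startswith("kswapd"):
            continue
        try:
            cpu_percent = float(fields[8])  # %CPU is the ninth column
        except ValueError:
            continue
        if cpu_percent >= threshold:
            busy.append((command, cpu_percent))
    return busy


if __name__ == "__main__":
    summary = "Cpu(s): 0.0%us, 64.5%sy, 0.0%ni, 29.0%id, 6.4%wa, 0.0%hi, 0.0%si, 0.0%st"
    rows = [
        "274 root 20 0 0 0 0 R 99.9 0.0 29:57.80 kswapd1",
        "273 root 20 0 0 0 0 R 99.5 0.0 29:24.26 kswapd0",
    ]
    print(parse_cpu_line(summary)["sy"], busy_kswapd(rows))
```

A system CPU value at or above the 64.5% reported in this note, together with both kswapd threads near 100%, matches the symptom pattern described above.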
Cause