High System CPU Usage Noticed on Oracle Big Data Appliance when Running HBase Region Server on a Critical Node (Doc ID 1594521.1)

Last updated on OCTOBER 22, 2015

Applies to:

Big Data Appliance Integrated Software - Version 2.2.1 and later
Linux x86-64

Symptoms

The HBase Region Servers running on one or more critical BDA nodes go down. The HBase Region Server residing on the same node as the HBase Master goes down more frequently. Attempts to restart the HBase Region Servers from Cloudera Manager fail.

System CPU usage of 64.5% or higher is observed, and the kswapd0 and kswapd1 processes each use 100% of one core. /var/log/messages reports 'page allocation failure' errors.

An OS reboot did not succeed, and the server had to be power cycled through ILOM.

1. In /var/log/messages the following error may be seen:

Oct 15 15:36:16 bda1node03 kernel: java: page allocation failure. order:5, mode:0xd0
Oct 15 15:36:16 bda1node03 kernel: java: page allocation failure. order:5, mode:0xd0
Oct 15 15:36:16 bda1node03 kernel: Pid: 2551, comm: java Not tainted 2.6.32-200.21.1.el5uek #1
Oct 15 15:36:16 bda1node03 kernel: Call Trace:
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff810dd82f>] __alloc_pages_nodemask+0x524/0x595
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8110cf67>] kmem_getpages+0x4f/0xf4
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8110d13a>] fallback_alloc+0x12e/0x1ce
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8110d2fb>] ____cache_alloc_node+0x121/0x134
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8110d91b>] kmem_cache_alloc_node_notrace+0x84/0xb9
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8110d996>] __kmalloc_node+0x46/0x73
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff813b6bf0>] ? __alloc_skb+0x72/0x13d
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff813b6bf0>] __alloc_skb+0x72/0x13d
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff813ef154>] sk_stream_alloc_skb+0x3d/0xaf
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff813f03a5>] tcp_sendmsg+0x176/0x6cf
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff813ade9f>] __sock_sendmsg+0x5e/0x67
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff813adf68>] sock_aio_write+0xc0/0xd4
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8104b72a>] ? finish_task_switch+0x88/0xab
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8111990f>] do_sync_write+0xe7/0x12b
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8110c22e>] ? virt_to_head_page+0x29/0x2b
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff81076db0>] ? autoremove_wake_function+0x0/0x3d
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff811e9fc8>] ? security_file_permission+0x16/0x18
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8111a087>] vfs_write+0xc3/0x10a
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff8111adad>] sys_write+0x4c/0x72
Oct 15 15:36:16 bda1node03 kernel:  [<ffffffff81011db2>] system_call_fastpath+0x16/0x1b
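
The failure lines above can be picked out of a log programmatically. The following is a minimal sketch, not an Oracle-supplied tool: the function name find_allocation_failures and the PAGE_SIZE constant are illustrative, and the 4 KiB page size is an assumption (it is the usual x86-64 default). It extracts the process name, the allocation order, and the mode from each matching line, and converts the order into the size of the contiguous block that was requested (an order-N request asks for 2^N contiguous pages).

```python
import re

# Matches kernel lines in the format shown in this note, for example:
#   "... kernel: java: page allocation failure. order:5, mode:0xd0"
FAILURE_RE = re.compile(
    r"kernel: (?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)

PAGE_SIZE = 4096  # assumption: 4 KiB base pages (usual x86-64 default)


def find_allocation_failures(lines):
    """Return one dict per 'page allocation failure' line found in lines."""
    failures = []
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            order = int(match.group("order"))
            failures.append({
                "comm": match.group("comm"),
                "order": order,
                "mode": int(match.group("mode"), 16),
                # An order-N request needs 2**N physically contiguous pages.
                "contiguous_bytes": PAGE_SIZE << order,
            })
    return failures


# Two lines copied from the excerpt above; only the first one is a failure line.
sample = [
    "Oct 15 15:36:16 bda1node03 kernel: java: page allocation failure. order:5, mode:0xd0",
    "Oct 15 15:36:16 bda1node03 kernel: Call Trace:",
]

for failure in find_allocation_failures(sample):
    print(failure)
```

Under the 4 KiB assumption, the order:5 failure shown above corresponds to a request for 32 contiguous pages (128 KiB). To scan a real log, an open file object for /var/log/messages can be passed in place of the sample list.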

2. The top command may show output similar to the following:

# top
top - 11:36:01 up 5 days, 18:54,  4 users,  load average: 666.00, 651.94, 482.60
Tasks: 943 total,  65 running, 868 sleeping,   0 stopped,  10 zombie
Cpu(s):  0.0%us, 64.5%sy,  0.0%ni, 29.0%id,  6.4%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  65831148k total, 21281748k used, 44549400k free,    59848k buffers
Swap:        0k total,        0k used,        0k free,  2533784k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11429 hdfs      20   0 2644m 1.5g 2568 S 1197.1  2.3   2176:48 java
  274 root      20   0     0    0    0 R 99.9  0.0  29:57.80 kswapd1
  686 root      20   0     0    0    0 R 99.9  0.0  28:30.80 flush-9:2
11521 root      20   0     8    4    0 R 99.9  0.0  21:38.85 OSWatcherFM.sh
11726 root      20   0     8    4    0 R 99.9  0.0  19:18.95 oswsub.sh
32217 <customer app>  20   0 17.1g 6.1g 2056 S 99.9  9.8   2556:15 java
  273 root      20   0     0    0    0 R 99.5  0.0  29:24.26 kswapd0
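
The two indicators visible in this output (high %sy and saturated kswapd threads) can be checked from captured top text. This is a rough sketch under stated assumptions: the function names system_cpu_percent and busy_kswapd and the 90% threshold are illustrative choices, not values from this note, and the parsing assumes the summary and column layout shown above (other top versions print the Cpu(s) line differently). One way to capture such text is top in batch mode, for example top -b -n 1.

```python
import re

# Matches the system-CPU figure in a summary line such as:
#   "Cpu(s):  0.0%us, 64.5%sy,  0.0%ni, ..."
CPU_RE = re.compile(r"Cpu\(s\):.*?(?P<sy>\d+(?:\.\d+)?)%sy")


def system_cpu_percent(top_output):
    """Return the %sy value from the Cpu(s) summary line, or None if absent."""
    match = CPU_RE.search(top_output)
    return float(match.group("sy")) if match else None


def busy_kswapd(top_output, threshold=90.0):
    """Return (command, %CPU) for kswapd threads at or above threshold.

    The threshold of 90.0 is an illustrative default, not an Oracle value.
    """
    busy = []
    for line in top_output.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID; in the layout shown above the
        # last four columns are %CPU, %MEM, TIME+ and COMMAND.
        if len(fields) >= 12 and fields[0].isdigit():
            command = fields[-1]
            if command.startswith("kswapd"):
                cpu = float(fields[-4])
                if cpu >= threshold:
                    busy.append((command, cpu))
    return busy


# Lines copied from the top output in this note.
sample = """\
Cpu(s):  0.0%us, 64.5%sy,  0.0%ni, 29.0%id,  6.4%wa,  0.0%hi,  0.0%si,  0.0%st
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
11429 hdfs      20   0 2644m 1.5g 2568 S 1197.1  2.3   2176:48 java
  274 root      20   0     0    0    0 R 99.9  0.0  29:57.80 kswapd1
  273 root      20   0     0    0    0 R 99.5  0.0  29:24.26 kswapd0
"""

print(system_cpu_percent(sample))
print(busy_kswapd(sample))
```

Indexing the columns from the right keeps the parse working for rows whose USER field contains a space, as in the anonymized customer row above.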

  

Cause
