Memory Contention in Linux results in High CPU load (Doc ID 2170336.1)

Last updated on AUGUST 11, 2016

Applies to:

Linux OS - Version Enterprise Linux 3.0 and later
Linux x86-64

Symptoms

You have been directed to this Global Customer Support (GCS) document because your symptoms and the data you provided match a known scenario that can result from an inefficient memory configuration on your x86_64 Linux system. The goal of this document is to explain that scenario at a high level.

Generally speaking, the symptoms can include:

Poor database performance
System running out of memory or excessive swapping
Database instances cannot be started
Crucial system services failing
Basic OS commands (such as "ls" or "cd") hanging or very slow
SSH connections hanging or very slow
RAC instance evictions
RAC node reboots

The data you provided in the Service Request (SR) that directed you to this GCS document was Cluster Health Monitor (CHM), OS Watcher (OSW), or ExaWatcher data. The data leading up to the time of the event shows:

Low free memory
Swap usage
kswapd listed in the top 20 processes, often within the top 5
Increasing CPU load as kswapd works harder and harder to find swappable memory for the Linux kernel
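The first three signals listed above can be checked programmatically. The following is a minimal Python sketch, not a GCS tool: it parses text in the format of `/proc/meminfo` and flags low free memory and swap usage, plus kswapd's position in the process list if the caller supplies it (for example, read from a top snapshot). The function names, the signal labels and the 1% free-memory threshold are hypothetical illustrations. `MemFree` is used because it corresponds to the "free" figure on top's `Mem:` line.

```python
from typing import Dict, List, Optional


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse /proc/meminfo-style text into {field: numeric value} (mostly kB)."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def memory_pressure_signals(
    meminfo: Dict[str, int],
    kswapd_rank: Optional[int] = None,
    free_pct_threshold: float = 1.0,  # hypothetical cut-off; tune per system
) -> List[str]:
    """Return the hypothetical labels of the symptoms that are present."""
    signals: List[str] = []
    total = meminfo.get("MemTotal", 0)
    free = meminfo.get("MemFree", 0)
    # Low free memory: MemFree as a percentage of MemTotal.
    if total > 0 and 100.0 * free / total < free_pct_threshold:
        signals.append("low_free_memory")
    # Any swap in use at all.
    swap_used = meminfo.get("SwapTotal", 0) - meminfo.get("SwapFree", 0)
    if swap_used > 0:
        signals.append("swap_in_use")
    # kswapd's position in top's process list must be supplied by the caller.
    if kswapd_rank is not None and kswapd_rank <= 20:
        signals.append("kswapd_in_top_20")
    return signals
```

On a live system the input would come from `open("/proc/meminfo").read()`. The rising CPU load is a trend across successive samples, so this single-snapshot check does not cover it.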

Here is an actual example from a recent SR:

2016_05_11_02_28_58_TopExaWatcher_hostname.dat
# Starting Time: 05/11/2016 02:28:58
# Sample Interval(s): 5

top - 02:46:01 up 178 days, 6:15, 0 users, load average: 21.76, 22.40, 22.73
Mem: 528998448k total, 523839964k used, 5158484k free, 885012k buffers
Swap: 25165820k total, 921000k used, 24244820k free, 25125744k cached

  
<<<< Large but sustainable load, only 5.1 GB free memory, and the system has swapped in the past. Interesting. This may turn out to be kswapd resource contention. Let's see...

top - 02:46:06 up 178 days, 6:15, 0 users, load average: 21.94, 22.43, 22.74
Mem: 528998448k total, 525601640k used, 3396808k free, 885012k buffers
Swap: 25165820k total, 921000k used, 24244820k free, 25128412k cached

  
<<<< Free memory down to 3.3 GB

top - 02:46:11 up 178 days, 6:15, 0 users, load average: 22.26, 22.49, 22.75
Mem: 528998448k total, 526169396k used, 2829052k free, 884796k buffers
Swap: 25165820k total, 921484k used, 24244336k free, 25078876k cached

  
<<<< Free memory down to 2.8 GB. Trouble should break loose any time now.

top - 02:46:17 up 178 days, 6:15, 0 users, load average: 26.09, 23.28, 23.01
Mem: 528998448k total, 526165964k used, 2832484k free, 884612k buffers
Swap: 25165820k total, 922040k used, 24243780k free, 25013036k cached

  
<<<< Free memory up just a bit (still about 2.8 GB) from swapping. kswapd has just risen to #6 in the process list.

top - 02:46:22 up 178 days, 6:15, 0 users, load average: 35.61, 25.30, 23.66
Mem: 528998448k total, 526142256k used, 2856192k free, 884356k buffers
Swap: 25165820k total, 922224k used, 24243596k free, 24993280k cached

  
<<<< kswapd is #9. The system is swapping and the CPU load is rising.

top - 02:46:31 up 178 days, 6:15, 0 users, load average: 134.48, 46.94, 30.73
Mem: 528998448k total, 526088480k used, 2909968k free, 884300k buffers
Swap: 25165820k total, 922340k used, 24243480k free, 24984160k cached

  
<<<< Look at the CPU load take off. The system is now getting too busy to acknowledge network heartbeats over the private interconnect in a timely fashion.

top - 02:46:43 up 178 days, 6:16, 0 users, load average: 425.82, 112.18, 52.13
Mem: 528998448k total, 526050636k used, 2947812k free, 884248k buffers
Swap: 25165820k total, 922516k used, 24243304k free, 24960060k cached

  
<<<< A load average of 425. Unsustainable. Evictions or a reboot are sure to follow.

top - 02:46:51 up 178 days, 6:16, 0 users, load average: 639.99, 167.56, 70.75
Mem: 528998448k total, 526000952k used, 2997496k free, 884152k buffers
Swap: 25165820k total, 922756k used, 24243064k free, 24929772k cached

  
<<<< System out of control. Reboot imminent.

May 11 02:49:42 Linux OS reboot of Compute Node

  

Cause
