
Oracle Linux: RHCK 7 Automatic NUMA Balancing Induces High IO Wait (Doc ID 2749259.1)

Last updated on APRIL 07, 2024

Applies to:

Linux OS - Version Oracle Linux 7.4 and later
Oracle Cloud Infrastructure - Version N/A and later
Linux x86-64

Symptoms

The RHCK kernel with Automatic NUMA Balancing enabled (the default) induces high I/O wait times due to NUMA hinting page faults taken during page migration.
For NUMA hardware, the access speed to main memory is determined by the location of the memory relative to the CPU. The performance of a workload depends on the application threads accessing data that is local to the CPU the thread is executing on. Automatic NUMA Balancing migrates data on demand to memory nodes that are local to the CPU accessing that data.

The command "cat /proc/sys/kernel/numa_balancing" will show 1 (enabled).
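For reference, a quick way to confirm the current setting (assuming a kernel built with NUMA balancing support, as on RHCK 7):

```shell
# 1 = Automatic NUMA Balancing enabled, 0 = disabled.
cat /proc/sys/kernel/numa_balancing

# Equivalent sysctl view of the same setting:
sysctl kernel.numa_balancing
```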

NUMA balancing is achieved through the following steps:

  1. A task scanner periodically scans a portion of a task's address space and marks the memory to force a page fault when the data is next accessed.
  2. The next access to the data will result in a NUMA Hinting Fault. Based on this fault, the data can be migrated to a memory node associated with the task accessing the memory.
  3. To keep a task, the CPU it is using, and the memory it is accessing together, the scheduler groups tasks that share data.
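The scanner in step 1 is governed by a handful of sysctl tunables; a sketch of how to inspect them (file names as in the upstream 3.10-series kernel used by RHCK 7, values vary by system):

```shell
# Print each NUMA balancing scanner tunable as "path:value".
# numa_balancing_scan_delay_ms      - initial delay before a new task is scanned
# numa_balancing_scan_period_min_ms - shortest interval between scans
# numa_balancing_scan_period_max_ms - longest interval between scans
# numa_balancing_scan_size_mb       - address-space chunk scanned per pass
grep . /proc/sys/kernel/numa_balancing_scan_*
```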

It is this induced page fault for page migration that causes the high %iowait.

Whether this overhead is offset by the benefit of tasks subsequently accessing local node memory depends on the workload.

Performance data shows tasks blocking with high %iowait, but no correspondingly high disk I/O from actual reads and writes.

vmstat shows blocking in the wa (I/O wait) column.
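The symptom can be watched live with vmstat (from procps); the telltale pattern is a high wa column while the block I/O columns stay low:

```shell
# Sample at 5-second intervals, 3 samples. Watch the "wa" column (percent of
# CPU time idle while waiting on I/O) against "bi"/"bo" (blocks in/out).
# High wa with low bi/bo matches the NUMA-hinting-fault pattern above.
vmstat 5 3
```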

NUMA statistics can be found in /proc/vmstat:

numa_pte_updates
The number of base pages that were marked for NUMA hinting faults.

numa_huge_pte_updates
The number of transparent huge pages that were marked for NUMA hinting faults. In combination with numa_pte_updates, the total address space that was marked can be calculated.

numa_hint_faults
Records how many NUMA hinting faults were trapped.

numa_hint_faults_local
Shows how many of the hinting faults were to local nodes. In combination with numa_hint_faults, the percentage of local versus remote faults can be calculated. A high percentage of local hinting faults indicates that the workload is closer to being converged.

numa_pages_migrated
Records how many pages were migrated because they were misplaced. As migration is a copying operation, it contributes the largest part of the overhead created by NUMA balancing.
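These counters can be pulled together with a small awk sketch (a hypothetical helper, not part of the article) that reports the local-fault percentage and migration count from /proc/vmstat:

```shell
# Summarize NUMA balancing activity. The trailing space in the
# "numa_hint_faults " pattern keeps it from also matching the _local counter.
awk '
  /^numa_hint_faults /      { faults   = $2 }
  /^numa_hint_faults_local/ { local_f  = $2 }
  /^numa_pages_migrated/    { migrated = $2 }
  END {
    if (faults > 0)
      printf "local hinting faults: %.1f%%\n", 100 * local_f / faults
    printf "pages migrated: %d\n", migrated
  }
' /proc/vmstat
```

A high local percentage suggests the workload has converged; a large migration count is where the copying overhead, and hence the %iowait, comes from.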

Also, the Oracle UEK4 kernel has numa_balancing turned off by default starting with kernel version 4.1.12-124.20.5.

Bug 28814880 - Enabling numa balancing causes high I/O wait on numa systems
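For comparison with the UEK4 default, the setting can be toggled at runtime on RHCK via sysctl (a sketch; requires root, and the change does not persist across reboots unless also placed in /etc/sysctl.conf):

```shell
# Disable Automatic NUMA Balancing on the running kernel (root required):
sysctl -w kernel.numa_balancing=0

# Verify the new value:
cat /proc/sys/kernel/numa_balancing
```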

Changes

 

Cause


