
ODA HA: Massive IO Performance Issue On One of Two Nodes: Same OS version, Same Shared Storage (Doc ID 2559921.1)

Last updated on SEPTEMBER 17, 2021

Applies to:

Linux OS - Version Oracle Linux 6.10 to Oracle Linux 6.10 [Release OL6U10]
Oracle Database Appliance Software - Version 12.1.2.10 to 18.1 [Release 12.1 to 12.2]
Information in this document applies to any platform.

Symptoms

A two-node ODA cluster uses the same binaries and the same shared storage on both nodes.
After a database was moved from one node to the other, basic OS IO operations on one of the two nodes showed extreme performance degradation.

The OS, hardware, RDBMS and ODA were all checked for problems through several SRs.

Only after the database had been ruled out as the source of the performance problem did a generic IO-level test, run from each node, confirm that the problem was at the system level on one node.

The following script was run for the IO level test:

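#!/bin/bash
# Time a trivial loop that echoes 1000 lines to the shell.
# The %N (nanoseconds) format requires GNU date.
start_ts=$(date +%H:%M:%S:%N)
for i in {1..1000}
do
echo "Count: $i"
done
end_ts=$(date +%H:%M:%S:%N)
# Show when the loop started and ended
echo "$start_ts"
echo "$end_ts"
exit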

  

It simply echoes 1000 lines to the shell and shows the start and end time of the script.
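The script prints two raw timestamps that then have to be subtracted by hand. A minimal variant that reports the elapsed time directly is sketched below. It is not the script used in the original test, the function name measure_echo_loop is hypothetical, and it assumes GNU date, which supports the %s and %N formats.

```shell
#!/bin/bash
# Hypothetical variant of the IO level test: the same 1000-line echo
# loop, but the script computes the elapsed wall-clock time itself.
# Assumes GNU date, which supports %s (epoch seconds) and %N (nanoseconds).
measure_echo_loop() {
    local start_ns end_ns elapsed_ns i
    start_ns=$(date +%s%N)
    for i in {1..1000}; do
        echo "Count: $i"
    done
    end_ns=$(date +%s%N)
    elapsed_ns=$(( end_ns - start_ns ))
    # Milliseconds with six decimal places (nanosecond resolution).
    printf 'elapsed: %d.%06d ms\n' \
        "$(( elapsed_ns / 1000000 ))" "$(( elapsed_ns % 1000000 ))"
}

measure_echo_loop
```

Run once on each node, it yields one millisecond figure per node, the same kind of figure as the per-node time differences reported for the second test pass.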

The results were as follows:

[oda00]$ cat time_output_node0.txt

08:23:36:776457699
08:23:36:837322700

real 0m 0.068s
user 0m 0.044s
sys 0m 0.015s

 

 

  

[oda01]$ cat time_output_node1.txt

08:23:25:521320353
08:23:25:537715253

real 0m 0.020s
user 0m 0.010s
sys 0m 0.007s
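To compare the two runs, the start timestamp has to be subtracted from the end timestamp on each node. The helpers below sketch that arithmetic for the HH:MM:SS:N format written by the script, using the timestamps recorded above; stamp_to_ns and elapsed_ms are hypothetical names, not part of the original test.

```shell
#!/bin/bash
# Convert an HH:MM:SS:NNNNNNNNN stamp (the format written by
# date +%H:%M:%S:%N) to nanoseconds since midnight.
# The 10# prefix forces base 10 so that values such as "08" are not
# parsed as invalid octal numbers.
stamp_to_ns() {
    local h m s n
    IFS=: read -r h m s n <<< "$1"
    echo $(( ((10#$h * 60 + 10#$m) * 60 + 10#$s) * 1000000000 + 10#$n ))
}

# Print (end - start) in milliseconds for two stamps: $1 = start, $2 = end.
# Runs that cross midnight are not handled.
elapsed_ms() {
    local diff_ns=$(( $(stamp_to_ns "$2") - $(stamp_to_ns "$1") ))
    printf '%d.%06d ms\n' "$(( diff_ns / 1000000 ))" "$(( diff_ns % 1000000 ))"
}

# First pass, timestamps copied from the node0 and node1 output.
echo "node0: $(elapsed_ms 08:23:36:776457699 08:23:36:837322700)"
echo "node1: $(elapsed_ms 08:23:25:521320353 08:23:25:537715253)"
```

This prints 60.865001 ms for node0 and 16.394900 ms for node1, so on the first pass node0 took roughly 3.7 times as long to run the same 1000 echo commands.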

 

A second test pass produced the following time differences:

node0: 57.50323ms
node1: 15.596865ms

 

 

Changes

This particular installation was upgraded once, to 18.3, after being imaged with ODA version 12.1.2.10.
ODACLI commands showed all components as identical on both nodes.
The OS also appeared to be at the same level.


ODA: 18.3.0.0.0
Linux: 6.10
Kernel: kernel-uek-4.1.12-124.18.6.el6uek.x86_64

 

However, a closer look at the sosreport revealed a few differences.
The kernel taint values differed between the two nodes.
A tainted kernel is not necessarily a problem by itself.
However, this was the first indication of a difference, one that several earlier HW, OS and ODA checks had not detected.

tainted

node0 kernel.tainted = 69633
node1 kernel.tainted = 4097
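Each kernel.tainted value is a bitmask, so the two values can be compared bit by bit. The sketch below decodes a value into its set bits. The labels come from the mainline Linux kernel's documentation of taint flags and are an assumption for this system: UEK 4.1.12 predates some of them and may assign vendor-specific meanings, so confirm them against the documentation for the running kernel. The function name decode_taint is hypothetical.

```shell
#!/bin/bash
# Decode kernel.tainted values into the individual bits they set.
# Labels follow the mainline kernel's taint flag documentation; a vendor
# kernel such as UEK may define some bits differently, so treat the
# labels as hints only.
decode_taint() {
    local value=$1 bit
    local -a label=(
        "P: proprietary module loaded"
        "F: module force loaded"
        "S: out-of-spec system"
        "R: module force unloaded"
        "M: machine check exception"
        "B: bad page referenced"
        "U: taint requested by userspace"
        "D: kernel died recently (OOPS/BUG)"
        "A: ACPI table overridden"
        "W: kernel issued warning"
        "C: staging driver loaded"
        "I: firmware bug workaround applied"
        "O: out-of-tree module loaded"
        "E: unsigned module loaded"
        "L: soft lockup occurred"
        "K: kernel live patched"
        "X: auxiliary taint (distribution-defined)"
    )
    for (( bit = 0; bit < 32; bit++ )); do
        if (( (value >> bit) & 1 )); then
            printf 'bit %2d (%6d) %s\n' "$bit" "$(( 1 << bit ))" \
                "${label[bit]:-no mainline label}"
        fi
    done
}

# Usage: pass one or more values, for example 69633 4097, or
# "$(cat /proc/sys/kernel/tainted)" to decode the running kernel.
for v in "$@"; do
    echo "kernel.tainted = $v"
    decode_taint "$v"
done
```

Run with 69633 and 4097 as arguments, it reports bits 0 and 12 set on both nodes and bit 16 set only on node0 (69633 - 4097 = 65536). The decoder only identifies which bit differs; it does not say why that bit was set.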

 


A closer look while trying to alter debug settings revealed more differences.
Files under the debug directory were not the same on both nodes: the problem node was missing files.
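Finding which files are missing on one node amounts to a set difference between two sorted listings. A minimal sketch follows. list_tree and missing_entries are hypothetical helper names, the sketch assumes GNU find (for -printf), and because the article does not name the debug directory, /sys/kernel/debug (the usual debugfs mount point) appears only as an example.

```shell
#!/bin/bash
# Print every path under directory $1, relative to $1, sorted.
# Run this on each node and save the output to a file.
list_tree() {
    find "$1" -mindepth 1 -printf '%P\n' | LC_ALL=C sort
}

# Print lines present in listing file $1 but missing from listing file $2.
# comm needs both inputs sorted with the same collation, hence LC_ALL=C.
missing_entries() {
    LC_ALL=C comm -23 <(LC_ALL=C sort "$1") <(LC_ALL=C sort "$2")
}

# Example (directory and file names are assumptions):
#   on node1 as root:  list_tree /sys/kernel/debug > node1.txt
#   on node0 as root:  list_tree /sys/kernel/debug > node0.txt
#   then, with both files on one host:
#   missing_entries node1.txt node0.txt   # present on node1, missing on node0
```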

node0:

Cause




