Flash Disk Group and Cache Performance Problems in a Clustered Configuration Including ODA X5-2; Symptoms: Very High DBWR CPU, DB FLASH cache waits (User I/O) ; Buffer Cache / Busy Waits (Concurrency) ; enq: TX-row lock contention (Concurrency)
(Doc ID 2120400.1)
Last updated on JUNE 13, 2023
Applies to:
Oracle Database Appliance X5-2 - Version All Versions to All Versions [Release All Releases]
Oracle Database - Enterprise Edition - Version 11.2.0.1 to 12.2.0.1 [Release 11.2 to 12.2]
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
ODA, performance problems, CPU 100%, Hang, FLASH, ODI TRUNCATE, LOCK, RAC, DBWR, SPIN, CHKPT, Lock, outage
Symptoms
This note discusses two independent forms of FLASH usage:
- Flash Cache, which is a database feature available in 12c. In-memory database functionality uses flash cache.
- FLASH DISK GROUP: The ODA X5-2 introduced a hybrid HDD and SSD topology.
The FLASH disk group on the ODA X5-2 is built on four shared NVMe drives, which are accessible from the remote node as well as the local node.
This note will discuss a collection of symptoms that have a common source: Database Flash Cache enabled on two or more nodes.
The FLASH / buffer cache symptoms discussed share one factor: disabling FLASH CACHE works around the problem.
Databases that store data in the FLASH DISK GROUP on the ODA X5-2 can encounter similar symptoms even if FLASH CACHE is not explicitly enabled.
Using Database Flash Cache does not by itself mean you will hit this problem.
There are potentially two or more bugs with similar symptoms but distinct fixes; all share these pre-requisites:
- Flash Enabled or storage on shared FLASH disks
- Two or more nodes ¹
This can occur on any platform that uses Flash Cache on more than the local node.
When the DB Smart Flash Cache feature is used on the ODA X5-2, the remote FLASH CACHE is automatically accessible to the opposing node.
On other platforms, this problem typically requires a RAC configuration with database Flash Cache enabled and accessible on two or more nodes.
Formerly labeled: ODA X5-2: Lock, Hang for Checkpoint (CKPT) or DBwriter (DBWR), 'GC CURRENT REQUEST' or Intermittent Bad Performance, High CPU and / or Very Poor IO
* By default, the Oracle Database Appliance (ODA) X5-2 automatically enables remote access to the FLASH DISK GROUP* on both nodes if the cache size <> 0, regardless of the instance type, even when using an Enterprise Edition (EE) single-instance database.
Please upgrade to the newest version of the ODA if you use FLASH on the X5-2. Several single patches and merges are also available or in the process of being created.
Note: Oracle Database Appliance X5-2 introduces four additional 400 GB SSDs in slot numbers 16-19 that can be used to host database files, or they can be used as a database flash cache in addition to the buffer cache.
An ASM diskgroup named +FLASH with Normal Redundancy is provisioned on these SSDs. All of the storage in the +FLASH diskgroup is allocated to an ASM Dynamic Volume (flashdata), and formatted as an ACFS file system.
Storage in this flashdata file system is then used to create database flash cache files that accelerate read operations.
The file that contains the flash cache is automatically created for each database and is specified using the database init.ora parameter db_flash_cache_file.
By default, flash_cache_file_size is set to 3 times the size of SGA, up to 196 GB, unless there is not enough space, in which case the size parameter is set to 0.
Changing the flash_cache_file_size parameter requires restarting the database in order to use the newly sized flash cache.
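The default sizing rule described above can be sketched as a small calculation. This is only an illustration of the stated rule (3 times the SGA, capped at 196 GB, or 0 when space is short), not the appliance's actual provisioning logic. The function name, its parameters, the use of binary gigabytes, and the reading of "not enough space" as "the target size exceeds the free space in the flashdata file system" are all assumptions.

```python
GB = 1024 ** 3  # assumption: "GB" is treated as a binary gigabyte here


def default_flash_cache_size(sga_bytes, free_flash_bytes,
                             multiplier=3, cap_bytes=196 * GB):
    """Hypothetical helper: default flash cache size in bytes.

    Applies the rule from the note: 3 x SGA, capped at 196 GB,
    or 0 when the target does not fit in the free flash space
    (that last interpretation is an assumption).
    """
    target = min(multiplier * sga_bytes, cap_bytes)
    return target if target <= free_flash_bytes else 0


if __name__ == "__main__":
    # A 16 GB SGA with ample free flash space yields a 48 GB flash cache.
    print(default_flash_cache_size(16 * GB, 400 * GB) // GB)
```

A result of 0 is worth noting because the pre-requisites in this note depend on the cache size being non-zero.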
Rediscovery
A RAC or Clustered* configuration experiences very poor performance², up to and including complete hangs.
Pre-requisite
db_flash_cache_size <> 0
X5-2 ODA or RAC / Cluster configuration
Database flash cache enabled on two or more nodes
or remote Storage enabled in the Flash Diskgroup
Avoids symptoms
Disable db flash cache
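One way to read the pre-requisites above is as a simple predicate over a configuration. The sketch below is an interpretation, not an Oracle-supplied check: the `ClusterConfig` type, its field names, and the exact way the conditions are combined are assumptions made for illustration. It treats remote storage in the Flash Diskgroup as sufficient on its own, following the earlier statement that similar symptoms can occur even when FLASH CACHE is not explicitly enabled.

```python
from dataclasses import dataclass


@dataclass
class ClusterConfig:
    """Hypothetical summary of the settings this note cares about."""
    db_flash_cache_size: int              # bytes; 0 means flash cache disabled
    nodes_with_flash_cache: int           # nodes where flash cache is enabled
    remote_flash_diskgroup_storage: bool  # data stored on shared FLASH disks


def meets_prerequisites(cfg: ClusterConfig) -> bool:
    """Return True if the configuration matches the listed pre-requisites."""
    flash_cache_on_multiple_nodes = (
        cfg.db_flash_cache_size != 0 and cfg.nodes_with_flash_cache >= 2
    )
    return flash_cache_on_multiple_nodes or cfg.remote_flash_diskgroup_storage


if __name__ == "__main__":
    exposed = ClusterConfig(48 * 1024 ** 3, 2, False)
    disabled = ClusterConfig(0, 2, False)
    print(meets_prerequisites(exposed), meets_prerequisites(disabled))
```

Matching the pre-requisites only means the configuration is exposed; as stated earlier, using Database Flash Cache does not by itself mean the problem will occur.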
SYMPTOMS
High CPU
- DBWR 90% up to 100%
- DBWR for parallel slave processes
- CKPT sessions WAITING between nodes with DBWR process
- Very high CPU Elapsed time for SQL
Flash metrics confirm active usage during slow performance time windows:
NOTE - The existence of these metrics does not indicate a problem in progress:
These AWR metrics will confirm Flash Cache usage:
- flash cache inserts
- flash cache eviction: invalidated
- flash cache insert skip: DBWR overloaded ***
- flash cache insert skip: not current
- flash cache insert skip: not useful
Also - the following can grow substantially during the problem period - free buffer inspected ***
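The statistics listed above could be compared across two snapshots with a sketch like the following. It assumes the statistic values have already been extracted (for example, from two AWR reports) into plain dictionaries keyed by statistic name; the function names are hypothetical, and no threshold is applied because this note does not give one. Consistent with the NOTE above, non-zero deltas only confirm flash cache usage and do not by themselves indicate a problem.

```python
# Statistic names as listed in this note.
FLASH_STATS = [
    "flash cache inserts",
    "flash cache eviction: invalidated",
    "flash cache insert skip: DBWR overloaded",
    "flash cache insert skip: not current",
    "flash cache insert skip: not useful",
]

# Statistics this note marks with "***".
STARRED_STATS = [
    "flash cache insert skip: DBWR overloaded",
    "free buffer inspected",
]


def flash_stat_deltas(begin, end):
    """Return end-minus-begin deltas; missing statistics count as 0."""
    names = FLASH_STATS + ["free buffer inspected"]
    return {n: end.get(n, 0) - begin.get(n, 0) for n in names}


def flash_cache_in_use(deltas):
    """True if any flash cache statistic advanced between the snapshots."""
    return any(deltas[n] > 0 for n in FLASH_STATS)


def starred_growth(deltas):
    """Return the '***' statistics that grew, with their deltas."""
    return {n: deltas[n] for n in STARRED_STATS if deltas[n] > 0}
```

Comparing `starred_growth` for a slow time window against a normal window is one way to judge whether the growth is substantial for a given workload.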
* Note that the ODA can implicitly use both nodes of the FLASH for RAC or RACone.
** Not confirmed or tested against 10.x
NOTE: While this document was originally written for the ODA X5-2, the bug is generic to multiple nodes + FLASH usage on RDBMS 11.2.0.3.x and higher, including 12.x
Several RDBMS fixes now exist for this problem and others are in progress.
We recommend applying these single patches on your 12.x RDBMS versions for now; the fixes will be included in the next ODA Patch Bundle.
Cause