My Oracle Support Banner

Latency Issues with Compute Incremental Stats on a Wide Impala Table (Doc ID 2022008.1)

Last updated on AUGUST 03, 2021

Applies to:

Big Data Appliance Integrated Software - Version 4.1.0 and later
Information in this document applies to any platform.

Symptoms

NOTE: In the examples that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

On Oracle Big Data Appliance, COMPUTE INCREMENTAL STATS taking a very long time for a wide impala table (large number of columns).

As per Cloudera in general COMPUTE INCREMENTAL STATS takes more time than COMPUTE STATS for the same volume of data. Therefore it is most suitable for tables with large data volume where new partitions are added frequently, making it impractical to run a full COMPUTE STATS operation for each new partition. For unpartitioned tables, or partitioned tables that are loaded once and not updated with new partitions, use the original COMPUTE STATS syntax.

But testing showed that COMPUTE INCREMENTAL STATS takes 8 times longer and a run time of 10 minutes on a wide table with only 100 rows in an otherwise empty table.

Also tried with codegen disabled, which takes about 4 times as long as the normal compute stats statement.

> SET DISABLE_CODEGEN=1;
DISABLE_CODEGEN set to 1
> compute incremental stats stats_test partition(year=2015,month=1,day=1,load_id=29048798);
Query: compute incremental stats stats_test partition(year=2015,month=1,day=1,load_id=29048798)
+-------------------------------------------+
| summary |
+-------------------------------------------+
| Updated 0 partition(s) and 476 column(s). |
+-------------------------------------------+
Fetched 1 row(s) in 15.63s

 > compute incremental stats raw_clickstream partition(year=2015,month=1,day=1,load_id=29048798);
Query: compute incremental stats raw_clickstream partition(year=2015,month=1,day=1,load_id=29048798)
+-------------------------------------------+
| summary |
+-------------------------------------------+
| Updated 1 partition(s) and 476 column(s). |
+-------------------------------------------+
Fetched 1 row(s) in 408.51s

 

Cause

To view full details, sign in with your My Oracle Support account.

Don't have a My Oracle Support account? Click to get started!


In this Document
Symptoms
Cause
Solution


My Oracle Support provides customers with access to over a million knowledge articles and a vibrant support community of peers and Oracle experts.