Adding Partitions to a Table with Impala Takes Overly Long on BDA V4.0 as the Number of Partitions Increases
(Doc ID 1967599.1)
Last updated on NOVEMBER 08, 2022
Applies to:
Big Data Appliance Integrated Software - Version 4.0 and later
Linux x86-64
Symptoms
With Impala 1.4.2 and CDH 5.1.2, creating a table with 4000 partitions by running a job that adds one partition at a time slows down dramatically after about 400 partitions. The time needed to add each partition keeps growing as the partition count increases.
In the example here:
+---------------------------------+
| count(<TABLE1>) |
+---------------------------------+
| 398 |
+---------------------------------+
with 398 partitions, an insert statement into a new partition takes about 80 seconds, while the insert into the initial partition of the table took about 5-10 seconds.
The issue seems to be that the Hive metastore get_partitions_by_names call is made every time a partition is added, and each call takes upwards of a minute.
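To confirm the slowdown described above on a given cluster, it can help to record how long each partition takes to add. The following is a minimal Python sketch, not part of the original note. The table name `sales_by_day`, the partition column `day_id`, and the helper names are hypothetical, and the sketch uses ALTER TABLE ... ADD PARTITION for illustration even though the job in this note used insert statements. The `execute` argument is assumed to be any callable that sends one statement to Impala, for example a DB-API cursor's `execute` method.

```python
import time
from typing import Callable, Iterable, List, Tuple


def add_partition_statement(table: str, column: str, value: int) -> str:
    """Build one ALTER TABLE ... ADD PARTITION statement (integer partition key)."""
    return f"ALTER TABLE {table} ADD PARTITION ({column}={value})"


def time_statements(
    statements: Iterable[str],
    execute: Callable[[str], object],
    clock: Callable[[], float] = time.perf_counter,
) -> List[Tuple[str, float]]:
    """Run each statement through `execute` and record its elapsed seconds."""
    timings = []
    for sql in statements:
        start = clock()
        execute(sql)  # assumption: blocks until the statement completes
        timings.append((sql, clock() - start))
    return timings


if __name__ == "__main__":
    # Hypothetical table and column names; replace with real ones.
    stmts = [add_partition_statement("sales_by_day", "day_id", i) for i in range(5)]
    # Stand-in executor that does nothing; pass a real client's execute here.
    for sql, seconds in time_statements(stmts, lambda sql: None):
        print(f"{seconds:8.3f}s  {sql}")
```

Plotting or simply listing the recorded times against the partition index should show whether the per-partition cost grows with the number of partitions already in the table, as reported here.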
Cause