
Adding Partitions to a Table with Impala Takes Overly Long on BDA V4.0 as the Number of Partitions Increases (Doc ID 1967599.1)

Last updated on NOVEMBER 08, 2022

Applies to:

Big Data Appliance Integrated Software - Version 4.0 and later
Linux x86-64

Symptoms

With Impala 1.4.2 and CDH 5.1.2, a job that builds a table with 4000 partitions by adding one partition at a time slows down dramatically after about 400 partitions. The more partitions the table already has, the longer it takes to add the next one.
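A job like the one described issues one statement per partition. The following is a minimal Python sketch of such a driver. It assumes a hypothetical target table `sales` partitioned by `part_id` and a hypothetical source table `staging` with columns `c1` and `c2`. It only generates the Impala INSERT statements and does not connect to Impala.

```python
# Sketch of a "one partition at a time" job. The table names (sales, staging)
# and column names (part_id, c1, c2) are hypothetical placeholders.

def partition_inserts(n_partitions: int, target: str = "sales",
                      source: str = "staging") -> list:
    """Return one static-partition INSERT statement per partition id."""
    stmts = []
    for pid in range(1, n_partitions + 1):
        # Each statement creates (or writes into) exactly one partition.
        stmts.append(
            f"INSERT INTO {target} PARTITION (part_id={pid}) "
            f"SELECT c1, c2 FROM {source} WHERE part_id = {pid};"
        )
    return stmts


if __name__ == "__main__":
    statements = partition_inserts(4000)
    print(len(statements))
    print(statements[0])
```

Running the statements sequentially means every insert after the first is executed against a table that already has all earlier partitions, which is where the slowdown shows up.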

In the following example, the table already has 398 partitions:

select count(<TABLE1>) from <TABLE2>
+-----------------+
| count(<TABLE1>) |
+-----------------+
| 398             |
+-----------------+

With 398 partitions, an insert into a new partition takes about 80 seconds, whereas the first partition was inserted in about 5-10 seconds.

The issue appears to be that the Hive metastore get_partitions_by_names call is made every time a partition is added, and this call takes upwards of a minute.
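To see why a per-partition metastore lookup hurts so much, consider a toy cost model. This is an assumption for illustration, not a measurement of Impala or the Hive metastore: if adding a partition requires fetching metadata for every partition that already exists, the cost of one insert grows linearly with the partition count, and the total cost of the whole job grows quadratically.

```python
# Toy cost model (assumption, not measured behavior): adding a partition costs
# a fixed base amount plus an amount proportional to the partitions already present.

def per_insert_cost(existing: int, base: float = 1.0,
                    per_partition: float = 1.0) -> float:
    """Cost of adding one partition when `existing` partitions already exist."""
    return base + per_partition * existing


def cumulative_cost(n: int, base: float = 1.0,
                    per_partition: float = 1.0) -> float:
    """Total cost of adding n partitions one at a time to an empty table."""
    return sum(per_insert_cost(k, base, per_partition) for k in range(n))


if __name__ == "__main__":
    for n in (400, 4000):
        print(n, per_insert_cost(n), cumulative_cost(n))
```

Under this model, going from 400 to 4000 partitions makes each individual insert about 10 times more expensive but multiplies the total work of the job by roughly 100.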

Cause


