Adding Partitions to a Table with Impala Takes Overly Long on BDA V4.0 as the Number of Partitions Increases (Doc ID 1967599.1)

Last updated on FEBRUARY 04, 2015

Applies to:

Big Data Appliance Integrated Software - Version 4.0 and later
Linux x86-64

Symptoms

With Impala 1.4.2 and CDH 5.1.2, creating a table with 4000 partitions by running a job to add a partition at a time shows that the process dramatically slows down after about 400 partitions.  As the number of partitions added increases the process of adding them slows down.

In the examplehere:

select count(distinct dib_prod_level2) from dib_trans_prod_lev1_final
+---------------------------------+
| count(distinct dib_prod_level2) |
+---------------------------------+
| 398 |
+---------------------------------+

with 398 partitions, the insert statement (see below) to a new partition takes about 80 seconds while the initial partition inserted to the table in about 5-10 seconds.

Insert Statement:

insert into table <name>.x_trans_prod_lev1_final partition(x_prod_level2='46610') select x_prod_id, x_bask_id,x_cust_id,x_quantity,x_shop_date,x_spend,x_store_id,x_time_code,x_pref_store_id from <name>.x.trans_prod_lev1 where x_prod_level2='46610'

The issue seems to be that the hive metastore get_partitions_by_names command is being called every time a partition is added and this is taking upwards of a minute.

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms