On Oracle Big Data Appliance Hive Import Command Runs Too Slow With Small Amounts of Data After Upgrading to V4.2.0 (Doc ID 2071176.1)

Last updated on OCTOBER 28, 2015

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64

Goal

After upgrading from BDA 4.0 / CDH 5.1.2 to BDA 4.2 / CDH 5.4.4, different behavior is noticed when using the beeline export and import functions to export a database for storage on tape. The table being imported is partitioned by date, and every partition holds very little data. This is not optimal but cannot be changed. The import output is more verbose than before, and the import now creates a MapReduce (MR) job for every partition it imports. Because the data is so small, creating and running each job takes far longer than importing the data itself, so importing a table takes much more time than it should.

On BDA 4.0 / CDH 5.1.2 the imports ran faster.
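
The effect described above can be illustrated with a rough cost model. This is only a sketch: the function name, its parameters, and all numbers are hypothetical and were not measured on a BDA cluster. It assumes jobs run sequentially and that each job carries a fixed startup cost. The `partitions_per_job` parameter is a modeling knob only and does not correspond to any known Hive setting.

```python
import math


def estimate_import_seconds(num_partitions, job_overhead_s,
                            copy_s_per_partition, partitions_per_job=1):
    """Rough wall-clock estimate for importing a partitioned table.

    Assumes jobs run one after another and each job pays a fixed
    startup cost (job_overhead_s) on top of the time spent copying
    data. All inputs are hypothetical illustration values.
    """
    if partitions_per_job < 1:
        raise ValueError("partitions_per_job must be at least 1")
    # Number of jobs needed to cover every partition.
    num_jobs = math.ceil(num_partitions / partitions_per_job)
    return num_jobs * job_overhead_s + num_partitions * copy_s_per_partition


if __name__ == "__main__":
    # Hypothetical table: 1000 date partitions, 20 s of job startup
    # cost, 0.5 s to copy the data of one small partition.
    one_job_each = estimate_import_seconds(1000, 20.0, 0.5)
    batched = estimate_import_seconds(1000, 20.0, 0.5, partitions_per_job=50)
    print(f"one job per partition: {one_job_each:.0f} s")
    print(f"50 partitions per job: {batched:.0f} s")
```

With these made-up inputs, job startup accounts for 20000 of the 20500 seconds when every partition gets its own job, which is why handling more partitions per job is the natural thing to ask for.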

How can this behavior be changed so that each job handles more partitions, or how else can the speed of the process be improved?
 

Solution
