On Oracle Big Data Appliance, Hive IMPORT Command Runs Too Slowly with Small Amounts of Data After Upgrading to V4.2.0
Last updated on JANUARY 21, 2018
Applies to: Big Data Appliance Integrated Software - Version 4.2.0 and later
After upgrading from BDA 4.0 / CDH 5.1.2 to BDA 4.2 / CDH 5.4.4, the beeline export and import functions, which are used to export a database so it can be stored on tape, behave differently. The table being imported is partitioned by date, and each partition holds very little data. This layout is not optimal, but it cannot be changed. The import output is now more verbose than before, and the import launches a separate MapReduce (MR) job for every partition it imports. Because each partition's data is so small, creating and running each job takes far longer than importing the data itself, so importing a table now takes much longer overall.
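A rough cost model makes the slowdown described above concrete. Assume, purely for illustration, that importing a table with $P$ partitions launches one MR job per partition, that each job carries a fixed setup and scheduling overhead $t_{\text{job}}$, and that moving partition $i$'s data takes $t_{\text{data},i}$. These symbols are hypothetical and are not measured values from this case:

$$
T_{\text{import}} \approx \sum_{i=1}^{P}\left(t_{\text{job}} + t_{\text{data},i}\right)
= P\,t_{\text{job}} + \sum_{i=1}^{P} t_{\text{data},i}
\approx P\,t_{\text{job}} \quad \text{when } t_{\text{job}} \gg t_{\text{data},i}.
$$

Under these assumptions, tiny partitions make the total import time grow roughly linearly with the number of partitions, driven almost entirely by per-job overhead rather than by data volume.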
On BDA 4.0 / CDH 5.1.2, the same imports ran faster.
How can this behavior be changed so that each job handles more partitions, or how else can the import process be sped up?