On Oracle Big Data Appliance Hive Import Command Runs Too Slow With Small Amounts of Data After Upgrading to V4.2.0
(Doc ID 2071176.1)
Last updated on DECEMBER 13, 2019
Applies to: Big Data Appliance Integrated Software - Version 4.2.0 and later
After upgrading from BDA 4.0 / CDH 5.1.2 to BDA 4.2 / CDH 5.4.4, different behavior is noticed when using the beeline export and import functions to export a database for storage on tape. The table being imported is partitioned by date, and every partition holds very little data. This is not optimal but cannot be changed. The import is more verbose than before, and it is observed that it creates a MapReduce (MR) job for every partition it imports. Because the data is so small, creating and running each job takes far longer than importing the data itself. As a result, importing a table takes much longer than it should.
On BDA 4.0 / CDH 5.1.2 the imports ran faster.
How can this behavior be changed, either so that each job handles more partitions or so that the process runs faster in some other way?