How To Merge Small Parquet Files into Larger Files
Last updated on MAY 09, 2017
Applies to: Big Data Appliance Integrated Software - Version 4.5.0 and later
This note addresses how to merge the small Parquet files created by Spark into larger ones.
By default, Spark uses 200 shuffle partitions (spark.sql.shuffle.partitions), so a job with a shuffle stage runs 200 reducers and in turn writes 200 small files.
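One common way to end up with fewer, larger files is to reduce the number of partitions before writing. The following is a minimal PySpark sketch of that idea, not the solution documented in this note: the function names, the 128 MB target size, and the paths in the usage comment are all assumptions chosen for illustration.

```python
import math


def target_file_count(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Return how many output files are needed so that each file is
    roughly target_file_bytes in size (128 MB here is an assumed target)."""
    if total_bytes <= 0:
        return 1
    return max(1, math.ceil(total_bytes / target_file_bytes))


def compact_parquet(spark, src_path, dst_path, num_files):
    """Read the small Parquet files under src_path and rewrite them
    under dst_path using num_files partitions.

    spark is an existing SparkSession. dst_path should differ from
    src_path, because overwriting a path while reading from it is unsafe.
    """
    df = spark.read.parquet(src_path)
    # coalesce() lowers the partition count without a full shuffle;
    # Spark then writes roughly one Parquet file per partition.
    df.coalesce(num_files).write.mode("overwrite").parquet(dst_path)


# Hypothetical usage (paths and input size are placeholders):
#   from pyspark.sql import SparkSession
#   spark = SparkSession.builder.appName("compact-parquet").getOrCreate()
#   n = target_file_count(total_bytes=10 * 1024 ** 3)
#   compact_parquet(spark, "/data/events_small", "/data/events_compacted", n)
```

If the small files come from a shuffle in the same job, lowering spark.sql.shuffle.partitions for that job may avoid creating them in the first place. When the data is skewed, repartition() can be used in place of coalesce() to get more evenly sized files, at the cost of a full shuffle.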
A solution exists for combining small ORC files into larger ones, but it does not work for Parquet files.
For example, for ORC you can use: