How To Merge Small Parquet Files into Larger Files (Doc ID 2263910.1)

Last updated on MAY 09, 2017

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 and later
Linux x86-64

Goal

How can small Parquet files created by Spark be merged into larger ones?

By default, Spark SQL uses 200 shuffle partitions (spark.sql.shuffle.partitions), so a job that shuffles runs 200 reduce tasks and can write up to 200 small files.

Hive provides a way to combine small ORC files into larger ones, but it does not work for Parquet files.

For example, for ORC you can use:

ALTER TABLE table_name [PARTITION (partition_key = 'partition_value')] CONCATENATE
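
For Parquet, one commonly used approach is to rewrite the data with Spark using fewer partitions. The following is only a minimal sketch of that idea and is not taken from this note. The helper names target_file_count and compact_parquet, the 128 MiB target file size, and the use of separate source and destination paths are all assumptions made for illustration. The caller is expected to supply an existing SparkSession and the total size of the input in bytes.

```python
import math

# Assumed target size per output file (128 MiB). This is an illustrative
# choice; adjust it to the block size used on the cluster.
DEFAULT_TARGET_BYTES = 128 * 1024 * 1024


def target_file_count(total_bytes, target_file_bytes=DEFAULT_TARGET_BYTES):
    """Return how many output files keep each file near target_file_bytes."""
    if total_bytes < 0 or target_file_bytes <= 0:
        raise ValueError("sizes must be non-negative and target must be positive")
    # Always write at least one file, even for empty or tiny inputs.
    return max(1, math.ceil(total_bytes / target_file_bytes))


def compact_parquet(spark, src_path, dst_path, total_bytes):
    """Rewrite the Parquet data under src_path into fewer files under dst_path.

    spark is an existing SparkSession (hypothetical caller-supplied object).
    dst_path must differ from src_path: overwriting a location that is
    still being read is not safe.
    """
    n = target_file_count(total_bytes)
    df = spark.read.parquet(src_path)
    # coalesce() reduces the number of partitions without a full shuffle;
    # each remaining partition is written as one output file.
    df.coalesce(n).write.mode("overwrite").parquet(dst_path)
    return n
```

For instance, 200 files of about 5 MiB each (1000 MiB in total) give target_file_count(200 * 5 * 1024 * 1024) == 8, so the data would be rewritten as 8 files. coalesce() is used here because it avoids a shuffle; repartition(n) is an alternative when the resulting files need to be more evenly sized.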


 

Solution
