How To Modify ODI 12c Hive KMs to Create Parquet Tables Instead of Default Serialization (Text)
(Doc ID 2149684.1)
Last updated on JUNE 16, 2022
Applies to: Oracle Data Integrator - Version 18.104.22.168.0 to 22.214.171.124.0 [Release 12c]
Information in this document applies to any platform.
How to modify Oracle Data Integrator (ODI) Hive KMs to create Parquet tables.
Full support for Hadoop technology and improved Hive KMs were included starting with 126.96.36.199.1 and into 12.2.1. In those versions, the Hive KMs create and store tables in the default serialization (text).
This works fine for narrow datasets, but for very wide datasets, where hundreds of columns are not unusual, text storage becomes a performance issue.
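To make the cost of wide text tables concrete, the following is a toy model only, not a description of Hive or Parquet internals. It counts how many field values a reader has to touch to fetch a few columns from a row-oriented layout versus a column-oriented one. The function names and the sizes used are illustrative assumptions.

```python
# Toy model only: counts field values touched, ignoring compression,
# encoding, and I/O block sizes that real Hive and Parquet readers deal with.


def values_touched_row_layout(num_rows: int, num_cols: int) -> int:
    """Delimited text is parsed row by row, so every field of every row
    is scanned even when only a few columns are needed."""
    return num_rows * num_cols


def values_touched_columnar_layout(num_rows: int, cols_needed: int) -> int:
    """A columnar layout stores each column contiguously, so a reader
    can skip the columns it does not need."""
    return num_rows * cols_needed


# Hypothetical sizes: one million rows, 300 columns, 2 columns wanted.
rows, total_cols, wanted_cols = 1_000_000, 300, 2
row_cost = values_touched_row_layout(rows, total_cols)
col_cost = values_touched_columnar_layout(rows, wanted_cols)

print(f"row layout:      {row_cost:,} values")
print(f"columnar layout: {col_cost:,} values")
print(f"ratio:           {row_cost // col_cost}x")
```

In this simplified model the gap grows linearly with the table's width, which is why the problem only becomes visible on wide datasets.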
Switching to the Parquet format is desirable for the following reasons:
- Parquet takes up far less space than text.
- Other workflows that only need to read a few columns from the table can retrieve them from Parquet much faster than from text.
- Parquet is a columnar storage format, so scenarios that join Parquet tables also run much better.
As a columnar storage format, Parquet is therefore the better choice here.
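The visible part of this note does not show the KM change itself, so the following is only a sketch of the difference in the generated HiveQL, not the KM's actual code. It assumes the change amounts to appending Hive's `STORED AS PARQUET` clause to the `CREATE TABLE` statement. The helper `build_create_table` and the table and column names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def build_create_table(
    table: str,
    columns: Sequence[Tuple[str, str]],
    stored_as: Optional[str] = None,
) -> str:
    """Return a HiveQL CREATE TABLE statement.

    With stored_as=None no STORED AS clause is emitted, so Hive falls back
    to its default file format (text unless configured otherwise).
    """
    column_list = ",\n".join(f"  {name} {hive_type}" for name, hive_type in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS {table} (\n{column_list}\n)"
    if stored_as:
        ddl += f"\nSTORED AS {stored_as.upper()}"
    return ddl


# Hypothetical table and columns, for illustration only.
columns = [("customer_id", "BIGINT"), ("region", "STRING"), ("revenue", "DOUBLE")]
default_ddl = build_create_table("sales_wide", columns)
parquet_ddl = build_create_table("sales_wide", columns, stored_as="parquet")

print(default_ddl)
print(parquet_ddl)
```

The two statements differ only in the trailing storage clause, which suggests the adjustment to a KM's table-creation step can stay small.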