Last updated on JUNE 21, 2016
Applies to: Oracle Data Integrator - Version 184.108.40.206.0 to 220.127.116.11.0 [Release 12c]
Information in this document applies to any platform.
How to modify Oracle Data Integrator (ODI) Hive KMs to create Parquet tables.
Full support for Hadoop technology and improved Hive KMs were included starting with 18.104.22.168.1 and carried into 12.2.1. In those versions, the Hive KMs create and store tables in the default serialization format (text).
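At the Hive level, the difference comes down to the storage clause of the CREATE TABLE statement. Without a STORED AS clause, Hive uses its configured default format (normally TEXTFILE). In Hive 0.13 and later, STORED AS PARQUET creates a Parquet table instead. The Python sketch below only illustrates the DDL a modified KM would need to emit, with the format made explicit. `build_hive_ddl` and the sample columns are hypothetical. The sketch does not reproduce the ODI KM templates or their substitution API.

```python
from typing import List, Tuple

# Subset of Hive STORED AS formats relevant to this article.
SUPPORTED_FORMATS = {"TEXTFILE", "PARQUET"}


def build_hive_ddl(table: str, columns: List[Tuple[str, str]],
                   file_format: str = "TEXTFILE") -> str:
    """Return a Hive CREATE TABLE statement for the given storage format.

    Hypothetical helper for illustration only; it is not part of ODI or its KMs.
    """
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported file format: {file_format}")
    if not columns:
        raise ValueError("at least one column is required")
    # One "name TYPE" entry per line, separated by commas.
    col_list = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return f"CREATE TABLE {table} (\n  {col_list}\n)\nSTORED AS {fmt}"


if __name__ == "__main__":
    cols = [("customer_id", "INT"), ("name", "STRING"), ("balance", "DOUBLE")]
    print(build_hive_ddl("customers_text", cols))               # text (default)
    print(build_hive_ddl("customers_parquet", cols, "PARQUET"))  # Parquet
```

Only the final clause changes between the two statements. This is why switching a KM from text to Parquet is mainly a matter of changing the generated storage clause.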
Text storage works well for narrow datasets. With very wide datasets, however, where hundreds of columns are not unusual, it becomes a performance problem.
Switching to the Parquet format is desirable for the following reasons:
- Parquet files take up far less space than the equivalent text files.
- Other workflows that need to read only a few columns from a table can retrieve them from Parquet much faster than from text.
- Because Parquet is a columnar format, scenarios that join Parquet tables also perform much better.
In short, Parquet's columnar storage makes it the better choice for these tables.
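The column-reading argument can be shown with a toy in-memory model. It is not a measurement of Hive or Parquet, and it ignores compression, encoding and I/O. It simply counts how many field values must be visited to extract a few columns. With a row layout, as in delimited text, every field of every line is read and split. With a columnar layout, only the requested columns are touched. All names in the sketch are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Row = Sequence[str]


def project_rows(rows: List[Row], names: List[str],
                 wanted: List[str]) -> Tuple[List[Tuple[str, ...]], int]:
    """Row layout: every field of every row is visited to extract `wanted`."""
    idx = [names.index(w) for w in wanted]
    scanned = 0
    out = []
    for row in rows:
        scanned += len(row)  # models reading and splitting the whole text line
        out.append(tuple(row[i] for i in idx))
    return out, scanned


def to_columnar(rows: List[Row], names: List[str]) -> Dict[str, List[str]]:
    """Pivot row-oriented records into one list per column."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def project_columns(columns: Dict[str, List[str]],
                    wanted: List[str]) -> Tuple[List[Tuple[str, ...]], int]:
    """Columnar layout: only the requested columns are visited."""
    scanned = 0
    selected = []
    for w in wanted:
        scanned += len(columns[w])  # only this column's values are read
        selected.append(columns[w])
    return list(zip(*selected)), scanned


if __name__ == "__main__":
    names = [f"c{c}" for c in range(300)]  # a "very wide" table
    rows = [[f"r{r}c{c}" for c in range(300)] for r in range(1000)]
    wanted = ["c5", "c42"]

    by_row, row_cost = project_rows(rows, names, wanted)
    by_col, col_cost = project_columns(to_columnar(rows, names), wanted)
    assert by_row == by_col  # same result, different amount of work
    print("row layout values visited:     ", row_cost)
    print("columnar layout values visited:", col_cost)
```

For this 1,000-row, 300-column example, the row layout visits 300,000 values to return two columns, while the columnar layout visits 2,000. The gap grows with the width of the table, which is the situation described above.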