Hive 'INSERT OVERWRITE' into a Parquet Table Seems to Be Hung Due to Resource Contention (Doc ID 1986431.1)

Last updated on OCTOBER 11, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.1.0 and later
Linux x86-64

Symptoms

An INSERT OVERWRITE into a Parquet table, executed from Beeline, appears to hang: every time, the query gets stuck at 49% and then runs for 5-6 hours without any change in its state. The session settings and statement are:

SET mapred.reduce.tasks=3000;
SET hive.exec.dynamic.partition=true;
SET hive.exec.max.dynamic.partitions.pernode=3000;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=3000;
SET mapreduce.reduce.java.opts=-Xmx3276m;
SET mapreduce.reduce.memory.mb=4096;
USE ...;
INSERT OVERWRITE TABLE GPOS_DO_FADS_parquet PARTITION (SRCE_SYS_ID,SALES_DATE) SELECT ... FROM gpos_do_fads_temporary;
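
These settings follow the common Hadoop convention of sizing the reducer JVM heap at roughly 80% of the YARN container (3276 MB ≈ 0.8 × 4096 MB), leaving headroom for off-heap use. As a minimal sketch, the effective values can be echoed from the same Beeline session before running the job; in Hive, SET with a property name and no value prints the current setting:

-- Print the effective reducer sizing for this session.
SET mapreduce.reduce.memory.mb;   -- container size requested per reducer, in MB
SET mapreduce.reduce.java.opts;   -- JVM flags; -Xmx is ~80% of the container here
SET mapred.reduce.tasks;          -- number of reduce tasks requested for the job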


The following messages repeat in the job logs:

[fetcher#9] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 962, commitMemory -> 781761742, usedMemory ->1091431955
[fetcher#8] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#8 about to shuffle output of map attempt_1424713130367_1972_m_000958_0 decomp: 2 len: 16 to MEMORY
[fetcher#9] org.apache.hadoop.mapreduce.task.reduce.Fetcher: fetcher#9 about to shuffle output of map attempt_1424713130367_1972_m_001216_0 decomp: 2 len: 16 to MEMORY
[fetcher#8] org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput: Read 2 bytes from map-output for attempt_1424713130367_1972_m_000958_0
[fetcher#8] org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 2, inMemoryMapOutputs.size() -> 963, commitMemory -> 781761744, usedMemory ->1091431959
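
The attempt IDs in these messages (m_000958, m_001216) show well over a thousand map tasks, and each of the 3000 requested reducers must fetch one output segment from every map, many of them only 2 bytes after decompression. Beyond that shuffle overhead, 3000 reducers at 4096 MB apiece amount to roughly 12 TB of container memory across the job, so on a shared cluster the reduce tasks queue for resources and the query looks hung rather than failed. As a minimal sketch, assuming the shuffle volume is as small as these segments suggest, the reducer count could be sized to the data instead (the value below is illustrative, not taken from this document):

-- Illustrative only: derive mapred.reduce.tasks from the actual shuffle volume
-- rather than from the dynamic-partition count; 200 is a hypothetical value.
SET mapred.reduce.tasks=200;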

 

Cause
