Hive Job Fails With OutOfMemory In Shuffle Phase in Oracle Big Data Appliance Mammoth V4.0.0 (Doc ID 1934690.1)

Last updated on OCTOBER 15, 2014

Applies to:

Big Data Appliance Integrated Software - Version 4.0 and later
Linux x86-64

Symptoms

A "large" Hive job is run from the Hive CLI on Oracle Big Data Appliance Mammoth v4.0.0.

The job runs until the shuffle phase, where it fails with an OutOfMemoryError (Java heap space). The Java client heap size was increased from 2GB to 6GB and then to 20GB, but the error still occurs. The same job succeeds when run from HUE.

Error stack for OutOfMemory:

Diagnostic Messages for this Task:
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#2
     at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
     at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:415)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1554)
     at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.OutOfMemoryError: Java heap space
     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:56)
     at org.apache.hadoop.io.BoundedByteArrayOutputStream.<init>(BoundedByteArrayOutputStream.java:46)
     at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.<init>(InMemoryMapOutput.java:63)
     at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.unconditionalReserve(MergeManagerImpl.java:297)
     at org.apache.hadoop.mapreduce.task.reduce.MergeManagerImpl.reserve(MergeManagerImpl.java:287)
     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyMapOutput(Fetcher.java:411)
     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.copyFromHost(Fetcher.java:341)
     at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:165)


FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 8  Reduce: 1   Cumulative CPU: 129.48 sec   HDFS Read: 674964727 HDFS Write: 69605039 SUCCESS
Job 1: Map: 8  Reduce: 2   Cumulative CPU: 165.13 sec   HDFS Read: 1160921292 HDFS Write: 308334845 SUCCESS
Job 2: Map: 1  Reduce: 1   Cumulative CPU: 34.79 sec   HDFS Read: 69605406 HDFS Write: 69605039 SUCCESS
Job 3: Map: 3  Reduce: 1   Cumulative CPU: 118.25 sec   HDFS Read: 607991216 HDFS Write: 144802984 SUCCESS
Job 4: Map: 3  Reduce: 1   Cumulative CPU: 39.56 sec   HDFS Read: 277264031 HDFS Write: 155394962 SUCCESS
Job 5: Map: 1   Cumulative CPU: 13.89 sec   HDFS Read: 155395329 HDFS Write: 164776775 SUCCESS
Job 6: Map: 7  Reduce: 2   Cumulative CPU: 80.4 sec   HDFS Read: 1418400014 HDFS Write: 215346100 SUCCESS
Job 7: Map: 5  Reduce: 1   Cumulative CPU: 106.8 sec   HDFS Read: 953791293 HDFS Write: 264304721 SUCCESS
Job 8: Map: 1   Cumulative CPU: 30.74 sec   HDFS Read: 264307379 HDFS Write: 300180590 SUCCESS
Job 9: Map: 16  Reduce: 4   Cumulative CPU: 257.28 sec   HDFS Read: 4236034590 HDFS Write: 3116981278 SUCCESS
Job 10: Map: 11   Cumulative CPU: 900.51 sec   HDFS Read: 3117035077 HDFS Write: 41102318600 SUCCESS
Job 11: Map: 153  Reduce: 39   Cumulative CPU: 8386.41 sec   HDFS Read: 41172860983 HDFS Write: 181805859450 SUCCESS
Job 12: Map: 673  Reduce: 170   Cumulative CPU: 53823.36 sec   HDFS Read: 181829442073 HDFS Write: 192764984257 SUCCESS
Job 13: Map: 701  Reduce: 181   Cumulative CPU: 31004.33 sec   HDFS Read: 193546350127 HDFS Write: 1012571624 FAIL
Total MapReduce CPU Time Spent: 1 days 2 hours 24 minutes 50 seconds 930 msec
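
When a query launches many stages, it can help to pull the failing stage out of the "MapReduce Jobs Launched" summary programmatically. The following is a minimal sketch, not part of the original note. It assumes the summary lines have exactly the layout shown above; the names SUMMARY_LINE, parse_summary and failed_jobs are hypothetical helpers introduced only for illustration.

```python
import re

# One line of the Hive CLI stage summary, in the layout shown in the log above.
# The "Reduce:" field is optional because map-only stages omit it.
SUMMARY_LINE = re.compile(
    r"Job (?P<job>\d+): Map: (?P<maps>\d+)\s+"
    r"(?:Reduce: (?P<reduces>\d+)\s+)?"
    r"Cumulative CPU: (?P<cpu>[\d.]+) sec\s+"
    r"HDFS Read: (?P<read>\d+) HDFS Write: (?P<write>\d+) "
    r"(?P<status>SUCCESS|FAIL)"
)

# Three summary lines copied from the log above.
SAMPLE = """\
Job 5: Map: 1   Cumulative CPU: 13.89 sec   HDFS Read: 155395329 HDFS Write: 164776775 SUCCESS
Job 12: Map: 673  Reduce: 170   Cumulative CPU: 53823.36 sec   HDFS Read: 181829442073 HDFS Write: 192764984257 SUCCESS
Job 13: Map: 701  Reduce: 181   Cumulative CPU: 31004.33 sec   HDFS Read: 193546350127 HDFS Write: 1012571624 FAIL
"""


def parse_summary(text):
    """Return one dict per summary line found in text."""
    jobs = []
    for m in SUMMARY_LINE.finditer(text):
        jobs.append({
            "job": int(m.group("job")),
            "maps": int(m.group("maps")),
            # Map-only stages have no Reduce field; record them as 0 reducers.
            "reduces": int(m.group("reduces") or 0),
            "cpu_sec": float(m.group("cpu")),
            "hdfs_read_bytes": int(m.group("read")),
            "hdfs_write_bytes": int(m.group("write")),
            "status": m.group("status"),
        })
    return jobs


def failed_jobs(jobs):
    """Return only the stages whose status is FAIL."""
    return [j for j in jobs if j["status"] == "FAIL"]


if __name__ == "__main__":
    for j in failed_jobs(parse_summary(SAMPLE)):
        read_gib = j["hdfs_read_bytes"] / 2**30
        print(f"Job {j['job']} failed: {j['maps']} maps, "
              f"{j['reduces']} reducers, {read_gib:.1f} GiB read from HDFS")
```

Run against the full summary, this singles out Job 13, the stage whose reducers hit the shuffle error shown in the stack trace.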

Cause
