After Upgrade to BDA V4.5 from BDA V4.4 with Big Data SQL Installed Spark Jobs Fail Due to Reduced Container Memory and Reduced Cgroup Memory Hard Limit

(Doc ID 2172153.1)

Last updated on AUGUST 15, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 to 4.5.0 [Release 4.5]
Linux x86-64

Symptoms

After an upgrade to BDA V4.5 from BDA V4.4 with Big Data SQL installed, Spark jobs fail with:

An error occurred while calling o1315.collectToPython.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 117 in stage 130.0 failed 4 times, most recent failure: Lost task 117.3 in stage 130.0 (TID 9778, bdanode0x.example.com): ExecutorLostFailure (executor 124 exited caused by one of the running tasks) Reason: Container killed by YARN for exceeding memory limits. 1.5 GB of 1.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
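
The error text itself suggests increasing spark.yarn.executor.memoryOverhead. As a minimal sketch only (the 2048 MB overhead and the other settings are placeholder assumptions, not the documented resolution for this issue), the Spark 1.x property can be supplied to a PySpark job as follows:

from pyspark import SparkConf, SparkContext

# Placeholder values for illustration only; size these to the actual workload.
conf = (
    SparkConf()
    .setAppName("memory-overhead-sketch")                # hypothetical application name
    .set("spark.executor.memory", "4g")                  # placeholder executor heap size
    .set("spark.yarn.executor.memoryOverhead", "2048")   # off-heap overhead in MB (Spark 1.x property name)
)
sc = SparkContext(conf=conf)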

Additional symptoms:

1. Container memory on the nodes was reduced during the upgrade, as shown in the table below.

Property                    Value After Upgrade    Value Before Upgrade
--------------------------  ---------------------  ----------------------
Container Memory            17664 MiB              35328 MiB
Cgroup Memory Hard Limit    17664 MiB              35328 MiB

These changes can be seen in Cloudera Manager (CM) by navigating to: yarn > Configuration > History and Rollback.
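
The CM Container Memory setting corresponds to the yarn.nodemanager.resource.memory-mb property in yarn-site.xml. As a minimal sketch for double-checking the value in effect on a node (the /etc/hadoop/conf path is an assumption; on a CM-managed node the NodeManager's generated configuration may live under the Cloudera Manager agent's process directory instead):

import xml.etree.ElementTree as ET

# Assumed location of the configuration on the node being checked.
YARN_SITE = "/etc/hadoop/conf/yarn-site.xml"

tree = ET.parse(YARN_SITE)
for prop in tree.getroot().iter("property"):
    if prop.findtext("name") == "yarn.nodemanager.resource.memory-mb":
        # Container memory available to YARN on this node, in MiB.
        print("yarn.nodemanager.resource.memory-mb =", prop.findtext("value"))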

Cause
