
Spark on YARN Frequently Asked Questions (Doc ID 1920743.1)

Last updated on JUNE 05, 2021

Applies to:

Big Data Appliance Integrated Software - Version 3.0.1 and later
Linux x86-64

Purpose

 To provide answers to frequently asked questions about Spark on YARN.

Questions and Answers



In this Document
Purpose
Questions and Answers
 How to configure Spark in BDA 3.0.1 to run Spark on YARN?
 What modes can Spark on YARN be run in?
 How to run Spark on YARN in cluster mode so it uses the same shared resources and allocations specified in the YARN dynamic resource pool settings?
 I see that the yarn.log-aggregation.retain-seconds parameter has a value of 7 days. I was under the impression that the 7-day value applies to MR2 (the history server). Does Spark also use the same parameter even though the Spark History Server is not present? Is it the same whether you run Spark in client mode or cluster mode?
 How long do YARN container logs remain for Spark? I am assuming the period is defined in one of the parameters in CM. Where in CM is it defined?
 I don't see the folder /user/spark/applicationHistory. Where is it located?
 How to find logs when running the Spark application?
 The instructions for the SparkPi example use environment variables, but they are not set in our BDA cluster. Where are they set? For example, $SPARK_HOME is used in many places, but it is empty (echo $SPARK_HOME shows nothing). So commands like hdfs dfs -put $SPARK_HOME/assembly/lib/spark-assembly_*.jar /user/spark/share/lib/spark-assembly.jar will fail. Is it because I haven't configured the standalone Spark service yet?
 After configuring Spark in standalone mode, the instructions in Doc ID 1916688.1 say to shut down the Spark services in Cloudera Manager if you wish to run Spark on YARN. Does this mean: 1. Configure in standalone mode; 2. Stop the service in CM; 3. No need to start the service or make any configuration changes, and we just submit the Spark applications in YARN mode?
 I had to stop the Spark services to run Spark on YARN. When do I re-start the service?
 For YARN cluster mode, the argument in step 6 to run the SparkPi example in Doc ID 1916688.1 says --args yarn-standalone. Is it a typo to specify standalone for cluster mode?
 When running the SparkPi example in yarn-client mode, I am not getting any error, but I am not sure exactly what it is supposed to display on the console. I don't see the final output (the Pi value) displayed on the console. Where can I find it?
 Is upgrading Spark on YARN supported on the BDA?
 Is it possible to upgrade Spark2 on the BDA?
 How to change the Spark logging level from INFO to WARN?
 On BDA V4.10/CDH 5.12.1 what do Spark warnings like "WARN metastore.ObjectStore: Version information not found in metastore" indicate?
 In BDA 5.1 with CDH 6.x is ORC a supported format with Spark applications?
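Several of the questions above concern yarn-client versus yarn-cluster mode and where the SparkPi output ends up. A minimal sketch, assuming the Spark 1.x spark-submit syntax shipped with CDH-era BDA releases and an assumed examples-jar location (adjust the path to your install; on Spark 2.x the equivalent is --master yarn --deploy-mode client|cluster):

```shell
# Assumed location of the bundled examples jar; varies by CDH/BDA release.
EXAMPLES_JAR=/opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar

# yarn-client mode: the driver runs on the submitting host, so the
# computed Pi value is printed directly on your console.
spark-submit --master yarn-client \
  --class org.apache.spark.examples.SparkPi \
  "$EXAMPLES_JAR" 10

# yarn-cluster mode: the driver runs inside a YARN container, so the
# Pi value appears in the driver's container log, not on the console.
spark-submit --master yarn-cluster \
  --class org.apache.spark.examples.SparkPi \
  "$EXAMPLES_JAR" 10

# After the application finishes, retrieve aggregated container logs
# (including the cluster-mode driver output) with the YARN CLI:
yarn logs -applicationId <application_id>
```

This is why the SparkPi result is visible on the console in yarn-client mode but must be fetched from the container logs in yarn-cluster mode.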
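For the logging-level question, a common approach is to change the root logger in Spark's log4j configuration (in Cloudera Manager this is typically done through a log4j safety valve rather than by editing files directly). A minimal config sketch, assuming the console appender name used by Spark's default log4j template:

```properties
# Lower driver/executor log verbosity from INFO to WARN.
log4j.rootCategory=WARN, console
```

Alternatively, Spark 1.4 and later allow changing the level at runtime from a running application with sc.setLogLevel("WARN").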
