Spark on Yarn History Server Going into Bad Health in Cloudera Manager with Logs Showing "Exception encountered when attempting to load application log" (Doc ID 2275705.1)

Last updated on June 11, 2017

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 and later
Linux x86-64

Symptoms

The Spark on YARN History Server goes into bad health in Cloudera Manager because it runs out of heap memory. This can happen after an upgrade.

Unless otherwise specified, perform all steps as the 'root' user on the Spark on YARN History Server host, which is Node 3 by default.

Additional symptoms are:

1. Increasing the heap memory for the Spark on YARN History Server does not resolve the issue.

2. The Spark History Server log file, /var/log/spark-history-server-bdanode03.example.com.log, shows an exception raised while loading an application log:

2017-06-10 16:45:00,997 ERROR org.apache.spark.deploy.history.FsHistoryProvider: Exception encountered when attempting to load application log hdfs://<cluster_name>-ns/user/spark/applicationHistory/application_<#>_<#>_1
java.io.IOException: Filesystem closed
at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:857)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2118)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1215)
at org.apache.hadoop.hdfs.DistributedFileSystem$19.doCall(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1211)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1414)
at org.apache.spark.scheduler.EventLoggingListener$.openEventLog(EventLoggingListener.scala:313)
at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:572)
...

3. The CM agent log on the Spark on YARN History Server host, /var/log/cloudera-scm-agent/cloudera-scm-agent.log, shows an exception like:

[10/Jun/2017 02:45:24 +0000] 22104 MainThread agent ERROR Caught unexpected exception in main loop.

4. Checking the event logs in the HDFS directory /user/spark/applicationHistory from the Spark on YARN History Server host shows information like the following.

Check the Spark on YARN History Server event logs with:

 
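The FsHistoryProvider ERROR line quoted in item 2 has a fixed shape, so the affected event logs can be pulled out of a long history server log with a short script. The following is a minimal Python sketch, not an Oracle or Cloudera tool; it assumes every matching line looks exactly like the sample in item 2, and the function name `find_load_failures` is hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Matches the FsHistoryProvider ERROR line shown in item 2 and captures the
# timestamp and the HDFS path of the event log that could not be loaded.
# Assumption: the log format matches the single sample quoted in this note.
LOAD_ERROR = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) ERROR "
    r"org\.apache\.spark\.deploy\.history\.FsHistoryProvider: "
    r"Exception encountered when attempting to load application log (\S+)\s*$"
)


def find_load_failures(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Return (timestamp, event log path, exception line) for each load failure."""
    lines = list(lines)  # materialized so we can look one line ahead
    failures = []
    for i, line in enumerate(lines):
        m = LOAD_ERROR.match(line)
        if not m:
            continue
        # The exception class and message (for example
        # "java.io.IOException: Filesystem closed") follows the ERROR line.
        exc = lines[i + 1].strip() if i + 1 < len(lines) else ""
        failures.append((m.group(1), m.group(2), exc))
    return failures
```

For example, open the history server log from item 2 (adjusting the host name for your cluster) and pass the file object to `find_load_failures`. Because the function holds all lines in memory to look one line ahead, it suits a one-off check rather than very large logs.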
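Knowing how often, and when, the CM agent logged the error in item 3 can help correlate it with the history server failures. This Python sketch assumes an agent log layout inferred only from the single sample line in item 3 (bracketed timestamp, process ID, thread, module, level, message); that field breakdown and the name `main_loop_errors` are assumptions, not a documented Cloudera Manager format.

```python
import re
from datetime import datetime
from typing import Iterable, List

# Assumed layout, inferred from the one sample line in item 3:
# [timestamp] pid thread module LEVEL message
AGENT_LINE = re.compile(r"^\[([^\]]+)\] (\d+) (\S+) (\S+) ([A-Z]+) (.*?)\s*$")
MAIN_LOOP_ERROR = "Caught unexpected exception in main loop"


def main_loop_errors(lines: Iterable[str]) -> List[datetime]:
    """Return timezone-aware timestamps of 'main loop' exceptions."""
    hits = []
    for line in lines:
        m = AGENT_LINE.match(line)
        if not m:
            continue  # skip continuation lines such as Python tracebacks
        stamp, _pid, _thread, _module, level, message = m.groups()
        if level == "ERROR" and message.startswith(MAIN_LOOP_ERROR):
            # e.g. "10/Jun/2017 02:45:24 +0000"
            hits.append(datetime.strptime(stamp, "%d/%b/%Y %H:%M:%S %z"))
    return hits
```

The returned timestamps keep their UTC offset, so they can be compared directly with the history server's ERROR timestamps once both are converted to the same time zone.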
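When reviewing the event logs in /user/spark/applicationHistory, ranking them by size can make a long listing easier to scan. The sketch below assumes a listing in the standard `hdfs dfs -ls` column layout (permissions, replication, owner, group, size, date, time, path), which may differ from the output of this note's own check command; `largest_event_logs` is a hypothetical name. Whether event log size is the relevant attribute for this issue is not stated in the text shown here.

```python
from typing import Iterable, List, Tuple


def largest_event_logs(listing: Iterable[str], top: int = 10) -> List[Tuple[int, str]]:
    """Return up to `top` (size_in_bytes, path) pairs, largest first."""
    entries = []
    for line in listing:
        fields = line.split()
        # Entry lines are assumed to have exactly 8 columns; header lines such
        # as "Found N items" (and paths containing spaces) are skipped.
        if len(fields) != 8 or not fields[4].isdigit():
            continue
        if fields[0].startswith("d"):
            continue  # directories carry no event log data of their own
        entries.append((int(fields[4]), fields[7]))
    entries.sort(reverse=True)  # largest size first
    return entries[:top]
```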

Cause
