On Oracle Big Data Appliance BDA 4.1/CDH5.3.0, Yarn NodeManager(s) Crash Intermittently (Doc ID 2024032.1)

Last updated on OCTOBER 11, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.1.0 and later
Linux x86-64

Symptoms

On Oracle Big Data Appliance, YARN NodeManager(s) go down intermittently, for example about once a week.

The following errors are seen in hadoop-cmf-yarn-NODEMANAGER-<BDANode>.log.out.gz:

2015-06-02 01:11:21,166 ERROR org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: Failed to setup application log directory for application_1431232480009_0002
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 210327 for rondla.s) can't be found in cache
  at org.apache.hadoop.ipc.Client.call(Client.java:1411)
  at org.apache.hadoop.ipc.Client.call(Client.java:1364)
...
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 210327 for rondla.s) can't be found in cache
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService.createAppDir(LogAggregationService.java:301)
...
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 210327 for rondla.s) can't be found in cache
  at org.apache.hadoop.ipc.Client.call(Client.java:1411)
...
2015-06-02 01:11:21,238 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.application.Application: Log Aggregation service failed to initialize, there will be no logs for this application
2015-06-02 01:11:21,240 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Removing uninitialized application application_1431232480009_0002
2015-06-02 01:11:21,241 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices: Got event APPLICATION_STOP for appId application_1431232480009_0002
2015-06-02 01:11:21,245 ERROR org.apache.hadoop.yarn.server.nodemanager.security.NMTokenSecretManagerInNM: No application Attempt for application : application_1431232480009_0002 started on this NM.
...

2015-06-02 01:11:21,303 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Downloading public rsrc:{ hdfs://sda-ns/user/kheti.d/onecp/incremental_pull/application/, 1432898343258, FILE, null }
2015-06-02 01:11:21,309 FATAL org.apache.hadoop.yarn.event.AsyncDispatcher: Error in dispatcher thread
java.lang.IllegalArgumentException: Can not create a Path from an empty string
  at org.apache.hadoop.fs.Path.checkPathArg(Path.java:127)
  at org.apache.hadoop.fs.Path.<init>(Path.java:135)
  at org.apache.hadoop.fs.Path.<init>(Path.java:94)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.LocalResourcesTrackerImpl.getPathForLocalization(LocalResourcesTrackerImpl.java:420)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$PublicLocalizer.addResource(ResourceLocalizationService.java:773)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:687)
  at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerTracker.handle(ResourceLocalizationService.java:629)
  at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:173)
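
To check whether other NodeManager logs show the same pattern, a minimal Python sketch like the one below can scan the rotated log files for the messages quoted in the excerpt above. It is only a text search under stated assumptions: the signature labels, the function names scan_lines/scan_file and the script name are hypothetical, and a match only shows that the same messages appear, not why the NodeManager stopped.

```python
import gzip
import re
import sys
from typing import Dict, Iterable, List

# Message strings taken from the log excerpt above.
# The dictionary keys are hypothetical labels chosen for this sketch.
SIGNATURES: Dict[str, "re.Pattern[str]"] = {
    "invalid_delegation_token": re.compile(
        r"HDFS_DELEGATION_TOKEN token \d+ for \S+\) can't be found in cache"),
    "dispatcher_fatal": re.compile(
        r"FATAL org\.apache\.hadoop\.yarn\.event\.AsyncDispatcher: Error in dispatcher thread"),
    "empty_path": re.compile(r"Can not create a Path from an empty string"),
}

# Timestamp at the start of a log line, e.g. "2015-06-02 01:11:21,309".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})")


def scan_lines(lines: Iterable[str]) -> Dict[str, List[str]]:
    """For each signature, collect the timestamp of every matching line.

    Stack-trace lines carry no timestamp of their own, so each match is
    attributed to the most recent timestamped line seen before it
    ("?" if no timestamp has been seen yet).
    """
    hits: Dict[str, List[str]] = {name: [] for name in SIGNATURES}
    last_ts = "?"
    for line in lines:
        m = TIMESTAMP.match(line)
        if m:
            last_ts = m.group(1)
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(last_ts)
    return hits


def scan_file(path: str) -> Dict[str, List[str]]:
    """Scan one log file; *.gz files are decompressed on the fly."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        return scan_lines(fh)


if __name__ == "__main__":
    # Usage (hypothetical script name):
    #   python3 scan_nm_log.py hadoop-cmf-yarn-NODEMANAGER-<BDANode>.log.out.gz ...
    for path in sys.argv[1:]:
        for name, stamps in scan_file(path).items():
            if stamps:
                print(f"{path}: {name}: {len(stamps)} hit(s), first at {stamps[0]}")
```

Attributing untimestamped stack-trace lines to the preceding timestamped line keeps the output short: a single exception that is printed across several lines is reported under the time of the log entry that produced it.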

Cause
