On Oracle Big Data Appliance, how does Resource Manager Choose a Node to Launch Application Master in Case of Failure? (Doc ID 2114362.1)

Last updated on OCTOBER 11, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.1.0 to 4.3.0 [Release 4.1 to 4.3]
Linux x86-64

Goal

On Big Data Appliance, YARN jobs are failing because the local directory on the node where the job is launched is full. That failure is expected. The issue is that the jobs are reattempted on the same node even after the first failed attempt:

Application application_1448106546957_15089 failed 2 times due to AM Container for appattempt_1448106546957_15089_000002 exited with exitCode: -1000
For more detailed output, check application tracking page:http://<RM>:8088/proxy/application_1448106546957_15089/Then, click on links to logs of each attempt.
Diagnostics: Not able to initialize app-log directories in any of the configured local directories for app application_1448106546957_15089
Failing this attempt. Failing the application.
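
When triaging many such failures, it can help to pull the application ID, attempt number, and exit code out of the diagnostics text. The following is a minimal sketch, not part of the original note. The function name summarize_diagnostics and the regular expressions are illustrative assumptions based only on the message format shown above.

```python
import re

# Diagnostics text copied from the failure message above.
DIAGNOSTICS = (
    "Application application_1448106546957_15089 failed 2 times due to "
    "AM Container for appattempt_1448106546957_15089_000002 exited with "
    "exitCode: -1000\n"
    "Diagnostics: Not able to initialize app-log directories in any of the "
    "configured local directories for app application_1448106546957_15089\n"
    "Failing this attempt. Failing the application."
)

# Assumed patterns, derived from the sample message only.
ATTEMPT_RE = re.compile(r"appattempt_(\d+)_(\d+)_(\d+)")
EXIT_RE = re.compile(r"exitCode:\s*(-?\d+)")
FAILED_RE = re.compile(r"failed (\d+) times")


def summarize_diagnostics(text):
    """Extract key fields from a YARN AM failure message (hypothetical helper)."""
    attempt = ATTEMPT_RE.search(text)
    exit_code = EXIT_RE.search(text)
    failed = FAILED_RE.search(text)
    return {
        # The application ID reuses the cluster timestamp and sequence number
        # embedded in the attempt ID.
        "application_id": (
            f"application_{attempt.group(1)}_{attempt.group(2)}" if attempt else None
        ),
        "attempt_number": int(attempt.group(3)) if attempt else None,
        "exit_code": int(exit_code.group(1)) if exit_code else None,
        "failed_attempts": int(failed.group(1)) if failed else None,
        # True when the message matches the "local directories" symptom.
        "local_dir_problem": "configured local directories" in text,
    }


if __name__ == "__main__":
    print(summarize_diagnostics(DIAGNOSTICS))
```

Run against the message above, the summary shows the second attempt of the same application failing with the same local-directory symptom, which is the behavior the question below asks about.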


Is there a way to configure YARN to choose a different node after one failed attempt, or does the Resource Manager (RM) assign the Application Master (AM) container at random?
 

Solution
