Upgrading a BDA V4.2 CDH 5.4 Cluster to CDH 5.4.4 Fails After Step 24 of 33 with "Failed to upgrade cluster"-Invalid /etc/hadoop/conf.cloudera.yarn/container-executor.cfg (Doc ID 2065551.1)

Last updated on OCTOBER 14, 2015

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64

Symptoms

1. Upgrading a BDA V4.2 CDH 5.4 cluster to CDH 5.4.4 by following "Steps to Upgrade an Oracle Big Data Appliance Cluster with Mammoth V4.2 / CDH 5.4 to CDH 5.4.4 (Doc ID 2046235.1)" fails after step 24 of 33 with "Invalid conf file provided : /etc/hadoop/conf.cloudera.yarn/container-executor.cfg", as shown below:

...
at org.apache.hadoop.service.ServiceOperations.stop(ServiceOperations.java:52)
at org.apache.hadoop.service.ServiceOperations.stopQuietly(ServiceOperations.java:80)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:171)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
2015-10-09 20:49:21,904 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Failed to initialize container executor
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:211)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:462)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:509)
Caused by: java.io.IOException: Linux container executor not configured properly (error=24)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:187)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:209)
... 3 more
Caused by: ExitCodeException exitCode=24: Invalid conf file provided : /etc/hadoop/conf.cloudera.yarn/container-executor.cfg

at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.init(LinuxContainerExecutor.java:181)
... 4 more
2015-10-09 20:49:21,908 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at bdanode03.example.com/*.*.*.3
************************************************************/
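
When several NodeManager logs have to be checked, the failure line can be picked out programmatically. The following is a minimal Python sketch, not part of the original note: the function name find_executor_failures and the regular expression are illustrative assumptions based only on the "ExitCodeException exitCode=24" line in the excerpt above.

```python
import re

# Matches lines such as:
#   Caused by: ExitCodeException exitCode=24: Invalid conf file provided : <path>
# The pattern is an assumption derived from the log excerpt in this note.
EXIT_RE = re.compile(r"ExitCodeException exitCode=(\d+): (.*)")


def find_executor_failures(log_text):
    """Return a list of (exit_code, message) pairs found in the log text."""
    failures = []
    for line in log_text.splitlines():
        match = EXIT_RE.search(line)
        if match:
            failures.append((int(match.group(1)), match.group(2).strip()))
    return failures
```

Passing the contents of a NodeManager log file to find_executor_failures returns one entry per matching line; an empty list means the pattern was not found in that log.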

2. Drilling down into the failing step shows:

Execute command YarnOrderedStart on service yarn
Failed to execute command Start on service yarn


The problem is that none of the NodeManagers can start.


3. On the cluster, the content of the file /etc/hadoop/conf.cloudera.yarn/container-executor.cfg appears to be correct.

4. However, running the YARN container-executor program with --checksetup on one node indicates a problem, as described below:

Note: The container-executor binary, /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor, is used to launch and manage YARN JVM containers that process jobs on the NodeManagers. The binary takes care of things such as changing the user ID of the JVM in Kerberized environments and setting up logging and scratch directories. The --checksetup option checks for configuration errors that could cause YARN job execution to fail.

The Yarn container-executor program with --checksetup indicates a problem:

 

Repeating the check on all nodes of the cluster indicates that all cluster nodes have the same failure.

In the successful case, "/opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor --checksetup" returns nothing.
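
The check can also be scripted, which may help when it has to be repeated on every node. The following is a rough Python sketch under stated assumptions: the function name run_checksetup is hypothetical, the binary path is the one quoted in the note above, and treating "exit code 0 with no output" as success is an assumption that extends the statement that the successful case returns nothing.

```python
import subprocess

# Path quoted in this note; adjust if the parcel lives elsewhere.
CONTAINER_EXECUTOR = "/opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor"


def run_checksetup(command=(CONTAINER_EXECUTOR,)):
    """Run `<command> --checksetup` and return (ok, exit_code, output).

    `command` is a sequence so that a wrapper can be substituted;
    by default it is just the container-executor binary.
    """
    result = subprocess.run(
        list(command) + ["--checksetup"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # capture error text together with stdout
        universal_newlines=True,
    )
    output = result.stdout.strip()
    # Assumption: a healthy setup exits with 0 and prints nothing.
    ok = result.returncode == 0 and output == ""
    return ok, result.returncode, output
```

Calling run_checksetup() on a node returns a tuple whose first element is False when the binary reports a problem; the captured output can then be compared between nodes.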

5. Comparing the permissions for "/opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor" to another environment identifies that on the cluster where the CDH 5.4.4 upgrade fails, it is owned by the wrong group, and that the setuid bit is not set.
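
The comparison in step 5 can be sketched in Python as follows. This is only an illustration: the function names describe_permissions and find_problems are hypothetical, and the expected group is deliberately a parameter because this note does not state which group is correct; take that value from the working environment used for comparison.

```python
import grp
import os
import pwd
import stat


def describe_permissions(path):
    """Return owner, group, octal mode and setuid status for a file."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:  # uid has no passwd entry
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:  # gid has no group entry
        group = str(st.st_gid)
    return {
        "owner": owner,
        "group": group,
        "mode": oct(stat.S_IMODE(st.st_mode)),  # includes setuid/setgid bits
        "setuid": bool(st.st_mode & stat.S_ISUID),
    }


def find_problems(info, expected_group):
    """Compare a describe_permissions() result with a known-good group.

    `expected_group` is an assumption supplied by the caller, for example
    the group seen on a cluster where the binary works.
    """
    problems = []
    if info["group"] != expected_group:
        problems.append(
            "group is %s, expected %s" % (info["group"], expected_group)
        )
    if not info["setuid"]:
        problems.append("setuid bit is not set")
    return problems
```

Running describe_permissions on /opt/cloudera/parcels/CDH/lib/hadoop-yarn/bin/container-executor in both environments and passing the result from the failing cluster to find_problems lists the differences described in step 5.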

Cause
