BDA V4.2 Node 4 Migration Fails at Step 10 StartHadoopServices Restarting Yarn; Yarn is Successfully Started but Not Successfully Stopped

(Doc ID 2041489.1)

Last updated on AUGUST 11, 2015

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64

Symptoms

Node 4 migration fails at step 10, StartHadoopServices (starting Hadoop services), with this error:

ERROR: Puppet agent run on node bdanode0x had errors. List of errors follows

************************************
Error [65852]: (//bdanode0x.example.com//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed: /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &> /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1438100469.out returned 1 instead of one of [0]
************************************
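
The error line above names both the script that failed and the file its output was redirected to. As a minimal sketch, assuming the line always has the "<script> &> <output file> returned <code>" shape seen here, those pieces can be extracted as follows. The function name and the regular expression are illustrative assumptions and are not part of Mammoth or Puppet.

```python
import re

# Sample error line copied from the Puppet agent output shown above.
ERROR_LINE = (
    "Error [65852]: (//bdanode0x.example.com//Stage[main]/Hadoop::Startsvc2/"
    "Exec[setup_scm]/returns) change from notrun to 0 failed: "
    "/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &> "
    "/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1438100469.out "
    "returned 1 instead of one of [0]"
)

# Hypothetical pattern, inferred only from the single sample line above.
PATTERN = re.compile(
    r"failed: (?P<script>\S+) &> (?P<output>\S+) returned (?P<code>\d+)"
)


def parse_puppet_exec_error(line):
    """Return the script, output file and exit code, or None if no match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "script": match.group("script"),
        "output": match.group("output"),
        "exit_code": int(match.group("code")),
    }


info = parse_puppet_exec_error(ERROR_LINE)
print(info["output"])     # file to inspect next
print(info["exit_code"])  # non-zero exit code reported by Puppet
```

The output file reported here is the one examined in the numbered items that follow.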



1. The ./setupscm_1438100469.out file shows that the failing command is 1111, which restarts yarn:

Command 1111 finished after 5 seconds
Operation failed
Result Message is:   "Failed to restart service.",
API Version used is v10
Succeeded. Output in : /opt/oracle/BDAMammoth/bdaconfig/tmp/clusters_<cluster-name>-cluster_services_yarn_commands_start.out
Command ID is 1120

 

Note: clusters_<cluster_name>-cluster_services_yarn_commands_start.out contains:
{
 "id" : 1120,
 "name" : "YarnOrderedStart",
 "startTime" : "2015-07-28T16:38:13.580Z",
 "active" : true,
 "serviceRef" : {
   "clusterName" : "<cluster_name>-cluster",
   "serviceName" : "yarn"
 }
}


2. The commands_1111.out file shows that the problem is a failure to stop the yarn service, although the service then starts successfully:

a) From commands_1111.out, yarn fails to stop:

{
 "id" : 1111,
 "name" : "Restart",
 "startTime" : "2015-07-28T16:38:08.439Z",
 "endTime" : "2015-07-28T16:38:10.193Z",
 "active" : false,
 "success" : false,
 "resultMessage" : "Failed to restart service.",
 "serviceRef" : {
   "clusterName" : "<cluster_name>-cluster",
   "serviceName" : "yarn"
 },
 "children" : {
   "items" : [ {
     "id" : 1112,
     "name" : "YarnOrderedStop",
     "startTime" : "2015-07-28T16:38:08.442Z",
     "endTime" : "2015-07-28T16:38:10.192Z",
     "active" : false,
     "success" : false,
     "resultMessage" : "Failed to stop service.",
     "serviceRef" : {
       "clusterName" : "<cluster_name>-cluster",
       "serviceName" : "yarn"
     }
   } ]
 }
}
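
The failing step can be located by walking the command tree in this output. The following is a minimal sketch, assuming the JSON has the "children" / "items" nesting shown above; the sample is a trimmed copy of that output, and the function name is an illustrative assumption.

```python
import json

# Trimmed copy of the commands_1111.out content shown above.
SAMPLE = """
{
 "id" : 1111,
 "name" : "Restart",
 "active" : false,
 "success" : false,
 "resultMessage" : "Failed to restart service.",
 "children" : {
   "items" : [ {
     "id" : 1112,
     "name" : "YarnOrderedStop",
     "active" : false,
     "success" : false,
     "resultMessage" : "Failed to stop service."
   } ]
 }
}
"""


def failed_commands(command):
    """Yield (id, name, resultMessage) for each finished command that failed."""
    finished = not command.get("active", False)
    if finished and command.get("success") is False:
        yield command["id"], command["name"], command.get("resultMessage", "")
    # Recurse into child commands, if any are present.
    for child in command.get("children", {}).get("items", []):
        yield from failed_commands(child)


failures = list(failed_commands(json.loads(SAMPLE)))
for cmd_id, name, message in failures:
    print(cmd_id, name, message)
```

For the sample above this reports the parent Restart command (1111) and its child YarnOrderedStop (1112), which is the step that actually failed.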


b) But the output of the next command, commands_1120.out, shows that yarn starts without problems:

{
 "id" : 1120,
 "name" : "YarnOrderedStart",
 "startTime" : "2015-07-28T16:38:13.580Z",
 "endTime" : "2015-07-28T16:39:23.797Z",
 "active" : false,
 "success" : true,
 "resultMessage" : "Successfully started service.",
 "serviceRef" : {
   "clusterName" : "<cluster_name>-cluster",
   "serviceName" : "yarn"
 },
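
The timestamps in the two outputs can be compared to see how long each step ran. This is a minimal sketch that assumes the "startTime" and "endTime" values always use the UTC format shown above; the helper name is an illustrative assumption.

```python
from datetime import datetime

# Timestamp layout used in the command outputs above (UTC, fractional seconds).
TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def elapsed_seconds(start_time, end_time):
    """Return the number of seconds between two command timestamps."""
    start = datetime.strptime(start_time, TIME_FORMAT)
    end = datetime.strptime(end_time, TIME_FORMAT)
    return (end - start).total_seconds()


# Values copied from command 1112 (YarnOrderedStop) and 1120 (YarnOrderedStart).
stop_seconds = elapsed_seconds("2015-07-28T16:38:08.442Z", "2015-07-28T16:38:10.192Z")
start_seconds = elapsed_seconds("2015-07-28T16:38:13.580Z", "2015-07-28T16:39:23.797Z")

print(f"YarnOrderedStop ran for {stop_seconds:.3f} s")
print(f"YarnOrderedStart ran for {start_seconds:.3f} s")
```

With these values the stop attempt ends after about 1.75 seconds, while the subsequent start runs for about 70.2 seconds before reporting success.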


3. In Cloudera Manager the yarn service is in "good" health.

4. Even though the yarn service is in "good" health, rerunning the node 4 migration fails in exactly the same place. It is not possible to get past this step.

 

Cause
