BDA Node 2 Reprovision Fails at Step 11 - Yarn down, NodeManager Fails to Come up with "First error: -8192 is less than the minimum allowed value 0." (Doc ID 1957237.1)

Last updated on DECEMBER 26, 2014

Applies to:

Big Data Appliance Integrated Software - Version 4.0 and later
Linux x86-64

Symptoms

 
1. BDA Node 2 Migration fails at step 11 with:

Error [1171]: (//bdanode04.example.com//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed: /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &> /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1418944098.out returned 1 instead of one of [0]


2. The /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1418944098.out file shows that the yarn service cannot start ("Failed to start service"):

Operation failed
Result Message is:   "Failed to restart service.",
API Version used is v7
Succeeded. Output in : /opt/oracle/BDAMammoth/bdaconfig/tmp/clusters_gdch01-cluster_services_yarn_commands_start.out
Command ID is 11173
....
Command 11173 finished after 25 seconds
Operation failed
Result Message is:   "Failed to start service.",


3. On Node 4, commands_11173.out again reports "Failed to start service" for yarn:

# cat commands_11173.out
{
  "id" : 11173,
  "name" : "YarnOrderedStart",
  "startTime" : "2014-12-18T23:22:12.089Z",
  "endTime" : "2014-12-18T23:22:35.256Z",
  "active" : false,
  "success" : false,
  "resultMessage" : "Failed to start service.",
  "serviceRef" : {
    "clusterName" : "<cluster_name>-cluster",
    "serviceName" : "yarn"
  },
  "children" : {
    "items" : [ {
      "id" : 11174,
      "name" : "Start",
      "startTime" : "2014-12-18T23:22:12.118Z",
      "endTime" : "2014-12-18T23:22:35.255Z",
      "active" : false,
      "success" : false,
      "resultMessage" : "Command completed with 16/17 successful subcommands",
      "serviceRef" : {
        "clusterName" : "<cluster_name>-cluster",
        "serviceName" : "yarn"
      }
    } ]
  }
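
The command record can also be inspected programmatically rather than read by eye. The following is a minimal Python sketch, not part of the BDA or Cloudera Manager tooling: it assumes the record has been saved as complete, valid JSON, and the helper names failed_commands and failed_subcommand_count are hypothetical. It lists every command in the tree whose "success" is false and derives the number of failed subcommands from the "N/M successful subcommands" message.

```python
import json
import re

# Abbreviated copy of the command record (timestamps and serviceRef omitted),
# closed so that it parses as JSON. Illustrative sample only.
SAMPLE = """
{
  "id": 11173,
  "name": "YarnOrderedStart",
  "success": false,
  "resultMessage": "Failed to start service.",
  "children": {
    "items": [
      {
        "id": 11174,
        "name": "Start",
        "success": false,
        "resultMessage": "Command completed with 16/17 successful subcommands"
      }
    ]
  }
}
"""


def failed_commands(command):
    """Return (id, name, resultMessage) for this command and every failed child."""
    failures = []
    if not command.get("success", True):
        failures.append(
            (command["id"], command["name"], command.get("resultMessage", ""))
        )
    # Children are nested under "children" -> "items"; recurse into each one.
    for child in command.get("children", {}).get("items", []):
        failures.extend(failed_commands(child))
    return failures


def failed_subcommand_count(message):
    """Parse 'N/M successful subcommands' and return M - N, or None if absent."""
    match = re.search(r"(\d+)/(\d+) successful subcommands", message)
    if match is None:
        return None
    succeeded, total = int(match.group(1)), int(match.group(2))
    return total - succeeded


if __name__ == "__main__":
    record = json.loads(SAMPLE)
    for cmd_id, name, message in failed_commands(record):
        print(cmd_id, name, message, failed_subcommand_count(message))
```

For the record in this note, the child "Start" command accounts for one failed subcommand out of 17, which is consistent with a single role failing to start.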


4. Cloudera Manager (CM) shows the yarn service down and CM > All Recent Commands shows "Failed to start service" for yarn restart.

5. CM shows both Resource Managers (Nodes 4/5) are down.  The CM logs show:

For Node 4 Resource Manager

5:08:25.145 PM  ERROR  org.apache.hadoop.yarn.server.resourcemanager.ResourceManager  Returning, interrupted : java.lang.InterruptedException
5:08:25.146 PM  INFO  org.apache.hadoop.yarn.util.AbstractLivelinessMonitor  AMLivelinessMonitor thread interrupted
5:08:25.146 PM  INFO  org.apache.hadoop.yarn.util.AbstractLivelinessMonitor  AMLivelinessMonitor thread interrupted
5:08:25.146 PM  ERROR  org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager  ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
5:08:25.146 PM  INFO  org.apache.hadoop.yarn.util.AbstractLivelinessMonitor  org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.ContainerAllocationExpirer thread interrupted
5:08:25.150 PM  INFO  org.apache.hadoop.yarn.server.resourcemanager.ResourceManager  Transitioned to standby state
5:08:25.150 PM  INFO  org.apache.hadoop.yarn.server.resourcemanager.ResourceManager  SHUTDOWN_MSG:


For Node 5 Resource Manager

5:07:32.521 PM  INFO  org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger  USER=yarn    OPERATION=transitionToStandby TARGET=RMHAProtocolService    RESULT=SUCCESS
5:08:24.990 PM  ERROR  org.apache.hadoop.yarn.server.resourcemanager.ResourceManager  RECEIVED SIGNAL 15: SIGTERM
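
When the Resource Manager log excerpts are long, it can help to pull out only the ERROR entries. The sketch below is one possible way to do that in Python, assuming the text was copied from the CM log view as "time, level, text" with the level either on the same line as the timestamp or on the line after it. The function name error_entries and the sample text are illustrative, not part of any Oracle or Cloudera tool.

```python
import re

# Assumed entry layout: "<h:mm:ss.mmm> <AM|PM>", then whitespace (possibly a
# line break), then the level, then the rest of that line as the entry text.
ENTRY = re.compile(
    r"^(\d{1,2}:\d{2}:\d{2}\.\d{3} [AP]M)\s+(ERROR|WARN|INFO|DEBUG)\s+(.*)$",
    re.MULTILINE,
)

# Two entries taken from the Node 5 excerpt: the first with the timestamp on
# its own line, the second with everything on one line.
SAMPLE = (
    "5:07:32.521 PM     \n"
    "INFO     org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger "
    "USER=yarn OPERATION=transitionToStandby\n"
    "5:08:24.990 PM  ERROR  "
    "org.apache.hadoop.yarn.server.resourcemanager.ResourceManager "
    "RECEIVED SIGNAL 15: SIGTERM\n"
)


def error_entries(text):
    """Return (timestamp, entry_text) for every ERROR-level entry in text."""
    return [
        (m.group(1), m.group(3).strip())
        for m in ENTRY.finditer(text)
        if m.group(2) == "ERROR"
    ]


if __name__ == "__main__":
    for timestamp, entry in error_entries(SAMPLE):
        print(timestamp, entry)
```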


6. Both Resource Managers can be started manually in CM; however, the NodeManager on Node 2 fails to come up, with CM > Commands reporting:

Command failed to run because this role has invalid configuration. Review and correct its configuration.  First error: -8192 is less than the minimum allowed value 0.
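
The message above does not name the configuration property that failed validation. As an aid to searching the NodeManager role configuration for the offending value, the following Python sketch extracts the rejected value and the stated minimum from the message text. The function name parse_min_violation is hypothetical, and the pattern assumes the message wording exactly as quoted in this note.

```python
import re

# Matches the "First error" clause exactly as worded in the CM message.
PATTERN = re.compile(
    r"First error: (-?\d+) is less than the minimum allowed value (-?\d+)\."
)

MESSAGE = (
    "Command failed to run because this role has invalid configuration. "
    "Review and correct its configuration.  "
    "First error: -8192 is less than the minimum allowed value 0."
)


def parse_min_violation(message):
    """Return (rejected_value, minimum) as ints, or None if the clause is absent."""
    match = PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    value, minimum = parse_min_violation(MESSAGE)
    print(f"rejected value {value}, minimum allowed {minimum}")
```

The extracted value (-8192 here) can then be searched for among the role's configured values to find which setting was rejected.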


Cause
