
BDA Node 1 Migration Fails at Step 11 - Both NameNodes in Standby and "Failed to create NodeManager remote application log directory" Errors (Doc ID 1957275.1)

Last updated on DECEMBER 11, 2019

Applies to:

Big Data Appliance Integrated Software - Version 4.0 and later
Linux x86-64

Symptoms

NOTE: In the images, examples and document that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner. 

 

1. BDA Node 1 migration fails at step 11 with:

ERROR: Puppet agent run on node bdanode03 had errors. List of errors follows
************************************
Error [379]: (//bdanode03.example.com//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns)
change from notrun to 0 failed: /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &>
/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1419003491.out returned 1 instead of one of [0]
************************************


2. The analysis below shows that there is no Active NameNode: the NameNode on Node 2 is in Standby, and the NameNode that migrated from Node 1 to Node 6 is in Standby too:

a) /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1419003491.out shows:

Command 3423 finished after 95 seconds
Operation failed
Result Message is:   "Failed to create NodeManager remote application log directory.",


b) Drilling down into /opt/oracle/BDAMammoth/bdaconfig/tmp/commands_3423.out shows:

{
  "id" : 3423,
  "name" : "CreateLogDir",
  "startTime" : "2014-12-19T16:00:00.291Z",
  "endTime" : "2014-12-19T16:01:34.990Z",
  "active" : false,
  "success" : false,
  "resultMessage" : "Failed to create NodeManager remote application log directory.",
  "serviceRef" : {
    "clusterName" : "<CLUSTER_NAME>",
    "serviceName" : "yarn"
  },
  "children" : {
    "items" : [ {
      "id" : 3424,
      "name" : "CreateDir",
      "startTime" : "2014-12-19T16:00:00.342Z",
      "endTime" : "2014-12-19T16:01:34.988Z",
      "active" : false,
      "success" : false,
      "resultMessage" : "Command aborted because of exception: Command timed-out after 90 seconds",
      "serviceRef" : {
        "clusterName" : "<CLUSTER_NAME>",
        "serviceName" : "hdfs"
      },
      "roleRef" : {
        "clusterName" : "<CLUSTER_NAME>",
        "serviceName" : "hdfs",
        "roleName" : "hdfs-NAMENODE-<ID>"
      }
    } ]
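
When a command file like this is long, the failure that matters is usually the deepest failed child rather than the top-level message. The following is a minimal Python sketch, not part of Mammoth or Cloudera Manager, that walks such a command tree and prints the failed commands that have no failed children of their own. The function name failed_leaves and the SAMPLE text are illustrative assumptions; the sample is a trimmed copy of the excerpt above with closing brackets added, because the excerpt is cut off.

```python
import json

def failed_leaves(command, path=()):
    """Yield (path, resultMessage) for each failed command that has no failed child."""
    here = path + (command.get("name", "?"),)
    children = command.get("children", {}).get("items", [])
    failed_children = [c for c in children if not c.get("success", True)]
    if not failed_children:
        if not command.get("success", True):
            yield " > ".join(here), command.get("resultMessage", "")
        return
    for child in failed_children:
        yield from failed_leaves(child, here)

# Trimmed copy of the excerpt above. The closing brackets are added here
# (an assumption) because the excerpt in this document is cut off.
SAMPLE = """
{
  "id": 3423, "name": "CreateLogDir", "success": false,
  "resultMessage": "Failed to create NodeManager remote application log directory.",
  "children": {"items": [
    {"id": 3424, "name": "CreateDir", "success": false,
     "resultMessage": "Command aborted because of exception: Command timed-out after 90 seconds"}
  ]}
}
"""

root_causes = list(failed_leaves(json.loads(SAMPLE)))
for where, message in root_causes:
    print(f"{where}: {message}")
```

For the sample, the only entry reported is the CreateDir child with its 90-second timeout, which is the HDFS-side failure examined in the next steps.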


c)  Cloudera Manager (CM) > Commands shows:

Create NodeManager Remote Application Log Directory
Failed to create NodeManager remote application log directory



d) Drilling down into Create NodeManager Remote Application Log Directory shows that hdfs/hdfs.sh failed on Node 2:

CreateDir fails on Node2
Command aborted because of exception: Command timed-out after 90 seconds

Program: hdfs/hdfs.sh ["mkdir","/tmp/logs","mapred","hadoop","1777"]
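
As a reading aid, the argument list above can be interpreted as an action, a path, an owner, a group and a permission mode. That interpretation is an assumption based on the argument order; the internals of hdfs/hdfs.sh are not shown in this document. Under that assumption, the following Python sketch prints the standard hdfs dfs commands that would have roughly the same effect. The function name equivalent_hdfs_commands is hypothetical.

```python
import shlex

def equivalent_hdfs_commands(args):
    """Map a ["mkdir", path, owner, group, mode] list to plain `hdfs dfs` commands.

    Assumption: the five arguments mean action, path, owner, group, mode.
    """
    action, path, owner, group, mode = args
    if action != "mkdir":
        raise ValueError(f"unsupported action: {action}")
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],
        ["hdfs", "dfs", "-chown", f"{owner}:{group}", path],
        ["hdfs", "dfs", "-chmod", mode, path],
    ]

commands = equivalent_hdfs_commands(["mkdir", "/tmp/logs", "mapred", "hadoop", "1777"])
for cmd in commands:
    print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

All three operations need an Active NameNode to serve them, which is consistent with the command timing out when both NameNodes are in Standby.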




e) The stderr output shows that the problem is that the NameNode on Node 2 is in Standby state:

hdfs/bdanode02.example.com@EXAMPLE.COM got value #0
14/12/19 11:31:09 DEBUG retry.RetryInvocationHandler: Exception while invoking getFileInfo of class ClientNamenodeProtocolTranslatorPB after 10 fail over attempts. Trying to fail over after sleeping for 22278ms.
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
    at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
    at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1635)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1189)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3576)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:766)
...
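
To confirm the same condition in a large stderr file, the StandbyException lines can be picked out with a short script. This is a minimal Python sketch; the function name standby_rejections and the pattern are illustrative assumptions that match the message format shown in the excerpt above, and other Hadoop versions may word the message differently.

```python
import re

# Matches the RemoteException line shown in the stderr excerpt above.
STANDBY_RE = re.compile(
    r"StandbyException\): Operation category (\w+) is not supported in state (\w+)"
)

def standby_rejections(stderr_text):
    """Return (operation category, NameNode state) pairs found in the text."""
    return STANDBY_RE.findall(stderr_text)

SAMPLE_LINE = (
    "org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): "
    "Operation category READ is not supported in state standby"
)

hits = standby_rejections(SAMPLE_LINE)
print(hits)
```

Any hit means the request reached a NameNode that was not Active. On a live HA cluster, the state of each NameNode can also be queried with hdfs haadmin -getServiceState <serviceId>, where the service ID is specific to the cluster.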

 

Cause


