BDA Node 3 Migration Fails at Step 10 StartHadoopServices - Both NameNodes in Standby and Command "'HdfsStartWithFailovers' failed for service 'hdfs'" (Doc ID 1982525.1)

Last updated on FEBRUARY 23, 2015

Applies to:

Big Data Appliance Integrated Software - Version 4.1.0 and later
Linux x86-64

Symptoms

1. BDA Node 3 migration fails at step 10 with:

Error [7367]: (//bdanode05.us.oracle.com//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns)
change from notrun to 0 failed: /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &>
 /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1424457700.out returned 1 instead of one of [0]


2. /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1424457700.out shows:

Command 1018 finished after 90 seconds
Operation failed
Result Message is:   "Command 'Start' failed for cluster '<cluster_name>'",

3. In Cloudera Manager (CM)

a) Both NameNodes are in Standby.


b)  Cloudera Manager (CM) > Commands shows:

Command 'HdfsStartWithFailovers' failed for service 'hdfs'

Command 'Start' failed for service 'hdfs'


c) From CM details:

...
2015-02-20 14:09:55,831 FATAL org.apache.hadoop.ha.ZKFailoverController: Unable to start failover controller. Parent znode does not exist.
Run with -formatZK flag to initialize ZooKeeper.
2015-02-20 14:09:55,832 INFO org.apache.zookeeper.ZooKeeper: Session: 0x24ba85ccd90000f closed
2015-02-20 14:09:55,832 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
...

 d) From CM stderr:

+ '[' mkdir = zkfc ']'
+ '[' nfs3 = zkfc ']'
+ '[' namenode = zkfc -o secondarynamenode = zkfc -o datanode = zkfc ']'
+ exec /opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop-hdfs/bin/hdfs
--config /var/run/cloudera-scm-agent/process/242-hdfs-FAILOVERCONTROLLER zkfc


e) From CM stdout:

/usr/bin/kinit
using hdfs/bdax42bur09node01.us.oracle.com@US.ORACLE.COM as Kerberos principal
using /var/run/cloudera-scm-agent/process/242-hdfs-FAILOVERCONTROLLER/krb5cc_201 as Kerberos ticket cache
Program: hdfs/hdfs.sh ["zkfc"]


f) From CM Role Details:

2:09:55.762 PM     FATAL     org.apache.hadoop.ha.ZKFailoverController     

Unable to start failover controller. Parent znode does not exist.
Run with -formatZK flag to initialize ZooKeeper.

 

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms