
After Removing MIT Kerberos on BDA V4.5 with bdacli HDFS is in Bad Health, Failover Controllers are Down, and Both NameNodes in Standby (Doc ID 2197964.1)

Last updated on JANUARY 29, 2020

Applies to:

Big Data Appliance Integrated Software - Version 4.5.0 and later
Linux x86-64


NOTE: In the examples that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner. 

After MIT Kerberos is removed with "bdacli disable kerberos" (following: Instructions to Disable Kerberos on Oracle Big Data Appliance with Mammoth V3.*/V4.* Releases (Doc ID 1919431.1)), the HDFS service is in bad health: neither Failover Controller will start, and both NameNodes are in standby. This issue is more likely to occur if the cluster has been expanded.

Running "bdacli disable kerberos" completes successfully, but the cluster verification checks run afterwards with "./mammoth -c" show many failing tests.

1. The Failover Controller log shows a FATAL error such as:

2016-10-27 07:09:05,997 FATAL Got a fatal error, exiting now
java.lang.RuntimeException: ZK Failover Controller failed: Received create error from Zookeeper. code:NOAUTH for path /hadoop-ha/<CLUSTER_NAME>-ns/ActiveStandbyElectorLock
at org.apache.hadoop.ha.ZKFailoverController.mainLoop(
at org.apache.hadoop.ha.ZKFailoverController.doRun(
at org.apache.hadoop.ha.ZKFailoverController.access$000(
at org.apache.hadoop.ha.ZKFailoverController$
at org.apache.hadoop.ha.ZKFailoverController$
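
When checking several Failover Controller logs, it may help to search for this signature programmatically. The following is a minimal Python sketch, not an Oracle-provided tool: the function name find_noauth_paths and the sample text are illustrative assumptions, and the pattern is taken only from the error message shown above.

```python
import re

# Pattern built from the NOAUTH message shown in the Failover Controller log.
# It captures the znode path that the controller was not authorized to create.
NOAUTH_RE = re.compile(
    r"ZK Failover Controller failed: Received create error from Zookeeper\. "
    r"code:NOAUTH for path (?P<path>\S+)"
)


def find_noauth_paths(log_text):
    """Return the znode paths named in NOAUTH errors, in order of appearance.

    Hypothetical helper for illustration; log_text is the log content as a string.
    """
    return [match.group("path") for match in NOAUTH_RE.finditer(log_text)]


# Sample text copied from the log excerpt above (placeholder cluster name kept).
sample = (
    "2016-10-27 07:09:05,997 FATAL Got a fatal error, exiting now\n"
    "java.lang.RuntimeException: ZK Failover Controller failed: Received create "
    "error from Zookeeper. code:NOAUTH for path "
    "/hadoop-ha/<CLUSTER_NAME>-ns/ActiveStandbyElectorLock\n"
)

for path in find_noauth_paths(sample):
    print("NOAUTH reported for znode:", path)
```

An empty result only means this particular message was not found in the text supplied; it does not by itself confirm that HDFS high availability is healthy.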

2. In Cloudera Manager (CM), attempting to bring one of the NameNodes into active mode by forcing a failover fails. The role log and stderr report the following:

a) From the role log there is an error like:

unable to failover from namenode<x> to namenode<y> of nameservice <CLUSTER_NAME>-ns; see stderr log.

b) From stderr there is an error like:

+ acquire_kerberos_tgt hdfs.keytab
+ '[' -z hdfs.keytab ']'
+ '[' -n '' ']'
+ '[' validate-writable-empty-dirs = failover ']'
+ '[' file-operation = failover ']'
+ '[' bootstrap = failover ']'
+ '[' failover = failover ']'
+ ACTIVE='Failover failed: Can'\''t failover to an active service'
+ NS=<cluster_name>-ns
+ FROM_NN=namenode<x>
+ TO_NN=namenode<y>
+ FORCE=true

The reference to hdfs.keytab indicates that Kerberos is not fully cleaned up.
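
The same check can be applied to a captured stderr trace. The sketch below is illustrative only and assumes the trace has already been saved as text; the helper name keytab_references is hypothetical, and the pattern relies solely on the "acquire_kerberos_tgt" line shown in the trace above.

```python
import re

# Shell trace lines begin with one or more '+' characters. The pattern looks
# for the acquire_kerberos_tgt call and captures the keytab name passed to it.
KEYTAB_RE = re.compile(
    r"^\++\s*acquire_kerberos_tgt\s+(?P<keytab>\S+)", re.MULTILINE
)


def keytab_references(stderr_text):
    """Return the keytab names passed to acquire_kerberos_tgt in a trace.

    Hypothetical helper for illustration; stderr_text is the trace as a string.
    """
    return [match.group("keytab") for match in KEYTAB_RE.finditer(stderr_text)]


# Sample lines copied from the stderr excerpt above.
sample_stderr = (
    "+ acquire_kerberos_tgt hdfs.keytab\n"
    "+ '[' -z hdfs.keytab ']'\n"
    "+ '[' failover = failover ']'\n"
    "+ FORCE=true\n"
)

refs = keytab_references(sample_stderr)
if refs:
    print("Kerberos keytab still referenced:", ", ".join(refs))
else:
    print("No acquire_kerberos_tgt call found in this trace")
```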



