
BDA V4.11/BDA V4.12 Cluster Expansion to a Second Rack Fails on Step 8 Trying to Move the NameNode Role from the First Rack to the Second with "Failed to bootstrap Standby NameNode NameNode" (Doc ID 2435846.1)

Last updated on JULY 20, 2024

Applies to:

Big Data Appliance Integrated Software - Version 4.11.0 and later
Linux x86-64

Symptoms

NOTE: In the examples that follow, user details, table name, company name, email, hostnames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

On BDA V4.11/BDA V4.12, expanding a cluster onto a second rack fails on Step 8 while moving the NameNode role from the first rack to the second. The failure reports "Failed to bootstrap Standby NameNode NameNode".

Mammoth fails with:

************************************
Error [10412]: (//bdanode01.example.com//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed: /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &> /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_<EPOCH_TIMESTAMP>.out returned 1 instead of one of [0]
************************************

The associated /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_<#>.out file shows a failure when moving the NameNode role from one rack to the other:

Operation failed
Result Message is: "Failed to bootstrap Standby NameNode NameNode (bdanode01): 18/08/10 14:35:30 INFO ipc.Client: Retrying connect to server: bdanode01.example.com/<PRIVATE_IP>:8022. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)\n* 14:35:31 INFO ipc.Client: Retrying connect to server: bdanode01.example.com/<PRIVATE_IP>:8022. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)\n* 14:35:32 INFO ipc.Client: Retrying connect to server: ......MILLISECONDS)\n* 14:35:34 FATAL ha.BootstrapStandby: Unable to fetch namespace information from active NN at bdanode01.example.com/<PRIVATE_IP>:8022: Call From bdanode01.example.com/<PRIVATE_IP> to bdanode01.example.com:8022 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused\n* 14:35:34 INFO util.ExitUtil: Exiting with status 2\n* 14:35:34 INFO namenode.NameNode: SHUTDOWN_MSG: \n/************************************************************\nSHUTDOWN_MSG: Shutting down NameNode at bdanode01.example.com/<PRIVATE_IP>\n************************************************************/\n.",

A similar failure can occur on BDA 5.2, where the errors look like:

Failed to bootstrap Standby NameNode NameNode
...
WARN org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby: Unable to fetch namespace information from remote NN at <ACTIVE_NAMENODE_HOST>.<DOMAIN>/<PRIVATE_IP_ACTIVE_NAMENODE_HOST>:8022:
Call From <STANDBY_NAMENODE_HOST>.<DOMAIN>/<PRIVATE_IP_STANDBY_NAMENODE_HOST> to <ACTIVE_NAMENODE_HOST>.<DOMAIN>:8022 failed on connection exception: java.net.ConnectException: Connection refused;
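
To check whether a saved setupscm_<#>.out file matches the symptom described above, the two error signatures can be searched for programmatically. The following is a minimal sketch in Python, not an Oracle-provided tool: the function name find_bootstrap_failure and the BootstrapFailure type are hypothetical, and the pattern assumes the message wording shown in the examples above ("active NN" on BDA V4.11/V4.12, "remote NN" on BDA 5.2).

```python
import re
from typing import NamedTuple, Optional


class BootstrapFailure(NamedTuple):
    """Details pulled from a 'Failed to bootstrap Standby NameNode' message."""
    host: str                 # NameNode host the bootstrap tried to reach
    address: str              # address as printed in the log (may be masked)
    port: int                 # port as printed in the log (8022 in the examples)
    connection_refused: bool  # True if the ConnectException text is present


# Assumption: the wording matches the sample output in this note.
_NN_PATTERN = re.compile(
    r"Unable to fetch namespace information from (?:active|remote) NN at "
    r"(?P<host>[^/\s]+)/(?P<address>[^:\s]+):(?P<port>\d+)"
)


def find_bootstrap_failure(log_text: str) -> Optional[BootstrapFailure]:
    """Return the failure details if log_text shows this symptom, else None."""
    if "Failed to bootstrap Standby NameNode" not in log_text:
        return None
    match = _NN_PATTERN.search(log_text)
    if match is None:
        return None
    refused = "java.net.ConnectException: Connection refused" in log_text
    return BootstrapFailure(
        host=match.group("host"),
        address=match.group("address"),
        port=int(match.group("port")),
        connection_refused=refused,
    )
```

Reading a setupscm_<#>.out file into a string and passing it to find_bootstrap_failure returns None when the signature is absent. Otherwise it returns the host, address, and port named in the "Unable to fetch namespace information" message, together with a flag showing whether the "Connection refused" exception was also logged.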

 

Changes

 

Cause




