BDA 4.7 Mammoth Install of Multiple Clusters with Different Domain Names on the Same Rack Fails with "ssh: Could not resolve hostname echo: Name or service not known" (Doc ID 2237878.1)

Last updated on JUNE 08, 2018

Applies to:

Big Data Appliance Integrated Software - Version 4.7.0 and later
Linux x86-64

Symptoms


When installing two BDA 4.7 three-node clusters on a starter rack, the first cluster installation succeeds, but the second cluster installation fails at Step 2. The rack is split so that cluster 1 consists of nodes 1, 2, 3 and cluster 2 consists of nodes 4, 5, 6. Each cluster uses a different domain name.

Everything in the HTML preview generated by the preinstall step is found to be correct.

1. The failure on the second cluster is at Step 2 and looks like:

...
INFO: Cluster not fully deployed yet.
Trying getting parameters from Json Configuration Files...
INFO: Cloudera Manager is not available - cannot get version number. Skipping running dumpcluster
INFO: Starting Big Data Appliance diagnose cluster at Tue Feb 21 16:54:36 2017
INFO: Logging results to /tmp/bda_diagcluster_<id>.log
ssh: Could not resolve hostname echo: Name or service not known^M
WARNING: Could not use passwordless SSH to
INFO: Please run "setup-root-ssh -C" to setup passwordless SSH to all cluster hosts
...

2. The associated log file bdanode04-<timestamp>.log (where bdanode04 is the first node of cluster 2) shows the following failure at Step 2:

...
INFO: Signing certificates from puppet agents. This will take some time ...
ERROR: Operation timed out
ERROR: Some certificates not signed or certificate requests missing
ERROR: Certificate request for node bdanode04.<domain cluster 2>.com not received
ERROR: Certificate request for node bdanode05.<domain cluster 2>.com not received
ERROR: Certificate request for node bdanode06.<domain cluster 2>.com not received
...

3. The cause of the above errors is that /etc/hosts on cluster 2 is incorrect. Under the section:

PUBLIC HOSTNAMES

nodes 4, 5, 6 are listed with the wrong domain: they have the domain of the first cluster instead of the second.

4. The cluster 2 <cluster2_name>-consolidated_config.json file also has the domain of the first cluster, not the second, in its "DOMAIN" property.
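
The two checks described in items 3 and 4 can be sketched in Python as shown below. This is only an illustration under stated assumptions, not an Oracle-supplied tool: the function names, the sample IP addresses, the example.com domains, the hosts-file line layout (IP address, fully qualified name, short name), and the nesting of the "DOMAIN" property inside the JSON file are all hypothetical. Only the node names, the PUBLIC HOSTNAMES section, and the "DOMAIN" property name come from the symptoms above.

```python
import json


def find_wrong_domain(hosts_text, node_names, expected_domain):
    """Return (short_name, fqdn) pairs whose FQDN is not in expected_domain."""
    wanted = set(node_names)
    mismatches = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        for name in fields[1:]:  # fields[0] is the IP address
            short, dot, domain = name.partition(".")
            if dot and short in wanted and domain != expected_domain:
                mismatches.append((short, name))
    return mismatches


def find_key(node, key):
    """Yield every value stored under `key` anywhere in a parsed JSON tree."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_key(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


# Hypothetical sample data; real files will differ.
expected = "cluster2.example.com"
sample_hosts = """\
# PUBLIC HOSTNAMES
192.0.2.14 bdanode04.cluster1.example.com bdanode04
192.0.2.15 bdanode05.cluster1.example.com bdanode05
192.0.2.16 bdanode06.cluster2.example.com bdanode06
"""
sample_config = json.loads('{"network": {"DOMAIN": "cluster1.example.com"}}')

bad_hosts = find_wrong_domain(
    sample_hosts, ["bdanode04", "bdanode05", "bdanode06"], expected
)
bad_domains = [d for d in find_key(sample_config, "DOMAIN") if d != expected]

for short, fqdn in bad_hosts:
    print(f"hosts entry with unexpected domain: {short} -> {fqdn}")
for domain in bad_domains:
    print(f"config DOMAIN does not match expected domain: {domain}")
```

To try it against real files, the sample strings could be replaced with the contents of /etc/hosts and the <cluster2_name>-consolidated_config.json file read from a cluster 2 node. The recursive key search is used because the position of "DOMAIN" within that file is not stated here.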

Cause
