BDA 4.7 Mammoth Install of Multiple Clusters with Different Domain Names on the Same Rack Fails with "ssh: Could not resolve hostname echo: Name or service not known" (Doc ID 2237878.1)

Last updated on FEBRUARY 26, 2017

Applies to:

Big Data Appliance Integrated Software - Version 4.7.0 and later
Linux x86-64

Symptoms


When installing two BDA 4.7 three-node clusters on a starter rack, the first cluster installation succeeds, but the second cluster installation fails at Step 2. The clusters are split such that cluster 1 consists of nodes 1, 2, 3 and cluster 2 consists of nodes 4, 5, 6. A different domain name is used for each cluster.

Everything in the preinstall-generated preview HTML is found to be correct.

1. The failure on the second cluster is at Step 2 and looks like:

...
INFO: Cluster not fully deployed yet.
Trying getting parameters from Json Configuration Files...
INFO: Cloudera Manager is not available - cannot get version number. Skipping running dumpcluster
INFO: Starting Big Data Appliance diagnose cluster at Tue Feb 21 16:54:36 2017
INFO: Logging results to /tmp/bda_diagcluster_<id>.log
ssh: Could not resolve hostname echo: Name or service not known^M
WARNING: Could not use passwordless SSH to
INFO: Please run "setup-root-ssh -C" to setup passwordless SSH to all cluster hosts
...
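
As a quick sanity check (a minimal sketch, not part of the original diagnostics; the host names and domain below are placeholders for this environment), hostname resolution and passwordless SSH to the cluster 2 nodes can be verified from the first node of that cluster before re-running Mammoth:

# Hypothetical check run as root from bdanode04; substitute the real cluster 2 domain.
for h in bdanode04 bdanode05 bdanode06; do
  getent hosts ${h}.<domain cluster 2>.com || echo "${h}: hostname does not resolve"
  ssh -o BatchMode=yes root@${h} hostname || echo "${h}: passwordless SSH failed"
done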

2. The error in the associated log file bdanode04-<timestamp>.log (where bdanode04 is the first node of cluster 2) shows a failure at Step 2 of:

...
INFO: Signing certificates from puppet agents. This will take some time ...
ERROR: Operation timed out
ERROR: Some certificates not signed or certificate requests missing
ERROR: Certificate request for node bdanode04.<domain cluster 2>.com not received
ERROR: Certificate request for node bdanode05.<domain cluster 2>.com not received
ERROR: Certificate request for node bdanode06.<domain cluster 2>.com not received
...
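
Assuming a standard Puppet CA layout on the node running Mammoth (an assumption; Mammoth drives Puppet internally), the signed and pending agent certificate requests can be listed on that node to confirm which requests never arrived, for example:

# Hypothetical check; 'puppet cert list' applies to the older Puppet releases
# shipped with BDA 4.x and requires the puppet binary to be on the PATH.
puppet cert list --all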

3. The cause of the above errors is that the /etc/hosts file on the cluster 2 nodes is incorrect. Under:

PUBLIC HOSTNAMES

nodes 4, 5, 6 are listed with the wrong domain: they have the domain of the first cluster, not the second, as illustrated below.
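
A hypothetical illustration of the incorrect entries (the IP addresses below are placeholders, not values from this environment):

### PUBLIC HOSTNAMES ###
192.168.8.4   bdanode04.<domain cluster 1>.com   bdanode04    <-- wrong: cluster 1 domain
...

whereas the expected entries for cluster 2 would carry the cluster 2 domain, for example:

192.168.8.4   bdanode04.<domain cluster 2>.com   bdanode04
...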

4. It is also found that the cluster 2 <cluster2_name>-consolidated_config.json file has the domain of the first cluster, not the second, in its "DOMAIN" property.
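
As a quick way to confirm this (a sketch; the location of the file is environment specific, so run the command from the directory that holds the cluster 2 configuration files), the "DOMAIN" property can be inspected with, for example:

# Hypothetical check; prints the DOMAIN line from the cluster 2 consolidated config.
grep '"DOMAIN"' <cluster2_name>-consolidated_config.json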

Cause
