BDA 4.7 Mammoth Install of Multiple Clusters with Different Domain Names on the Same Rack Fails with "ssh: Could not resolve hostname echo: Name or service not known"
(Doc ID 2237878.1)
Last updated on FEBRUARY 21, 2019
Applies to:
Big Data Appliance Integrated Software - Version 4.7.0 and later
Linux x86-64
Symptoms
When installing two BDA 4.7 three-node clusters on a starter rack, the first cluster installs successfully, but the second cluster installation fails at Step 2. The rack is split so that cluster 1 consists of nodes 1, 2, 3 and cluster 2 consists of nodes 4, 5, 6. Each cluster uses a different domain name.
Everything in the generated preinstall preview HTML is found to be correct.
1. The failure on the second cluster is at Step 2 and looks like:
INFO: Cluster not fully deployed yet.
Trying getting parameters from Json Configuration Files...
INFO: Cloudera Manager is not available - cannot get version number. Skipping running dumpcluster
INFO: Starting Big Data Appliance diagnose cluster at <TIMESTAMP>
INFO: Logging results to /tmp/bda_diagcluster_<ID>.log
ssh: Could not resolve hostname echo: Name or service not known^M
WARNING: Could not use passwordless SSH to
INFO: Please run "setup-root-ssh -C" to setup passwordless SSH to all cluster hosts
...
2. The associated log file <HOSTNAME4>-<TIMESTAMP>.log (where <HOSTNAME4> is the first node of cluster 2) shows the following failure at Step 2:
INFO: Signing certificates from puppet agents. This will take some time ...
ERROR: Operation timed out
ERROR: Some certificates not signed or certificate requests missing
ERROR: Certificate request for node <HOSTNAME4>.<DOMAINNAME CLUSTER2> not received
ERROR: Certificate request for node <HOSTNAME5>.<DOMAINNAME CLUSTER2> not received
ERROR: Certificate request for node <HOSTNAME6>.<DOMAINNAME CLUSTER2> not received
...
3. The cause of the above errors is that /etc/hosts on the cluster 2 nodes is incorrect. Under the section:
PUBLIC HOSTNAMES
nodes 4, 5, 6 are listed with the wrong domain: they have the domain of the first cluster instead of the second.
4. The cluster 2 <CLUSTERNAME2>-consolidated_config.json file also has the wrong value in its "DOMAIN" property: the domain of the first cluster instead of the second.
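The two checks in items 3 and 4 can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function names, the sample host names and the sample domains are hypothetical, and because the layout of the consolidated_config.json file is not described in this note, the sketch simply searches the whole parsed file for any "DOMAIN" key.

```python
import json


def hosts_with_wrong_domain(hosts_text, short_names, expected_domain):
    """Return (short_name, fqdn) pairs whose FQDN is not in expected_domain.

    hosts_text      -- contents of an /etc/hosts style file
    short_names     -- short host names of the cluster nodes to check
    expected_domain -- domain the nodes of this cluster should use
    """
    wrong = []
    for line in hosts_text.splitlines():
        # Drop trailing comments, then split the entry on whitespace.
        entry = line.split("#", 1)[0].split()
        # entry[0] is the IP address; the rest are host names and aliases.
        for name in entry[1:]:
            short, _, domain = name.partition(".")
            if short in short_names and domain and domain != expected_domain:
                wrong.append((short, name))
    return wrong


def find_domain_values(node, key="DOMAIN"):
    """Collect every value stored under `key` anywhere in parsed JSON.

    Assumption: the exact nesting of the property is unknown, so the
    search is recursive over all dicts and lists.
    """
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_domain_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_domain_values(item, key))
    return found


# Hypothetical sample data standing in for cluster 2 files.
sample_hosts = (
    "10.0.0.4 node04.cluster1.example.com node04\n"
    "10.0.0.5 node05.cluster2.example.com node05\n"
)
sample_config = json.loads('{"DOMAIN": "cluster1.example.com"}')

bad_hosts = hosts_with_wrong_domain(
    sample_hosts, {"node04", "node05"}, "cluster2.example.com"
)
domains = find_domain_values(sample_config)
```

On a real system, hosts_text would be read from /etc/hosts on a cluster 2 node and the JSON would be loaded from the cluster 2 <CLUSTERNAME2>-consolidated_config.json file. A non-empty result from the first function, or a "DOMAIN" value that differs from the cluster 2 domain, matches the symptoms described above.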
Cause