
Mammoth Migration StartHadoopServices Puppet Actions Loop Continuously Causing Cloudera Manager Services to Restart Over and Over (Doc ID 2861322.1)

Last updated on JULY 20, 2024

Applies to:

Big Data Appliance Integrated Software - Version 4.13.0 and later
Linux x86-64

Symptoms

NOTE: In the examples that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

A Mammoth migration of the host that holds the Cloudera Manager (CM) role (Node 3 by default) fails in Step 4, StartHadoopServices. The migration is performed by following one of these documents:
Node 3 Migration and Recommission on Oracle Big Data Appliance V4.11 and Higher (Doc ID 2524859.1)
or
Node 3 Migration and Reprovision on Oracle Big Data Appliance V4.1 OL6 Hadoop Cluster to Manage a Hardware Failure (Doc ID 1984854.1)

The errors, like those below, indicate that CM services are being restarted and that the Puppet agent on the target host timed out.

Restarting Cloudera Manager Services, if needed. This will take some time ...
...
************************************
Error [71173]: (//<TARGET_HOSTNAME3>.<DOMAIN>//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed:
/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &>
/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_<TIMESTAMP>.out returned 1 instead of one of [0]
************************************
...
ERROR: Puppet script had errors.
ERROR: Puppet agent on node <TARGET_HOSTNAME3> is not reachable
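When several runs produce long logs, it can help to extract the failing Puppet resource and its exit code programmatically. The Python sketch below is illustrative only, not part of Mammoth: it assumes each failure block is delimited by lines of asterisks and has the shape of the sample above (an `Exec[...]` resource path, then `returned N instead of one of [...]`). The function name `parse_exec_failures` is hypothetical.

```python
import re
from typing import List, Tuple

# Matches the Puppet Exec resource name, e.g. "Exec[setup_scm]".
EXEC_RE = re.compile(r"Exec\[([^\]]+)\]")
# Matches the failure clause, e.g. "returned 1 instead of one of [0]".
RETURN_RE = re.compile(r"returned (\d+) instead of one of \[")


def parse_exec_failures(text: str) -> List[Tuple[str, int]]:
    """Return (exec_resource, exit_code) pairs found in Mammoth error blocks.

    Hypothetical helper: assumes failure blocks are separated by lines of
    asterisks, as in the sample output shown above.
    """
    failures = []
    for block in re.split(r"\*{5,}", text):
        # Collapse whitespace so a clause wrapped across lines still matches.
        flat = " ".join(block.split())
        exec_match = EXEC_RE.search(flat)
        ret_match = RETURN_RE.search(flat)
        if exec_match and ret_match:
            failures.append((exec_match.group(1), int(ret_match.group(1))))
    return failures


if __name__ == "__main__":
    sample = """************************************
Error [71173]: (//<TARGET_HOSTNAME3>.<DOMAIN>//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed:
/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &>
/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_<TIMESTAMP>.out returned 1 instead of one of [0]
************************************"""
    print(parse_exec_failures(sample))  # [('setup_scm', 1)]
```

Splitting on the asterisk separators keeps the resource name and exit code from different failures from being paired incorrectly.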

Further investigation on the target host shows that setupscm.sh is running continuously.

Note: In this case the continuously running script is setupscm.sh, but another script could show the same behavior. Mammoth actions other than migration can also encounter a similar scenario.

1. setupscm.sh is the script that starts the Hadoop services in CM. It is executed continuously, which causes the CM services to restart over and over.

2. Following Troubleshooting Mammoth Errors: "ERROR: Puppet agent on node bdanode0x is not reachable" (Doc ID 2164227.1) and killing the running process does not resolve the problem; setupscm.sh continues to run.

3. On the Mammoth node, /opt/oracle/BDAMammoth/puppet/manifests/site.pp contains cached Puppet commands that invoke the script continuously.
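The observations above can be checked with a short script. The Python sketch below is an illustration under stated assumptions, not an Oracle-provided tool. `find_script_pids` reads Linux `/proc/<pid>/cmdline` to list processes whose command line mentions a script; sampling it a few times on the target host and seeing new PIDs appear is consistent with the script being relaunched. `find_cached_invocations` lists the lines of a Puppet manifest such as site.pp (on the Mammoth node) that mention the script. All function names are hypothetical, and matching on the plain script name (rather than the Puppet resource name) is a simplifying assumption.

```python
import os
import time
from typing import Dict, List, Set


def find_script_pids(script: str, proc_root: str = "/proc") -> Set[int]:
    """Return PIDs whose command line mentions `script` (Linux /proc layout)."""
    pids = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as fh:
                # Arguments in /proc/<pid>/cmdline are NUL-separated.
                cmdline = fh.read().replace(b"\x00", b" ").decode(errors="replace")
        except OSError:
            continue  # process exited or is not readable
        if script in cmdline:
            pids.add(int(entry))
    return pids


def sample_pids(script: str, rounds: int = 5, interval: float = 10.0,
                proc_root: str = "/proc") -> List[Set[int]]:
    """Sample matching PIDs several times; changing PIDs suggest relaunches."""
    samples = []
    for i in range(rounds):
        samples.append(find_script_pids(script, proc_root))
        if i < rounds - 1:
            time.sleep(interval)
    return samples


def find_cached_invocations(manifest: str, script: str) -> Dict[int, str]:
    """Map line numbers in a Puppet manifest to the lines that mention `script`."""
    hits = {}
    with open(manifest, encoding="utf-8", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if script in line:
                hits[lineno] = line.rstrip("\n")
    return hits
```

For example, `sample_pids("setupscm.sh")` could be run on the target host, and `find_cached_invocations("/opt/oracle/BDAMammoth/puppet/manifests/site.pp", "setupscm.sh")` on the Mammoth node. The `proc_root` parameter exists only so the scan can be pointed at a test directory.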

 

Cause
