Oracle Big Data Appliance V3.0.1 and Higher Mammoth Install Step to Start Hadoop Services Fails - NameNodes report: NameNode is not formatted (Doc ID 1903785.1)

Last updated on NOVEMBER 05, 2019

Applies to:

Big Data Appliance Integrated Software - Version 3.0.1 and later
x86_64

Symptoms

 

NOTE: In the examples that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

  

1. Step 11 (Starting Hadoop Services) fails with:
 

************************************
Error [9910]: (//bdanode03//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed: /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &>
/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1404122025.out returned 1 instead of one of [0] at /opt/oracle/BDAMammoth/puppet/modules/hadoop/manifests/startsvc2.pp:350

 

2. /var/log/messages shows:

Jun 30 11:54:30 bdanode03 puppet-agent[10454]: (/Stage[main]/Hadoop::Startsvc2/File[get_mghanged '{md5}<MD5>' to '{md5}<MD5>'
Jun 30 11:54:30 bdanode03 puppet-agent[10454]: (/Stage[main]/Hadoop::Startsvc2/File[setup_'{md5}<MD5>' to '{md5}<MD5>'
Jun 30 12:16:07 bdanode03 puppet-agent[10454]: (/Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed:
  /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &> /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1404122025.out returned 1 instead of one of [0] at
  /opt/oracle/BDAMammoth/puppet/modules/hadoop/manifests/startsvc2.pp:350
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_user_dir]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_user_dir]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_oracle_user_sub_dir]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_oracle_user_sub_dir]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_oracle_user_dir_ownership]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_oracle_user_dir_ownership]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_oozie_user_sub_dir]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_oozie_user_sub_dir]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_oozie_user_dir_ownership]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_oozie_user_dir_ownership]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_hive_user_dir_ownership]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_hive_user_dir_ownership]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[unzip_oozie_sharelib]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[unzip_oozie_sharelib]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[put_oozie_sharelib]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[put_oozie_sharelib]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[finish]/Miscsetup::Endstep/Notify[end_step]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[finish]/Miscsetup::Endstep/Notify[end_step]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: Finished catalog run in 1300.96 seconds
Jun 30 12:16:09 ed-bda-102 puppet-agent[10454]: triggered run
Jun 30 12:16:12 ed-bda-102 puppet-agent[10454]: Finished catalog run in 0.04 seconds
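
The excerpt above shows one failed resource (Exec[setup_scm]) followed by many resources skipped because of it. As a minimal sketch, assuming the puppet-agent line format shown in this excerpt, the following Python groups the skipped resources by the dependency that failed. The function name and the regular expression are illustrative and are not part of Mammoth or Puppet.

```python
import re

# Matches puppet-agent lines of the form seen in the excerpt:
#   (<resource path>) Dependency <resource> has failures: true
# The pattern is an assumption based only on the sample log lines.
DEPENDENCY_RE = re.compile(
    r"\((?P<resource>/Stage\[[^)]*)\) Dependency (?P<dep>\S+) has failures: true"
)


def summarize_failed_dependencies(lines):
    """Map each failed dependency to the resources skipped because of it."""
    skipped = {}
    for line in lines:
        match = DEPENDENCY_RE.search(line)
        if match:
            skipped.setdefault(match.group("dep"), []).append(match.group("resource"))
    return skipped
```

Passing the lines of /var/log/messages to summarize_failed_dependencies should return a single key for this failure, which helps confirm that every skipped resource traces back to the same setup_scm step.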



3. In Cloudera Manager (CM), both the hdfs and yarn services are in "Bad Health".

4. Further investigation of the log file for either NameNode shows: NameNode is not formatted

From CM:  hdfs > Instances > Any Role Type of NameNode > Select a NameNode > Log File

FATAL org.apache.hadoop.hdfs.server.namenode.NameNode Exception in namenode join
java.io.IOException: NameNode is not formatted.
 at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:216)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:880)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:639)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:440)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:496)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:652)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:637)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1286)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1352)

Or, from CM commands:

The NameNode fails with "Supervisor returned FATAL. Please check the role log file, stderr, or stdout." because the NameNode is not formatted:

...
2015-03-05 05:45:25,063 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2015-03-05 05:45:25,063 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:212)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1005)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:735)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:531)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:587)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:754)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:738)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1427)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1493)
2015-03-05 05:45:25,065 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-03-05 05:45:25,066 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bdanode01.example.com/<PRIVATE_IP_HOST1>
************************************************************/
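
Both NameNode excerpts above contain the same message, "NameNode is not formatted". As a minimal sketch, assuming the role log has been saved locally as plain text, the following Python reports every line that contains the message. The function name is hypothetical.

```python
# The message text is taken from the NameNode log excerpts in this note.
MARKER = "NameNode is not formatted"


def find_unformatted_namenode_errors(log_lines):
    """Return (line_number, text) for each log line containing the marker."""
    return [
        (number, line.rstrip())
        for number, line in enumerate(log_lines, start=1)
        if MARKER in line
    ]
```

An open file object can be passed directly as log_lines, since the function only iterates over lines. An empty result suggests that the NameNode failure has a different cause than the one described here.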

 

5. The log file for the JobHistory Server shows:
From CM: yarn > Instances > Any Role Type of JobHistory Server > Select jobhistory > Log File

:00:09.672 PM WARN org.apache.hadoop.io.retry.RetryInvocationHandler Exception while invoking class
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo. Not retrying because failovers (15) exceeded
maximum allowed (15) java.net.ConnectException: Call From bdanode03.example.com/<IP_ADDRESS> to
bdanode01.example.com:8020 failed on connection exception: java.net.ConnectException:  Connection refused; For more details see:
...
com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:699)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
...
org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1528)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:640)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.tryCreatingHistoryDirs(HistoryFileManager.java:570)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.createHistoryDirs(HistoryFileManager.java:533)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:501)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at
...
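
The JobHistory Server warning above is a downstream effect: the connection to the NameNode port is refused because the NameNode never started. As a minimal sketch, assuming the "Call From ... to host:port failed on connection exception" wording shown in this excerpt, the following Python extracts the refused host and port so they can be compared with the NameNode that reported the formatting error. The function name and pattern are illustrative.

```python
import re

# Pattern based only on the sample warning in this note; \s+ tolerates the
# line wrap between "to" and the target host.
REFUSED_RE = re.compile(
    r"Call From \S+ to\s+(?P<host>[\w.-]+):(?P<port>\d+) "
    r"failed on connection exception"
)


def refused_targets(log_text):
    """Return a list of (host, port) pairs for refused connections in the log text."""
    return [
        (match.group("host"), int(match.group("port")))
        for match in REFUSED_RE.finditer(log_text)
    ]
```

The function takes the whole log as one string rather than a list of lines, because the message may be wrapped across lines as it is in the excerpt.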

 

Cause



In this Document
Symptoms
Cause
Solution
 Background
 Solution
References
