Oracle Big Data Appliance V3.0.1 and Higher Mammoth Install Step to Start Hadoop Services Fails - NameNodes report: NameNode is not formatted (Doc ID 1903785.1)

Last updated on JULY 20, 2017

Applies to:

Big Data Appliance Integrated Software - Version 3.0.1 and later
x86_64

Symptoms

1. Step 11 (Starting Hadoop Services) fails with:
 

************************************
Error [9910]: (//bdanode03//Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed: /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &>
/opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1404122025.out returned 1 instead of one of [0] at /opt/oracle/BDAMammoth/puppet/modules/hadoop/manifests/startsvc2.pp:350

 

2. /var/log/messages shows:

Jun 30 11:54:30 bdanode03 puppet-agent[10454]: (/Stage[main]/Hadoop::Startsvc2/File[get_mghanged '{md5}e114b446ffa8ccade4e82c4afa513189'
  to '{md5}c157b2a1f7dfd5aa9e639b3e13c167e1'
Jun 30 11:54:30 bdanode03 puppet-agent[10454]: (/Stage[main]/Hadoop::Startsvc2/File[setup_'{md5}12402b4408947daf124fa4773e4485ff'
  to '{md5}27c00c08ce2f5236118a89be91bc3d2c'
Jun 30 12:16:07 bdanode03 puppet-agent[10454]: (/Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) change from notrun to 0 failed:
  /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm.sh &> /opt/oracle/BDAMammoth/bdaconfig/tmp/setupscm_1404122025.out returned 1 instead of one of [0] at
  /opt/oracle/BDAMammoth/puppet/modules/hadoop/manifests/startsvc2.pp:350
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_user_dir]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_user_dir]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_oracle_user_sub_dir]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_oracle_user_sub_dir]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_oracle_user_dir_ownership]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_oracle_user_dir_ownership]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_oozie_user_sub_dir]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[create_oozie_user_sub_dir]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_oozie_user_dir_ownership]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_oozie_user_dir_ownership]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_hive_user_dir_ownership]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[change_hive_user_dir_ownership]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[unzip_oozie_sharelib]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[unzip_oozie_sharelib]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[put_oozie_sharelib]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[post]/Hadoop::Postcdhstart/Exec[put_oozie_sharelib]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[finish]/Miscsetup::Endstep/Notify[end_step]) Dependency Exec[setup_scm] has failures: true
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: (/Stage[finish]/Miscsetup::Endstep/Notify[end_step]) Skipping because of failed dependencies
Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: Finished catalog run in 1300.96 seconds
Jun 30 12:16:09 ed-bda-102 puppet-agent[10454]: triggered run
Jun 30 12:16:12 ed-bda-102 puppet-agent[10454]: Finished catalog run in 0.04 seconds
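
The pattern in the excerpt above (one failed Exec followed by many skipped dependents) can be picked out of /var/log/messages mechanically. The following is a minimal sketch, not part of Mammoth: the function name `summarize_puppet_failures` and the regular expression are illustrative assumptions based only on the line format shown in this note.

```python
import re

# Assumed line shape, taken from the excerpt in this note:
#   <date> <host> puppet-agent[<pid>]: (<resource path>) <message>
RESOURCE_RE = re.compile(
    r"puppet-agent\[\d+\]: \((?P<resource>[^)]*)\) (?P<message>.*)"
)


def summarize_puppet_failures(lines):
    """Return (failed, skipped) lists of puppet resource paths.

    failed  : resources whose message contains "failed:"
    skipped : resources reported as "Skipping because of failed dependencies"
    """
    failed, skipped = [], []
    for line in lines:
        match = RESOURCE_RE.search(line)
        if not match:
            continue  # not a per-resource puppet-agent line
        resource = match.group("resource")
        message = match.group("message")
        if "failed:" in message:
            if resource not in failed:
                failed.append(resource)
        elif message.startswith("Skipping because of failed dependencies"):
            if resource not in skipped:
                skipped.append(resource)
    return failed, skipped


if __name__ == "__main__":
    # Two lines copied (shortened) from the excerpt in this note.
    sample = [
        "Jun 30 12:16:07 bdanode03 puppet-agent[10454]: "
        "(/Stage[main]/Hadoop::Startsvc2/Exec[setup_scm]/returns) "
        "change from notrun to 0 failed:",
        "Jun 30 12:16:08 ed-bda-102 puppet-agent[10454]: "
        "(/Stage[post]/Hadoop::Postcdhstart/Exec[create_user_dir]) "
        "Skipping because of failed dependencies",
    ]
    failed, skipped = summarize_puppet_failures(sample)
    print("failed:", failed)
    print("skipped:", skipped)
```

To run it against a real log, pass the open file object, for example `summarize_puppet_failures(open("/var/log/messages"))`. A single entry in the failed list for Exec[setup_scm], with the other resources only skipped, matches the symptom described here.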



3. In Cloudera Manager (CM), both the hdfs and yarn services are in "Bad Health".

4. The log file for either NameNode shows: NameNode is not formatted

From CM:  hdfs > Instances > Any Role Type of NameNode > Select a NameNode > Log File

FATAL org.apache.hadoop.hdfs.server.namenode.NameNode Exception in namenode join java.io.IOException: NameNode is not formatted.
 at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:216)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:880)
 at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:639)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:440)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:496)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:652)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:637)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1286)
 at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1352)

Or, from CM commands:

The NameNode (NN) fails with "Supervisor returned FATAL. Please check the role log file, stderr, or stdout." because the NameNode is not formatted:

...
2015-03-05 05:45:25,063 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system shutdown complete.
2015-03-05 05:45:25,063 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:212)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1005)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:735)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:531)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:587)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:754)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:738)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1427)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1493)
2015-03-05 05:45:25,065 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2015-03-05 05:45:25,066 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at bdanode01.example.com/*.*.*.1
************************************************************/
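
When several NameNode role logs have to be checked, the two telltale strings in the excerpts above can be searched for directly. This is a minimal sketch under stated assumptions: the helper name `diagnose_namenode_log` is hypothetical, and it relies only on the message text and the SHUTDOWN_MSG line format shown in this note.

```python
import re

# Message text as it appears in both NameNode excerpts in this note.
NOT_FORMATTED = "NameNode is not formatted"

# Assumed format, from the excerpt:
#   SHUTDOWN_MSG: Shutting down NameNode at <host>/<ip>
SHUTDOWN_RE = re.compile(
    r"SHUTDOWN_MSG: Shutting down NameNode at (?P<host>[^/\s]+)"
)


def diagnose_namenode_log(text):
    """Report whether a NameNode log contains the 'not formatted' error.

    Returns a dict with:
      not_formatted : True if the error message appears anywhere in text
      host          : host name from the SHUTDOWN_MSG line, or None
    """
    host_match = SHUTDOWN_RE.search(text)
    return {
        "not_formatted": NOT_FORMATTED in text,
        "host": host_match.group("host") if host_match else None,
    }


if __name__ == "__main__":
    # Lines copied from the excerpt in this note.
    sample = (
        "java.io.IOException: NameNode is not formatted.\n"
        "SHUTDOWN_MSG: Shutting down NameNode at bdanode01.example.com/*.*.*.1\n"
    )
    print(diagnose_namenode_log(sample))
```

The function takes the log contents as a single string, so it works equally on text copied from the CM log viewer or read from a log file on the node.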

 

5. The log file for the JobHistory Server shows:
From CM: yarn > Instances > Any Role Type of JobHistory Server > Select jobhistory > Log File

:00:09.672 PM WARN org.apache.hadoop.io.retry.RetryInvocationHandler Exception while invoking class
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo. Not retrying because failovers (15) exceeded
maximum allowed (15) java.net.ConnectException: Call From bdanode03.example.com/*.*.*.* to
bdanode01.example.com:8020 failed on connection exception: java.net.ConnectException:  Connection refused; For more details see:
...
com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:699)
at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy13.getFileInfo(Unknown Source)
...
org.apache.hadoop.fs.FileContext$Util.exists(FileContext.java:1528)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.mkdir(HistoryFileManager.java:640)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.tryCreatingHistoryDirs(HistoryFileManager.java:570)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.createHistoryDirs(HistoryFileManager.java:533)
at org.apache.hadoop.mapreduce.v2.hs.HistoryFileManager.serviceInit(HistoryFileManager.java:501)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.mapreduce.v2.hs.JobHistory.serviceInit(JobHistory.java:94) at
...

 

Cause
