
After Cloudera Manager dfs.namenode.acls.enabled Property Change Cluster Fails to Start, No DataNode Logs Generated Since: BPServiceActorActionException: Failed to report bad block on the BDA (Doc ID 2061212.1)

Last updated on JULY 19, 2022

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64

Symptoms

NOTE: In the examples that follow, user details, cluster names, hostnames, directory paths, filenames, etc. represent a fictitious sample (and are used to provide an illustrative example only). Any similarity to actual persons, or entities, living or dead, is purely coincidental and not intended in any manner.

  

1. After the following property is added to the NameNode's hdfs-site.xml safety valve, "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" (in Cloudera Manager (CM), navigate to Home > hdfs > Configuration and search for "NameNode Advanced Configuration Snippet"):

<property>
<name>dfs.namenode.acls.enabled</name>
<value>true</value>
</property>


restarting the cluster fails and the cluster does not come back up.

2. Navigating to CM > Running Commands shows:

Command Progress
Completed 1 of 6 steps.

Execute command ZkStartPreservingDatastore on service zookeeper
Failed to execute command Start on service zookeeper
...


3. Reverting the above change still does not allow the cluster to come up. In particular, the DataNode on Node 3 (the Cloudera Manager node) does not start.

4. On Node 3, the Cloudera Manager Node, the DataNode log and JournalNode log show the errors below:

a) DataNode: /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-bdanode03.example.com.log.out:

Failed to report bad block BP-<ID>:blk_<ID> to namenode: bdanode01.example.com/<PRIVATE_IP_HOST1>:8020
org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block BP-<BLOCK_ID>-<PRIVATE_IP_HOST1>-<ID>:blk_<BLOCK_ID> to namenode:
at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1025)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:771)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:861)
at java.lang.Thread.run(Thread.java:745)
2:49:31.886 PM ERROR org.apache.hadoop.hdfs.server.datanode.DataNode RECEIVED SIGNAL 15: SIGTERM
2:49:31.891 PM INFO org.apache.hadoop.hdfs.server.datanode.DataNode SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at bdanode03.example.com/<PRIVATE_IP_HOST3>



b) JournalNode: /var/log/hadoop-hdfs/hadoop-cmf-hdfs-JOURNALNODE-bdanode01.example.com.log.out:

2015-09-28 14:47:58,939 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /opt/hadoop/dfs/jn/<CLUSTER_NAME>-journal/current/edits_inprogress_0000000000126045315 -> /opt/hadoop/dfs/jn/<CLUSTER_NAME>-journal/current/edits_0000000000126045315-0000000000126045576
2015-09-28 14:49:33,547 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: RECEIVED SIGNAL 15: SIGTERM
2015-09-28 14:49:33,549 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down JournalNode at bdanode03.example.com/<PRIVATE_IP_HOST3>
************************************************************/



5. The DataNode for Node 3 is decommissioned, and 'hadoop fsck /' shows many associated missing/corrupt blocks.


6. On Node 3, the DataNode log file is no longer being updated:

a) The timestamp of the DataNode log file in /var/log/hadoop-hdfs shows that it has not been updated for several hours.

b) Running "tail -f" on the DataNode log file in /var/log/hadoop-hdfs while restarting the DataNode in CM shows that no new entries are written to the file.

7. In CM, the status of the DataNodes on Node 3 and on other nodes shows a long timeout during restart.
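
To confirm whether the property from step 1 is still present after the revert in step 3, an hdfs-site.xml file (or the safety-valve snippet itself) can be inspected from a script. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the Hadoop 2.x default of false when the property is absent. Cloudera Manager generates the effective configuration for each role, so the location of the file to inspect varies by deployment and is not covered here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

ACLS_PROPERTY = "dfs.namenode.acls.enabled"


def get_hdfs_property(xml_text: str, name: str) -> Optional[str]:
    """Return the <value> of the first <property> whose <name> matches, else None.

    Accepts either a full hdfs-site.xml (<configuration> root) or a single
    <property> element such as the safety-valve snippet in step 1.
    """
    root = ET.fromstring(xml_text)
    # Element.iter() includes the root itself, so a bare <property> also matches.
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            return (prop.findtext("value") or "").strip()
    return None


def acls_enabled(xml_text: str) -> bool:
    # Assumption: an absent property means the Hadoop 2.x default (false).
    value = get_hdfs_property(xml_text, ACLS_PROPERTY)
    return value is not None and value.lower() == "true"
```

For example, passing the step 1 snippet returns True, while a configuration that no longer contains the property returns False.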


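Steps 4 and 6 can also be checked from a script: how long ago the DataNode log was last modified, and how many lines contain the error signatures quoted in the log excerpts above. This is a hypothetical Python sketch; the marker strings come from those excerpts, the 2-hour staleness threshold is an arbitrary assumption (a quiet but healthy DataNode may also log infrequently), and the helper names are illustrative.

```python
import os
import time
from typing import Dict, Iterable, Optional

# Markers taken from the DataNode and JournalNode log excerpts in this note.
SYMPTOM_MARKERS = (
    "BPServiceActorActionException",
    "Failed to report bad block",
    "RECEIVED SIGNAL 15: SIGTERM",
    "SHUTDOWN_MSG",
)


def hours_since_modified(path: str, now: Optional[float] = None) -> float:
    """Hours elapsed since the file's last modification time (st_mtime)."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 3600.0


def count_symptoms(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many lines contain each known symptom marker."""
    counts = {marker: 0 for marker in SYMPTOM_MARKERS}
    for line in lines:
        for marker in SYMPTOM_MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def triage_log(path: str, stale_after_hours: float = 2.0) -> Dict[str, object]:
    # stale_after_hours is an illustrative threshold, not an Oracle recommendation.
    with open(path, encoding="utf-8", errors="replace") as fh:
        counts = count_symptoms(fh)
    age = hours_since_modified(path)
    return {"hours_since_update": age, "stale": age > stale_after_hours, "symptoms": counts}
```

For example, calling triage_log on /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-bdanode03.example.com.log.out would report the age of the Node 3 DataNode log from step 4 together with the marker counts.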
Cause




