After Cloudera Manager dfs.namenode.acls.enabled Property Change Cluster Fails to Start, No DataNode Logs Generated Since: BPServiceActorActionException: Failed to report bad block on the BDA
(Doc ID 2061212.1)
Last updated on JULY 19, 2022
Applies to:
Big Data Appliance Integrated Software - Version 4.2.0 and later
Linux x86-64
Symptoms
1. After updating the NameNode's hdfs-site.xml safety valve, "NameNode Advanced Configuration Snippet (Safety Valve) for hdfs-site.xml" (in Cloudera Manager (CM), navigate to Home > hdfs > Configuration and search for: NameNode Advanced Configuration Snippet), with:
<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>
a restart of the cluster fails; the cluster will not come back up.
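The effective value of the property can be checked from a shell on a cluster node once the client configuration has been redeployed. This is a minimal sketch; it assumes a working HDFS client (and valid credentials on a secured cluster), and getconf reflects the deployed client configuration rather than the running NameNode:

# Print dfs.namenode.acls.enabled as seen by the deployed client configuration
hdfs getconf -confKey dfs.namenode.acls.enabled

# If ACLs are disabled on the NameNode, a getfacl call is rejected with an AclException
hdfs dfs -getfacl /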
2. Navigating to CM > Running Commands shows:
Completed 1 of 6 steps.
Execute command ZkStartPreservingDatastore on service zookeeper
Failed to execute command Start on service zookeeper
...
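Whether the ZooKeeper servers are reachable at all can also be checked outside CM with the ZooKeeper four-letter-word commands. A minimal sketch, assuming the default client port 2181 and a placeholder host name:

# A healthy ZooKeeper server answers "imok"
echo ruok | nc bdanode01.example.com 2181

# "stat" additionally reports the server mode (leader/follower) and client connections
echo stat | nc bdanode01.example.com 2181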
3. Reverting the above change still does not allow the cluster to come up. In particular, the DataNode on Node 3, the Cloudera Manager node, will not start.
4. On Node 3, the Cloudera Manager Node, the DataNode log and JournalNode log show the errors below:
a) DataNode: /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-bdanode03.example.com.log.out:
bdanode01.example.com/<PRIVATE_IP_HOST1>:8020
org.apache.hadoop.hdfs.server.datanode.BPServiceActorActionException: Failed to report bad block
BP-<BLOCK_ID>-<PRIVATE_IP_HOST1>-<ID>:blk_<BLOCK_ID> to namenode:
at org.apache.hadoop.hdfs.server.datanode.ReportBadBlockAction.reportTo(ReportBadBlockAction.java:63)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.processQueueMessages(BPServiceActor.java:1025)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:771)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:861)
at java.lang.Thread.run(Thread.java:745) 2:49:31.886 PM ERROR org.apache.hadoop.hdfs.server.datanode.DataNode
RECEIVED SIGNAL 15: SIGTERM 2:49:31.891
PM INFO org.apache.hadoop.hdfs.server.datanode.DataNode SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at bdanode03.example.com/<PRIVATE_IP_HOST3>
b) JournalNode: /var/log/hadoop-hdfs/hadoop-cmf-hdfs-JOURNALNODE-bdanode01.example.com.log.out:
Finalizing edits file /opt/hadoop/dfs/jn/<CLUSTER_NAME>-journal/current/edits_inprogress_0000000000126045315 ->
/opt/hadoop/dfs/jn/<CLUSTER_NAME>-journal/current/edits_0000000000126045315-0000000000126045576
2015-09-28 14:49:33,547 ERROR org.apache.hadoop.hdfs.qjournal.server.JournalNode: RECEIVED SIGNAL 15: SIGTERM
2015-09-28 14:49:33,549 INFO org.apache.hadoop.hdfs.qjournal.server.JournalNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down JournalNode at bdanode03.example.com/<PRIVATE_IP_HOST3>
************************************************************/
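The corresponding entries can be located directly from a shell on Node 3, for example with grep; the log file names below follow the pattern shown above and stand in for the actual host names:

# DataNode log: failed bad-block reports and the SIGTERM shutdown
grep -E "BPServiceActorActionException|SIGTERM" /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-bdanode03.example.com.log.out

# JournalNode log: the SIGTERM and shutdown message
grep -E "SIGTERM|SHUTDOWN_MSG" /var/log/hadoop-hdfs/hadoop-cmf-hdfs-JOURNALNODE-*.log.out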
5. The DataNode on Node 3 is decommissioned, and 'hadoop fsck /' reports the many associated missing/corrupt blocks; see the example check below.
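For example, the blocks can be listed as follows, running as the hdfs superuser; the output is expected to end with a CORRUPT status while blocks are missing:

# Overall filesystem health, including counts of missing and corrupt blocks
sudo -u hdfs hdfs fsck /

# Only the files that have corrupt blocks
sudo -u hdfs hdfs fsck / -list-corruptfileblocks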
6. On Node 3 it is observed that no updates are being made to the DataNode log file.
a) The timestamp of the DataNode log file in /var/log/hadoop-hdfs shows that no updates have been made for several hours.
b) Running "tail -f" on the DataNode log file in /var/log/hadoop-hdfs while restarting the DataNode in CM shows no new entries being written (see the example commands below).
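For example, on Node 3 (the host name in the log file name is a placeholder):

# The DataNode log keeps an old modification time even though a restart was attempted
ls -lt /var/log/hadoop-hdfs | head

# Follow the log while restarting the DataNode role in CM; no new lines appear when the symptom is present
tail -f /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-bdanode03.example.com.log.out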
7. In CM, the status of the DataNode on Node 3, and of DataNodes on other nodes, shows a long timeout when restarting.
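Whether a DataNode JVM is actually running on the node can also be confirmed from the operating system, for example:

# List Java processes as root; a running DataNode role appears as "DataNode"
sudo jps | grep -i datanode

# Alternative using ps
ps -ef | grep '[D]ataNode'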
Cause