DataNode Health Fails to Start with 'Block pool ID needed' and 'HDFS Partions already locked' Errors on a Cluster without On-Disk Encryption Enabled
(Doc ID 1671000.1)
Last updated on NOVEMBER 08, 2022
Applies to:
Big Data Appliance Integrated Software - Version 2.5.0 and laterLinux x86-64
Symptoms
On V2.5.0/V3.0 Oracle Big Data Appliance(BDA) CDH Cluster, DataNode(s) (DN) is in BAD Health.
Trying to restart the DN also fails with errors.
From /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-<BDANode>.log.out some errors noticed are:
<Timestamp> INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage /u0*/hadoop/dfs. The directory is already locked
<Timestamp> WARN org.apache.hadoop.hdfs.server.common.Storage: Ignoring storage directory /u0*/hadoop/dfs due to an exception
java.io.IOException: Cannot lock storage /u03/hadoop/dfs. The directory is already locked
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:634)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:457)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:152)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:916)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:887)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
at java.lang.Thread.run(Thread.java:724)
<Timestamp> FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (storage id unknown) service to <BDANodex>/<NodeIP>:8022
java.io.IOException: All specified directories are not accessible or do not exist.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:183)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:916)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:887)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
at java.lang.Thread.run(Thread.java:724)
........
org.apache.hadoop.hdfs.server.datanode.DataNode Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:856)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:617)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:835)
at java.lang.Thread.run(Thread.java:724)
<Timestamp> WARN org.apache.hadoop.hdfs.server.common.Storage: Ignoring storage directory /u0*/hadoop/dfs due to an exception
java.io.IOException: Cannot lock storage /u03/hadoop/dfs. The directory is already locked
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.lock(Storage.java:634)
at org.apache.hadoop.hdfs.server.common.Storage$StorageDirectory.analyzeStorage(Storage.java:457)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:152)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:916)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:887)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
at java.lang.Thread.run(Thread.java:724)
<Timestamp> FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool <registering> (storage id unknown) service to <BDANodex>/<NodeIP>:8022
java.io.IOException: All specified directories are not accessible or do not exist.
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:183)
at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:219)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:916)
at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:887)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:311)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:218)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:660)
at java.lang.Thread.run(Thread.java:724)
........
org.apache.hadoop.hdfs.server.datanode.DataNode Block pool ID needed, but service not yet registered with NN
java.lang.Exception: trace
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.getBlockPoolId(BPOfferService.java:143)
at org.apache.hadoop.hdfs.server.datanode.DataNode.shutdownBlockPool(DataNode.java:856)
at org.apache.hadoop.hdfs.server.datanode.BPOfferService.shutdownActor(BPOfferService.java:350)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.cleanUp(BPServiceActor.java:617)
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:835)
at java.lang.Thread.run(Thread.java:724)
On-Disk Encryption is NOT enabled on the cluster.
# bdacli getinfo cluster_disk_encryption_enabled
false
false
Though On-Disk Encryption is not enabled, executing 'mount -l' command shows that HDFS Partitions are mounted using ecryptfs option.
/u<nn>/hadoop on /u<nn>/hadoop type ecryptfs (rw,ecryptfs_sig=53be2d8148c3f5d4,ecryptfs_unlink_sigs,ecryptfs_cipher=aes,ecryptfs_key_bytes=16)
Also trying to check Version details of HDFS data fails with errors:
Cause
To view full details, sign in with your My Oracle Support account. |
|
Don't have a My Oracle Support account? Click to get started! |
In this Document
Symptoms |
Cause |
Solution |