A BDA DataNode Fails to Start - Can Not Bind to DataNode Port as it is Already in Use - No Log Files Exist
(Doc ID 2042065.1)
Last updated on MARCH 27, 2019
Applies to: Big Data Appliance Integrated Software - Version 4.2.0 to 4.4.0 [Release 4.2 to 4.4]
A DataNode (DN) cannot start, which leaves hdfs in "bad" health. This can also happen during a Mammoth action when the services are restarted.
Using an example of DataNode port 1004:
/var/log/hadoop-hdfs/jsvc.err shows that the process cannot bind to DataNode port 1004 (0.0.0.0:1004) because the port is already in use. Note, however, that there is no other obvious logging on the system to indicate why the DN fails to come up.
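To see what is occupying the port, a generic check like the following can help. This is a sketch using standard Linux tools, not commands taken from this note; run it as root so PIDs and program names are visible. Because an NFS client mount can hold a low-numbered port as its local port (see the NFS sections below), look at all sockets, not only listeners:

```shell
#!/bin/sh
# Hypothetical diagnostic: report any socket using the DataNode port.
# Uses netstat if available, falling back to ss on newer systems.
PORT=1004
{ netstat -anp 2>/dev/null || ss -anp 2>/dev/null; } | grep ":${PORT} " \
  || echo "port ${PORT} appears free"
```

If a line is printed for the port but no process name appears, re-run as root; a kernel-held socket (for example, one owned by an NFS mount) may show no PID at all.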
Additional symptoms:
1. CM reports a DataNode failing to start, bringing hdfs into "bad" health.
2. Initially no obvious logging is found to explain the problem.
a) No updates at all are made to the DataNode log file at: /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-<FQDN>.log.out.
b) Restarting the DataNode in Cloudera Manager while tailing that same log with "tail -f" likewise shows no new entries being written.
In this Document
- No Custom Mount Points Set Up
- Custom Mount Points Set Up
- Verify that Unmounting an NFS Mount Point Will Not Impact the Cloudera Services on the Host with the Failed DataNode
- Detailed Steps to Unmount if Custom NFS Mounts Occupying DataNode Ports
- Stop the NFS Service
- Reboot the Server
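The outline above names the remediation sequence when a custom NFS mount is occupying a DataNode port: verify the mount is safe to remove, stop the NFS service, unmount, and reboot. A hedged sketch of that sequence follows; the mount point /mnt/custom_nfs and the DRY_RUN guard are illustrative only and not taken from this note. Review the full document before running any of these commands on a BDA node.

```shell
#!/bin/sh
# Sketch of the unmount sequence, assuming a custom NFS mount holds the port.
# With DRY_RUN=1 (the default) the destructive steps only print what they
# would do; set DRY_RUN=0 only after verifying no Cloudera service on the
# host depends on the mount.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = "1" ]; then echo "would run: $*"; else "$@"; fi; }

# 1. List current NFS mounts so you know exactly what will be unmounted.
mount -t nfs

# 2. Stop the NFS service, release the (hypothetical) mount, then reboot.
run service nfs stop
run umount /mnt/custom_nfs
run reboot
```

The dry-run guard is a deliberate design choice here: a reboot or premature unmount on the wrong host would extend the hdfs outage rather than fix it.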