
A BDA DataNode Fails to Start - Can Not Bind to DataNode Port as it is Already in Use - No Log Files Exist (Doc ID 2042065.1)

Last updated on OCTOBER 09, 2018

Applies to:

Big Data Appliance Integrated Software - Version 4.2.0 to 4.4.0 [Release 4.2 to 4.4]
Linux x86-64


A DataNode (DN) cannot start. This leaves HDFS in "bad" health.

Note: The same failure can occur on any DataNode port. Depending on the version, that includes port 1006 and port 50010. The example here uses DataNode port 1004.

Using an example of DataNode port 1004:

/var/log/hadoop-hdfs/jsvc.err shows that the process cannot bind to DataNode port 1004 because the port is already in use. Note, however, that no other obvious logging on the system indicates why the DN is failing to come up.

1. CM reports a DataNode failing to start, which puts HDFS into "bad" health.

2. Initially no obvious logging is found to explain the problem.

a) No updates at all are being made to the DataNode log file at: /var/log/hadoop-hdfs/hadoop-cmf-hdfs-DATANODE-<FQDN>-log.out.

i. The latest update to that log is from the time the DataNode first went down. Nothing is written on subsequent restarts. The initial error shows a loss of connectivity between the DataNode and the NameNode, for example:
End of File Exception between local host is: "<FQDN-DN>/<IB IP>"; destination host is: "<FQDN-NN>":8022; :; For more details see:
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(
at java.lang.reflect.Constructor.newInstance(
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(
at com.sun.proxy.$Proxy16.sendHeartbeat(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(
at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(
Caused by:
at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(
at org.apache.hadoop.ipc.Client$
2015-07-28 12:05:17,708 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2015-07-28 12:05:17,710 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down DataNode at <FQDN-DN>/<IB IP>

ii.  The DataNode log can also show output like:

2018-04-03 04:41:15,059 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /*.*.*.1:29720, dest: /*.*.*.x:1004, bytes: 91, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-62235432_1, offset: 0, srvID: ***, blockid: BP-1955944501-*.*.*.1-**:blk_1073782646_41857, duration: 904198242477
2018-04-03 04:41:15,059 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1955944501-*.*.*.1-**:blk_1073782646_41857, type=HAS_DOWNSTREAM_IN_PIPELINE terminating
2018-04-03 04:41:23,816 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: RECEIVED SIGNAL 15: SIGTERM
2018-04-03 04:41:23,819 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
SHUTDOWN_MSG: Shutting down DataNode at *.*.*.x

b) The associated agent logs under /var/run/cloudera-scm-agent/process/<latest>-hdfs-DATANODE are updated, but nothing in them sheds light on the inability to start up.

c) The Cloudera Manager (CM) agent logs at /var/log/cloudera-scm-agent do not indicate a problem either.

d) The CM Host Inspector does not show a problem.
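
The "no updates since the DataNode first went down" symptom in step 2 a) can be checked mechanically. The following is a minimal sketch, not part of the original note: it assumes the log lines start with a timestamp in the format shown in the excerpts above, and the function names last_log_timestamp and log_is_stale are hypothetical helpers introduced only for illustration.

```python
import re
from datetime import datetime

# Leading timestamp of a DataNode log line, e.g. "2015-07-28 12:05:17,708".
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3}")


def last_log_timestamp(log_text):
    """Return the timestamp of the last timestamped line, or None if there is none."""
    last = None
    for line in log_text.splitlines():
        match = _TIMESTAMP.match(line)
        if match:
            last = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return last


def log_is_stale(log_text, restart_attempt):
    """True if nothing was logged at or after the given restart attempt time."""
    last = last_log_timestamp(log_text)
    return last is None or last < restart_attempt
```

To use it, read the contents of the DataNode log file named in step 2 a) and pass the time of the most recent restart attempt from CM. A result of True matches the symptom described here: the restart produced no new DataNode log entries.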

3. The output in /var/log/hadoop-hdfs/jsvc.err shows that the DN cannot start because it cannot bind to port 1004, which is in use:

Initializing secure datanode resources
Opened streaming server at / Address already in use
    at Method)
    at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.getSecureResources(
    at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.init(
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(
    at java.lang.reflect.Method.invoke(
Cannot load daemon
Service exit with a return value of 3
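
Because jsvc.err is the only file that records the failure, it can help to scan it directly for the bind error. The sketch below is illustrative only: it assumes the failure always includes the literal text "Address already in use" as in the excerpt above, and find_bind_errors is a hypothetical helper name, not an Oracle or Cloudera tool.

```python
def find_bind_errors(jsvc_err_text):
    """Return (line_number, line) pairs for lines that report a bind failure."""
    hits = []
    for number, line in enumerate(jsvc_err_text.splitlines(), start=1):
        if "Address already in use" in line:
            hits.append((number, line.strip()))
    return hits


def read_jsvc_err(path="/var/log/hadoop-hdfs/jsvc.err"):
    """Read the jsvc error file; the default path is the one given in this note."""
    with open(path, "r", errors="replace") as handle:
        return handle.read()
```

Calling find_bind_errors(read_jsvc_err()) on the affected host returns an empty list when no bind failure has been recorded.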

4. After the failure to start, port 1004 appears to be free:

a) "netstat -pan | grep 1004" does not return anything for port 1004.


The same can be the case with port 1006, port 50010, or any DataNode port.
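
Since netstat may show nothing for the port, one way to cross-check is to attempt the same bind that the DataNode attempts. The following is a rough sketch under stated assumptions: can_bind is a hypothetical helper, binding to ports below 1024 such as 1004 and 1006 requires root, and the check should only be run while the DataNode role is stopped.

```python
import errno
import socket


def can_bind(port, host="0.0.0.0"):
    """Try to bind a TCP socket to host:port and release it immediately.

    Returns True if the bind succeeds and False if the address is already
    in use. Any other error (for example, insufficient privileges) is raised.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except OSError as exc:
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    finally:
        sock.close()
```

For example, running can_bind(port) as root for each of 1004, 1006, and 50010 shows which DataNode ports are actually unavailable, even when "netstat -pan" reports no process holding them.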



In this Document
 Verify that Unmounting an NFS Mount Point Will Not Impact the Cloudera Services on the Host with the Failed DataNode
 Detailed Steps to Unmount NFS Mounts Occupying DataNode Ports
