Using Distcp to Copy Data Between Two CDH Clusters Located in different DMZ Fails (Doc ID 1627255.1)

Last updated on FEBRUARY 24, 2014

Applies to:

Big Data Appliance Integrated Software - Version 2.0.1 to 2.4.0 [Release 2.0 to 2.4]
Linux x86-64

Symptoms

The distcp command is used to copy data from a source cluster running CDH 4.2 (not a BDA cluster) to a BDA cluster running CDH 4.4. The source cluster is in a different DMZ from the BDA destination cluster, and the two clusters are connected through the admin network only.


The distcp command failed with the following error:

Lab_cluster# hadoop distcp -i -overwrite hdfs://cloudera1:8020/user/hive/warehouse/try hdfs://xx.xx.xxx.xx:8020/user/hive/warehouse/DEV_Pac
14/01/13 16:28:10 INFO tools.DistCp: srcPaths=[hdfs://cloudera1:8020/user/hive/warehouse/try]
14/01/13 16:28:10 INFO tools.DistCp: destPath=hdfs://xx.xx.xxx.xx:8020/user/hive/warehouse/DEV_Pac
14/01/13 16:28:17 INFO tools.DistCp: sourcePathsCount=3
14/01/13 16:28:17 INFO tools.DistCp: filesToCopyCount=2
14/01/13 16:28:17 INFO tools.DistCp: bytesToCopyCount=11.0m
14/01/13 16:28:17 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/13 16:32:30 INFO mapred.JobClient: Running job: job_201401130945_0049
14/01/13 16:32:31 INFO mapred.JobClient: map 0% reduce 0%
14/01/13 16:34:31 INFO mapred.JobClient: map 45% reduce 0%
14/01/13 16:38:49 INFO mapred.JobClient: map 100% reduce 0%
14/01/13 16:47:13 INFO mapred.JobClient: Task Id : attempt_201401130945_0049_m_000000_0, Status : FAILED
java.net.ConnectException: Connection timed out
  at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)
  at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:207)
  at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:525)
  at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(DFSOutputStream.java:1227)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1053)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1013)
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
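
The stack trace shows the timeout occurring in DFSOutputStream.createSocketForPipeline, which is where the HDFS client opens a socket to a destination DataNode for the write pipeline. One way to narrow this down is to test plain TCP reachability of the destination NameNode and DataNode ports from a node on the source cluster where the map tasks run. The following is a minimal sketch only, not an Oracle-supplied tool: the host names are hypothetical placeholders, and port 50010 is assumed to be the DataNode data transfer port (the usual default for dfs.datanode.address in CDH4), which may differ on a given cluster.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections, unreachable networks and DNS failures.
        return False


def check_endpoints(endpoints, timeout=5.0):
    """Map each (host, port) pair to a boolean reachability result."""
    return {(host, port): can_connect(host, port, timeout) for host, port in endpoints}


if __name__ == "__main__":
    # Hypothetical destination endpoints; replace with the real BDA host names.
    # 8020 is the NameNode RPC port used in the distcp command above;
    # 50010 is an assumed DataNode data transfer port.
    endpoints = [
        ("bda-namenode.example.com", 8020),
        ("bda-datanode1.example.com", 50010),
    ]
    for (host, port), ok in check_endpoints(endpoints).items():
        print(f"{host}:{port} {'reachable' if ok else 'NOT reachable'}")
```

If the NameNode port is reachable but a DataNode port is not, that would be consistent with the job starting normally and the map task then failing with "Connection timed out" when it tries to write blocks. This check only reports reachability from the node it is run on and does not by itself identify the cause.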

 

Cause
