Using Distcp to Copy Data Between Two CDH Clusters Located in Different DMZs Fails

(Doc ID 1627255.1)

Last updated on FEBRUARY 24, 2014

Applies to:

Big Data Appliance Integrated Software - Version 2.0.1 to 2.4.0 [Release 2.0 to 2.4]
Linux x86-64


Using the distcp command to copy data from a source cluster running CDH 4.2 (not a BDA cluster) to a BDA cluster running CDH 4.4 fails. The source cluster is in a different DMZ than the destination BDA cluster, and the two clusters are connected through the admin network only.

The distcp command fails with the following error (the stack trace below is truncated as captured):

Lab_cluster# hadoop distcp -i -overwrite hdfs://cloudera1:8020/user/hive/warehouse/try hdfs://
14/01/13 16:28:10 INFO tools.DistCp: srcPaths=[hdfs://cloudera1:8020/user/hive/warehouse/try]
14/01/13 16:28:10 INFO tools.DistCp: destPath=hdfs://
14/01/13 16:28:17 INFO tools.DistCp: sourcePathsCount=3
14/01/13 16:28:17 INFO tools.DistCp: filesToCopyCount=2
14/01/13 16:28:17 INFO tools.DistCp: bytesToCopyCount=11.0m
14/01/13 16:28:17 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/01/13 16:32:30 INFO mapred.JobClient: Running job: job_201401130945_0049
14/01/13 16:32:31 INFO mapred.JobClient: map 0% reduce 0%
14/01/13 16:34:31 INFO mapred.JobClient: map 45% reduce 0%
14/01/13 16:38:49 INFO mapred.JobClient: map 100% reduce 0%
14/01/13 16:47:13 INFO mapred.JobClient: Task Id : attempt_201401130945_0049_m_000000_0, Status : FAILED Connection timed out
  at Method)
  at org.apache.hadoop.hdfs.DFSOutputStream.createSocketForPipeline(
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(
  at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(
  at org.apache.hadoop.hdfs.DFSOutputStream$
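The failure point, DFSOutputStream.createSocketForPipeline, indicates that the distcp map task could establish contact with the destination NameNode (the job submitted and source paths were resolved) but timed out opening a direct TCP connection to a destination DataNode to write the block pipeline. Since the clusters sit in different DMZs connected only through the admin network, a quick way to confirm this is to probe the DataNode data-transfer port from a source-cluster node. Below is a minimal sketch of such a probe; the hostname and port are placeholders (50010 is the usual CDH4 DataNode transfer port, but verify dfs.datanode.address in your cluster's configuration):

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection performs the full TCP handshake, so a firewall
        # that silently drops packets will surface here as a timeout,
        # mirroring the "Connection timed out" seen in the distcp mapper.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Hypothetical example: probe a destination-cluster DataNode from a
# source-cluster node. Replace the hostname with an actual DataNode.
# can_connect("datanode01.example.com", 50010)
```

If the probe returns False for the destination DataNodes, the block pipeline cannot be established regardless of NameNode reachability, which is consistent with the distcp job starting but its map tasks failing with a connection timeout.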


