Hadoop Archive Creation Fails With Error "Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#6" (Doc ID 2141372.1)

Last updated on MAY 29, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.4.0 and later
Linux x86-64

Symptoms

Hadoop Archive creation fails with errors:

bash-4.1$ hadoop archive -archiveName test.har -p /user/<user>/QA/ /user/<user>
16/05/18 16:23:52 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 1300 for <user> on <cluster name>-cluster-ns
16/05/18 16:23:52 INFO security.TokenCache: Got dt for hdfs://<cluster name>; Kind: HDFS_DELEGATION_TOKEN, Service: <cluster name>-cluster-ns, Ident: (HDFS_DELEGATION_TOKEN token 1300 for <user>)
16/05/18 16:23:53 INFO mapreduce.JobSubmitter: number of splits:1
16/05/18 16:23:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1463179478788_0158
16/05/18 16:23:53 INFO mapreduce.JobSubmitter: Kind: HDFS_DELEGATION_TOKEN, Service: <cluster name>-cluster-ns, Ident: (HDFS_DELEGATION_TOKEN token 1300 for <user>)
16/05/18 16:23:54 INFO impl.YarnClientImpl: Submitted application application_1463179478788_0158
16/05/18 16:23:54 INFO mapreduce.Job: The url to track the job: http://bdanode3.example.com:8088/proxy/application_1463179478788_0158/
16/05/18 16:23:54 INFO mapreduce.Job: Running job: job_1463179478788_0158
16/05/18 16:24:01 INFO mapreduce.Job: Job job_1463179478788_0158 running in uber mode : false
16/05/18 16:24:01 INFO mapreduce.Job: map 0% reduce 0%
16/05/18 16:24:08 INFO mapreduce.Job: map 100% reduce 0%
16/05/18 16:24:14 INFO mapreduce.Job: Task Id : attempt_1463179478788_0158_r_000000_0, Status : FAILED
Error: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in fetcher#6
  at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134)
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376)
  at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
  at java.security.AccessController.doPrivileged(Native Method)
  at javax.security.auth.Subject.doAs(Subject.java:422)
  at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
  at org.apache.hadoop.mapred.YarnChild.main(YarnCh
...

The NodeManager logs showed the following error:

ERROR org.apache.hadoop.yarn.logaggregation.AggregatedLogFormat: Error aggregating log file. Log file : uid not found: 23551
java.io.IOException: uid not found: 23551
  at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getUserName(Native Method)

This corresponds to the following C code in ./hadoop-common-project/hadoop-common/src/main/native/src/org/apache/hadoop/io/nativeio/NativeIO.c:

// Grab username
struct passwd pwd, *pwdp;
while ((rc = getpwuid_r((uid_t)uid, &pwd, pw_buf, pw_buflen, &pwdp)) != 0) {
  if (rc != ERANGE) {
    throw_ioe(env, rc);
    goto cleanup;
  }
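
The lookup in the excerpt above can be reproduced outside Hadoop to check whether a numeric uid resolves to a user name on a given node. The following is a minimal sketch, not Hadoop code: the function name lookup_username, its return codes, and the initial buffer size are assumptions made for illustration. It calls getpwuid_r() and grows the buffer while the call reports ERANGE, as the excerpt does.

```c
#include <assert.h>
#include <errno.h>
#include <pwd.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not part of Hadoop): resolve a numeric uid to a
 * user name with getpwuid_r().
 *
 * Returns  0 if the uid was found (name copied into out),
 *          1 if the call succeeded but no passwd entry exists,
 *         -1 on any other failure (lookup error, out of memory, or
 *            out too small). Some C libraries report a missing entry
 *            through a nonzero return code such as ENOENT; this sketch
 *            reports that case as -1 as well.
 */
int lookup_username(uid_t uid, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);

    size_t buflen = 512;            /* assumed starting size */
    char *buf = malloc(buflen);
    if (buf == NULL) {
        return -1;
    }

    struct passwd pwd;
    struct passwd *result = NULL;
    int rc;

    /* Retry with a larger buffer while getpwuid_r reports ERANGE. */
    while ((rc = getpwuid_r(uid, &pwd, buf, buflen, &result)) == ERANGE) {
        buflen *= 2;
        char *bigger = realloc(buf, buflen);
        if (bigger == NULL) {
            free(buf);
            return -1;
        }
        buf = bigger;
    }

    if (rc != 0) {                  /* lookup failed for another reason */
        free(buf);
        return -1;
    }
    if (result == NULL) {           /* call succeeded, uid has no entry */
        free(buf);
        return 1;
    }

    size_t n = strlen(result->pw_name);
    if (n >= outlen) {              /* caller's buffer is too small */
        free(buf);
        return -1;
    }
    memcpy(out, result->pw_name, n + 1);
    free(buf);
    return 0;
}
```

Called on a node with the uid from the NodeManager message (23551 in the log above), a nonzero result would suggest that the operating system on that node cannot map the uid to a user name, which is consistent with the "uid not found" error.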

Cause
