Oozie Jobs Fail with JA017 (Doc ID 2130207.1)

Last updated on MAY 09, 2016

Applies to:

Big Data Appliance Integrated Software - Version 4.4.0 and later
Linux x86-64

Symptoms

Oozie jobs can fail with a JA017 error when they try to access the YARN Job History Server with incorrect permissions. The failure can be intermittent or consistent, and the following error is returned from the workflow in the Oozie launcher job:

JA017: Unknown hadoop job [job_#_#] associated with action [#-#-oozie-oozi-W@JobName]. Failing this action!

This error can appear on the BDA when running the Oozie cluster validation checks.

Below is an example from the Oozie cluster validation check, whether run as part of "./mammoth -c" or as the standalone Oozie test from that suite:

# cat ooziewf_test.out
Running oozie test workflow which includes a map-reduce step, a sqoop step, a hive step, a streaming step and a pig step
existing file removed
local temp dir created
staring oozie workflow

oozie job ID is: 0000000-160422185532921-oozie-oozi-W
oozie job runing...
oozie job runing...
oozie job runing...

Job ID : 0000000-160422185532921-oozie-oozi-W
------------------------------------------------------------------------------------------------------------------------------------
Workflow Name : combine-wf
App Path : hdfs://<cluster name>-ns/user/oracle/oozie-example/apps/combine
Status : FAILED
Run : 0
User : oracle
Group : -
Created : 2016-04-22 23:02 GMT
Started : 2016-04-22 23:02 GMT
Last Modified : 2016-04-22 23:02 GMT
Ended : 2016-04-22 23:02 GMT
CoordAction ID: -

Actions
------------------------------------------------------------------------------------------------------------------------------------
ID                                                Status  Ext ID                   Ext Status  Err Code
------------------------------------------------------------------------------------------------------------------------------------
0000000-160422185532921-oozie-oozi-W@:start:      OK      -                        OK          -
------------------------------------------------------------------------------------------------------------------------------------
0000000-160422185532921-oozie-oozi-W@pig-node     FAILED  job_1461365710249_0001   FAILED      JA017
-------------------------------------------------------------------------------------------------------
0000000-160422185532921-oozie-oozi-W@cleanup-node OK      -
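To spot affected actions quickly in a long "oozie job -info" listing or in a validation log such as ooziewf_test.out, the Actions table can be scanned for the JA017 error code. The following is a minimal Python sketch, assuming the whitespace-separated column order shown above (ID, Status, Ext ID, Ext Status, Err Code); the helper name find_ja017_actions is hypothetical and is not part of Oozie or the BDA tooling.

```python
import re
from typing import List, Tuple

# Matches one row of the "Actions" table: an action ID (which contains '@'),
# then Status, Ext ID, and optionally Ext Status and Err Code, separated by whitespace.
# Header and separator lines have no '@' in their first field, so they never match.
ACTION_ROW = re.compile(
    r"^(?P<action>\S+@\S+)\s+(?P<status>\S+)\s+(?P<ext_id>\S+)"
    r"(?:\s+(?P<ext_status>\S+))?(?:\s+(?P<err_code>\S+))?$"
)


def find_ja017_actions(info_text: str) -> List[Tuple[str, str]]:
    """Return (action ID, external Hadoop job ID) for every action whose Err Code is JA017."""
    hits = []
    for line in info_text.splitlines():
        m = ACTION_ROW.match(line.strip())
        if m and m.group("err_code") == "JA017":
            hits.append((m.group("action"), m.group("ext_id")))
    return hits


# Rows copied from the validation output above.
sample = """\
ID Status Ext ID Ext Status Err Code
0000000-160422185532921-oozie-oozi-W@:start: OK - OK -
0000000-160422185532921-oozie-oozi-W@pig-node FAILED job_1461365710249_0001 FAILED JA017
0000000-160422185532921-oozie-oozi-W@cleanup-node OK -
"""

print(find_ja017_actions(sample))
# [('0000000-160422185532921-oozie-oozi-W@pig-node', 'job_1461365710249_0001')]
```

The external job ID returned for each hit is the one to look up in the Job History Server.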

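Checking the failed job (for example, job_1461365710249_0001) in the Job History browser shows the detailed error stack. The stack below comes from a later run of the same test, so its workflow and job IDs differ from the output above: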

JOB[0000003-160422185532921-oozie-oozi-W] ACTION[0000003-160422185532921-oozie-oozi-W@pig-node]
Exception in check(). Message[JA017: Could not lookup launched hadoop Job ID [job_1461365710249_0014]
which was associated with action [0000003-160422185532921-oozie-oozi-W@pig-node]. Failing this action!]

org.apache.oozie.action.ActionExecutorException: JA017: Could not lookup launched hadoop Job ID [job_1461365710249_0014]
which was associated with action [0000003-160422185532921-oozie-oozi-W@pig-node]. Failing this action!
at org.apache.oozie.action.hadoop.JavaActionExecutor.check(JavaActionExecutor.java:1274)
at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:182)
at org.apache.oozie.command.wf.ActionCheckXCommand.execute(ActionCheckXCommand.java:56)
at org.apache.oozie.command.XCommand.call(XCommand.java:286)
at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:175)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
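When only the stack trace is available, the Hadoop job ID that Oozie failed to look up can be extracted from the JA017 message and checked directly against the Job History Server. The following minimal Python sketch parses the message with a regular expression and builds the MapReduce Job History Server REST URL (/ws/v1/history/mapreduce/jobs/{jobid}) for that job. The helper names and the host jhs-host.example.com are hypothetical; 19888 is the Hadoop default web port for the Job History Server, so substitute the address configured on your cluster.

```python
import re
from typing import Optional, Tuple

# Matches both JA017 message forms shown in this note ("Unknown hadoop job [...]
# associated with action [...]" and "Could not lookup launched hadoop Job ID [...]
# which was associated with action [...]"), even when wrapped across lines (re.DOTALL).
JA017_MSG = re.compile(
    r"JA017:.*?\[(?P<job_id>job_\d+_\d+)\].*?action \[(?P<action>[^\]]+)\]",
    re.DOTALL,
)


def parse_ja017(message: str) -> Optional[Tuple[str, str]]:
    """Return (Hadoop job ID, Oozie action ID) from a JA017 message, or None."""
    m = JA017_MSG.search(message)
    return (m.group("job_id"), m.group("action")) if m else None


def history_job_url(history_server: str, job_id: str) -> str:
    """Build the MapReduce Job History Server REST URL for a single job."""
    return f"{history_server.rstrip('/')}/ws/v1/history/mapreduce/jobs/{job_id}"


# Message copied from the stack trace above.
log = """Message[JA017: Could not lookup launched hadoop Job ID [job_1461365710249_0014]
which was associated with action [0000003-160422185532921-oozie-oozi-W@pig-node]. Failing this action!]"""

parsed = parse_ja017(log)
if parsed:
    job_id, action = parsed
    print(action)
    # Hypothetical host; 19888 is the default Job History Server web port.
    print(history_job_url("http://jhs-host.example.com:19888", job_id))
```

Fetching that URL as the user that runs the workflow (for example with curl, using Kerberos/SPNEGO authentication on a secured cluster) is one way to check whether that user can read the job from the Job History Server. A permission error there would be consistent with the symptom described in this note.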

 

Cause
