Typo in the field name in loaderMap Causes Parse Error (Doc ID 1456342.1)

Last updated on FEBRUARY 08, 2017

Applies to:

Oracle Loader for Hadoop - Version 1.1.0.0.0 and later
Linux x86-64

Symptoms

Running a job using Oracle Loader for Hadoop, the job completes however there are parse errors occurring during insert and thus NO data is inserted into Oracle table.

$  hadoop jar ${OLH_HOME}/jlib/oraloader.jar oracle.hadoop.loader.OraLoader -D mapred.reduce.tasks=2  -conf MyConf.xml
Oracle Loader for Hadoop Release 1.1.0.0.1 - Production

Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.

12/07/19 16:19:13 INFO loader.OraLoader: Oracle Loader for Hadoop Release 1.1.0.0.1 - Production

Copyright (c) 2011, Oracle and/or its affiliates. All rights reserved.
...............
.........
12/07/19 16:19:23 WARN sampler.DBSplitSampler: Skipping record: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"
12/07/19 16:19:23 WARN loader.OraLoader: Sampler error: need more samples for confident load balancing. Job will continue without sampled information.
12/07/19 16:19:23 INFO loader.OraLoader: Sampling time=0D:0h:0m:0s:0ms (601 ms)
12/07/19 16:19:23 INFO loader.OraLoader: Submitting OraLoader job OraLoader
12/07/19 16:19:34 INFO input.FileInputFormat: Total input paths to process : 1
12/07/19 16:19:36 INFO mapred.JobClient: Running job: job_201206270622_0127
12/07/19 16:19:37 INFO mapred.JobClient:  map 0% reduce 0%
12/07/19 16:19:42 INFO mapred.JobClient:  map 100% reduce 0%
12/07/19 16:19:50 INFO mapred.JobClient:  map 100% reduce 16%
12/07/19 16:19:51 INFO mapred.JobClient:  map 100% reduce 33%
12/07/19 16:19:52 INFO mapred.JobClient:  map 100% reduce 66%
12/07/19 16:19:53 INFO mapred.JobClient:  map 100% reduce 100%
12/07/19 16:19:54 INFO mapred.JobClient: Job complete: job_201206270622_0127
.........................
................
12/07/19 16:19:54 INFO mapred.JobClient:   Rows skipped by input error
12/07/19 16:19:54 INFO mapred.JobClient:     Parse Error=20
12/07/19 16:19:54 INFO mapred.JobClient:     Total rows skipped by input error=20
12/07/19 16:19:54 INFO mapred.JobClient:   Map-Reduce Framework
12/07/19 16:19:54 INFO mapred.JobClient:     Map input records=20
12/07/19 16:19:54 INFO mapred.JobClient:     Reduce shuffle bytes=32
12/07/19 16:19:54 INFO mapred.JobClient:     Spilled Records=0
12/07/19 16:19:54 INFO mapred.JobClient:     Map output bytes=0
12/07/19 16:19:54 INFO mapred.JobClient:     CPU time spent (ms)=4790
12/07/19 16:19:54 INFO mapred.JobClient:     Total committed heap usage (bytes)=2267283456
12/07/19 16:19:54 INFO mapred.JobClient:     Combine input records=0
12/07/19 16:19:54 INFO mapred.JobClient:     SPLIT_RAW_BYTES=133
12/07/19 16:19:54 INFO mapred.JobClient:     Reduce input records=0
12/07/19 16:19:54 INFO mapred.JobClient:     Reduce input groups=0
12/07/19 16:19:54 INFO mapred.JobClient:     Combine output records=0
12/07/19 16:19:54 INFO mapred.JobClient:     Physical memory (bytes) snapshot=833957888
12/07/19 16:19:54 INFO mapred.JobClient:     Reduce output records=0
12/07/19 16:19:54 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=4463357952
12/07/19 16:19:54 INFO mapred.JobClient:     Map output records=0

 

On Hadoop Cluster(CDH3) you can check MAP task logs to find out warnings and number of rows failed for the job Id reported in Oracle Loader output

1) Manually check for Warning/Error's in the hadoop logs (usually in /var/log/hadoop) directory for the job Id reported in Oracle Loader output

For example in /var/log/hadoop/userlogs/job_201206270622_0127/attempt_201206270622_0127_m_000000_0 you will find a syslog file which contains below warnings

2012-07-18 10:30:11,428 WARN oracle.hadoop.loader.OraLoaderMapper: skipping record...oracle.hadoop.loader.OraLoaderException: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"
2012-07-18 10:30:11,428 WARN oracle.hadoop.loader.OraLoaderMapper: skipping record...oracle.hadoop.loader.OraLoaderException: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"
2012-07-18 10:30:11,428 WARN oracle.hadoop.loader.OraLoaderMapper: skipping record...oracle.hadoop.loader.OraLoaderException: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"
2012-07-18 10:30:11,428 WARN oracle.hadoop.loader.OraLoaderMapper: skipping record...oracle.hadoop.loader.OraLoaderException: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"

OR

2) Open JobTracker Web UI , which generally runs on port 50030.

For ex:-

 http://<nodename>:50030/jobtracker.jsp

Click on the job Id

a) Check for the Rows Skipped

The Job Tracker UI shows the following detail for the job:

Rows skipped by input error Parse Error 20 0 20
Total rows skipped by input error 20 0 20

b) To navigate to the map task log follow below steps

i) Click on map

Kind% CompleteNum TasksPendingRunningCompleteKilledFailed/Killed
Task Attempts
map 100.00%
 
1 0 0 1 0 0 / 0
reduce 100.00%
 
1 0 0 1 0 0 / 0


ii) Now click on the task associated with the Map

TaskCompleteStatusStart TimeFinish TimeErrorsCounters
task_201206270622_0126_m_000000 100.00%
 
  19-Jul-2012 08:50:34 19-Jul-2012 08:50:36 (1sec)   15

iii) In the Task Attempts page click on ALL or 'Last 4KB' in 'Task Logs' column

Task AttemptsMaschineStatusProgressStart TimeFinish TimeErrorsTask LogsCountersActions
attempt_201206270622_0126_m_000000_0 /switch1/scaj01/****.us.oracle.com SUCCEEDED 100.00%
 
19-Jul-2012 08:50:34 19-Jul-2012 08:50:36 (1sec)   Last 4KB
Last 8KB
All
15
 

iv) Review the syslog for WARN associated with oracle.hadoop.loader.OraLoaderMapper in page similar to below url ...

http://*****.us.oracle.com:50060/tasklog?attemptid=attempt_201206270622_0126_m_000000_0&all=true

2012-07-19 08:50:36,232 WARN oracle.hadoop.loader.OraLoaderMapper: skipping record...oracle.hadoop.loader.OraLoaderException: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"
2012-07-19 08:50:36,233 WARN oracle.hadoop.loader.OraLoaderMapper: skipping record...oracle.hadoop.loader.OraLoaderException: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"
2012-07-19 08:50:36,233 WARN oracle.hadoop.loader.OraLoaderMapper: skipping record...oracle.hadoop.loader.OraLoaderException: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"
2012-07-19 08:50:36,233 WARN oracle.hadoop.loader.OraLoaderMapper: skipping record...oracle.hadoop.loader.OraLoaderException: field "FIRST_NAME" not found in input record with fields "F0", "F1", "F2", "F3", "F4"

 OR

 3) If using Cloudera Manager, follow below steps to find the syslog of a particular job.

Open Cloudera Manager Web UI , which generally runs on port 7180.

For ex:-

 http://<nodename>:7180/cmf/services/status

Here are steps to navigate to syslog file

 

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms