After Crawling, The Log Files Fail To Identify The Documents That Have Conversion Failures
Last updated on SEPTEMBER 21, 2017
Applies to: Oracle Secure Enterprise Search - Version 220.127.116.11.0 to 18.104.22.168.0
Information in this document applies to any platform.
This document addresses the need to rely on the crawler log entries in order to resolve problems reported in the Crawler Summary Page.
In particular, conversion failures are reported when documents cannot be converted to HTML files. Examples of such files are binary files, images, and formats from specific applications for which no filter is available.
The summary of metrics found at the end of the crawler log shows the following information:
========== Crawling results ===================
Crawling started at 2/23/15 9:07 AM
Crawling stopped at 2/24/15 6:21 AM
Total number of documents discovered = 498,452
Total number of documents updated = 498,424
Total number of documents deleted = 467
Total number of documents excluded = 0
Total number of documents with processing errors = 0
Total number of documents with conversion failures = 1,612 (*)
(*) The conversion failures metric used to include both attached documents and embedded documents (content in a zip or jar file).
The problem was that the crawler log file failed to identify which specific attached documents or which specific embedded files had the conversion failure; the log entries always referenced only the parent document.
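As an illustration of how the summary block shown above might be checked when reviewing a crawl, the following is a minimal Python sketch that extracts the "Total number of documents ..." counters from the log text. It is not part of Oracle Secure Enterprise Search: the function name parse_crawl_summary and the regular expression are assumptions based only on the excerpt in this document, and real crawler logs may contain additional or differently formatted lines.

```python
import re

# Assumed line shape, taken from the excerpt above, e.g.
# "Total number of documents discovered = 498,452"
METRIC_RE = re.compile(r"^Total number of documents (.+?) = ([\d,]+)")


def parse_crawl_summary(log_text):
    """Return a dict mapping each summary metric name to its integer count.

    Hypothetical helper: only lines after the "Crawling results" banner
    are considered, and trailing markers such as "(*)" are ignored.
    """
    metrics = {}
    in_summary = False
    for line in log_text.splitlines():
        line = line.strip()
        if "Crawling results" in line:
            in_summary = True
            continue
        if not in_summary:
            continue
        match = METRIC_RE.match(line)
        if match:
            name, count = match.groups()
            # Drop thousands separators before converting to int.
            metrics[name] = int(count.replace(",", ""))
    return metrics


# Sample text copied from the summary block in this document.
sample = """\
========== Crawling results ===================
Crawling started at 2/23/15 9:07 AM
Crawling stopped at 2/24/15 6:21 AM
Total number of documents discovered = 498,452
Total number of documents updated = 498,424
Total number of documents deleted = 467
Total number of documents excluded = 0
Total number of documents with processing errors = 0
Total number of documents with conversion failures = 1,612 (*)
"""

summary = parse_crawl_summary(sample)
if summary.get("with conversion failures", 0) > 0:
    print("Conversion failures reported:", summary["with conversion failures"])
```

A sketch like this only surfaces the aggregate count. As described above, the log entries reference the parent document, so the counter by itself does not reveal which attached or embedded file failed to convert.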