When using WebCenter Content 11.1.1.9.0, and OracleTextSearch to Index a PDF file of Version 1.3 (Acrobat 4.x) the word 'nobody' gets Split Apart into 'nob' 'ody' Instead of Remaining Whole (Doc ID 2025531.1)

Last updated on JANUARY 27, 2017

Applies to:

Oracle WebCenter Content - Version 11.1.1.9.0 and later
Information in this document applies to any platform.

Symptoms

NOTE: If you are running WebCenter Content 11.1.1.9.0, upgrade to Bundle Patch 2 immediately. This is a required upgrade, and it takes you to 11.1.1.9.2.

On Content Server version 11.1.1.9.0:

ACTUAL BEHAVIOR
---------------
When a PDF file of version 1.3 (Acrobat 4.x) containing the word 'nobody' is checked in, the word is indexed as 'no' 'body' instead of as the whole word.

EXPECTED BEHAVIOR
-----------------------
If the PDF viewer's search tool can select and find the word 'nobody' in the original document, the extracted text should also be indexed as the whole word 'nobody'.

STEPS
-----------------------
The issue can be reproduced at will with the following steps:


1. Create a test PDF file containing the searchable word 'nobody'
2. Verify that the PDF viewer can search for and find the word 'nobody' in the document
3. Check the document into the Content Server with index tracing enabled at trace level
4. Once the document is released, observe that searching for the word 'nobody' does not return it
5. Inspect the text extracted from the file and observe that the word 'nobody' was extracted as 'no' 'body'
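
The check in step 5 can be sketched as a small script. This is only an illustration, not part of WebCenter Content or Oracle Text: the function name find_split_word and the sample string are hypothetical, and the script assumes you have already copied the extracted text out of the trace output or the converted text file by hand.

```python
import re


def find_split_word(extracted_text, word):
    """Report how `word` appears in text extracted from a document.

    Returns a dict with:
      whole  - number of times the word appears as a single token
      splits - list of (left, right) adjacent tokens that join to the word
    """
    # Tokenize on runs of ASCII letters, ignoring case (a simplification;
    # the real indexer's tokenization rules are not reproduced here).
    tokens = re.findall(r"[A-Za-z]+", extracted_text.lower())
    word = word.lower()
    whole = tokens.count(word)
    splits = [
        (left, right)
        for left, right in zip(tokens, tokens[1:])
        if left + right == word
    ]
    return {"whole": whole, "splits": splits}


# Hypothetical extracted text showing the symptom described above.
sample = "There was no body home."
print(find_split_word(sample, "nobody"))
# prints {'whole': 0, 'splits': [('no', 'body')]}
```

A result with whole equal to 0 and a non-empty splits list matches the symptom: the word is present in the source PDF but was broken into two tokens during extraction.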

BUSINESS IMPACT
-----------------------
The issue has the following business impact:
Due to this issue, WebCenter Content 11.1.1.9.0 cannot be rolled out. If the text that users search for has not been indexed properly, they will never find the content they checked in, causing massive delays in their daily tasks.

Cause
