
When Using WebCenter Content and OracleTextSearch to Index a PDF File of Version 1.3 (Acrobat 4.x), the Word 'nobody' Gets Split Apart into 'nob' 'ody' Instead of Remaining Whole (Doc ID 2025531.1)

Last updated on FEBRUARY 16, 2023

Applies to:

Oracle WebCenter Content - Version and later
Information in this document applies to any platform.


NOTE:  If you are running of WebCenter Content, you need to upgrade to Bundle Patch 2 immediately!  This is a required upgrade and it will take you to!


On : version, Content Server,

When a PDF file of version 1.3 (Acrobat 4.x) containing the word 'nobody' is checked in, the word gets indexed as 'no' 'body' instead of as the whole word.

If the PDF viewer's search tool can select and find the word 'nobody' in the original document, the extracted text is expected to be indexed as the single word 'nobody' as well.

The issue can be reproduced at will with the following steps:

1. Create a test PDF file containing the searchable word 'nobody'
2. Verify that the PDF viewer can search for and find the word 'nobody' in the document
3. Check the document into the content server with index tracing on level trace
4. Once released, see that searching for the word 'nobody' does not return the document
5. Look at the text extracted from the file and see that the word 'nobody' was extracted as 'no' 'body'
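
The check in step 5 can be automated once the extracted text is available as a plain string. The following is a minimal sketch in Python, not part of WebCenter Content or Oracle Text: the function name `find_split_word` and its return shape are hypothetical, and the simple alphanumeric tokenizer is an assumption that only approximates how an indexer breaks text into tokens.

```python
import re


def find_split_word(extracted_text: str, word: str) -> dict:
    """Report whether `word` appears whole or as adjacent fragments.

    Hypothetical helper for illustration only. Tokenization here is a
    plain alphanumeric split, which is an assumption and not the
    lexer the search engine actually uses.
    """
    tokens = re.findall(r"[A-Za-z0-9]+", extracted_text.lower())
    word = word.lower()

    # True when the word survived extraction as a single token.
    whole = word in tokens

    # Collect runs of two or more consecutive tokens that join to the word.
    splits = []
    for i in range(len(tokens)):
        joined = ""
        for j in range(i, len(tokens)):
            joined += tokens[j]
            if not word.startswith(joined):
                break  # this run can no longer form the word
            if joined == word and j > i:
                splits.append(tuple(tokens[i:j + 1]))
                break

    return {"whole": whole, "splits": splits}


if __name__ == "__main__":
    # Sample strings standing in for text extracted from the test PDF.
    print(find_split_word("And then no body answered.", "nobody"))
    print(find_split_word("And then nobody answered.", "nobody"))
```

A reported split is only a hint: it indicates a problem when the source PDF is known to contain the single word, as in the test file from step 1, since 'no body' can also occur legitimately as two words.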

The issue has the following business impact:
Due to this issue, WebCenter Content cannot be rolled out. If the text users search for has not been indexed properly, they will never find the content they checked in, causing massive delays in their daily tasks.

