Oracle Text Is Not Indexing Certain PDF Files In Portal (Doc ID 1102066.1)

Last updated on NOVEMBER 08, 2016

Applies to:

Portal - Version 10.1.2.0.2 to 10.1.4.2 [Release 10gR2]
Information in this document applies to any platform.
Checked for relevance on 25-Mar-2015


Symptoms

Some PDF files are not being indexed in Portal. As a result, searches on terms contained in those documents do not return them.
Running the same text search directly in Oracle Text does not find the documents either.

How to Reproduce:

1.- Create a test table:

SQL> create table pdf_test (id number, pdf_file varchar2(100));
Table created.

2.- Insert a reference to a PDF file on the server:

SQL> insert into pdf_test values
2 (1,'/path/Documentnotindexed.pdf')
3 /
1 row created.

SQL> create index pdf_idx on pdf_test(pdf_file)
2 indextype is ctxsys.context
3 parameters ('datastore ctxsys.file_datastore')
4 /
Index created.

SQL> select token_text from dr$pdf_idx$i
2 where token_text like 'TEXT%'
3 /
no rows selected

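Note: Replace 'TEXT%' with a term that actually occurs in the PDF file. With the default lexer, tokens are stored in uppercase, so enter the term in uppercase when querying the token table.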

SQL> select * from pdf_test
2 where contains(pdf_file,'TEXT%') > 0
3 /

no rows selected

This confirms that the text search does not work for this file with the version of the filter currently in use.
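
As an additional check that is not part of the original test case, the index error view can be queried to see whether any errors were recorded for the document while the index was built. The sketch below assumes the index pdf_idx from the steps above is owned by the current user. An empty result does not rule out a filter problem, because a filter may return no usable text without raising an error.

```sql
-- Assumption: run as the owner of pdf_idx, in the schema used for the test above.
-- CTX_USER_INDEX_ERRORS lists errors recorded while documents were indexed;
-- ERR_TEXTKEY holds the rowid of the row whose document failed.
select err_timestamp, err_textkey, err_text
from ctx_user_index_errors
where err_index_name = 'PDF_IDX'
order by err_timestamp desc;

-- Optional cleanup of the test objects once the check is complete.
drop index pdf_idx;
drop table pdf_test;
```

Any ERR_TEXTKEY returned can be matched against the rowid of the rows in pdf_test to identify the file that failed.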



Changes

The PDF files being indexed may be using a custom font.

Cause
