Stem Search Without Diacritics Returns Unexpected Results when Both INDEX_STEMS + BASE_LETTER are Enabled (Doc ID 566931.1)

Last updated on SEPTEMBER 03, 2010

Applies to:

Oracle Text - Version: 10.2.0.2 to 11.2.0.1 - Release: 10.2 to 11.2
Information in this document applies to any platform.

Symptoms

-- Problem Statement:
We need AUTO_LEXER so that text in multiple languages can be searched automatically. We also need both stemming and base-letter conversion enabled, because users must be able to search for words without typing diacritics.

If we use AUTO_LEXER with both the BASE_LETTER and INDEX_STEMS options enabled, indexing works fine but searches do not return all the expected results.

For example, for the word 'řece' the index stores 'rece' (the word without diacritics) and 'řeka' (the stem, i.e. the base form of the word, with diacritics). However, a search for 'řece' returns only 'řece' and 'rece'; 'řeka' and 'reka' are not retrieved.

The same problem occurs with BASIC_LEXER and with MULTI_LEXER.
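
The configuration described above could be reproduced roughly as follows. This is only a sketch under stated assumptions: the table, preference, and index names (docs, docs_lexer, docs_idx) are hypothetical, the database character set is assumed to store Czech characters (for example AL32UTF8), and the use of the stem operator ($) in the query is inferred from the title rather than stated in the note. No output is shown because it depends on the release and patch level.

```sql
-- Hypothetical test table; the names are illustrative and not taken from the note.
CREATE TABLE docs (id NUMBER PRIMARY KEY, text VARCHAR2(4000));

INSERT INTO docs VALUES (1, 'řece');
INSERT INTO docs VALUES (2, 'řeka');
COMMIT;

-- Lexer preference with both options from the problem statement enabled.
-- AUTO_LEXER is assumed here (available from 11g). With BASIC_LEXER,
-- INDEX_STEMS expects a stemmer language name rather than YES.
BEGIN
  ctx_ddl.create_preference('docs_lexer', 'AUTO_LEXER');
  ctx_ddl.set_attribute('docs_lexer', 'BASE_LETTER', 'YES');
  ctx_ddl.set_attribute('docs_lexer', 'INDEX_STEMS', 'YES');
END;
/

CREATE INDEX docs_idx ON docs (text)
  INDEXTYPE IS ctxsys.context
  PARAMETERS ('LEXER docs_lexer');

-- Stem query (assumed form): according to the symptom, the row
-- containing 'řeka' may be missing from the result.
SELECT id, text FROM docs WHERE CONTAINS(text, '$řece') > 0;

-- Inspect which tokens were actually stored in the index token table.
SELECT token_text, token_type FROM dr$docs_idx$i ORDER BY token_text;
```

Comparing the stored tokens with the query results shows whether the stem form is indexed with or without diacritics, which is the mismatch the symptom describes.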

Cause
