Stem Search Without Diacritics Returns Unexpected Results When Both INDEX_STEMS and BASE_LETTER Are Enabled
Last updated on SEPTEMBER 03, 2010
Applies to: Oracle Text - Version: 10.2.0.2 to 220.127.116.11 - Release: 10.2 to 11.2
Information in this document applies to any platform.
We use AUTO_LEXER so that text written in multiple languages is handled automatically. We need both stemming and base-letter conversion enabled, because users must be able to search for words without typing diacritics.
If we use AUTO_LEXER with both the BASE_LETTER and INDEX_STEMS attributes enabled, indexing completes without errors, but queries do not return all the expected results.
For example, if a document contains the word 'řece', the index stores 'rece' (the form without diacritics) and 'řeka' (the stem, i.e. the base form of the word, with diacritics). However, a search for 'řece' matches only 'řece' and 'rece', whilst 'řeka' and 'reka' are not retrieved.
The same problem occurs with BASIC_LEXER and with MULTI_LEXER.
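The gap between the expected and the reported results can be illustrated with a toy Python model. This is not how Oracle Text works internally: `base_letter` only approximates BASE_LETTER by stripping Unicode combining marks, and `TOY_STEMS` is a hypothetical one-entry stem table that covers just the 'řece' example. The "expected" query expands to the stem and its base-letter form; the "reported" query reproduces the behaviour described above, where only the query word and its base-letter form match.

```python
import unicodedata

# HYPOTHETICAL one-entry stem table that reproduces only the example above;
# Oracle Text's real language-specific stemmer is not modeled here.
TOY_STEMS = {"řece": "řeka"}


def base_letter(word: str) -> str:
    """Approximate BASE_LETTER: decompose to NFD and drop combining marks."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def query_terms(word: str, expand_stem: bool) -> set[str]:
    """Terms a query for `word` matches under this toy model."""
    terms = {word, base_letter(word)}
    if expand_stem:
        # Add the stem and its diacritic-free form (the expected behaviour).
        stem = TOY_STEMS.get(word, word)
        terms |= {stem, base_letter(stem)}
    return terms


corpus = ["řece", "rece", "řeka", "reka"]

expected = [w for w in corpus if w in query_terms("řece", expand_stem=True)]
reported = [w for w in corpus if w in query_terms("řece", expand_stem=False)]
print("expected:", expected)  # all four forms
print("reported:", reported)  # only 'řece' and 'rece'
```

In this model the missing rows are exactly the ones reachable only through stem expansion, which is the part of the query that the reported behaviour drops when base-letter conversion is also active.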