Unlike Basis (used for all languages in InQuira v8.4.x and prior, as well as most locales in OK v8.5.x), the new OLT used in OK v8.6.x no longer splits compound words into multiple tokens/stems

(Doc ID 2151974.1)

Last updated on JULY 19, 2016

Applies to:

Oracle Knowledge - Version 8.6 and later
Information in this document applies to any platform.

Symptoms

- In old InQuira/OK releases that used the third-party BASIS technology, compound words (which are very common in languages such as Dutch and German) were tokenized into multiple tokens/stems by the RegexTokenizer feature. As a result, searches could return a large set of results for users to review and prioritize.


- In the new OK v8.6.0 release, which uses the new OLT (Oracle Language Technologies), compound words are tokenized as a single token/stem rather than multiple tokens/stems, because the RegexTokenizer feature is disabled by default for performance reasons.
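The difference between the two behaviors described above can be illustrated with a small sketch. This is a toy example only and does not reproduce the Basis, OLT, or RegexTokenizer implementations: the lexicon, the greedy splitting strategy, and the names `tokenize_single`, `tokenize_split`, and `matches` are all hypothetical and chosen purely for illustration.

```python
# Toy illustration only: NOT the Basis, OLT, or RegexTokenizer implementation.
# The lexicon and all function names below are hypothetical.
LEXICON = {"kranken", "haus", "versicherung"}


def tokenize_single(word):
    """Keep a compound word as one token (the single-token behavior)."""
    return [word.lower()]


def tokenize_split(word, lexicon=LEXICON):
    """Greedily split a compound into known parts; keep it whole if that fails."""
    word = word.lower()
    parts, start = [], 0
    while start < len(word):
        # Try the longest known part starting at this position first.
        for end in range(len(word), start, -1):
            if word[start:end] in lexicon:
                parts.append(word[start:end])
                start = end
                break
        else:
            # No known part starts here: fall back to the unsplit word.
            return [word]
    return parts


def matches(query, doc_words, tokenizer):
    """Return True if the query term equals any token produced for the document."""
    index = {tok for w in doc_words for tok in tokenizer(w)}
    return query.lower() in index


split_hit = matches("Haus", ["Krankenhaus"], tokenize_split)
single_hit = matches("Haus", ["Krankenhaus"], tokenize_single)
```

In this sketch, a query for "Haus" matches a document containing "Krankenhaus" only when the compound is split, which is one way to picture why the older behavior could return more search results than the single-token behavior.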

 

Cause
