How to Make - or _ or Other Chars Part of a Token and Prevent them from being Tokenized into Different Words During Indexing (Doc ID 1040641.1)

Last updated on NOVEMBER 30, 2018

Applies to:

Oracle Knowledge - Version 8.0.x to 8.5 [Release 8.0 to 8.5]
Information in this document applies to any platform.

Symptoms

How can -, _, or other punctuation characters be made part of a token so that they are not split into separate words during indexing?

Tokenization is the process of dividing text into distinct words (tokens) that are indexed. A new word is created wherever characters are separated by spaces, by non-alphanumeric characters, or by an alpha-numeric transition (a letter followed by a digit, or a digit followed by a letter).
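The splitting rules described above can be illustrated with a minimal Python sketch. This is not the Oracle Knowledge tokenizer; the function name `default_tokenize` and the ASCII-only character classes are assumptions made purely for illustration.

```python
import re

def default_tokenize(text):
    """Illustrative only: split on spaces, non-alphanumeric characters,
    and alpha-numeric transitions (hypothetical helper, ASCII only)."""
    # Each match is a run of letters OR a run of digits, so any space,
    # punctuation character, or letter/digit boundary ends the token.
    return re.findall(r"[A-Za-z]+|[0-9]+", text)

print(default_tokenize("error_code ORA-00942 abc123"))
# ['error', 'code', 'ORA', '00942', 'abc', '123']
```

Under these rules the underscore, the hyphen, and the letter-to-digit boundary each produce a separate word, which is the behavior the customizations are meant to change.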

Two specific types of customization can be made to the tokenizer to handle tokenization around non-alphanumeric characters and alpha-numeric transitions.

Typical uses include social security numbers, words containing - or _, filenames or error codes that contain underscores, and phone numbers. With these customizations, the search treats each such value as a single token rather than searching on the individual pieces of the word, filename, or number.
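The intended effect on such values can be sketched in Python as follows. This is a conceptual illustration only, not the product's configuration mechanism; the function `tokenize_keeping` and its `joiners` parameter are hypothetical names, and the sketch assumes that both customizations (joining characters and alpha-numeric transitions) are applied together.

```python
import re

def tokenize_keeping(text, joiners="-_"):
    """Illustrative only: keep the given joiner characters inside a token
    and do not split on alpha-numeric transitions (hypothetical helper)."""
    if not joiners:
        # No joiners: a token is simply a run of letters and digits.
        return re.findall(r"[A-Za-z0-9]+", text)
    joiner_class = "[" + re.escape(joiners) + "]"
    # A token is a run of letters/digits, optionally followed by further
    # runs that are each attached by exactly one joiner character.
    pattern = rf"[A-Za-z0-9]+(?:{joiner_class}[A-Za-z0-9]+)*"
    return re.findall(pattern, text)

print(tokenize_keeping("SSN 123-45-6789"))    # ['SSN', '123-45-6789']
print(tokenize_keeping("err_code_42.log"))    # ['err_code_42', 'log']
print(tokenize_keeping("call 555-1234 now"))  # ['call', '555-1234', 'now']
```

In this sketch a joiner only counts when it sits between letters or digits, so a leading or trailing hyphen is still dropped, and characters not listed in `joiners` (such as the period) continue to separate words.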

Cause


In this Document
Symptoms
Cause
Solution
References