Oracle Text and UCM - Stop Word Management
(Doc ID 870122.1)
Last updated on JULY 20, 2024
Applies to:
Oracle WebCenter Content - Version 10.0 to 12.2.1.3.0 [Release 10gR3 to 12c]
Information in this document applies to any platform.
Goal
Oracle database indexes that use Oracle Text rely on stop words to keep common words, which appear in nearly every full-text document, from overpopulating the index.
Stop words are words that Oracle Text does not index when processing text extracted from UCM documents. The default stoplist installed with Oracle Text defines more than 100 words. These are common words and common parts of speech, such as articles, conjunctions, prepositions, linking verbs, and adverbs. They are not indexed as valid tokens because they are nearly always present in full text. If they were indexed, the index entry for each such word could reference nearly every row in the base table, which would bloat the index and ultimately slow queries.
For example, the word “a” is likely to appear in nearly every document checked into UCM. If this word were indexed, a search string that included “a” would return a long list of documents. In terms of a book index, the word “a” would need a page reference for nearly every page and paragraph of text, and it would not help anyone find valuable content. Likewise, in Oracle Text, stoplists are designed to make the “tokens” that get indexed more useful and meaningful.
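The effect described above can be illustrated with a toy inverted index. This is a minimal sketch in Python, not Oracle Text's actual implementation: the `build_index` function, the three-word `STOPLIST`, and the sample documents are all hypothetical and exist only to show why a word like “a” adds bulk to an index without helping searches.

```python
from collections import defaultdict

# Hypothetical three-word stoplist for illustration only;
# the Oracle Text default stoplist defines more than 100 words.
STOPLIST = {"a", "the", "of"}


def build_index(docs, stoplist=frozenset()):
    """Map each token to the set of document IDs that contain it,
    skipping any token that appears in the stoplist."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token not in stoplist:
                index[token].add(doc_id)
    return dict(index)


# Hypothetical sample documents standing in for checked-in content.
docs = {
    1: "a report of the quarterly budget",
    2: "the budget for a new server",
    3: "a guide to the server room",
}

full = build_index(docs)                # no stoplist applied
filtered = build_index(docs, STOPLIST)  # stop words are skipped

print(sorted(full["a"]))           # [1, 2, 3]: "a" points at every document
print("a" in filtered)             # False: the stop word is never indexed
print(sorted(filtered["budget"]))  # [1, 2]: selective tokens remain searchable
```

In this sketch the entry for “a” in the unfiltered index references every document, so it cannot narrow a search, while the filtered index holds fewer tokens and keeps the selective ones.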
Solution
In this Document
Goal
Solution
Reading the default stoplist and stopwords from the database
Add a stopword
Remove a stopword
Create a custom stoplist for use with UCM indexing
Create the custom stoplist
Using the custom stoplist as the database default for Oracle Text indexes
Using the custom stoplist only for UCM index creation