Oracle Text and UCM - Stop Word Management
Last updated on MAY 01, 2017
Applies to: Oracle WebCenter Content - Version 10.1.3.3.0 to 10.1.3.5.0 [Release 10gR3]
Information in this document applies to any platform.
Oracle database indexes built with Oracle Text use stop words to keep common words, which appear in nearly every full-text document, from overpopulating the index.
Stop words are words that Oracle Text will not index when processing extracted text from UCM documents. The default stoplist installed with Oracle Text defines more than 100 words. These are common words and common parts of speech, such as articles, conjunctions, prepositions, linking verbs, and adverbs. They are not indexed as valid tokens because they are nearly always present in full text. If they were indexed, the index could hold a reference to nearly every row in the base table for each of these words, which would bloat the index and ultimately slow queries.
For example, the word “a” is likely to appear in nearly every document checked into UCM. Were this word indexed, a search string that included the word “a” would return a long list of documents. In terms of a book index, the word “a” would need a page reference for nearly every page and paragraph of text, and it would not help anyone find valuable content in any case. Likewise, in Oracle Text, stoplists are designed to make the “tokens” that get indexed more useful and meaningful.
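The effect of a stoplist on index size can be illustrated with a minimal Python sketch. This is not how Oracle Text is implemented; it is a toy inverted index that assumes simple whitespace tokenization. The `STOPLIST` set, the sample documents, and the function names `build_index` and `posting_count` are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical, abbreviated stoplist for illustration only.
# The real Oracle Text default stoplist contains more than 100 words.
STOPLIST = frozenset({"a", "the", "is", "of", "and", "to", "in"})


def build_index(docs, stoplist=frozenset()):
    """Map each token to the set of document ids that contain it,
    skipping any token found in the stoplist."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            if token not in stoplist:
                index[token].add(doc_id)
    return dict(index)


def posting_count(index):
    """Total number of (token, document) references held by the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


# Made-up sample documents standing in for text extracted from UCM content.
docs = {
    1: "a report of the quarterly budget",
    2: "the budget is a draft",
    3: "a guide to the search index",
}

full = build_index(docs)                 # every word indexed
filtered = build_index(docs, STOPLIST)   # stop words skipped

if __name__ == "__main__":
    print("documents matching 'a' without a stoplist:", sorted(full["a"]))
    print("'a' indexed with a stoplist:", "a" in filtered)
    print("references without stoplist:", posting_count(full))
    print("references with stoplist:", posting_count(filtered))
```

Without the stoplist, the entry for “a” points at every document, so it adds size to the index without narrowing any search. With the stoplist, only distinguishing words such as “budget” remain as tokens.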