Oracle Text and UCM - Stop Word Management (Doc ID 870122.1)

Last updated on MAY 01, 2017

Applies to:

Oracle WebCenter Content - Version 10.1.3.3.0 to 10.1.3.5.0 [Release 10gR3]
Information in this document applies to any platform.

Goal

Oracle Text uses stop words in database indexes to keep common words, which appear in nearly every full-text document, from inflating the index.

Stop words are words that Oracle Text does not index when it processes text extracted from UCM documents. The default stoplist installed with Oracle Text defines more than 100 words. These are common words and common parts of speech, such as articles, conjunctions, prepositions, linking verbs, and adverbs. They are not indexed as valid tokens because they are present in nearly every document that Oracle Text indexes. If they were indexed, the index would contain a reference to nearly every row in the base table, which bloats the index and ultimately slows queries.
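
The effect of a stoplist on index size can be illustrated with a toy inverted index. The Python sketch below is a conceptual illustration only, not Oracle Text's implementation. The stoplist in it is a small hypothetical subset chosen for the example, not the Oracle Text default stoplist, and the function names are likewise hypothetical. It builds the index twice, once without and once with the stoplist, and counts the document references (postings) each version stores.

```python
import re
from collections import defaultdict

# Hypothetical stop words for illustration only; this is NOT the
# Oracle Text default stoplist.
STOPLIST = frozenset({"a", "an", "the", "and", "or", "of", "to", "in", "is", "are"})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs, stoplist=frozenset()):
    """Map each token to the set of document IDs that contain it,
    skipping every token that appears in the stoplist."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            if token not in stoplist:
                index[token].add(doc_id)
    return dict(index)


def posting_count(index):
    """Total number of (token, document) references stored in the index."""
    return sum(len(doc_ids) for doc_ids in index.values())


docs = {
    1: "The invoice is a record of the sale.",
    2: "A contract is an agreement between the parties.",
    3: "The report summarizes a quarter of sales.",
}

full = build_index(docs)
filtered = build_index(docs, STOPLIST)
print(posting_count(full), posting_count(filtered))
print(sorted(full["a"]), "a" in filtered)
```

With these three sample documents the stoplist halves the number of stored references, and the token "a", which would otherwise point to every document, is not indexed at all.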

For example, the word “a” is likely to appear in nearly every document checked into UCM. If this word were indexed, any search string that included “a” would return a long list of documents. In terms of a book index, the word “a” would need a page reference for nearly every page and paragraph of text, and that entry would not help anyone find valuable content. Likewise, Oracle Text stoplists are designed to make the “tokens” that get indexed more useful and meaningful.
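
The book-index analogy can be made concrete by counting, for each token, how many documents contain it. The Python sketch below flags tokens that appear in at least a given fraction of the documents; such a token narrows a search about as little as an index entry that lists every page. The 0.9 threshold and the function names are hypothetical, and this frequency heuristic is only an illustration: it is not how Oracle Text builds its default stoplist, which is installed as a predefined list.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def document_frequency(docs):
    """Count how many documents each token appears in (once per document)."""
    df = Counter()
    for text in docs:
        df.update(set(tokenize(text)))
    return df


def frequent_tokens(docs, min_fraction=0.9):
    """Return tokens found in at least min_fraction of the documents.
    Such tokens are heuristic stop-word candidates: a search on them
    matches nearly every document and narrows nothing."""
    df = document_frequency(docs)
    cutoff = min_fraction * len(docs)
    return sorted(tok for tok, n in df.items() if n >= cutoff)


docs = [
    "A memo about a budget.",
    "A budget review for the team.",
    "Notes from a meeting with the vendor.",
    "A plan for the new office.",
]
print(frequent_tokens(docs))
print(frequent_tokens(docs, min_fraction=0.75))
```

In this sample, “a” occurs in all four documents and is flagged at the default threshold; lowering the threshold to 0.75 also flags “the”, which occurs in three of the four.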

The examples in this note were run in a UCM environment that uses OracleTextSearch as the search indexer engine (SearchIndexerEngineName=OracleTextSearch). However, much of this also applies when using DATABASE.FULLTEXT, because that option also creates CONTEXT indexes in the database, which are affected by the same stoplists and stop words.

Solution
