When Using the WORLD_LEXER, Indexing Certain 3-byte UTF8 Characters Fails with DRG-11302, DRG-11428: document contains invalid characters (Doc ID 1551860.1)

Last updated on MAY 08, 2013

Applies to:

Oracle Text - Version 11.1.0.7 to 11.2.0.3 [Release 11.1 to 11.2]
Information in this document applies to any platform.

Symptoms

When the database character set is AL32UTF8 and the WORLD_LEXER is used, indexing documents with certain 3-byte characters fails with the following errors:

DRG-11301: error while indexing document
DRG-11302: document may be partially indexed
DRG-11428: document contains invalid characters

Below are some of the characters which cannot be indexed with the WORLD_LEXER:

U+215E 0xE2859E  VULGAR FRACTION SEVEN EIGHTHS
U+215D 0xE2859D  VULGAR FRACTION FIVE EIGHTHS
U+215C 0xE2859C  VULGAR FRACTION THREE EIGHTHS
U+2158 0xE28598  VULGAR FRACTION FOUR FIFTHS

U+2140 0xE28580  DOUBLE-STRUCK N-ARY SUMMATION
U+2141 0xE28581  TURNED SANS-SERIF CAPITAL G
U+2149 0xE28589  DOUBLE-STRUCK ITALIC SMALL J
U+215F 0xE2859F  FRACTION NUMERATOR ONE
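
The byte values in the list above are the UTF-8 encodings of the corresponding code points. As a minimal sketch, assuming Python 3 and only its standard library, the mapping can be reproduced and checked outside the database. The helper names `utf8_hex` and `describe` are illustrative and are not part of Oracle Text.

```python
import unicodedata

# Code points copied from the list above.
CODE_POINTS = [0x215E, 0x215D, 0x215C, 0x2158, 0x2140, 0x2141, 0x2149, 0x215F]


def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as uppercase hex digits."""
    return chr(code_point).encode("utf-8").hex().upper()


def describe(code_point: int) -> str:
    """Format one line in the same layout as the list above."""
    # Fall back to a placeholder if the local Unicode database lacks a name.
    name = unicodedata.name(chr(code_point), "<unnamed>")
    return f"U+{code_point:04X} 0x{utf8_hex(code_point)}  {name}"


if __name__ == "__main__":
    for cp in CODE_POINTS:
        print(describe(cp))
```

Each listed code point lies between U+0800 and U+FFFF, which is why every one of them encodes to exactly three bytes in UTF-8.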

When the AUTO_LEXER is used, these characters can be indexed and are text-searchable.
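
To find out whether a given document contains any of the characters listed above before it is indexed, the text can be scanned outside the database. The following is a rough sketch, assuming Python 3. The function name `find_listed_characters` is hypothetical. The set covers only the eight characters named in this note, which describes them as "some of" the affected characters, so an empty result does not guarantee that indexing with the WORLD_LEXER will succeed.

```python
# Code points named in this note; the note presents them as a partial list.
LISTED_CODE_POINTS = frozenset(
    {0x215E, 0x215D, 0x215C, 0x2158, 0x2140, 0x2141, 0x2149, 0x215F}
)


def find_listed_characters(text: str) -> list[tuple[int, str]]:
    """Return (character offset, 'U+XXXX') for each listed character in text."""
    return [
        (offset, f"U+{ord(ch):04X}")
        for offset, ch in enumerate(text)
        if ord(ch) in LISTED_CODE_POINTS
    ]


if __name__ == "__main__":
    sample = "Add \u215e cup of flour"
    for offset, label in find_listed_characters(sample):
        print(f"offset {offset}: {label}")
```

The offsets are character positions in the decoded string, not byte positions in the stored UTF-8 data.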

Cause




This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.