When Using the WORLD_LEXER, Indexing Certain 3-byte UTF8 Characters Fails with DRG-11302, DRG-11428: document contains invalid characters
(Doc ID 1551860.1)
Last updated on MAY 08, 2013
Oracle Text - Version 188.8.131.52 to 184.108.40.206 [Release 11.1 to 11.2]
Information in this document applies to any platform.
When the database character set is AL32UTF8 and the WORLD_LEXER is used, indexing documents with certain 3-byte characters fails with the following errors:
DRG-11301: error while indexing document
DRG-11302: document may be partially indexed
DRG-11428: document contains invalid characters
The following are some of the characters that cannot be indexed with the WORLD_LEXER (code point, UTF-8 byte sequence, Unicode name):
U+215E 0xE2859E VULGAR FRACTION SEVEN EIGHTHS
U+215D 0xE2859D VULGAR FRACTION FIVE EIGHTHS
U+215C 0xE2859C VULGAR FRACTION THREE EIGHTHS
U+2158 0xE28598 VULGAR FRACTION FOUR FIFTHS
U+2140 0xE28580 DOUBLE-STRUCK N-ARY SUMMATION
U+2141 0xE28581 TURNED SANS-SERIF CAPITAL G
U+2149 0xE28589 DOUBLE-STRUCK ITALIC SMALL J
U+215F 0xE2859F FRACTION NUMERATOR ONE
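The byte sequences above follow directly from the standard UTF-8 encoding of each code point. As a minimal sketch (not part of the original note), the Python below recomputes those bytes and scans a string for the listed characters before it is submitted for indexing. The helper names `utf8_hex` and `find_listed_chars` are hypothetical. Because the note says the list covers only some of the affected characters, a clean scan does not guarantee that indexing will succeed.

```python
import unicodedata
from typing import List, Tuple

# Code points copied from the list in this note. The note describes this
# as a partial list, so the set is illustrative rather than exhaustive.
LISTED_CODE_POINTS = [
    0x215E, 0x215D, 0x215C, 0x2158,
    0x2140, 0x2141, 0x2149, 0x215F,
]


def utf8_hex(code_point: int) -> str:
    """Return the UTF-8 encoding of a code point as an uppercase hex string."""
    return "0x" + chr(code_point).encode("utf-8").hex().upper()


def find_listed_chars(text: str) -> List[Tuple[int, str]]:
    """Return (character offset, 'U+XXXX') for each listed character in text."""
    listed = {chr(cp) for cp in LISTED_CODE_POINTS}
    return [(i, f"U+{ord(ch):04X}") for i, ch in enumerate(text) if ch in listed]


if __name__ == "__main__":
    # Print each listed character in the same layout as the note:
    # code point, UTF-8 bytes, Unicode name.
    for cp in LISTED_CODE_POINTS:
        print(f"U+{cp:04X} {utf8_hex(cp)} {unicodedata.name(chr(cp))}")
```

A scan like this could be run over document text before it is loaded into a column indexed with the WORLD_LEXER, to flag rows likely to raise DRG-11428.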
When the AUTO_LEXER is used, these characters can be indexed and are text-searchable.
This document is being delivered to you via Oracle Support's Rapid Visibility (RaV) process and therefore has not been subject to an independent technical review.