When Using the WORLD_LEXER, Indexing Certain 3-byte UTF8 Characters Fails with DRG-11302, DRG-11428: document contains invalid characters
(Doc ID 1551860.1)
Last updated on APRIL 04, 2019
Applies to:
Oracle Text - Version 11.1.0.7 to 12.1.0.2 [Release 11.1 to 12.1]
Information in this document applies to any platform.
Symptoms
When the database character set is AL32UTF8 and the WORLD_LEXER is used, indexing documents with certain 3-byte characters fails with the following errors:
DRG-11301: error while indexing document
DRG-11302: document may be partially indexed
DRG-11428: document contains invalid characters
The following are some of the characters that cannot be indexed with the WORLD_LEXER (code point, UTF-8 byte sequence, Unicode character name):
U+215E 0xE2859E VULGAR FRACTION SEVEN EIGHTHS
U+215D 0xE2859D VULGAR FRACTION FIVE EIGHTHS
U+215C 0xE2859C VULGAR FRACTION THREE EIGHTHS
U+2158 0xE28598 VULGAR FRACTION FOUR FIFTHS
U+2140 0xE28580 DOUBLE-STRUCK N-ARY SUMMATION
U+2141 0xE28581 TURNED SANS-SERIF CAPITAL G
U+2149 0xE28589 DOUBLE-STRUCK ITALIC SMALL J
U+215F 0xE2859F FRACTION NUMERATOR ONE
When the AUTO_LEXER is used, these characters can be indexed and are text-searchable.
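To check whether a document contains any of the characters listed above before it is indexed, a small script along the following lines may help. This is only a sketch in Python, not part of Oracle Text: the names AFFECTED_CODE_POINTS, utf8_hex and find_affected are hypothetical, and because the list above covers only some of the affected characters, a clean result does not guarantee that the WORLD_LEXER will index the document.

```python
# Code points copied from the list in this note. The note says these are
# only "some of" the affected characters, so this set is not exhaustive.
AFFECTED_CODE_POINTS = [
    0x215E, 0x215D, 0x215C, 0x2158,
    0x2140, 0x2141, 0x2149, 0x215F,
]


def utf8_hex(code_point):
    """Return the UTF-8 encoding of a code point as an uppercase hex string."""
    return "0x" + chr(code_point).encode("utf-8").hex().upper()


def find_affected(text):
    """Return (character offset, 'U+XXXX') for each listed character in text."""
    affected = set(AFFECTED_CODE_POINTS)
    return [
        (offset, "U+{:04X}".format(ord(ch)))
        for offset, ch in enumerate(text)
        if ord(ch) in affected
    ]


if __name__ == "__main__":
    # Reproduce the code point / byte sequence columns of the list above.
    for cp in AFFECTED_CODE_POINTS:
        print("U+{:04X} {}".format(cp, utf8_hex(cp)))

    # Hypothetical sample text containing VULGAR FRACTION SEVEN EIGHTHS.
    print(find_affected("Add \u215E cup"))
```

Each of the listed code points encodes to three bytes beginning with 0xE285, which matches the byte sequences shown in the list.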
Cause