My Oracle Support Banner

AL32UTF8 / UTF8 (Unicode) Database Character Set Implications (Doc ID 788156.1)

Last updated on SEPTEMBER 04, 2023

Applies to:

Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Information in this document applies to any platform.

Purpose

To provide some practical hints on how to deal with the effects of moving to an AL32UTF8 database character set and using Unicode clients.
To do the actual conversion to AL32UTF8 see <Note 260192.1> Changing the NLS_CHARACTERSET to AL32UTF8/UTF8 (Unicode)  in 8i, 9i , 10g and 11g or <Note 1272374.1> The Database Migration Assistant for Unicode (DMU) Tool

While this note is written for going to AL32UTF8/UTF8 most of the facts are also applicable when changing to any other Multibyte characterset (ZHS16GBK, ZHT16MSWIN950, ZHT16HKSCS, ZHT16HKSCS31,KO16MSWIN949, JA16SJIS ...), simply substitute AL32UTF8 with the xx16xxxx target characterset. But in that case going to AL32UTF8 would be simply a far better idea.

Scope

This note uses AL32UTF8, all information in this note is  the same for UTF8 unless explicitly stated.
Note that UTF8 and AL32UTF8 are Oracle specific names and UTF-8 (with a -) refers to the Unicode standard UTF-8 encoding scheme.
 

IMPORTANT: Do NOT use Expdp/Impdp when going to (AL32)UTF8 or an other multibyte characterset on ALL 10g versions lower than 10.2.0.4 (including 10.1.0.5). Also 11.1.0.6 is affected.
It will provoke data corruption unless you applied <Patch 5874989> on the Impdp side, Expdp is not affected. The "old" exp/imp tools are not affected. This problem is fixed in the 10.2.0.4 and 11.1.0.7 patch set.
For windows the fix is included in
10.1.0.5.0 Patch 20 (10.1.0.5.20P) or later, see <Note 276548.1>
10.2.0.3.0 Patch 11 (10.2.0.3.11P) or later, see <Note 342443.1>

Details

To view full details, sign in with your My Oracle Support account.

Don't have a My Oracle Support account? Click to get started!


In this Document
Purpose
Scope
Details
 A) Often asked questions:
 A.1) Do I need to use Nchar, Nvarchar2 or Nclob?
 A.2) Does my Operating System need to support Unicode or do I need to install character sets on OS level?
 A.3) What are the Unicode character sets and the Unicode versions in Oracle RDBMS?
 A.4) Is -insert language or character here- supported/defined/known in an Oracle AL32UTF8/UTF8 database?
 A.5) How to convert from a non Unicode NLS_CHARACTERSET to Unicode ( AL32UTF8 or UTF8 ) NLS_CHARACTERSET?
 A.6) I also want to upgrade to a new Oracle version, do I go to AL32UTF8 before or after the upgrade?
 A.7) How can I see my application(s) will work fine with an AL32UTF8 or UTF8 database?
 A.8) How can I use Flashback DB technology and DMU tool to convert characterset to Unicode (AL32UTF8), So that If any issue I can quickly revert ?
 B) Server side implications.
 B.1) Storage of data in AL32UTF8 ( ORA-01401 / ORA-12899 during insert).
 B.2) How much will my database grow when going to AL32UTF8?
 B.3) Codepoints for characters may change in AL32UTF8.
 B.4) The meaning of SP2-0784, ORA-29275 and ORA-600 [kole_t2u], [34] errors / losing characters when using convert.
 B.5) Do I need to convert from UTF8 to AL32UTF8 ? How to convert from UTF8 NLS_CHARACTERSET to AL32UTF8 NLS_CHARACTERSET?
 B.6) ORA-01401 / ORA-12899 while importing data in an AL32UTF8 database (or move data using dblinks).
 B.7) Object and user names using non-US7ASCII characters.
 B.8) The password of an user can only contain single byte data in 10g and below.
 B.9) When using DBMS_LOB.LOADFROMFILE.
 B.10) When using UTL_FILE
 B.11) When using sqlldr or external tables.
 B.12) Make sure you do not store "binary" (  pdf , doc, docx, jpeg, png , etc files) or Encrypted data (passwords) in character datatypes (CHAR, VARCHAR2, LONG, CLOB).
 B.13) String functions work with characters not byte (length,like,substr ...).
 B.14) LPad and Rpad count in "display units" not characters.
 B.15) Using LIKE and INSTR.
 B.16) Character functions that are returning character values might silently truncate data.
 B.17) Column size double or triple when using (Materialized) Views / CTAS in an AL32UTF8 db and the dblink points to a non-AL32UTF8 db (or inverse).
 B.18) ORA-01406 or ORA-06502 when fetching data from non-AL32UTF8 databases using cursors (PL/SQL)
 B.19) When using HTMLDB.
 B.20) When using non-US7ASCII names in directory's or file names.
 B.21) When using XDB (xmltype).
 B.22) Upper and NLS_upper give unexpected results on the Micro symbol or turkish i and I characters.
 B.23) Lower and NLS_lower do not handle Greek Sigma Uppercase / capital Σ to lowercase conversion based on position of the Sigma symbol.
 B.24) After going to AL32UTF8 ORA-24816: Expanded non LONG bind data supplied after actual LONG or LOB column error may be seen
 B.25) What is the impact on CPU and memory usage?
 C) The Client side.
 C.1) Common misconceptions about NLS_LANG.
 C.2) Configuring your UNIX client to be an UTF-8 (Unicode) client.
 C.3) Configuring your Microsoft Windows client to be an UTF-8 (Unicode) client.
 C.4) The default column width of output in sqlplus will change.
 C.5) Configuring your web based client to be a Unicode client.
 C.6) Using Sqlplus or Oracle SQLDeveloper to run scripts inserting non-US7ASCII data.
 C.7) Spooling files using sqlplus is much slower using NLS_LANG set to UTF8 or AL32UTF8
 C.8) Using Oracle Applications.
 C.9) Using Portal.
 C.10) Oracle Forms PDF and Unicode
 C.11) Changing a database to AL32UTF8 hosting an OracleAS 10g Metadata Repository.
 D) Known Issues
 E) Other useful references
References

My Oracle Support provides customers with access to over a million knowledge articles and a vibrant support community of peers and Oracle experts.