Expdp Corrupts The Long Datatype

(Doc ID 443551.1)

Last updated on APRIL 16, 2010

Applies to:

Oracle Server - Enterprise Edition - Version: 10.1.0.2 to 10.2.0.4 - Release: 10.1 to 10.2
Information in this document applies to any platform.

Symptoms

When importing data from a LONG column and the export database character set is multibyte (such as UTF8, AL32UTF8) the long data would not be imported correctly if the access method used by Data Pump Import was external tables.

If the LONG data with special characters (ex: mâché) is encoded in Unicode then special characters would be handled following Unicode handling schemes. For instance, to check that get a block dump and look at the code points. You can use the following steps:

a) Find a row that contains a 'special' character within the source table that isn't properly handled by your data correction routine.
b) Determine the row id of that row by getting the ROWID column value of that row.
SELECT ROWID FROM source_table_name WHERE <condition_is_true>;
c) Get a block dump by specifying owner and souce table name, and rowid:
connect / as sysdba
select dbms_rowid.ROWID_TO_ABSOLUTE_FNO(rowid,'<OWNER>','<TABLE NAME') as file_id, dbms_rowid.rowid_block_number(rowid) as block_id from owner.source_table_name;

then use file_id and block_id values on :

alter system dump datafile <file_id> block <block_id>;

We can see that special characters â and é in "mâché"
are stored as "00 e2" and "00 e9":
00 6d 00 e2 00 63 00 68 00 e9

These are characters â and é that are stored as 0xe2 and 0xe9. Problem here is that database character set is AL32UTF8. So, these chacacters should be stored as 0xC3A2 and 0xC3A9.

Unicode Name Character Unicode Codepoint Charset
LATIN SMALL LETTER A WITH CIRCUMFLEX â U+00E2 C3A2 Detail AL32UTF8
LATIN SMALL LETTER E WITH ACUTE é U+00E9 C3A9 Detail AL32UTF8

So what is stored is a corrupt unicode value and not the expected AL32UTF8 code point.

Changes

A data corruption problem exists after a Data Pump Import under the following conditions:

1. The character set of the export database was a multibyte character set (such as UTF8, AL32UTF8). The issue does not reproduce in a WE8DEC database.

2. The table contained a column of data type LONG

3. The table had columns added after it was created using the SQL alter table statement.

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms