Possible data corruption with Impdp if importing NLS_CHARACTERSET is different from exporting NLS_CHARACTERSET (Doc ID 457526.1)

Last updated on DECEMBER 05, 2016

Applies to:

Oracle Server - Enterprise Edition - Version 10.1.0.2 to 10.2.0.3 [Release 10.1 to 10.2]
Oracle Server - Enterprise Edition - Version 11.1.0.6 to 11.1.0.6 [Release 11.1]
Information in this document applies to any platform.
Affects Impdp (Import Datapump) only.
Affected versions:
Database 10.1.0.2 to 10.1.0.5 and 10.2.0.1 to 10.2.0.3
Database 11.1.0.6
Fixed in the 10.2.0.4 (and 10.2.0.5) and 11.1.0.7 patch sets
Fixed in 11.2.0.1 and up
***Checked for relevance on 22-Jun-2012***
The original import tool (Imp) is not affected.

Symptoms

When the target (impdp side) NLS_CHARACTERSET differs from the source (expdp side) NLS_CHARACTERSET, Impdp (Import DataPump) may silently corrupt random CHAR, VARCHAR2, CLOB and LONG data.

Please note that:

* The original imp tool is not affected by this problem.
* Impdp will not give errors or warnings; the corruption can only be seen on some rows when selecting the imported dataset.
* Not all data is corrupted, only a certain part. A typical figure is around 50 corrupted rows per 500,000 rows for 8-bit to multibyte (AL32UTF8), but this also depends on your dataset, so it can be more or less. When going from 8-bit to 8-bit, the probability of corruption is very low.
* This is an impdp problem; the expdp dmp files are not corrupted.
* It only happens during characterset conversion; EBCDIC charactersets are not affected.

This bug may provoke two similar kinds of silent corruption:

A) Data may be corrupted around the first non-US7ASCII character, but selecting it will NOT raise an error because the corrupted data is no longer than the defined column length.

Typically this corruption repeats a part of the row's contents.
An example:
In the original database the data looks like:

This is a test sentence that has here -> è which is a non-US7ASCII character and here is the rest of the data in this row


After the impdp it may look like:

This is a test sentence that has here -> This is a test sentence that has here -> è which is a non-US7ASCII character and here is the rest of the data in this row


or even missing a part of the data:

This is a test sentence that has here ->This is a test sentence that has here ->and here is the rest of the data in this row
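As a sketch, rows showing this prefix-duplication pattern could be flagged with a simple heuristic. The function below is ours, not Oracle's, and can give false positives on genuinely repetitive data:

```python
def has_duplicated_prefix(value, min_len=8):
    """Flag strings that begin with an immediately repeated prefix,
    the pattern shown in corruption case A).  min_len avoids false
    positives on short natural repeats such as "abab"."""
    for n in range(min_len, len(value) // 2 + 1):
        if value[:n] == value[n:2 * n]:
            return True
    return False

clean = "This is a test sentence that has here -> è which is a non-US7ASCII character"
corrupt = "This is a test sentence that has here -> " * 2 + "è and the rest"
```

Running it on exported rows (for example after fetching them through a client) would surface candidates for manual inspection.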



B) In addition to the corrupted data, some of the corrupted rows may contain more data than the column length allows. For example, if you run

select length(<column>) from <table>;


you may see that for some of the corrupted rows the length is longer than the defined column length.
For example, a row may have a length of 155 while the column is defined as only VARCHAR2(150).

When selecting this data or when creating indexes on it you may see an ORA-3113 or ORA-7445: EXCEPTION ENCOUNTERED: CORE DUMP [KDSTSRP0KM()+760].
These are rows that also suffer from the corruption in point A); it is just more visible because the length exceeds the defined column size.

Part of the corruption can be found with the script in <Note 428526.1> Baddata Script To Check Database For Corrupt column data

Changes

The problem occurs when:
A) the dataset starts with a US7ASCII character and then contains non-US7ASCII characters in the same row

These are all the US7ASCII characters:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~

Note: If the entire row contains non-US7ASCII characters there will be no corruption, so this will mostly affect databases that store Latin languages like Spanish, French, Dutch etc. or English combined with currency symbols like the £ or the € sign. The $ sign is a US7ASCII character and will not provoke a problem.
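The triggering pattern in condition A) can be sketched in Python, using code points below 128 as a proxy for US7ASCII (the helper name is ours, not Oracle's):

```python
def matches_trigger_pattern(value):
    """Condition A) of this note: the value starts with a US7ASCII
    character but also contains non-US7ASCII characters.
    Code points below 128 stand in for US7ASCII."""
    if not value:
        return False
    return ord(value[0]) < 128 and any(ord(c) >= 128 for c in value)

print(matches_trigger_pattern("Price: 10 €"))   # mixed row: can trigger
print(matches_trigger_pattern("Price: 10 $"))   # pure US7ASCII: safe
print(matches_trigger_pattern("€€€"))           # entirely non-US7ASCII: safe
```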


AND

B) the source NLS_CHARACTERSET is different from the target NLS_CHARACTERSET and the non-US7ASCII characters need conversion.

An example of this is the letter "è". In the WE8ISO8859P1 / WE8MSWIN1252 charactersets "è" is code E8 (hex) - 232 (decimal), but in AL32UTF8 the letter "è" uses 2 bytes: C3,A8 (hex) - 195,168 (decimal).
Another example is the € sign when going from WE8ISO8859P15 to WE8MSWIN1252 (or the inverse). The euro is code A4 (hex) - 164 (decimal) in WE8ISO8859P15 but code 80 (hex) - 128 (decimal) in WE8MSWIN1252.
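As an illustration, these byte values can be checked with Python, whose iso-8859-1, iso-8859-15 and cp1252 codecs correspond to Oracle's WE8ISO8859P1, WE8ISO8859P15 and WE8MSWIN1252:

```python
# "è" in the single-byte WE8ISO8859P1 encoding vs AL32UTF8 (UTF-8)
assert "è".encode("iso-8859-1") == b"\xe8"    # E8 hex = 232 decimal
assert "è".encode("utf-8") == b"\xc3\xa8"     # C3,A8 hex = 195,168 decimal

# the euro sign sits on different code points in ISO 8859-15 and Windows-1252
assert "€".encode("iso-8859-15") == b"\xa4"   # A4 hex = 164 decimal
assert "€".encode("cp1252") == b"\x80"        # 80 hex = 128 decimal
```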


To find your NLS_CHARACTERSET use this select:

select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';


There will be NO corruption if the source and target NLS_CHARACTERSET are the same.
The corruption may happen when you go from any NLS_CHARACTERSET to UTF8 / AL32UTF8.

The following table gives an overview of commonly used NLS_CHARACTERSET combinations:

Source NLS_CHARACTERSET                     | Target NLS_CHARACTERSET                     | Corruption can happen?
same as the target NLS_CHARACTERSET         | same as the source NLS_CHARACTERSET         | No
US7ASCII                                    | xx8ISO8859Px / xx8MSWIN12xx                 | No
US7ASCII                                    | JA16xxxx / KO16xxxx / ZHS16xxxx / ZHT16xxxx | No
US7ASCII                                    | AL32UTF8 / UTF8                             | No
xx8ISO8859Px                                | xx8MSWIN12xx                                | Yes, probability is very low. NOT possible for WE8ISO8859P1 to WE8MSWIN1252
xx8MSWIN12xx                                | xx8ISO8859Px                                | Yes, probability is very low.
xx8ISO8859Px / xx8MSWIN12xx                 | AL32UTF8 / UTF8                             | Yes
JA16xxxx / KO16xxxx / ZHS16xxxx / ZHT16xxxx | AL32UTF8 / UTF8                             | Yes
AL32UTF8 / UTF8                             | xx8ISO8859Px / xx8MSWIN12xx / JA16xxxx / KO16xxxx / ZHS16xxxx / ZHT16xxxx | Yes
AL32UTF8                                    | UTF8                                        | Normally no; yes if you have "convertible" data in csscan, which is very rare.
UTF8                                        | AL32UTF8                                    | Normally no; yes if you have "convertible" data in csscan, which is very rare.



Note that it is still possible to "lose" data by doing an incorrect conversion. If you export Portuguese data from a WE8ISO8859P1 database and import it into a JA16SJIS database you will lose the Portuguese characters, simply because JA16SJIS only supports English and Japanese. This has nothing to do with the problem described here.
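This kind of loss can be illustrated outside the database. Python's shift_jis codec is close to JA16SJIS, and errors="replace" mimics the replacement character a lossy characterset conversion substitutes (this is an analogy; Oracle uses its own replacement character):

```python
# Portuguese text forced through a Shift-JIS (≈ JA16SJIS) conversion:
# the accented characters have no mapping and are replaced.
text = "ação"
mangled = text.encode("shift_jis", errors="replace").decode("shift_jis")
print(mangled)  # the accented letters are gone
```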

Cause
