External Table Using COLUMN TRANSFORMS Doesn't Work When Database Character Set Is Multibyte (AL32UTF8) (Doc ID 1359244.1)

Last updated on NOVEMBER 28, 2016

Applies to:

Oracle Server - Enterprise Edition - Version: 10.2.0.1 to 11.2.0.2 - Release: 10.2 to 11.2
Information in this document applies to any platform.

Symptoms

You created an external table that uses the option COLUMN TRANSFORMS to load data from a flat file containing special characters. The select from external table doesn't show the special characters correctly. The database character set is multibyte (AL32UTF8), NLS_LANG environment variable is properly set.

Errors like:

ORA-29275: partial multibyte character

are also possible.

The next test case demonstrates this:

connect / as sysdba

create or replace directory data_dir as '/tmp';

-- create the environment
create user test identified by test default tablespace users temporary tablespace temp;
grant connect, resource to test;
grant read, write on directory data_dir to test;

connect test/test

create table extdata
(
   name varchar2(50)
)
organization external
(
   type oracle_loader
   default directory data_dir
   access parameters
   (
      records delimited by newline
      characterset we8iso8859p1
      badfile data_dir:'data.bad'
      logfile data_dir:'data.log'
      fields terminated by '|'
      (
         first_name char(20),
         last_name char(20)
      )
      column transforms
      (
         name from concat (constant '<', first_name, constant ' ', last_name, constant '>')
      )
   )
   location (data_dir:'data.txt')
)
reject limit unlimited noparallel nomonitoring;

The content of input file data.txt is the following one:

Hans|Müller
San|Sebastián

The codepoints of special characters 'ü' and 'á' are conform with specified character set WE8ISO8859P1. You can verify this either using a hex editor (on Windows platforms) or simply perform on Unix platforms:

#> od -c -t x1 data.txt

0000000 H  a  n  s  |  M  ü  l  l  e  r  \n S  a  n  |
        48 61 6e 73 7c 4d fc 6c 6c 65 72 0a 53 61 6e 7c
0000020 S  e  b  a  s  t  i  á  n  \n
        53 65 62 61 73 74 69 e1 6e 0a

The selects from external table return:

select * from extdata;

NAME
--------------------------------------------------
<Hans M¿ller>
<San Sebasti¿

select name, dump (name, 16) from extdata1;

NAME
--------------------------------------------------
DUMP(NAME,16)
--------------------------------------------------------------------------------
<Hans M¿ller>
Typ=1 Len=13: 3c,48,61,6e,73,20,4d,fc,6c,6c,65,72,3e

<San Sebasti¿
Typ=1 Len=15: 3c,53,61,6e,20,53,65,62,61,73,74,69,e1,6e,3e

=> The codepoints ('ü' = 0xFC and 'á' = 0xE1) verified by function DUMP are correct conform with character set WE8ISO8859P1) but these are not displayed correctly (the database character set is AL32UTF8 and the codepoints should be 'ü' = 0xC3BC and 'á' = 0xC3A1).

If you exclude from external table the option COLUMN TRANSFORMS, like:

create table extdata1
(
   first_name varchar2(20),
   last_name varchar2(20)
)
organization external
(
   type oracle_loader
   default directory data_dir
   access parameters
   (
      records delimited by newline
      characterset we8iso8859p1
      badfile data_dir:'data1.bad'
      logfile data_dir:'data1.log'
      fields terminated by '|'
      (
         first_name char(20),
         last_name char(20)
      )
   )
   location (data_dir:'data.txt')
)
reject limit unlimited noparallel nomonitoring;

then the selects show:

select * from extdata1;

FIRST_NAME           LAST_NAME
-------------------- --------------------
Hans                 Müller
San                  Sebastián

select '<'||first_name||' '||last_name||'>' name from extdata1;

NAME
-------------------------------------------
<Hans Müller>
<San Sebastián>

select '<'||first_name||' '||last_name||'>' name, dump ('<'||first_name||' '||last_name||'>', 16) dump from extdata1;

NAME
-------------------------------------------
DUMP
-----------------------------------------------------
<Hans Müller>
Typ=1 Len=14: 3c,48,61,6e,73,20,4d,c3,bc,6c,6c,65,72,3e

<San Sebastián>
Typ=1 Len=16: 3c,53,61,6e,20,53,65,62,61,73,74,69,c3,a1,6e,3e

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms