Use of unpartitioned ECKD swap disk on zLinux causes overlay of heap storage and ORA-07445 (Doc ID 1360796.1)

Last updated on SEPTEMBER 11, 2017

Applies to:

Oracle Database - Enterprise Edition - Version 10.1.0.2 and later
IBM: Linux on System z

Symptoms

The alert log contains many occurrences of ORA-07445 errors of the following form:

"ORA-07445: exception encountered: core dump [ routine ] [SIGSEGV]
[Address not mapped to object] [0xE5E5E5E5E5E5E000]"
with the same "Arg [a]" or [0xE5E5E5E5E5E5E000] pointer to heap structures.

The following is a sample of routine names found in ORA-07445 errors with the SIGSEGV exception:

kqrpre1()+598 , kglsim_upd_newhp()+588 , kglobcl()+222 , kglhdiv_callback()+224 , kksIterCursorStat()+1136 ,
kghalo()+2388 , kglscn()+456 , ksfhlt()+34 , kqrshu()+872 , kglic0()+1218 , kgscDump()+184 ,
ksmdget()+236 , ksmdcln()+1422 , kksFreeHeapGetMutex()+60 , opifcs()+282 , kxsDumpCursors()+360 ,
strcpy()+8 , kglsim_fr_simhp()+1136 , kglobf0()+1244 , kxsnfy()+208 , kgidmp()+1656 , kgiind()+334 ,
kgllccl()+1694 , kafgec()+294 , xsstatedump()+228 , opiexe()+3796 , nlqudeq()+24 , kksFreeCursorStat()+90 ,
kglLockCursor()+860 , kglget()+7184 , kksCursorFreeCallBack()+100 , sltstcu()+36 , kghadjust()+136 ,
sltstcu()+36 , kocdsgt()+458 , pfrinstr_MOVNU()+100 , kolredur()+478 , kkmvcs()+168 , kglsscn()+442.
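
A tally like the one above can be produced by scanning the alert log text. The following is a minimal sketch, assuming the message layout quoted earlier (routine as the first bracketed argument); the function name summarize and the sample lines are hypothetical, built only from the format shown in this note.

```python
import re
from collections import Counter

# One ORA-07445 message followed by its bracketed arguments.
# \s* also tolerates a message that wraps onto the next line.
ORA_07445 = re.compile(
    r"ORA-0?7445: exception encountered: core dump((?:\s*\[[^\]]*\])+)"
)
ARG = re.compile(r"\[([^\]]*)\]")


def summarize(alert_text):
    """Return (Counter of failing routines, count of messages with an E5 address)."""
    routines = Counter()
    e5_hits = 0
    for match in ORA_07445.finditer(alert_text):
        args = [a.strip() for a in ARG.findall(match.group(1))]
        routines[args[0]] += 1
        # Flag messages whose arguments include an address made of E5 bytes.
        if any(a.upper().startswith("0XE5E5E5") for a in args):
            e5_hits += 1
    return routines, e5_hits


# Hypothetical sample lines in the format quoted in this note.
sample = (
    "ORA-07445: exception encountered: core dump [kghalo()+2388] [SIGSEGV] "
    "[Address not mapped to object] [0xE5E5E5E5E5E5E000] [] []\n"
    "ORA-07445: exception encountered: core dump [kglic0()+1218] [SIGSEGV] "
    "[Address not mapped to object] [0xE5E5E5E5E5E5E000] [] []\n"
)
routines, e5_hits = summarize(sample)
```

Many different routines failing on the same E5 address points to shared corrupted data, not to a defect in any single routine.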

Many of the trace files corresponding to these ORA-07445 errors show that the segmentation
violation occurred because one of the registers contains "e5e5e5e5e5e5e5e5". When any of the heap processing modules whose names begin with "KG" attempts to reference an address using this register value, it gets a SIGSEGV exception because of "Address not mapped to object".

Part of the SGA data (that is, the heap) is being overlaid with 'E5'. "Bug 12827345 - OVERLAY OF HEAP HEADERS CAUSES ORA-07445 SIGSEGV ADDRESS NOT MAPPED TO OBJECT" indicates that one of the trace files showed an E5 overlay extending for 4012 decimal bytes.

E5 does not make sense in any code page other than EBCDIC, where it is "V".
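
The EBCDIC reading of the overlay byte can be checked with a short sketch. It assumes Python's cp037 codec as a representative EBCDIC code page; the note itself does not name a specific code page.

```python
OVERLAY_BYTE = bytes([0xE5])

# cp037 is one common EBCDIC code page (an assumption; the note says only "EBCDIC").
ebcdic_char = OVERLAY_BYTE.decode("cp037")  # 'V'


def decodes_as_ascii(data):
    """Return True if every byte is valid 7-bit ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False


is_ascii = decodes_as_ascii(OVERLAY_BYTE)  # False: 0xE5 is outside 7-bit ASCII
```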

The overlaid data is always E5 and always ends on a 4k page boundary.
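
The page arithmetic behind these observations can be sketched as follows. The helper names are hypothetical. The sketch assumes a 4096-byte page and assumes the address reported in the ORA-07445 message is the register value rounded down to a page boundary, which is consistent with the values quoted in this note but is not stated by it.

```python
PAGE_SIZE = 4096  # 4k page, as described in this note


def page_base(address):
    """Round an address down to the start of its 4k page."""
    return address & ~(PAGE_SIZE - 1)


def start_offset_in_page(length):
    """Offset within a page at which an overlay of `length` bytes must start
    in order to end exactly on a page boundary."""
    return (-length) % PAGE_SIZE


# Register value seen in the trace files versus address seen in the error message.
reported = page_base(0xE5E5E5E5E5E5E5E5)  # 0xE5E5E5E5E5E5E000

# The 4012-byte overlay cited from Bug 12827345 would start 84 bytes into a page.
offset = start_offset_in_page(4012)  # 84
```

Under these assumptions, an overlay of 4012 bytes that ends on a page boundary covers everything in one page except its first 84 bytes.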

Once heap corruption is detected, every session gets an ORA-07445 SIGSEGV exception
when heap headers are referenced.

Almost every ORA-07445 in which heap storage is overlaid with E5 eventually causes the
production database to crash, after which it must be restarted.

After the zLinux SLES10 OS was upgraded to SP4 and the production 10g node was restarted,
there were 65 occurrences of a SIGILL (Illegal operand) exception during one 10-minute period.
These occurred when heap module kghfrunp() attempted to free space while register 4 contained the invalid address e5e5e5e5e5e5e5e5 and register 5 contained 72617465206f6d65:
ORA-07445: exception encountered: core dump [72617465206F6D67] [SIGILL] [Illegal operand] [0x72617465206F6D65] [] []
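
The values quoted above can be inspected as bytes instead of addresses. This is only an illustrative sketch with a hypothetical helper name; the note does not say what the bytes represent, so the decoding is an observation, not a diagnosis.

```python
def as_ascii(hex_value):
    """Render a hex string as ASCII, using '.' for non-printable bytes."""
    raw = bytes.fromhex(hex_value)
    return "".join(chr(b) if 0x20 <= b < 0x7F else "." for b in raw)


register_5 = as_ascii("72617465206f6d65")     # 'rate ome'
core_dump_arg = as_ascii("72617465206F6D67")  # 'rate omg'
register_4 = as_ascii("e5e5e5e5e5e5e5e5")     # '........' (no printable ASCII)
```

Printable text in a register that is expected to hold an address is consistent with a pointer having been loaded from overwritten memory.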




Many of the ORA-07445 trace files contain secondary symptoms of ORA-00600 with "Arg [a]"
containing [17147], [17114], [kghfrh:ds], [kggsmGetString:1], [17105], [17128], [17148].

The following heap management notes describe these secondary symptoms
as a heap (in-memory) corruption:
"ORA-600 [17114] "KGH Bad magic number in header" (Doc ID 34782.1)"
"ORA-600 [17147] (Doc ID 138580.1)"
"ORA-600 [17105] (Doc ID 138566.1)"
"ORA-600 [17128] (Doc ID 138576.1)"
"ORA-600 [17148] "KGH Bad magic number (Recreatable chunk)" (Doc ID 34781.1)"

Cause
