Memory Corruption on HP ProLiant Servers Running High-Load Databases Leading to "BUG: Bad page map in process" Errors and System Panics

(Doc ID 2324382.1)

Last updated on DECEMBER 03, 2017

Applies to:

Linux OS - Version Oracle Linux 6.7 with Unbreakable Enterprise Kernel [4.1.12] and later
Linux x86-64

Symptoms

An HP ProLiant DL580 Generation 9 server with BIOS version U17 02/17/2017, running Oracle Linux 6.9 with UEK4 kernel 4.1.12-61.1.28.el6uek.x86_64, crashes repeatedly with the following symptoms:

1. 'Out of memory' errors triggered by the OOM killer, as the following example demonstrates:

Jul 21 05:50:58 xx.server.com kernel: [39512.669393] Out of memory: Kill process 11701 (java) score 0 or sacrifice child
Jul 21 05:50:58 xx.server.com kernel: [39512.688572] Killed process 11701 (java) total-vm:9909464kB, anon-rss:436032kB, file-rss:12212kB
Jul 21 05:50:59 xx.server.com kernel: [39512.808940] Out of memory: Kill process 20620 (ocssd.bin) score 0 or sacrifice child
Jul 21 05:50:59 xx.server.com kernel: [39512.826006] Killed process 20620 (ocssd.bin) total-vm:2437992kB, anon-rss:161328kB, file-rss:89420kB
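
When many such entries accumulate, it can help to pull the "Killed process" lines into structured records. The following is a minimal sketch in Python, assuming the line layout shown in the excerpt above; the function name parse_oom_kills and the record keys are illustrative and do not come from any Oracle tool.

```python
import re

# Matches the "Killed process" line written by the OOM killer in the excerpt above.
# Assumption: the field order and the kB suffixes are exactly as shown there.
KILLED_RE = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<comm>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB, "
    r"file-rss:(?P<file_rss>\d+)kB"
)


def parse_oom_kills(lines):
    """Return one dict per 'Killed process' line; all sizes are in kB."""
    kills = []
    for line in lines:
        m = KILLED_RE.search(line)
        if m is None:
            continue  # not a "Killed process" line
        record = {"pid": int(m["pid"]), "comm": m["comm"]}
        for key in ("total_vm", "anon_rss", "file_rss"):
            record[key + "_kb"] = int(m[key])
        kills.append(record)
    return kills


# The four lines from the example above.
sample = [
    "Jul 21 05:50:58 xx.server.com kernel: [39512.669393] Out of memory: Kill process 11701 (java) score 0 or sacrifice child",
    "Jul 21 05:50:58 xx.server.com kernel: [39512.688572] Killed process 11701 (java) total-vm:9909464kB, anon-rss:436032kB, file-rss:12212kB",
    "Jul 21 05:50:59 xx.server.com kernel: [39512.808940] Out of memory: Kill process 20620 (ocssd.bin) score 0 or sacrifice child",
    "Jul 21 05:50:59 xx.server.com kernel: [39512.826006] Killed process 20620 (ocssd.bin) total-vm:2437992kB, anon-rss:161328kB, file-rss:89420kB",
]

kills = parse_oom_kills(sample)
for k in kills:
    print(k["pid"], k["comm"], k["anon_rss_kb"], "kB anon-rss")
```

On a live system the same function could be fed the lines of /var/log/messages instead of the sample list.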

2. Various call traces logged in /var/log/messages, with 'pte' errors similar to the following examples:

BUG: Bad page map in process oracle pte:3912343733363831 pmd:13e8f840067
BUG: Bad page map in process perl pte:2020202020202020 pmd:13ee91ea067
BUG: Bad page map in process oraagent.bin pte:18bc1b2a062f0055 pmd:13f1a56b067
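
To compare the reported values across many occurrences, the "Bad page map" lines above can be parsed into their process name, pte, and pmd fields. The sketch below is a hedged illustration only: the function name parse_bad_page_map and the returned keys are hypothetical, and the pattern assumes the message layout shown in these examples.

```python
import re

# Layout assumed from the examples above:
#   BUG: Bad page map in process <comm> pte:<hex> pmd:<hex>
# The whitespace before "pmd:" is optional so that lines where the two
# fields run together are still matched.
BAD_PTE_RE = re.compile(
    r"BUG: Bad page map in process (?P<comm>\S+)\s+"
    r"pte:(?P<pte>[0-9a-fA-F]+)\s*pmd:(?P<pmd>[0-9a-fA-F]+)"
)


def parse_bad_page_map(line):
    """Return the fields of a 'Bad page map' line, or None if it does not match."""
    m = BAD_PTE_RE.search(line)
    if m is None:
        return None
    pte = int(m["pte"], 16)
    return {
        "comm": m["comm"],
        "pte": pte,
        "pmd": int(m["pmd"], 16),
        # The 64-bit pte value as eight raw bytes, most significant byte first.
        "pte_bytes": pte.to_bytes(8, "big"),
    }


# The three example lines from above.
examples = [
    "BUG: Bad page map in process oracle pte:3912343733363831 pmd:13e8f840067",
    "BUG: Bad page map in process perl pte:2020202020202020 pmd:13ee91ea067",
    "BUG: Bad page map in process oraagent.bin pte:18bc1b2a062f0055 pmd:13f1a56b067",
]

for line in examples:
    rec = parse_bad_page_map(line)
    print(rec["comm"], hex(rec["pte"]), rec["pte_bytes"])
```

Printing the raw bytes makes patterns easier to spot at a glance, for example that 2020202020202020 is eight identical 0x20 bytes. The sketch only reformats what the kernel logged; it draws no conclusion about the cause.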

The following are three examples of the call traces generated:

Jul 21 00:21:36 xx.server.com kernel: [19736.260141] BUG: Bad page map in process oracle pte:3912343733363831 pmd:13e8f840067
Jul 21 00:21:36 xx.server.com kernel: [19736.260623] addr:0000000001719000 vm_flags:08000875 anon_vma: (null) mapping:ffff8979a19d7860 index:1319
Jul 21 00:21:36 xx.server.com kernel: [19736.261290] file:oracle fault:ext4_filemap_fault [ext4] mmap:ext4_file_mmap [ext4] readpage:ext4_readpage [ext4]
Jul 21 00:21:36 xx.server.com kernel: [19736.261903] CPU: 31 PID: 6682 Comm: oracle Tainted: P B O 4.1.12-61.1.28.el6uek.x86_64 #2
Jul 21 00:21:36 xx.server.com kernel: [19736.261912] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 02/17/2017
Jul 21 00:21:36 xx.server.com kernel: [19736.261913] 0000000000000000 ffff893c39dc3b08 ffffffff816c6e40 ffff893c4bf804b0
Jul 21 00:21:36 xx.server.com kernel: [19736.261916] 0000000001719000 ffff893c39dc3b58 ffffffff811b7356 ffff893c39dc3b68
Jul 21 00:21:36 xx.server.com kernel: [19736.261918] ffffffff811b7365 ffff893c39dc3b48 000000000171a000 ffff893e8f8408c8
Jul 21 00:21:36 xx.server.com kernel: [19736.261919] Call Trace:
Jul 21 00:21:36 xx.server.com kernel: [19736.261921] [<ffffffff816c6e40>] dump_stack+0x63/0x83
Jul 21 00:21:36 xx.server.com kernel: [19736.261923] [<ffffffff811b7356>] print_bad_pte+0x1e6/0x280
Jul 21 00:21:36 xx.server.com kernel: [19736.261924] [<ffffffff811b7365>] ? print_bad_pte+0x1f5/0x280
Jul 21 00:21:36 xx.server.com kernel: [19736.261925] [<ffffffff811b745e>] vm_normal_page+0x6e/0x80
Jul 21 00:21:36 xx.server.com kernel: [19736.261926] [<ffffffff811b78a2>] zap_pte_range+0x212/0x510
Jul 21 00:21:36 xx.server.com kernel: [19736.261928] [<ffffffff811b8efd>] unmap_page_range+0x1cd/0x300
Jul 21 00:21:36 xx.server.com kernel: [19736.261929] [<ffffffff811b90b7>] unmap_single_vma+0x87/0x100
Jul 21 00:21:36 xx.server.com kernel: [19736.261930] [<ffffffff811b9624>] unmap_vmas+0x54/0xa0
Jul 21 00:21:36 xx.server.com kernel: [19736.261932] [<ffffffff811bf0ba>] exit_mmap+0x9a/0x150
Jul 21 00:21:36 xx.server.com kernel: [19736.261934] [<ffffffff81082473>] mmput+0x73/0x110
Jul 21 00:21:36 xx.server.com kernel: [19736.261935] [<ffffffff810875ac>] exit_mm+0x13c/0x1d0
Jul 21 00:21:36 xx.server.com kernel: [19736.261937] [<ffffffff8112c11c>] ? __audit_free+0x1cc/0x220
Jul 21 00:21:36 xx.server.com kernel: [19736.261939] [<ffffffff810877f4>] do_exit+0x1b4/0x510
Jul 21 00:21:36 xx.server.com kernel: [19736.261940] [<ffffffff8106de9b>] ? __do_page_fault+0x18b/0x480
Jul 21 00:21:36 xx.server.com kernel: [19736.261941] [<ffffffff8112c21c>] ? __audit_syscall_entry+0xac/0x110
Jul 21 00:21:36 xx.server.com kernel: [19736.261943] [<ffffffff8102587c>] ? do_audit_syscall_entry+0x6c/0x70
Jul 21 00:21:36 xx.server.com kernel: [19736.261944] [<ffffffff81087ba6>] do_group_exit+0x56/0x100
Jul 21 00:21:36 xx.server.com kernel: [19736.261946] [<ffffffff81087c67>] SyS_exit_group+0x17/0x20
Jul 21 00:21:36 xx.server.com kernel: [19736.261947] [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71
Jul 21 00:21:36 xx.server.com kernel: [19736.262523] BUG: Bad rss-counter state mm:ffff893f8c905a00 idx:2 val:-1
Jul 21 01:14:16 xx.server.com kernel: [22898.316376] swap_free: Bad swap file entry 3018181918180518

Jul 23 14:34:39 xx.server.com kernel: [15395.553148] BUG: Bad page map in process perl pte:2020202020202020 pmd:13ee91ea067
Jul 23 14:34:39 xx.server.com kernel: [15395.573820] addr:00007fd37d928000 vm_flags:08000070 anon_vma: (null) mapping:ffff8979b373fcc0 index:d6
Jul 23 14:34:39 xx.server.com kernel: [15395.594657] file:Hostname.so fault:ext4_filemap_fault [ext4] mmap:ext4_file_mmap [ext4] readpage:ext4_readpage [ext4]
Jul 23 14:34:39 xx.server.com kernel: [15395.615733] CPU: 31 PID: 23770 Comm: perl Tainted: P B O 4.1.12-61.1.28.el6uek.x86_64 #2
Jul 23 14:34:39 xx.server.com kernel: [15395.615735] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 02/17/2017
Jul 23 14:34:39 xx.server.com kernel: [15395.615737] 0000000000000000 ffff893df9267940 ffffffff816c6e40 ffff893da6839ce8
Jul 23 14:34:39 xx.server.com kernel: [15395.615754] 00007fd37d928000 ffff893df9267990 ffffffff811b7356 ffff893df9267970
Jul 23 14:34:39 xx.server.com kernel: [15395.615759] 2010101010101010 ffff893df9267990 00007fd37d929000 ffff893ee91ea940
Jul 23 14:34:39 xx.server.com kernel: [15395.615765] Call Trace:
Jul 23 14:34:39 xx.server.com kernel: [15395.615775] [<ffffffff816c6e40>] dump_stack+0x63/0x83
Jul 23 14:34:39 xx.server.com kernel: [15395.615785] [<ffffffff811b7356>] print_bad_pte+0x1e6/0x280
Jul 23 14:34:39 xx.server.com kernel: [15395.615787] [<ffffffff811b7b5a>] zap_pte_range+0x4ca/0x510
Jul 23 14:34:39 xx.server.com kernel: [15395.615793] [<ffffffff811966b9>] ? file_ra_state_init+0x19/0x30
Jul 23 14:34:39 xx.server.com kernel: [15395.615795] [<ffffffff811b8efd>] unmap_page_range+0x1cd/0x300
Jul 23 14:34:39 xx.server.com kernel: [15395.615797] [<ffffffff811b90b7>] unmap_single_vma+0x87/0x100
Jul 23 14:34:39 xx.server.com kernel: [15395.615798] [<ffffffff811b9624>] unmap_vmas+0x54/0xa0
Jul 23 14:34:39 xx.server.com kernel: [15395.615800] [<ffffffff811bf0ba>] exit_mmap+0x9a/0x150
Jul 23 14:34:39 xx.server.com kernel: [15395.615819] [<ffffffff81082473>] mmput+0x73/0x110
Jul 23 14:34:39 xx.server.com kernel: [15395.615823] [<ffffffff8120d4e0>] exec_mmap+0x210/0x480
Jul 23 14:34:39 xx.server.com kernel: [15395.615824] [<ffffffff8120d7db>] flush_old_exec+0x8b/0xf0
Jul 23 14:34:39 xx.server.com kernel: [15395.615827] [<ffffffff8125cf63>] load_elf_binary+0x393/0xf70
Jul 23 14:34:39 xx.server.com kernel: [15395.615829] [<ffffffff811b5b72>] ? get_user_pages+0x52/0x60
Jul 23 14:34:39 xx.server.com kernel: [15395.615833] [<ffffffff8120b6c6>] search_binary_handler+0xb6/0x1e0
Jul 23 14:34:39 xx.server.com kernel: [15395.615834] [<ffffffff8120b848>] exec_binprm+0x58/0x170
Jul 23 14:34:39 xx.server.com kernel: [15395.615836] [<ffffffff8120cf04>] do_execveat_common+0x404/0x570
Jul 23 14:34:39 xx.server.com kernel: [15395.615839] [<ffffffff811e7a15>] ? kmem_cache_alloc+0x195/0x210
Jul 23 14:34:39 xx.server.com kernel: [15395.615841] [<ffffffff8120d284>] do_execve+0x44/0x50
Jul 23 14:34:39 xx.server.com kernel: [15395.615842] [<ffffffff8120d2bf>] SyS_execve+0x2f/0x40
Jul 23 14:34:39 xx.server.com kernel: [15395.615846] [<ffffffff816cbe55>] stub_execve+0x5/0x5
Jul 23 14:34:39 xx.server.com kernel: [15395.615847] [<ffffffff816cbb2e>] ? system_call_fastpath+0x12/0x71
Jul 23 14:34:39 xx.server.com kernel: [15395.615848] swap_free: Bad swap file entry 2024809010101010

Jul 25 01:19:03 xx.server.com kernel: [ 251.526172] BUG: Bad page map in process oraagent.bin pte:18bc1b2a062f0055 pmd:13f1a56b067
Jul 25 01:19:03 xx.server.com kernel: [ 251.526661] addr:00007f0aa774b000 vm_flags:08100077 anon_vma:ffff894173854870 mapping: (null) index:7f0aa774b
Jul 25 01:19:03 xx.server.com kernel: [ 251.527271] file: (null) fault: (null) mmap: (null) readpage: (null)
Jul 25 01:19:03 xx.server.com kernel: [ 251.527826] CPU: 28 PID: 27055 Comm: oraagent.bin Tainted: P B O 4.1.12-61.1.28.el6uek.x86_64 #2
Jul 25 01:19:03 xx.server.com kernel: [ 251.527828] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 02/17/2017
Jul 25 01:19:03 xx.server.com kernel: [ 251.527829] 0000000000000000 ffff893f15b83b78 ffffffff816c6e40 ffff893f1ed42320
Jul 25 01:19:03 xx.server.com kernel: [ 251.527831] 00007f0aa774b000 ffff893f15b83bc8 ffffffff811b7356 ffff89807ffd5e00
Jul 25 01:19:03 xx.server.com kernel: [ 251.527833] 0057790271c07800 ffff893f1a56ba50 18bc1b2a062f0055 ffff893f1a56ba58
Jul 25 01:19:03 xx.server.com kernel: [ 251.527835] Call Trace:
Jul 25 01:19:03 xx.server.com kernel: [ 251.527845] [<ffffffff816c6e40>] dump_stack+0x63/0x83
Jul 25 01:19:03 xx.server.com kernel: [ 251.527850] [<ffffffff811b7356>] print_bad_pte+0x1e6/0x280
Jul 25 01:19:03 xx.server.com kernel: [ 251.527861] [<ffffffff811b745e>] vm_normal_page+0x6e/0x80
Jul 25 01:19:03 xx.server.com kernel: [ 251.527869] [<ffffffff811b9866>] copy_pte_range+0x1f6/0x580
Jul 25 01:19:03 xx.server.com kernel: [ 251.527877] [<ffffffff811b9e76>] copy_page_range+0x286/0x4b0
Jul 25 01:19:03 xx.server.com kernel: [ 251.527881] [<ffffffff811c5a11>] ? anon_vma_chain_link+0x41/0x50
Jul 25 01:19:03 xx.server.com kernel: [ 251.527888] [<ffffffff81081f83>] dup_mmap+0x243/0x3e0
Jul 25 01:19:03 xx.server.com kernel: [ 251.527890] [<ffffffff810825a8>] dup_mm+0x98/0x110
Jul 25 01:19:03 xx.server.com kernel: [ 251.527891] [<ffffffff810839bd>] copy_process+0x11ed/0x1240
Jul 25 01:19:03 xx.server.com kernel: [ 251.527893] [<ffffffff81083ea9>] do_fork+0x79/0x280
Jul 25 01:19:03 xx.server.com kernel: [ 251.527897] [<ffffffff810259d3>] ? syscall_trace_enter_phase1+0x153/0x180
Jul 25 01:19:03 xx.server.com kernel: [ 251.527899] [<ffffffff810840c6>] SyS_clone+0x16/0x20
Jul 25 01:19:03 xx.server.com kernel: [ 251.527914] [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71
Jul 25 01:19:03 xx.server.com kernel: [ 251.530125] swap_free: Bad swap offset entry 1100800e000500

3. A vmcore is generated; its log shows similar call traces with the same 'pte' errors, as in the example below:

log

[108358.597469] swap_free: Unused swap offset entry 00008080
[108358.597794] BUG: Bad page map in process oracle pte:01010000 pmd:173e801e067
[108358.598216] addr:0000003484d10000 vm_flags:08000070 anon_vma: (null) mapping:ffff8979a1e8a0e0 index:310
[108358.598906] file:libc-2.12.so fault:ext4_filemap_fault [ext4] mmap:ext4_file_mmap [ext4] readpage:ext4_readpage [ext4]
[108358.599567] CPU: 25 PID: 20326 Comm: oracle Tainted: P O 4.1.12-61.1.28.el6uek.x86_64 #2
[108358.599569] Hardware name: HP ProLiant DL580 Gen9/ProLiant DL580 Gen9, BIOS U17 02/17/2017
[108358.599570] 0000000000000000 ffff896f194d3b18 ffffffff816c6e40 ffff8974085167d0
[108358.599573] 0000003484d10000 ffff896f194d3b68 ffffffff811b7356 ffff896f194d3b48
[108358.599575] 0000000000008080 ffff896f194d3b68 0000003484d11000 ffff8973e801e880
[108358.599577] Call Trace:
[108358.599592] [<ffffffff816c6e40>] dump_stack+0x63/0x83
[108358.599598] [<ffffffff811b7356>] print_bad_pte+0x1e6/0x280
[108358.599600] [<ffffffff811b7b5a>] zap_pte_range+0x4ca/0x510
[108358.599604] [<ffffffff811b8efd>] unmap_page_range+0x1cd/0x300
[108358.599606] [<ffffffff811b90b7>] unmap_single_vma+0x87/0x100
[108358.599608] [<ffffffff811b9624>] unmap_vmas+0x54/0xa0
[108358.599612] [<ffffffff811bf0ba>] exit_mmap+0x9a/0x150
[108358.599619] [<ffffffff81082473>] mmput+0x73/0x110
[108358.599623] [<ffffffff810875ac>] exit_mm+0x13c/0x1d0
[108358.599631] [<ffffffff8112c11c>] ? __audit_free+0x1cc/0x220
[108358.599633] [<ffffffff810877f4>] do_exit+0x1b4/0x510
[108358.599636] [<ffffffff8106de9b>] ? __do_page_fault+0x18b/0x480
[108358.599638] [<ffffffff8112c21c>] ? __audit_syscall_entry+0xac/0x110
[108358.599643] [<ffffffff8102587c>] ? do_audit_syscall_entry+0x6c/0x70
[108358.599646] [<ffffffff81087ba6>] do_group_exit+0x56/0x100
[108358.599648] [<ffffffff81087c67>] SyS_exit_group+0x17/0x20
[108358.599652] [<ffffffff816cbb2e>] system_call_fastpath+0x12/0x71
[108358.599659] swap_free: Bad swap offset entry 22a622a9a07f00

Cause
