
Oracle VM Server Panic at "kernel BUG at mm/slab.c:3059!" With mlx4_ib in Call Stack (Doc ID 2334610.1)

Last updated on AUGUST 04, 2018

Applies to:

Oracle VM - Version 3.2.9 and later
Linux x86-64

Symptoms

The Oracle VM Server restarted unexpectedly.

The ILOM console log shows a list_del corruption warning at __list_del_entry, followed by a kernel BUG at mm/slab.c:3059 and a panic, with mlx4_ib functions in the call stack:

mlx4_core 0000:13:00.0: Received reset from slave:8
mlx4_core 0000:13:00.0: Received reset from slave:9
------------[ cut here ]------------
WARNING: at lib/list_debug.c:47 __list_del_entry+0x63/0xd0()
Hardware name: ORACLE SERVER X5-2
list_del corruption, ffff8801f499cf30->next is LIST_POISON1
(dead000000100100)
Modules linked in: iptable_filter ip_tables mptctl mptbase tun xen_pciback
dm_nfs xen_blkback xen_netback xen_gntdev xen_evtchn cdc_ether usbnet mii
i2c_dev i2c_core ipmi_devintf ipmi_si nfs fscache auth_rpcgss nfs_acl
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm lockd sunrpc bridge stp llc bonding
be2iscsi iscsi_boot_sysfs iscsi_tcp bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio
libiscsi_tcp libiscsi scsi_transport_iscsi rdma_ucm(U) ib_sdp(U) rdma_cm(U)
iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ipv6 ib_uverbs(U) ib_umad(U)
mlx4_vnic(U) mlx4_vnic_helper(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U)
mlx4_core(U) xenfs xen_privcmd ocfs2 jbd2 ocfs2_nodemanager configfs
ocfs2_stackglue video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler
parport_pc lp parport wmi ixgbe hwmon dca snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc ghes pcspkr hed dm_snapshot dm_zero
dm_mirror dm_region_hash dm_log dm_mod ahci libahci sg shpchp megaraid_sas
sd_mod crc_t10dif ext3 jbd mbcache
Pid: 211, comm: kworker/1:1 Not tainted 2.6.39-400.276.1.el5uek #1
Call Trace:
[<ffffffff81264a73>] ? __list_del_entry+0x63/0xd0
[<ffffffff8106f3a0>] warn_slowpath_common+0x90/0xc0
[<ffffffff8106f4ce>] warn_slowpath_fmt+0x6e/0x70
[<ffffffff81507834>] ? __schedule+0x364/0x6d0
[<ffffffff81264a73>] __list_del_entry+0x63/0xd0
[<ffffffff81264af1>] list_del+0x11/0x40
[<ffffffffa03a31fb>] id_map_ent_timeout+0xab/0x170 [mlx4_ib]
[<ffffffff8108c5e9>] process_one_work+0xf9/0x370
[<ffffffffa03a3150>] ? id_map_alloc+0x200/0x200 [mlx4_ib]
[<ffffffff8108cf2a>] worker_thread+0xca/0x240
[<ffffffff8108ce60>] ? manage_workers+0x90/0x90
[<ffffffff81091507>] kthread+0x97/0xa0
[<ffffffff81513584>] kernel_thread_helper+0x4/0x10
[<ffffffff81512683>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8150a221>] ? retint_restore_args+0x5/0x6
[<ffffffff81513580>] ? gs_change+0x13/0x13
---[ end trace cea842e90ad165d1 ]---
------------[ cut here ]------------
kernel BUG at mm/slab.c:3059!
invalid opcode: 0000 [#1] SMP
CPU 0
Modules linked in: iptable_filter ip_tables mptctl mptbase tun xen_pciback
dm_nfs xen_blkback xen_netback xen_gntdev xen_evtchn cdc_ether usbnet mii
i2c_dev i2c_core ipmi_devintf ipmi_si nfs fscache auth_rpcgss nfs_acl
ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm lockd sunrpc bridge stp llc bonding
be2iscsi iscsi_boot_sysfs iscsi_tcp bnx2i cnic uio cxgb3i libcxgbi cxgb3 mdio
libiscsi_tcp libiscsi scsi_transport_iscsi rdma_ucm(U) ib_sdp(U) rdma_cm(U)
iw_cm(U) ib_addr(U) ib_ipoib(U) ib_cm(U) ipv6 ib_uverbs(U) ib_umad(U)
mlx4_vnic(U) mlx4_vnic_helper(U) mlx4_ib(U) ib_sa(U) ib_mad(U) ib_core(U)
mlx4_core(U) xenfs xen_privcmd ocfs2 jbd2 ocfs2_nodemanager configfs
ocfs2_stackglue video sbs sbshc acpi_memhotplug acpi_ipmi ipmi_msghandler
parport_pc lp parport wmi ixgbe hwmon dca snd_seq_dummy snd_seq_oss
snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm
snd_timer snd soundcore snd_page_alloc ghes pcspkr hed dm_snapshot dm_zero
dm_mirror dm_region_hash dm_log dm_mod ahci libahci sg shpchp megaraid_sas
sd_mod crc_t10dif ext3 jbd mbcache
Pid: 3976, comm: kworker/u:2 Tainted: G W 2.6.39-400.276.1.el5uek #1
Oracle Corporation ORACLE SERVER X5-2/ASM,MOTHERBOARD,1U
RIP: e030:[<ffffffff8115c14b>] [<ffffffff8115c14b>]
cache_alloc_refill+0x1ab/0x240
RSP: e02b:ffff8801e044bb90 EFLAGS: 00010046
RAX: 0000000000000014 RBX: 0000000000000034 RCX: ffff8802260212c0
RDX: ffff8801f4837000 RSI: ffff8802260212c0 RDI: ffff8802260212c0
RBP: ffff8801e044bbe0 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8801f499c040
R13: 0000000000000008 R14: ffff8802262ac000 R15: ffff8802260701c0
FS: 00007fc95adf66e0(0000) GS:ffff880226400000(0000)
knlGS:0000000000000000
CS: e033 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 000000376c2e705c CR3: 00000001ceb31000 CR4: 0000000000002660
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/u:2 (pid: 3976, threadinfo ffff8801e044a000, task
ffff8801f085e2c0)
Stack:
ffff8801e044bbb0 000000d08107ff2a ffff8802260212c0 000000000abc8000
ffff880226021300 ffff8802260701c0 0000000000000000 00000000000000d0
0000000000000200 00000000000000d0 ffff8801e044bc20 ffffffff8115c36c
Call Trace:
[<ffffffff8115c36c>] kmem_cache_alloc_trace+0x18c/0x1a0
[<ffffffffa03a2f8f>] id_map_alloc+0x3f/0x200 [mlx4_ib]
[<ffffffff81509cbe>] ? _raw_spin_lock+0xe/0x20
[<ffffffffa03a3360>] ? id_map_get+0xa0/0x190 [mlx4_ib]
[<ffffffffa03a3763>] mlx4_ib_multiplex_cm_handler+0xf3/0x1d0 [mlx4_ib]
[<ffffffffa0392894>] mlx4_ib_multiplex_mad+0x354/0x410 [mlx4_ib]
[<ffffffffa038e6f6>] ? get_sw_cqe+0x26/0x50 [mlx4_ib]
[<ffffffffa038f11d>] ? mlx4_ib_poll_one+0x33d/0x670 [mlx4_ib]
[<ffffffff81509d5e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[<ffffffffa038f99e>] ? mlx4_ib_poll_cq+0x8e/0xf0 [mlx4_ib]
[<ffffffffa0392f8f>] mlx4_ib_tunnel_comp_worker+0x8f/0x190 [mlx4_ib]
[<ffffffff8108c5e9>] process_one_work+0xf9/0x370
[<ffffffffa0392f00>] ? create_pv_sqp+0x340/0x340 [mlx4_ib]
[<ffffffff8108cf2a>] worker_thread+0xca/0x240
[<ffffffff8108ce60>] ? manage_workers+0x90/0x90
[<ffffffff81091507>] kthread+0x97/0xa0
[<ffffffff81513584>] kernel_thread_helper+0x4/0x10
[<ffffffff81512683>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8150a221>] ? retint_restore_args+0x5/0x6
[<ffffffff81513580>] ? gs_change+0x13/0x13
Code: c9 c3 4c 8b 67 20 48 89 f8 c7 47 60 01 00 00 00 48 83 c0 20 4c 39 e0 74
a0 41 8b 87 18 80 00 00 41 39 44 24 20 0f 82 55 ff ff ff <0f> 0b eb fe 90 48
8b 45 c0 48 8b 75 c0 4c 89 e7 48 8b 50 10 48
RIP [<ffffffff8115c14b>] cache_alloc_refill+0x1ab/0x240
RSP <ffff8801e044bb90>
---[ end trace cea842e90ad165d2 ]---
Kernel panic - not syncing: Fatal exception
Pid: 3976, comm: kworker/u:2 Tainted: G D W 2.6.39-400.276.1.el5uek
#1
Call Trace:
[<ffffffff8106f5c4>] panic+0xd4/0x200
[<ffffffff81070f65>] ? kmsg_dump+0xb5/0x100
[<ffffffff8150af0c>] oops_end+0xbc/0x100
[<ffffffff8101877b>] die+0x5b/0x90
[<ffffffff8150a900>] do_trap+0x140/0x160
[<ffffffff81016785>] do_invalid_op+0x95/0xb0
[<ffffffff8115c14b>] ? cache_alloc_refill+0x1ab/0x240
[<ffffffff81507834>] ? __schedule+0x364/0x6d0
[<ffffffff81509d5e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[<ffffffff81509d04>] ? _raw_spin_lock_irqsave+0x34/0x50
[<ffffffff815133fb>] invalid_op+0x1b/0x20
[<ffffffff8115c14b>] ? cache_alloc_refill+0x1ab/0x240
[<ffffffff8115c36c>] kmem_cache_alloc_trace+0x18c/0x1a0
[<ffffffffa03a2f8f>] id_map_alloc+0x3f/0x200 [mlx4_ib]
[<ffffffff81509cbe>] ? _raw_spin_lock+0xe/0x20
[<ffffffffa03a3360>] ? id_map_get+0xa0/0x190 [mlx4_ib]
[<ffffffffa03a3763>] mlx4_ib_multiplex_cm_handler+0xf3/0x1d0 [mlx4_ib]
[<ffffffffa0392894>] mlx4_ib_multiplex_mad+0x354/0x410 [mlx4_ib]
[<ffffffffa038e6f6>] ? get_sw_cqe+0x26/0x50 [mlx4_ib]
[<ffffffffa038f11d>] ? mlx4_ib_poll_one+0x33d/0x670 [mlx4_ib]
[<ffffffff81509d5e>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
[<ffffffffa038f99e>] ? mlx4_ib_poll_cq+0x8e/0xf0 [mlx4_ib]
[<ffffffffa0392f8f>] mlx4_ib_tunnel_comp_worker+0x8f/0x190 [mlx4_ib]
[<ffffffff8108c5e9>] process_one_work+0xf9/0x370
[<ffffffffa0392f00>] ? create_pv_sqp+0x340/0x340 [mlx4_ib]
[<ffffffff8108cf2a>] worker_thread+0xca/0x240
[<ffffffff8108ce60>] ? manage_workers+0x90/0x90
[<ffffffff81091507>] kthread+0x97/0xa0
[<ffffffff81513584>] kernel_thread_helper+0x4/0x10
[<ffffffff81512683>] ? int_ret_from_sys_call+0x7/0x1b
[<ffffffff8150a221>] ? retint_restore_args+0x5/0x6
[<ffffffff81513580>] ? gs_change+0x13/0x13
(XEN) Domain 0 crashed: rebooting machine in 5 seconds.
(XEN) Resetting with ACPI MEMORY or I/O RESET_REG.
=============================================
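To check whether a saved console log carries the same signature as the trace above, a small script can look for the three markers: the list_del corruption warning, the BUG line at mm/slab.c:3059, and stack frames attributed to mlx4_ib. The following is a minimal sketch, not an Oracle tool. The function name `scan_console_log` and the returned dictionary keys are made up for illustration, and the line number 3059 is taken from this article's log, so it may differ on other kernel builds.

```python
import re

# Markers copied from the console log in this article.
LIST_CORRUPTION = "list_del corruption"
SLAB_BUG = "kernel BUG at mm/slab.c:3059!"

# A stack frame attributed to the mlx4_ib module, for example:
#   [<ffffffffa03a31fb>] id_map_ent_timeout+0xab/0x170 [mlx4_ib]
#   [<ffffffffa03a3360>] ? id_map_get+0xa0/0x190 [mlx4_ib]
MLX4_FRAME = re.compile(
    r"\]\s+\??\s*(\w+)\+0x[0-9a-f]+/0x[0-9a-f]+\s+\[mlx4_ib\]"
)


def scan_console_log(text):
    """Report which of the article's symptoms appear in a console log.

    Hypothetical helper: returns a dict with two booleans and the list of
    distinct mlx4_ib function names, in order of first appearance.
    """
    frames = []
    for line in text.splitlines():
        match = MLX4_FRAME.search(line)
        if match and match.group(1) not in frames:
            frames.append(match.group(1))
    return {
        "list_del_corruption": LIST_CORRUPTION in text,
        "slab_bug": SLAB_BUG in text,
        "mlx4_ib_frames": frames,
    }
```

Passing the full text of the ILOM console capture to `scan_console_log` gives a quick first check. A log that shows both markers together with frames such as id_map_ent_timeout or id_map_alloc resembles the symptom described here, but the match is only textual and does not confirm the cause.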

Checking the installed RPM packages shows the versions of kernel-ib and kernel-ib-devel, as below:
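If a package listing has been saved from the server (for example, the output of `rpm -qa` written to a file), the two relevant entries can be picked out with a short script. This is a minimal sketch: the function name `kernel_ib_packages` is hypothetical, and it assumes each line has the usual name-version-release form that `rpm -qa` prints by default.

```python
import re

# Matches "kernel-ib-<version...>" and "kernel-ib-devel-<version...>".
# The version part is assumed to start with a digit, which keeps
# "kernel-ib-devel-..." from being read as a version of "kernel-ib".
PKG_LINE = re.compile(r"^(kernel-ib(?:-devel)?)-(\d\S*)$")


def kernel_ib_packages(rpm_qa_output):
    """Map 'kernel-ib' and 'kernel-ib-devel' to the version strings found.

    Hypothetical helper: takes the text of an `rpm -qa` listing and
    returns a dict of package name -> list of version-release strings.
    """
    found = {}
    for line in rpm_qa_output.splitlines():
        match = PKG_LINE.match(line.strip())
        if match:
            found.setdefault(match.group(1), []).append(match.group(2))
    return found
```

Reading the saved listing with `open(path).read()` and passing it to `kernel_ib_packages` returns only the kernel-ib and kernel-ib-devel entries, which makes it easy to compare their versions against the running kernel.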

 

Cause




