Oracle Linux: Server Crashing Frequently with Call Trace "WARNING: CPU: 0 PID: 75317 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0" and "Workqueue: lpfc_wq lpfc_sli4_sp_process_cq [lpfc]" (Doc ID 2530485.1)

Last updated on MAY 24, 2020

Applies to:

Linux OS - Version Oracle Linux 7.5 and later
Information in this document applies to any platform.

Symptoms

A node running Oracle Linux 7.x with RHCK 3.10.0-862.6.3.el7.x86_64 panics and reboots frequently, logging the following call trace:

[140425.144338] WARNING: CPU: 0 PID: 75317 at lib/list_debug.c:59 __list_del_entry+0xa1/0xd0
[140425.144341] list_del corruption. prev->next should be ffff9c7c3ed1ac70, but was ffff9c7c3f2c0470 <<<<<<<<<<<<<<<<<<<<<
[140425.144344] Modules linked in: mptctl mptbase fuse bonding sunrpc ext4 mbcache jbd2 iTCO_wdt iTCO_vendor_support intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd igb pcspkr ixgbe ipmi_ssif i2c_i801 hpwdt lpc_ich hpilo ioatdma ptp pps_core mdio dca ipmi_si ipmi_devintf ipmi_msghandler dm_service_time pcc_cpufreq shpchp acpi_power_meter wmi sg binfmt_misc ip_tables xfs libcrc32c sd_mod lpfc mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm nvmet_fc(T) nvmet dm_multipath crc_t10dif crct10dif_generic crct10dif_pclmul nvme_fc(T) crc32c_intel nvme_fabrics serio_raw nvme_core hpsa scsi_transport_fc i2c_core scsi_tgt scsi_transport_sas crct10dif_common dm_mirror
[140425.144414] dm_region_hash dm_log dm_mod
[140425.144421] CPU: 0 PID: 75317 Comm: kworker/0:3 Kdump: loaded Tainted: G W ------------ T 3.10.0-862.6.3.el7.x86_64 #1
[140425.144424] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/17/2018
[140425.144436] Workqueue: lpfc_wq lpfc_sli4_hba_process_cq [lpfc]
[140425.144438] Call Trace:
[140425.144447] [<ffffffffaa90e80e>] dump_stack+0x19/0x1b
[140425.144452] [<ffffffffaa291e18>] __warn+0xd8/0x100
[140425.144457] [<ffffffffaa291e9f>] warn_slowpath_fmt+0x5f/0x80
[140425.144471] [<ffffffffc02fabb9>] ? lpfc_sli_release_iocbq+0x49/0x60 [lpfc]
[140425.144476] [<ffffffffaa568e61>] __list_del_entry+0xa1/0xd0
[140425.144503] [<ffffffffc02f6df2>] lpfc_sli_iocbq_lookup_by_tag.isra.20+0x42/0xb0 [lpfc]
[140425.144508] [<ffffffffc02f6ed1>] lpfc_sli4_fp_handle_fcp_wcqe.isra.24+0x71/0x2f0 [lpfc]
[140425.144511] [<ffffffffaa2d895c>] ? update_curr+0x14c/0x1e0
[140425.144514] [<ffffffffaa2d52ce>] ? account_entity_dequeue+0xae/0xd0
[140425.144516] [<ffffffffaa2d8e4c>] ? dequeue_entity+0x11c/0x5e0
[140425.144522] [<ffffffffc02fad20>] ? lpfc_sli_abort_els_cmpl+0x150/0x150 [lpfc]
[140425.144527] [<ffffffffc02f7b52>] lpfc_sli4_fp_handle_cqe+0x242/0x4b0 [lpfc]
[140425.144529] [<ffffffffaa22959e>] ? __switch_to+0xce/0x580
[140425.144535] [<ffffffffc02f9359>] lpfc_sli4_hba_process_cq+0x99/0x1a0 [lpfc]
[140425.144537] [<ffffffffaa2b35ef>] process_one_work+0x17f/0x440
[140425.144539] [<ffffffffaa2b4686>] worker_thread+0x126/0x3c0
[140425.144542] [<ffffffffaa2b4560>] ? manage_workers.isra.24+0x2a0/0x2a0
[140425.144544] [<ffffffffaa2bb621>] kthread+0xd1/0xe0
[140425.144546] [<ffffffffaa2bb550>] ? insert_kthread_work+0x40/0x40
[140425.144549] [<ffffffffaa9205f7>] ret_from_fork_nospec_begin+0x21/0x21
[140425.144551] [<ffffffffaa2bb550>] ? insert_kthread_work+0x40/0x40
[140425.144552] ---[ end trace 7ae478793469d4e1 ]---
[140425.144557] ------------[ cut here ]------------
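
The "list_del corruption" warning in the trace above is emitted by the kernel's linked-list debug check when an entry is removed and its predecessor no longer points back at it. The following is a simplified Python model of that check, not the kernel source: the names Node, list_add and checked_list_del are hypothetical and exist only for illustration, and node names stand in for the addresses printed in the real message.

```python
class Node:
    """Stand-in for the kernel's struct list_head: only prev/next links."""

    def __init__(self, name):
        self.name = name
        self.prev = self
        self.next = self


def list_add(new, head):
    """Insert `new` directly after `head` (no debug checks in this sketch)."""
    nxt = head.next
    nxt.prev = new
    new.next = nxt
    new.prev = head
    head.next = new


def checked_list_del(entry):
    """Unlink `entry` only if both neighbours still point back at it.

    Returns None on success, or a message shaped like the kernel warning.
    """
    prev, nxt = entry.prev, entry.next
    if prev.next is not entry:
        return (f"list_del corruption. prev->next should be {entry.name}, "
                f"but was {prev.next.name}")
    if nxt.prev is not entry:
        return (f"list_del corruption. next->prev should be {entry.name}, "
                f"but was {nxt.prev.name}")
    nxt.prev = prev
    prev.next = nxt
    return None


head, a, b = Node("head"), Node("a"), Node("b")

list_add(a, head)
ok = checked_list_del(a)        # consistent list: unlinks cleanly

# Simulate corruption: `a` still believes `head` is its neighbour,
# but `head` now links to `b` instead.
list_add(b, head)
a.prev, a.next = head, head
warning = checked_list_del(a)
print(warning)
```

In this model, "should be" names the entry being deleted and "but was" names whatever the predecessor actually points to, which is how the two addresses in the real warning can be read.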

The kernel log in the vmcore shows many list corruption errors and invalid memory references before the system panicked:

crash7latest> log|grep -i corruption
[140425.144341] list_del corruption. prev->next should be ffff9c7c3ed1ac70, but was ffff9c7c3f2c0470
[140425.144561] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
[140425.145200] list_del corruption. prev->next should be ffff9c7c39374470, but was ffff9c7c3ed1ac70
[140425.145310] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
[140426.128326] list_del corruption. prev->next should be ffff9c7c3ed18470, but was ffff9c7c39374470
[140426.128626] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
[140426.129500] list_del corruption. prev->next should be ffff9c7c3f2c6870, but was ffff9c7c3ed18470
[140426.129609] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
[140426.168343] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
[140426.169140] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
[140426.170016] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
[140426.170708] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
[140427.096337] list_add corruption. prev->next should be next (ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).
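
When the log is long, it can help to tally these messages rather than read them one by one. The following is a minimal Python sketch, assuming the log has been saved as text in the same format as the excerpt above. The function name summarize and the regular expressions are illustrative assumptions, not part of the crash utility.

```python
import re
from collections import Counter

# Patterns for the two message shapes seen in the excerpt above.
LIST_DEL = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] list_del corruption\. prev->next should be "
    r"(?P<expected>[0-9a-f]+), but was (?P<actual>[0-9a-f]+)"
)
LIST_ADD = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\] list_add corruption\. prev->next should be next "
    r"\((?P<expected>[0-9a-f]+)\), but was (?P<actual>[0-9a-f]+)"
)


def summarize(lines):
    """Return (counts per message kind, list of parsed events)."""
    counts = Counter()
    events = []
    for line in lines:
        for kind, pattern in (("list_del", LIST_DEL), ("list_add", LIST_ADD)):
            m = pattern.search(line)
            if m:
                counts[kind] += 1
                events.append(
                    (kind, float(m["ts"]), m["expected"], m["actual"])
                )
                break
    return counts, events


# First three lines of the grep output above, copied verbatim.
sample = [
    "[140425.144341] list_del corruption. prev->next should be "
    "ffff9c7c3ed1ac70, but was ffff9c7c3f2c0470",
    "[140425.144561] list_add corruption. prev->next should be next "
    "(ffff9c7c3ac39150), but was ffff9c7c3848d200. (prev=ffff9c7c3848d200).",
    "[140425.145200] list_del corruption. prev->next should be "
    "ffff9c7c39374470, but was ffff9c7c3ed1ac70",
]

counts, events = summarize(sample)
for kind, ts, expected, actual in events:
    print(f"{ts:.6f} {kind}: expected {expected}, found {actual}")
```

Run against the full output above, a tally like this makes two patterns easier to see: every list_add message reports the same pair of addresses, and each list_del message finds the address that the previous list_del message expected.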

crash7latest> log|grep -i RIP
[140722.623891] RIP: 0010:[<0000000000000000>] [< (null)>] (null)
[140722.647345] Code: Bad RIP value.
[140722.648270] RIP [< (null)>] (null)

 

Changes

No changes were made.

Cause



