Btrfs list_add Corruption and Soft Lockups while Testing Writeback Error Handling. (Doc ID 2455550.1)

Last updated on OCTOBER 08, 2018

Applies to:

Linux OS - Version Oracle Linux 6.10 with Unbreakable Enterprise Kernel [4.1.12] to Oracle Linux 7.5 with Unbreakable Enterprise Kernel [4.1.12] [Release OL6U10 to OL7U5]
Linux x86-64

Symptoms

A host with Btrfs filesystems occasionally crashes, and the kernel log shows list_add corruption warnings.

The vmcore shows list corruption in the btrfs_sync_file path (btrfs_sync_log called from btrfs_sync_file). Example:

 

KERNEL: /share/linuxrpm/vmlinux_repo/64/4.1.12-94.3.8.el7uek.x86_64/vmlinux
DUMPFILE: SR_3-17605243111_vmcore [PARTIAL DUMP]
CPUS: 24
DATE: Thu May 31 08:46:02 2018
UPTIME: 12 days, 20:07:27
LOAD AVERAGE: 26.37, 25.69, 19.46
TASKS: 18080
NODENAME: server1
RELEASE: 4.1.12-94.3.8.el7uek.x86_64
VERSION: #2 SMP Fri Jun 30 10:40:13 PDT 2017
MACHINE: x86_64 (2925 Mhz)
MEMORY: 192 GB
PANIC: "Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff811f8a15" <<<<<<<<<
PID: 27235
COMMAND: "java"
TASK: ffff880800bfaa00 [THREAD_INFO: ffff880020e18000]
CPU: 10
STATE: TASK_RUNNING (PANIC)

crash64> bt -T
PID: 16976 TASK: ffff882a92a94600 CPU: 0 COMMAND: "java"
[ffff881ba48af3b0] mga_dirty_update at ffffffffa02cb56f [mgag200]
[ffff881ba48af410] mga_imageblit at ffffffffa02cb6af [mgag200]
[ffff881ba48af430] bit_putcs at ffffffff813b1e2a
[ffff881ba48af490] mga_imageblit at ffffffffa02cb6af [mgag200]
[ffff881ba48af520] sys_fillrect at ffffffffa021f1a8 [sysfillrect]
[ffff881ba48af540] mga_dirty_update at ffffffffa02cb56f [mgag200]
[ffff881ba48af5a0] mga_dirty_update at ffffffffa02cb56f [mgag200]
[ffff881ba48af5e0] mga_dirty_update at ffffffffa02cb56f [mgag200]
[ffff881ba48af610] mga_dirty_update at ffffffffa02cb56f [mgag200]
[ffff881ba48af670] mga_imageblit at ffffffffa02cb6af [mgag200]
[ffff881ba48af690] bit_putcs at ffffffff813b1e2a
[ffff881ba48af6f0] mga_imageblit at ffffffffa02cb6af [mgag200]
[ffff881ba48af780] sys_fillrect at ffffffffa021f1a8 [sysfillrect]
[ffff881ba48af7a0] mga_dirty_update at ffffffffa02cb56f [mgag200]
[ffff881ba48af800] mga_fillrect at ffffffffa02cb72f [mgag200]
[ffff881ba48af820] append_elf_note at ffffffff81114454
[ffff881ba48af890] crash_save_cpu at ffffffff81116099
[ffff881ba48af9a0] futex_wake at ffffffff81106b60
[ffff881ba48afa10] machine_kexec at ffffffff8105f1cb
[ffff881ba48afa80] crash_kexec at ffffffff81116252
[ffff881ba48afb08] futex_wake at ffffffff81106b60
[ffff881ba48afb50] oops_end at ffffffff8101b868
[ffff881ba48afb80] no_context at ffffffff8172b36b
[ffff881ba48afbe0] __bad_area_nosemaphore at ffffffff8172b453
[ffff881ba48afc30] bad_area at ffffffff8172b791
[ffff881ba48afc60] __do_page_fault at ffffffff8106e12c
[ffff881ba48afcd0] do_page_fault at ffffffff8106e280
[ffff881ba48afd10] page_fault at ffffffff8173af68
[ffff881ba48afd98] futex_wake at ffffffff81106b60
[ffff881ba48afdc0] futex_wake at ffffffff81106b3c
[ffff881ba48afe30] do_futex at ffffffff811096a2
[ffff881ba48afe40] seccomp_phase1 at ffffffff81144e2e
[ffff881ba48afed0] sys_futex at ffffffff81109c30
[ffff881ba48aff50] system_call_fastpath at ffffffff81738bee
RIP: 00007fd35aaa825a RSP: 00007fd0037b9770 RFLAGS: 00000206
RAX: ffffffffffffffda RBX: 00007fd19e5ef160 RCX: 00007fd35aaa825a
RDX: 0000000000000001 RSI: 0000000000000081 RDI: 00007fd19e5ef168
RBP: 00007fd0037b9890 R8: 0000000000000000 R9: 00000000000003c1
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
R13: 0000000000000000 R14: 00007fd278dea000 R15: 00007fd19e5ef168
ORIG_RAX: 00000000000000ca CS: 0033 SS: 002b
crash64>

[386684.182967] ------------[ cut here ]------------
[386684.182988] WARNING: CPU: 1 PID: 32382 at lib/list_debug.c:33 __list_add+0xb4/0xc0()
[386684.182992] list_add corruption. prev->next should be next (ffff8817b6884b08), but was ffffc900192551c8. (prev=ffff882e691dfc28).
[386684.182993] Modules linked in: ip6table_filter ip6_tables veth nfsv3 nfs fscache xt_nat ipt_MASQUERADE nf_nat_masquerade_ipv4 nf_conntrack_netlink nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 xt_addrtype iptable_filter xt_conntrack nf_nat nf_conntrack br_netfilter bridge stp llc bonding rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm ib_sa ib_mad ib_core ib_addr xfs libcrc32c btrfs xor raid6_pq intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ipmi_ssif ghash_clmulni_intel aesni_intel sg lrw ipmi_si gf128mul iTCO_wdt glue_helper ablk_helper cryptd i7core_edac acpi_power_meter ipmi_msghandler iTCO_vendor_support ioatdma shpchp
[386684.183044] edac_core lpc_ich i2c_i801 dca pcspkr mfd_core wmi acpi_cpufreq i5500_temp nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables ext4 mbcache2 jbd2 dm_round_robin sd_mod mgag200 syscopyarea sysfillrect sysimgblt i2c_algo_bit drm_kms_helper ttm drm fnic crc32c_intel libfcoe mptsas i2c_core libfc scsi_transport_sas mptscsih dm_multipath scsi_transport_fc enic mptbase dm_mirror dm_region_hash dm_log dm_mod
[386684.183078] CPU: 1 PID: 32382 Comm: java Tainted: G W 4.1.12-94.3.8.el7uek.x86_64 #2
[386684.183080] Hardware name: Cisco Systems Inc N20-B6625-1/N20-B6625-1, BIOS S5500.2.1.3.0.081620131102 08/16/2013
[386684.183082] 0000000000000286 00000000dc2585fa ffff8828824b3ad8 ffffffff81730f98
[386684.183084] ffff8828824b3b30 ffffffff81a3ae40 ffff8828824b3b18 ffffffff8108604a
[386684.183086] ffff8817b6884a9c ffff8828824b3c28 ffff8817b6884b08 ffff882e691dfc28
[386684.183088] Call Trace:
[386684.183100] [<ffffffff81730f98>] dump_stack+0x63/0x81
[386684.183108] [<ffffffff8108604a>] warn_slowpath_common+0x8a/0xc0
[386684.183110] [<ffffffff810860d5>] warn_slowpath_fmt+0x55/0x70
[386684.183114] [<ffffffff813580a4>] __list_add+0xb4/0xc0
[386684.183168] [<ffffffffa07b4efe>] btrfs_sync_log+0x27e/0x9a0 [btrfs] <<<<<<<<<
[386684.183183] [<ffffffffa0788c96>] btrfs_sync_file+0x386/0x3c0 [btrfs] <<<<<<<<<
[386684.183192] [<ffffffff8124611d>] vfs_fsync_range+0x3d/0xb0
[386684.183207] [<ffffffffa0788fb2>] btrfs_file_write_iter+0x2e2/0x5a0 [btrfs] <<<<<<<<<
[386684.183214] [<ffffffff812129de>] __vfs_write+0xce/0x120 <<<<<<<<<
[386684.183217] [<ffffffff81213089>] vfs_write+0xa9/0x1b0
[386684.183222] [<ffffffff81026b1c>] ? do_audit_syscall_entry+0x6c/0x70
[386684.183225] [<ffffffff8121413a>] SyS_pwrite64+0x8a/0xc0
[386684.183229] [<ffffffff81738bee>] system_call_fastpath+0x12/0x71
[386684.183244] ---[ end trace b92ac42b1210cb20 ]---
[386684.183497] ------------[ cut here ]------------
[386684.183509] WARNING: CPU: 16 PID: 30845 at lib/list_debug.c:33 __list_add+0xb4/0xc0()
[386684.183511] list_add corruption. prev->next should be next (ffff8817b6884b08), but was ffff8828824b3c98. (prev=ffff8828824b3c28). <<<<<<<<<
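
The "list_add corruption" warnings above are emitted when a node is inserted between two neighbours that no longer point at each other. The following is a minimal user-space sketch of that kind of consistency check, not the kernel source: struct list_head is simplified, and checked_list_add, list_add_tail_checked and list_init are hypothetical names chosen for illustration. The sketch assumes the check only warns and then links the node in anyway.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-in for the kernel's doubly linked list node. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Initialise an empty circular list: the head points at itself. */
static void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/*
 * Insert new_node between prev and next after verifying that the two
 * neighbours still reference each other. Returns the number of failed
 * checks (0 means the list was consistent at this insertion point).
 */
static int checked_list_add(struct list_head *new_node,
                            struct list_head *prev,
                            struct list_head *next)
{
    int errors = 0;

    assert(new_node && prev && next);

    if (next->prev != prev) {
        fprintf(stderr,
                "list_add corruption. next->prev should be prev (%p), "
                "but was %p. (next=%p).\n",
                (void *)prev, (void *)next->prev, (void *)next);
        errors++;
    }
    if (prev->next != next) {
        fprintf(stderr,
                "list_add corruption. prev->next should be next (%p), "
                "but was %p. (prev=%p).\n",
                (void *)next, (void *)prev->next, (void *)prev);
        errors++;
    }

    /* Assumption for this sketch: warn only, then insert regardless. */
    next->prev = new_node;
    new_node->next = next;
    new_node->prev = prev;
    prev->next = new_node;

    return errors;
}

/* Append new_node at the tail of the list anchored at head. */
static int list_add_tail_checked(struct list_head *new_node,
                                 struct list_head *head)
{
    return checked_list_add(new_node, head->prev, head);
}
```

In the second warning above, both the reported prev and the unexpected prev->next value (ffff8828824b3c28 and ffff8828824b3c98) lie in the same address range as the stack values printed in the first trace (ffff8828824b3ad8 to ffff8828824b3c28). This suggests the list may have referenced an entry that lived on another task's kernel stack, but the log excerpt alone does not confirm that.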

 

 

Changes

 N/A

Cause

