CPU Softlock Messages with 'kernel: NMI watchdog' and OfsFcbPruneVCB in the Stack

(Doc ID 2401874.1)

Last updated on MAY 29, 2018

Applies to:

Oracle Database - Enterprise Edition - Version 12.1.0.2 and later
Information in this document applies to any platform.

Symptoms

CPU appears hang with os message log indicating messages like 'kernel: NMI watchdog: BUG: soft lockup - CPU#29 stuck for 22s' etc when ACFS is mounted.

CPU softlock message is displayed with OfsFcbPruneVCB displayed in the stack trace for the CPU.

/var/log/messages:

Apr 27 11:35:20 rachost1 systemd: Removed slice User Slice of ogi4si.
Apr 27 11:35:20 rachost1 systemd: Stopping User Slice of ogi4si.
Apr 27 11:37:02 rachost1 rsyslogd-2066: could not load module '/usr/lib64/rsyslog/lmnsd_gtls.so', dlopen: /usr/lib64/rsyslog/lmnsd_gtls.so: cannot open shared object file: No such file or directory
[try http://www.rsyslog.com/e/2066 ]
Apr 27 11:38:29 rachost1 kernel: NMI watchdog: BUG: soft lockup - CPU#29 stuck for 22s! [ofsFcbPrCb:895]
Apr 27 11:38:29 rachost1 rsyslogd-2066: could not load module '/usr/lib64/rsyslog/lmnsd_gtls.so', dlopen: /usr/lib64/rsyslog/lmnsd_gtls.so: cannot open shared object file: No such file or directory
[try http://www.rsyslog.com/e/2066 ]
Apr 27 11:38:29 rachost1 kernel: Modules linked in: 8021q garp mrp stp llc nfsv3 nfs fscache oracleacfs(POE) oracleadvm(POE) oracleoks(POE) nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4 nf_defrag_ipv4 bonding xt_conntrack nf_conntrack iptable_filter dm_service_time sb_edac edac_core intel_powerclamp coretemp intel_rapl iTCO_wdt iTCO_vendor_support iosf_mbi kvm irqbypass crc32_pclmul ghash_clmulni_intel aesni_intel dm_multipath lrw gf128mul glue_helper ablk_helper cryptd pcspkr ipmi_ssif hpilo hpwdt ioatdma shpchp i2c_i801 lpc_ich dca wmi ipmi_si ipmi_devintf ipmi_msghandler pcc_cpufreq acpi_power_meter nfsd auth_rpcgss sg nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs sd_mod mgag200 i2c_algo_bit lpfc bnx2x drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm crc32c_intel serio_raw crct10dif_generic
Apr 27 11:38:29 rachost1 kernel: crct10dif_pclmul hpsa crc_t10dif mdio scsi_transport_fc i2c_core ptp pps_core scsi_tgt scsi_transport_sas libcrc32c crct10dif_common dm_mirror dm_region_hash dm_log dm_mod
Apr 27 11:38:29 rachost1 kernel: CPU: 29 PID: 895 Comm: ofsFcbPrCb Tainted: P OE ------------ 3.10.0-693.1.1.el7.x86_64 #1
Apr 27 11:38:29 rachost1 kernel: Hardware name: HP ProLiant DL380 Gen9/ProLiant DL380 Gen9, BIOS P89 02/17/2017
Apr 27 11:38:29 rachost1 kernel: task: ffff8817a7458fd0 ti: ffff887629a34000 task.ti: ffff887629a34000
Apr 27 11:38:29 rachost1 kernel: RIP: 0010:[<ffffffff81332128>] [<ffffffff81332128>] lockref_put_or_lock+0x8/0x80
Apr 27 11:38:29 rachost1 kernel: RSP: 0018:ffff887629a37c88 EFLAGS: 00000286
.
Apr 27 11:38:29 rachost1 kernel: Stack:
Apr 27 11:38:29 rachost1 kernel: ffff88608e9e6300 ffff887629a37cb0 ffffffff812188c0 ffff88608e9e6300
Apr 27 11:38:29 rachost1 kernel: ffff8877bb27c6b0 ffff887629a37d28 ffffffffc0c9d9f0 ffff8877bb27c000
Apr 27 11:38:29 rachost1 kernel: ffff880109ac5210 ffff880109ac5080 ffff8873d600a480 00000000a7459648
Apr 27 11:38:29 rachost1 kernel: Call Trace:
Apr 27 11:38:29 rachost1 kernel: [<ffffffff812188c0>] dput+0x20/0xd0
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc0c9d9f0>] OfsFcbPruneVCB+0x5c0/0x6a0 [oracleacfs] <<<<<<<<<<<<<<<<<<<< HERE
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc0c9db7f>] OfsFcbPrunerCb+0xaf/0x1d0 [oracleacfs]
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc0c9dad0>] ? OfsFcbPruneVCB+0x6a0/0x6a0 [oracleacfs]
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc08c60aa>] KsOsdInvokeCallback+0x7a/0xc0 [oracleoks]
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc08fd0ea>] Ks_invokeCallback+0x7a/0x120 [oracleoks]
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc08fece3>] Ks_processRequests+0x443/0x920 [oracleoks]
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc08ff1c0>] ? Ks_processRequests+0x920/0x920 [oracleoks]
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc08ff27a>] Ks_main+0xba/0x120 [oracleoks]
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc08c600a>] KsKthreadRun+0x4a/0x70 [oracleoks]
Apr 27 11:38:29 rachost1 kernel: [<ffffffffc08c5fc0>] ? __KsPanic+0xb0/0xb0 [oracleoks]
Apr 27 11:38:29 rachost1 kernel: [<ffffffff810b098f>] kthread+0xcf/0xe0
Apr 27 11:38:29 rachost1 kernel: [<ffffffff810b08c0>] ? insert_kthread_work+0x40/0x40

os message log further also indicates processes going stuck ('D' state) messages

Apr 27 11:39:05 rachost1 kernel: ofsFcbPrCb R running task 0 895 2 0x00000088
.
Apr 27 11:40:10 rachost1 kernel: INFO: task oracle_52796_sa:52796 blocked for more than 120 seconds.
Apr 27 11:40:10 rachost1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Apr 27 11:40:10 rachost1 kernel: oracle_52796_sa D 00007fff5f049d48 0 52796 1 0x00000080
.
Apr 27 11:40:10 rachost1 kernel: oracle_53892_tc D 00007ffdc060f248 0 53892 1 0x00000080
.
Apr 27 11:40:10 rachost1 kernel: oracle_57538_tc D 00007ffd3ff97dc8 0 57538 1 0x00000080

 

Changes

 

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms