Performance Issues When Using OCFS2 and Cluster Lock Errors Being Reported in the System Logs.

(Doc ID 2375753.1)

Last updated on APRIL 05, 2018

Applies to:

Linux OS - Version Oracle Linux 5.0 and later
Linux x86-64

Symptoms

OCFS2 mounts are encountering performance issues, similar to experiencing High I/O Wait, system load increases, and application processes are going into D state.

Below are a few examples that can be seen if your server is encountering this issue that is being worked on this document: 

 

1. When reviewing the performance at a system level, while the applications are running and using the OCFS2 mount points, the system experiences high Load Averages as seen below.

top - 07:56:43 up 3 days, 23:36, 1 user, load average: 5505.39, 5495.29, 5448.68
Tasks: 1302 total, 3 running, 1292 sleeping, 0 stopped, 7 zombie
%Cpu(s): 7.0 us, 4.2 sy, 0.0 ni, 88.6 id, 0.1 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26358913+total, 11537800 free, 35044084 used, 21700724+buff/cache
KiB Swap: 3899388 total, 3899308 free, 80 used. 20804944+avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1360 appuser 20 0 11.274g 4.716g 14128 S 106.6 1.9 48:25.89 java
11620 appuser 20 0 11.399g 4.719g 14420 S 53.5 1.9 49:18.58 java
57278 root 39 19 35.493g 696080 17748 S 40.3 0.3 78:47.69 java
12191 appuser 20 0 13.557g 4.784g 13776 S 15.2 1.9 46:37.46 java
7902 root 20 0 0 0 0 S 13.2 0.0 0:17.44 kworker/u128:0
13865 root 20 0 0 0 0 R 10.9 0.0 0:49.35 kworker/u128:2
53086 root 20 0 0 0 0 D 9.9 0.0 2:19.76 kworker/u128:1
52416 appuser 20 0 5339468 297308 6640 S 2.6 0.1 10:13.73 java

 

2. The application processes that are encountering the issues when using the OCFS2 mounts are found to be going into D state and displaying, "ocfs2," and, "o2net," values under WCHAN.

   USER     PID     PPID  PRI %CPU %MEM VSZ RSS WCHAN  S  STARTED    TIME    COMMAND
appuser 72536 47074 19   0.0  0.0  21884 10740 ocfs2_   D 12:48:39 00:00:10 FNDLIBR
appuser 72535 47074 19   0.0  0.0  21708 10640 do_unl   D 12:48:39 00:00:05 FNDLIBR
appuser 71845 47074 19   0.0  0.0  21820 10796 o2net_  D 12:46:26 00:00:09 FNDLIBR
appuser 71844 47074 19   0.0  0.0  21868 10852 sk_wai  R 12:46:26 00:00:13 FNDLIBR
appuser 71843 47074 19   0.0  0.0  21692 10508 do_las  D 12:46:26 00:00:02 FNDLIBR

 

3. Listing the stack and wchan from the processes that which are under D state show below information which points back to OCFS2 Locks issues. (In this example PID 47676 is being viewed)

root@hostname ~]# cat /proc/47676/wchan
__ocfs2_cluster_lock.isra.36

[root@hostname ~]# cat /proc/47676/stack
[<ffffffffa09f64d1>] __ocfs2_cluster_lock.isra.36+0x231/0x9c0 [ocfs2]
[<ffffffffa09f7bfa>] ocfs2_inode_lock_full_nested+0x1da/0x530 [ocfs2]
[<ffffffffa0a1591e>] ocfs2_lookup_lock_orphan_dir.constprop.30+0x6e/0x1f0 [ocfs2]
[<ffffffffa0a15af1>] ocfs2_prepare_orphan_dir+0x51/0x2d0 [ocfs2]
[<ffffffffa0a182cb>] ocfs2_unlink+0x88b/0xd30 [ocfs2]
[<ffffffff8121f4b6>] vfs_unlink+0xe6/0x1a0
[<ffffffff81224782>] do_unlinkat+0x2e2/0x330
[<ffffffff812258e6>] SyS_unlink+0x16/0x20
[<ffffffff81740f25>] sysenter_dispatch+0x7/0x21
[<ffffffffffffffff>] 0xffffffffffffffff

 

4. Call traces can also be reported under /var/log/messages when the system is encountering this issue which also point back to OCFS2 Lock issues:

Jan 26 04:02:38 hostname rhnsd[23043]: running program /usr/sbin/rhn_check with PID 23043
Jan 26 07:07:43 hostname kernel: INFO: task cp:29026 blocked for more than 120 seconds.
Jan 26 07:07:43 hostname kernel: Not tainted 4.1.12-103.9.2.el7uek.x86_64 #2
Jan 26 07:07:43 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 26 07:07:43 hostname kernel: cp D ffff88203e517800 0 29026 28957 0x00000080
Jan 26 07:07:43 hostname kernel: ffff881d2b60b6c8 0000000000000086 ffff881ff8323800 ffff881ff1c2b800
Jan 26 07:07:43 hostname kernel: 0000000000000046 ffff881d2b60c000 ffff881d2b60b890 7fffffffffffffff
Jan 26 07:07:43 hostname kernel: ffff881ff1c2b800 0000000000000000 ffff881d2b60b6e8 ffffffff81739f17
Jan 26 07:07:43 hostname kernel: Call Trace:
Jan 26 07:07:43 hostname kernel: [<ffffffff81739f17>] schedule+0x37/0x90
Jan 26 07:07:43 hostname kernel: [<ffffffff8173cffc>] schedule_timeout+0x24c/0x2c0
Jan 26 07:07:43 hostname kernel: [<ffffffff810b65d2>] ? default_wake_function+0x12/0x20
Jan 26 07:07:43 hostname kernel: [<ffffffff810cce9b>] ? autoremove_wake_function+0x2b/0x40
Jan 26 07:07:43 hostname kernel: [<ffffffff810cc6c8>] ? __wake_up_common+0x58/0x90
Jan 26 07:07:43 hostname kernel: [<ffffffff8173ab04>] wait_for_completion+0x134/0x190
Jan 26 07:07:43 hostname kernel: [<ffffffff810b65c0>] ? wake_up_state+0x20/0x20
Jan 26 07:07:43 hostname kernel: [<ffffffffa0c534d1>] __ocfs2_cluster_lock.isra.36+0x231/0x9c0 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffffa0cb0b96>] ? ocfs2_buffer_cached.isra.6+0xb6/0x240 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffffa0c54bfa>] ocfs2_inode_lock_full_nested+0x1da/0x530 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffffa0c5500b>] ocfs2_inode_lock_tracker+0xbb/0x1c0 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffffa0ccc99d>] ocfs2_iop_get_acl+0x5d/0x25e [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffffa0c6e92d>] ? ocfs2_reserve_local_alloc_bits+0x8d/0x380 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffff812742e7>] get_acl+0x47/0x70
Jan 26 07:07:43 hostname kernel: [<ffffffff812749ca>] posix_acl_create+0x5a/0x160
Jan 26 07:07:43 hostname kernel: [<ffffffffa0c778f8>] ocfs2_mknod+0x938/0x1620 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffffa0c50599>] ? ocfs2_wake_downconvert_thread+0x49/0x50 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffffa0c787b6>] ocfs2_create+0x66/0x170 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffff8121fa55>] vfs_create+0xd5/0x140
Jan 26 07:07:43 hostname kernel: [<ffffffff81222c4d>] do_last+0x9ed/0x1270
Jan 26 07:07:43 hostname kernel: [<ffffffff8122355f>] path_openat+0x8f/0x630
Jan 26 07:07:43 hostname kernel: [<ffffffffa0c54999>] ? ocfs2_inode_unlock+0xa9/0x130 [ocfs2]
Jan 26 07:07:43 hostname kernel: [<ffffffff812250d9>] do_filp_open+0x49/0xc0
Jan 26 07:07:43 hostname kernel: [<ffffffff81351855>] ? find_next_zero_bit+0x25/0x30
Jan 26 07:07:43 hostname kernel: [<ffffffff81232297>] ? __alloc_fd+0xa7/0x130
Jan 26 07:07:43 hostname kernel: [<ffffffff81212517>] do_sys_open+0x137/0x240
Jan 26 07:07:43 hostname kernel: [<ffffffff8121263e>] SyS_open+0x1e/0x20
Jan 26 07:07:43 hostname kernel: [<ffffffff8173e1ae>] system_call_fastpath+0x12/0x71
Jan 26 07:09:43 hostname kernel: INFO: task java:31564 blocked for more than 120 seconds.
Jan 26 07:09:43 hostname kernel: Not tainted 4.1.12-103.9.2.el7uek.x86_64 #2
Jan 26 07:09:43 hostname kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jan 26 07:09:43 hostname kernel: java D ffff880fff797800 0 31564 27763 0x00000080
Jan 26 07:09:43 hostname kernel: ffff881ff6b67918 0000000000000086 ffff881ff8321c00 ffff881fe96b8e00
Jan 26 07:09:43 hostname kernel: ffff881ff6b67978 ffff881ff6b68000 ffff880fd85a4b2c ffff881fe96b8e00
Jan 26 07:09:43 hostname kernel: 00000000ffffffff ffff880fd85a4b30 ffff881ff6b67938 ffffffff81739f17
Jan 26 07:09:43 hostname kernel: Call Trace:
Jan 26 07:09:43 hostname kernel: [<ffffffff81739f17>] schedule+0x37/0x90
Jan 26 07:09:43 hostname kernel: [<ffffffff8173a24e>] schedule_preempt_disabled+0xe/0x10
Jan 26 07:09:43 hostname kernel: [<ffffffff8173c0a5>] __mutex_lock_slowpath+0xb5/0x130
Jan 26 07:09:43 hostname kernel: [<ffffffff8173c143>] mutex_lock+0x23/0x37
Jan 26 07:09:43 hostname kernel: [<ffffffffa0c94eea>] ocfs2_reserve_suballoc_bits+0x6a/0x570 [ocfs2]
Jan 26 07:09:43 hostname kernel: [<ffffffffa0c95a96>] ocfs2_reserve_new_inode+0x146/0x4a0 [ocfs2]
Jan 26 07:09:43 hostname kernel: [<ffffffffa0c7760f>] ocfs2_mknod+0x64f/0x1620 [ocfs2]
Jan 26 07:09:43 hostname kernel: [<ffffffffa0c787b6>] ocfs2_create+0x66/0x170 [ocfs2]
Jan 26 07:09:43 hostname kernel: [<ffffffff8121fa55>] vfs_create+0xd5/0x140
Jan 26 07:09:43 hostname kernel: [<ffffffff81222c4d>] do_last+0x9ed/0x1270
Jan 26 07:09:43 hostname kernel: [<ffffffff8122355f>] path_openat+0x8f/0x630
Jan 26 07:09:43 hostname kernel: [<ffffffffa0c54999>] ? ocfs2_inode_unlock+0xa9/0x130 [ocfs2]
Jan 26 07:09:43 hostname kernel: [<ffffffffa0c61d8f>] ? ocfs2_inode_revalidate+0x8f/0x1c0 [ocfs2]
Jan 26 07:09:43 hostname kernel: [<ffffffff812250d9>] do_filp_open+0x49/0xc0
Jan 26 07:09:43 hostname kernel: [<ffffffff81351855>] ? find_next_zero_bit+0x25/0x30
Jan 26 07:09:43 hostname kernel: [<ffffffff81232297>] ? __alloc_fd+0xa7/0x130
Jan 26 07:09:43 hostname kernel: [<ffffffff81212517>] do_sys_open+0x137/0x240
Jan 26 07:09:43 hostname kernel: [<ffffffff8121263e>] SyS_open+0x1e/0x20
Jan 26 07:09:43 hostname kernel: [<ffffffff8173e1ae>] system_call_fastpath+0x12/0x71

 

5. Updatedb can also be seen running and encountering issues on this system. Below errors can be recorded under /var/log/messages as well as the example #ps output which displays the running updatedb processes with the respective options.

Feb 25 04:39:58 hostname kernel: INFO: task updatedb:13249 blocked for more than 120 seconds.
Feb 25 04:41:58 hostname kernel: INFO: task updatedb:13249 blocked for more than 120 seconds.
Feb 25 04:43:58 hostname kernel: INFO: task updatedb:13249 blocked for more than 120 seconds.
Feb 25 04:45:58 hostname kernel: INFO: task updatedb:13249 blocked for more than 120 seconds.
Feb 25 04:47:58 hostname kernel: INFO: task updatedb:13249 blocked for more than 120 seconds.

root 13249 13243 0 1.7 0.0 108408 2636 o2net_ D 03:41:22 00:00:55 /usr/bin/updatedb -f sysfs ramfs bdev proc cpuset cgroup tmpfs devtmpfs configfs debugfs tracefs securityfs sockfs pipefs devpts hugetlbfs autofs pstore mqueue binfmt_misc rpc_pipefs nfsd asmdisk oracleasmfs nfs nfs4 ocfs2_dlmfs

 

Changes

 N/A

Cause

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms