
Siebel Process (siebmtshmw or flush) Hang Issue With High %IOWAIT Running On Exalogic Elastic Cloud Software (EECS) 1.x Releases (Doc ID 1472663.1)

Last updated on OCTOBER 31, 2017

Applies to:

Oracle Exalogic Elastic Cloud Software - Version 1.0.0.0.0 to 1.0.0.2.4
Information in this document applies to any platform.

Symptoms

In Exalogic environments running Exalogic Elastic Cloud Software (EECS) 1.0, Siebel processes are observed to hang when Siebel applications run, possibly alongside other products from the Oracle Fusion Middleware family such as Oracle Business Process Management (BPM), Oracle SOA Suite (SOA), or Oracle Data Integrator (ODI).

While the issue is occurring, repeated runs of the iostat -c command show the value reported for %iowait growing continually and reaching a very high percentage on the Exalogic compute node on which Siebel is running. In addition, output from the ps command shows that many Siebel-related processes have reached a defunct state.
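A minimal sketch of how the two checks above could be scripted, assuming a POSIX shell with awk and sort available. The function names extract_iowait and count_defunct are hypothetical helpers and are not part of any Oracle or sysstat tooling. Both read command output on standard input, so they can be tried against saved output as well as live commands.

```shell
# Hypothetical helper: print the %iowait value from each avg-cpu report
# found in "iostat -c" output supplied on standard input.
extract_iowait() {
    awk '
        /^[[:space:]]*avg-cpu:/ {
            # The header row carries the extra "avg-cpu:" label, so the
            # matching data column sits one field to the left.
            col = 0
            for (i = 1; i <= NF; i++) if ($i == "%iowait") col = i - 1
            want = 1
            next
        }
        # The first non-empty line after the header holds the values.
        want && NF > 0 { if (col > 0) print $col; want = 0 }
    '
}

# Hypothetical helper: count defunct (zombie) processes per command name.
# Expects lines of the form "STATE COMMAND ..." on standard input, where
# a state beginning with Z marks a defunct process.
count_defunct() {
    awk '$1 ~ /^Z/ { n[$2]++ } END { for (c in n) print n[c], c }' | sort -rn
}
```

For example, `iostat -c 5 12 | extract_iowait` would print one %iowait value per report (when an interval is given, the first report covers the time since boot), and `ps -e -o stat= -o comm= | count_defunct` would print the number of defunct processes for each command name.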

Below are snippets of the output you can observe from the following commands at the time of the issue.

  1. Output from iostat -c shows %iowait increasing over time:

    [root@elorl01cn09 ~]# iostat -c
    Linux 2.6.32-200.21.1.el5uek (elorl01cn09)      07/04/2012
     
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              0.00    0.00  0.03    0.00     0.00     99.96

    [root@elorl01cn09 ~]# iostat -c
    Linux 2.6.32-200.21.1.el5uek (elorl01cn09)      07/04/2012
      
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              0.17    0.17  0.47    12.61    0.00     86.58

    [root@elorl01cn09 ~]# iostat -c
    Linux 2.6.32-200.21.1.el5uek (elorl01cn09)      07/04/2012
     
    avg-cpu:  %user   %nice %system %iowait  %steal   %idle
              0.11    0.05  0.11    21.93    0.00     77.80

  2. Output from ps -ef|grep 'defunct' shows that Siebel-related processes, such as siebmtshmw, have reached a defunct state:

    F S UID        PID  PPID  C PRI  NI ADDR SZ WCHAN  STIME TTY          TIME CMD
    0 Z 501       5785     1  0  80   0 -     0 exit   May11 ?        00:00:34 [siebmtshmw] <defunct>
    0 Z 501       5794     1  0  80   0 -     0 exit   May11 ?        00:00:42 [siebmtshmw] <defunct>
    0 Z 501       5815     1  0  80   0 -     0 exit   May11 ?        00:00:29 [siebmtshmw] <defunct>
    0 Z 501       5836     1  0  80   0 -     0 exit   May11 ?        00:00:38 [siebmtshmw] <defunct>
    0 Z 501       5864     1  0  80   0 -     0 exit   May11 ?        00:00:38 [siebmtshmw] <defunct>
    0 Z 501       5871     1  0  80   0 -     0 exit   May11 ?        00:00:42 [siebmtshmw] <defunct>
    0 Z 501       5893     1  0  80   0 -     0 exit   May11 ?        00:00:45 [siebmtshmw] <defunct>
    0 Z 501       5898     1  0  80   0 -     0 exit   May11 ?        00:00:44 [siebmtshmw] <defunct>

  3. If you have configured detection and logging of hung processes to /etc/syslog, you may find instances of blocked siebmtshmw and NFS flush processes with the following call trace signatures:

    1. siebmtshmw process signature:  

      May 14 05:44:39 exlfcn13 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      May 14 05:44:40 exlfcn13 kernel: siebmtshmw    D 0000000000000003     0  5899      1 0x00020080
      May 14 05:44:40 exlfcn13 kernel:  ffff8807514dbc18 0000000000000086 0000000000000000 ffffffffb056a2a8
      May 14 05:44:40 exlfcn13 kernel:  ffff880768e482c0 ffff880c49de8400 ffff880768e48698 000000011d53e10d
      May 14 05:44:40 exlfcn13 kernel:  00000000514dbca8 0000000000000000 0000000000000000 ffff880768e482c0
      May 14 05:44:40 exlfcn13 kernel: Call Trace:
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffff814376d5>] io_schedule+0x42/0x5c
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffffa03eb78a>] nfs_wait_bit_uninterruptible+0xe/0x12 [nfs]
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffff81437c04>] __wait_on_bit+0x4a/0x7c
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffffa03eb77c>] ? nfs_wait_bit_uninterruptible+0x0/0x12 [nfs]
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffffa03eb77c>] ? nfs_wait_bit_uninterruptible+0x0/0x12 [nfs]
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffff81437ca9>] out_of_line_wait_on_bit+0x73/0x80
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffff81075ab9>] ? wake_bit_function+0x0/0x2f
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffffa03eb77a>] nfs_wait_on_request+0x2b/0x2d [nfs]
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffffa03ef9b2>] nfs_updatepage+0x149/0x3ae [nfs]
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffffa03e30ce>] nfs_vm_page_mkwrite+0xc8/0xe8 [nfs]
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffff810ee995>] do_wp_page+0x1bf/0x5ff
      May 14 05:44:40 exlfcn13 kernel:  [<ffffffff8122fa38>] ? cpumask_next+0x19/0x1b
      May 14 05:44:41 exlfcn13 kernel:  [<ffffffff810f0612>] handle_mm_fault+0x75c/0x7d9
      May 14 05:44:41 exlfcn13 kernel:  [<ffffffff810425b3>] ? need_resched+0x23/0x2d
      May 14 05:44:41 exlfcn13 kernel:  [<ffffffff8143b196>] do_page_fault+0x200/0x26c
      May 14 05:44:41 exlfcn13 kernel:  [<ffffffff81439175>] page_fault+0x25/0x30

    2. flush process signature:

      May 14 05:46:41 exlfcn13 kernel: INFO: task flush-0:25:5228 blocked for more than 120 seconds.
      May 14 05:46:41 exlfcn13 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      May 14 05:46:41 exlfcn13 kernel: flush-0:25    D 000000000000000e     0  5228      2 0x00000080
      May 14 05:46:41 exlfcn13 kernel:  ffff8806923879e0 0000000000000046 ffff8806923879a0 ffffffffb056a2a8
      May 14 05:46:41 exlfcn13 kernel:  ffff880228cf6240 ffff880c40cb0040 ffff880228cf6618 ffff880028300000
      May 14 05:46:41 exlfcn13 kernel:  ffff880caa4011e8 ffff880692387a70 ffff8806923879c0 ffff880228cf6240
      May 14 05:46:41 exlfcn13 kernel: Call Trace:
      May 14 05:46:41 exlfcn13 kernel:  [<ffffffff814376d5>] io_schedule+0x42/0x5c
      May 14 05:46:41 exlfcn13 kernel:  [<ffffffff810d5a35>] sync_page+0x49/0x4d
      May 14 05:46:41 exlfcn13 kernel:  [<ffffffff81437af2>] __wait_on_bit_lock+0x47/0x8f
      May 14 05:46:41 exlfcn13 kernel:  [<ffffffff810d59ec>] ? sync_page+0x0/0x4d
      May 14 05:46:41 exlfcn13 kernel:  [<ffffffff810d59a6>] __lock_page+0x69/0x70
      May 14 05:46:41 exlfcn13 kernel:  [<ffffffff81075ab9>] ? wake_bit_function+0x0/0x2f
      May 14 05:46:41 exlfcn13 kernel:  [<ffffffff810425cb>] ? should_resched+0xe/0x2f
      May 14 05:46:41 exlfcn13 kernel:  [<ffffffff810dcbdf>] write_cache_pages+0x1d0/0x35e
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffffa03ef1ed>] ? nfs_writepages_callback+0x0/0x25 [nfs]
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffffa03ef1a5>] nfs_writepages+0x9c/0xe4 [nfs]
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffffa03f009d>] ? nfs_flush_one+0x0/0xc3 [nfs]
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff810dcdb8>] do_writepages+0x21/0x2a
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff81137020>] writeback_single_inode+0xc9/0x1e4
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff8113787e>] writeback_inodes_wb+0x30d/0x382
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff81137a24>] wb_writeback+0x131/0x1bc
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff810ec131>] ? bdi_start_fn+0x0/0xd1
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff81137c8d>] wb_do_writeback+0x13c/0x153
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff810ec131>] ? bdi_start_fn+0x0/0xd1
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff81137cd0>] bdi_writeback_task+0x2c/0xb3
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff810ec19b>] bdi_start_fn+0x6a/0xd1
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff810756d3>] kthread+0x6e/0x76
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff81012dea>] child_rip+0xa/0x20
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff81075665>] ? kthread+0x0/0x76
      May 14 05:46:42 exlfcn13 kernel:  [<ffffffff81012de0>] ? child_rip+0x0/0x20 
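
The blocked-task messages shown in the signatures above can be picked out of a log with a short filter. This is a sketch only, assuming a POSIX shell and sed; hung_tasks is a hypothetical helper name, and it matches only the "INFO: task ... blocked for more than ... seconds." line format seen in the flush signature above.

```shell
# Hypothetical helper: read kernel log lines on standard input and print
# "task-name pid seconds" for every hung-task report found.
hung_tasks() {
    # The task name may itself contain colons (for example flush-0:25),
    # so the greedy match leaves only the final numeric field as the PID.
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}
```

Run it as `hung_tasks < "$LOGFILE"`, where LOGFILE is a placeholder for whichever file your syslog configuration writes kernel messages to. Piping the result through `sort | uniq -c` would show which tasks are reported most often.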

NOTE:

The hung process detection feature can be activated by setting kernel.hung_task_timeout_secs to a nonzero value (in seconds). This can be done in either of the following ways.

i) Use the sysctl command to show and/or dynamically change the value, as shown below:

[root@elorl01cn09 ~]# sysctl -w kernel.hung_task_timeout_secs=180
kernel.hung_task_timeout_secs = 180

[root@elorl01cn09 ~]# sysctl kernel.hung_task_timeout_secs
kernel.hung_task_timeout_secs = 180

(or)

ii) Edit /etc/sysctl.conf and append a line such as the following, so that the setting persists across later restarts:

kernel.hung_task_timeout_secs = 180
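
If the file is edited by script rather than by hand, it helps to avoid appending the same key twice. The following is a minimal sketch of one way to do that, assuming a POSIX shell with awk and mktemp; persist_sysctl is a hypothetical helper name, and the file path is passed as an argument so the function can be tried on a copy first.

```shell
# Hypothetical helper: ensure FILE contains exactly one "KEY = VALUE" line.
# Usage: persist_sysctl FILE KEY VALUE
persist_sysctl() {
    conf=$1; key=$2; value=$3
    [ -f "$conf" ] || : > "$conf"          # create the file if it is missing
    tmp=$(mktemp) || return 1
    # Copy every line except existing (uncommented) assignments of KEY.
    awk -v k="$key" '
        { line = $0; sub(/^[ \t]+/, "", line) }
        index(line, k) == 1 && substr(line, length(k) + 1) ~ /^[ \t]*=/ { next }
        { print }
    ' "$conf" > "$tmp"
    # Append the desired assignment as the last line.
    printf '%s = %s\n' "$key" "$value" >> "$tmp"
    # Overwrite in place so the original file keeps its owner and mode.
    cat "$tmp" > "$conf" && rm -f "$tmp"
}
```

For example, `persist_sysctl /etc/sysctl.conf kernel.hung_task_timeout_secs 180` run as root would leave a single assignment at the end of the file. Running `sysctl -p` afterwards loads the settings from /etc/sysctl.conf without waiting for a restart.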

Cause

To view full details, sign in with your My Oracle Support account.
