
Unexpected OCFS2 Ring Fencing and Restarts For Compute Nodes In An Exalogic Virtual System When Deleting Projects/Filesystems or Receiving ZFS Replication Streams (Doc ID 1961508.1)

Last updated on JUNE 11, 2018

Applies to:

Oracle Exalogic Elastic Cloud Software - Version 2.0.4.0.0 and later
Linux x86-64
Oracle Virtual Server x86-64

Symptoms

You have an Exalogic system running Exalogic Elastic Cloud Software (EECS) version 2.0.4.x or 2.0.6.x in a virtual deployment, and you have encountered unexpected restarts across one or more compute nodes. At the time the unexpected restarts occurred, either the deletion of one or more Projects and/or Shares had been initiated on the rack's ZFS Appliance, or the ZFS Appliance was the target of snapshot data being replicated to it. In such circumstances the ZFS Appliance may stop responding to NFS operations for a period long enough to trigger the Oracle Clustered File System (OCFS2) software component to "ring fence" compute nodes off from the OCFS2 cluster by causing them to automatically reboot.

To ensure the integrity of its cluster, each node within an OCFS2 cluster periodically updates its node-specific block in a shared heartbeat file that all nodes write to in order to show they are still alive. In an Exalogic system the clustered file system is mapped onto an NFSv3 file system, so each compute node's update to the centralized "heartbeat" file translates into a write to an NFSv3-mounted file system. If the OCFS2 monitor process running on each compute node does not see its node's update being successfully written over NFS, it starts a timer to track the period since the last successful write. If the timer exceeds a configured threshold, which for Exalogic is 5 minutes (300000 milliseconds), the monitor process triggers the compute node to reboot itself in an attempt to recover from the NFS write problems so that it can successfully rejoin the OCFS2 cluster.
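
As a purely illustrative aid (this is not the actual o2hbmonitor implementation), the following minimal Python sketch models the timing behaviour described above: the time since the last successful heartbeat write is tracked, warning messages begin once 50% of the threshold has elapsed, and the node self-fences once the 300000 millisecond threshold is exceeded. The function and parameter names are hypothetical.

    # Illustrative sketch only -- NOT the real o2hbmonitor code.
    # Models the timing behaviour described above: a node keeps writing its
    # heartbeat block over NFS; the monitor tracks the time since the last
    # successful write and self-fences (reboots) once the threshold is hit.
    import time

    FENCE_THRESHOLD_MS = 300000                  # Exalogic default: 5 minutes
    WARN_THRESHOLD_MS = FENCE_THRESHOLD_MS // 2  # warnings start past 50%

    def monitor_heartbeat(write_heartbeat_block, fence_node,
                          now_ms=lambda: time.time() * 1000):
        """write_heartbeat_block() returns True on a successful NFS write;
        fence_node() is the hypothetical self-fence (reboot) action."""
        last_success_ms = now_ms()
        while True:
            if write_heartbeat_block():
                last_success_ms = now_ms()
            elapsed = now_ms() - last_success_ms
            if elapsed >= FENCE_THRESHOLD_MS:
                fence_node()                     # reboot to rejoin the cluster
                return
            if elapsed >= WARN_THRESHOLD_MS:
                print("o2hbmonitor: Last ping %d msecs ago" % elapsed)
            time.sleep(2)                        # messages appear every ~2 seconds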

The presence of one or more of the following symptoms can be used to confirm that an unexpected compute node restart in an Exalogic system was triggered by the OCFS2 "ring-fencing" mechanism:

  1. Sustained o2hbmonitor "ping" failures logged to /var/log/messages, indicating that the last successful heartbeat update occurred close to or more than 300000 milliseconds ago

    • Log messages similar to the following are written to /var/log/messages every two seconds once the time since the last successful ping has exceeded 50% of the ring-fencing threshold (i.e. 150000 milliseconds):
      o2hbmonitor: Last ping 158098 msecs ago on /dev/dm-0, 0004FB0000050000D03AF3D7D3CD3818
       
    • The ping failure messages continue at a rate of one new message every 2 seconds until the reported interval crosses into the 29xxxx or 30xxxx msecs range:
      Oct 18 06:03:48 elorl01cn01 o2hbmonitor: Last ping 158098 msecs ago on /dev/dm-0, 0004FB0000050000D03AF3D7D3CD3818
      Oct 18 06:03:50 elorl01cn01 o2hbmonitor: Last ping 160099 msecs ago on /dev/dm-0, 0004FB0000050000D03AF3D7D3CD3818
      Oct 18 06:03:52 elorl01cn01 o2hbmonitor: Last ping 162099 msecs ago on /dev/dm-0, 0004FB0000050000D03AF3D7D3CD3818
         :
      Oct 18 06:06:06 elorl01cn01 o2hbmonitor: Last ping 296117 msecs ago on /dev/dm-0, 0004FB0000050000D03AF3D7D3CD3818
      Oct 18 06:06:08 elorl01cn01 o2hbmonitor: Last ping 298117 msecs ago on /dev/dm-0, 0004FB0000050000D03AF3D7D3CD3818
      Oct 18 06:06:10 elorl01cn01 o2hbmonitor: Last ping 300118 msecs ago on /dev/dm-0, 0004FB0000050000D03AF3D7D3CD3818
       
      The last message in the sequence may indicate a ping failure 29xxxx or 30xxxx msecs ago (i.e. a message reporting a gap at or beyond 30xxxx msecs is not always present); the sketch immediately below can be used to find the largest gap reported in /var/log/messages.
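
    • The following minimal Python sketch (illustrative only; the helper name max_ping_gap is hypothetical) extracts the "Last ping N msecs ago" values from /var/log/messages and reports the largest gap seen, assuming the message format shown above:

      # Minimal sketch for checking how close the logged ping intervals came
      # to the 300000 msec fencing threshold; assumes the format shown above.
      import re

      PING_RE = re.compile(r"o2hbmonitor: Last ping (\d+) msecs ago")

      def max_ping_gap(logfile="/var/log/messages"):
          gaps = []
          with open(logfile) as f:
              for line in f:
                  m = PING_RE.search(line)
                  if m:
                      gaps.append(int(m.group(1)))
          return max(gaps) if gaps else None

      if __name__ == "__main__":
          worst = max_ping_gap()
          if worst is None:
              print("No 'o2hbmonitor: Last ping' messages found")
          else:
              print("Largest reported heartbeat gap: %d msecs "
                    "(fencing threshold is 300000 msecs)" % worst)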

  2. The presence of the message "*** ocfs2 is very sorry to be fencing this system by restarting ***" in /var/log/messages or in console output from the node's ILOM

    • The last "ping" failure message in /var/log/messages is sometimes, but not always, followed by the message "*** ocfs2 is very sorry to be fencing this system by restarting ***", as seen below:
      ...
      Oct 23 02:53:55 elorl01cn16  o2hbmonitor: Last ping 297409 msecs ago on /dev/dm-0, 0004FB000005000090AC21601DA6DC2A
      Oct 23 02:53:57 elorl01cn16  o2hbmonitor: Last ping 299410 msecs ago on /dev/dm-0, 0004FB000005000090AC21601DA6DC2A
      Oct 23 02:53:58 elorl01cn16  kernel: (kworker/2:1,163,2):o2hb_write_timeout:304 ERROR: Heartbeat write timeout to device dm-0 after 300316 milliseconds
      Oct 23 02:53:58 elorl01cn16  kernel: (kworker/2:1,163,2):o2hb_stop_all_regions:2638 ERROR: stopping heartbeat on all active regions.
      Oct 23 02:53:58 elorl01cn16  kernel: *** ocfs2 is very sorry to be fencing this system by restarting ***
      Oct 23 02:58:48 elorl01cn16  syslogd 1.4.1: restart.
      Oct 23 02:58:48 elorl01cn16  kernel: klogd 1.4.1, log source = /proc/kmsg started.
      Oct 23 02:58:48 elorl01cn16  kernel: Initializing cgroup subsys cpuset
      Oct 23 02:58:48 elorl01cn16  kernel: Initializing cgroup subsys cpu
      Oct 23 02:58:48 elorl01cn16  kernel: Linux version 2.6.39-300.22.2.el5uek (mockbuild@ca-build56.us.oracle.com) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-52)) #1 SMP Fri Jan 4 12:40:29 PST 2013
      Oct 23 02:58:48 elorl01cn16  kernel: Command line: ro root=UUID=fdf7e3cf-1233-4120-aff4-5e4ea2a4d17c  console=tty console=ttyS0,9600n8 console=tty0 console=hvc0 swiotlb=131072
      ...
    • Whilst the above message is not always found in /var/log/messages, it is always present in the console log captured by the compute node's ILOM, which can be found at ilom/@persist@hostconsole.log within a diagnostic bundle collected from the compute node's ILOM:
      > elorl01cn01-ilom_1302FML07F_2014-10-23T10-06-47/ilom/@persist@hostconsole.log:*** ocfs2 is very sorry to be fencing this system by restarting *** 


  3. Sustained log messages in /var/log/messages of the format 'nfs: server XXX.XXX.XX.XX not responding, still trying', indicating that the NFS server has been unresponsive

    • You will see one or more repetitions of log messages similar to the following across all the compute nodes that ring-fenced and rebooted:
      Oct 23 02:20:41 elorl01cn01 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:42 elorl01cn02 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:42 elorl01cn03 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:42 elorl01cn04 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn05 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn06 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn07 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn08 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:42 elorl01cn09 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn10 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn11 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn12 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:42 elorl01cn13 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:42 elorl01cn14 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn15 kernel: nfs: server 192.168.21.5 not responding, still trying
      Oct 23 02:20:41 elorl01cn16 kernel: nfs: server 192.168.21.5 not responding, still trying
       
      NOTE:

      • The above messages are indicative of the underlying NFS responsiveness problem that OCFS2 attempts to recover from by "ring fencing"; the sketch below can be used to tally these messages per compute node.
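
    • The following minimal Python sketch (illustrative only; the helper name count_nfs_stalls is hypothetical) tallies the "not responding" messages per compute node, assuming a consolidated /var/log/messages (or a concatenation of the per-node files) in the syslog format shown above:

      # Sketch for counting 'nfs: server ... not responding' messages per
      # compute node and NFS server address from syslog-format log lines.
      import re
      from collections import Counter

      NFS_RE = re.compile(r"(\S+)\s+kernel: nfs: server (\S+) not responding")

      def count_nfs_stalls(logfile="/var/log/messages"):
          counts = Counter()
          with open(logfile) as f:
              for line in f:
                  m = NFS_RE.search(line)
                  if m:
                      counts[m.groups()] += 1     # key: (hostname, nfs_server)
          return counts

      if __name__ == "__main__":
          for (node, server), n in sorted(count_nfs_stalls().items()):
              print("%s: %d 'not responding' message(s) for NFS server %s"
                    % (node, n, server))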


  4. Log messages in /var/log/messages indicating blocked "o2hb" and "kworker" processes

    • For example:
      INFO: task o2hb-0004FB0000:9647 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       ffff8801f0b4dae0 0000000000000246 ffff8801f873cec0 0000000000012200
       ffff88021ade4380 ffff880223fc2200 ffff8801f0b4da30 ffffffffa008bd5a
       ffff8801f0b4daa0 ffff880221a78400 ffff880221a78400 ffff8801f0b4daa0
      Call Trace:
       [<ffffffffa008bd5a>] ? __clone_and_map_simple+0x6a/0x90 [dm_mod]
       [<ffffffffa008c084>] ? __clone_and_map+0x204/0x270 [dm_mod]
       [<ffffffff81235e90>] ? part_round_stats+0x40/0x70
       [<ffffffffa008d1f1>] ? __split_and_process_bio+0xc1/0x110 [dm_mod]
       [<ffffffff815061e5>] schedule+0x45/0x60
       [<ffffffff8150663c>] schedule_timeout+0x19c/0x2e0
       [<ffffffffa008d43d>] ? dm_request+0x4d/0x50 [dm_mod]
       [<ffffffff81239e70>] ? generic_make_request+0x260/0x4b0
       [<ffffffff811a0900>] ? __bio_add_page+0x180/0x240
       [<ffffffff8150606d>] wait_for_common+0x10d/0x160
       [<ffffffff81069150>] ? try_to_wake_up+0x180/0x180
       [<ffffffff8123a136>] ? submit_bio+0x76/0x120
       [<ffffffff8150619d>] wait_for_completion+0x1d/0x20
       [<ffffffffa021a324>] o2hb_wait_on_io+0x24/0x40 [ocfs2_nodemanager]
       [<ffffffffa021a3e6>] o2hb_read_slots+0xa6/0x160 [ocfs2_nodemanager]
       [<ffffffffa021cd97>] o2hb_do_disk_heartbeat+0x1e7/0x5d0 [ocfs2_nodemanager]
       [<ffffffff810410e9>] ? pvclock_clocksource_read+0x29/0x80
       [<ffffffff8107e1c0>] ? lock_timer_base+0x70/0x70
       [<ffffffffa021de40>] o2hb_thread+0x90/0x400 [ocfs2_nodemanager]
       [<ffffffff81507ffe>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
       [<ffffffff81069107>] ? try_to_wake_up+0x137/0x180
       [<ffffffff81069162>] ? default_wake_function+0x12/0x20
       [<ffffffff8105745b>] ? __wake_up_common+0x5b/0x90
       [<ffffffff81507ffe>] ? _raw_spin_unlock_irqrestore+0x1e/0x30
       [<ffffffffa021ddb0>] ? o2hb_alloc_hb_set+0x70/0x70 [ocfs2_nodemanager]
       [<ffffffff81090da7>] kthread+0x97/0xa0
       [<ffffffff81511884>] kernel_thread_helper+0x4/0x10
       [<ffffffff81510963>] ? int_ret_from_sys_call+0x7/0x1b
       [<ffffffff815084e1>] ? retint_restore_args+0x5/0x6
       [<ffffffff81511880>] ? gs_change+0x13/0x13
       
    • and:
      INFO: task kworker/u:8:15758 blocked for more than 120 seconds.
      "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      kworker/u:8     D 0000000000000000     0 15758      2 0x00000000
       ffff8801e15cb890 0000000000000246 ffff8801e15cb7a0 0000000000012200
       ffff88021c5001c0 ffff8801f4606140 ffff8801e15cb7f0 ffff88020d895988
       ffff8801e15cb7e0 ffffffff81041081 ffff8801e15cb7e0 00000000000011ee
      Call Trace:
       [<ffffffff81041081>] ? pvclock_get_nsec_offset+0x11/0x50
       [<ffffffff810410e9>] ? pvclock_clocksource_read+0x29/0x80
       [<ffffffff81239a7c>] ? blk_flush_plug_list+0x3c/0x180
       [<ffffffff815061e5>] schedule+0x45/0x60
       [<ffffffff81506271>] io_schedule+0x71/0xb0
       [<ffffffff8110f94e>] sleep_on_page+0xe/0x20
       [<ffffffff8150695a>] __wait_on_bit_lock+0x5a/0xb0
       [<ffffffff8110f940>] ? __lock_page+0x80/0x80
       [<ffffffff8110f92a>] __lock_page+0x6a/0x80
       [<ffffffff810918d0>] ? autoremove_wake_function+0x50/0x50
       [<ffffffff8110f7de>] ? find_get_page+0x1e/0x80
       [<ffffffff81110a51>] find_lock_page+0x61/0x80
       [<ffffffff81110ab7>] grab_cache_page_write_begin+0x47/0xe0
       [<ffffffffa0738f81>] nfs_write_begin+0x91/0x160 [nfs]
       [<ffffffff8110ef77>] generic_perform_write+0xc7/0x1a0
       [<ffffffff8110f07b>] generic_file_buffered_write_iter+0x2b/0x60
       [<ffffffff81111c0b>] __generic_file_write_iter+0x20b/0x3a0
       [<ffffffff811189f9>] ? __pagevec_free+0x49/0xa0
       [<ffffffff81111e25>] __generic_file_aio_write+0x85/0xa0
       [<ffffffff81111e9f>] generic_file_aio_write+0x5f/0xc0
       [<ffffffffa07ba5f7>] dm_nfs_kthread_iov_start+0x37/0x40 [dm_nfs]
       [<ffffffffa07ba7ac>] dm_nfs_kthread_biovec_start+0x14c/0x1b0 [dm_nfs]
       [<ffffffff811101e1>] ? __filemap_fdatawrite_range+0x61/0x70
       [<ffffffffa07ba87c>] dm_nfs_kthread_bio_start+0x6c/0x90 [dm_nfs]
       [<ffffffffa07baa0f>] dm_nfs_kthread_worker+0xcf/0x180 [dm_nfs]
       [<ffffffff8108c0e4>] process_one_work+0x104/0x390
       [<ffffffffa07ba940>] ? dm_nfs_map+0xa0/0xa0 [dm_nfs]
       [<ffffffff8108c66a>] worker_thread+0xca/0x240
       [<ffffffff8108c5a0>] ? rescuer_thread+0x1f0/0x1f0
       [<ffffffff81090da7>] kthread+0x97/0xa0
       [<ffffffff81511884>] kernel_thread_helper+0x4/0x10
       [<ffffffff81510963>] ? int_ret_from_sys_call+0x7/0x1b
       [<ffffffff815084e1>] ? retint_restore_args+0x5/0x6
       [<ffffffff81511880>] ? gs_change+0x13/0x13
       

Cause


