OCFS2 and SAN Interactions (Doc ID 603038.1)

Last updated on JUNE 27, 2009

Applies to:

Linux Kernel - Version: 1.0.0-1 to 1.2.9-1
Linux x86-64

Symptoms

You have a server cluster that uses the Oracle Cluster File System Version 2 (OCFS2) for shared data store access. Every cluster node can ping every other cluster node.

One of the following occurs: the OCFS2 subsystem refuses to start; one node starts, but starting additional nodes brings down the nodes that are already running; or OCFS2 performance is slow.

What is wrong?

Changes

Valid cluster configurations vary so widely that no single timeout value works optimally in every situation.

Every component in the connection between each cluster node and the external storage, including any intranet fabric, switches, cabling, adapter cards, and firewalls, contributes to the choice of disk heartbeat timeout. In particular, the number of alternate paths between the cluster node and the external storage must be a prime consideration, because the timeout has to outlast the time the storage stack needs to fail over from one path to another.
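As a rough illustration of how a disk heartbeat timeout maps to a setting, the sketch below converts between a timeout in seconds and the O2CB_HEARTBEAT_THRESHOLD value. It assumes the relation given in the OCFS2 FAQ, timeout = (O2CB_HEARTBEAT_THRESHOLD - 1) * 2 seconds; that relation does not appear in this note, so confirm it against the documentation for your OCFS2 release. The function names and the 45-second failover figure are hypothetical.

```python
import math

# Assumption (from the OCFS2 FAQ, not from this note): each node writes a
# disk heartbeat every 2 seconds, and a node is declared dead after
# (O2CB_HEARTBEAT_THRESHOLD - 1) missed intervals.
HEARTBEAT_INTERVAL_S = 2


def threshold_to_seconds(threshold: int) -> int:
    """Disk heartbeat timeout, in seconds, implied by a threshold value."""
    return (threshold - 1) * HEARTBEAT_INTERVAL_S


def seconds_to_threshold(timeout_s: float) -> int:
    """Smallest threshold whose timeout is at least timeout_s seconds."""
    return math.ceil(timeout_s / HEARTBEAT_INTERVAL_S) + 1


# Hypothetical example: the storage path failover is measured at 45 seconds,
# so the disk heartbeat timeout must be at least that long.
needed = seconds_to_threshold(45)
print(needed, threshold_to_seconds(needed))
```

The measured failover time is a lower bound only; a margin on top of it is a judgment call for each site.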

Similarly, the topology of the cluster private network that carries the network heartbeat traffic must also be considered.  Using bonded Ethernet devices is a frequently overlooked source of network heartbeat latency.

Each node in an OCFS2 cluster must use exactly the same timeout settings.  The /etc/sysconfig/o2cb file must be identical on each node.  Using the ocfs2console tool to manage these settings ensures a consistent setup, cluster-wide.
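One way to confirm that the settings match is to gather the contents of /etc/sysconfig/o2cb from every node and compare them key by key. The following is a minimal sketch, assuming the file uses simple KEY=VALUE lines with # comments; the helper names, node names, and sample values are hypothetical, and collecting the file text from each node (for example over ssh) is left out.

```python
def parse_sysconfig(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"')
    return settings


def find_mismatches(configs):
    """Return {key: {node: value}} for every key whose value differs
    between nodes. configs maps a node name to that node's file text."""
    parsed = {node: parse_sysconfig(text) for node, text in configs.items()}
    keys = set().union(*(p.keys() for p in parsed.values()))
    mismatches = {}
    for key in sorted(keys):
        values = {node: p.get(key) for node, p in parsed.items()}
        if len(set(values.values())) > 1:
            mismatches[key] = values
    return mismatches


# Hypothetical file contents from two nodes.
configs = {
    "node1": "O2CB_HEARTBEAT_THRESHOLD=31\nO2CB_IDLE_TIMEOUT_MS=30000\n",
    "node2": "O2CB_HEARTBEAT_THRESHOLD=61\nO2CB_IDLE_TIMEOUT_MS=30000\n",
}
for key, values in find_mismatches(configs).items():
    print(key, values)
```

A key that is missing on one node is reported with the value None for that node, so an omitted setting is flagged as well as a differing one.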

Cause
