Solaris Cluster 3.x Failover of Shared QFS Metadata Server (SUNW.qfs) Can Run into Timeout (Doc ID 1377401.1)

Last updated on JANUARY 31, 2017

Applies to:

Solaris Cluster - Version 3.2 12/06 to OSC 3.3 3/13 [Release 3.2 to 3.3]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on x86-64 (64-bit)

Symptoms

The following scenario can occur in a Solaris Cluster 3.x environment that uses shared QFS with the SUNW.qfs resource type and Oracle RAC.

If the cluster has several, or many, shared QFS file systems, rebooting one node can force an unwanted reboot of both nodes when the scqfs_prenet_start timeout is reached.

In the following example, node1 is rebooted and node2 should become the shared QFS metadata server. The messages show that scqfs_prenet_start started at "21:05:45" (with a timeout value of 300 seconds) and that the timeout was reached at "21:10:51".
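The timestamps above can be checked with a short calculation: a 300-second timeout that starts at 21:05:45 nominally expires at 21:10:45, so the timeout reported at 21:10:51 comes about 306 seconds after scqfs_prenet_start began. The Python sketch below assumes both timestamps fall on the same day and that the timeout is measured from the logged start time. The function name `timeout_summary` is hypothetical and is not part of Solaris Cluster.

```python
from datetime import datetime, timedelta


def timeout_summary(start: str, reached: str, timeout_s: int) -> tuple[str, int, int]:
    """Compare a logged timeout against its configured value.

    Hypothetical helper: assumes HH:MM:SS timestamps on the same day and
    that the timeout is counted from the logged start time.
    Returns (nominal deadline, elapsed seconds, seconds past the deadline).
    """
    fmt = "%H:%M:%S"
    t_start = datetime.strptime(start, fmt)
    t_reached = datetime.strptime(reached, fmt)

    # Nominal expiry: start time plus the configured timeout.
    deadline = t_start + timedelta(seconds=timeout_s)
    elapsed = int((t_reached - t_start).total_seconds())
    lag = int((t_reached - deadline).total_seconds())
    return deadline.strftime(fmt), elapsed, lag


# Values from the example: start 21:05:45, timeout 300 s, reached 21:10:51.
deadline, elapsed, lag = timeout_summary("21:05:45", "21:10:51", 300)
print(f"nominal deadline: {deadline}")
print(f"elapsed: {elapsed} s, reported {lag} s after the nominal deadline")
```

For the example values this yields a nominal deadline of 21:10:45, an elapsed time of 306 seconds, and a timeout message logged 6 seconds after the nominal deadline.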

Example:

 

Keep in mind that when node1 becomes the metadata server again, it is still in the shutdown phase of its reboot. As a result, once node1 is completely down, another failover of the metadata server to node2 occurs. This situation can end in an automatic reboot of node2 or a panic.

Cause
