Solaris Cluster 3.x Failover of Shared QFS Metadata Server (SUNW.qfs) Can Run into Timeout
Last updated on JANUARY 31, 2017
Applies to: Solaris Cluster - Version 3.2 12/06 to OSC 3.3 3/13 [Release 3.2 to 3.3]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on x86-64 (64-bit)
The following scenario can occur in a Solaris Cluster 3.x environment that uses shared QFS with resource type SUNW.qfs and Oracle RAC.
If a cluster has several or many shared QFS file systems, the reboot of one node can force an unwanted reboot of both nodes when the scqfs_prenet_start timeout is reached.
In the following example, node1 is rebooted and node2 should become the shared QFS metadata server. The messages show that scqfs_prenet_start was started at "21:05:45" (with a timeout value of 300 seconds) and that the timeout was reached at "21:10:51".
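The timing can be checked with simple arithmetic. The sketch below is only an illustration: it takes the two timestamps and the 300-second value quoted above and computes the elapsed time between them. The function name elapsed_seconds and the constant PRENET_START_TIMEOUT_S are hypothetical helpers introduced for this example and are not part of Solaris Cluster.

```python
from datetime import datetime

# Timeout value quoted in the article (seconds); the constant name is hypothetical.
PRENET_START_TIMEOUT_S = 300


def elapsed_seconds(start: str, end: str) -> int:
    """Return the seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


# Timestamps taken from the example in the article.
elapsed = elapsed_seconds("21:05:45", "21:10:51")
print(elapsed, elapsed >= PRENET_START_TIMEOUT_S)
```

The elapsed time comes out slightly above the configured 300 seconds, which is consistent with the method running until its timeout expired rather than completing on its own.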
Keep in mind, however, that when node1 becomes the metadata server again, it is still in the shutdown phase of its reboot. Once node1 is completely down, a new failover of the metadata server to node2 is therefore triggered. Such a situation can end in an automatic reboot of node2 or in a panic.