Solaris Cluster: Why a Resource Group or Resource Is Pingpong-ing (Failing Over Repeatedly) Between Nodes/Systems/Servers
(Doc ID 1020354.1)
Last updated on AUGUST 06, 2020
Applies to:
Solaris Cluster - Version 3.0 to 4.3 [Release 3.0 to 4.3]
Solaris Cluster Geographic Edition - Version 3.1 to 4.3 [Release 3.1 to 4.3]
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on SPARC (64-bit)
This document addresses failures to start a Solaris Cluster resource on any node.
A resource group is said to be pingpong-ing when one of its resources repeatedly fails to start on every node, so the resource group keeps switching back and forth between the nodes.
Pingpong-ing is controlled by a resource group property named Pingpong_interval. The RGM (Resource Group Manager) uses it to determine where to bring the resource group online in the event of a reconfiguration, or as a result of a scha_control call that gives over the resource group.
If the resource group has failed twice to come online on a node or zone within the past Pingpong_interval seconds, that node or zone is considered ineligible to host the resource group, and the RGM will not try to start it there again.
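The eligibility rule above can be sketched as a small model. This is only an illustration of the rule as stated in this document, not the RGM's actual implementation; the function name `is_eligible`, the list of failure timestamps, and the example value of 3600 seconds are all hypothetical.

```python
def is_eligible(failure_times, now, pingpong_interval, max_failures=2):
    """Return True if the node may still host the resource group.

    failure_times: timestamps (in seconds) of failed attempts to bring
    the resource group online on this node (hypothetical bookkeeping).
    The node becomes ineligible once max_failures failures fall within
    the past pingpong_interval seconds.
    """
    recent = [t for t in failure_times if 0 <= now - t <= pingpong_interval]
    return len(recent) < max_failures


# One failure inside the window: the node is still eligible.
one_failure = is_eligible([100], now=500, pingpong_interval=3600)
# Two failures inside the window: the node is ineligible.
two_failures = is_eligible([100, 400], now=500, pingpong_interval=3600)
# Both failures are older than the window: the node is eligible again.
old_failures = is_eligible([100, 400], now=5000, pingpong_interval=3600)
```

In this model, eligibility depends only on how many failures fall inside the sliding window, which is why the length of the window matters so much.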
If Pingpong_interval is set to a very low number of seconds, restarting the service will probably take longer than the interval, so the condition of failing twice within the interval is never met. In this situation, the resource group keeps migrating from one node to another (or being restarted on the same node) forever.
While this is happening, any cluster command issued against the resource group fails with the complaint "The resource group is undergoing a configuration".
A start failure occurs when a resource's Start or Prenet_start method exits with a nonzero status or times out.
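The effect of a very low Pingpong_interval can be illustrated with a short simulation. This is a simplified single-node sketch under assumed numbers, not a model of the real RGM: the function name `attempts_until_ineligible`, the 300-second restart time, and the two interval values are hypothetical.

```python
def attempts_until_ineligible(restart_seconds, pingpong_interval, max_attempts=10):
    """Simulate repeated failed starts on one node.

    Each attempt takes restart_seconds and then fails. Returns the
    attempt number at which two failures fall within the past
    pingpong_interval seconds, or None if that never happens within
    max_attempts attempts.
    """
    failures = []
    now = 0
    for attempt in range(1, max_attempts + 1):
        now += restart_seconds          # the start attempt runs, then fails
        failures.append(now)
        recent = [t for t in failures if now - t <= pingpong_interval]
        if len(recent) >= 2:
            return attempt              # node is now ineligible
    return None                         # never ineligible: endless retries


# Interval much longer than a restart: the second failure stops the retries.
with_long_interval = attempts_until_ineligible(300, 3600)
# Interval shorter than a restart: two failures never share a window.
with_short_interval = attempts_until_ineligible(300, 60)
```

With the longer interval, the simulated node is ruled out after the second failure. With the shorter one, each earlier failure has already aged out by the time the next one occurs, so the retries never stop.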
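The two failure conditions (a nonzero exit status or a timeout) can be sketched with a generic process runner. This is an analogy only: it does not show how rgmd invokes Start or Prenet_start methods, and the function name `start_method_failed` and the stand-in commands are hypothetical.

```python
import subprocess
import sys


def start_method_failed(command, timeout_seconds):
    """Return True if command times out or exits with a nonzero status."""
    try:
        result = subprocess.run(command, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return True                     # the method did not finish in time
    return result.returncode != 0       # a nonzero exit status is a failure


# Stand-in "methods" built from the current Python interpreter.
exits_zero = [sys.executable, "-c", "import sys; sys.exit(0)"]
exits_one = [sys.executable, "-c", "import sys; sys.exit(1)"]
hangs = [sys.executable, "-c", "import time; time.sleep(30)"]

ok_result = start_method_failed(exits_zero, timeout_seconds=20)
nonzero_result = start_method_failed(exits_one, timeout_seconds=20)
timeout_result = start_method_failed(hangs, timeout_seconds=1)
```

Either condition counts as a failure in this sketch, which mirrors the description above: a method that hangs is treated the same as one that reports an error.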
For further details of this behavior refer to:
<Document 1003759.1> Solaris Cluster Pingpong Avoidance Algorithms Explained
In this Document
1) Verify that you are running Sun/Solaris Cluster; any of the following methods should help to verify this.
2) Confirm that the resource is failing to start on any node.
3) Verify whether rgmd is no longer attempting to start the resource group.
4) Recover the resource, or prevent it from pingpong-ing between nodes.