Solaris Cluster Pingpong Avoidance Algorithms Explained (Doc ID 1003759.1)

Last updated on MARCH 02, 2017

Applies to:

Solaris Cluster Geographic Edition - Version 3.0 to 4.3 [Release 3.0 to 4.3]
Solaris Cluster - Version 3.0 to 4.3 [Release 3.0 to 4.3]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on x86-64 (64-bit)

Goal

When a Solaris Cluster resource keeps failing on the nodes of a cluster and its resource group is allowed to keep restarting, switching back and forth between the nodes, the resource group is said to be pingpong-ing.

To avoid this, the cluster framework implements pingpong avoidance algorithms, which stop a resource group from being switched back to a node where it has already failed.

Solaris Cluster provides two pingpong avoidance algorithms: rebalance and pingpong_check. The rebalance algorithm is triggered when the resource group fails to start, whereas the pingpong_check algorithm is triggered when a fault monitor for one of the resources in the resource group detects a failure after the resource group has started successfully.
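
The distinction between the two triggers can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the cluster framework's actual implementation: the ResourceGroup class, the select_algorithm and next_node functions, and the failed_on set are all hypothetical names invented for illustration, and the model does not include any expiry of the failure record.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceGroup:
    """Hypothetical stand-in for a resource group (not a real cluster API)."""
    name: str
    nodelist: list                                # candidate nodes, in preference order
    failed_on: set = field(default_factory=set)   # nodes where the group has already failed


def select_algorithm(started_ok: bool) -> str:
    """Pick the algorithm name according to when the failure happened.

    started_ok is False when the group failed to start, and True when a
    fault monitor reported a failure after a successful start.
    """
    return "pingpong_check" if started_ok else "rebalance"


def next_node(rg: ResourceGroup, current: str, started_ok: bool):
    """Record the failure on the current node and choose another node.

    Returns (algorithm, node); node is None when every candidate node
    has already seen a failure, so the group is not moved again.
    """
    algorithm = select_algorithm(started_ok)
    rg.failed_on.add(current)
    for node in rg.nodelist:
        if node not in rg.failed_on:
            return algorithm, node
    return algorithm, None


if __name__ == "__main__":
    rg = ResourceGroup("example-rg", ["node1", "node2"])
    print(next_node(rg, "node1", started_ok=False))  # failure while starting
    print(next_node(rg, "node2", started_ok=True))   # failure after a successful start
```

In this sketch the second call finds no eligible node, because the group has already failed on both candidates. That is the behaviour the avoidance algorithms are meant to produce, in place of switching back to the first node.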

This document describes the algorithms and their associated resources, and gives a procedure to demonstrate the algorithms in action.

To troubleshoot such a situation, refer to:
<Document 1020354.1> Solaris Cluster Why Resource Group or Resource is pingpong-ing (failover repeatedly) Between Node/System/Server

Solution
