Solaris Cluster Method, Start, Stop or Probe (HA Data Service) Timeout. Tune fault monitor. Resolution Path.
(Doc ID 1020791.1)
Last updated on AUGUST 11, 2020
Applies to: Solaris Cluster - Version 3.2 U3 to 4.3 [Release 3.2 to 4.3]
Solaris Cluster Geographic Edition - Version 3.2 U3 to 4.3 [Release 3.2 to 4.3]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on SPARC (32-bit)
This article helps with method timeouts in Solaris Cluster 3.x and 4.x. It helps you identify a new timeout value and shows how to change it.
Applications managed by the cluster are also called services. Solaris Cluster manages and interacts with services through methods, which can be shell scripts or binaries. The actual methods vary for each data service (also known as an agent).
Solaris Cluster runs every method with a timeout so that no single step can hang forever. Every timeout has a default value, so even if you do not assign one, the method takes the default from the service registration file. The default values are calculated for an average system running an average load.
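To illustrate the general idea of running a method under a timeout, here is a minimal Python sketch. It is not how Solaris Cluster implements method execution; the function name `run_method` and the returned strings are hypothetical and chosen only for this example.

```python
import subprocess
import sys


def run_method(cmd, timeout_s):
    """Run a command with a time limit (illustrative only).

    Returns "ok" if the command exits with status 0, "failed" if it
    exits with a non-zero status, and "timeout" if it does not finish
    within timeout_s seconds (the child process is then killed).
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # A stand-in "method" that takes 5 seconds, run with a 1 second limit.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_method(slow, timeout_s=1))
```

The point of the sketch is that a step which never returns is cut off after a bounded time instead of blocking everything that follows it.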
In some cases cluster administrators want to change a timeout. For example, they might increase it if the system load is higher than it was at installation and the application now takes longer to start, or shorten it if they consider it too long and want a cluster action to happen sooner.
Setting an appropriate timeout for each method is important for healthy cluster operation. A timeout that is too short can lead to false failovers, and one that is too long can make a failover complete too slowly. There are no hard-and-fast rules for determining an appropriate timeout for each method. Run tests, preferably under load, to see how long each method takes on your system, then allow for peak load to arrive at a figure.
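One possible way to turn test measurements into a figure is sketched below in Python. The function name `suggest_timeout`, the peak-load multiplier, the fixed margin, and the rounding step are all assumptions made for illustration, not values recommended by Oracle; adjust them to your own environment.

```python
import math


def suggest_timeout(durations_s, peak_factor=1.5, margin_s=30, round_to_s=10):
    """Suggest a method timeout in seconds from measured run times.

    durations_s : run times (seconds) measured during tests under load
    peak_factor : assumed multiplier to allow for peak load
    margin_s    : assumed fixed safety margin in seconds
    round_to_s  : round the result up to a multiple of this many seconds
    """
    if not durations_s:
        raise ValueError("at least one measured duration is required")
    worst = max(durations_s)                 # slowest observed run
    raw = worst * peak_factor + margin_s     # allow for peak load plus margin
    return int(math.ceil(raw / round_to_s) * round_to_s)


if __name__ == "__main__":
    # Hypothetical start times measured in four test runs under load.
    measured = [42.0, 55.3, 61.8, 48.9]
    print(suggest_timeout(measured))
```

A value obtained this way could then be applied to the relevant timeout property of the resource, for example with a command of the form `clresource set -p Start_timeout=<seconds> <resource-name>`, where the resource name is a placeholder for your own resource.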