Last updated on AUGUST 01, 2016
Applies to:Solaris Cluster - Version OSC 4.0 to OSC 4.2 [Release 4.0 to 4.2]
Solaris Operating System - Version 11.2 to 11.2 [Release 11.0]
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on SPARC (64-bit)
In this case Solaris Cluster 4.2 is configured to failover zone with the HA-Container agent. The HA-Container resources get randomly restarted due to probe failure with no apparent error or reason justifitying the probe failure in messages files or any other log.
In such cases, it would be best to enable debugging following the section
Enabling debug for Solaris Containers HA Data Service (aka zone, sczbt)
in: <Document 1010497.1>
When the problem is faced, After increasing debug level for the resource, you would start to get these on next restart:
In case the zone had been up and running for days before the problem, the above debug output could be giving misleading information.
Since the start_script is only used during start method execution, if the start completed long time before, and yet the start_sczbt script was actually still running, the probe shall have triggered restart much earlier.
In order to debug the problem and verify that the issue is caused by Bug 19930993 as described below, you might want to do further troubleshooting (eg using truss on the probe script themselves). Due to lack of explicit errors, It is recommended that a Service Request is opened to get guidance on such troubleshooting.
Starting with Solaris 11 a new version of ksh based on opensource ksh93 code is being delivered (different from older ksh used in Solaris 10).
Any system running Solaris 11 (in conjunction with Solaris Cluster and failover zones) could be affected but it appears that the problem happens more frequently with Solaris 11.2.
Sign In with your My Oracle Support account
Don't have a My Oracle Support account? Click to get started
Million Knowledge Articles and hundreds of Community platforms