Solaris Cluster Aborting Node/System/Server due to an 'unkillable process or method execution failure' - What Data to Collect? (Doc ID 1310528.1)

Last updated on APRIL 13, 2017

Applies to:

Solaris Cluster Geographic Edition - Version 3.2 12/06 to OSC 4.1 [Release 3.2 to 4.1]
Solaris Cluster - Version 3.2 12/06 to OSC 4.1 [Release 3.2 to 4.1]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on SPARC (32-bit)
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on x86 (32-bit)

Goal

To define what data to collect when a Solaris Cluster node is aborted because a method is unkillable or fails to execute.

Example A) When a method is unkillable:

This is the normal behavior of rgmd, and it is working as designed. The node is aborted so that the cluster can recover from these critical conditions. The abort reboots the node cleanly.

However, changing the behavior of rgmd so that it collects a system core file should allow Oracle Solaris Cluster technical support to perform the additional analysis required to root-cause the issue if the problem recurs during a reasonable post-event monitoring period (1 to 2 weeks). If the problem does recur following that monitoring period, a new SR can be opened that references the original.

Solution
