Solaris Cluster Aborting Node/System/Server due to an 'unkillable process or method execution failure' - What Data to Collect? (Doc ID 1310528.1)

Last updated on FEBRUARY 06, 2019

Applies to:

Solaris Cluster Geographic Edition - Version 3.2 to 4.1 [Release 3.2 to 4.1]
Solaris Cluster - Version 3.2 to 4.1 [Release 3.2 to 4.1]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on SPARC (32-bit)
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on x86 (32-bit)

Goal

This document defines what data to collect when a Solaris Cluster node is aborted because a method is unkillable or fails to execute.

Example A) when method is unkillable:

This is the normal behavior of rgmd, and it is working as designed. The node is aborted so that the cluster can recover from these critical conditions. Aborting the node in this way causes it to reboot.

However, modifying the behavior of rgmd so that it collects a system core file should allow Oracle Solaris Cluster technical support to perform the additional analysis needed to root-cause the issue if the problem recurs during a reasonable post-event monitoring period (1 to 2 weeks). If the problem recurs after that monitoring period, a new SR can be opened that references the original.

Solution

To view full details, sign in with your My Oracle Support account.
