Solaris Cluster Aborting Node/System/Server due to an 'unkillable process or method execution failure' - What Data to Collect?
(Doc ID 1310528.1)
Last updated on FEBRUARY 19, 2023
Applies to:
Solaris Cluster Geographic Edition - Version 3.2 to 4.1 [Release 3.2 to 4.1]
Solaris Cluster - Version 3.2 to 4.1 [Release 3.2 to 4.1]
Oracle Solaris on SPARC (64-bit)
Oracle Solaris on SPARC (32-bit)
Oracle Solaris on x86-64 (64-bit)
Oracle Solaris on x86 (32-bit)
Goal
To define what data to collect when a Solaris Cluster node is aborted because a method is unkillable or fails to execute.
Example A) when method is unkillable:
This is the normal behavior of rgmd, which is working as designed. The node is aborted to recover from these critical conditions, and the abort reboots the node cleanly.
However, modifying the behavior of rgmd so that it collects a system core file should allow Oracle Solaris Cluster technical support to perform the additional analysis needed to root-cause the issue if the problem recurs during a reasonable post-event monitoring period (1 to 2 weeks). If the problem recurs following that monitoring period, a new SR can be opened that references the original.
Solution
To view full details, sign in with your My Oracle Support account.