Hardware failure on a physical node causes Coherence cluster health to become unstable
(Doc ID 2433996.1)
Last updated on AUGUST 16, 2018
Applies to: Oracle Coherence - Version 22.214.171.124.0 to 126.96.36.199.0 [Release 12c]
Information in this document applies to any platform.
The Coherence cluster became unstable after a hardware failure on one of its physical nodes. The logs show "soft timeout detected" messages, but the cluster did not recover until the entire cluster was restarted. As a workaround, the ip-timeout and packet-timeout values were adjusted in the override XML file. The user's concern: why should it be necessary to configure the death detection parameters at all? By default, the IP timeout totals 15 seconds. Why was the failed member not removed from the Coherence cluster once that default timeout was reached?
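For orientation, the workaround described above might look roughly like the following operational override. This is a minimal sketch, assuming the standard override file name tangosol-coherence-override.xml and the usual `tcp-ring-listener` and `packet-publisher` elements; the values are illustrative and are not the ones used in this case. With an `ip-timeout` of 5 seconds and 3 `ip-attempts`, the total IP timeout works out to the 15 seconds mentioned above.

```xml
<?xml version="1.0"?>
<!-- Hypothetical tangosol-coherence-override.xml; values are illustrative only -->
<coherence xmlns="http://xmlns.oracle.com/coherence/coherence-operational-config">
  <cluster-config>
    <!-- Machine-level death detection: ip-timeout x ip-attempts = 5s x 3 = 15s total -->
    <tcp-ring-listener>
      <ip-timeout>5s</ip-timeout>
      <ip-attempts>3</ip-attempts>
    </tcp-ring-listener>
    <!-- Packet delivery timeout: how long a packet may go unacknowledged
         before the unresponsive member is considered dead -->
    <packet-publisher>
      <packet-delivery>
        <timeout-milliseconds>300000</timeout-milliseconds>
      </packet-delivery>
    </packet-publisher>
  </cluster-config>
</coherence>
```

Lowering these values makes failure detection faster but may increase the risk of false positives during long garbage collection pauses or brief network interruptions, so any change should be tested against the actual environment.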