ECE Coherence Cluster is in ENDANGERED State during Rolling Upgrade and Does Not Recover
(Doc ID 2536501.1)
Last updated on MAY 20, 2024
Applies to:
Oracle Coherence - Version 12.2.1.3.0 and later
Oracle Communications BRM - Elastic Charging Engine - Version 11.3.0.0.0 and later
Information in this document applies to any platform.
Symptoms
A customer reported that in an ECE environment, after a node/server (ecsN) was restarted, it remained in the 'Partition Transferring...' state for a long time (20+ minutes), although the HA status was still MACHINE_SAFE and the cache counts still matched expectations. Because the HA status was MACHINE_SAFE, the user restarted another ECS node/server (ecsN+1) on a different machine. The logs then showed error messages similar to 'transferring already in progress...', the cluster went into the ENDANGERED state, and the cache counts showed a loss of about Nk entries. After stopping ecsN, which had been in the 'Partition transferring...' state for the longer time, the HA status returned to MACHINE_SAFE, but the cache counts showed a loss of about (N-n)k entries.
During the rolling restart, the following message is logged:
Current partition distribution has been pending for over 2133 seconds;
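To spot this symptom across node logs before restarting the next node, the pending-distribution message can be searched for and its duration extracted. The following is a minimal sketch, not part of this note's solution. It assumes only the message wording quoted above; the function names and the 1200-second threshold (taken from the "20min+" observation) are hypothetical, and the rest of the log line layout is not assumed.

```python
import re

# Matches the message text quoted in the Symptoms section.
# Anything before or after it on the log line is ignored (layout is an assumption).
PENDING_RE = re.compile(
    r"Current partition distribution has been pending for over (\d+) seconds"
)


def pending_seconds(line):
    """Return the reported pending duration in seconds, or None if the line does not match."""
    match = PENDING_RE.search(line)
    return int(match.group(1)) if match else None


def longest_pending(lines, threshold_seconds=1200):
    """Return the largest pending duration at or above the threshold, or None.

    threshold_seconds is a hypothetical cut-off (20 minutes), not an Oracle default.
    """
    durations = [s for s in map(pending_seconds, lines) if s is not None]
    over = [s for s in durations if s >= threshold_seconds]
    return max(over) if over else None


if __name__ == "__main__":
    sample = [
        "Current partition distribution has been pending for over 2133 seconds;",
        "some unrelated log line",
    ]
    print(longest_pending(sample))
```

A non-None result suggests that partition distribution is still pending on that node, which in the reported case coincided with the second restart leading to the ENDANGERED state.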
Cause