ECE Coherence Cluster is in ENDANGERED State during Rolling Upgrade and Does Not Recover (Doc ID 2536501.1)

Last updated on MAY 19, 2023

Applies to:

Oracle Coherence - Version 12.2.1.3.0 and later
Oracle Communications BRM - Elastic Charging Engine - Version 11.3.0.0.0 and later
Information in this document applies to any platform.

Symptoms

A customer reported that, in an ECE environment, a restarted node/server (ecsN) stayed in the 'Partition Transferring...' state for a long time (20+ minutes), while the cluster still reported the HA status MACHINE_SAFE and the cache counts still matched expectations. Because the HA status was MACHINE_SAFE, the user then restarted another ECS node/server (ecsN+1) on a different machine. The logs then showed errors such as 'transferring already in progress...', the cluster went into the ENDANGERED state, and the cache counts showed a loss of about Nk entries. After ecsN, which had been stuck in the 'Partition Transferring...' state for an extended time, was stopped, the cluster returned to MACHINE_SAFE, but the cache counts showed a loss of about (N-n)k entries.

During the rolling restart, messages such as the following are logged: 'Current partition distribution has been pending for over 2133 seconds;'
Cause
