Federation Status in CONNECT_WAIT Status After Rolling Upgrade Server
(Doc ID 3072785.1)
Last updated on FEBRUARY 26, 2025
Applies to:
Oracle Communications BRM - Elastic Charging Engine - Version 12.0.0.4.0 and later
Information in this document applies to any platform.
Symptoms
After a Rolling Upgrade (RU) is executed, federation goes into CONNECT_WAIT status and connection errors appear in the Coherence log.
ERROR
-----------------------
2025-01-17 17:56:28 INFO Oracle Coherence 12.2.1.4.21 (thread=SelectionService(channels=31, selector=MultiplexedSelector(sun.nio.ch.EPollSelectorImpl@772caabe), id=), member=4): XRefFederatedCache at tmb://<HOST01.DOMAIN>:20092.59231 is disconnected from Participant ECE2, Member=N/A, with address tmb://<IP.244>:20090: java.io.IOException: Connection reset by peer
2025-01-17 17:56:28 INFO Oracle Coherence 12.2.1.4.21 (thread=XRefFederatedCache:DestinationController[ECE2]Worker:0, member=4): Exception connecting to ECE2: com.tangosol.internal.federation.service.bus.DisconnectedException: EndPoint tmb://<IP.244>:20090 for Participant ECE2 was disconnected; retried 48 times (last attempt at 2025-01-17T17:56:28.018-06:00[America/Costa_Rica]); retrying in 23040ms
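When reviewing a long Coherence log, it can help to pull the participant, endpoint, retry count, and next retry delay out of each DisconnectedException message. The following is a minimal Python sketch, not an Oracle-supplied tool. The pattern is derived only from the sample line above, and the names RETRY_RE and parse_retry are hypothetical.

```python
import re

# Pattern built from the single sample message in this note; other
# Coherence versions may word the message differently (assumption).
RETRY_RE = re.compile(
    r"EndPoint (?P<endpoint>\S+) for Participant (?P<participant>\S+) "
    r"was disconnected; retried (?P<retries>\d+) times "
    r".*?retrying in (?P<delay_ms>\d+)ms"
)


def parse_retry(line):
    """Return the retry details from one log line, or None if it does not match."""
    match = RETRY_RE.search(line)
    if match is None:
        return None
    return {
        "endpoint": match.group("endpoint"),
        "participant": match.group("participant"),
        "retries": int(match.group("retries")),
        "delay_ms": int(match.group("delay_ms")),
    }


# The second log line from the Symptoms section, used as sample input.
sample = (
    "2025-01-17 17:56:28 INFO Oracle Coherence 12.2.1.4.21 "
    "(thread=XRefFederatedCache:DestinationController[ECE2]Worker:0, member=4): "
    "Exception connecting to ECE2: "
    "com.tangosol.internal.federation.service.bus.DisconnectedException: "
    "EndPoint tmb://<IP.244>:20090 for Participant ECE2 was disconnected; "
    "retried 48 times (last attempt at "
    "2025-01-17T17:56:28.018-06:00[America/Costa_Rica]); retrying in 23040ms"
)

info = parse_retry(sample)
print(info)
```

Applied line by line to the full log, this shows whether the retry count for a participant keeps growing after the RU completes.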
An RU of the servers is a normal operating procedure and, in production, takes 2 hours in each Elastic Charging Engine (ECE) cluster. The only way to restore the federation connection is to stop the cluster at one site and then replicate the cache from the other site in both directions (site1 to site2, then site2 to site1). Running gridSync stop before the RU is not a solution, because the RU takes very long in the production environment and doing so causes loss of traffic.
The issue persists after:
1. Applying the latest Coherence Cumulative Patch 12.2.1.4.24 (<Patch 37351860>)
2. Replacing name-service-addresses with remote-addresses in charging-coherence-override-qa.federated.xml and charging-coherence-override-qa.xml, as described in <Note 2486543.1>:
<federation-config>
  <participants>
    <participant>
      <name>ECE1</name>
      <remote-addresses>
        <socket-address>
          <address>HOST01</address>
          <port>20090</port>
        </socket-address>
        <socket-address>
          <address>HOST02</address>
          <port>20090</port>
        </socket-address>
      </remote-addresses>
      <initial-action>stop</initial-action>
      <connect-timeout>5m</connect-timeout>
      <send-timeout>15m</send-timeout>
    </participant>
    <participant>
      <name>ECE2</name>
      <remote-addresses>
        <socket-address>
          <address>HOST01</address>
          <port>20090</port>
        </socket-address>
        <socket-address>
          <address>HOST02</address>
          <port>20090</port>
        </socket-address>
      </remote-addresses>
      <initial-action>stop</initial-action>
      <connect-timeout>5m</connect-timeout>
      <send-timeout>15m</send-timeout>
    </participant>
  </participants>
  <topology-definitions>
    <active-active>
      <name>Active</name>
      <active>ECE1</active>
      <active>ECE2</active>
    </active-active>
  </topology-definitions>
</federation-config>
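To confirm that an override file really uses remote-addresses for every participant after the change in step 2, the fragment can be inspected with a short script. This is a minimal Python sketch using only the standard library. The function name summarize_participants is hypothetical, the sample is a shortened copy of the fragment above, and the namespace stripping assumes the real override file may declare a default XML namespace.

```python
import xml.etree.ElementTree as ET


def summarize_participants(xml_text):
    """Map each participant name to its remote addresses and a name-service flag."""
    root = ET.fromstring(xml_text)
    # Drop any "{namespace}" prefix so lookups work with or without xmlns.
    for element in root.iter():
        element.tag = element.tag.rsplit("}", 1)[-1]

    summary = {}
    for participant in root.iter("participant"):
        name = participant.findtext("name")
        remote = [
            (sa.findtext("address"), int(sa.findtext("port")))
            for sa in participant.findall("remote-addresses/socket-address")
        ]
        summary[name] = {
            "remote": remote,
            # True means the old name-service-addresses element is still present.
            "uses_name_service": participant.find("name-service-addresses") is not None,
        }
    return summary


# Shortened copy of the fragment above (one participant only).
sample = """
<federation-config>
  <participants>
    <participant>
      <name>ECE1</name>
      <remote-addresses>
        <socket-address><address>HOST01</address><port>20090</port></socket-address>
        <socket-address><address>HOST02</address><port>20090</port></socket-address>
      </remote-addresses>
    </participant>
  </participants>
</federation-config>
"""

summary = summarize_participants(sample)
print(summary)
```

Running the same function against both override files makes it easy to compare the addresses configured for ECE1 and ECE2 at each site.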
Changes
Cause