ECE Crashes During EMGTW High Availability Testing
(Doc ID 3038739.1)
Last updated on AUGUST 01, 2024
Applies to:
Oracle Communications BRM - Elastic Charging Engine - Version 12.0.0.8.0 and laterInformation in this document applies to any platform.
Symptoms
In a cloud native environment (CNE), Elastic Charging Engine crashes during EMGateway high availability (HA) testing.
There are 2 EMGateway replicas in PROD environment. The customer wants to perform HA test for EMGateway to validate the high availability of the system. Both gateways have been configured to run on separate worker nodes.
• Reducing replicas set for EMGateway by scaling down the stateful set of EMGateway
• Powering off the worker node running the EMGateway1
Following steps were performed during the HA test:
1. Updated the log levels to DEBUG for EMGateway and BRMGateway in override-values file of ECE and performed rolling upgrade.
2. Performed rolling upgrade for EMGateway and BRMGateway
3. EMGateway1 was running on worker node 1 and EMGateway2 was running on worker node 7
4. Scaled down the replica of EMGateway from 2 to 1
5. Order was fired from Siebel (updating credit limit for an existing customer account) and it passed through EMGateway1and was successful
6. Scaled up the replica of EMGateway from 1 to 2
7. worker node 1 was powered off through OLVM console, where EMGateway1 was running, while EMGateway2 was running on worker node 7
8. EMGateway was rescheduled to run on worker node 3 after worker node 1 was brought down
9. Order was fired from Siebel and it passed through EMGateway2 and was successful
10. EMGateway1 was running on worker node 3 and EMGateway2 was running on worker node 7
11. Powered on the worker node 1
12. Updated the log levels back to ERROR for EMGateway and brmgtw in override-values file of ECE and performed rolling upgrade.
13. Performed rolling upgrade for EMGateway and brmgtw
14. EMGateway2 was in crashloopback error, which was scheduled on worker node 1
15. Performed helm upgrade again by incrementing the restart count for EMGateway
16. EMGateway2 was in crashloopback error, which was scheduled on worker node 1
17. Deleted the EMGateway2 using kubectl delete but its didn’t come up (describe pod was showing backoff restarting failed)
18. Performed the rolling upgrade for EMGgateway
19. Validated the details for EMGateway in ECE cache and observed one of the EMGateway entires for EMGateway2 was missing
Cause
To view full details, sign in with your My Oracle Support account. |
|
Don't have a My Oracle Support account? Click to get started! |
In this Document
Symptoms |
Cause |
Solution |
References |