
BRM CN PS3 IP3 > ECE Failed to Start After Losing One Kubernetes Worker (Doc ID 2751500.1)

Last updated on MAY 31, 2024

Applies to:

Oracle Communications BRM - Elastic Charging Engine - Version 12.0.0.3.0 and later
Information in this document applies to any platform.

Goal

During a system update in Kubernetes (K8s), one of the K8s worker nodes lost connectivity with the cluster (etcd output below):

etcd: 2021-01-14 14:31:31.552860 I | embed: rejected connection from ":48674" (error "EOF", ServerName "")

The affected worker node hosted several of the user's pods: a Connection Manager (CM) pod replica, Pricing Design Center (PDC), Pipeline Configuration Center (PCC), and Elastic Charging Engine (ECE).

CM, PDC, and PCC recovered. When the node rejoined the cluster, the status of the ecs instances running on that node changed from 0/1 Running to 1/1 Running.
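The "0/1" and "1/1" values come from the READY column of `kubectl get pods`, which reports ready containers over total containers in each pod. As a minimal sketch, assuming the default plain-text column layout (NAME, READY, STATUS, RESTARTS, AGE), the following Python helper lists pods that are not fully ready. The pod names in the sample are hypothetical and for illustration only.

```python
"""Flag pods whose READY column shows fewer ready containers than total.

Assumes the default plain-text layout of `kubectl get pods`
(NAME READY STATUS RESTARTS AGE). The sample output is hypothetical.
"""


def parse_ready(ready: str) -> tuple[int, int]:
    """Split a READY value such as "0/1" into (ready, total)."""
    ready_count, total = ready.split("/")
    return int(ready_count), int(total)


def not_ready_pods(kubectl_output: str) -> list[str]:
    """Return the names of pods whose ready count is below the total."""
    lines = kubectl_output.strip().splitlines()
    header = lines[0].split()
    name_idx = header.index("NAME")
    ready_idx = header.index("READY")
    result = []
    for line in lines[1:]:
        fields = line.split()
        ready_count, total = parse_ready(fields[ready_idx])
        if ready_count < total:
            result.append(fields[name_idx])
    return result


# Hypothetical pod names, for illustration only.
SAMPLE = """\
NAME          READY   STATUS     RESTARTS   AGE
ecs-0         1/1     Running    0          5d
ecs-1         0/1     Running    2          5d
pdc-0         1/1     Running    0          5d
"""

if __name__ == "__main__":
    print(not_ready_pods(SAMPLE))
```

As this case shows, a 1/1 READY count only means the container passed its readiness check; the ECE charging server could still be in the "initial" state, so pod readiness alone did not confirm that ECE was usable.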

However, the following issues were observed with ECE:
1) pricingUpdater remained in init status and never started.
2) brmGateway remained in init status and never started.
3) The ECE charging server was in the "initial" state.
These issues required manual action to start ECE again.

The user started the configLoader job, but it hung in 1/1 Running and never reached Completed. At this point, the ECE status was "ConfigDataLoading". Because the job was hanging, it was force-deleted and started again, but the configLoader job hung again. Finally, the user deleted ECE using Helm and reinstalled it, which resolved the issue.

Why was the ECE status "Initial"? In a production system, ECE should never reach the "Initial" state.
 

Solution




