My Oracle Support

Traffic Not Processed (Doc ID 2900041.1)

Last updated on FEBRUARY 15, 2024

Applies to:

Oracle Communications BRM - Elastic Charging Engine - Version 11.3.0.7.0 and later
Information in this document applies to any platform.

Symptoms

In a production system, starting at 07:00 on 17.03.2022, the throughput values shown on the monitoring systems dropped to 0 for all types of traffic.
High CPU load was observed on the Diameter Gateway (DGW) nodes starting at the same minute.

Actions taken:
1. Rolling upgrade of the DGWs, 07:10-07:35: CPU usage on the DGWs returned to normal.
2. Rolling upgrade of the Elastic Charge Server (ECS) nodes, 07:39-08:39: neither network traffic nor simulated traffic went through.
3. Rolling upgrade of the DGWs, 08:40-09:00: network and simulated traffic were still failing with errors.
4. Stop and start of all DGWs, 09:10: network traffic resumed and was processed successfully, and the system began to exit degraded mode.

Before the problem occurred, at 06:05, the recurring PORT_OUT subscriber deletion processes started running.

Note: The rolling upgrades were performed only as restarts, in an attempt to mitigate the issue; no new versions were deployed.


Both the DGW and the ECS logs show threads parked, waiting on a condition, for example:

"BRMFederatedCacheWorker:3" #56 daemon prio=5 os_prio=0 tid=0x00007fd490471800 nid=0x3aa8 waiting on condition [0x00007fd4a058f000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00007fddfe805718> (at com.oracle.common.base.SingleWaiterCooperativeNotifier)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at com.oracle.common.base.Blocking.parkNanos(Blocking.java:153)
at com.oracle.common.base.Blocking.park(Blocking.java:137)
at com.oracle.common.base.SingleWaiterMultiNotifier.await(SingleWaiterMultiNotifier.java:50)
at com.oracle.common.base.SingleWaiterCooperativeNotifier.await(SingleWaiterCooperativeNotifier.java:49)
at com.tangosol.coherence.component.net.Poll.waitCompletion(Poll.CDB:6)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:34)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:1)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$BinaryMap.sendPartitionedRequest(PartitionedCache.CDB:57)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$BinaryMap.query(PartitionedCache.CDB:21)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$BinaryMap.entrySet(PartitionedCache.CDB:6)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$ViewMap.entrySet(PartitionedCache.CDB:59)
at com.tangosol.coherence.component.util.SafeNamedCache.entrySet(SafeNamedCache.CDB:1)
at oracle.communication.brm.charging.util.coherence.internal.CoherenceTemplateImpl.getAllValues(CoherenceTemplateImpl.java:308)
at oracle.communication.brm.charging.processor.update.internal.coherence.CoherenceUpdateRemoveCustomerActivity.removeBalanceAndBillingTriggeredCycleInfo(CoherenceUpdateRemoveCustomerActivity.java:171)
at oracle.communication.brm.charging.processor.update.internal.coherence.CoherenceUpdateRemoveCustomerActivity.process(CoherenceUpdateRemoveCustomerActivity.java:130)
at sun.reflect.GeneratedMethodAccessor506.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
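To check whether another thread dump shows the same pattern, one can look for cache service worker threads that are parked inside Grid.poll, that is, waiting for the result of a request they sent to a Coherence service. The following is a minimal sketch of such a check, assuming the jstack-style format shown above, where each thread begins with a line holding its quoted name. The class ThreadDumpScan and the method blockedThreads are hypothetical names, not an Oracle tool; the thread-name prefix BRMFederatedCacheWorker and the frame Grid.poll are taken from the trace above, and the sample dump in main is abbreviated and illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not an Oracle tool: lists the threads in a jstack-style
 * thread dump whose name starts with a prefix and whose stack contains a frame marker.
 */
public class ThreadDumpScan {

    public static List<String> blockedThreads(String dump, String namePrefix, String frameMarker) {
        List<String> result = new ArrayList<>();
        String currentName = null;
        boolean matched = false;
        for (String rawLine : dump.split("\\R")) {
            String line = rawLine.trim();
            if (line.startsWith("\"")) {
                // New thread header, e.g. "BRMFederatedCacheWorker:3" #56 daemon ...
                addIfMatched(result, currentName, matched);
                int end = line.indexOf('"', 1);
                currentName = end > 0 ? line.substring(1, end) : null;
                matched = false;
            } else if (currentName != null && currentName.startsWith(namePrefix)
                    && line.contains(frameMarker)) {
                // A stack frame of a thread we care about contains the marker.
                matched = true;
            }
        }
        addIfMatched(result, currentName, matched); // last thread in the dump
        return result;
    }

    private static void addIfMatched(List<String> result, String name, boolean matched) {
        if (name != null && matched) {
            result.add(name);
        }
    }

    public static void main(String[] args) {
        // Abbreviated, illustrative sample in the same format as the trace above.
        String dump = String.join("\n",
                "\"BRMFederatedCacheWorker:3\" #56 daemon prio=5 waiting on condition",
                "   java.lang.Thread.State: WAITING (parking)",
                "   at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.poll(Grid.CDB:34)",
                "",
                "\"BRMFederatedCacheWorker:4\" #57 daemon prio=5 runnable",
                "   java.lang.Thread.State: RUNNABLE");
        List<String> blocked = blockedThreads(dump, "BRMFederatedCacheWorker", ".Grid.poll(");
        System.out.println(blocked.size() + " worker thread(s) parked in Grid.poll: " + blocked);
    }
}
```

If most or all BRMFederatedCacheWorker threads appear in the result, the service's worker pool is probably exhausted by blocked requests, which is consistent with the self-deadlock explained in the Changes section.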

 



Changes

According to the thread stack, the problem is triggered by the handling of customer account deletion (CoherenceUpdateRemoveCustomerActivity), which matches the customer's deletion of the PORT_OUT accounts. However, this code path causes a Coherence reentrancy problem, commonly called a 'self-deadlock', in the Coherence cache service.

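The BRMFederatedCacheWorker thread is a worker thread of the BRMFederatedCache cache service and is already executing an EntryProcessor on that service. From inside the EntryProcessor, it calls CoherenceTemplateImpl.getAllValues(), which queries the same BRMFederatedCache cache service through the Coherence entrySet() API (SafeNamedCache.entrySet in the stack above). getAllValues() sends the query to all ECE servers, and on each ECE server, including the invoking one, the query must be executed by the same BRMFederatedCache service worker threads. The calling thread blocks until the query completes (Grid.poll and Poll.waitCompletion in the stack above), so once all worker threads are blocked this way, no thread is left to execute the pending queries and the service stops processing requests.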
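The reentrancy problem can be reproduced outside Coherence with a plain Java thread pool. The following is a simulation only, under the assumption that a fixed-size pool is a fair stand-in for the worker threads of the BRMFederatedCache service: each outer task plays the EntryProcessor, holds a worker, and then blocks on a query task. The class ReentrantPoolDemo and its run method are hypothetical names, and none of this is ECE or Coherence code. When the query is submitted to the same pool while every worker is busy, no thread is free to run it and no outer task completes; when the query goes to a separate pool, all tasks finish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/**
 * Simulation only, not ECE or Coherence code. A fixed thread pool stands in
 * for the worker threads of one cache service.
 */
public class ReentrantPoolDemo {

    /**
     * Runs one outer task per worker. If reentrant is true, each outer task
     * submits its query to the same pool; otherwise to a separate pool.
     * Returns how many outer tasks completed before the timeout.
     */
    public static int run(int workers, boolean reentrant, long timeoutMs) {
        ExecutorService service = Executors.newFixedThreadPool(workers);
        ExecutorService queryPool = reentrant ? service : Executors.newFixedThreadPool(workers);
        // Every worker picks up an outer task before any query is issued,
        // modelling a burst of deletions that keeps all workers busy at once.
        CountDownLatch allBusy = new CountDownLatch(workers);
        List<Future<String>> outer = new ArrayList<>();
        for (int i = 0; i < workers; i++) {
            outer.add(service.submit(() -> {
                allBusy.countDown();
                allBusy.await();
                // The outer task keeps its worker thread while it waits for the query.
                Future<String> inner = queryPool.submit(() -> "query result");
                return inner.get();
            }));
        }
        int completed = 0;
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        try {
            for (Future<String> f : outer) {
                long remaining = Math.max(deadline - System.nanoTime(), 0L);
                try {
                    f.get(remaining, TimeUnit.NANOSECONDS);
                    completed++;
                } catch (TimeoutException | ExecutionException e) {
                    // Not finished in time: in the reentrant case every worker is
                    // blocked on a query that no free worker can execute.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Interrupt the stuck workers so the JVM can exit.
            service.shutdownNow();
            queryPool.shutdownNow();
        }
        return completed;
    }

    public static void main(String[] args) {
        int workers = 4;
        System.out.println("query on same pool:     " + run(workers, true, 500) + "/" + workers + " completed");
        System.out.println("query on separate pool: " + run(workers, false, 500) + "/" + workers + " completed");
    }
}
```

The separate-pool variant only illustrates the general rule of not blocking a service's worker thread on a call back into the same service; it is not necessarily the solution Oracle recommends for this issue.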

Cause




