Coherence Node Hangs Due to Task Backlog
(Doc ID 2431483.1)
Last updated on OCTOBER 02, 2024
Applies to:
Oracle Coherence - Version 3.7.1.12 and later
Information in this document applies to any platform.
Symptoms
The Coherence cluster has 12 storage nodes residing on 2 bare-metal servers (6 storage nodes per server).
WLS clients interact with the cache by joining the cluster with the localstorage=false option.
Problem Description:
1. One of the storage nodes starts accumulating a huge task backlog (~24k) for the DistributedCache service.
2. The whole Coherence cluster becomes nearly unresponsive.
3. CPU usage drops from ~80% to 20% or lower.
4. Nothing is written to the logs around the time the issue appears.
5. To bring the cluster back to normal, it is sometimes enough to kill the affected storage node; at other times the whole cluster must be restarted.
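Because nothing is logged when the backlog builds up, one way to spot the affected storage node is to poll the TaskBacklog attribute that Coherence exposes on its Service MBeans. The following is a minimal sketch, not part of the original note. It assumes Coherence JMX management is enabled so that the connected MBean server holds the Service MBeans of all cluster members. The class name TaskBacklogCheck and the threshold of 1000 are hypothetical and chosen only for illustration.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

// Hypothetical helper: lists cluster members whose service task backlog
// exceeds a threshold, using the Coherence Service MBean attribute TaskBacklog.
public class TaskBacklogCheck {

    // Returns nodeId -> TaskBacklog for every member of the given service
    // whose backlog is strictly greater than the threshold.
    public static Map<String, Integer> findBackloggedNodes(
            MBeanServerConnection conn, String serviceName, int threshold) {
        Map<String, Integer> result = new TreeMap<>();
        try {
            // One Service MBean is registered per service per cluster member.
            ObjectName pattern = new ObjectName(
                    "Coherence:type=Service,name=" + serviceName + ",nodeId=*");
            for (ObjectName name : conn.queryNames(pattern, null)) {
                Object value = conn.getAttribute(name, "TaskBacklog");
                int backlog = ((Number) value).intValue();
                if (backlog > threshold) {
                    result.put(name.getKeyProperty("nodeId"), backlog);
                }
            }
        } catch (JMException | IOException e) {
            throw new IllegalStateException("Could not read TaskBacklog", e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumption: this JVM is a cluster member with management enabled;
        // otherwise no Coherence MBeans are registered and the map is empty.
        MBeanServerConnection conn = ManagementFactory.getPlatformMBeanServer();
        Map<String, Integer> backlogged =
                findBackloggedNodes(conn, "DistributedCache", 1000);
        for (Map.Entry<String, Integer> entry : backlogged.entrySet()) {
            System.out.println("nodeId=" + entry.getKey()
                    + " TaskBacklog=" + entry.getValue());
        }
    }
}
```

Run periodically, a check like this could flag the node with the growing backlog before the cluster becomes unresponsive. For a remote cluster, the same method accepts an MBeanServerConnection obtained through a JMX connector.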
Changes
Cause