Coherence Node Hangs Due to Task Backlog
(Doc ID 2431483.1)
Last updated on AUGUST 03, 2018
Applies to: Oracle Coherence - Version 18.104.22.168 and later
Information in this document applies to any platform.
The Coherence cluster has 12 storage nodes residing on 2 bare-metal servers (6 storage nodes per server).
WLS clients interact with the cache by joining the cluster as storage-disabled members (localstorage=false option).
1. One of the storage nodes starts accumulating a huge task backlog (~24k tasks) for the DistributedCache service.
2. The whole Coherence cluster becomes nearly unresponsive.
3. CPU usage drops from ~80% to 20% or lower.
4. No events are written to the logs around the time the issue appears.
5. To bring the cluster back to normal, it is sometimes enough to kill the affected storage node. At other times the whole cluster needs to be restarted.
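
Because nothing is logged when the backlog builds up, polling the backlog over JMX is one way to spot the affected node early. The following is a minimal sketch, not a supported diagnostic. It assumes Coherence JMX management is enabled on the JVM being queried, that service MBeans are registered as Coherence:type=Service,name=<service>,nodeId=<id>, and that they expose an integer TaskBacklog attribute. The class name TaskBacklogCheck and the threshold of 1000 are hypothetical and chosen for illustration only.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

public class TaskBacklogCheck {
    // Hypothetical alert threshold; tune it to the normal backlog of the cluster.
    static final int BACKLOG_THRESHOLD = 1000;

    // Reads TaskBacklog for every registered instance of the named service.
    // Assumes MBean names of the form Coherence:type=Service,name=<service>,nodeId=<id>.
    static Map<String, Integer> readBacklogs(MBeanServerConnection server,
                                             String serviceName) throws Exception {
        ObjectName pattern =
            new ObjectName("Coherence:type=Service,name=" + serviceName + ",*");
        Map<String, Integer> backlogs = new TreeMap<>();
        for (ObjectName name : server.queryNames(pattern, null)) {
            Number value = (Number) server.getAttribute(name, "TaskBacklog");
            String nodeId = name.getKeyProperty("nodeId");
            // Fall back to the full MBean name if no nodeId key is present.
            backlogs.put(nodeId != null ? nodeId : name.getCanonicalName(),
                         value.intValue());
        }
        return backlogs;
    }

    // Returns the keys (node ids) whose backlog exceeds the threshold.
    static List<String> overThreshold(Map<String, Integer> backlogs, int threshold) {
        List<String> nodes = new ArrayList<>();
        for (Map.Entry<String, Integer> e : backlogs.entrySet()) {
            if (e.getValue() > threshold) {
                nodes.add(e.getKey());
            }
        }
        return nodes;
    }

    public static void main(String[] args) throws Exception {
        // Queries the local platform MBean server; the result is empty
        // when no Coherence MBeans are registered in this JVM.
        Map<String, Integer> backlogs = readBacklogs(
            ManagementFactory.getPlatformMBeanServer(), "DistributedCache");
        System.out.println("Task backlog per node: " + backlogs);
        System.out.println("Nodes over threshold: "
            + overThreshold(backlogs, BACKLOG_THRESHOLD));
    }
}
```

Run inside a JVM that hosts the Coherence MBeans (or adapted to connect to it remotely), a check like this would flag a node with a backlog in the ~24k range long before the cluster becomes unresponsive.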