Federation Cluster Persistence Enabled Couldn't Recover After Restart Throws JournalRecordGCDaemon OOME
(Doc ID 2312216.1)
Last updated on OCTOBER 20, 2017
Applies to: Oracle Coherence - Version 22.214.171.124.0 and later
Information in this document applies to any platform.
A customer running a 126.96.36.199.2 federation cluster with persistence enabled could not recover the cluster after a restart. The cluster went down for scheduled OS patching and, when brought back up, became stuck with the guardian timing out on multiple nodes. Restarting it a couple more times produced the same result, and the customer had to restore the data from their business data backup to bring the cluster back up. Before the restart, the cluster was operating normally with sufficient memory available on all nodes. During the restart, however, nodes failed with out-of-memory errors. Heap dump analysis shows several InvokeAllRequests, each containing a very large set of keys, consuming memory and triggering the out-of-memory failures. These requests appear to be triggered by JournalRecordGCDaemon on cluster startup.