Federation Cluster Persistence Enabled Couldn't Recover After Restart Throws JournalRecordGCDaemon OOME
Last updated on OCTOBER 20, 2017
Applies to: Oracle Coherence - Version 220.127.116.11.0 and later
Information in this document applies to any platform.
A customer's 18.104.22.168.2 federation cluster with persistence enabled could not recover after a restart. One of the customer's clusters went down for scheduled OS patching and, when brought back up, became stuck with the guardian timing out on multiple nodes. Restarting it a couple more times produced the same failure, and the customer had to restore the data from their business data backup to bring the cluster back up. Before the restart, everything was normal, with sufficient memory available on all nodes. On restart, however, nodes failed with out-of-memory errors. Heap dump analysis shows several InvokeAllRequests, each containing a very large set of keys, consuming memory and triggering the out-of-memory failures. These requests appear to be triggered by the JournalRecordGCDaemon on cluster startup.