Federation Cluster with Persistence Enabled Cannot Recover After Restart, Fails with JournalRecordGCDaemon OOME

(Doc ID 2312216.1)

Last updated on October 20, 2017

Applies to:

Oracle Coherence - Version 12.2.1.0.0 and later
Information in this document applies to any platform.

Symptoms

A customer's 12.2.1.0.2 federation cluster with persistence enabled could not recover after a restart. The cluster went down for scheduled OS patching and, when brought back up, became stuck with the guardian timing out on multiple nodes. Restarting it a couple of times produced the same failure, and the customer had to restore the data from their business data backup to bring the cluster back up.

Before the restart, the cluster was operating normally, with sufficient memory available on all nodes. During the restart, however, nodes failed with OutOfMemoryError. Analysis of a heap dump shows several InvokeAllRequest objects, each containing a very large set of keys, consuming the heap and triggering the out-of-memory failures. These requests appear to be triggered by the JournalRecordGCDaemon at cluster startup.

Changes

 

Cause
