
Coherence Cache Is Not Withstanding Partial Loss Of Storage (Doc ID 2262303.1)

Last updated on APRIL 27, 2020

Applies to:

Oracle Coherence - Version 12.2.1.1.0 and later
Information in this document applies to any platform.

Symptoms

Coherence 12.2.1.1.0

There are 6 WebLogic managed servers on 3 machines, 2 on each machine. Each node has a 4 GB heap.

75 million cache entries, with 300 GB of primary data.

Each machine has an SSD for the overflow cache.

When one of the machines is shut down, all cache data across the 6 nodes is lost. This happens when each SSD holds a large amount of overflow data (200 GB).

The same test succeeds when the amount of overflow data on the SSDs is low.
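The figures above are internally consistent, and a quick calculation shows the scale involved. This is a minimal back-of-envelope sketch, assuming data is spread evenly across machines and taking BackupCount=1 from the guardian log; the class and method names are illustrative only and are not part of any Coherence API. With one backup copy, 300 GB of primary data becomes about 600 GB stored, roughly 200 GB per machine, which is in the same range as the overflow data reported per SSD. Once one machine is down, the two remaining machines would each need to hold about 300 GB to keep one backup of everything, compared with only 24 GB of combined heap across all six members.

```java
/**
 * Illustrative capacity arithmetic for the topology described above.
 * Assumes an even spread of data across machines; real partition
 * placement is only approximately even.
 */
public class OverflowCapacityEstimate {

    // Every entry is stored once as primary plus backupCount backup copies.
    static double totalStoredGb(double primaryGb, int backupCount) {
        return primaryGb * (1 + backupCount);
    }

    // Even per-machine share of the stored data.
    static double perMachineGb(double totalGb, int machines) {
        return totalGb / machines;
    }

    // Average entry size in bytes, using decimal units (1 GB = 1e9 bytes).
    static double avgEntryBytes(double primaryGb, long entries) {
        return primaryGb * 1e9 / entries;
    }

    public static void main(String[] args) {
        double primaryGb = 300.0;        // primary data, from the article
        long entries = 75_000_000L;      // cache entries, from the article
        int backupCount = 1;             // BackupCount=1, from the guardian log
        int machines = 3;                // 3 machines ...
        int membersPerMachine = 2;       // ... with 2 managed servers each
        double heapGbPerMember = 4.0;    // 4 GB heap per node

        double total = totalStoredGb(primaryGb, backupCount);
        System.out.printf("Stored incl. backups:    %.0f GB%n", total);
        System.out.printf("Per machine, 3 machines: %.0f GB%n", perMachineGb(total, machines));
        System.out.printf("Per machine, 2 machines: %.0f GB%n", perMachineGb(total, machines - 1));
        System.out.printf("Combined heap:           %.0f GB%n",
                heapGbPerMember * machines * membersPerMachine);
        System.out.printf("Average entry size:      %.0f bytes%n", avgEntryBytes(primaryGb, entries));
    }
}
```

The average entry size works out to about 4 KB, so the 200 GB per SSD corresponds to tens of millions of individual entries that must be moved or re-backed-up when a machine leaves.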

The following guardian soft timeout is logged for the DistributedCache service thread, followed by a full thread dump:

<Feb 3, 2017, 4:23:48,949 PM EST> <Error> <com.oracle.coherence> <BEA-000000> <2017-02-03 16:23:48.949/84378.437 Oracle Coherence GE 12.2.1.1.0 <Error> (thread=Cluster, member=6): Detected soft timeout of {WrapperGuardable Guard{Daemon=DistributedCache:coh-rest:AppPartitionedCache} Service=PartitionedCache{Name=coh-rest:AppPartitionedCache, State=(SERVICE_STARTED), LocalStorage=enabled, PartitionCount=16381, BackupCount=1, AssignedPartitions=3688, BackupPartitions=1774, CoordinatorId=3}}>
<Feb 3, 2017, 4:23:51,709 PM EST> <Error> <com.oracle.coherence> <BEA-000000> <2017-02-03 16:23:51.709/84381.196 Oracle Coherence GE 12.2.1.1.0 <Error> (thread=Recovery Thread, member=6): Full Thread Dump:

"DistributedCache:coh-rest:AppPartitionedCache" id=99 State:RUNNABLE
at sun.misc.Unsafe.unpark(Native Method)
at java.util.concurrent.locks.LockSupport.unpark(LockSupport.java:141)
at com.oracle.common.base.SingleWaiterMultiNotifier.signal(SingleWaiterMultiNotifier.java:85)
at com.tangosol.io.journal.FlashJournalRM$PreparerDaemon.notifyJournalFileChanged(FlashJournalRM.java:2528)
at com.tangosol.io.journal.FlashJournalRM$PreparerDaemon.notifyItemQueued(FlashJournalRM.java:2501)
at com.tangosol.io.journal.FlashJournalRM$JournalFile.enqueue(FlashJournalRM.java:1556)
at com.tangosol.io.journal.AbstractJournalRM$JournalImpl.write(AbstractJournalRM.java:1518)
at com.tangosol.io.journal.RamJournalRM$JournalImpl.writeOverflow(RamJournalRM.java:852)
at com.tangosol.io.journal.AbstractJournalRM$JournalImpl.write(AbstractJournalRM.java:1514)
at com.tangosol.io.journal.JournalBinaryStore.store(JournalBinaryStore.java:107)
at com.tangosol.net.cache.CompactSerializationCache.put(CompactSerializationCache.java:450)
at com.tangosol.net.cache.CompactSerializationCache.put(CompactSerializationCache.java:412)
at com.tangosol.util.AbstractKeyBasedMap.putAll(AbstractKeyBasedMap.java:189)
at com.tangosol.net.partition.PartitionSplittingBackingMap.putAllInternal(PartitionSplittingBackingMap.java:434)
at com.tangosol.net.partition.ObservableSplittingBackingCache$CapacityAwareMap.putAllInternal(ObservableSplittingBackingCache.java:1052)
at com.tangosol.net.partition.PartitionSplittingBackingMap.putAll(PartitionSplittingBackingMap.java:169)
at com.tangosol.util.WrapperObservableMap.putAll(WrapperObservableMap.java:185)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.putPrimaryResource(PartitionedCache.CDB:66)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.putPrimaryResource(PartitionedCache.CDB:1)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.insertPrimaryData(PartitionedCache.CDB:62)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.moveResourcesToPrimary(PartitionedCache.CDB:46)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache$Storage.movePartition(PartitionedCache.CDB:39)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.movePartition(PartitionedCache.CDB:17)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.restoreOrphans(PartitionedService.CDB:82)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onOwnershipRequest(PartitionedService.CDB:78)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService$OwnershipRequest.onReceived(PartitionedService.CDB:3)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onMessage(Grid.CDB:38)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid.onNotify(Grid.CDB:23)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.PartitionedService.onNotify(PartitionedService.CDB:3)
at com.tangosol.coherence.component.util.daemon.queueProcessor.service.grid.partitionedService.PartitionedCache.onNotify(PartitionedCache.CDB:3)
at com.tangosol.coherence.component.util.Daemon.run(Daemon.CDB:54)
at java.lang.Thread.run(Thread.java:745)
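The service details in the soft-timeout message can be checked the same way. This sketch, with illustrative names only, compares member 6's AssignedPartitions=3688 and BackupPartitions=1774 with an even split of PartitionCount=16381: about 2730 partitions per member with six storage members and about 4095 with four. Reading member 6 as partway through promoting backup partitions to primary, in line with restoreOrphans and movePartition in the stack above, is an inference from these counts and is not stated in the article.

```java
/**
 * Compares one member's partition counts from the soft-timeout log line
 * with an even split across storage members. Illustrative only.
 */
public class PartitionShare {

    // Partitions per member if ownership were spread perfectly evenly.
    static double evenShare(int partitionCount, int members) {
        return (double) partitionCount / members;
    }

    public static void main(String[] args) {
        int partitionCount = 16381;  // PartitionCount from the log
        int assigned = 3688;         // AssignedPartitions (primary) on member 6
        int backups = 1774;          // BackupPartitions on member 6

        // 6 members: all servers up; 4 members: one machine (2 servers) down.
        System.out.printf("Even share, 6 members: %.0f%n", evenShare(partitionCount, 6));
        System.out.printf("Even share, 4 members: %.0f%n", evenShare(partitionCount, 4));
        System.out.printf("Member 6 primary:      %d%n", assigned);
        System.out.printf("Member 6 backup:       %d%n", backups);
    }
}
```

Member 6's primary count sits between the two even shares while its backup count is below the six-member share. The stack also shows each moved entry being written through RamJournalRM.writeOverflow into FlashJournalRM, so this restore work runs on the service thread against the SSD-backed journal.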


Cause
