Sample Questions to Gather Preliminary Information for Investigating a Potential Data Loss Situation with the Coherence Product (Doc ID 2883437.1)

Last updated on JULY 25, 2022

Applies to:

Oracle Coherence - Version and later
Information in this document applies to any platform.


To investigate further and arrive at a well-founded explanation, the following questions need to be answered:

1) What is the cluster topology in this environment, compared with the other environment dealt with in the previous SR? (How many nodes are there? How many JVM members, and how are they distributed across the nodes?)

2) The answer to question 1 will indicate whether the Restore and Recovery Quorums are sufficient for the environment covered by this SR.

3) Do you also have a Cluster quorum configured, in addition to the Restore and Recovery Quorums?

4) How are you determining that data loss occurred? (What mechanism is used to confirm it: a CohQL client or some other means?)

5) What StatusHA value was reported for the different members on the JMX console at the time of the reported data loss?

6) Did your team observe any orphaned partitions being reported at the time of the incident?

7) What backup partition count, if any, is explicitly configured in your environment?

8) Do you have information that substantiates why node eviction(s) are happening in your environment?

9) After the restart due to OS patching, which partitions were subject to recovery? Were the persisted partitions stored on a centrally accessible NAS, and what was the status of that NAS both at the time of the incident and at recovery time?

10) Do you have an explicit process that governs the lifecycle of persisted data?

11) Do you have a documented record (snapshot) of the last known good state that shows the mapping of partition ownership to member identity?

12) What were the Coherence cache size and the size of the data in the DB before, during, and after the reported incident?

13) What was the status of the DB at the time of the incident? (Was it running and accepting new data while the Coherence incident occurred?)
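
The answers to questions 5 and 6 are easier to compare across members when they are recorded in a uniform shape. The following is a minimal Python sketch, not an Oracle-supplied tool, that ranks StatusHA values from weakest to strongest and lists the observations that fall below an expected level. The StatusHA names are the values Coherence reports for a partitioned service; the record layout, the function name below_expected and the sample service name are hypothetical and are used only for illustration.

```python
# StatusHA values reported for a Coherence partitioned service,
# ordered from weakest to strongest.
STATUS_HA_ORDER = ["ENDANGERED", "NODE-SAFE", "MACHINE-SAFE", "RACK-SAFE", "SITE-SAFE"]


def below_expected(observations, expected="MACHINE-SAFE"):
    """Return the observations whose StatusHA is weaker than `expected`.

    `observations` is a hypothetical list of dicts with the keys
    'service', 'node_id' and 'status_ha', for example copied by hand
    from the JMX console at the time of the incident.
    """
    threshold = STATUS_HA_ORDER.index(expected)
    flagged = []
    for obs in observations:
        status = obs["status_ha"].upper()
        if status not in STATUS_HA_ORDER:
            # Unrecognised value (for example "n/a"): report it rather than guess.
            flagged.append(obs)
        elif STATUS_HA_ORDER.index(status) < threshold:
            flagged.append(obs)
    return flagged


if __name__ == "__main__":
    # Hypothetical sample data, not taken from any real cluster.
    sample = [
        {"service": "ExampleCacheService", "node_id": 1, "status_ha": "MACHINE-SAFE"},
        {"service": "ExampleCacheService", "node_id": 2, "status_ha": "ENDANGERED"},
    ]
    for obs in below_expected(sample):
        print(obs["service"], obs["node_id"], obs["status_ha"])
```

The expected level is a parameter because the strongest StatusHA an environment can reach depends on the topology described in the answer to question 1; a cluster whose members all run on one machine cannot report MACHINE-SAFE.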


