NullPointerException - Missing Partition (Doc ID 2295572.1)

Last updated on AUGUST 25, 2017

Applies to:

Oracle Coherence - Version 12.1.3.0.5 and later
Information in this document applies to any platform.

Symptoms

On Oracle Coherence version 12.1.3.0.5:

The customer has five machines, stgb-cache1 through stgb-cache5. Each machine
runs around 25 JVMs, both storage-enabled and storage-disabled.

This is a continuation of Bug 24522765. The customer is using the solution
suggested in that bug (setting -Dtangosol.coherence.distributed.strict=false
on the Coherence JVMs) but still sees the issue ("Partition xxx does not
exist" at PartitionSplittingBackingMap). This time the service terminates
with a NullPointerException (NPE); in Bug 24522765 it terminated with an
IllegalStateException.

Brief sequence of events:
- A rolling upgrade is underway
- For every application upgrade, the customer starts a brand-new node with
the new code, waits for the service to reach MACHINE_SAFE, and then shuts
down the old node
- Member 27 requests partition 402 from member 153
- Member 153 says that partition 402 does not exist, throws a
NullPointerException and leaves the service
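
The "wait for MACHINE_SAFE" step in the sequence above can be sketched as a
small polling gate. This is a minimal illustration, not the customer's actual
tooling: the class name StatusHaGate and the statusSource supplier are
hypothetical, and the sketch assumes the status string comes from the
service's StatusHA attribute (for example, read over JMX), with values ordered
ENDANGERED, NODE-SAFE, MACHINE-SAFE, RACK-SAFE, SITE-SAFE.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Supplier;

// Hypothetical helper: decides when the old node may be shut down.
final class StatusHaGate {
    // Assumed ordering of StatusHA values, from least to most safe.
    private static final List<String> LEVELS = Arrays.asList(
            "ENDANGERED", "NODE-SAFE", "MACHINE-SAFE", "RACK-SAFE", "SITE-SAFE");

    // True if the reported status is MACHINE-SAFE or a stronger level.
    // Unknown values (including null or "N/A") are treated as not safe.
    static boolean isAtLeastMachineSafe(String statusHa) {
        if (statusHa == null) {
            return false;
        }
        // Accept both MACHINE_SAFE and MACHINE-SAFE spellings.
        String normalized = statusHa.trim().toUpperCase(Locale.ROOT).replace('_', '-');
        return LEVELS.indexOf(normalized) >= LEVELS.indexOf("MACHINE-SAFE");
    }

    // Polls statusSource up to maxPolls times, sleeping between attempts.
    // Returns true as soon as the status is safe, false if it never is
    // (or if the waiting thread is interrupted).
    static boolean awaitMachineSafe(Supplier<String> statusSource,
                                    int maxPolls, long sleepMillis) {
        for (int i = 0; i < maxPolls; i++) {
            if (isAtLeastMachineSafe(statusSource.get())) {
                return true;
            }
            if (i < maxPolls - 1) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }
}
```

In an upgrade script, the old node would be stopped only after
awaitMachineSafe returns true. Note that in the scenario described here the
NPE occurred even though this wait was performed, so the gate illustrates the
procedure rather than a fix.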

Member 153 had joined the cluster only about 5 minutes before the NPE.

Why is member 153 leaving the cluster with an NPE?

The issue occurs about one time in five.
 



Cause
