
Client Joining Multiple WKA Clusters Can Cause Them To Combine Or Panic Due To Socket Address Re-use (Doc ID 1275924.1)

Last updated on FEBRUARY 09, 2024

Applies to:

Oracle Coherence - Version 3.6.0 to 3.6.0.4 [Release AS10g]
Information in this document applies to any platform.

Symptoms

This problem can occur under the following conditions:

  1. There are two independent clusters, each configured to use its own well-known address (WKA) list. The lists are necessarily unique to each cluster, because a cluster node cannot be a member of more than one cluster at a time. In the example used in this article:

    • cluster1, configured with the following system properties:

      tangosol.coherence.localhost=127.0.0.1
      tangosol.coherence.wka=127.0.0.1
      tangosol.coherence.localport=8000
      tangosol.coherence.wka.port=8000

    • cluster2, configured with the following system properties:

      tangosol.coherence.localhost=10.xxx.xxx.xx
      tangosol.coherence.wka=10.xxx.xxx.xx
      tangosol.coherence.localport=8000
      tangosol.coherence.wka.port=8000

    In this simple example each cluster consists of a single node.

  2. The two clusters use the same cluster name, for example, tangosol.coherence.cluster=testcluster.

  3. A client connects to one of the clusters, then disconnects and attempts to join the second cluster.
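The combination of settings above can be made concrete with the following sketch. It is a minimal, hypothetical Java example: the class `WkaConfigCheck` and its methods are illustrative only and are not part of Coherence.

The sketch stores the system properties listed above, together with the shared `tangosol.coherence.cluster` name, in `java.util.Properties` objects. It then flags the combination described in this article:

* different WKA addresses;
* the same cluster name;
* as in the listed properties, the same port.

The visible text does not say whether the shared port matters on its own. It is included only because both example clusters use 8000. The masked address `10.xxx.xxx.xx` is kept as a literal string. In a real deployment these values would be passed to each JVM as `-D` options rather than built in code.

```java
import java.util.Objects;
import java.util.Properties;

/**
 * Hypothetical helper, not part of Coherence: models the two single-node
 * WKA cluster configurations from this article and checks whether they
 * match the combination of settings under which the problem was reported.
 */
public class WkaConfigCheck {

    // Builds the property set for one single-node WKA cluster, mirroring the
    // tangosol.coherence.* system properties listed in the article.
    public static Properties wkaCluster(String address, String port, String clusterName) {
        Properties p = new Properties();
        p.setProperty("tangosol.coherence.localhost", address);
        p.setProperty("tangosol.coherence.wka", address);
        p.setProperty("tangosol.coherence.localport", port);
        p.setProperty("tangosol.coherence.wka.port", port);
        p.setProperty("tangosol.coherence.cluster", clusterName);
        return p;
    }

    // True when the WKA addresses differ (the clusters are meant to be
    // independent) but the cluster name and the local port are identical.
    public static boolean matchesReportedConditions(Properties a, Properties b) {
        boolean distinctWka = !Objects.equals(
                a.getProperty("tangosol.coherence.wka"),
                b.getProperty("tangosol.coherence.wka"));
        boolean sameName = Objects.equals(
                a.getProperty("tangosol.coherence.cluster"),
                b.getProperty("tangosol.coherence.cluster"));
        boolean samePort = Objects.equals(
                a.getProperty("tangosol.coherence.localport"),
                b.getProperty("tangosol.coherence.localport"));
        return distinctWka && sameName && samePort;
    }

    public static void main(String[] args) {
        Properties cluster1 = wkaCluster("127.0.0.1", "8000", "testcluster");
        // The article masks cluster2's address; the masked form is kept as-is.
        Properties cluster2 = wkaCluster("10.xxx.xxx.xx", "8000", "testcluster");
        System.out.println(matchesReportedConditions(cluster1, cluster2)); // true

        // Giving the second cluster a different name removes condition 2.
        Properties renamed = wkaCluster("10.xxx.xxx.xx", "8000", "testcluster2");
        System.out.println(matchesReportedConditions(cluster1, renamed)); // false
    }
}
```

Running `main` prints `true` for the configuration described in this article and `false` once the cluster names differ.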

There is no problem when the client connects to the first cluster. However, after the client disconnects by calling CacheFactory.shutdown() and then connects to the second cluster, the first cluster's node reports the following errors:

The client can be seen joining the cluster, and this triggers a panic. The node mistakenly concludes that the two separate clusters are a single cluster with two senior members, which is a split-brain situation. This conclusion is incorrect, because the two clusters were completely independent until the client had connected to both of them. A side effect of this problem is that the two clusters actually merge. As can be seen above, storage is then re-partitioned between the two nodes.

Changes

This behavior was not seen in Coherence 3.5.3 with the described configuration. The problem was encountered after upgrading to Coherence 3.6.0.

Cause
