Coherence Incubator Processing Pattern Issue One Of The Coherence Nodes Shutdown Causing Task Execution To Halt. (Doc ID 2537023.1)

Last updated on OCTOBER 13, 2023

Applies to:

Oracle Coherence - Version 12.2.1.0.0 and later
Information in this document applies to any platform.

Symptoms

The customer reported the following issue with the Coherence Incubator processing/messaging pattern:

One of the Coherence nodes shut down, causing task execution to halt. Suppose two machines (Machine1 and Machine2) join and participate in a Coherence cluster; in our terminology these are called Agents. We generally use a standalone client (not storage-enabled) to submit tasks to this cluster for distributed processing, registering them in overridden Coherence config files.

When both machines (agents) are up and running and I submit a task from the client, the submission succeeds and I can see the task executing on the server side, as expected. As a disaster scenario, I brought Machine1 down and then submitted a task from the client on another machine. I expected Machine2, which is still registered and running, to pick up the task. However, the task is stuck and no stack trace is thrown.

As a first step, I tried to get additional log output by raising the logging severity to 9 in the coherence-config file, which did not help. I then debugged the Coherence processing pattern code and found that the state of the submitted task (submissionResult.getSubmissionState()) is not moving from Assigned to Executing.

In the failure scenario, the task submission state never moves to Executing, so the method below does nothing. My guess is that when one of the nodes is shut down the submission state does not advance to Executing, so the required method submissionOutcome.onStarted() is never called, which leaves the task stuck.

I printed the submission state in the failure case and confirmed that it is stuck in the Assigned state. My question is: which listener moves the submission state from Assigned to Executing? (I believe it is onMapEvent that triggers the status change.) Can that listener fail because an external JVM that is part of the cluster has been shut down?
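
The symptom described above (a submission that stays in Assigned and never reaches Executing) can be detected from the client side by putting a timeout around the state check. The following is a minimal sketch only, not the Incubator API: the State enum, the awaitStarted method, and the Supplier used as a state source are hypothetical stand-ins. In a real client the supplier would presumably wrap a call such as submissionResult.getSubmissionState(), but that wiring is an assumption and is not shown here.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

public class AssignedStateWatchdog {

    // Hypothetical stand-in for the processing pattern's submission states.
    enum State { SUBMITTED, ASSIGNED, EXECUTING, DONE }

    /**
     * Polls the supplied state until it reaches EXECUTING or DONE, or until
     * the timeout elapses. Returns true if the task started, false if it was
     * still in an earlier state (for example ASSIGNED) when the timeout hit.
     */
    static boolean awaitStarted(Supplier<State> stateSource,
                                long timeoutMillis,
                                long pollMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            State current = stateSource.get();
            if (current == State.EXECUTING || current == State.DONE) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and report "not started".
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Simulated failure case: the state source always reports ASSIGNED.
        boolean started = awaitStarted(() -> State.ASSIGNED, 200, 20);
        System.out.println(started ? "task started" : "task stuck before EXECUTING");
    }
}
```

A check like this does not explain why the state fails to advance; it only turns a silent hang into an explicit, loggable condition that can be attached to a service request.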

Changes

 

Cause
