Coherence ValidatePolls Error Causes Cluster To Become Unresponsive (Doc ID 2275648.1)

Last updated on JUNE 11, 2017

Applies to:

Oracle Coherence - Version 12.2.1.0.2 to 12.2.1.0.3 [Release 12c]
Information in this document applies to any platform.

Symptoms

A single node in the cluster logs a validatePolls error indicating a failure to communicate with all other cluster members.

Subsequently, all cluster communication appears to fail and the cluster becomes unresponsive. All cluster members must be killed and the cluster fully restarted to clear the error. The following error appears in the logs:

2017-02-17 09:36:04.804/361.960 Oracle Coherence GE 12.2.1.0.3 <Error>(thread=Transport:TransportService, member=n/a): validatePolls: This service timed-out due to unanswered handshake request. Manual intervention is required to stop the members that have not responded to this Poll
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$PollArray.validatePollsExtra(Grid.CDB:16)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$PollArray.validatePolls(Grid.CDB:63)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$PollArray.validatePollsExtra(Grid.CDB:16)
    at com.tangosol.coherence.component.util.daemon.queueProcessor.service.Grid$PollArray.validatePolls(Grid.CDB:63)
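
To confirm which member logged this symptom, the member logs can be scanned for the validatePolls message. The following is a minimal sketch in Python, assuming the log header follows the same layout as the sample line above. The function name find_validate_polls_errors and the regular expression are illustrative assumptions, not part of Coherence.

```python
import re

# Header layout assumed from the sample log line in this note:
# "<timestamp>/<uptime> Oracle Coherence <edition> <version> <Level>(thread=..., member=...): <message>"
LOG_HEADER = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)/\S+ "
    r"Oracle Coherence \S+ \S+ "
    r"<(?P<level>\w+)>\s*"
    r"\(thread=(?P<thread>[^,]+), member=(?P<member>[^)]+)\): "
    r"(?P<message>.*)$"
)


def find_validate_polls_errors(lines):
    """Return one dict per Error-level log line whose message starts with 'validatePolls:'."""
    hits = []
    for line in lines:
        match = LOG_HEADER.match(line.strip())
        if match is None:
            continue  # stack-trace frames and unrelated lines do not match the header
        if match.group("level") != "Error":
            continue
        if not match.group("message").startswith("validatePolls:"):
            continue
        hits.append(
            {
                "timestamp": match.group("timestamp"),
                "thread": match.group("thread"),
                "member": match.group("member"),
                "message": match.group("message"),
            }
        )
    return hits
```

Passing the lines of each member's log file to this function yields the timestamp and thread of every matching error, which helps identify the single node described above.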

 

Cause
