Coordinated Replicat is slow as some threads are hung

(Doc ID 2350269.1)

Last updated on JANUARY 19, 2018

Applies to:

Oracle GoldenGate - Version 12.1.2.0.0 and later
Information in this document applies to any platform.

Symptoms

Test Setup

SQL Server : SQL Server 2008 R2 Enterprise SP2
OGG 12.1.2.0.0 : OGGCORE_MAIN_PLATFORMS_130619.1400
Test : OGG 12.1.2 SQL Server - SQL Server
Setup : SQL Server SOURCE (MyDatabase) and SQL Server DEST (MyDatabase), 1-way replication (Extract - Replicat)

Test description

In a generic test with BDB enabled on the target side, we added a Coordinated Replicat with 200 threads. After the test had been running for a few minutes, we noticed that the first 65 threads worked fine and returned the correct result to GGSCI commands, but the remaining threads were hung.

GGSCI (SLC01IMS) 16> info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
JAGENT STOPPED
REPLICAT RUNNING GREP 00:00:00 00:00:06

GGSCI (SLC01IMS) 17> send grep065 status
Sending STATUS request to REPLICAT GREP065 ...
Current status: Processing data
Sequence #: 41
RBA: 23955600
3 records in current transaction

GGSCI (SLC01IMS) 18> send grep066 status
Sending STATUS request to REPLICAT GREP066 ...
ERROR: sending message to REPLICAT GREP066 (Timeout waiting for message).

GGSCI (SLC01IMS) 19> info grep066
REPLICAT GREP066 Last Started 2013-06-20 20:49 Status RUNNING
COORDINATED Replicat Thread Thread 66
Checkpoint Lag 00:00:00 (updated 02:48:33 ago)
Process ID 16668
Log Read Checkpoint File ./dirdat/rdst/ge000002
2013-06-20 20:43:37.458000 RBA 1431
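
Checking 200 threads one at a time in GGSCI is tedious. The following is a minimal sketch, assuming the thread names follow the pattern seen above (group name plus a three-digit thread number, for example GREP066). The function name is hypothetical, and the script only prints command text that could be pasted into a GGSCI session; it does not invoke GGSCI itself.

```python
def thread_status_commands(group: str, threads: int) -> list[str]:
    """Build one 'send <group><NNN> status' line per coordinated thread.

    Assumption: thread names are the group name followed by a
    zero-padded three-digit thread number, as in 'grep066'.
    """
    return [f"send {group}{n:03d} status" for n in range(1, threads + 1)]


if __name__ == "__main__":
    # Print the status command for each of the 200 threads in the test.
    for line in thread_status_commands("grep", 200):
        print(line)
```

A thread that answers with "Timeout waiting for message", as GREP066 does above, is a candidate for the hang described in this document.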

The report file GREP.RPT contained many warning messages like the following:

2013-06-20 23:43:48 WARNING OGG-06043 The coordinator has not received a
heartbeat message from thread 70.
Dumping thread state:
Coordinated thread 70:
'pstack' is not recognized as an internal or external command,
operable program or batch file.

2013-06-20 23:43:48 WARNING OGG-06043 The coordinator has not received a
heartbeat message from thread 71.
Dumping thread state:
Coordinated thread 71:
'pstack' is not recognized as an internal or external command,
operable program or batch file.
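
To see which threads the coordinator is complaining about, the report text can be scanned for OGG-06043 warnings. The following is a rough sketch, assuming the warning wording matches the excerpt above and that the message may wrap across lines; the function and constant names are hypothetical, and the sample text is copied from this document rather than read from a real report file.

```python
import re

# Matches the OGG-06043 warning even when the message wraps across lines.
HEARTBEAT_WARNING = re.compile(
    r"OGG-06043\s+The\s+coordinator\s+has\s+not\s+received\s+a\s+"
    r"heartbeat\s+message\s+from\s+thread\s+(\d+)"
)


def stalled_threads(report_text: str) -> list[int]:
    """Return the sorted, de-duplicated thread numbers named in OGG-06043 warnings."""
    return sorted({int(n) for n in HEARTBEAT_WARNING.findall(report_text)})


# Sample taken from the report excerpt in this document.
SAMPLE = """\
2013-06-20 23:43:48 WARNING OGG-06043 The coordinator has not received a
heartbeat message from thread 70.
Dumping thread state:
Coordinated thread 70:

2013-06-20 23:43:48 WARNING OGG-06043 The coordinator has not received a
heartbeat message from thread 71.
"""

if __name__ == "__main__":
    print(stalled_threads(SAMPLE))  # prints [70, 71]
```

For a real report, the file contents would be read into a string and passed to `stalled_threads` in place of `SAMPLE`.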

 

The symptom is that every Replicat thread/process from #60 through #200 encountered a TCP/IP disconnect and could not recover from it, which is why those threads cannot communicate with the coordinator.


Changes

 

Cause
