Coordinated Replicat is slow as some threads are hung
(Doc ID 2350269.1)
Last updated on JANUARY 08, 2021
Applies to:
Oracle GoldenGate - Version 12.1.2.0.0 and later
Information in this document applies to any platform.
Symptoms
Test Setup
SQL Server : SQL Server 2008 R2 Enterprise SP2
OGG 12.1.2.0.0 : OGGCORE_MAIN_PLATFORMS_130619.1400
Test : OGG 12.1.2 SQL Server - SQL Server
Setup : SQL Server SOURCE (MyDatabase) and SQL Server DEST (MyDatabase), 1-Way Replication (Extract - Replicat)
Test description
In the Generic test on the target side with BDB enabled, we added a Coordinated Replicat with 200 threads. After the test had been running for a few minutes, we noticed that the first 65 threads worked fine and returned the correct result to GGSCI commands, but the remaining threads were hung.
GGSCI (SLC01IMS) 16> info all
Program     Status      Group       Lag at Chkpt  Time Since Chkpt
MANAGER     RUNNING
JAGENT      STOPPED
REPLICAT    RUNNING     GREP        00:00:00      00:00:06
GGSCI (SLC01IMS) 17> send grep065 status
Sending STATUS request to REPLICAT GREP065 ...
Current status: Processing data
Sequence #: 41
RBA: 23955600
3 records in current transaction
GGSCI (SLC01IMS) 18> send grep066 status
Sending STATUS request to REPLICAT GREP066 ...
ERROR: sending message to REPLICAT GREP066 (Timeout waiting for message).
GGSCI (SLC01IMS) 19> info grep066
REPLICAT GREP066 Last Started 2013-06-20 20:49 Status RUNNING
COORDINATED Replicat Thread Thread 66
Checkpoint Lag 00:00:00 (updated 02:48:33 ago)
Process ID 16668
Log Read Checkpoint File ./dirdat/rdst/ge000002
2013-06-20 20:43:37.458000 RBA 1431
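With 200 threads, checking each one by hand is tedious. The following is a minimal sketch, not an Oracle-supplied tool, of how captured GGSCI output like the above could be classified per thread. It assumes the output of the `send ... status` commands has already been saved as text; the function name `classify_status_output` and the labels `responding` / `no response` are hypothetical and chosen only for illustration.

```python
import re

# Matches the request line GGSCI prints, e.g.
# "Sending STATUS request to REPLICAT GREP066 ..."
REQUEST = re.compile(r"Sending STATUS request to REPLICAT (\S+) \.\.\.")


def classify_status_output(text):
    """Map each Replicat thread name to 'responding' or 'no response'.

    Hypothetical helper: it only inspects text already captured from GGSCI
    and does not talk to GoldenGate itself.
    """
    results = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = REQUEST.search(line)
        if match:
            current = match.group(1)
            results[current] = "unknown"
        elif current and line.startswith("Current status:"):
            results[current] = "responding"
        elif current and line.startswith("ERROR:"):
            results[current] = "no response"
    return results


sample = """\
GGSCI (SLC01IMS) 17> send grep065 status
Sending STATUS request to REPLICAT GREP065 ...
Current status: Processing data
Sequence #: 41
GGSCI (SLC01IMS) 18> send grep066 status
Sending STATUS request to REPLICAT GREP066 ...
ERROR: sending message to REPLICAT GREP066 (Timeout waiting for message).
"""

print(classify_status_output(sample))
```

Run against the two responses shown above, the sketch marks GREP065 as responding and GREP066 as not responding.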
The report file GREP.RPT contained many warning messages like the following:
2013-06-20 23:43:48 WARNING OGG-06043 The coordinator has not received a heartbeat message from thread 70.
Dumping thread state:
Coordinated thread 70:
'pstack' is not recognized as an internal or external command,
operable program or batch file.
2013-06-20 23:43:48 WARNING OGG-06043 The coordinator has not received a heartbeat message from thread 71.
Dumping thread state:
Coordinated thread 71:
'pstack' is not recognized as an internal or external command,
operable program or batch file.
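To see at a glance which threads the coordinator has lost contact with, the OGG-06043 warnings can be collected from the report file. The sketch below is one possible way to do that, assuming the report text has been read into a string and that the warning wording matches the excerpt above; the function name `threads_missing_heartbeat` is hypothetical.

```python
import re

# Matches the OGG-06043 warning text shown in the report excerpt.
# \s+ tolerates the message being wrapped across lines.
HEARTBEAT = re.compile(
    r"OGG-06043\s+The coordinator has not received a\s+"
    r"heartbeat message from thread\s+(\d+)"
)


def threads_missing_heartbeat(report_text):
    """Return the sorted, de-duplicated thread numbers named in OGG-06043 warnings."""
    return sorted({int(number) for number in HEARTBEAT.findall(report_text)})


report_excerpt = """\
2013-06-20 23:43:48 WARNING OGG-06043 The coordinator has not received a
heartbeat message from thread 70.
Dumping thread state:
Coordinated thread 70:
2013-06-20 23:43:48 WARNING OGG-06043 The coordinator has not received a
heartbeat message from thread 71.
Dumping thread state:
Coordinated thread 71:
"""

print(threads_missing_heartbeat(report_excerpt))
```

For the excerpt above, the sketch reports threads 70 and 71. In practice the text would be read from the report file (GREP.RPT in this test) before being passed to the function.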
The symptom is that every Replicat thread/process from #60 through #200 encountered a TCP/IP disconnect from which it could not recover, and so could no longer communicate with the coordinator.
Cause