Main Note - Oracle GoldenGate - Lag, Performance, Slow and Hung Processes
(Doc ID 1304557.1)
Last updated on DECEMBER 14, 2020
Oracle GoldenGate - Version 5.0.0 and later
Information in this document applies to any platform.
This note collects and categorizes notes and papers on identifying, quantifying, and remediating issues associated with perceived performance problems. This includes issues concerning processes described as slow, hung, having excessive lag, or having performance issues.
LAG is the elapsed time between when a transaction is written to a storage medium such as an archive log and the time when Replicat writes the same transaction to the target database. Generally speaking, all rows within a transaction will have the same lag. Exceptions can occur when a transaction is broken up and applied by multiple replicats or as multiple transactions. Parameters such as RANGE are responsible for the first case. MAXTRANSOPS is responsible for the second case.
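The definition above can be illustrated with a minimal sketch. It is not OGG code: the function names, timestamps, and chunk size are hypothetical, and the chunking only loosely mimics how a MAXTRANSOPS-style limit breaks one source transaction into several target transactions, so that rows from the same source transaction end up with different lags.

```python
from datetime import datetime, timedelta


def lag(logged_at: datetime, applied_at: datetime) -> timedelta:
    """Elapsed time between the source log write and the target apply."""
    return applied_at - logged_at


def split_into_chunks(ops, max_ops):
    """Split one transaction's operations into chunks of at most max_ops.

    Hypothetical stand-in for a MAXTRANSOPS-style limit; not the real algorithm.
    """
    return [ops[i:i + max_ops] for i in range(0, len(ops), max_ops)]


# Hypothetical time at which the transaction was written to the archive log.
logged_at = datetime(2020, 12, 14, 10, 0, 0)

# Applied as one transaction: every row shares the same lag.
whole_lag = lag(logged_at, datetime(2020, 12, 14, 10, 0, 5))

# Applied as several smaller transactions: each chunk commits later than
# the previous one, so rows from the same source transaction differ in lag.
chunks = split_into_chunks(list(range(10)), 4)
apply_times = [logged_at + timedelta(seconds=5 + 2 * i) for i in range(len(chunks))]
chunk_lags = [lag(logged_at, t) for t in apply_times]

print(whole_lag)
print([str(d) for d in chunk_lags])
```

In this sketch the unsplit transaction has a single lag value, while the split transaction produces one lag value per chunk.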
Lag is introduced by:
- Extract in reading the archive log and writing the data to a trail (or remote host),
- (optional) datapump reading the extract trail and writing (normally) to a remote host,
- collector (server.exe) on the target receiving network data and writing it to a local trail,
- replicat reading the local trail and writing to the database,
- database itself.
The status and info commands from ggsci show lag, as does the send <object>, lag command.
- The sum of these reported lags is the time OGG considers as LAG.
- Lag from an info command may differ from lag from a send command by small amounts.
- Lag from an info command is returned by the manager based on the last recorded checkpoint.
- Lag from a send command is returned by the <object> based on the row timestamp that the <object> is currently processing.
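As a rough illustration of summing the reported lags, the sketch below adds per-process lag values given as HH:MM:SS strings. The process names and values are made up for the example, and the string format is an assumption about how the lag figures have been recorded; adapt the parsing to whatever your monitoring actually captures.

```python
from datetime import timedelta


def parse_hms(text: str) -> timedelta:
    """Convert an 'HH:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def total_lag(stage_lags: dict) -> timedelta:
    """Sum the lag reported by each process in the replication path."""
    return sum((parse_hms(value) for value in stage_lags.values()), timedelta())


# Hypothetical lag values noted for each process (not real output).
reported = {"extract": "00:00:04", "pump": "00:00:02", "replicat": "00:01:10"}

end_to_end = total_lag(reported)
print(end_to_end)  # 0:01:16
```

Because info and send can return slightly different values for the same process, take all figures from the same command when comparing totals over time.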
Lag is expressed both in time units and in kilobytes of data that the <object> has yet to process. The two generally correlate, but one may be zero while the other is a small non-zero amount.
The lag is a measure of the difference between row archiving or writing and the time an <object> examines that row's timestamp. It is not an indication of how long it will take an <object> to 'catch up'.
Millions of bytes and hours of lag can be caught up in minutes or less.
Because lag is based on checkpoints, Long Running Transactions (LRTs) can distort it. A transaction may be started at a given time and still not be committed hours later. This transaction will be the oldest outstanding data and will not checkpoint without a commit. This can cause an <object> processing an LRT to appear to be hours behind when the real issue is that the LRT has yet to commit. In a replicat, this issue can be diminished (with transaction integrity consequences) by adding MAXTRANSOPS to the replicat parameter file.
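The effect of an LRT on checkpoint-based lag can be shown with a simplified model. This is an assumption-laden sketch, not OGG's actual checkpoint logic: it treats the checkpoint position as the earlier of the oldest uncommitted transaction's start time and the timestamp of the last processed record. All names and times are hypothetical.

```python
from datetime import datetime, timedelta


def checkpoint_lag(now, open_txn_starts, last_processed):
    """Lag measured against the oldest outstanding data.

    Simplified model: an uncommitted transaction holds the checkpoint at its
    start time; with no open transactions, the last processed record is used.
    """
    oldest_open = min(open_txn_starts, default=last_processed)
    return now - min(oldest_open, last_processed)


now = datetime(2020, 12, 14, 14, 0, 0)
last_processed = datetime(2020, 12, 14, 13, 59, 58)

# No open transactions: the process is two seconds behind.
normal = checkpoint_lag(now, [], last_processed)

# One transaction opened at 10:00 and still uncommitted: the process appears
# four hours behind even though it is reading current data.
with_lrt = checkpoint_lag(now, [datetime(2020, 12, 14, 10, 0, 0)], last_processed)

print(normal, with_lrt)
```

In this model the process is equally current in both cases; only the uncommitted transaction changes the lag figure.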
<Note 1273285.1> How To Troubleshoot Oracle Redo Log Reading Extract Slow Performance Issue
<Note 969639.1> What Information Does Support Need to Diagnose a Replicat Performance Issue on Open Systems
<Note 962592.1> Excessive REPLICAT LAG Times
<Note 968710.1> Using A Heartbeat Table To Monitor LAG Between The Source And Target.
<Note 1071892.1> Excessive LAG on Data Pump Sending Data Over WAN
<Note 968614.1> Why Does GoldenGate Report The Lag Is Unknown?
<Note 964705.1> Extract RBA Not Moving And LAG Increasing And Appear Hung
<Note 1299679.1> OGG - Heartbeat process to monitor lag and performance in GoldenGate
<Note 1478958.1> Extract On Restart While In Recovery Runs Slower Using Lots Of Virtual Memory / Disk File Caching
<Note 1493225.1> Extract Is Very Slow In Processing the records
<Note 1518793.1> Classic Extract runs slower when there are lots of row chaining
<Note 1501813.1> Extract (126.96.36.199.2 or 188.8.131.52.3) is Slow In Restarting
<Note 1564093.1> GoldenGate Extract Slow when executing on RAC
<Note 1490270.1> Why is my VAM or other extract slow or hung or not checkpointing?
<Note 1487557.1> Integrated Extract runs slower upon restart during Recovery and status shows "In recovery: At E
<Note 1256884.1> Catalog Data Extract - Items Still Slow After Upgrading to Latest Code Version
<Note 1337637.1> OGG Extract in Version 184.108.40.206 is Slow Until Restart
<Note 1363266.1> OGG How to address Extract Performance Issue When Reading Archive Logs stored in Oracle ASM
<Note 1356524.1> How to estimate Goldengate extract redo processing speed?
<Note 1337637.1> OGG Extract in Version 220.127.116.11 is Slow Until Restart
<Note 1329640.1> Why replicat (or pump) hangs after upstream pump (or extract) had ETROLLOVER?
<Note 1432994.1> OGG Extract Hangs on AIX When Reading Through DBLOGREADER
<Note 1192972.1> ALO Extract Hangs and on restart an error 22 (Invalid argument) occurs
<Note 1555982.1> OGG Extract Hangs on ASM archived logs with sector_size=4k
<Note 1381055.1> GoldenGate NSK Extract Hangs in Starting State And is Not Moving
<Note 1343063.1> OGG Extract is Hung Stuck on Archive Log Extseqno 0 Extrba 0 and can't be Al
<Note 1528401.1> Classic Extract - Extract Hang Using 100% CPU
<Note 1358904.1> Extract Is Running But Stuck At Same RBA And Send Commands Gives Timeout Waiting For Message
<Note 1362390.1> Goldengate Extract Stuck On 'Bad' redo log, But Chekcpoint Got Update
<Note 1467942.1> Unable to start additional GoldenGate processes after several already started. Stuck in "Starting