Oracle GoldenGate - Using the GoldenGate Logdump Utility to Manually Load Balance across Multiple Processes (Doc ID 1301300.1)

Last updated on JANUARY 13, 2017

Applies to:

Oracle GoldenGate - Version 10.4.0.0 and later
Information in this document applies to any platform.

Goal

Replicating a high volume of change data in a single data stream (one extract, one pump, one replicat, for example) often does not provide enough throughput to maintain latency requirements for real-time data feeds between source and target systems. Understanding how to split the data into evenly distributed data streams is fundamental to increasing throughput.

After a reasonable sample of change data has been captured and written to a trail, the trail data will be used as the primary input to methodically spread the tables across multiple replication processes for increased throughput. This is done by extracting change data over a representative period of time that might either reflect a specific load test or typical production transaction mix.

Once the trail(s) exist a feature of the logdump utility is then used log table details (bytes, inserts, updates, deletes, etc.) to a loosely structured text file. Before this data is used in a spreadsheet the log dump output will first be reformatted into horizontal records with one row per table and values separated by commas. Reformatting is done using standard UNIX commands strung together at the command line or in shell scripts.

Finally, the formatted data is imported into MS Excel (or any spreadsheet application) and percentages are calculated based on change data bytes. The resulting spreadsheet will then provide the insight needed to evenly distribute the data load across multiple processes. The spreadsheet may also double as a summary report to customers.

Undoubtedly some tables will account for a greater percentage of the change data than other tables. In instances when, for example, one table accounts for 90% or all the change data bytes then the load balancing method described here will highlight this fact but balancing the load across multiple process will typically require the use of additional methods such as the of the "range" function in table or map statements. Ranges are not covered in this document.

Solution

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms