TimesTen: Large Batch DML Can Complete on Master Node and Then Fail Irrecoverably on Subscriber Nodes
Last updated on SEPTEMBER 23, 2016
Applies to: Oracle TimesTen In-Memory Database - Version 11.2.1 and later
Information in this document applies to any platform.
It is possible for a very large batch transaction to execute and commit successfully on the master node (the node where the transaction is submitted) and then fail on the subscriber node(s) because of a lack of temporary space. When this happens, the subscriber node is left in an irrecoverable state that requires manual intervention and re-creation of the subscriber node.
In the case where this problem was discovered, a customer was trying to execute a global update on a table with 16 million rows. Both master and subscriber nodes had identical permanent and temporary space partitions defined. When testing against 11.2.1, we found that the transaction completed and committed on the master node but failed on the logical subscriber node due to temp space exhaustion. The transaction was then rolled back, the temp space was freed, and the replication agent replayed the transaction, which failed again. The subscriber node was stuck in a repeating loop: attempt the replicated update, fail for lack of temp space, roll back the transaction and free temp memory, then replay the transaction and fail again. The ttmesg.log file showed the following messages:
In 11.2.2, the transaction failed and could not be rolled back, apparently because allocated temp space could not be released. Because the transaction could not be rolled back, an assertion was generated, which invalidated the data store. Recovery of the data store commenced automatically, but the recovery itself failed and generated the same assertion because the transaction could neither be applied nor rolled back. This produced a different repeating loop (transaction failure, assertion and invalidation, recovery failure), which likewise could be resolved only through manual intervention and destroying and rebuilding the subscriber daemon.
The code was fixed in 188.8.131.52.0 so that it no longer asserts and instead behaves as described above for 11.2.1.
So in both 11.2.1 and 11.2.2, when a transaction commits on the master node and then fails on the subscriber node due to lack of temporary space, the result is an unrecoverable condition on the subscriber node.
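The 11.2.1 replay loop described above can be illustrated with a toy model. This is only a sketch of the reasoning, not TimesTen code: the `Subscriber` class, the `replay` function, and the size units are all hypothetical, and `max_attempts` exists only so the example terminates (the behavior described in this note is an unbounded loop). The point it shows is that rollback returns the subscriber to exactly the state it was in before the attempt, so a replay of the same transaction has the same outcome.

```python
from dataclasses import dataclass


@dataclass
class Subscriber:
    """Hypothetical stand-in for a subscriber node; sizes are arbitrary units."""
    temp_capacity: int
    temp_in_use: int = 0

    def apply(self, temp_needed: int) -> bool:
        """Try to apply one replicated transaction; return True if it fits."""
        # Temp space is consumed while the transaction is being applied.
        self.temp_in_use = min(temp_needed, self.temp_capacity)
        succeeded = temp_needed <= self.temp_capacity
        # On commit or on rollback the temp space is released again, so the
        # node ends up in the same state it started in.
        self.temp_in_use = 0
        return succeeded


def replay(subscriber: Subscriber, temp_needed: int, max_attempts: int) -> int:
    """Replay the same transaction until it applies.

    Returns the attempt number that succeeded, or -1 if every attempt failed.
    max_attempts is an assumption of this sketch, added so that it terminates.
    """
    for attempt in range(1, max_attempts + 1):
        if subscriber.apply(temp_needed):
            return attempt
    return -1


if __name__ == "__main__":
    node = Subscriber(temp_capacity=100)
    # A transaction that needs more temp space than the node has fails on
    # every replay, because nothing changes between attempts.
    print(replay(node, temp_needed=150, max_attempts=5))
    # A transaction that fits is applied on the first attempt.
    print(replay(node, temp_needed=80, max_attempts=5))
```

In this model nothing distinguishes the second attempt from the first, which is why the loop on the subscriber cannot resolve itself without outside intervention.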