TimesTen: Bug 10176825 - Process Failure Leads To Data Store Invalidation (Doc ID 1327737.1)

Last updated on SEPTEMBER 23, 2016

Applies to:

Oracle TimesTen In-Memory Database - Version 11.2.1.1.0 to 11.2.1.7.0 [Release 11.2]
Information in this document applies to any platform.
This Note is of interest to all application developers and DBAs working with the TimesTen InMemory DataBase.


Goal

This problem is known and has been handled in an unpublished bug. The basic description of the bug and version fix information can be seen in <Note 10176825.8>. What follows here is a detailed description of the bug process and bug "signature" to assist customers in determining if they are encountering this problem.

The bug is a problem in the transaction log generation component of the TimesTen V11 source code. This bug results in an invalid address being written to the log buffer such that any application process attempting to access the buffer may encounter this invalid address and then will abend; the data store to which the process was connected will subsequently become invalid. There are no known restrictions on the type of application process which is vulnerable to this bug, and the failure has been seen in both C++ and Java client application processes. 

This bug has a distinct diagnostic signature. Customers encountering this bug will observe the following:

1) An application process connected to the data store terminates abnormally with some kind of OS exception (SIGBUS, SIGSEG, etc.). If the application dumps core, a stack trace from the core file will include a call to the  sbLogCTNWrap  procedure. For example, see the 3rd entry in the following stack trace taken from a failed process (emphasis added):

o>l@1 signal SIGBUS function sbLogBufInsertFull()
(dbx 5) where
=>[1] sbLogBufInsertFull(0x2248a0, 0x4c2eee9c, 0x100648e08d0, 0x10, 0x10, 0x1), position: 0xffffffff7ea728cc
[2] sbLogRecInsert(0x0, 0x10000046778, 0x1, 0xc9, 0x10000046780, 0xc9), position: 0xffffffff7eaabb00
[3] sbLogCTNWrap(0x1002327d0, 0x0, 0x100000466b0, 0x10, 0x1400, 0x12000), position: 0xffffffff7eaacfac
[4] sbLogWriteFull(0x46, 0x1, 0x2, 0xffffffff7ff65084, 0xffffffff7ff65060, 0x0), position: 0xffffffff7eaab2c4
[5] sbXactCommit(0x0, 0x10000c61de8, 0x100000465a0, 0xffffffff7f412880, 0x1002327d0, 0x12018), position: 0xffffffff7ebadbd0
[6] sb_xactCommitQ(0x12018, 0x1002327d0, 0x1400, 0x1, 0x12020, 0x1000), position: 0xffffffff7e8f3b40
...



2) The ttmesg.log will show a pattern of messages beginning with the termination of the application process, followed by subdaemon-initiated cleanup after the process failure, followed by the failure to successfully rollback the abended process' open transactions, followed by the failure of the data store subdaemon itself, finally resulting in the invalidation of the data store:

/* process abend message in ttmesg.log */
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.warning] 15758 exited while connected to data store '/rtt/TimesTen/rtt' shm 268453835 count=1
...

/* cleanup of abended process including rollback of open transactions commences*/

Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[638]: [ID 702911 local1.info] subd: evaluate '/rtt/TimesTen/rtt' pid 15758
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[638]: [ID 702911 local1.info] subd: OK rc=0
...
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] Rolling back transactions for pid 15758
...
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[637]: [ID 702911 local1.info] rollback
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[637]: [ID 702911 local1.info] subd got #12741.25482, rollback: name=/rtt/TimesTen/rtt shmKey=%10%00G%cb pid=%00%00=%8e
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[637]: [ID 702911 local1.info] subd: rollback pid 15758
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[637]: [ID 702911 local1.info] subd: found 1 connection belonging to pid 15758
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[637]: [ID 702911 local1.info] 15758 doRollbackWork: Calling sbOLUndoByConn for connid 21 -- rtt
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: sbOLUndoByConn(): oplog stack for connection 21: 37 124 100 104
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 104 for connid 21 -- begin
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: sbOLUndoLogBufLatches: insp->strandno = 1; released allocation latch; released insertion latch
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 104 for connid 21 -- returned 0
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 100 for connid 21 -- begin
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 100 for connid 21 -- returned 0
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 124 for connid 21 -- begin
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 124 for connid 21 -- returned 0
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 37 for connid 21 -- begin
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 100 for connid 21 -- begin
Jun 1 14:25:21 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] 637/ffffffff7bcccee8: undo op 100 for connid 21 -- returned 0
...

/*TT14000 daemon failure and data store termination*/

Jun 1 14:26:38 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.error] TT14000: TimesTen daemon internal error: Unparsable reply from subdaemon port 32778 pid 637, rc -2: invalidate
Jun 1 14:26:38 mx-rnk02 last message repeated 1 time
Jun 1 14:26:38 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.error] TT14000: TimesTen daemon internal error: Reply: ''
Jun 1 14:26:38 mx-rnk02 last message repeated 1 time
Jun 1 14:26:38 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.info] Mark process #0 as down (0)
Jun 1 14:26:38 mx-rnk02 TimesTen Data Manager 11.2.1.5.0.rtt[636]: [ID 702911 local1.warning] Invalidating the data store (failcode -1, recovery for 15758)
Jun 1 14:26:38 mx-rnk02 last message repeated 1 time
.



What has happened here is that the application process (in this case pid 15758) has failed due to the invalid pointer which was generated in the sbLogCTNWrap procedure. The data store subdaemon detects the process failure and initiates cleanup of the failed process. This cleanup involves rolling back transactions left open by the failed process (in this case, the process was multithreaded with several connections to TimesTen, so there were several open transactions when the process abended, and these all had to be cleaned up.) However, in the process of rolling back the orphaned transactions, the subdaemon also encounters the bad address pointer in sbLogCTNWrap and the subdaemon process also terminates abnormally. By rule, the abnormal termination of a data store subdaemon results in the invalidation of the data store, and that is exactly what happens in the case of this bug.

Note that in all observed cases, subsequent data store recovery is automatically commenced and completed without problem; there is no permanent corruption in the data store and no need for manual intervention.

Solution

Sign In with your My Oracle Support account

Don't have a My Oracle Support account? Click to get started

My Oracle Support provides customers with access to over a
Million Knowledge Articles and hundreds of Community platforms