
MaxRep: No data throughput on all Protection Plans causes the Recovery Point Objective (RPO) to exceed the default time limit threshold (Doc ID 2111203.1)

Last updated on APRIL 25, 2018

Applies to:

Pillar Axiom Replication Engine (MaxRep) - Version 3.0 to 3.0 [Release 3.0]
Information in this document applies to any platform.

Symptoms

The default threshold for the MaxRep Recovery Point Objective (RPO) is 30 minutes, and an alert is sent whenever the RPO exceeds this limit. The threshold can be increased or decreased under Protect -> Manage Protection Plan: click Modify on the Protection Plan and select Modify Replication Options. Under normal operation the RPO should stay well below the 30-minute default, but due to a known issue the value may begin to climb and will continue to increase until a workaround is applied by Oracle Support.

Below is an example of the symptoms as seen in the MaxRep Graphical User Interface (GUI) under Monitor -> Volume Protection:

[Screenshot omitted: the Monitor -> Volume Protection view shows the RPO for the affected Protection Plans climbing past the threshold]

The symptoms of the issue can also be confirmed from the command line, as follows:

a) Open an SSH session to the source Engine IP address where the affected Protection Plans are running.
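
For example, assuming root access to the engine (the address below is a placeholder, not from this article):

ssh root@<source-engine-ip>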

b) Check the source and target cache folders to determine whether the differential files are draining correctly.
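
One way to sample the drain is to count the differential files over time. This is a hedged sketch only: the cache path is hypothetical, so substitute the cache directory actually configured for your Protection Plan.

# Hypothetical cache path; replace <cache_dir> with your plan's cache directory.
[root@source-engine ~]# watch -n 60 'ls /home/svsystems/<cache_dir> | wc -l'

If the file count only grows from one interval to the next, the differentials are accumulating rather than draining.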

c) Under /home/svsystems/transport/log/, review the cxps.err.log file for timeout errors such as the following:

2016-Feb-24 12:35:32 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67965.7fddfd5320a0) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:32 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67966.7fde3ee100b0) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:32 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67967.7fddc127d090) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:32 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67968.7fddf80556e0) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:32 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67969.7fde3f43cc30) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:32 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67970.7fddc00cc0c0) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:32 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67971.7fddc007e380) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:33 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67972.7fddfd9845f0) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:33 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67973.7fddb826eb20) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:34 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67974.7fde3f43b9d0) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:34 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67975.7fde3f869d90) 10.1.20.113 asyncReadSome timed out (300)
2016-Feb-24 12:35:34 ERROR [at cxpslib/session.cpp:handleTimeout:245] (sid: 67976.7fde29104290) 10.1.20.113 asyncReadSome timed out (300)

The expected log entries during normal operation look like this:

2015-Oct-16 22:19:43 INFO REQUEST HANDLER WORKER THREAD STARTED: 0x7fdeec000920
2015-Oct-16 22:19:43 INFO REQUEST HANDLER WORKER THREAD STARTED: 0x7fdeec001420
2015-Oct-16 22:19:43 INFO REQUEST HANDLER WORKER THREAD STARTED: 0x7fdeec000be0
2015-Oct-16 22:19:43 INFO REQUEST HANDLER WORKER THREAD STARTED: 0x7fdeec001a20
2015-Oct-16 22:19:43 INFO REQUEST HANDLER WORKER THREAD STARTED: 0x7fdeec0010f0
2015-Oct-16 22:19:43 INFO REQUEST HANDLER WORKER THREAD STARTED: 0x7fdeec002070
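
To check quickly whether an engine is still hitting these timeouts, the log from step c) can be filtered with standard tools; a minimal sketch:

# Count timeout errors, then inspect the most recent entries.
[root@source-engine ~]# grep -c "asyncReadSome timed out" /home/svsystems/transport/log/cxps.err.log
[root@source-engine ~]# tail -n 20 /home/svsystems/transport/log/cxps.err.log

A growing timeout count, with recent timestamps in the tail output, matches the failure signature shown above.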

d) Next, open an SSH session to the target Engine IP address.

e) Check the status of the currently running cachemgr session threads. If cachemgr is hung, the output looks like this:

[root@DRSANREP-01 ~]# netstat -apn| grep cachemgr
tcp 1 0 10.1.20.113:54666 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54693 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54737 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54674 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54768 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54683 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54745 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54637 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 0 0 10.1.20.113:47960 10.1.20.13:9443 ESTABLISHED 2209/cachemgr
tcp 0 0 10.1.20.113:47939 10.1.20.13:9443 ESTABLISHED 2209/cachemgr
tcp 1 0 10.1.20.113:54743 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54739 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 1 0 10.1.20.113:54736 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr
tcp 0 0 10.1.20.113:47961 10.1.20.13:9443 ESTABLISHED 2209/cachemgr
tcp 1 0 10.1.20.113:54641 10.1.20.13:9443 CLOSE_WAIT 2209/cachemgr

The expected output during normal operation would look like this:

[root@DRSANREP-01 ~]# netstat -apn| grep cachemgr
tcp 0 0 10.1.20.113:47960 10.1.20.13:9443 ESTABLISHED 2209/cachemgr

The above example shows that cachemgr was not draining the differentials from the source/target cache: the process was hung, leaving its sessions stuck in the "CLOSE_WAIT" state.
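
To tally the cachemgr socket states at a glance, the netstat output above can be summarized; a minimal sketch:

# Field 6 of the netstat TCP lines is the connection state.
[root@DRSANREP-01 ~]# netstat -apn | grep cachemgr | awk '{print $6}' | sort | uniq -c

A healthy engine reports only ESTABLISHED sessions, while a large CLOSE_WAIT count corresponds to the hung state described above.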

Changes

 

Cause
