OCSG Cluster Mode Failover Time Too Long
Last updated on AUGUST 31, 2016
Applies to: Oracle Communications Services Gatekeeper - Version 18.104.22.168 to 5.1.0 [Release 5.0 to 5.1]
Information in this document applies to any platform.
When one OCSG NT node experiences a hardware or network failure, such as an unplugged cable or a machine reboot, the AT is out of service for a period of time.
Even after tuning the cluster heartbeat "period length" to its minimum value, the outage lasts around 20 seconds.
An out-of-service time of more than 1 second is not acceptable in an active-active deployment.
HA tests were run in a 2x2 cluster environment:
1) REST / Parlay QoS-applyQos: Disable Ethernet of one NT during traffic run
2) REST / Parlay QoS-applyQos: Shut down one NT server during traffic run
Both cases fail with the symptoms below:
Requests sent to AT1 and AT2 get no response, regardless of the communication service used.
These symptoms last for about 5 minutes, after which the system recovers.
Killing the NT server does not cause this service interruption.
The same issue is observed on OCSG 22.214.171.124 and 5.1.
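The out-of-service window in tests like these can be quantified from timestamped probe results. The following Python sketch is illustrative only and is not part of OCSG: the function name `longest_outage` and the sample format (a timestamp in seconds plus a success flag) are assumptions, and the probes are assumed to be collected separately, for example by sending one request to each AT per interval during the traffic run.

```python
from typing import Iterable, Optional, Tuple


def longest_outage(samples: Iterable[Tuple[float, bool]]) -> float:
    """Return the longest outage in seconds.

    Each sample is (timestamp_seconds, ok), in chronological order.
    An outage starts at the first failed probe and ends at the next
    successful probe. If the run ends while probes are still failing,
    the outage is measured up to the last sample.
    """
    longest = 0.0
    outage_start: Optional[float] = None
    last_ts: Optional[float] = None

    for ts, ok in samples:
        if not ok and outage_start is None:
            # First failed probe: an outage begins here.
            outage_start = ts
        elif ok and outage_start is not None:
            # First successful probe after failures: the outage ends.
            longest = max(longest, ts - outage_start)
            outage_start = None
        last_ts = ts

    if outage_start is not None and last_ts is not None:
        # Still failing at the end of the run.
        longest = max(longest, last_ts - outage_start)

    return longest


if __name__ == "__main__":
    # Hypothetical probe results: failures from t=1 s until recovery at t=21 s.
    probes = [(0.0, True), (1.0, False), (2.0, False), (21.0, True), (22.0, True)]
    print(longest_outage(probes))
```

The measurement resolution is limited by the probe interval, so probes need to be spaced well below the 1-second target for the result to be meaningful against that requirement.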