Ops Center: Steps to Clear a Failed EC HA Restart and Increase the Timeout for Clusterware
(Doc ID 1515725.1)
Last updated on AUGUST 23, 2022
Applies to:
Enterprise Manager Ops Center - Version 12C and later
Information in this document applies to any platform.
Goal
The Enterprise Controller (EC) services appear to have gone offline after an authentication event exceeded the allowed number of failed login attempts. The EC then tried to restart those services while, at the same time, the HA Clusterware software tried to start the secondary EC instance. The Clusterware start eventually timed out and the EC services finally started, but this left the environment in an unexpected state.
The environment typically took 6-7 minutes to start with 2 Proxy Controllers and 8 CDOMs. The current environment has doubled to 4 Proxy Controllers and 16 CDOMs, so the estimated start time is now 12-14 minutes. The timeout for a failover start is 9 minutes, which is too short for the current environment, so the timeout also needs to be increased.
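The estimate above can be checked with a short calculation. This is only a sketch: it assumes start time scales linearly with the combined count of Proxy Controllers and CDOMs, which the text implies but does not state, and the function and variable names are hypothetical.

```python
def estimate_start_minutes(baseline_range, baseline_size, current_size):
    """Scale a (low, high) start-time range linearly with environment size.

    Linear scaling is an assumption made for illustration only.
    """
    factor = current_size / baseline_size
    low, high = baseline_range
    return (low * factor, high * factor)


# Figures taken from the text: 6-7 minutes with 2 Proxy Controllers and
# 8 CDOMs, now 4 Proxy Controllers and 16 CDOMs, 9-minute failover timeout.
FAILOVER_TIMEOUT_MIN = 9

low, high = estimate_start_minutes(
    (6, 7), baseline_size=2 + 8, current_size=4 + 16
)

# The timeout is too short if even the optimistic estimate exceeds it.
timeout_too_short = low > FAILOVER_TIMEOUT_MIN

print(f"Estimated start time: {low:.0f}-{high:.0f} minutes")
print(f"Failover timeout of {FAILOVER_TIMEOUT_MIN} minutes too short: {timeout_too_short}")
```

Under this assumption, any new timeout would need to exceed the upper end of the estimated range, with some margin for variation between starts.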
Ops Center: ecadm ha-start failed to start the EC after crash
The EC crashed and put the HA cluster into an inconsistent state. After the servers were rebooted, ./ecadm ha-start failed to start the EC. Note that <EC> in the example below could be either the primary or the secondary EC.
Solution
In this Document
Goal
Solution