Ops Center: Steps to Clear a Failed EC HA Restart and Increase the Timeout for Clusterware
(Doc ID 1515725.1)
Last updated on SEPTEMBER 25, 2019
Applies to: Enterprise Manager Ops Center - Version 12C and later
Information in this document applies to any platform.
The EC services appear to have gone offline after an authentication event exceeded the allowed number of failed login attempts. The EC then tried to restart those services while, at the same time, the HA Clusterware software tried to start the secondary EC instance. The Clusterware start eventually timed out and the EC's services finally started, but this left the environment in an inconsistent state.
The environment typically took 6-7 minutes to start with 2 Proxy Controllers and 8 CDOMs. The current environment has doubled to 4 Proxy Controllers and 16 CDOMs, so the estimated start time is now 12-14 minutes. The timeout for a failover start is 9 minutes, which is now too short for the current environment, so the timeout needs to be increased as well.
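The estimate above can be reproduced with a short calculation. This is only a sketch: it assumes start time scales linearly with environment size, which is what the doubling in the text implies but is not a measured result, and the function names are hypothetical, not part of Ops Center or Clusterware.

```python
def estimate_start_minutes(baseline_range, scale_factor):
    """Scale a (low, high) start-time range, assuming linear growth (an assumption)."""
    low, high = baseline_range
    return (low * scale_factor, high * scale_factor)


def timeout_is_sufficient(timeout_minutes, estimate_range):
    """A timeout is adequate only if it covers the slowest expected start."""
    return timeout_minutes >= estimate_range[1]


baseline = (6, 7)        # minutes, with 2 Proxy Controllers and 8 CDOMs
scale = 4 / 2            # Proxy Controllers doubled; CDOMs also doubled (16 / 8)
estimate = estimate_start_minutes(baseline, scale)

print(estimate)                            # (12.0, 14.0)
print(timeout_is_sufficient(9, estimate))  # False
```

With these figures the 9-minute failover timeout falls short of even the low end of the estimate, so any new value should sit above the 14-minute upper bound, with whatever margin suits the environment.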
Ops Center: ecadm ha-start failed to start the EC after crash
The EC crashed and put the HA Cluster into an inconsistent state. After the servers were rebooted, ./ecadm ha-start failed to start the EC. Note that <EC> could be the primary or the secondary EC in the example below.