Oracle Grid Infrastructure: How to Troubleshoot cssagent/cssmonitor Evictions
(Doc ID 1549496.1)
Last updated on MAY 21, 2021
Applies to:
Oracle Database Backup Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Database - Enterprise Edition - Version 18.104.22.168 and later
Oracle Database Cloud Schema Service - Version N/A and later
Information in this document applies to any platform.
cssagent and cssmonitor must receive heartbeats from CSSD on a regular basis. If the interval between heartbeats from CSSD becomes too long, cssagent or cssmonitor will abort the node.
When the node aborts for this reason, the node alert log in $GRID_HOME/log/<hostname>/alert<nodename>.log will show a reboot advisory (CRS-8011) explaining that cssagent or cssmonitor rebooted the node.
These messages may appear in the alert log of the local node or in the alert log of one of the other nodes.
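As a rough illustration of the check described above, the following Python sketch scans an alert log for lines that mention the CRS-8011 reboot advisory. The function names and the example path in the usage note are hypothetical, and the sketch assumes only that the advisory code appears literally in the log line; it makes no assumption about the rest of the message format.

```python
import re
from typing import Iterable, List

# CRS-8011 is the reboot advisory code named in this document.
# The word boundaries avoid matching longer codes such as CRS-80110.
ADVISORY = re.compile(r"\bCRS-8011\b")


def find_reboot_advisories(lines: Iterable[str]) -> List[str]:
    """Return the lines that mention the CRS-8011 reboot advisory."""
    return [line.rstrip("\n") for line in lines if ADVISORY.search(line)]


def scan_alert_log(path: str) -> List[str]:
    """Scan one alert log file (hypothetical helper; the path is supplied by the caller)."""
    # errors="replace" keeps the scan going if the log contains undecodable bytes.
    with open(path, "r", errors="replace") as fh:
        return find_reboot_advisories(fh)
```

To use it, call `scan_alert_log` with the alert log path on each node in turn, since the advisory may have been recorded by a node other than the one that rebooted.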
The most common cause is OS resource starvation that prevents the ocssd heartbeat thread and/or cssagent from being scheduled, even though these processes run at the highest priority. In most cases you will need to resolve the underlying OS resource issue.
The purpose of this document is to provide steps to take after a cluster node is rebooted by cssagent/cssmonitor.
In this Document
1. Check for OS resource starvation or scheduler problem at the time of the reboot.
   a) Collecting the archived OS statistics
   b) What to look for in the archived OS statistics.
2. Make sure that the latest PSU is applied to get fixes for known issues with resource consumption which can lead to node eviction.
3. For AIX specifically:
4. Enable crashdump if further debug required