
Introducing Cluster Health Monitor (IPD/OS) (Doc ID 736752.1)

Last updated on DECEMBER 23, 2019

Applies to:

Oracle Database - Standard Edition - Version to [Release 10.1 to 11.1]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Oracle Database Cloud Service - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Generic Linux


What is Cluster Health Monitor (IPD/OS)?

Cluster Health Monitor, also known as IPD/OS, is a set of tools that collects operating system performance data periodically and automatically. The data is stored for both online and offline analysis.

Please refer to the Cluster Health Monitor (CHM) FAQ <Document 1328466.1> for more information about Cluster Health Monitor.

Where can I get latest copy of Cluster Health Monitor?

The latest copy of Cluster Health Monitor (IPD/OS) is always with the install image if the version is or greater. Please note that Cluster Health Monitor is available only on selected platforms.

For Linux, the pre- version of Cluster Health Monitor (IPD/OS) can be downloaded from:

Cluster Health Monitor Diagnostic Data Collection Process

The tool collects OS performance data. This data can be used for single-instance and RAC performance tuning. It can also be used to find the root cause of Oracle Clusterware evictions, especially those caused by scheduler issues or high CPU load. Generic performance collection tools sometimes have trouble collecting data when the OS gets very busy. This is where Cluster Health Monitor comes into play.

Why Cluster Health Monitor?

Performance problems and node reboots in Oracle Clusterware and Oracle Database that are caused by a lack of CPU or memory resources lead customers to ask how to monitor their OS. Some customers have rudimentary scripts that use vmstat and mpstat, but the data is often not collected at regular intervals. In some cases, we have seen customers collect it once per hour, which is of little use when the node hangs or is evicted via reboot in the middle of the hour. OSwatcher did a wonderful job of making data collection uniform, with uniform collection intervals. Cluster Health Monitor extends OSwatcher by ensuring that it is always scheduled and collects data points, while also providing a client GUI to view the current load.

On what platforms can I run the Cluster Health Monitor?

The Cluster Health Monitor is NOT available for the Itanium platform (Linux, Windows, and HP Itanium) in any version. Platform support by version:

   and earlier: Linux only (download from OTN)
  Solaris (Sparc and x86-64) and Linux
  AIX, Solaris (Sparc and x86-64), Linux, and Windows



For the OTN version of Cluster Health Monitor, the complete steps to install the tool are explained in the readme file shipped with the product.

For or later version, Cluster Health Monitor is installed automatically when Grid Infrastructure (aka CRS) is installed. The resource name for Cluster Health Monitor is ora.crf, which is managed by ohasd.


The tool can be used by Customers to monitor their nodes online or offline. Generally when working with Oracle support, the data is viewed offline.

Please note that $ORACRF_HOME is /usr/lib/oracrf if Cluster Health Monitor is from OTN,
and $ORACRF_HOME is GI_HOME if Cluster Health Monitor is installed with Grid Infrastructure ( or later).
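
The rule above for choosing $ORACRF_HOME can be sketched as a small shell helper. This is only an illustration: the function name resolve_oracrf_home is hypothetical, and it assumes that GI_HOME is already set in the environment (or passed as the first argument) on Grid Infrastructure installations.

```shell
#!/bin/sh
# Hypothetical helper (not part of Cluster Health Monitor): choose ORACRF_HOME
# according to where oclumon is found.
# Assumption: GI_HOME is set by the caller, or passed as the first argument.
resolve_oracrf_home() {
    gi_home="${1:-${GI_HOME:-}}"
    if [ -n "$gi_home" ] && [ -x "$gi_home/bin/oclumon" ]; then
        # Cluster Health Monitor installed with Grid Infrastructure
        printf '%s\n' "$gi_home"
    else
        # Fall back to the OTN install location named in this note
        printf '%s\n' "/usr/lib/oracrf"
    fi
}

ORACRF_HOME="$(resolve_oracrf_home)"
export ORACRF_HOME
```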

Non-GUI Mode (preferred for gathering the data)

The $ORACRF_HOME/bin/oclumon command can be used to get the load information.
Run oclumon -h to see the help:

For help from command line : oclumon <verb> -h
For help in interactive mode : <verb> -h
Currently supported verbs are :
showtrail, showobjects, dumpnodeview, manage, version, debug, quit and help

Several oclumon verbs can be used to investigate a performance problem.

Some useful verbs, with example invocations, are:
  1. Showobjects
    $ORACRF_HOME/bin/oclumon showobjects -n <node_name> -time "2008-06-03 16:10:00"
  2. Dumpnodeview
    $ORACRF_HOME/bin/oclumon dumpnodeview -n <node_name>
  3. Showgaps
    $ORACRF_HOME/bin/oclumon showgaps -n <node_name>  \
    -s "2009-07-09 02:40:00"  -e "2009-07-09 03:59:00"  

    Number of gaps found = 0
  4. Showtrail
    $ORACRF_HOME/bin/oclumon showtrail -n <node_name> -diskid \
    sde qlen totalwaittime -s "2009-07-09 03:40:00" \
    -e "2009-07-09 03:50:00" -c "red" "yellow" "green"

    Parameter=QUEUE LENGTH
    2009-07-09 03:40:00     TO      2009-07-09 03:41:31     GREEN
    2009-07-09 03:41:31     TO      2009-07-09 03:45:21     GREEN
    2009-07-09 03:45:21     TO      2009-07-09 03:49:18     GREEN
    2009-07-09 03:49:18     TO      2009-07-09 03:50:00     GREEN
    Parameter=TOTAL WAIT TIME

    $ORACRF_HOME/bin/oclumon showtrail -n <node_name> -sys cpuqlen \
    -s "2009-07-09 03:40:00" -e "2009-07-09 03:50:00" \
    -c "red" "yellow" "green"

    Parameter=CPU QUEUELENGTH 

    2009-07-09 03:40:00     TO      2009-07-09 03:41:31     GREEN
    2009-07-09 03:41:31     TO      2009-07-09 03:45:21     GREEN
    2009-07-09 03:45:21     TO      2009-07-09 03:49:18     GREEN
    2009-07-09 03:49:18     TO      2009-07-09 03:50:00     GREEN
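
When the showtrail output is long, it can help to list only the intervals that are not GREEN. The following is a minimal sketch, assuming the output layout shown in the examples above (a "Parameter=" line followed by "start TO end STATE" lines) and assuming that other states appear in the same last column. The function name non_green_intervals is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter: read oclumon showtrail output on stdin and print only
# the intervals whose state is not GREEN, prefixed with the parameter name.
# Assumption: the output layout matches the examples in this note.
non_green_intervals() {
    awk '
        /^[ \t]*Parameter=/ {
            sub(/^[ \t]*Parameter=/, "")   # keep only the parameter name
            sub(/[ \t]+$/, "")             # drop trailing whitespace
            param = $0
            next
        }
        # Interval lines look like: <date> <time> TO <date> <time> <STATE>
        NF == 6 && $3 == "TO" && $6 != "GREEN" {
            printf "%s: %s %s to %s %s %s\n", param, $1, $2, $4, $5, $6
        }
    '
}

# Example usage, reusing the command from this note:
#   $ORACRF_HOME/bin/oclumon showtrail -n <node_name> -sys cpuqlen \
#       -s "2009-07-09 03:40:00" -e "2009-07-09 03:50:00" \
#       -c "red" "yellow" "green" | non_green_intervals
```

For the sample output shown above, where every interval is GREEN, the filter prints nothing.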


GUI Mode (available only with the OTN version)

Online mode can be used to detect problems live in the affected environment. The data can be viewed using the Cluster Health Monitor utility /usr/lib/oracrf/bin/crfgui. The GUI is not installed on the server nodes but can be installed on any other client using -g <Install_dir>.

1. For example, to look at the load on a node, run the command:

/usr/lib/oracrf/bin/ -m <Nodename>

The default refresh rate for this GUI is 1 second. To change the refresh rate to 5 seconds, run:

/usr/lib/oracrf/bin/ -n <Node_to_be_monitored> -r 5

2. Another option that can be passed to the tool is -d. It is used to view past data, expressed as an offset back from the current time. For example, if a node rebooted 4 hours ago and you need to look at the data from about 10 minutes before the reboot, pass -d "04:10:00":

/usr/lib/oracrf/bin/ -d "04:10:00"

All of the above usage scenarios require GUI access to the nodes.
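
The arithmetic behind the -d value in the example above (4 hours since the reboot plus 10 minutes before it) can be sketched as follows. The helper name lookback_offset is hypothetical, and it assumes whole hours and minutes given as plain decimal numbers without leading zeros.

```shell
#!/bin/sh
# Hypothetical helper: build an "HH:MM:SS" look-back offset.
#   $1 = whole hours since the event
#   $2 = additional minutes before the event
# Assumption: both arguments are plain decimal integers without leading zeros.
lookback_offset() {
    hours_ago="$1"
    minutes_before="$2"
    total_minutes=$(( hours_ago * 60 + minutes_before ))
    printf '%02d:%02d:00\n' $(( total_minutes / 60 )) $(( total_minutes % 60 ))
}

lookback_offset 4 10   # prints 04:10:00
```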

Data Collection

For Oracle or later RAC installations, use the diagcollection script that comes with Cluster Health Monitor:

GI_HOME/bin/ --collect --chmos
GI_HOME/bin/oclumon dumpnodeview -allnodes -v > <your-directory>/<your-filename>

For other versions, run:

/usr/lib/oracrf/bin/oclumon dumpnodeview -allnodes -v -last "23:59:59" > <your-directory>/<your-filename>

Make sure <your-directory> has more than 2 GB of free space to create the file <your-filename>.
Zip or compress <your-filename> before uploading to the Service Request.

Also update the SR with the date and time at which you observed the specific issue.
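
The free-space check and the compression step described above could be scripted along these lines. This is a minimal sketch, not an Oracle-supplied script: the function names has_free_space and compress_dump are hypothetical, the 2 GB threshold is expressed as 2097152 KB, and gzip is assumed to be an acceptable compression tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the directory has more than the given
# number of kilobytes available (default: 2 GB = 2097152 KB).
has_free_space() {
    dir="$1"
    min_kb="${2:-2097152}"
    # df -Pk prints the POSIX layout in 1K blocks; on the second line,
    # the fourth column is the available space.
    avail_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
    [ -n "$avail_kb" ] && [ "$avail_kb" -gt "$min_kb" ]
}

# Hypothetical helper: compress the dump file in place before upload.
compress_dump() {
    gzip -f "$1"    # writes "$1.gz" and removes the original file
}

# Example usage (placeholders as used in this note):
#   has_free_space "<your-directory>" || echo "less than 2 GB free" >&2
#   compress_dump "<your-directory>/<your-filename>"
```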

