Introducing Cluster Health Monitor (IPD/OS) (Doc ID 736752.1)

Last updated on SEPTEMBER 25, 2018

Applies to:

Oracle Database - Standard Edition - Version 10.1.0.2 to 11.1.0.7 [Release 10.1 to 11.1]
Generic Linux

Details

What is Cluster Health Monitor (IPD/OS)?

Cluster Health Monitor, also known as IPD/OS, is a set of tools that collects operating system performance data periodically and automatically. The data is stored for both online and offline analysis.

Please refer to the Cluster Health Monitor (CHM) FAQ <Document 1328466.1> for more information about Cluster Health Monitor.

Where can I get the latest copy of Cluster Health Monitor?

For version 11.2.0.2 or later, the latest copy of Cluster Health Monitor (IPD/OS) is always included in the install image. Please note that Cluster Health Monitor is available only on selected platforms.

For Linux, the pre-11.2.0.2 version of Cluster Health Monitor (IPD/OS) can be downloaded from:

  http://www.oracle.com/technetwork/database/options/clustering/downloads/index.html

Cluster Health Monitor Diagnostic Data Collection Process

The tool collects OS performance data. This data can be used for single-instance and RAC performance tuning. It can also help find the root cause of Oracle Clusterware evictions, especially those caused by scheduler issues or high CPU load. Generic performance collection tools sometimes have trouble collecting data when the OS gets very busy; this is where Cluster Health Monitor comes into play.

Why Cluster Health Monitor?

Customers who experience Oracle Clusterware or Oracle Database performance problems, or node reboots, caused by a lack of CPU or memory resources often ask how to monitor their OS. Some customers have rudimentary scripts that use vmstat and mpstat, but the data is often not collected at regular intervals. In some cases, customers collect it only once per hour, which is of little use when the node hangs or is evicted via reboot in the middle of the hour. OSWatcher did a wonderful job of making data collection uniform, with regular collection intervals. Cluster Health Monitor extends what OSWatcher offers by ensuring that it is always scheduled and keeps collecting data points, and by providing a client GUI to view the current load.

On what platforms can I run Cluster Health Monitor?

Cluster Health Monitor is NOT available for the Itanium platform (Linux, Windows, and HP Itanium) in any version.

11.2.0.1 and earlier: Linux only (download from OTN)
11.2.0.2: Solaris (Sparc and x86-64) and Linux
11.2.0.3: AIX, Solaris (Sparc and x86-64), Linux, and Windows

Actions

Installation

For the OTN version of Cluster Health Monitor, the complete installation steps are explained in the readme file shipped with the product.

For version 11.2.0.2 or later, Cluster Health Monitor is installed automatically when Grid Infrastructure (also known as CRS) is installed. The resource name for Cluster Health Monitor is ora.crf, which is managed by ohasd.

Usage

Customers can use the tool to monitor their nodes online or offline. Generally, when working with Oracle Support, the data is viewed offline.

Please note that $ORACRF_HOME is /usr/lib/oracrf if Cluster Health Monitor was installed from OTN, and $ORACRF_HOME is GI_HOME if Cluster Health Monitor was installed with Grid Infrastructure (11.2.0.2 or later).
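
The choice between the two locations can be scripted. The following is a minimal sketch, not part of the product: the function name resolve_oracrf_home is hypothetical, and the rule "prefer the OTN directory when an executable oclumon is found there" is an assumption made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the directory to use as ORACRF_HOME.
#   $1 = OTN install directory (the note gives /usr/lib/oracrf)
#   $2 = Grid Infrastructure home (GI_HOME)
# Assumption: if an executable oclumon exists under $1/bin, the OTN
# version is in use; otherwise fall back to the Grid home.
resolve_oracrf_home() {
    if [ -x "$1/bin/oclumon" ]; then
        printf '%s\n' "$1"
    else
        printf '%s\n' "$2"
    fi
}

# Usage (assumes GI_HOME is already set in the environment):
#   ORACRF_HOME=$(resolve_oracrf_home /usr/lib/oracrf "$GI_HOME")
#   export ORACRF_HOME
```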

Non-GUI Mode (preferred for gathering the data)

The $ORACRF_HOME/bin/oclumon command can be used to get the load information.
Run oclumon -h to see the help:

For help from command line : oclumon <verb> -h
For help in interactive mode : <verb> -h
Currently supported verbs are :
showtrail, showobjects, dumpnodeview, manage, version, debug, quit and help

Several of these verbs can be used to track down a performance problem.

Some useful verbs that can be passed to oclumon are:
  1. Showobjects
    $ORACRF_HOME/bin/oclumon showobjects -n stadn59 -time "2008-06-03 16:10:00"
  2. Dumpnodeview
    $ORACRF_HOME/bin/oclumon dumpnodeview -n halinux4
  3. Showgaps
    $ORACRF_HOME/bin/oclumon showgaps -n celx32oe40d  \
    -s "2009-07-09 02:40:00"  -e "2009-07-09 03:59:00"  

    Number of gaps found = 0
  4. Showtrail
    $ORACRF_HOME/bin/oclumon showtrail -n celx32oe40d -diskid \
    sde qlen totalwaittime -s "2009-07-09 03:40:00" \
    -e "2009-07-09 03:50:00" -c "red" "yellow" "green"

    Parameter=QUEUE LENGTH
    2009-07-09 03:40:00     TO      2009-07-09 03:41:31     GREEN
    2009-07-09 03:41:31     TO      2009-07-09 03:45:21     GREEN
    2009-07-09 03:45:21     TO      2009-07-09 03:49:18     GREEN
    2009-07-09 03:49:18     TO      2009-07-09 03:50:00     GREEN
    Parameter=TOTAL WAIT TIME

    $ORACRF_HOME/bin/oclumon showtrail -n celx32oe40d -sys cpuqlen \
    -s "2009-07-09 03:40:00" -e "2009-07-09 03:50:00" \
    -c "red" "yellow" "green"

    Parameter=CPU QUEUELENGTH 

    2009-07-09 03:40:00     TO      2009-07-09 03:41:31     GREEN
    2009-07-09 03:41:31     TO      2009-07-09 03:45:21     GREEN
    2009-07-09 03:45:21     TO      2009-07-09 03:49:18     GREEN
    2009-07-09 03:49:18     TO      2009-07-09 03:50:00     GREEN
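
When a showtrail report covers a long window, it can help to keep only the intervals that are not GREEN. The following is a minimal sketch that assumes the output layout matches the samples above (a "Parameter=" header followed by "<date> <time> TO <date> <time> <STATE>" lines); the function name non_green_intervals is hypothetical and is not part of oclumon.

```shell
#!/bin/sh
# Hypothetical filter: read showtrail output on stdin and print only
# the intervals whose state is not GREEN, prefixed with the parameter
# name taken from the preceding "Parameter=" header line.
non_green_intervals() {
    awk '
        /^[ \t]*Parameter=/ {
            sub(/^[ \t]*Parameter=/, "")   # drop the "Parameter=" prefix
            sub(/[ \t]+$/, "")             # drop trailing blanks
            param = $0
            next
        }
        # Interval lines have six fields with "TO" in the third position.
        NF == 6 && $3 == "TO" && $6 != "GREEN" {
            printf "%s: %s %s -> %s %s %s\n", param, $1, $2, $4, $5, $6
        }
    '
}

# Usage (node name and times as in the example above):
#   $ORACRF_HOME/bin/oclumon showtrail -n celx32oe40d -sys cpuqlen \
#       -s "2009-07-09 03:40:00" -e "2009-07-09 03:50:00" \
#       -c "red" "yellow" "green" | non_green_intervals
```

With the sample output shown above, the filter prints nothing, because every interval is GREEN.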


GUI Mode (available only with the OTN version)


Online mode can be used to detect problems live in the affected environment. The data can be viewed using the Cluster Health Monitor utility /usr/lib/oracrf/bin/crfgui. The GUI is not installed on the server nodes, but it can be installed on any other client using crfinst.pl -g <Install_dir>.

1. For example, to look at the load on a node, run the following command:

/usr/lib/oracrf/bin/crfgui.sh -m <Nodename>

The default refresh rate for this GUI is 1 second. To change the refresh rate to 5 seconds, run:

/usr/lib/oracrf/bin/crfgui.sh -n <Node_to_be_monitored> -r 5

2. Another option that can be passed to the tool is -d. It is used to view data from a point in the past, given as an offset back from the current time. For example, if a node rebooted 4 hours ago and you need to look at the data from about 10 minutes before the reboot, pass -d "04:10:00":

/usr/lib/oracrf/bin/crfgui.sh -d "04:10:00"

All of the above usage scenarios require GUI access to the nodes.
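
The offset passed to -d is plain arithmetic: the age of the event plus the lead time you want before it. The following is a minimal sketch of that calculation; the function name lookback_offset is hypothetical, and the HH:MM:SS format is taken from the example above.

```shell
#!/bin/sh
# Hypothetical helper: build an HH:MM:SS look-back offset.
#   $1 = minutes since the event (for example, a reboot 4 hours ago = 240)
#   $2 = minutes of lead time wanted before the event
lookback_offset() {
    total=$(( $1 + $2 ))
    printf '%02d:%02d:00\n' $(( total / 60 )) $(( total % 60 ))
}

# Usage: a reboot 240 minutes ago, viewed from 10 minutes before it
#   /usr/lib/oracrf/bin/crfgui.sh -d "$(lookback_offset 240 10)"
```

For the scenario described above, lookback_offset 240 10 yields 04:10:00.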

Data Collection

For Oracle 11.2.0.2 or later RAC installations, use the diagcollection script that comes with Cluster Health Monitor:

GI_HOME/bin/diagcollection.sh --collect --chmos
or
GI_HOME/bin/oclumon dumpnodeview -allnodes -v > <your-directory>/<your-filename>


For other versions, run:

/usr/lib/oracrf/bin/oclumon dumpnodeview -allnodes -v -last "23:59:59" > <your-directory>/<your-filename>


Make sure <your-directory> has more than 2 GB of free space to hold <your-filename>.
Zip or compress <your-filename> before uploading it to the Service Request.

Also update the SR with the date and time at which you observed the specific issue.
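
The free-space check, the dump, and the compression can be combined in a small script. The following is a minimal sketch, not an Oracle-supplied tool: the function names has_free_space and collect_chm_dump are hypothetical, the dump uses the 11.2.0.2-or-later form of the command shown above, and gzip is assumed to be available for compression.

```shell
#!/bin/sh
# Succeeds when the filesystem holding directory $1 has more than
# $2 KB available (df -P gives one POSIX-format line per filesystem;
# field 4 of that line is the available space).
has_free_space() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -gt "$2" ]
}

# Hypothetical wrapper: dump CHM data for all nodes to $1/$2, then
# compress the result. ORACRF_HOME must point at the CHM installation.
# For pre-11.2.0.2 versions, add -last "23:59:59" as shown above.
collect_chm_dump() {
    out_dir=$1
    out_file=$2
    min_kb=$(( 2 * 1024 * 1024 ))   # 2 GB, the minimum stated above
    if ! has_free_space "$out_dir" "$min_kb"; then
        echo "Not enough free space in $out_dir" >&2
        return 1
    fi
    "$ORACRF_HOME/bin/oclumon" dumpnodeview -allnodes -v > "$out_dir/$out_file" &&
        gzip "$out_dir/$out_file"
}

# Usage:
#   collect_chm_dump <your-directory> <your-filename>
```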


