AIX: Node Reboot Under High Load as not All OCSSD.BIN Threads are Running in Real-Time (Doc ID 1493943.1)

Last updated on JULY 24, 2014

Applies to:

Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
IBM AIX on POWER Systems (64-bit)

Symptoms

A RAC node reboots while the clusterware is up and running. NMON/OSWatcher stopped collecting OS statistics a few minutes before the reboot, as if the system were hanging. The AIX errpt output shows no errors logged before the reboot.

The ocssd.log on the rebooted node does not show any problems leading up to the reboot:

2012-06-19 15:37:05.186: [ CSSD][3862]clssnmSendingThread: sending status msg to all nodes
2012-06-19 15:37:05.187: [ CSSD][3862]clssnmSendingThread: sent 4 status msgs to all nodes

>> reboot happened

2012-06-19 15:49:32.098: [ CSSD][1]clsu_load_ENV_levels: Module = CSSD, LogLevel = 1,

After the restart, ohasd logs a reboot advisory message from cssagent:

2012-06-19 15:48:58.578
[ohasd(4494861)]CRS-2112:The OLR service started on node prod2.
2012-06-19 15:48:58.738
[ohasd(4494861)]CRS-1301:Oracle High Availability Service started on node prod2.
2012-06-19 15:48:58.766
[ohasd(4494861)]CRS-8011:reboot advisory message from host: prod2, component: cssagent, with time stamp: L-2012-06-19-15:41:40.343
[ohasd(4494861)]CRS-8013:reboot advisory message text: Rebooting after limit 28341 exceeded; disk timeout 0, network timeout 28341, last heartbeat from CSSD at epoch seconds 1347073071.971, 28368 milliseconds ago based on invariant clock value of 2289749493
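
The CRS-8013 text above carries two numbers worth comparing: the limit (28341) and the time since the last heartbeat from CSSD (28368 milliseconds). The following is a minimal sketch for pulling both values out of such a line and printing how far the limit was exceeded. The function name heartbeat_gap is hypothetical, and the pattern assumes the message wording is exactly as in the sample above; other releases may word it differently.

```shell
# Hypothetical helper: reads CRS-8013 reboot advisory lines on stdin and
# prints "<limit_ms> <elapsed_ms> <elapsed_ms - limit_ms>" for each match.
# Assumes the wording "Rebooting after limit N exceeded ... M milliseconds ago".
heartbeat_gap() {
  sed -n 's/.*Rebooting after limit \([0-9][0-9]*\) exceeded.* \([0-9][0-9]*\) milliseconds ago.*/\1 \2/p' |
    awk '{ print $1, $2, $2 - $1 }'
}
```

Fed the CRS-8013 line shown above, this prints the limit, the elapsed time, and their difference, which makes it easy to see that cssagent acted as soon as the limit was passed.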

 

The ps output shows that the ocssd.bin threads do not all have the same priority (0 versus 60):

/usr/sysv/bin/ps -eLo user,s,pid,lwp,pri,args

oracle S 4323214 4716462  0 /app/11g/grid/11203/bin/ocssd.bin
oracle S 4323214 4716491 60 /app/11g/grid/11203/bin/ocssd.bin
..
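
To check all ocssd.bin threads at once rather than reading the listing by eye, the output can be filtered. This is a minimal sketch, assuming that priority 0 is the real-time value (as the title and the sample output suggest) and that the columns appear in the order requested above (user, s, pid, lwp, pri, args). The function name non_rt_ocssd_threads is hypothetical.

```shell
# Hypothetical helper: reads `ps -eLo user,s,pid,lwp,pri,args` output on
# stdin and prints "<lwp> <pri>" for every ocssd.bin thread whose priority
# is not 0. Assumption: priority 0 marks the real-time threads.
non_rt_ocssd_threads() {
  awk '$6 ~ /ocssd\.bin$/ && $5 != 0 { print $4, $5 }'
}
```

On the affected node this could be run as `/usr/sysv/bin/ps -eLo user,s,pid,lwp,pri,args | non_rt_ocssd_threads`. Empty output would mean every ocssd.bin thread is at priority 0.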
Cause
