AIX: Node Reboot Under High Load as not All OCSSD.BIN Threads are Running in Real-Time
Last updated on MARCH 12, 2018
Applies to: Oracle Database - Enterprise Edition - Version 10.2.0.1 and later
IBM AIX on POWER Systems (64-bit)
A RAC node reboots while the clusterware is up and running. NMON/OSWatcher stopped collecting OS statistics a few minutes before the reboot, as if the system had hung. The AIX errpt output shows no errors logged before the reboot.
The ocssd.log on the rebooted node does not show any problems leading up to the reboot.
- ocssd.log from rebooted node
2012-06-19 15:37:05.186: [ CSSD]clssnmSendingThread: sending status msg to all nodes
2012-06-19 15:37:05.187: [ CSSD]clssnmSendingThread: sent 4 status msgs to all nodes
>> reboot happened
2012-06-19 15:49:32.098: [ CSSD]clsu_load_ENV_levels: Module = CSSD, LogLevel = 1,
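The silent window in the excerpt above can be measured by scanning the log for jumps between consecutive timestamps. The following is a minimal sketch, not an Oracle tool: the function name `find_log_gaps`, the 60-second threshold, and the assumption that every relevant line starts with a `YYYY-MM-DD HH:MM:SS.mmm:` stamp are all illustrative choices.

```python
import re
from datetime import datetime, timedelta

# Assumed line prefix, taken from the excerpt: "2012-06-19 15:37:05.186: ..."
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}):")

def find_log_gaps(lines, threshold=timedelta(seconds=60)):
    """Return (previous, current, delta) for every timestamp jump above threshold."""
    gaps = []
    prev = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # skip annotations and lines without a timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if prev is not None and ts - prev > threshold:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps

SAMPLE = """\
2012-06-19 15:37:05.186: [ CSSD]clssnmSendingThread: sending status msg to all nodes
2012-06-19 15:37:05.187: [ CSSD]clssnmSendingThread: sent 4 status msgs to all nodes
>> reboot happened
2012-06-19 15:49:32.098: [ CSSD]clsu_load_ENV_levels: Module = CSSD, LogLevel = 1,
"""

gaps = find_log_gaps(SAMPLE.splitlines())
for before, after, delta in gaps:
    print(f"no entries between {before} and {after} ({delta.total_seconds():.3f} s)")
```

On the excerpt, this reports a single gap between the last status message and the first entry written after the restart.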
- alert.log for clusterware from rebooted node
[ohasd(4494861)]CRS-2112:The OLR service started on node prod2.
[ohasd(4494861)]CRS-1301:Oracle High Availability Service started on node prod2.
[ohasd(4494861)]CRS-8011:reboot advisory message from host: prod2, component: cssagent, with time stamp: L-2012-06-19-15:41:40.343
[ohasd(4494861)]CRS-8013:reboot advisory message text: Rebooting after limit 28341 exceeded; disk timeout 0, network timeout 28341, last heartbeat from CSSD at epoch seconds 1347073071.971, 28368 milliseconds ago based on invariant clock value of 2289749493
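The CRS-8013 text carries the numbers that explain the reboot: the limit that was exceeded and how long ago the last heartbeat from CSSD was seen. A small sketch for pulling those values out of the message follows. The regular expression is an assumption based only on the wording of the message shown above, and `parse_reboot_advisory` is a hypothetical helper name.

```python
import re

# Pattern derived from the single CRS-8013 message in this article (assumption:
# other releases may word the advisory differently).
ADVISORY = re.compile(
    r"limit (?P<limit>\d+) exceeded; disk timeout (?P<disk>\d+), "
    r"network timeout (?P<network>\d+), .*?(?P<ago>\d+) milliseconds ago"
)

def parse_reboot_advisory(text):
    """Return the numeric fields of a reboot advisory as ints, or None if no match."""
    m = ADVISORY.search(text)
    if m is None:
        return None
    return {name: int(value) for name, value in m.groupdict().items()}

MESSAGE = (
    "Rebooting after limit 28341 exceeded; disk timeout 0, network timeout 28341, "
    "last heartbeat from CSSD at epoch seconds 1347073071.971, 28368 milliseconds ago "
    "based on invariant clock value of 2289749493"
)

fields = parse_reboot_advisory(MESSAGE)
overshoot_ms = fields["ago"] - fields["limit"]
print(fields, "overshoot:", overshoot_ms, "ms")
```

For the message above, the last heartbeat is older than the limit by a few tens of milliseconds, which is consistent with CSSD being starved of CPU rather than failing outright.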
- not all threads for ocssd.bin are running at real time priority (0):
/usr/sysv/bin/ps -eLo user,s,pid,lwp,pri,args
oracle S 4323214 4716462 0 /app/11g/grid/11203/bin/ocssd.bin
oracle S 4323214 4716491 60 /app/11g/grid/11203/bin/ocssd.bin
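The check above can be automated by filtering the `ps` output for ocssd.bin threads whose priority differs from 0. This is a minimal sketch that assumes the exact column order requested by `-o user,s,pid,lwp,pri,args`; the function name `non_realtime_threads` is hypothetical, and the text is passed in as a string so the logic can be tried without access to an AIX host.

```python
def non_realtime_threads(ps_output, process="ocssd.bin", rt_priority=0):
    """Return (pid, lwp, pri) for each thread of `process` not at rt_priority.

    Expects lines in the order: user, state, pid, lwp, pri, args.
    """
    offenders = []
    for line in ps_output.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6 or not fields[4].isdigit():
            continue  # header line or unexpected layout
        _user, _state, pid, lwp, pri, args = fields
        if process in args and int(pri) != rt_priority:
            offenders.append((int(pid), int(lwp), int(pri)))
    return offenders

SAMPLE = """\
oracle S 4323214 4716462 0 /app/11g/grid/11203/bin/ocssd.bin
oracle S 4323214 4716491 60 /app/11g/grid/11203/bin/ocssd.bin
"""

offenders = non_realtime_threads(SAMPLE)
for pid, lwp, pri in offenders:
    print(f"pid {pid} thread {lwp} runs at priority {pri}, not real time")
```

On a live system, the string could come from running the `ps` command shown above and capturing its standard output.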