A Linux Node in an Active Passive CRS Cluster Rebooted After the CSS Service Reported an Abnormal Termination (Doc ID 2615113.1)

Last updated on DECEMBER 24, 2019

Applies to:

Linux OS - Version Oracle Linux 6.0 and later
Information in this document applies to any platform.

Symptoms

An Oracle Linux 6 node in an active/passive CRS cluster rebooted unexpectedly after the CSS service reported an abnormal termination. The following was seen in the CRS alert log:

2019-09-09 23:46:01.985 [CRSD(4160)]CRS-2878: Failed to restart resource 'ptcprd.db'
2019-09-09 23:48:49.137 [OCSSD(3722)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /obin/oracle/app/grid/diag/crs/<HOSTNAME>/crs/trace/ocssd.trc
2019-09-09 23:48:49.181 [OCSSD(3722)]CRS-1652: Starting clean up of CRSD resources.

No other errors or issues were found in the CRS logs.

Available OSWatcher performance data shows a high load, with some application processes stuck in the D (uninterruptible sleep) state:

zzz ***Mon Sep 9 23:48:54 EDT 2019
top - 23:48:55 up 347 days, 15:14, 0 users, load average: 18.80, 13.68, 12.61
Tasks: 257 total, 9 running, 248 sleeping, 0 stopped, 0 zombie
Cpu(s): 15.3%us, 2.0%sy, 0.0%ni, 33.7%id, 48.7%wa, 0.0%hi, 0.0%si, 0.3%st
Mem: 23645396k total, 20700396k used, 2945000k free, 594788k buffers
Swap: 37748732k total, 905908k used, 36842824k free, 16036240k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
30197 oracle 20 0 4352 576 392 S 56.3 0.0 2:27.63 gzip
20696 grid 20 0 1411m 29m 27m D 5.9 0.1 2:05.69 oracle_20696_+a
4144 root RT 0 1058m 118m 78m S 3.0 0.5 2606:27 osysmond.bin
5821 root 20 0 2257m 40m 4692 S 2.0 0.2 296:18.81 ir_agent
26299 root RT -5 864m 199m 79m S 2.0 0.9 476:39.28 ologgerd
30196 oracle 20 0 113m 1336 1012 D 2.0 0.0 0:05.63 tar
3469 root 20 0 1839m 39m 19m S 1.0 0.2 2472:06 ohasd.bin
3607 grid 20 0 676m 23m 16m S 1.0 0.1 2241:56 evmd.bin
3686 root RT 0 1057m 116m 78m S 1.0 0.5 814:29.33 cssdmonitor
4060 grid -2 0 1408m 15m 15m S 1.0 0.1 3458:42 asm_vktm_+asm2
4160 root 20 0 2269m 52m 25m S 1.0 0.2 2954:43 crsd.bin
4239 grid 20 0 2227m 39m 19m S 1.0 0.2 1413:41 oraagent.bin
20285 grid 20 0 742m 19m 12m S 1.0 0.1 80:29.06 scriptagent.bin
1 root 20 0 19416 1100 884 S 0.0 0.0 3:07.94 init
2 root 20 0 0 0 0 S 0.0 0.0 0:02.44 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 6:28.26 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
6 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kworker/u:0
7 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/u:0H
8 root RT 0 0 0 0 S 0.0 0.0 3:40.53 migration/0
9 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
10 root 20 0 0 0 0 S 0.0 0.0 307:12.27 rcu_sched
11 root RT 0 0 0 0 S 0.0 0.0 1:22.59 watchdog/0
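
As an illustration of how D-state processes can be picked out of a top snapshot like the one above, here is a minimal Python sketch. It assumes the 12-column process layout shown in this capture (state in the eighth column); the function name find_d_state and the sample list are hypothetical and are not part of OSWatcher or any Oracle tool.

```python
def find_d_state(top_lines):
    """Return (pid, user, command) for rows whose S (state) column is 'D'."""
    stuck = []
    for line in top_lines:
        fields = line.split()
        # Process rows have 12 columns:
        # PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        # Header and summary lines do not start with a numeric PID.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        if fields[7] == "D":
            stuck.append((int(fields[0]), fields[1], fields[11]))
    return stuck


# Rows copied from the snapshot above (illustrative subset).
sample = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "30197 oracle 20 0 4352 576 392 S 56.3 0.0 2:27.63 gzip",
    "20696 grid 20 0 1411m 29m 27m D 5.9 0.1 2:05.69 oracle_20696_+a",
    "30196 oracle 20 0 113m 1336 1012 D 2.0 0.0 0:05.63 tar",
]

for pid, user, command in find_d_state(sample):
    print(pid, user, command)
```

In this snapshot the rows flagged as D are the oracle_20696_+a process owned by grid and the tar process owned by oracle.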

I/O performance on the local disk was normal and as expected. However, two Ethernet interfaces on the server (eth0 and eth1) show a high number of dropped receive packets (RX-DRP), as seen below:

zzz ***Mon Sep 9 23:48:24 EDT 2019
Kernel Interface table
Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
eth0 1500 0 682747150 0 21023747 0 1022800103 0 0 0 BMRU
eth0:1 1500 0 - no statistics available - BMRU
eth0:3 1500 0 - no statistics available - BMRU
eth0:4 1500 0 - no statistics available - BMRU
eth1 1500 0 695534403 0 21023748 0 766222333 0 0 0 BMRU
eth1:1 1500 0 - no statistics available - BMRU
lo 65536 0 327720758 0 0 0 327720758 0 0 0 LRU
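
To put the RX-DRP counters above in proportion, the following Python sketch computes the ratio of dropped to successfully received packets per interface. It assumes the 12-column netstat -i layout shown in this capture; the function name rx_drop_ratios is hypothetical, and no threshold for an "acceptable" ratio is implied. Because the counters are cumulative since boot, comparing two snapshots would be needed to see the drop rate during the incident itself.

```python
def rx_drop_ratios(netstat_lines):
    """Map interface name to RX-DRP / RX-OK for rows that carry statistics."""
    ratios = {}
    for line in netstat_lines:
        fields = line.split()
        # Data rows have 12 columns:
        # Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg
        # The header row and alias rows ("no statistics available") are skipped.
        if len(fields) != 12 or not fields[3].isdigit():
            continue
        rx_ok, rx_drp = int(fields[3]), int(fields[5])
        ratios[fields[0]] = rx_drp / rx_ok if rx_ok else 0.0
    return ratios


# Rows copied from the capture above.
sample = [
    "Iface MTU Met RX-OK RX-ERR RX-DRP RX-OVR TX-OK TX-ERR TX-DRP TX-OVR Flg",
    "eth0 1500 0 682747150 0 21023747 0 1022800103 0 0 0 BMRU",
    "eth0:1 1500 0 - no statistics available - BMRU",
    "eth1 1500 0 695534403 0 21023748 0 766222333 0 0 0 BMRU",
    "lo 65536 0 327720758 0 0 0 327720758 0 0 0 LRU",
]

for iface, ratio in rx_drop_ratios(sample).items():
    print(f"{iface}: {ratio:.2%} of received packets dropped")
```

For the capture above, both eth0 and eth1 come out at roughly 3 percent, while lo shows no drops.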



Cause
