Unable To Connect To Database On Two Nodes RAC Due To Hung CRS (Doc ID 2653090.1)

Last updated on JULY 30, 2020

Applies to:

Oracle Database - Enterprise Edition - Version 12.1.0.2 and later
Information in this document applies to any platform.

Symptoms

Receiving CRS-2878: Failed to restart resource 'ora.net1.network' followed by CRS-6015: Oracle Clusterware has experienced an internal error.

+ This is an Exadata X8-2 quarter rack running 7 VMs per compute node, hosting several two-node RAC clusters.

Kernel version: 4.1.12-124.26.12.el6uek.x86_64 #2 SMP Wed May 8 22:12:22 PDT 2019 x86_64

+ At the date and time of the incident, all VMs hit the same network failure. A grep for bondeth0 in the OS log messages shows the following on every VM:

dbxxxx.xxx.com: Mar  7 21:16:32 xxxx systemd-networkd: bondeth0 : Removing non-existent address: xx.xx.xx.x/xx (valid for ever)
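
The grep described above can be sketched in Python so that the matching host and timestamp are captured for each VM. This is only an illustration: the regular expression is derived from the single masked line quoted above, the leading "host:" prefix is assumed to come from a multi-host collection tool, and find_address_removals is a hypothetical helper name.

```python
import re

# Pattern modelled on the one syslog line quoted in this note (an assumption:
# other systemd-networkd versions may format the message differently).
PATTERN = re.compile(
    r"^(?:(?P<source>\S+):\s+)?"                                  # optional "host:" prefix
    r"(?P<stamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"  # e.g. "Mar  7 21:16:32"
    r"(?P<host>\S+)\s+systemd-networkd:\s+"
    r"(?P<iface>\S+)\s*:\s+Removing non-existent address"
)

def find_address_removals(lines, iface="bondeth0"):
    """Return (source, timestamp, host) for each address removal on iface."""
    hits = []
    for line in lines:
        m = PATTERN.match(line)
        if m and m.group("iface") == iface:
            hits.append((m.group("source"), m.group("stamp"), m.group("host")))
    return hits

sample = [
    "dbxxxx.xxx.com: Mar  7 21:16:32 xxxx systemd-networkd: bondeth0 : "
    "Removing non-existent address: xx.xx.xx.x/xx (valid for ever)",
    "dbxxxx.xxx.com: Mar  7 21:16:33 xxxx kernel: unrelated message",
]
for hit in find_address_removals(sample):
    print(hit)
```

If every VM reports the removal at the same second, that supports a common network event rather than a fault inside one guest.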

+ Clusterware alert log.
 
250 PRVG-2031 : Owner of file "/etc/oracle/ocr.loc" did not match the expected value on node "xxxxx". [Expected = "root" ; Found = "vacheck"]  >>>>>>>>>>>>>>>
251 PRVG-13620 : Node application "ora.xxxxx.vip" is offline on node "xxxxx".
252 PRVG-2031 : Owner of file "/etc/oracle/olr.loc" did not match the expected value on node "xxxxx". [Expected = "root" ; Found = "vacheck"]    >>>>>>>>>>>>>>
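
To list every ownership mismatch in a longer verification output, the PRVG-2031 lines can be parsed as in the sketch below. The pattern is inferred from the two lines quoted above and is not an official message grammar; owner_mismatches is a hypothetical helper name.

```python
import re

# Regex inferred from the PRVG-2031 lines quoted in this note (an assumption).
PRVG_2031 = re.compile(
    r'PRVG-2031 : Owner of file "(?P<path>[^"]+)" did not match the expected '
    r'value on node "(?P<node>[^"]+)"\. '
    r'\[Expected = "(?P<expected>[^"]+)" ; Found = "(?P<found>[^"]+)"\]'
)

def owner_mismatches(lines):
    """Return one dict (path, node, expected, found) per PRVG-2031 line."""
    results = []
    for line in lines:
        m = PRVG_2031.search(line)
        if m:
            results.append(m.groupdict())
    return results

sample = [
    '250 PRVG-2031 : Owner of file "/etc/oracle/ocr.loc" did not match the '
    'expected value on node "xxxxx". [Expected = "root" ; Found = "vacheck"]',
    '251 PRVG-13620 : Node application "ora.xxxxx.vip" is offline on node "xxxxx".',
    '252 PRVG-2031 : Owner of file "/etc/oracle/olr.loc" did not match the '
    'expected value on node "xxxxx". [Expected = "root" ; Found = "vacheck"]',
]
for item in owner_mismatches(sample):
    print(item["path"], "expected", item["expected"], "found", item["found"])
```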

+ In the crsd_oraagent_xxx.trc trace file, we can see "UNPLANNED OFFLINE" reported for ora.scan2.vip, ora.scan3.vip and ora.xxx.vip.

[check] VipAgent::check Network check indicated UNPLANNED OFFLINE
2020-03-07 21:16:32.653 :CLSDYNAM:3296012032: [ora.scan3.vip]{1:20191:2} [check] AgentException caught in VipAgent::check()
2020-03-07 21:16:32.653 :CLSDYNAM:3296012032: [ora.scan3.vip]{1:20191:2} [check] VipAgent::check Network check indicated UNPLANNED OFFLINE
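
A short script can collect which resources reported UNPLANNED OFFLINE and when they first did so. The sketch below assumes each trace entry sits on one line in the form "timestamp :COMPONENT:thread: [resource]{id} [check] message", as in the excerpts in this note; unplanned_offline is a hypothetical helper name.

```python
import re

# Line layout assumed from the agent trace excerpts in this note.
LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) :\s*(?P<comp>\w+):\d+: "
    r"\[(?P<resource>[^\]]+)\]\{[^}]*\} \[check\] (?P<msg>.*)$"
)

def unplanned_offline(lines):
    """Map each resource to the first timestamp at which it went UNPLANNED OFFLINE."""
    first_seen = {}
    for line in lines:
        m = LINE.match(line.strip())
        if m and "UNPLANNED OFFLINE" in m.group("msg"):
            first_seen.setdefault(m.group("resource"), m.group("ts"))
    return first_seen

sample = [
    "2020-03-07 21:16:32.653 :CLSDYNAM:3296012032: [ora.scan3.vip]{1:20191:2} "
    "[check] AgentException caught in VipAgent::check()",
    "2020-03-07 21:16:32.653 :CLSDYNAM:3296012032: [ora.scan3.vip]{1:20191:2} "
    "[check] VipAgent::check Network check indicated UNPLANNED OFFLINE",
]
print(unplanned_offline(sample))
```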

+ The crsd.trc file gives more detail on error CRS-6015: an assertion failure is thrown from clsPeResourceInstance3.cpp.

Incident 81 created, dump file: /u01/app/grid/diag/crs/xxxx/crs/incident/incdir_81/crsd_i81.trc
CRS-6015 [] [] [] [] [] [] [] [] [] [] [] []
2020-03-07 21:16:38.508 :GIPCHTHR:1069537024:  gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30050loopCount 54

From incident/incdir_81/crsd_i81.trc

----- Invocation Context Dump -----
Error Descriptor: CRS-6015 [] [] [] [] [] [] [] [] [] [] [] []
Error class: 0
Problem Key # of args: 0
Number of actions: 10
----- Incident Context Dump -----
Address: 0x7f451c024fb8
Incident ID: 81
Problem Key: CRS 6015
Error: CRS-6015 [] [] [] [] [] [] [] [] [] [] [] []
[00]: dbgexProcessError [diag_dde]
[01]: dbgePostErrorDirectVaList_int [diag_dde]
[02]: dbgePostErrorDirect [diag_dde]
[03]: clsdAdrPostError []
[04]: clsdadrpr_CreateIncidentCheck []
[05]: clsdadrprAlert []
[06]: clsd_alertprintft []
[07]: _ZN9CLS_Debug6CLSlog5alertERSojPKcP8clsdmctx []<-- Signaling
[08]: _ZN9CLS_Debug9clsAssertEbPKhS1_i []
[09]: _ZNK7cls_pe316ResourceInstance18translateAgfwStateEjNS0_6ActionES1_ []
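
Frames [07] to [09] are mangled C++ symbols. The sketch below recovers only their qualified names, assuming the Itanium C++ ABI scheme used by GCC on Linux; it ignores argument types, templates and substitutions, so it is not a general demangler (a tool such as c++filt serves that purpose). qualified_name is a hypothetical helper name.

```python
def qualified_name(symbol):
    """Return the '::'-joined nested name of a mangled symbol, or None.

    Handles only the plain _ZN[K]<len><id>...E form seen in this stack.
    """
    if not symbol.startswith("_ZN"):
        return None
    pos = 3
    if pos < len(symbol) and symbol[pos] == "K":  # 'K' marks a const member function
        pos += 1
    parts = []
    while pos < len(symbol) and symbol[pos].isdigit():
        end = pos
        while end < len(symbol) and symbol[end].isdigit():
            end += 1
        length = int(symbol[pos:end])             # decimal length prefix
        parts.append(symbol[end:end + length])    # identifier of that length
        pos = end + length
    return "::".join(parts)

frames = [
    "_ZN9CLS_Debug6CLSlog5alertERSojPKcP8clsdmctx",
    "_ZN9CLS_Debug9clsAssertEbPKhS1_i",
    "_ZNK7cls_pe316ResourceInstance18translateAgfwStateEjNS0_6ActionES1_",
]
for frame in frames:
    print(qualified_name(frame))
# CLS_Debug::CLSlog::alert
# CLS_Debug::clsAssert
# cls_pe3::ResourceInstance::translateAgfwState
```

Read this way, the assertion is raised by clsAssert, called from cls_pe3::ResourceInstance::translateAgfwState. Note that the namespace is cls_pe3 (length prefix 7) followed by ResourceInstance (length prefix 16), not "cls_pe316".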
 
+ Resource ora.net1.network goes offline.

2020-03-07 21:16:37.772 :    AGFW:3323545344: [     INFO] {0:1:99} ora.net1.network xxxx 1 state changed from: STARTING to: OFFLINE
2020-03-07 21:16:37.772 :    AGFW:3323545344: [     INFO] {0:1:99} Switching online monitor to offline one
 
+ crsd_orarootagent_vacheck.trc

2020-03-07 21:16:29.823 : default:3319342848: ICMP Ping from 10.60.13.8 to 10.60.13.254
2020-03-07 21:16:32.230 :CLSDYNAM:3319342848: [ora.net1.network]{1:20191:2} [check] NetworkAgent::check checkLinkStatus returned false
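
To see the order of events, the timestamped entries quoted in this note can be merged into one timeline. The sketch below assumes every entry starts with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp followed by " :"; timeline is a hypothetical helper name, and the sample lines are taken from the excerpts in this note.

```python
from datetime import datetime

def timeline(lines):
    """Sort trace lines by timestamp; return (seconds since first event, text)."""
    events = []
    for line in lines:
        stamp, _, rest = line.strip().partition(" :")
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f")
        events.append((when, rest.strip()))
    events.sort(key=lambda event: event[0])
    start = events[0][0]
    return [((when - start).total_seconds(), text) for when, text in events]

sample = [
    "2020-03-07 21:16:37.772 :    AGFW:3323545344: [     INFO] {0:1:99} "
    "ora.net1.network xxxx 1 state changed from: STARTING to: OFFLINE",
    "2020-03-07 21:16:29.823 : default:3319342848: ICMP Ping from 10.60.13.8 to 10.60.13.254",
    "2020-03-07 21:16:32.653 :CLSDYNAM:3296012032: [ora.scan3.vip]{1:20191:2} "
    "[check] VipAgent::check Network check indicated UNPLANNED OFFLINE",
    "2020-03-07 21:16:32.230 :CLSDYNAM:3319342848: [ora.net1.network]{1:20191:2} "
    "[check] NetworkAgent::check checkLinkStatus returned false",
]
for offset, text in timeline(sample):
    print(f"+{offset:6.3f}s  {text}")
```

With these four entries the order is: gateway ping, failed link check on ora.net1.network, VIP reported UNPLANNED OFFLINE, then ora.net1.network moving from STARTING to OFFLINE.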
 

Changes

 

Cause


