ASM Crashes as HAIP Does not Failover When Two or More Private Network Fails (Doc ID 1323995.1)

Last updated on AUGUST 06, 2014

Applies to:

Oracle Database - Enterprise Edition - Version 11.2.0.2 to 11.2.0.2 [Release 11.2]
Information in this document applies to any platform.

Symptoms


When two or more cluster interconnects fail, HAIP does not fail over because orarootagent.bin core dumps; as a result, the ASM/DB instance may crash. The private network failure does not have to be local: if two or more private networks fail on a remote node while the local node has no such issue, orarootagent can still core dump on the local node.


Apr 11 14:59:13 srac1 nxge: [ID 339653 kern.notice] NOTICE: nxge7: xcvr addr:0x0a - link is down
Apr 11 15:06:41 srac1 nxge: [ID 339653 kern.notice] NOTICE: nxge6: xcvr addr:0x0b - link is down
..
2011-04-11 15:06:54.293: [ CRSCOMM][21][FFAIL] Ipc: Couldnt clscreceive message, no message: 11
2011-04-11 15:06:54.293: [ CRSCOMM][21] Ipc: Client disconnected.
2011-04-11 15:06:54.293: [ CRSCOMM][21][FFAIL] IpcL: Listener got clsc error 11 for memNum. 11
2011-04-11 15:06:54.293: [ CRSCOMM][21] IpcL: connection to member 11 has been removed
2011-04-11 15:06:54.293: [CLSFRAME][21] Removing IPC Member:{Relative|Node:0|Process:11|Type:3}
2011-04-11 15:06:54.293: [CLSFRAME][21] Disconnected from AGENT process: {Relative|Node:0|Process:11|Type:3}
2011-04-11 15:06:54.294: [   CRSPE][29] {0:0:1262} Disconnected from server:
2011-04-11 15:06:54.294: [    AGFW][24] {0:0:1264} Agfw Proxy Server received process disconnected notification, count=1
2011-04-11 15:06:54.294: [    AGFW][24] {0:0:1264} /ocw/grid/bin/orarootagent_root disconnected.
2011-04-11 15:06:54.294: [    AGFW][24] {0:0:1264} Agent /ocw/grid/bin/orarootagent_root[27021] stopped!
2011-04-11 15:06:54.294: [ CRSCOMM][24] {0:0:1264} IpcL: removeConnection: Member 11 does not exist.
2011-04-11 15:06:54.294: [    AGFW][24] {0:0:1264} Restarting the agent /ocw/grid/bin/orarootagent_root
2011-04-11 15:06:54.294: [    AGFW][24] {0:0:1264} Starting the agent: /ocw/grid/bin/orarootagent with user id: root and incarnation:6
2011-04-11 15:06:54.333: [    AGFW][24] {0:0:1264} Starting the HB [Interval =  30000, misscount = 6kill allowed=1] for agent: /ocw/grid/bin/orarootagent_root
2011-04-11 15:06:42.047: [    AGFW][10] {0:0:902} Agent received the message: AGENT_HB[Engine] ID 12293:26724
2011-04-11 15:06:42.590: [ora.crf][43] {0:0:892} [check] clsdmc_respget return: status=0, ecode=0
2011-04-11 15:06:42.590: [ora.crf][43] {0:0:892} [check] Check return = 0, state detail = NULL

>> orarootagent terminated and restarted

2011-04-11 15:06:54.484: [    AGFW][1] Starting the agent: /ocw/grid/log/srac2/agent/ohasd/orarootagent_root/
2011-04-11 15:06:54.484: [   AGENT][1] Agent framework initialized, Process Id = 24308
2011-04-11 15:06:54.488: [ USRTHRD][1] Utils::getCrsHome crsHome /ocw/grid
>>
mutex_lock_impl(0x40000000130, 0x0, 0xfffffd7fff7c0f30, 0x88, 0x0,
mutex_lock(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7fff2a37f8
lfiwr(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ffc7df997
clsdf_nativewrite(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ff4b06647
clsdprln_native(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ff4b09b26
clsd_logThread(0x0, 0x0, 0x0, 0x0, 0x0, 0x0), at 0xfffffd7ff4b0299c
SKGXP: ospid 5553: network interface with IP address 169.254.164.135 no longer running (check cable)

IPC Send timeout detected. Receiver ospid 5551

Received an instance abort message from instance 3


>>>>
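
The log signatures above come from several different files (OS messages, the Clusterware and agent logs, and the instance alert log). As a rough aid for checking whether a set of collected logs shows the same pattern, the following Python sketch searches for them. It is not an Oracle-supplied tool. The regular expressions are copied from the excerpts in this note; the function names, the grouping into named symptoms, and the "two distinct links down plus an orarootagent stop" heuristic are assumptions made for illustration only.

```python
import re

# Patterns copied from the log excerpts in this note. The symptom names
# and the overall heuristic are illustrative assumptions, not Oracle APIs.
SIGNATURES = {
    # OS messages file: private NIC link failure (nxge driver wording)
    "link_down": re.compile(r"NOTICE: (\S+): xcvr addr:\S+ - link is down"),
    # Clusterware log: the root agent process went away
    "agent_stopped": re.compile(r"Agent (\S*orarootagent_root)\[\d+\] stopped!"),
    # Instance alert log: HAIP address no longer usable
    "haip_interface_down": re.compile(
        r"SKGXP: ospid \d+: network interface with IP address (\S+) no longer running"
    ),
    # Instance alert log: instance told to abort by a peer
    "instance_abort": re.compile(
        r"Received an instance abort message from instance (\d+)"
    ),
}


def scan_lines(lines):
    """Return {symptom name: [captured values]} for every signature found."""
    hits = {name: [] for name in SIGNATURES}
    for line in lines:
        for name, pattern in SIGNATURES.items():
            match = pattern.search(line)
            if match:
                hits[name].append(match.group(1))
    return hits


def matches_symptom(hits):
    """Assumed heuristic: at least two distinct links down and an agent stop."""
    return len(set(hits["link_down"])) >= 2 and bool(hits["agent_stopped"])
```

Because the link failures may have occurred on a remote node, the lines passed to scan_lines would normally be gathered from all cluster nodes before applying the heuristic. Other NIC drivers and platforms word the link-down message differently, so the first pattern would need adjusting outside the environment shown here.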

Cause
