Bug 28298447: Cluster crashed due to a Mellanox driver-related issue (Doc ID 2460394.1)

Last updated on JANUARY 02, 2020

Applies to:

Oracle Database - Enterprise Edition - Version 12.2.0.1 and later
Information in this document applies to any platform.

Symptoms

CRS crashed because the voting disks became inaccessible.

 

2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1034]{0}: skgxp_dmpaggpt: 0x7f01f00b57a0 flags: 0x9
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1035]{0}: SSKGXPT 0x7f01f00b57a0 flags 0x2 { WRITE } sockno 117 IP xxx.xxx.xxx.16 RDS 53382 lerr 0
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1036]{0}: SSKGXPT 0x7f01f00b57e8 flags 0x0 sockno 118 IP xxx.xxx.xxx.16 RDS 53382 lerr 11
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1037]{0}: SKGXPID 0xf00b55c4 vers 0 conproto 1 flags 8 magic 4c89
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1038]{0}: skgxp port number 0x25334837 process id 12809 admno 71d65b35 pad1 0 pad2 0
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1039]{0}: admin port id
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1040]{0}: SKGXPGPID Internet address xxx.xxx.xxx.15 RDS port number 57046, mask 22
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1041]{0}: SKGXPGPID Internet address xxx.xxx.xxx.16 RDS port number 53382, mask 22
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1042]{0}: from
2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1043]{0}: SKGXPGPID Internet address xxx.xxx.xxx.16 RDS port number 53382, mask 0
2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.231]{0}: (241476 -> 4404) SKGXP_SEND_HEART_BEAT: failed to send heart beat to [xxx.xxx.xxx.17/34060]
2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.232]{0}: SKGXP_DO_HEART_BEAT_RESP: NO HB PENDING source: 1 (max 2) in response from xxx.xxx.xxx.17 mhbr 192.168.108.0/529
2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.233]{0}: skgxp_dmpaggpt: 0x7f0204142530 flags: 0x9
2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.234]{0}: SSKGXPT 0x7f0204142530 flags 0x0 sockno 105 IP xxx.xxx.xxx.17 RDS 34060 lerr 11
2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.235]{0}: SSKGXPT 0x7f0204142578 flags 0x2 { WRITE } sockno 106 IP xxx.xxx.xxx.17 RDS 34060 lerr 0
2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.236]{0}: SKGXPID 0x41424e4 vers 0 conproto 1 flags 8 magic 4c89
2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.237]{0}: skgxp port number 0x58a22e69 process id 4404 admno 15c9378d pad1 0 pad2 0
2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.238]{0}: admin port id

2018-09-06 16:38:42.004 : DISKMON:477083392: SKGXP:[7f020c1c01b0.253]{0}: (241476 -> 19415) SKGXP_SEND_HEART_BEAT: failed to send heart beat to [xxx.xxx.xxx.19/43140]
2018-09-06 16:38:42.004 : DISKMON:480040704: dskm_hb_thrd_main11: got status change
2018-09-06 16:38:42.004 : DISKMON:480040704: INFO: Entering Cell Reconnect: rscnam: o/xxx.xxx.xxx.15;xxx.xxx.xxx.16 rsc: 0x7f01f005f7e0 state: UNREACHABLE reconn_attempts: 4 last_reconn_ts: 1536189900
2018-09-06 16:38:42.004 : DISKMON:480040704: dskm_ant_rsc_monitor_start: clearing short timeout mode

2018-09-06 16:38:50.720 : DISKMON:487200512: dskm_node_guids_are_offline: query SM done. retcode = 56891(REACHABLE)
2018-09-06 16:38:50.720 : DISKMON:487200512: dskm_hb_thrd_main: running in short sleep mode tout 500 msec iters: 2
2018-09-06 16:38:51.007 : DISKMON:477083392: SKGXP:[7f020c1c01b0.319]{0}: (241476 -> 19415) SKGXP_SEND_HEART_BEAT: failed to send heart beat to [xxx.xxx.xxx.19/43140]
2018-09-06 16:38:51.008 : DISKMON:477083392: SKGXP:[7f020c1c01b0.320]{0}: (241476 -> 19415) SKGXP_SEND_HEART_BEAT: failed to send heart beat to [xxx.xxx.xxx.20/10348]
2018-09-06 16:38:51.013 : DISKMON:487200512: SKGXP:[7f020c25e340.419]{0}: (241476 -> 4404) SKGXP_SEND_HEART_BEAT: failed to send heart beat to [xxx.xxx.xxx.17/34060]

2018-09-06 16:46:41.395 : DISKMON:608233216: dskm_process_isc: clearing short timeout mode. rsc: o/xxx.xxx.xxx.19;xxx.xxx.xxx.20 (0x7f36f805f7e0) communication appears to be stabilized
2018-09-06 16:46:41.395 : DISKMON:608233216: dskm_process_isc4: skgxpnetmapcompare no changes found
2018-09-06 16:46:41.395 : DISKMON:611133184: dskm_oss_thrd_main2: posted

File name: diskmon.trc

The sendmsg calls are failing with "lerr 11".
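
To gauge how widespread the failures are, the trace can be summarized per target. The following is a minimal sketch, not an Oracle-supplied tool: the regular expressions are inferred only from the excerpt above, and the names HB_FAIL, SOCK_ERR, summarize and SAMPLE are hypothetical. It counts failed heartbeat sends per IP/port and non-zero "lerr" values per socket IP.

```python
import re
from collections import Counter

# Both patterns are assumptions derived from the excerpt above,
# not from any documented diskmon.trc format.
HB_FAIL = re.compile(
    r"SKGXP_SEND_HEART_BEAT: failed to send heart beat to \[([^/\]]+)/(\d+)\]"
)
SOCK_ERR = re.compile(
    r"SSKGXPT \S+ flags \S+(?: \{[^}]*\})? sockno (\d+) IP (\S+) RDS (\d+) lerr (\d+)"
)


def summarize(lines):
    """Return (heartbeat failures per (ip, port), non-zero lerr counts per (ip, lerr))."""
    hb_failures = Counter()
    socket_errors = Counter()
    for line in lines:
        match = HB_FAIL.search(line)
        if match:
            hb_failures[(match.group(1), int(match.group(2)))] += 1
            continue
        match = SOCK_ERR.search(line)
        if match and int(match.group(4)) != 0:
            socket_errors[(match.group(2), int(match.group(4)))] += 1
    return hb_failures, socket_errors


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1035]{0}: "
    "SSKGXPT 0x7f01f00b57a0 flags 0x2 { WRITE } sockno 117 IP xxx.xxx.xxx.16 RDS 53382 lerr 0",
    "2018-09-06 16:38:41.005 : DISKMON:480040704: SKGXP:[7f020c1462d0.1036]{0}: "
    "SSKGXPT 0x7f01f00b57e8 flags 0x0 sockno 118 IP xxx.xxx.xxx.16 RDS 53382 lerr 11",
    "2018-09-06 16:38:41.012 : DISKMON:487200512: SKGXP:[7f020c25e340.231]{0}: "
    "(241476 -> 4404) SKGXP_SEND_HEART_BEAT: failed to send heart beat to [xxx.xxx.xxx.17/34060]",
    "2018-09-06 16:38:51.013 : DISKMON:487200512: SKGXP:[7f020c25e340.419]{0}: "
    "(241476 -> 4404) SKGXP_SEND_HEART_BEAT: failed to send heart beat to [xxx.xxx.xxx.17/34060]",
]

hb_failures, socket_errors = summarize(SAMPLE)
for target, count in hb_failures.most_common():
    print("heartbeat failures", target, count)
for key, count in socket_errors.most_common():
    print("socket errors", key, count)
```

Because summarize accepts any iterable of lines, an open file handle for diskmon.trc can be passed to it directly instead of SAMPLE.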

Cause

To view full details, sign in with your My Oracle Support account.
