Infiniband Switch ib02 Subnet Manager Stuck In DISCOVER State (Doc ID 2344681.1)

Last updated on JULY 01, 2021

Applies to:

Exalogic Elastic Cloud X3-2 Hardware - Version X3 and later
Sun Datacenter InfiniBand Switch 36 - Version All Versions and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions and later
Information in this document applies to any platform.

Symptoms

The Subnet Manager on InfiniBand switch ib02 is stuck in the DISCOVER state, as shown by the getmaster output below.

[root@ib02 ~]# showfruinfo

Sun_Man1R:
  UNIX_Timestamp32 : Thu Jul 27 09:15:03 2017
  Sun_Fru_Description : ASSY,NM2-GW
  Vendor_ID_Code : 13 A6
  Vendor_ID_Code_Source : 01
  Vendor_Name_And_Site_Location : 5030 CELESTICA CORP. SRIRACHA CHONBURI TH
  Sun_Part_Number : 7057249
  Sun_Serial_Number : 465769T+1326RT02NG
  Serial_Number_Format : 4V3F1-2Y2W2X4S
  Initial_HW_Dash_Level : 99
  Initial_HW_Rev_Level : 01
  Sun_Fru_Shortname : NM2 gateway
  Sun_Hazard_Class_Code : Y
  Sun_SpecPartNo : 7054735

Sun_FRU_LabelR:
  Sun_Serial_Number : AKXXXXXXX
  FRU_Part_Dash_Number : 7054724


[root@ib02 ~]# getmaster
Local SM enabled and running, state DISCOVER <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Last change in Master SubnetManager status detected at: Thu Dec 28 13:53:55 GMT 2017
Master SubnetManager on sm lid 1 sm guid 0x10XXXXXXXXaaa0 : SUN IB QDR GW switch ib03 192.X.X.23
Master SubnetManager Activity Count: 31857 Priority: 5

 


Action Taken
[root@ib02 ~]# smsubnetprotection list active
No active secret mkeys configured on the system

[root@ib03 ~]# smsubnetprotection list active
# File_format_version_number 1
# Sun DCS IB mkey config file
# This file is generated, do not edit
# secretmkey=enabled
# nodeid=ib02.<domain>.com
# time= 4 Sep 23:38:30
# checksum=76e970f569a14507faab158ed4e9a40d
#! commit_number : 3
Mkey Untrusted Mkey Smkey Attribute
------------------ ------------------ ------------------ ---------
0xa00000000XXXXXXX 0xafecb1b0cad65d59 0x35cc89c81432d02e C
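
The two outputs above differ: ib02 reports no active secret M_Key, while ib03 lists one. As a minimal sketch, assuming the output of `smsubnetprotection list active` has already been captured into a string on each switch, the Python helper below tells the two cases apart. The function name `has_active_secret_mkey` is hypothetical and is not part of the switch firmware; the row pattern is inferred only from the sample output shown here.

```python
import re

# Message printed by ib02 in the output above.
NO_KEYS_MSG = "No active secret mkeys configured on the system"

# A key row as it appears in the ib03 output: three 0x-prefixed values and
# an attribute column. Masked digits ("X") in the first column are tolerated
# because the article masks them; this pattern is an assumption.
MKEY_ROW = re.compile(
    r"^0x[0-9a-fA-FX]+\s+0x[0-9a-fA-F]+\s+0x[0-9a-fA-F]+\s+\S+$"
)


def has_active_secret_mkey(output: str) -> bool:
    """Return True if the captured command output lists at least one key row."""
    if NO_KEYS_MSG in output:
        return False
    return any(MKEY_ROW.match(line.strip()) for line in output.splitlines())
```

Running the helper against the captured output of each switch in the fabric makes a mismatch such as the one between ib02 and ib03 easy to spot.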

 

The following is logged in /var/log/messages:
Dec 28 09:38:42 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:38:52 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:39:02 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:39:12 ib02 OpenSM[3532]: SM port is down#012

And the following is logged in /var/log/opensm.log:

Dec 28 15:00:07 419471 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10XXXXXXXXbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:07 419471 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:07 420470 [B5D07B70] 0x80 -> SM port is down
SM port is down

Dec 28 15:00:17 426422 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10XXXXXXXXbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:17 426422 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:17 427422 [B5D07B70] 0x80 -> SM port is down
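
The log excerpts above repeat three messages: "has unknown M_Key", "SM is fenced out", and "SM port is down". As a rough sketch, assuming the contents of /var/log/opensm.log have been read into a string, the following Python counts how often each message occurs. The name `count_signatures` and the signature labels are hypothetical; the matched strings are taken verbatim from the excerpts in this document.

```python
import re
from collections import Counter

# Message fragments copied from the log excerpts in this document.
SIGNATURES = {
    "unknown_mkey": re.compile(r"has unknown M_Key"),
    "fenced_out": re.compile(r"SM is fenced out"),
    "port_down": re.compile(r"SM port is down"),
}


def count_signatures(log_text: str) -> Counter:
    """Count the lines of log_text that contain each known message."""
    counts = Counter()
    for line in log_text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts
```

A nonzero count for all three labels on a switch matches the pattern shown in this document; the sketch only reports occurrences and does not diagnose the cause.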

Changes

 

Cause




