InfiniBand Switch ib02 Subnet Manager Stuck In DISCOVER State
(Doc ID 2344681.1)
Last updated on JULY 01, 2021
Applies to:
Exalogic Elastic Cloud X3-2 Hardware - Version X3 and later
Sun Datacenter InfiniBand Switch 36 - Version All Versions and later
Sun Network QDR InfiniBand Gateway Switch - Version All Versions and later
Information in this document applies to any platform.
Symptoms
The Subnet Manager on InfiniBand switch ib02 is stuck in the DISCOVER state.
[root@ib02 ~]# showfruinfo
Sun_Man1R:
UNIX_Timestamp32 : Thu Jul 27 09:15:03 2017
Sun_Fru_Description : ASSY,NM2-GW
Vendor_ID_Code : 13 A6
Vendor_ID_Code_Source : 01
Vendor_Name_And_Site_Location : 5030 CELESTICA CORP. SRIRACHA CHONBURI TH
Sun_Part_Number : 7057249
Sun_Serial_Number : 465769T+1326RT02NG
Serial_Number_Format : 4V3F1-2Y2W2X4S
Initial_HW_Dash_Level : 99
Initial_HW_Rev_Level : 01
Sun_Fru_Shortname : NM2 gateway
Sun_Hazard_Class_Code : Y
Sun_SpecPartNo : 7054735
Sun_FRU_LabelR:
Sun_Serial_Number : AKXXXXXXX
FRU_Part_Dash_Number : 7054724
[root@ib02 ~]# getmaster
Local SM enabled and running, state DISCOVER <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Last change in Master SubnetManager status detected at: Thu Dec 28 13:53:55 GMT 2017
Master SubnetManager on sm lid 1 sm guid 0x10XXXXXXXXaaa0 : SUN IB QDR GW switch ib03 192.X.X.23
Master SubnetManager Activity Count: 31857 Priority: 5
Action Taken
[root@ib02 ~]# smsubnetprotection list active
No active secret mkeys configured on the system
[root@ib03 ~]# smsubnetprotection list active
# File_format_version_number 1
# Sun DCS IB mkey config file
# This file is generated, do not edit
# secretmkey=enabled
# nodeid=ib02.<domain>.com
# time= 4 Sep 23:38:30
# checksum=76e970f569a14507faab158ed4e9a40d
#! commit_number : 3
Mkey Untrusted Mkey Smkey Attribute
------------------ ------------------ ------------------ ---------
0xa00000000XXXXXXX 0xafecb1b0cad65d59 0x35cc89c81432d02e C
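The two outputs above differ: ib02 reports no active secret M_Keys, while ib03 lists one. A minimal sketch of how that comparison could be automated on captured command output follows. It assumes the table layout shown above (three 0x-prefixed columns followed by an attribute flag); the function names `active_mkeys` and `mkey_config_differs` are hypothetical helpers for illustration, not part of the switch firmware.

```python
import re

# Assumed row layout of `smsubnetprotection list active`, taken from the
# sample output above: Mkey, Untrusted Mkey, Smkey, Attribute.
ROW = re.compile(r"^(0x\S+)\s+(0x\S+)\s+(0x\S+)\s+(\S+)$")


def active_mkeys(output):
    """Return the Mkey column of every table row found in the output."""
    keys = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match:
            keys.append(match.group(1))
    return keys


def mkey_config_differs(output_a, output_b):
    """True when the two switches do not list the same set of active Mkeys."""
    return set(active_mkeys(output_a)) != set(active_mkeys(output_b))
```

Feeding the captured ib02 and ib03 outputs to `mkey_config_differs` flags the mismatch without having to read the tables by eye.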
The following is logged in /var/log/messages:
Dec 28 09:38:42 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:38:52 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:39:02 ib02 OpenSM[3532]: SM port is down#012
Dec 28 09:39:12 ib02 OpenSM[3532]: SM port is down#012
And the following is logged in /var/log/opensm.log:
Dec 28 15:00:07 419471 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10XXXXXXXXbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:07 419471 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:07 420470 [B5D07B70] 0x80 -> SM port is down
SM port is down
Dec 28 15:00:17 426422 [B6D09B70] 0x02 -> osm_pi_rcv_process_probe: Port 0x10XXXXXXXXbbb0 has unknown M_Key, protection level 1
Dec 28 15:00:17 426422 [B5D07B70] 0x02 -> state_mgr_is_sm_port_down: SM is fenced out
Dec 28 15:00:17 427422 [B5D07B70] 0x80 -> SM port is down
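The three messages in the excerpt above repeat at each sweep. As a rough sketch, a copy of the log could be scanned for them as shown here. The message strings are taken from the excerpt; the function name `summarize` and the counter keys are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Message patterns copied from the opensm.log excerpt above.
PATTERNS = {
    "unknown_mkey": re.compile(r"Port (0x\S+) has unknown M_Key"),
    "fenced_out": re.compile(r"SM is fenced out"),
    "port_down": re.compile(r"SM port is down"),
}


def summarize(lines):
    """Count each message type and collect the ports reported with an unknown M_Key."""
    counts = Counter()
    ports = set()
    for line in lines:
        for name, pattern in PATTERNS.items():
            match = pattern.search(line)
            if match:
                counts[name] += 1
                if name == "unknown_mkey":
                    ports.add(match.group(1))
    return counts, ports
```

For example, `summarize(open("/var/log/opensm.log"))` on a copy of the log returns the per-message counts together with the set of port GUIDs that were reported with an unknown M_Key.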
Changes
Cause