Exalogic Infiniband Switch Replacement - Follow-up Actions (Restoration) (Doc ID 2218689.1)

Last updated on MARCH 07, 2022

Applies to:

Exalogic Elastic Cloud X4-2 Eighth Rack - Version X6 to X6 [Release X6]
Exalogic Elastic Cloud X3-2 Hardware - Version X3 to X3 [Release X3]
Exalogic Elastic Cloud X4-2 Eighth Rack - Version X4 to X4 [Release X4]
Oracle Exalogic Elastic Cloud Software - Version 2.0.0.0.0 and later
Linux x86-64
Oracle Solaris on x86-64 (64-bit)
Oracle Virtual Server x86-64






Purpose

This Note provides follow-up reconfiguration steps for restoring the Exalogic Infiniband (IB) switch configuration after replacement.

For Exalogic racks running the January 2018 PSU (2.0.6.3.180116), follow the steps in the MOS Note below to restore the switch configuration.

<Note 2482924.1>: Exalogic: How To Restore Infiniband Switch Configuration After Switch Replacement In Exalogic Racks Running with January 2018 PSU 2.0.6.3.180116

This document is intended to be used immediately after physical replacement steps have been completed. 

Before the customer runs these procedures, an Oracle Field Engineer should have completed the switch replacement by following the two Canned Action Plan documents below.

<Note 1383773.1>: How to Replace a Failed Sun Network QDR InfiniBand Gateway Switch 
<Note 1341658.1>: How to Replace a Failed Sun Datacenter InfiniBand Switch 36 

Scope

Infiniband Switch replacements in Exalogic racks (Physical and Virtual).

Details


In this Document
Purpose
Scope
Details
 1. Validate the Firmware version on the Newly Replaced Switch
 2. Validate whether Subnet Manager is disabled on the newly replaced switch
 3. Validate the SM controller_handover setting on the other switch that is currently running the Subnet Manager Master (SM Master) in the fabric
 4. Validate that the physical installation of the new switch into the fabric was completed successfully by running the "ibnetdiscover" and "ibswitches" commands.
 5. Change the passwords of the root and ilom-admin users on the replacement switch to their previous values.
 Steps for changing the password for "root" user
 Steps for changing the password for "ilom-admin" user
 6. Update the smnodes list on the newly replaced switch with the IP addresses of all the switches in the Fabric running the subnet manager.
 7. Set SM Priority To Recommended Values on the replaced switch
 8. Validate whether the ocadmin SNMP community exists and create it if it does not exist
 9. Restore the Switch configuration from Exabr backups.
 a. Check whether you are able to log in to the newly replaced switch as the "root" user from Compute Node 1 and the EMOC Control vServer
 b. View a list of backups by running ExaBR list command on replacement Infiniband Switch
 c. Use ExaBR restore command on newly replaced InfiniBand switch to restore the configuration from exabr backups:
 d. Validate if the exabr restore command restored the Switch configuration successfully.
 e. Run the "smpartition check" command on the switch that is currently running SM Master and make sure the partitions are OK
 f. Run the "smpartition start && smpartition commit" command on the other switch that is currently running SM Master to propagate the partitions to the newly replaced switch
 10. Validate whether the VNICs and VLANs are seen after Exabr restore on the newly replaced Switch.
 11. Register the Port GUIDs of Newly Replaced Switch with EoIB Partitions using "exabr ib-register" command.
 12. Validate the status of VNICs and VLANs
 13. Additional restoration steps for Virtual & Hybrid racks with EMOC
 13(a) Remove old SSH keys for the replaced Infiniband Switch IP from the known_hosts file of Proxy Controller VMs PC1 & PC2.
 13(b) Rediscover newly replaced Infiniband Switch from EMOC - Virtual and Hybrid Racks
 Final checkup and verification
 a. Check/set firewall rule settings on port 623
 b. Check the opensm status and smpriorities on all switches in the IB fabric
 c. Check network/fabric is operating normally
 d. Take a fresh Exalogic Control vServers backup using Exabr.
 e. Collect fresh full exalogs from the rack after Switch replacement.
 KNOWN ISSUES WHICH CAN BE ENCOUNTERED DURING SWITCH REPLACEMENT
 Exalogic Exabr ib-register Command To Register New Replaced Infiniband Switch Port GUIDs Fails With "Unable to get rpc version on some nodes in the fabric" Error
 Exalogic: "smpartition start" & "exabr ib-register" Commands Failing With "cli commit is in progress" Error
 Exalogic: SM Status Of New Replaced Standby NM2-GW IB Switches (With Firmware Versions 2.2.7 Or Older) Shown As "DISCOVER" Instead Of "STANDBY"
 Exalogic: Exabr ib-register Command On IB Switches Failing With "Error: partition key <Pkey> does not exist on the Infiniband fabric" Errors
References
