Exalogic Infiniband Switch Replacement - Follow-up Actions (Restoration)
(Doc ID 2218689.1)
Last updated on MARCH 07, 2022
Applies to:
Exalogic Elastic Cloud X4-2 Eighth Rack - Version X6 to X6 [Release X6]
Exalogic Elastic Cloud X3-2 Hardware - Version X3 to X3 [Release X3]
Exalogic Elastic Cloud X4-2 Eighth Rack - Version X4 to X4 [Release X4]
Oracle Exalogic Elastic Cloud Software - Version 2.0.0.0.0 and later
Linux x86-64
Oracle Solaris on x86-64 (64-bit)
Oracle Virtual Server x86-64
Purpose
This Note provides follow-up reconfiguration steps for restoring the Exalogic Infiniband (IB) switch configuration after replacement.
For Exalogic racks running the January 2018 PSU (2.0.6.3.180116), the steps in the following MOS Note must be followed to restore the switch configuration.
This document is intended to be used immediately after physical replacement steps have been completed.
Prior to a customer running these procedures, an Oracle Field Engineer should have completed replacing the switch by following the two Canned Action Plan documents below.
Scope
Infiniband Switch replacements in Exalogic racks (Physical and Virtual).
Details
In this Document
Purpose
Scope
Details
1. Validate the Firmware version on the Newly Replaced Switch
2. Validate whether Subnet Manager is disabled on newly replaced Switch
3. Validate SM controller_handover setting on other switch which is currently running the SUBNET MANAGER MASTER (SM MASTER) in the Fabric
4. Validate that the physical installation of the new switch into the fabric was completed successfully. Run the "ibnetdiscover" and "ibswitches" command.
5. Change the passwords of the root and ilom-admin users on the replacement switch to their previous values.
Steps for changing the password for "root" user
Steps for changing the password for "ilom-admin" user
6. Update the smnodes list on the newly replaced switch with the IP addresses of all the switches in the Fabric running the subnet manager.
7. Set SM Priority To Recommended Values on the replaced switch
8. Validate if ocadmin SNMP community exists and create it if it does not exist
9. Restore the Switch configuration from Exabr backups.
a. Check if you are able to login to newly replaced switch using "root" user from Compute Node 1 and EMOC Control vServer
b. View a list of backups by running ExaBR list command on replacement Infiniband Switch
c. Use ExaBR restore command on newly replaced InfiniBand switch to restore the configuration from exabr backups:
d. Validate if the exabr restore command restored the Switch configuration successfully.
e. Run "smpartition check" Command on Switch that is currently running SM Master and Make sure Partitions are OK
f. Run "smpartition start && smpartition commit" command on the other Switch that is currently running SM Master to propagate the partitions to newly replaced Switch
10. Validate whether the VNICs and VLANs are seen after Exabr restore on the newly replaced Switch.
11. Register the Port GUIDs of Newly Replaced Switch with EoIB Partitions using "exabr ib-register" command.
12. Validate the status of VNICs and VLANs
13. Additional restoration steps for Virtual & Hybrid racks with EMOC
13(a) Remove old SSH keys for Replaced Infiniband Switch IP from known_hosts file of Proxy Controller VM's PC1 & PC2.
13(b) Rediscover newly replaced Infiniband Switch from EMOC - Virtual and Hybrid Racks
Final checkup and verification
a. Check/set firewall rule settings on port 623
b. Check the opensm status and smpriorities on all switches in the IB fabric
c. Check network/fabric is operating normally
d. Take a fresh Exalogic Control vServers backup using Exabr.
e. Collect fresh full exalogs from the rack after Switch replacement.
KNOWN ISSUES WHICH CAN BE ENCOUNTERED DURING SWITCH REPLACEMENT
Exalogic Exabr ib-register Command To Register New Replaced Infiniband Switch Port GUIDs Fails With "Unable to get rpc version on some nodes in the fabric" Error
Exalogic: "smpartition start" & "exabr ib-register" Commands Failing With "cli commit is in progress" Error
Exalogic: SM Status Of New Replaced Standby NM2-GW IB Switches (With Firmware Versions 2.2.7 Or Older) Shown As "DISCOVER" Instead Of "STANDBY"
Exalogic: Exabr ib-register Command On IB Switches Failing With "Error: partition key <Pkey> does not exist on the Infiniband fabric" Errors
References