
How to Prepare an Infiniband (IB) Fabric for Planned Outage of an IB Switch (Doc ID 2140928.1)

Last updated on MAY 16, 2022

Applies to:

Oracle SuperCluster Specific Software
Sun Datacenter InfiniBand Switch 36 - Version All Versions to All Versions [Release All Releases]
Sun Network QDR InfiniBand Gateway Switch - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine V2 - Version All Versions and later
Information in this document applies to any platform.

Purpose

This document describes how to prepare an Infiniband (IB) Fabric for any planned outage of an Infiniband Switch within that fabric. It also contains a checklist to help the customer administrator determine, based on the results of the checks, whether a full Fabric outage will be required.

Scope

IMPORTANT NOTE: For Infiniband switches within an Exalogic system or a multi-rack containing Exalogic, use:

How to Prepare an Exalogic Infiniband (IB) Fabric for Planned Outage of an Infiniband Switch (Doc ID 2211261.1) instead of this document.


A planned outage can be a reboot (or a boot after a previous shutdown), patching (firmware upgrade), or replacement of an Infiniband Switch in the Infiniband Fabric.

The checks and actions in this document are critical to ensuring that production traffic in the Infiniband (IB) Fabric remains resilient to the Infiniband Switch restart that any of the above operations requires.

Based on the results of these checks, a checklist provides guidance on whether a full downtime of the Infiniband Fabric will be required (a full outage of all switches and nodes actively participating in the fabric). Customers should take an Infiniband Switch outage within a production Infiniband Fabric only when every check has been cleared in the affirmative.
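The rule above (proceed only when every check is cleared) can be sketched as a small script. This is a minimal illustration, not part of the Oracle procedure: the check names are paraphrased from this document's step list, and the function name `outage_readiness` and the dictionary of results are hypothetical. It does not run any command on a switch; the administrator records each result by hand.

```python
# Hypothetical check names, paraphrased from the steps listed in this document.
CHECKS = [
    "switch backups taken",
    "multi-switch fabric checks",
    "CRS node-reboot fix confirmed",
    "firmware versions checked",
    "Subnet Manager placement and priority checked",
    "management-interface ping checked",
    "partitions and M-Key policy checked",
]


def outage_readiness(results):
    """Return (ready, unresolved) for a dict mapping check name -> bool.

    A check that is missing from `results` is treated as not cleared,
    so an incomplete checklist never reports ready.
    """
    unresolved = [name for name in CHECKS if not results.get(name, False)]
    return (not unresolved, unresolved)


if __name__ == "__main__":
    # Example: every check cleared except the firmware check.
    recorded = {name: True for name in CHECKS}
    recorded["firmware versions checked"] = False
    ready, unresolved = outage_readiness(recorded)
    print("Proceed with single-switch outage:", ready)
    for name in unresolved:
        print("Not cleared:", name)
```

Treating a missing entry as a failed check is a deliberate choice in this sketch: it errs toward not proceeding when the checklist is incomplete.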

This document is referenced by several other Oracle Support knowledge articles, including:

   - How to Prepare an Infiniband Switch for Replacement (Doc ID 1636229.1)

The document distribution is EXTERNAL since it needs to be shared with and used by the Customer-admin, as well as referenced by Partners, Field Engineers, and Oracle Support.

 

Details



In this Document
Purpose
Scope
Details
 1. Back up ALL Infiniband switches in your fabric before you go any further, ensuring you record the date and time of the backups
 2. Checks for Infiniband fabric with multiple Infiniband Switches
 3. For a CRS Cluster, confirm fix is in place for node reboot on Infiniband Switch reboot issue
 4. Check firmware version on all the Infiniband switches within the rack
 5. Check that the Subnet Manager is running on ALL of the intended switches and on no others, and that each has the correct priority, ControlledHandover, and state
 6. Check that all Infiniband Switches can ping each other through management interfaces
 7. Check Infiniband Partitions and secret M-Key policy if configured
 8. Confirm type/extent of downtime required
 9. Complete the check-list template – IB Fabric preparation for IB Switch planned outage
 10. Data Collection and Upload
 11. Proceed to next steps
 Notes / Addendum
References
