
How to Prepare an Infiniband (IB) Fabric for Planned Outage of an IB Switch (Doc ID 2140928.1)

Last updated on AUGUST 23, 2020

Applies to:

Sun Datacenter InfiniBand Switch 36 - Version All Versions to All Versions [Release All Releases]
Oracle SuperCluster Specific Software
Sun Network QDR InfiniBand Gateway Switch - Version All Versions to All Versions [Release All Releases]
Exadata Database Machine V2 - Version All Versions and later
Information in this document applies to any platform.

Purpose

This document describes how to prepare an InfiniBand (IB) Fabric for any planned outage of an IB Switch within that fabric. It also contains a checklist to help the Customer-admin determine, based on the results of the checks performed, whether a full fabric outage will be required.

Scope

Note: For IB switches within an Exalogic system, or within a multirack configuration that contains Exalogic, use Doc ID 2211261.1 instead of this document.

A planned outage can be a reboot (or a boot after a previous shutdown), patching (firmware upgrade), or replacement of an IB Switch in the IB Fabric.

The checks and actions in this document are critical to ensuring that production traffic in the InfiniBand (IB) Fabric remains resilient to the IB Switch restart that each of the above operations requires.

Based on the results of these checks, a checklist provides guidance on whether a full downtime of the IB Fabric will be required (a full outage of all switches and nodes actively participating in the fabric). Customers should take an IB Switch outage within a production IB Fabric only when every check has passed.
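The pass/fail rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the Oracle procedure: the `Check` class, the `outage_decision` function, and the two result strings are hypothetical names, and the check labels are copied from the section titles listed under "In this Document". The sketch assumes each check has already been evaluated by hand and recorded as passed or failed.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical result strings; the real guidance is in the checklist itself.
PROCEED = "all checks passed: single-switch outage may proceed"
HOLD = "checks failed: do not proceed; review whether full fabric downtime is required"


@dataclass
class Check:
    name: str      # label taken from the document's section titles
    passed: bool   # result recorded manually by the administrator


def outage_decision(checks: List[Check]) -> Tuple[str, List[str]]:
    """Return the decision and the names of any failed checks."""
    if not checks:
        # An empty checklist must never be read as approval.
        raise ValueError("no checks recorded")
    failed = [c.name for c in checks if not c.passed]
    return (HOLD if failed else PROCEED), failed


if __name__ == "__main__":
    recorded = [
        Check("1.1 Hosts bonding/IPMP/IO-path redundancy", True),
        Check("1.5 opensm status and smpriorities on all switches", True),
        Check("1.7 Switches can ping each other via management interfaces", False),
    ]
    decision, failed = outage_decision(recorded)
    print(decision)
    for name in failed:
        print("  failed:", name)
```

A single failed check is enough to block the switch outage in this sketch, which mirrors the rule that the outage should go ahead only when every check has passed.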

This document is referenced by several other Oracle Support knowledge articles, including:

   - How to Prepare an Infiniband Switch for Replacement (Doc ID 1636229.1)

The document distribution is EXTERNAL since it needs to be shared with and used by the Customer-admin, as well as referenced by Partners, Field Engineers, and Oracle Support.

 

Details


In this Document
Purpose
Scope
Details
 1. Checks for IB fabric with multiple IB Switches
  1.1. Confirm Hosts bonding/IPMP/IO-path redundancy
  1.2. For a CRS Cluster, confirm fix is in place for node reboot on IB Switch reboot issue
  1.5. Check the opensm status and smpriorities on all switches
  1.6. Check IB Fabric using “ibswitches” and “getmaster”
  1.7. Check that all IB Switches can ping each other through management interfaces
  1.8. Check IB partitions and secret M-Key policy
 2. Confirm type/extent of downtime required
 3. Complete the check-list template – IB Fabric preparation for IB Switch planned outage.
 4. Data Collection and Upload
 5. Proceed to next steps
 Notes / Addendum
References
