I/O SERD Threshold Values Are Set Too Low And May Result In PCIEX-8000-J5, PCIEX-8000-YJ And PCIEX-8000-KP Faults. (Doc ID 1617956.1)

Last updated on JANUARY 26, 2024

Applies to:

Sun SPARC Enterprise M9000-32 Server - Version All Versions and later
Oracle SuperCluster T5-8 Hardware - Version All Versions and later
SPARC M5-32 - Version All Versions and later
Oracle SuperCluster T5-8 Full Rack - Version All Versions and later
SPARC M6-32 - Version All Versions and later
Information in this document applies to any platform.

Symptoms

PCIEX-8000-J5 and/or PCIEX-8000-KP FMA faults similar to those shown below are reported. Systems that use an InfiniBand fabric, such as T5-8 SSC, appear to be more susceptible to these faults.

The fmdump -e output will contain ereport.io.pciex.dl.btlp and/or ereport.io.pciex.dl.bdllp events. Examples follow the FMA fault examples.

--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 26 23:29:14 <UUID> PCIEX-8000-J5  Major

Problem Status    : isolated
Diag Engine       : eft / 1.16
System
   Manufacturer  : unknown
   Name          : unknown
   Part_Number   : unknown
   Serial_Number : unknown
   Host_ID       : 84fbxxx

----------------------------------------
Suspect 1 of 1 :
  Fault class : fault.io.pciex.device-interr-corr
  Certainty   : 100%
  Affects     : dev:////pci@540/pci@1
  Status      : faulted but still in service

  FRU
    Location         : "/SYS/PM2"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7056873
    Revision         : 08
    Serial_Number    : xxxxx
    Chassis
       Manufacturer  : Oracle Corporation
       Name          : SPARC T5-8
       Part_Number   : 7068108
       Serial_Number : xxxxx
       Status        : faulty

Description : Too many recovered internal errors have been detected within the
             specified PCIEX device. This may degrade into a non-recoverable
             fault.

 


--------------- ------------------------------------  -------------- ---------
TIME            EVENT-ID                              MSG-ID         SEVERITY
--------------- ------------------------------------  -------------- ---------
Jan 26 23:56:38 <UUID> PCIEX-8000-KP  Major

Problem Status    : solved
Diag Engine       : eft / 1.16
System
   Manufacturer  : unknown
   Name          : unknown
   Part_Number   : unknown
   Serial_Number : unknown
   Host_ID       : 84fbcxxx

----------------------------------------
Suspect 1 of 2 :
  Fault class : fault.io.pciex.device-interr-corr
  Certainty   : 100%
  Affects     : dev:////pci@5c0/pci@1/pci@0
  Status      : faulted but still in service

  FRU
    Location         : "/SYS/MB"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7070931
    Revision         : 03
    Serial_Number    : xxxxx
    Chassis
       Manufacturer  : Oracle Corporation
       Name          : SPARC T5-8
       Part_Number   : 7068108
       Serial_Number : xxxxx
       Status        : faulty
----------------------------------------
Suspect 2 of 2 :
  Fault class : fault.io.pciex.bus-linkerr-corr
  Certainty   : 100%
  Affects     : dev:////pci@5c0/pci@1/pci@0
  Status      : faulted but still in service

  FRU
    Location         : "/SYS/MB"
    Manufacturer     : unknown
    Name             : unknown
    Part_Number      : 7070931
    Revision         : 03
    Serial_Number    : xxxxx
    Chassis
       Manufacturer  : Oracle Corporation
       Name          : SPARC T5-8
       Part_Number   : 7068108
       Serial_Number : xxxxx
       Status        : faulty

Description : Too many recovered bus errors have been detected, which indicates
             a problem with the specified bus or with the specified
             transmitting device. This may degrade into an unrecoverable
             fault.


Verify the type and number of PCIe errors that triggered the fault event:

 fmdump -eVu {UUID}
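
The verbose output above can be long, so a per-class tally may help when checking how many ereport.io.pciex.dl.btlp and ereport.io.pciex.dl.bdllp events were logged. The following is a minimal sketch, not an Oracle-supplied tool. The function name count_ereport_classes is hypothetical, and it assumes the non-verbose fmdump -e layout, in which the ereport class is the last field on each line.

```shell
#!/bin/sh
# Tally ereport classes from non-verbose `fmdump -e` style text on stdin.
# Assumption: the class name is the last whitespace-separated field of
# each event line; the "TIME ... CLASS" header line is skipped because
# its last field does not start with "ereport.".
count_ereport_classes() {
    awk '$NF ~ /^ereport\./ { n[$NF]++ }
         END { for (c in n) printf "%d %s\n", n[c], c }' | sort -k2
}

# Hypothetical usage on the affected host, with {UUID} replaced by the
# EVENT-ID of the fault (non-verbose form, i.e. without -V):
#   fmdump -eu {UUID} | count_ereport_classes
```

Each output line is a count followed by the ereport class, sorted by class name.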

Changes

 

Cause
