X4800 / X2-8 Server Might Post CPU/PCI Faults and Fail to Boot after Firmware Upgrade on Sun InfiniBand Dual Port 4x QDR PCIe EM
(Doc ID 2116695.1)
Last updated on OCTOBER 20, 2021
Applies to:
Sun Fire X4800 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Server X2-8 - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.
Symptoms
The system posts CPU/PCI faults and fails to boot.
For example, the SP reports an FMA fault such as:
fault.cpu.intel.internal --> SPX86-8000-F4
Host console output shows mlx4_core command timeouts followed by PCIe AER completion timeouts:
[ 162.215634] mlx4_core 0000:08:00.0: command 0x23 timed out (go bit not cleared).
[ 162.292781] mlx4_core 0000:08:00.0: device is going to be reset.
[ 163.411212] pcieport 0000:00:05.0: AER: Uncorrected (Non-Fatal) error received: id=0028.
[ 163.498970] pcieport 0000:00:05.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, id=0028(Requester ID).
[ 163.644466] pcieport 0000:00:05.0: device [8086:340c] error status/mask=00004000/00000000.
[ 163.728809] pcieport 0000:00:05.0: [14] Completion Timeout (First).
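To check a saved host console log for this symptom, the two signatures above (the mlx4_core "go bit not cleared" command timeout and the AER "[14] Completion Timeout") can be matched with regular expressions. The following is a minimal sketch, not an Oracle tool. The function names scan_console and matches_symptom are hypothetical. It also assumes, as in the output above, that both signatures appearing together indicate this issue.

```python
import re

# Signature 1: mlx4_core firmware command timeout, as shown in the console output above.
MLX4_CMD_TIMEOUT = re.compile(
    r"mlx4_core (?P<dev>[0-9a-f:.]+): command (?P<cmd>0x[0-9a-f]+) "
    r"timed out \(go bit not cleared\)"
)

# Signature 2: PCIe AER uncorrected error, status bit 14 (Completion Timeout).
AER_COMPLETION_TIMEOUT = re.compile(
    r"pcieport (?P<port>[0-9a-f:.]+): \[14\] Completion Timeout"
)


def scan_console(lines):
    """Return (mlx4 command timeouts, AER completion timeouts) found in the lines.

    mlx4 hits are (PCI device, command opcode) tuples; AER hits are root port addresses.
    """
    mlx4_hits = []
    aer_hits = []
    for line in lines:
        m = MLX4_CMD_TIMEOUT.search(line)
        if m:
            mlx4_hits.append((m.group("dev"), m.group("cmd")))
        a = AER_COMPLETION_TIMEOUT.search(line)
        if a:
            aer_hits.append(a.group("port"))
    return mlx4_hits, aer_hits


def matches_symptom(lines):
    """Assumption: treat the log as matching only if both signatures are present."""
    mlx4_hits, aer_hits = scan_console(lines)
    return bool(mlx4_hits) and bool(aer_hits)
```

Usage: with open("console.log") as f: print(matches_symptom(f)). The file name is only an example; any iterable of log lines works.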
Changes
The issue affects systems that have both PCIe EM slots of a CMOD/IOH populated with a Sun InfiniBand Dual Port 4x QDR PCIe ExpressModule Host Channel Adapter M2.
The issue might be triggered by:
- Upgrading InfiniBand Dual Port 4x QDR PCIe EMs to FW 2.11.2014 or later.
- Installing a spare part with FW 2.11.2014 or later.
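The exposure conditions above can be expressed as a simple inventory check. It compares dotted firmware versions numerically against 2.11.2014 and requires both EM slots of a CMOD/IOH to hold the HCA. This is a hedged sketch under stated assumptions. The EmSlot record, the HCA_NAME label and cmod_at_risk are hypothetical, not an Oracle inventory format. Treating a CMOD as exposed when at least one of the two HCAs runs 2.11.2014 or later is an interpretation of the text above.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Firmware release named above as the trigger threshold ("2.11.2014 or later").
TRIGGER_FW = (2, 11, 2014)

# Hypothetical card label used by this sketch's inventory records.
HCA_NAME = "Sun InfiniBand Dual Port 4x QDR PCIe ExpressModule Host Channel Adapter M2"


def parse_fw(version: str) -> tuple:
    """Parse a dotted firmware string such as '2.11.2014' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split("."))


def fw_at_or_above_trigger(version: str) -> bool:
    # Tuple comparison orders versions numerically, so (2, 9, 1000) < (2, 11, 2014).
    return parse_fw(version) >= TRIGGER_FW


@dataclass
class EmSlot:
    card: str  # card description (hypothetical inventory field)
    fw: str    # firmware version string, e.g. "2.11.2014"


def cmod_at_risk(slots: Sequence[Optional[EmSlot]]) -> bool:
    """Flag a CMOD/IOH whose two PCIe EM slots both hold the HCA and run the trigger FW.

    Assumption: exposed if both slots carry the HCA and at least one runs
    firmware 2.11.2014 or later. An empty slot is represented as None.
    """
    if len(slots) != 2:
        return False
    if not all(s is not None and s.card == HCA_NAME for s in slots):
        return False
    return any(fw_at_or_above_trigger(s.fw) for s in slots)
```

Comparing integer tuples rather than raw strings avoids lexical errors such as "2.9" sorting after "2.11".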
Cause