Memory Reference Code Not Cleared for Auxiliary Processors Resulting in False SPX86-8001-U5 UE Error
(Doc ID 1557001.1)
Last updated on OCTOBER 29, 2021
Applies to:
Sun Server X2-4 - Version Not Applicable and later
Sun Fire X4170 Server - Version Not Applicable and later
Sun Fire X4275 Server - Version Not Applicable and later
Oracle Exalogic Elastic Cloud X2-2 Hardware - Version X2 and later
Exalogic Elastic Cloud X3-2 Hardware - Version X3 and later
Information in this document applies to any platform.
Symptoms
The Intel microprocessor supports logging of errors using the MCA (Machine Check Architecture).
When a memory error occurs, MCA indicates it and reports the error to iLOM for diagnosis output.
There is an issue with early platform firmware that results in false SPX86-8001-U5 errors being reported by iLOM under certain conditions.
The conditions are as follows:
The platform tests the Intel CPUs during POST (Power On Self Test) and reports any errors found to iLOM. However, during this time only the BSP (Boot Strap Processor) is active.
All other processors (APs, Auxiliary Processors) are run through a series of tests, but the test execution is stored in a special scratchpad area for reporting later in the boot cycle.
BIOS (Basic Input/Output System) typically reads and clears this scratchpad area to diagnose issues.
A possible SPX86-8001-U5 is seen when the platform is reset during initialisation. The reset prevents BIOS from clearing the scratchpad area, so the scratchpad test execution is reported as real errors on the next platform power-on.
On the next power-on, BIOS checks the MC registers (Memory Reference Code) and sends the error information to iLOM if the MC registers are set.
iLOM then reports the error as failing DIMMs with a UE (Uncorrectable Error) condition.
Because the scratchpad was not cleared correctly, the error is reported falsely.
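The sequence above can be pictured with a small toy model. This is only an illustrative sketch in Python, not firmware code: the `Platform` class, its `scratchpad` list and the `simulate` helper are hypothetical names invented here, and the model assumes that any record left in the scratchpad from the previous cycle is forwarded to iLOM as a fault.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Toy model of the POST flow described above (hypothetical, not real firmware)."""
    scratchpad: list = field(default_factory=list)       # AP test execution records
    reported_faults: list = field(default_factory=list)  # what iLOM would receive

    def power_on(self, reset_during_init: bool = False) -> None:
        # Assumption: records still present from the previous cycle are
        # mistaken for real errors and sent to iLOM.
        if self.scratchpad:
            self.reported_faults.append("SPX86-8001-U5")
        # The AP tests write their execution records to the scratchpad.
        self.scratchpad = ["ap-test-record"]
        if reset_during_init:
            # The reset arrives before BIOS reads and clears the scratchpad.
            return
        # Normal path: BIOS reads and clears the scratchpad.
        self.scratchpad.clear()


def simulate(resets):
    """Run one power-on per entry; True means 'reset during initialisation'."""
    platform = Platform()
    for reset_during_init in resets:
        platform.power_on(reset_during_init=reset_during_init)
    return platform.reported_faults


if __name__ == "__main__":
    print(simulate([False, False]))  # two clean boots
    print(simulate([True, False]))   # reset during init, then a normal boot
```

In this model a clean sequence of boots reports nothing, while a reset during initialisation leaves a stale record behind that surfaces as a fault on the following power-on.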
Error signature in iLOM SEL (system event log):
1 | 1/1/2012 | 00:00:01 | System Boot Initiated | Initiated by warm reset | Asserted
2 | 1/1/2012 | 00:00:01 | System Firmware Progress | Memory initialization | Asserted
3 | 1/1/2012 | 00:00:01 | System Boot Initiated | Initiated by warm reset | Asserted
4 | 1/1/2012 | 00:00:01 | System Firmware Progress | Memory initialization | Asserted
5 | 1/1/2012 | 00:00:01 | System Firmware Progress | Primary CPU initialization | Asserted
6 | 1/1/2012 | 00:00:01 | System Firmware Progress | Management controller initialization | Asserted
7 | 1/1/2012 | 00:00:01 | System Firmware Progress | Secondary CPU Initialization | Asserted
8 | 1/1/2012 | 00:00:01 | Memory | Uncorrectable Error | Asserted | OEM Data-2 0x00 OEM Data-3 0x08
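As a rough aid for spotting this pattern in a saved SEL listing, the following Python sketch parses pipe-delimited lines in the layout shown above and checks whether a memory Uncorrectable Error follows a warm reset. The function names `parse_sel` and `matches_signature` are hypothetical, the field layout is assumed from the sample only, and a match merely suggests the signature described here; it does not prove that the UE is false.

```python
SAMPLE = """\
1 | 1/1/2012 | 00:00:01 | System Boot Initiated | Initiated by warm reset | Asserted
2 | 1/1/2012 | 00:00:01 | System Firmware Progress | Memory initialization | Asserted
3 | 1/1/2012 | 00:00:01 | System Boot Initiated | Initiated by warm reset | Asserted
4 | 1/1/2012 | 00:00:01 | System Firmware Progress | Memory initialization | Asserted
5 | 1/1/2012 | 00:00:01 | System Firmware Progress | Primary CPU initialization | Asserted
6 | 1/1/2012 | 00:00:01 | System Firmware Progress | Management controller initialization | Asserted
7 | 1/1/2012 | 00:00:01 | System Firmware Progress | Secondary CPU Initialization | Asserted
8 | 1/1/2012 | 00:00:01 | Memory | Uncorrectable Error | Asserted | OEM Data-2 0x00 OEM Data-3 0x08
"""


def parse_sel(text):
    """Split pipe-delimited SEL lines into dictionaries (layout assumed from the sample)."""
    events = []
    for line in text.splitlines():
        fields = [part.strip() for part in line.split("|")]
        # Skip anything that does not look like an event record.
        if len(fields) < 6 or not fields[0].isdigit():
            continue
        events.append({
            "id": int(fields[0]),
            "date": fields[1],
            "time": fields[2],
            "sensor": fields[3],
            "event": fields[4],
            "state": fields[5],
        })
    return events


def matches_signature(events):
    """Return True if a memory Uncorrectable Error appears after a warm reset."""
    seen_warm_reset = False
    for event in events:
        if event["sensor"] == "System Boot Initiated" and "warm reset" in event["event"]:
            seen_warm_reset = True
        elif (seen_warm_reset
              and event["sensor"] == "Memory"
              and event["event"] == "Uncorrectable Error"):
            return True
    return False


if __name__ == "__main__":
    print(matches_signature(parse_sel(SAMPLE)))
```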
Error signature with iLOM Fault Diagnostics / ASR:
ASR:Memory Uncorrectable ECC Fault
Critical alert on faulty component MB/P0/D8. A system component faulted due to fault in memory intel dimm_ue.
Event Time = Tue Dec 1 00:00:01 UTC 2012
Fault Message ID = SPX86-8001-U5
Fault UUID = e1111111-d222-c333-d444-e55555555555
Knowledge Article URL = http://www.sun.com/msg/SPX86-8001-U5
Fault Description = Not available
Fault Severity = other
Product Manufacturer = ORACLE CORPORATION
Product Name = SUN FIRE X4170 M2 SERVER
Product Serial Number = SOMESERNU
Chassis Manufacturer = SOMECHAMU
Chassis Name = SOMECHANA
Chassis Serial Number = SOMECHANU
Chassis Part Number = SOMECHAPA
DiagEntity = SOMEDIAGEN
SystemIdentifier = Not available
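When many such alerts need to be triaged, the `Key = Value` lines of the report can be read into a dictionary. The Python sketch below assumes only the layout shown in the sample above; `parse_fault_report` and `is_spx86_8001_u5` are hypothetical helper names, not part of any Oracle tool.

```python
def parse_fault_report(text):
    """Collect 'Key = Value' lines into a dictionary; other lines are ignored."""
    fields = {}
    for line in text.splitlines():
        key, separator, value = line.partition(" = ")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


def is_spx86_8001_u5(fields):
    """Return True if the report carries the fault message ID discussed here."""
    return fields.get("Fault Message ID") == "SPX86-8001-U5"


if __name__ == "__main__":
    report = """\
ASR:Memory Uncorrectable ECC Fault
Event Time = Tue Dec 1 00:00:01 UTC 2012
Fault Message ID = SPX86-8001-U5
Product Name = SUN FIRE X4170 M2 SERVER
"""
    fields = parse_fault_report(report)
    print(fields["Product Name"], is_spx86_8001_u5(fields))
```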
Changes
Platform reset / rebooted
Cause