Memory Reference Code Not Cleared for AP's Resulting in False SPX86-8001-ME Population Errors (Doc ID 1557029.1)

Last updated on NOVEMBER 16, 2016

Applies to:

Sun Fire X4270 M2 Server - Version Not Applicable and later
Sun Fire X4170 Server - Version Not Applicable and later
Oracle Exalogic Elastic Cloud X2-2 Hardware - Version All Versions and later
Exalogic Elastic Cloud X3-2 Hardware - Version All Versions and later
Sun Blade X6270 Server Module - Version Not Applicable and later
Information in this document applies to any platform.

Symptoms

The Intel microprocessor supports logging of errors using the MCA (machine check architecture).

When a memory error occurs, MCA indicates it and reports the error to iLOM for diagnosis.

 

There is an issue with early platform firmware that results in false SPX86-8001-ME errors being reported by iLOM under certain conditions.

 

The conditions are as follows: 

The platform tests the Intel CPUs during POST (Power-On Self-Test) and reports any errors found to iLOM. During this time, however, only the BSP (Bootstrap Processor) is active.

All other processors (APs, Application Processors) are run through a series of tests, but the test execution state is stored in a special scratchpad area for reporting later in the boot cycle.

BIOS (Basic Input/Output System) typically reads and clears this scratchpad area to diagnose issues.

 

A false SPX86-8001-ME can occur when the platform is reset during initialisation. The reset prevents BIOS from clearing the scratchpad area, so the test execution data left in the scratchpad is reported as real errors on the next platform power-on.

 

On the next power-on, BIOS checks the MC (Memory Reference Code) registers and sends the error information to iLOM if the registers are set.

iLOM will then report the error as a population violation.

Because the scratchpad was not cleared correctly, the reported error is false.
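
The sequence above can be illustrated with a small toy model. This is only a sketch of the described behaviour, not real firmware: the `Platform` class, its `post` method, and the `reset_during_init` flag are hypothetical names chosen for illustration, and the scratchpad is modelled as a simple list that survives a reset.

```python
from dataclasses import dataclass, field


@dataclass
class Platform:
    """Toy model of the boot flow described in this article (hypothetical)."""
    scratchpad: list = field(default_factory=list)  # assumed to survive a reset
    reported: list = field(default_factory=list)    # what the SP (iLOM) receives

    def post(self, ap_count, reset_during_init=False):
        # On power-on, BIOS forwards whatever is already set in the scratchpad.
        # Stale entries from an interrupted boot look like real errors here.
        self.reported.extend(self.scratchpad)
        self.scratchpad.clear()

        # AP test execution state is recorded in the scratchpad.
        for ap in range(1, ap_count + 1):
            self.scratchpad.append(f"AP{ap}: test execution record")

        # A reset at this point skips the normal read-and-clear step.
        if reset_during_init:
            return

        # Normal path: BIOS reads and clears the scratchpad.
        self.scratchpad.clear()


platform = Platform()
platform.post(ap_count=2)                          # clean boot, nothing reported
platform.post(ap_count=2, reset_during_init=True)  # reset leaves stale records
platform.post(ap_count=2)                          # stale records now reported
print(platform.reported)
```

In this model the third boot reports two entries even though no test failed, which mirrors the false population error described above.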

 

Error signature with iLOM Fault Diagnostics / ASR: 

Fault class : fault.memory.intel.dimm.none

FRU             : /SYS/MB
                   (Part Number: 111-1234-01)
                   (Serial Number: SOMESERNU)

Description : Memory DIMMs are not populated.

Response   : BIOS forwards error telemetry to the SP for diagnosis and logging in ILOM event log.

Impact       : The entire memory subsystem is unusable and there is no video output.

Action        : The administrator should review the ILOM event log for additional information pertaining to this diagnosis.  Please refer to the Details section of the Knowledge Article for additional information. Product Manufacturer = ORACLE CORPORATION
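
When triaging many fault reports, it can help to flag those that carry this signature. The following is a minimal sketch that assumes the report text uses the "Key : value" layout shown above; the function names are hypothetical and this is not an Oracle tool or API. A matching fault class only marks a candidate: a genuine population fault would carry the same class, so the reset history of the platform still has to be checked.

```python
# Fault class taken from the error signature in this article.
CANDIDATE_FAULT_CLASS = "fault.memory.intel.dimm.none"

SAMPLE = """Fault class : fault.memory.intel.dimm.none
FRU             : /SYS/MB
                   (Part Number: 111-1234-01)
Description : Memory DIMMs are not populated."""


def parse_fault_report(text):
    """Parse 'Key : value' lines into a dict (assumed layout).

    Lines starting with '(' are treated as continuations of the previous field.
    """
    fields = {}
    last_key = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition(":")
        if sep and not line.startswith("("):
            last_key = key.strip()
            fields[last_key] = value.strip()
        elif last_key is not None:
            fields[last_key] += " " + line
    return fields


def is_false_positive_candidate(fields):
    """True if the report carries the fault class discussed in this article."""
    return fields.get("Fault class") == CANDIDATE_FAULT_CLASS


report = parse_fault_report(SAMPLE)
print(report["FRU"], is_false_positive_candidate(report))
```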

Changes

Platform reset / rebooted

Cause
