The BDA Utility bdacheckhw Reports a Memory Failure: "WARNING: Hardware errors reported by ILOM : fault.memory.intel.sb.dimm_ce" (Doc ID 1956549.1)

Last updated on JANUARY 05, 2017

Applies to:

Big Data Appliance X3-2 Hardware - Version All Versions and later
Big Data Appliance X4-2 Hardware - Version All Versions and later
Big Data Appliance Hardware - Version All Versions and later
x86_64

Symptoms

1. Running the BDA utility bdacheckhw on a node reports a memory failure: fault.memory.intel.sb.dimm_ce.

The relevant part of the output is:

WARNING: Hardware errors reported by ILOM : fault.memory.intel.sb.dimm_ce
INFO: Run 'ipmitool sunoem cli "show faulty"' to see the full error
...
WARNING: Big Data Appliance warnings during hardware validation checks


2. Running 'ipmitool sunoem cli "show faulty"' reports the same fault, on DIMM /SYS/MB/P0/D7:

# ipmitool sunoem cli "show faulty"
  
Connected. Use ^D to exit.
-> show faulty
  
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/P0/D7
/SP/faultmgmt/0/    | class                  | fault.memory.intel.sb.dimm_ce
faults/0            |                        |
/SP/faultmgmt/0/    | sunw-msg-id            | SPX86-8004-CE
faults/0
...                             
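Because "show faulty" wraps long Target paths onto a second line, the table is easier to check with a small script than by eye. The following is a minimal Python sketch, not an Oracle-supplied tool: it assumes the 'ipmitool sunoem cli "show faulty"' output has been saved as text, and it groups the Property/Value rows by the /SP/faultmgmt/<n> index. The function parse_show_faulty and the sample string are hypothetical, for illustration only.

```python
import re

# Matches the fault index in Target cells such as "/SP/faultmgmt/0" or the
# wrapped form "/SP/faultmgmt/0/" (whose "faults/0" tail is on the next line).
_TARGET = re.compile(r"/SP/faultmgmt/(\d+)")


def parse_show_faulty(text: str) -> dict[int, dict[str, str]]:
    """Group 'show faulty' Property/Value rows by /SP/faultmgmt/<n> index.

    Hypothetical helper; assumes the three-column layout shown above.
    """
    faults: dict[int, dict[str, str]] = {}
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip separators, prompts, "..." and wrapped-target rows (empty Property).
        if len(cells) != 3 or not cells[1]:
            continue
        match = _TARGET.match(cells[0])
        if match is None:  # e.g. the "Target | Property | Value" header row
            continue
        faults.setdefault(int(match.group(1)), {})[cells[1]] = cells[2]
    return faults


FAULTY_SAMPLE = """\
Target              | Property               | Value
--------------------+------------------------+---------------------------------
/SP/faultmgmt/0     | fru                    | /SYS/MB/P0/D7
/SP/faultmgmt/0/    | class                  | fault.memory.intel.sb.dimm_ce
faults/0            |                        |
/SP/faultmgmt/0/    | sunw-msg-id            | SPX86-8004-CE
faults/0
"""

if __name__ == "__main__":
    for index, props in parse_show_faulty(FAULTY_SAMPLE).items():
        # prints: 0 /SYS/MB/P0/D7 fault.memory.intel.sb.dimm_ce
        print(index, props.get("fru"), props.get("class"))
```

Run against the post-reboot output, which contains only the header and separator rows, the same function returns an empty dict.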


3. Rebooting the server clears the memory fault.

4. An ILOM snapshot taken after the reboot confirms that the fault is cleared and that no DIMM faults are present.

a) The event log in ./ilom/@usr@local@bin@spshexec_show_-script_@X@logs@event@list.out shows the fault being detected and then cleared after the reboot (entries are listed newest first):

1060   Wed Dec  3 15:08:08 2014  Fault     Repair    minor
      Fault fault.memory.intel.sb.dimm_ce on component /SYS/MB/P0/D7 cleared
...
1058   Tue Dec  2 11:51:11 2014  Fault     Fault     critical
      Fault detected at time = Tue Dec  2 11:51:11 2014. The suspect component:
       /SYS/MB/P0/D7 has fault.memory.intel.sb.dimm_ce with probability=100.
      Refer to http://www.sun.com/msg/SPX86-8004-CE for details.


b) After the reboot, "show faulty" lists no faults:

Target              | Property               | Value                         
--------------------+------------------------+---------------------------------

-> Session closed

The fault LEDs captured in ipmiint_sunoem_led_get.out are OFF, including the service LED for DIMM P0/D7:

P0/SERVICE       | OFF
...
P0/D6/SERV       | OFF
P0/D7/SERV       | OFF<<<<<<<<<<<<<<<  Not faulted
P1/SERVICE       | OFF
...
P1/D7/SERV       | OFF
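The LED check can be scripted in the same spirit. The hypothetical helper leds_not_off below is a sketch that assumes the "NAME | STATE" layout shown above and reports any LED whose state is not OFF; it skips the elided "..." lines and ignores any annotation written after the state.

```python
import re

# "NAME | STATE ..." rows; only the leading letters of STATE are kept, so an
# annotated row such as "P0/D7/SERV | OFF<<<<  Not faulted" still reads OFF.
_LED_ROW = re.compile(r"^\s*(\S+)\s*\|\s*([A-Za-z]+)")


def leds_not_off(text: str) -> list[str]:
    """Return the names of LEDs whose state is anything other than OFF.

    Hypothetical helper; assumes the two-column layout of the excerpt above.
    """
    lit: list[str] = []
    for line in text.splitlines():
        match = _LED_ROW.match(line)
        if match and match.group(2).upper() != "OFF":
            lit.append(match.group(1))
    return lit


LED_SAMPLE = """\
P0/SERVICE       | OFF
P0/D6/SERV       | OFF
P0/D7/SERV       | OFF<<<<<<<<<<<<<<<  Not faulted
P1/SERVICE       | OFF
P1/D7/SERV       | OFF
"""

if __name__ == "__main__":
    print(leds_not_off(LED_SAMPLE))  # prints []
```

An empty list matches the post-reboot state described above; any LED name returned would point at a component that still needs attention.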

c) The fault history in the ILOM snapshot also shows the fault marked "Repaired" and "Resolved" after the reboot:

...
2014-12-02/11:51:11  ef3f77c5-d16b-6a08-fca1-dbce2c725eee   SPX86-8004-CE 
       FRU       = /SYS/MB/P0/D7

2014-12-03/15:08:08  ef3f77c5-d16b-6a08-fca1-dbce2c725eee SPX86-8004-CE Repaired

2014-12-03/15:08:08  ef3f77c5-d16b-6a08-fca1-dbce2c725eee SPX86-8004-CE Resolved
...
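The fault history ties each diagnosis to a UUID, so the cleared status can be checked per UUID rather than by reading timestamps. The sketch below is an illustration under stated assumptions, not an Oracle tool: it treats a fault as cleared once a "Resolved" entry appears for its UUID, as in the excerpt above, and the name unresolved_faults is hypothetical.

```python
import re

# History entry: timestamp, fault UUID, message ID, optional lifecycle state.
# Indented detail lines such as "FRU = /SYS/MB/P0/D7" do not match.
_ENTRY = re.compile(
    r"^(\d{4}-\d{2}-\d{2}/\d{2}:\d{2}:\d{2})\s+"
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+"
    r"(\S+)(?:\s+(\w+))?"
)


def unresolved_faults(text: str) -> set[str]:
    """Return the UUIDs of faults that have no 'Resolved' entry.

    Hypothetical helper; assumes the fault-history layout shown above.
    """
    seen: set[str] = set()
    resolved: set[str] = set()
    for line in text.splitlines():
        match = _ENTRY.match(line)
        if match is None:
            continue  # detail lines, "...", blank lines
        uuid, state = match.group(2), match.group(4)
        seen.add(uuid)
        if state == "Resolved":
            resolved.add(uuid)
    return seen - resolved


HISTORY_SAMPLE = """\
2014-12-02/11:51:11  ef3f77c5-d16b-6a08-fca1-dbce2c725eee   SPX86-8004-CE
       FRU       = /SYS/MB/P0/D7
2014-12-03/15:08:08  ef3f77c5-d16b-6a08-fca1-dbce2c725eee SPX86-8004-CE Repaired
2014-12-03/15:08:08  ef3f77c5-d16b-6a08-fca1-dbce2c725eee SPX86-8004-CE Resolved
"""

if __name__ == "__main__":
    print(unresolved_faults(HISTORY_SAMPLE))  # prints set()
```

Keying on the UUID rather than the FRU keeps separate diagnoses on the same DIMM distinct; a "Repaired" entry alone is not treated as cleared.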




Cause
