Sun Fire[TM] Server System Board (SB) voltage errors.

(Doc ID 1019667.1)

Last updated on OCTOBER 18, 2017

Applies to:

Sun Fire 4810 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire V1280 Server - Version Not Applicable and later
Sun Fire E4900 Server - Version Not Applicable and later
Sun Fire 3800 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun Fire 6800 Server - Version Not Applicable to Not Applicable [Release N/A]
All Platforms

Symptoms

This document describes how to identify and resolve System Board (SB) errors related to voltage on the servers listed under "Applies to" above.

This document does not address I/O Board (IB) voltage errors.  See Document 1017844.1 if you have similar errors associated with an IB.

Error Messages:
Look for these "Key Indicators" of a voltage issue in the error messages in the System Controller (SC) log files (showlogs) or on the console:

Path broken between CBH and SDC
Device voltage problem: /N0/SB#
Attempt to power up /N0/SB# failed
/N0/SB#, sensor status, outside acceptable limits

(where # is the board number)

Some examples of those "Key Indicators" from actual failure messaging found in showlogs files:

Fri Sep 26 11:45:46 sc lom: [ID 360430 local0.error] Device voltage problem: /N0/SB0 abnormal state for device: Board 0 3.3 VDC 0 Value: 0.37 Volts DC
Fri Sep 26 11:45:46 sc lom: [ID 322610 local0.notice] CPU Board V3 at /N0/SB0 Device poll caused: sun.serengeti.FailedHwException: (SdcAsic)Asic.getTemp: Path broken between CBH and SDC: SB0.sdc.10 (12000010)
Fri Sep 26 11:45:46 sc lom: [ID 336982 local0.notice] Device will not be polled
Fri Sep 26 11:45:46 sc lom: [ID 664082 local0.notice] CPU Board V3 at /N0/SB0 Device poll caused: sun.serengeti.FailedHwException: (ArAsic)Asic.getTemp: Path broken between CBH and SDC: SB0.ar.10 (12080010)
Fri Sep 26 11:45:46 sc lom: [ID 336982 local0.notice] Device will not be polled


Sat Sep 27 06:16:24 sc lom: [ID 395834 local0.error] Attempt to power up /N0/SB0 failed: /N0/SB0 3.3V DC failed, observed: 0.15 volts
Sat Sep 27 06:16:25 sc lom: [ID 503827 local0.error] sun.serengeti.HpuFailedException: CPU Board V3 at /N0/SB0
Sat Sep 27 06:16:25 sc lom: [ID 889337 local0.notice] sun.serengeti.CommException
Sat Sep 27 06:16:29 sc lom: [ID 304509 local0.error] No usable Cpu board in domain.


Wed Oct 01 21:56:10 sc lom: [ID 390680 local0.notice] CPU Board V3 at /N0/SB0 Device poll caused: sun.serengeti.HpuFailedException: CpuVoltageA2D.getOutputVoltage: sun.serengeti.CommException: I2cComm.readCmd:  Path broken between CBH and SDC: SB0.sbbc1.regs.c0 (102000c0)
Wed Oct 01 21:56:10 sc lom: [ID 336982 local0.notice] Device will not be polled
Wed Oct 01 21:56:10 sc lom: [ID 120592 local0.notice] /N0/SB0, sensor status, outside acceptable limits (7,1,0x207000d00070000)

All examples above show SB0, but the affected board could be any SB in the system, and the errors would generally be similar.
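
As a rough aid for working through a large showlogs capture, the key indicators listed above can be searched for with a short script. The following is a minimal sketch, not an Oracle-supplied tool: the function name `scan_showlogs`, the indicator labels, and the assumption that the log has been saved as plain text lines are all illustrative. The patterns are taken only from the messages shown in this document, so other firmware versions may word them differently.

```python
import re
from collections import Counter

# Patterns derived from the "Key Indicators" in this document.
# The labels on the left are hypothetical names chosen for this sketch.
KEY_INDICATORS = {
    "path_broken": re.compile(r"Path broken between CBH and SDC"),
    "device_voltage": re.compile(r"Device voltage problem: /N0/SB\d+"),
    "power_up_failed": re.compile(r"Attempt to power up /N0/SB\d+ failed"),
    "sensor_limits": re.compile(
        r"/N0/SB\d+, sensor status, outside acceptable limits"
    ),
}

# Only System Boards (SBn) are considered; lines naming no SB are skipped,
# since I/O Board (IB) errors are outside the scope of this document.
BOARD = re.compile(r"\bSB\d+\b")


def scan_showlogs(lines):
    """Return {board: Counter({indicator_label: count})} for matching lines."""
    counts = {}
    for line in lines:
        board = BOARD.search(line)
        if board is None:
            continue
        for label, pattern in KEY_INDICATORS.items():
            if pattern.search(line):
                counts.setdefault(board.group(0), Counter())[label] += 1
    return counts
```

Calling `scan_showlogs(open("showlogs.txt"))` on a saved capture (the file name is an assumption) returns a per-board tally, which makes it easier to see whether one SB accounts for most of the voltage-related messages.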

The showenvironment command may also show an "ERROR LOW" status for the SB and a 3.3 VDC sensor value of 0.xx (in other words, less than the LoWarn value).

lom> showenvironment -v

Slot    Device     Sensor       Min    LoWarn Value  HiWarn Max    Units     Age     Status
------- ---------- ------------ ------ ------ ------ ------ ------ --------- ------- ------
   ***** Results truncated for this example *****
/N0/SB0 Board 0    3.3 VDC 0      2.97   3.13   0.49   3.47   3.63 Volts DC    5 min *** ERROR LOW ***
   ***** Results truncated for this example *****
/N0/SB2 Board 0    1.5 VDC 0      1.35   1.42   1.51   1.58   1.65 Volts DC    9 sec OK
/N0/SB2 Board 0    3.3 VDC 0      2.97   3.13   3.27   3.47   3.63 Volts DC    9 sec OK
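
The comparison of Value against LoWarn can also be done mechanically on captured showenvironment -v output. The sketch below is illustrative only: the function name `low_voltage_sensors` is hypothetical, and the row layout (slot, device, sensor, five numeric columns, "Volts DC", age, status) is inferred from the excerpt above, so it may need adjusting for other firmware versions or for non-voltage sensors.

```python
import re

# Row layout inferred from the showenvironment -v excerpt in this document.
# Only SB voltage rows ("... VDC n" sensors reported in "Volts DC") match.
ROW = re.compile(
    r"^(?P<slot>/N0/SB\d+)\s+(?P<device>.+?)\s+"
    r"(?P<sensor>\d+(?:\.\d+)?\s+VDC\s+\d+)\s+"
    r"(?P<min>\d+\.\d+)\s+(?P<lowarn>\d+\.\d+)\s+(?P<value>\d+\.\d+)\s+"
    r"(?P<hiwarn>\d+\.\d+)\s+(?P<max>\d+\.\d+)\s+Volts DC\s+"
    r"(?P<age>\d+\s+\w+)\s+(?P<status>.+?)\s*$"
)


def low_voltage_sensors(output):
    """Return (slot, sensor, value, lowarn) for rows where Value < LoWarn."""
    flagged = []
    for line in output.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue  # header, separator, truncation marker, or other sensor
        value = float(match["value"])
        lowarn = float(match["lowarn"])
        if value < lowarn:
            sensor = " ".join(match["sensor"].split())
            flagged.append((match["slot"], sensor, value, lowarn))
    return flagged
```

Applied to the excerpt above, only the /N0/SB0 3.3 VDC row would be flagged, because its value of 0.49 is below the LoWarn value of 3.13.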


Expected Behavior:
If an SB encounters a voltage error before its domain is booted or in operation, the domain will fail POST, fail a domain or board poweron, fail a Keyswitch operation, or fail to boot properly.  If the domain is already in operation, it will crash when an SB encounters a voltage issue.

If the domain crashes, showlogs data might report a variety of Parity Error events, such as an Address Parity Error, Parity Bidi Event, or L2CheckError Event.  The most important point is that when a domain crash is accompanied by one of the key indicators listed above, the root cause is likely to be a voltage issue that caused the Parity Error event, not the other way around.  See the Additional Information section of this article for an example.

Cause
