Sun SPARC[TM] Enterprise M4000/M5000 multiple "IO Manager:Link error" in context of other errored components due to IOU#0 (Doc ID 1530633.1)

Last updated on MAY 09, 2018

Applies to:

Sun SPARC Enterprise M5000 Server - Version Not Applicable to Not Applicable [Release N/A]
Sun SPARC Enterprise M4000 Server - Version Not Applicable to Not Applicable [Release N/A]
Information in this document applies to any platform.

Symptoms

Multiple "IO Manager:Link error" in context of other errored components due to IOU#0

This document describes a scenario in which multiple errors are logged against multiple parts and IOU#0 must be considered the faulty component.

It covers two timeframes. The error pattern in the second timeframe is already known to be caused by IOU#0 and is discussed in the following document:
Doc ID 1296435.1: Sun SPARC[TM] Enterprise M4000/M5000 - MBU_B and MEMB being faulted with SCF-8004-8X, SCF-8000-1D, and SCF-8005-MJ errors.

 

On an M5000 system, the following errors are recorded in 'showlogs monitor':

  [...]
  Jan 31 02:14:12 <hostname> Warning: /IOU#0/PCI#1:IO Manager:Link error
  Jan 31 02:14:15 <hostname> Warning: /IOU#0/PCI#3:IO Manager:Link error
  Jan 31 02:15:16 <hostname> Alarm: /MBU_B/MEMB#4,/MBU_B:ANALYZE:MAC-SC interface fatal error
  [...]
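
To spot such entries in a longer log, the monitor lines can be split into fields. The following is a minimal Python sketch, assuming every Warning/Alarm line follows the "Mon DD HH:MM:SS host Severity: /location:source:message" layout of the excerpt above. The names MonitorEvent and parse_monitor_line are hypothetical helpers for illustration and are not part of any XSCF tool.

```python
import re
from typing import NamedTuple, Optional


class MonitorEvent(NamedTuple):
    timestamp: str
    host: str
    severity: str
    location: str
    source: str
    message: str


# Layout assumed from the excerpt: timestamp, host, severity, then
# "/location:source:message" separated by colons.
_LINE = re.compile(
    r"^\s*(?P<ts>\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+"
    r"(?P<host>\S+)\s+"
    r"(?P<sev>Warning|Alarm):\s+"
    r"(?P<loc>/[^:]+):(?P<src>[^:]+):(?P<msg>.+?)\s*$"
)


def parse_monitor_line(line: str) -> Optional[MonitorEvent]:
    """Return the parsed fields, or None for lines that are not Warning/Alarm entries."""
    m = _LINE.match(line)
    if m is None:
        return None
    return MonitorEvent(m["ts"], m["host"], m["sev"], m["loc"], m["src"], m["msg"])
```

Lines such as "monitor_msg:" entries or the "[...]" elision markers do not match the assumed layout and yield None, so they can be filtered out before counting link errors per location.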

 
Notice: The "IO Manager:Link error" messages relate to the two Fibre DownLink cards installed in IOU#0. The only other card installed in IOU#0 is a PCIe network card (SUNW,qlc).

The XSCF FMA output from 'fmdump -V' shows the following (excerpt):

  Jan 31 02:14:11.0116 95f0befa-8f62-4a6f-b45e-a4e3cccc0289 IOXSCF-8000-1A
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.iox.device.fail
                certainty = 0x64
                        scf-resource = hc:///chassis=0/iou=0/pcislot=1/link=0/xmtr=0
                        scf-resource = hc:///chassis=0/iou=0/pcislot=1/link=0/xmtr=0
                location = IOU#0-PCI#1
        (end fault-list[0])
        fault-status = 0x1

  Jan 31 02:14:14.8418 a6a40219-a951-4f47-afc0-5e3804ad0d75 IOXSCF-8000-1A
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.iox.device.fail
                certainty = 0x64
                        scf-resource = hc:///chassis=0/iou=0/pcislot=3/link=0/xmtr=0
                location = IOU#0-PCI#3
        (end fault-list[0])
        fault-status = 0x1

  Jan 31 02:15:11.2313 0bada0f7-4dc6-4069-8695-6468be76a040 SCF-8005-5X
        fault-list-sz = 0x2
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.SPARC-Enterprise.if.fe-mac-sc
                certainty = 0x21
                        scf-resource = hc:///chassis=0/cmu=1
                detected-by = ANALYZE
                location = /MBU_B
        (end fault-list[0])
        (start fault-list[1])
                class = fault.chassis.SPARC-Enterprise.if.fe-mac-sc
                certainty = 0x42
                        scf-resource = hc:///chassis=0/cmu=1/mac=0
                detected-by = ANALYZE
                location = /MBU_B/MEMB#4
        (end fault-list[1])
        fault-status = 0x1 0x1
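
When many such records have to be reviewed, it can help to reduce each one to its message ID and the locations it names. The sketch below is a rough Python illustration, assuming the text layout of the excerpt above (a header line with timestamp, UUID and message ID, followed by indented "location = ..." lines). The function name parse_fmdump and the dictionary keys are hypothetical and do not correspond to any XSCF or Solaris API.

```python
import re
from typing import Any, Dict, List

# Header line as seen in the excerpt: "Jan 31 02:14:11.0116 <uuid> <MSG-ID>".
_HEADER = re.compile(
    r"^\s*(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"([0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+(\S+)\s*$"
)
_LOCATION = re.compile(r"^\s*location\s*=\s*(\S+)\s*$")


def parse_fmdump(text: str) -> List[Dict[str, Any]]:
    """Collect one record per header line, with the locations listed beneath it."""
    events: List[Dict[str, Any]] = []
    for line in text.splitlines():
        header = _HEADER.match(line)
        if header:
            events.append({
                "time": header.group(1),
                "uuid": header.group(2),
                "msgid": header.group(3),
                "locations": [],
            })
            continue
        loc = _LOCATION.match(line)
        if loc and events:
            # Attach the location to the most recent header seen.
            events[-1]["locations"].append(loc.group(1))
    return events
```

All other fields (class, certainty, scf-resource) are ignored here on purpose; they can be added with further patterns if needed.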

 
The components subsequently reported as Degraded or Faulted by 'showstatus' are:

  *   MBU_B Status:Degraded;
  *       MEMB#4 Status:Faulted;
      IOU#0 Status:Normal;
  *       PCI#1 Status:Faulted;
  *       PCI#3 Status:Faulted;

 
The error pattern in the second timeframe clearly indicates a problem with IOU#0. The 'showlogs monitor' output contains the following errors:

  [...]
  Jan 31 20:11:43 <hostname> Warning: /UNSPECIFIED:SCF:spurious unit interrupt
  Jan 31 20:11:50 <hostname> Warning: /UNSPECIFIED:SCF:spurious unit interrupt
  Jan 31 20:11:56 <hostname> Warning: /IOU#0/PCI#1:IO Manager:Link error
  Jan 31 20:12:01 <hostname> Warning: /UNSPECIFIED:SCF:spurious unit interrupt
  Jan 31 20:12:08 <hostname> Warning: /IOU#0/PCI#3:IO Manager:Link error
  Jan 31 20:13:28 <hostname> Alarm: /MBU_B/MEMB#5:ANALYZE:MAC detected clock fatal failure
  Jan 31 20:13:32 <hostname> monitor_msg: SCF:DomainID 0 state change (initialize phase started, detail#10)
  Jan 31 20:13:33 <hostname> monitor_msg: SCF:DomainID 1 state change (initialize phase started, detail#10)
  Jan 31 20:13:35 <hostname> monitor_msg: SCF:DomainID 3 state change (initialize phase started, detail#10)
  Jan 31 20:13:53 <hostname> monitor_msg: SCF:DomainID 3 is deconfigured (no available XSB)
  Jan 31 20:14:01 <hostname> Warning: /MBU_B:SCF:SC test error
  Jan 31 20:14:06 <hostname> Warning: /MBU_B:SCF:SC test error
  Jan 31 20:14:07 <hostname> monitor_msg: SCF:System stopped (no available XSB)
  [...]

 
The XSCF FMA output from 'fmdump -V' shows the following (excerpt):

  Jan 31 20:11:41.5983 86c6543b-3552-4c9c-8896-3e824e7b4f9f SCF-8004-8X
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.SPARC-Enterprise.asic.cpu.power.fail
                certainty = 0x64
                detected-by = SCF
                location = CHASSIS
        (end fault-list[0])
        fault-status = 0x0
  Jan 31 20:11:48.4577 8be58b80-9acc-4b51-af92-37366e6f4e5a SCF-8004-8X
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.SPARC-Enterprise.asic.cpu.power.fail
                certainty = 0x64
                detected-by = SCF
                location = CHASSIS
        (end fault-list[0])
        fault-status = 0x0
  Jan 31 20:11:51.7522 faa758e8-0759-47c4-a831-4e71815c61da IOXSCF-8000-1A
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.iox.device.fail
                certainty = 0x64
                        scf-resource = hc:///chassis=0/iou=0/pcislot=1/link=0/xmtr=0
                location = IOU#0-PCI#1
        (end fault-list[0])
        fault-status = 0x1
  Jan 31 20:11:57.9733 739f130c-82a7-47cd-bd5c-f817ecb12cff SCF-8004-8X
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.SPARC-Enterprise.asic.cpu.power.fail
                certainty = 0x64
                detected-by = SCF
                location = CHASSIS
        (end fault-list[0])
        fault-status = 0x0
  Jan 31 20:12:05.3927 72ba0b96-e1ce-4dbb-b504-3df4066d00b8 IOXSCF-8000-1A
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.iox.device.fail
                certainty = 0x64
                        scf-resource = hc:///chassis=0/iou=0/pcislot=3/link=0/xmtr=0
                location = IOU#0-PCI#3
        (end fault-list[0])
        fault-status = 0x1
  Jan 31 20:13:24.7270 9cef99db-92a3-4dde-9460-0ab5d3d8635c SCF-8000-1D
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.SPARC-Enterprise.if.fe-asic-clk
                certainty = 0x64
                        scf-resource = hc:///chassis=0/cmu=1/mac=1
                detected-by = ANALYZE
                location = /MBU_B/MEMB#5
        (end fault-list[0])
        fault-status = 0x1
  Jan 31 20:13:57.1795 3ecdc2a5-bea0-4d28-a6b4-ce2d764ef539 SCF-8005-MJ
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.SPARC-Enterprise.asic.sc.test
                certainty = 0x64
                        scf-resource = hc:///chassis=0/cmu=0/sc=0
                detected-by = SCF
                location = /MBU_B
        (end fault-list[0])
        fault-status = 0x1
  Jan 31 20:13:59.4440 fc42731d-c11f-4526-b10a-d4bbf86d32fa SCF-8005-MJ
        fault-list-sz = 0x1
        fault-list = (array of embedded nvlists)
        (start fault-list[0])
                class = fault.chassis.SPARC-Enterprise.asic.sc.test
                certainty = 0x64
                        scf-resource = hc:///chassis=0/cmu=0/sc=1
                detected-by = SCF
                location = /MBU_B
        (end fault-list[0])
        fault-status = 0x1

 
Notice: In this example MEMB#5 is the highest-numbered MEMB installed. The FMA messages and the 'showstatus' output shown above were taken with XCP version 1091. XCP 1111 now includes IOU#0 as a suspect component for the error pattern in the second timeframe.
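
What both timeframes have in common is link errors on more than one IOU#0 slot, accompanied by faults on the MBU or its MEMBs. This can be expressed as a simple check. The following is only an illustrative Python sketch: the function name, the (msgid, location) input format, the two-slot threshold and the "/MBU_" prefix test are all assumptions made for this example and are not an Oracle diagnostic rule. A positive result merely suggests adding IOU#0 to the list of suspect components.

```python
from typing import Iterable, Tuple

LINK_ERROR_MSGID = "IOXSCF-8000-1A"  # message ID of the link errors shown above


def iou0_is_suspect(events: Iterable[Tuple[str, str]]) -> bool:
    """events: (msgid, location) pairs taken from the FMA records.

    Returns True when link errors are reported on at least two distinct
    IOU#0 slots and at least one fault is located on an MBU or MEMB.
    """
    events = list(events)
    link_slots = {
        loc for msgid, loc in events
        if msgid == LINK_ERROR_MSGID and loc.startswith("IOU#0-")
    }
    mbu_fault = any(loc.startswith("/MBU_") for _, loc in events)
    return len(link_slots) >= 2 and mbu_fault
```

Applied to the records of either timeframe in this document, the check returns True. Link errors on a single slot, or link errors without any MBU or MEMB fault, return False and would need to be assessed on their own.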

Changes

 

Cause
