
Exadata: Exachk: InfiniBand Network Error Counters Are Non-zero FAILS ON IMAGE >= 11.2.3.3.0 (Doc ID 1633690.1)

Last updated on MARCH 01, 2019

Applies to:

Oracle Exadata Storage Server Software - Version 11.2.3.3.0 and later
Information in this document applies to any platform.

Symptoms

Exachk intermittently reports the following check as failed on Exadata Linux images at version 11.2.3.3.0 or later:


FAIL => InfiniBand network error counters are non-zero

A high PortXmitWait InfiniBand counter is observed:

 # Collection Module: IBCardInfoExaWatcher



=== mlx4_0 port2 counters ===
[INFO: 2014-02-18-11:05:56] Port counters:  Lid 6 port 2
[INFO: 2014-02-18-11:05:56] PortSelect: 2
[INFO: 2014-02-18-11:05:56] CounterSelect: 0x1400
[INFO: 2014-02-18-11:05:56] SymbolErrorCounter: 0
[INFO: 2014-02-18-11:05:56] LinkErrorRecoveryCounter: 0
[INFO: 2014-02-18-11:05:56] LinkDownedCounter: 0

[INFO: 2014-02-18-11:05:56] PortRcvErrors: 0
[INFO: 2014-02-18-11:05:56] PortRcvRemotePhysicalErrors: 0
[INFO: 2014-02-18-11:05:56] PortRcvSwitchRelayErrors: 0
[INFO: 2014-02-18-11:05:56] PortXmitDiscards: 0
[INFO: 2014-02-18-11:05:56] PortXmitConstraintErrors: 0
[INFO: 2014-02-18-11:05:56] PortRcvConstraintErrors: 0
[INFO: 2014-02-18-11:05:56] CounterSelect2: 0x00
[INFO: 2014-02-18-11:05:56] LocalLinkIntegrityErrors: 0
[INFO: 2014-02-18-11:05:56] ExcessiveBufferOverrunErrors: 0
[INFO: 2014-02-18-11:05:56] VL15Dropped: 0
[INFO: 2014-02-18-11:05:56] PortXmitData: 4294967295
[INFO: 2014-02-18-11:05:56] PortRcvData: 4294967295
[INFO: 2014-02-18-11:05:56] PortXmitPkts: 498272891
[INFO: 2014-02-18-11:05:56] PortRcvPkts: 633787322
[INFO: 2014-02-18-11:05:56] PortXmitWait: 8225394   <-------
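The counter dump above can be scanned programmatically to see which counters are non-zero. The following is a minimal sketch, not part of Exachk or ExaWatcher: the function name nonzero_counters and the NOT_ERROR_FIELDS set are hypothetical, and the decision to treat the port selector and the data/packet totals as non-error fields is an assumption made for illustration. It only lists values; whether a given value is a real problem is a separate question.

```python
import re

# Matches lines such as "[INFO: 2014-02-18-11:05:56] PortXmitWait: 8225394".
# Hex fields like "CounterSelect: 0x1400" are deliberately not matched.
COUNTER_LINE = re.compile(r"\]\s*(\w+):\s*(\d+)(?!\w)")

# Assumption for this sketch: these fields are selectors or traffic totals,
# not error indicators, so they are skipped.
NOT_ERROR_FIELDS = {
    "PortSelect", "PortXmitData", "PortRcvData", "PortXmitPkts", "PortRcvPkts",
}


def nonzero_counters(text):
    """Return {counter_name: value} for every non-zero counter in the dump."""
    found = {}
    for line in text.splitlines():
        match = COUNTER_LINE.search(line)
        if match is None:
            continue
        name, value = match.group(1), int(match.group(2))
        if name in NOT_ERROR_FIELDS:
            continue
        if value != 0:
            found[name] = value
    return found
```

Applied to the dump shown in this section, the only counter the sketch would report is PortXmitWait.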

 

After running “ibclearcounters”, the PortXmitWait counter is still non-zero and is reported by the following ibqueryerrors.pl query:

root@XXXXdb01:/# ibclearcounters

## Summary: 7 nodes cleared 0 errors

root@XXXXdb01:/# ibqueryerrors.pl -rR -s LinkDowned,RcvSwRelayErrors,XmtDiscards,XmtWait
Suppressing:
Errors for 0x2128e8aee4a0a0 "SUN DCS 36P QDR xxxx-ibb0 55.39.110.216"
GUID 0x2128e8aee4a0a0 port ALL: [PortXmitWait == 20216]
GUID 0x2128e8aee4a0a0 port 0: [PortXmitWait == 6532]
Link info: 1 0[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> [ ] "" ( )
GUID 0x2128e8aee4a0a0 port 1: [PortXmitWait == 12923]
Link info: 1 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x0021280001efecd0 10 2[ ] "xxxxcelxxx02 C 192.168.10.4 HCA-1" ( )
GUID 0x2128e8aee4a0a0 port 2: [PortXmitWait == 10429]
Link info: 1 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x0021280001cf2ee6 8 2[ ] "xxxxcelxxx01 C 192.168.10.3 HCA-1" ( )
GUID 0x2128e8aee4a0a0 port 4: [PortXmitWait == 1690]
...
...
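To compare PortXmitWait values across ports, for example before and after running ibclearcounters, the ibqueryerrors.pl output can be parsed into a per-port table. This is a rough sketch that assumes the line format shown above ("GUID <guid> port <n>: [Name == value]"); the function name parse_ibqueryerrors is hypothetical and the script is not an Oracle-supplied tool.

```python
import re

# Matches e.g. "GUID 0x2128e8aee4a0a0 port 1: [PortXmitWait == 12923]".
PORT_LINE = re.compile(r"GUID (0x[0-9a-fA-F]+) port (\w+): \[(.*)\]")
# Matches each "Name == value" pair inside the square brackets.
COUNTER = re.compile(r"(\w+) == (\d+)")


def parse_ibqueryerrors(text):
    """Return {(guid, port): {counter_name: value}} from ibqueryerrors.pl output."""
    ports = {}
    for line in text.splitlines():
        match = PORT_LINE.search(line)
        if match is None:
            continue  # "Link info", "Errors for", and other lines are skipped
        guid, port, body = match.groups()
        ports[(guid, port)] = {
            name: int(value) for name, value in COUNTER.findall(body)
        }
    return ports
```

The port field is kept as a string because the output includes an aggregate "ALL" row alongside the numbered ports.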

 

Changes

 

Cause




