Exadata: Exachk: InfiniBand Network Error Counters Are Non-zero FAILS ON IMAGE >= 11.2.3.3.0
(Doc ID 1633690.1)
Last updated on MARCH 01, 2019
Applies to:
Oracle Exadata Storage Server Software - Version 11.2.3.3.0 and later
Information in this document applies to any platform.
Symptoms
Exachk sometimes fails the following check on Linux image versions 11.2.3.3.0 and later:
FAIL => InfiniBand network error counters are non-zero
A high PortXmitWait IB counter is observed:
# Collection Module: IBCardInfoExaWatcher
=== mlx4_0 port2 counters ===
[INFO: 2014-02-18-11:05:56] Port counters: Lid 6 port 2
[INFO: 2014-02-18-11:05:56] PortSelect: 2
[INFO: 2014-02-18-11:05:56] CounterSelect: 0x1400
[INFO: 2014-02-18-11:05:56] SymbolErrorCounter: 0
[INFO: 2014-02-18-11:05:56] LinkErrorRecoveryCounter: 0
[INFO: 2014-02-18-11:05:56] LinkDownedCounter: 0
[INFO: 2014-02-18-11:05:56] PortRcvErrors: 0
[INFO: 2014-02-18-11:05:56] PortRcvRemotePhysicalErrors: 0
[INFO: 2014-02-18-11:05:56] PortRcvSwitchRelayErrors: 0
[INFO: 2014-02-18-11:05:56] PortXmitDiscards: 0
[INFO: 2014-02-18-11:05:56] PortXmitConstraintErrors: 0
[INFO: 2014-02-18-11:05:56] PortRcvConstraintErrors: 0
[INFO: 2014-02-18-11:05:56] CounterSelect2: 0x00
[INFO: 2014-02-18-11:05:56] LocalLinkIntegrityErrors: 0
[INFO: 2014-02-18-11:05:56] ExcessiveBufferOverrunErrors: 0
[INFO: 2014-02-18-11:05:56] VL15Dropped: 0
[INFO: 2014-02-18-11:05:56] PortXmitData: 4294967295
[INFO: 2014-02-18-11:05:56] PortRcvData: 4294967295
[INFO: 2014-02-18-11:05:56] PortXmitPkts: 498272891
[INFO: 2014-02-18-11:05:56] PortRcvPkts: 633787322
[INFO: 2014-02-18-11:05:56] PortXmitWait: 8225394 <-------
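To pick out the non-zero counters from a dump like the one above, a small filter can help. The following is a minimal sketch, not part of Exachk or ExaWatcher: the function name `nonzero_counters` is hypothetical, it assumes the dump is supplied on standard input in the `[INFO: <timestamp>] Name: value` layout shown here, and the list of skipped selector and traffic counters is an assumption made for illustration.

```shell
# Hypothetical helper: print "<counter> <value>" for every non-zero counter
# in an ExaWatcher-style dump read from stdin.
nonzero_counters() {
  awk '
    # Counter lines have the form: [INFO: <timestamp>] Name: <decimal value>
    $1 == "[INFO:" && NF >= 4 && $3 ~ /^[A-Za-z0-9]+:$/ && $4 ~ /^[0-9]+$/ {
      name = $3
      sub(/:$/, "", name)
      # Assumption: selectors and traffic counters are not error indicators.
      if (name ~ /^(PortSelect|PortXmitData|PortRcvData|PortXmitPkts|PortRcvPkts)$/)
        next
      if ($4 + 0 != 0)
        print name, $4
    }
  '
}
```

For example, `nonzero_counters < counters.txt`, where `counters.txt` is a hypothetical file holding the saved dump, lists only the counters that need attention.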
After running “ibclearcounters”, the counter is still non-zero and is reported by the following query:
root@XXXXdb01:/# ibclearcounters
## Summary: 7 nodes cleared 0 errors
root@XXXXdb01:/# ibqueryerrors.pl -rR -s LinkDowned,RcvSwRelayErrors,XmtDiscards,XmtWait
Suppressing:
Errors for 0x2128e8aee4a0a0 "SUN DCS 36P QDR xxxx-ibb0 55.39.110.216"
GUID 0x2128e8aee4a0a0 port ALL: [PortXmitWait == 20216]
GUID 0x2128e8aee4a0a0 port 0: [PortXmitWait == 6532]
Link info: 1 0[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> [ ] "" ( )
GUID 0x2128e8aee4a0a0 port 1: [PortXmitWait == 12923]
Link info: 1 1[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x0021280001efecd0 10 2[ ] "xxxxcelxxx02 C 192.168.10.4 HCA-1" ( )
GUID 0x2128e8aee4a0a0 port 2: [PortXmitWait == 10429]
Link info: 1 2[ ] ==( 4X 10.0 Gbps Active/ LinkUp)==> 0x0021280001cf2ee6 8 2[ ] "xxxxcelxxx01 C 192.168.10.3 HCA-1" ( )
GUID 0x2128e8aee4a0a0 port 4: [PortXmitWait == 1690]
...
...
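The per-port PortXmitWait values in the output above can be condensed into one line per port for easier comparison between runs. This is a minimal sketch under stated assumptions: the function name `xmitwait_by_port` is hypothetical, and it assumes the `GUID <guid> port <n>: [PortXmitWait == <value>]` line layout shown in this document, read from standard input.

```shell
# Hypothetical helper: print "<guid> <port> <PortXmitWait value>" for each
# GUID line of ibqueryerrors-style output read from stdin.
xmitwait_by_port() {
  awk '
    $1 == "GUID" && $3 == "port" {
      port = $4
      sub(/:$/, "", port)
      # Scan the bracketed "Name == value" entries for PortXmitWait.
      for (i = 5; i + 2 <= NF; i++) {
        if ($i ~ /PortXmitWait$/ && $(i + 1) == "==") {
          value = $(i + 2)
          gsub(/[^0-9]/, "", value)   # drop the trailing "]"
          print $2, port, value
        }
      }
    }
  '
}
```

It could be used by piping the query output into it, for example `ibqueryerrors.pl -rR -s LinkDowned,RcvSwRelayErrors,XmtDiscards,XmtWait | xmitwait_by_port`.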
Changes
Cause