Packet loss (missed_packets, etc.) on 10Gb or faster interfaces on x86 Systems due to CPU Power Management (Doc ID 2142951.1)

Last updated on SEPTEMBER 13, 2016

Applies to:

Solaris Operating System - Version 11.1 and later
Information in this document applies to any platform.

Symptoms

Applications receiving multicast flows (UDP datagrams) over a 10Gb ixgbe interface consistently reported missing data. The systems involved were an X4-4 sending and an X4-8 receiving.

The workload was not particularly high throughput. Jumbo frames were in use and all datagrams were smaller than the MTU, so no IP reassembly was required. Overall throughput generally peaked at no more than 2Gbit/sec and was usually much less. This level of traffic should not have been anything close to a challenge for this platform, network interface or Solaris (11.3.5.6.0 in this case).
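To put the stated peak in perspective, the following is a rough back-of-the-envelope sketch in Python. The 2Gbit/sec peak and the 10Gb link speed come from the description above; the 9000-byte datagram size is an assumption (the actual jumbo MTU and datagram sizes are not given), and protocol header overhead is ignored.

```python
# Rough headroom estimate for the workload described above.
# ASSUMED_DATAGRAM_BYTES is an illustrative assumption, not a measured value.

def datagrams_per_second(throughput_bits_per_s, datagram_bytes):
    """Approximate datagram rate, ignoring Ethernet/IP/UDP header overhead."""
    return throughput_bits_per_s / (datagram_bytes * 8)

def link_utilization(throughput_bits_per_s, link_bits_per_s):
    """Fraction of the nominal link rate consumed by the workload."""
    return throughput_bits_per_s / link_bits_per_s

PEAK_BITS_PER_S = 2e9          # stated peak throughput
LINK_BITS_PER_S = 10e9         # nominal 10Gb interface speed
ASSUMED_DATAGRAM_BYTES = 9000  # assumption: datagrams close to a 9000-byte jumbo MTU

if __name__ == "__main__":
    rate = datagrams_per_second(PEAK_BITS_PER_S, ASSUMED_DATAGRAM_BYTES)
    util = link_utilization(PEAK_BITS_PER_S, LINK_BITS_PER_S)
    print(f"approx. {rate:.0f} datagrams/sec at {util:.0%} of link rate")
```

Under these assumptions the peak load is about a fifth of the nominal link rate. Smaller datagrams would raise the packet rate proportionally, so the figure is only indicative.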

The missed_packets kstat counter on the receiving ixgbe interface was found to be steadily incrementing. This indicates NIC hardware FIFO overruns: the NIC was unable to offload data via DMA into driver memory at the rate it was arriving.
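One way to confirm that a counter such as missed_packets is steadily incrementing is to compare two snapshots of parseable kstat output (as produced by kstat -p, one "module:instance:name:statistic" key and a value per line) taken some time apart. The Python sketch below is illustrative only: the kstat keys in the sample text are hypothetical placeholders, and the real module, instance and name components on a given system may differ.

```python
# Minimal sketch: compute the growth of a named statistic between two
# snapshots of `kstat -p`-style text. The sample keys are hypothetical.

def parse_kstat(text, statistic="missed_packets"):
    """Return {key: value} for lines whose final key component equals `statistic`."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        key, value = parts
        if key.rsplit(":", 1)[-1] == statistic:
            counters[key] = int(value)
    return counters

def counter_deltas(before, after):
    """Difference per key for counters present in both snapshots."""
    return {key: after[key] - before[key] for key in after if key in before}

# Hypothetical snapshots taken a few seconds apart.
SAMPLE_BEFORE = "ixgbe:0:example:missed_packets\t1200\nixgbe:0:example:ipackets\t500000\n"
SAMPLE_AFTER = "ixgbe:0:example:missed_packets\t1850\nixgbe:0:example:ipackets\t560000\n"

if __name__ == "__main__":
    deltas = counter_deltas(parse_kstat(SAMPLE_BEFORE), parse_kstat(SAMPLE_AFTER))
    for key, delta in deltas.items():
        print(f"{key} grew by {delta}")
```

A non-zero delta across repeated samples, at traffic levels well below link rate, matches the symptom described here.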

This type of packet loss is normally attributed to "PCIe back-pressure". The X4-8 receiving system's PCIe 3.0 subsystem should not have been a bottleneck here. Although the ixgbe NIC (a standard dual 10Gb Niantic X1109A-Z) is a PCIe v2.0 part, the card and the PCIe I/O subsystem are capable of handling much higher data rates.

Changes

Various ixgbe driver tuning changes were implemented: more/larger RX rings, rss_udp_enable=1 to distribute the UDP workload among all RX rings, and Ethernet flow control. A different card in a different slot was also tried, even though the existing card was already in a suitable slot. None of the tuning or card placement changes had any effect.

Cause
