Packet loss (missed_packets, etc.) on 10Gb or faster interfaces on x86 Systems due to CPU Power Management
Last updated on MARCH 20, 2018
Applies to: Solaris Operating System - Version 11.1 and later
Information in this document applies to any platform.
Applications receiving multicast flows (UDP datagrams) over a 10Gb ixgbe interface consistently reported missing data. The systems involved were an X4-4 (sending) and an X4-8 (receiving).
The workload was not particularly high throughput. Jumbo frames were in use and all datagrams were smaller than the MTU, so no IP reassembly was required. Overall throughput generally peaked at no more than 2Gbit/sec and was usually much less. This level of traffic should not have come close to challenging this platform, network interface or Solaris (220.127.116.11.0 in this case).
The missed_packets kstat counter on the receiving ixgbe interface was found to be steadily incrementing. This indicates NIC hardware FIFO overruns; the NIC was unable to offload data via DMA into driver memory at the rate it was arriving.
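One way to confirm that a counter such as missed_packets is "steadily incrementing" is to take two snapshots of `kstat -p` output some seconds apart and compare them. The following is a minimal sketch of that comparison, not the procedure used in this case. It assumes the usual `kstat -p` line layout of `module:instance:name:statistic` followed by whitespace and a value. The kstat name `statistics` and the sample numbers are hypothetical placeholders, not data from the affected system.

```python
def parse_kstat_p(text):
    """Parse `kstat -p` style lines into a {key: int} dictionary.

    Each line is expected to look like:
        module:instance:name:statistic<whitespace>value
    Lines that do not match, or whose value is not an integer, are skipped.
    """
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue
        key, value = parts
        try:
            counters[key] = int(value)
        except ValueError:
            continue
    return counters


def counter_deltas(before, after):
    """Return after - before for every counter present in both snapshots."""
    return {key: after[key] - before[key] for key in after if key in before}


# Hypothetical snapshots; the kstat name and the values are made up.
SNAPSHOT_T0 = "ixgbe:0:statistics:missed_packets\t1000\n"
SNAPSHOT_T1 = "ixgbe:0:statistics:missed_packets\t1250\n"

deltas = counter_deltas(parse_kstat_p(SNAPSHOT_T0), parse_kstat_p(SNAPSHOT_T1))
for key, delta in deltas.items():
    print(f"{key} increased by {delta}")
```

A non-zero delta that recurs across successive intervals while traffic is flowing matches the symptom described above.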
This type of packet loss is normally attributed to "PCIe back-pressure". The X4-8 receiving system's PCIe 3.0 subsystem really should not have been a bottleneck here. Although the ixgbe NIC (a standard dual 10Gb Niantic X1109A-Z) is a PCIe v2.0 part, the card and the PCIe I/O subsystem are capable of handling much higher data rates.
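The headroom claim can be checked with rough arithmetic. The sketch below assumes the card uses an x8 PCIe 2.0 link (the lane width is an assumption, not stated in this document) and uses the standard PCIe 2.0 figures of 5 GT/s per lane with 8b/10b encoding. It ignores packet and protocol overhead on the bus, so the result is an upper bound rather than an achievable rate.

```python
def pcie_link_gbps(gt_per_s, lanes, payload_bits, encoded_bits):
    """Raw per-direction link bandwidth in Gbit/s after line encoding.

    gt_per_s      -- transfer rate per lane in GT/s
    lanes         -- link width (number of lanes)
    payload_bits  -- data bits per encoded symbol (8 for 8b/10b)
    encoded_bits  -- bits on the wire per symbol (10 for 8b/10b)
    """
    return gt_per_s * lanes * payload_bits / encoded_bits


# Assumption: x8 link width. PCIe 2.0 signals at 5 GT/s with 8b/10b encoding.
pcie2_x8_gbps = pcie_link_gbps(5.0, 8, 8, 10)

nic_line_rate_gbps = 2 * 10.0   # dual-port 10Gb card, both ports saturated
observed_peak_gbps = 2.0        # peak throughput reported for this workload

print(f"PCIe 2.0 x8 per direction: {pcie2_x8_gbps} Gbit/s")
print(f"NIC line rate (both ports): {nic_line_rate_gbps} Gbit/s")
print(f"Observed peak: {observed_peak_gbps} Gbit/s")
```

Under these assumptions the observed peak is a small fraction of both the NIC line rate and the link bandwidth, which is why raw PCIe capacity was not a plausible explanation for the overruns.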
Various ixgbe driver tunings were tried: more and larger RX rings, rss_udp_enable=1 to distribute the UDP workload among all RX rings, and Ethernet flow control. A different card in a different slot was also tried, even though the existing card was already in a suitable slot. None of the tuning or card placement changes had any effect.