Oracle ZFS Storage Appliance : TCP flow control and congestion control
(Doc ID 2252566.1)
Last updated on FEBRUARY 14, 2019
Applies to: Oracle ZFS Storage ZS3-2 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS3-4 - Version All Versions to All Versions [Release All Releases]
Oracle ZFS Storage ZS4-4 - Version All Versions to All Versions [Release All Releases]
7000 Appliance OS (Fishworks)
Most of Oracle's ZFS Appliance customers who use NFS have a large number of clients accessing the appliance over the network.
In some cases, however, one or two clients running single-threaded applications must read huge amounts of sequential data over a high-speed link.
One example is video data, where the appliance is used in the Media and Entertainment industry.
In these circumstances read performance may not meet the customer's expectations. The actions described below can be taken to improve it.
Appliance software releases earlier than 2013.06.05.5.2,1-1.2 have a TCP congestion window constant of 5. The congestion window is the number of packets that can be on the wire to the client, that is, sent but not yet acknowledged.
The congestion window starts off unlimited, but as soon as a packet is dropped it is reduced to the value of the TCP congestion window constant, which limits the number of packets that can be sent at a time to 5.
In ak release 2013.06.05.5.2,1-1.2 the constant was increased from 5 to 16 (decimal). Upgrading to that release or later can increase throughput on a network where there is some packet loss.
Note that the congestion window constant does not apply to clients that are on the same subnet as the ZFSSA. In that case, the in-flight limit is maintained at 64K.
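To give a feel for why the change from 5 to 16 matters, the following is a rough sketch of the throughput ceiling when only a fixed number of packets may be on the wire per round trip. The function name cwnd_kbps, the 1460-byte segment size and the 2 ms round-trip time are illustrative assumptions, not values taken from the appliance, and the real window changes again as the connection recovers from the loss.

```shell
# Rough per-connection throughput ceiling, in kbit/s, when at most
# $1 full-size packets may be in flight per round trip.
#   $1 = congestion window in packets
#   $2 = payload bytes per packet (1460 is an assumed Ethernet MSS)
#   $3 = round-trip time in milliseconds (assumed; measure it, e.g. with ping)
cwnd_kbps() {
    # bits sent per round trip, divided by the RTT in ms, gives kbit/s
    echo $(( $1 * $2 * 8 / $3 ))
}

cwnd_kbps 5 1460 2     # constant in releases earlier than 2013.06.05.5.2,1-1.2
cwnd_kbps 16 1460 2    # constant from 2013.06.05.5.2,1-1.2 onward
```

Under these assumptions the two calls print 29200 and 93440, that is roughly 29 Mbit/s against 93 Mbit/s for a single connection immediately after a loss.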
If the client is Linux-based, a couple of network tuning parameters are set by default on the client.
When these parameters interact with the appliance's default congestion algorithm, newreno, they can adversely affect performance. The parameters are Large Receive Offload (LRO) and Generic Receive Offload (GRO).
LRO and GRO aggregate packets on the receiver (the client), which reduces CPU overhead but also reduces the number of ACKs sent back to the ZFSSA.
The newreno TCP congestion algorithm depends on acknowledgements from the receiver before it sends more data, so fewer ACKs slow the sender down. A router between the two systems only exacerbates the poor performance.
To determine whether LRO or GRO is enabled, run the following on the Linux client:
ethtool -k <interface name> | grep receive-offload
and check whether generic-receive-offload or large-receive-offload is reported as "on".
To disable both LRO and GRO, run the following on the Linux client:
ethtool -K <interface name> gro off lro off
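The check and the change above can be combined in a small helper. This is a minimal sketch rather than an Oracle-supplied tool: the function name enabled_receive_offloads and the interface name eth0 are hypothetical, and it assumes the usual "feature-name: on|off" layout of ethtool -k output.

```shell
# Print the receive-offload features that `ethtool -k` reports as enabled.
# Reads the output of `ethtool -k <interface name>` on standard input.
enabled_receive_offloads() {
    awk '
        $1 == "generic-receive-offload:" || $1 == "large-receive-offload:" {
            # $2 is "on" or "off"; a trailing "[fixed]" lands in $3 and is ignored
            if ($2 == "on") { sub(/:$/, "", $1); print $1 }
        }'
}

# Example use on a Linux client (eth0 is a placeholder interface name):
#   ethtool -k eth0 | enabled_receive_offloads
#   ethtool -K eth0 gro off lro off
```

An empty result means neither feature is currently enabled on that interface. Settings changed with ethtool -K do not normally persist across a reboot, so the change may need to be added to the client's network configuration.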
At this time the ZFSSA supports only the 'newreno' congestion algorithm, although three other algorithms are built into the base Solaris OS:
cubic, highspeed and vegas. An enhancement request (ER) to research the adoption of other algorithms has been created.
That ER is "Bug 25772204 - ZFSSA could do a better job in adapting to TCP congestion control algorithms."
Devices implementing the TCP protocol use flow control to establish how much data can be sent to them.
The receiving device advertises its receive window to the sending device, and the sending device ensures that it does not exceed that amount.
On a ZFS appliance, the receive window of a device that is receiving (reading) data from the appliance can be viewed with the "netstat -a" or "netstat -an" commands.
The "Swind" value is the size, in Kbytes, of the receive window on the device receiving the data.
The ZFS appliance can send only the number of bytes shown in the Swind field (in this case 48 KB) before it must wait for an acknowledgement from the receiving device.
If the value of Swind is seen to shrink over time, it could indicate a problem on the receiving device.
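The effect of the receive window can be sketched in the same way as the congestion window. The helpers below are illustrative only: the names swind_kbps and swind_drops, the 2 ms round-trip time and the way Swind samples are collected are all assumptions, not part of the appliance.

```shell
# Throughput ceiling, in kbit/s, imposed by the receiver's advertised window.
#   $1 = Swind in Kbytes
#   $2 = round-trip time in milliseconds (assumed value)
swind_kbps() {
    echo $(( $1 * 1024 * 8 / $2 ))
}

# Read Swind samples, one per line in the order they were taken, and print
# every step where the window became smaller than the previous sample.
swind_drops() {
    awk 'NR > 1 && ($1 + 0) < prev { print prev " -> " ($1 + 0) }
         { prev = $1 + 0 }'
}

swind_kbps 48 2
printf '48\n48\n32\n40\n16\n' | swind_drops
```

With a 48 KB window and an assumed 2 ms round trip, the first call prints 196608, a ceiling of roughly 197 Mbit/s for that connection. The second prints the two shrinking steps in the sample series, "48 -> 32" and "40 -> 16".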