Considerations for Sizing TCP Buffers on Hosts

Until recently, most operating systems shipped with default TCP settings that were suitable for high-throughput communication over LANs, or for modest-throughput communication over wide-area networks.

In order to achieve good performance over "Long Fat Networks" (LFNs), i.e. paths with large round-trip times and high capacity, these systems had to be "tuned". In particular, the TCP buffers had to be resized to permit large TCP windows to be used.

As networks have become faster, and in particular as fast networks have become available to more and more users, operating systems have adapted so that the "out of the box" performance over LFN paths has improved. The most important improvement was the introduction of automatic TCP buffer tuning, available in systems such as Linux (2.6.17 and later), Windows Vista, and FreeBSD 7.0. These systems adapt the size of TCP buffers as needed, and can thus support large windows by default.

So if you have such a modern system, you often don't have to tune anything to achieve good performance.

In an ideal operating system implementation, TCP buffer auto-tuning would always figure out exactly the right size of all buffers, so that TCP's window can be made as large as necessary to support maximum throughput in the face of other limitations such as bottleneck capacity, cross traffic, and end-system (processing, disk-speed etc.) bottlenecks. These perfectly-sized buffers would be managed by the kernel's memory management so that only the required amount of memory would be used, and even though optimal buffer sizes can fluctuate over time, no ill effects from memory fragmentation, copying data around, etc. would ever occur.

It is unclear how close currently existing systems come to this ideal, but it is reasonable to assume that these goals are hard to attain in full. Therefore, even auto-tuning systems have configurable upper limits on the size of TCP buffers. They may also have configurable lower limits to avoid degenerate performance, and initial default values that are used until enough dynamic information has been collected for auto-tuning to take effect.

In systems without buffer auto-tuning, the effect of the default TCP buffer sizes is much more important, because they will be used throughout the lifetime of a socket (connection).

The default TCP buffer size can be overridden (within the system-wide lower/upper limits) by individual applications. In the BSD socket API, this is done using setsockopt() with the SO_RCVBUF and SO_SNDBUF options.
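For example, an application could request 4 MB buffers per direction like this (a minimal sketch using Python's socket module; the 4 MB figure is just an illustrative value, and the kernel may round, double, or clamp the request to the configured system-wide limits):

```python
import socket

BUF_SIZE = 4 * 1024 * 1024  # illustrative: request 4 MiB per direction

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request larger send and receive buffers. This should be done before
# connecting, so that the receive buffer size can influence the window
# scaling factor that TCP negotiates during connection setup.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

# The kernel need not grant exactly what was asked for:
actual_snd = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
actual_rcv = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print(actual_snd, actual_rcv)
s.close()
```

Note that on systems with buffer auto-tuning (e.g. Linux), explicitly setting SO_RCVBUF on a socket typically disables auto-tuning for that socket, so applications should only do this when they have a good reason.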

Now, what are reasonable defaults/limits for TCP buffer sizes? Unfortunately, there is no recommended set of values that is suitable in all situations. Here is a proposed strategy:

Try to come up with a bandwidth-delay product that you want to support. For example, 500 Mb/s over a 100 ms path (equivalent to 10 Gb/s over a 5 ms path). This is actually a _throughput_ × delay product for a single TCP connection, and it determines the maximum TCP window that the system must support. In this case, 500 Mb/s times 100 ms equals 50 Mbit, or 6.25 Megabytes.
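The arithmetic above can be checked with a few lines of code (using the example figures from the text):

```python
# Bandwidth-delay product for the example path:
# 500 Mb/s of target throughput over a 100 ms round-trip time.
rate_bps = 500_000_000  # 500 Mb/s
rtt_s = 0.100           # 100 ms

bdp_bits = rate_bps * rtt_s   # maximum window, in bits
bdp_bytes = bdp_bits / 8      # the same, in bytes

print(f"{bdp_bits / 1e6:.0f} Mbit = {bdp_bytes / 1e6:.2f} MB")
# prints "50 Mbit = 6.25 MB"
```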

If your system performs TCP buffer auto-tuning, then it is sufficient to make sure that the upper limit for the TCP buffer size is large enough. You may make the upper limit larger than strictly necessary, if you trust auto-tuning and kernel buffer management to work well enough that this doesn't cause problems.

If your system uses fixed TCP buffers, i.e. if it doesn't support buffer auto-tuning, you also need to set the upper limit sufficiently high. Unless your applications use setsockopt() to select large buffers, you also need to set the default buffer size to a high value. Compared to the auto-tuning case, setting these two values excessively large can have bad effects, because the buffers will actually be allocated and consume kernel memory.
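On Linux, for instance, these limits and defaults are exposed as sysctls; the receive-side and send-side settings are min/default/max triples in bytes. The specific values below are only illustrative (here, an 8 MB maximum, enough for the 6.25 MB example window), not a recommendation:

```
# Upper limits for buffer sizes requested via setsockopt()
net.core.rmem_max = 8388608
net.core.wmem_max = 8388608
# min / default / max (bytes) for auto-tuned TCP buffers
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 16384 8388608
```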

Note that TCP implementations use separate buffers for sending and receiving, and have separate size limits and defaults for those two directions. So if you have different performance targets for sent and received traffic, then you should perform separate calculations to obtain appropriate configuration values.

-- SimonLeinen - 23 Jul 2009
