Large TCP Windows
In order to achieve high data rates with
TCP over "
long fat networks", i.e. network paths with a large
bandwidth-delay product, TCP sinks (that is, hosts receiving data transported by TCP) must advertise a large TCP receive window (referred to as just 'the window', since there is not an equivalent advertised 'send window').
The window is a 16 bit value (bytes 15 and 16 in the TCP header) and so, in TCP as originally specified, it is limited to a value of 65535 (64K). The receive window sets an upper limit on the sustained throughput achieveable over a TCP connection since it represents the maximum amount of unacknowledged data (in bytes) there can be on the TCP path. Mathematically, achieveable throughput can never be more than WINDOW_SIZE/RTT, so for a trans-Atlantic link, with say an RTT (Round trip Time) of 150ms, the throughput is limited to a maximum of 3.4Mbps. With the emergence of "
long fat networks", the limit of 64K bytes (on some systems even just
32K bytes!) was clearly insufficient and so
RFC 1323 laid down (amongst other things) a way of
scaling the advertised window, such that the 16-bit window value can represent numbers larger than 64K.
The
TCP window scaling option increases the maximum window size fom 64KB to 1Gbyte, by shifting the window field left by up to 14. The window scale option is used only during the TCP 3-way handshake (both sides
must set the window scale option in their SYN segments if scaling is to be used in either direction).
It is important to use TCP timestamps option with large TCP windows. With the TCP timestamps option, each segment contains a timestamp. The receiver returns that timestamp in each ACK and this allows the sender to estimate the RTT. On the other hand with the TCP timestamps option the problem of wrapped sequence number could be solved (PAWS - Protection Against Wrapped Sequences) which could occur with large windows.
Most operating systems require manual tuning to use large TCP windows. The
OS-specific tuning section contains information on how to to this for a set of operating systems.
There are several potential issues when TCP Windows are larger than necessary:
- When there are many active TCP connection endpoints (sockets) on a system - such as a popular Web or file server - then a large TCP window size will lead to high consumption of system (kernel) memory. This can have a number of negative consequences: The system may run out of buffer space so that no new connections can be opened, or the high occupation of kernel memory (which typically must reside in actual RAM and cannot be "paged out" to disk) can "starve" other processes of access to fast memory (cache and RAM)
- Large windows can cause large "bursts" of consecutive segments/packets. When there is a bottleneck in the path - because of a slower link or because of cross-traffic - these bursts will fill up buffers in the network device (router or switch) in front of that bottleneck. The larger these bursts, the higher are the risks that this buffer overflows and causes multiple segments to be dropped. So a large window can lead to "sawtooth" behavior and worse link utilisation than with a just-big-enough window where TCP could operate at a steady rate.
Several methods for
automatic TCP buffer tuning have been developed to resolve these issues, some of which have been implemented in (at least) recent Linux versions.
References
- TCP Extensions for High Performance, RFC 1323, V. Jacobson, R. Braden, D. Borman, May 1992, ftp://ftp.rfc-editor.org/in-notes/rfc1323.txt
- Achieving Reliable High Performance in LFNs, S. Ubik and P. Cimbal, TNC 2003, May 2003, http://staff.cesnet.cz/~ubik/publications/2003/terena2003.pdf
- Debugging end-to-end performance in commodity operating systems, S. Ubik and P. Cimbal, PFLDnet 2003, February 2003, http://staff.cesnet.cz/~ubik/publications/2003/pfldnet2003.pdf
--
SimonLeinen - 27 Oct 2004 - 10 Jun 2005