BBR TCP (Bottleneck Bandwidth and RTT)

BBR is a new algorithm for TCP congestion control, developed at Google. It was tested in Google's data-center networks as well as on some of their public-facing Web servers, including Google.com and YouTube. It strives to optimize both throughput and latency (RTT) by estimating the bottleneck bandwidth and the round-trip propagation time, and computing a pacing rate from those estimates. One goal that sets it apart from most traditional TCP variants is to avoid filling up the bottleneck buffer, which can induce bufferbloat.
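In outline, those two estimates drive two control parameters: a pacing rate (a gain factor times the estimated bottleneck bandwidth) and a congestion window (a gain factor times the estimated bandwidth-delay product). The following minimal sketch illustrates the relationship; the names and constants are illustrative, not taken from an actual implementation:

    /* Minimal sketch of BBR's core model (illustrative, not the
     * kernel's code): both the pacing rate and the congestion window
     * are derived from the bottleneck-bandwidth and round-trip
     * propagation-time estimates. */
    #include <stdio.h>

    /* Pacing rate: a gain factor times the estimated bottleneck bandwidth. */
    static double pacing_rate(double btl_bw, double pacing_gain)
    {
        return pacing_gain * btl_bw;            /* bytes per second */
    }

    /* Congestion window: a gain factor times the estimated
     * bandwidth-delay product (BDP). */
    static double cwnd_bytes(double btl_bw, double rt_prop, double cwnd_gain)
    {
        return cwnd_gain * btl_bw * rt_prop;    /* bytes in flight */
    }

    int main(void)
    {
        double btl_bw  = 10e6 / 8;  /* assume a 10 Mbit/s bottleneck, in bytes/s */
        double rt_prop = 0.040;     /* assume a 40 ms propagation RTT */

        printf("pacing rate %.0f B/s, cwnd %.0f B\n",
               pacing_rate(btl_bw, 1.0), cwnd_bytes(btl_bw, rt_prop, 2.0));
        return 0;
    }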

Operation

A description of the algorithm was published in the September/October 2016 issue of ACM Queue. An implementation in the Linux kernel has been proposed as a patch. Dave Täht posted a preliminary evaluation ("a quick look...") on his blog. A good description of BBR's motivation and approach is included in the proposed kernel patch (see below).

In the STARTUP phase, BBR tries to quickly approximate the bottleneck bandwidth. It does so by increasing the sending rate until the estimated bottleneck bandwidth stops growing.
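The exit condition can be sketched as follows. This is a hedged reconstruction from the published description, which has BBR leave STARTUP once the bandwidth estimate has failed to grow by about 25% for three consecutive round trips; the names are illustrative:

    /* Sketch of STARTUP's "bandwidth stopped growing" test (names are
     * illustrative).  BBR considers the pipe full once the bandwidth
     * estimate has failed to grow by at least ~25% for three
     * consecutive round trips. */
    #include <stdbool.h>

    #define BW_GROWTH_FACTOR 1.25  /* require 25% growth per round */
    #define FULL_BW_ROUNDS   3     /* ...for this many rounds in a row */

    struct startup_state {
        double full_bw;       /* best bandwidth estimate seen so far */
        int    stale_rounds;  /* rounds without sufficient growth */
    };

    static bool pipe_is_full(struct startup_state *s, double bw_estimate)
    {
        if (bw_estimate >= s->full_bw * BW_GROWTH_FACTOR) {
            s->full_bw = bw_estimate;  /* still growing: remember and reset */
            s->stale_rounds = 0;
            return false;
        }
        return ++s->stale_rounds >= FULL_BW_ROUNDS;  /* full: leave STARTUP */
    }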

Bottleneck bandwidth is estimated from the amount of data ACKed over a given period, filtered through a "windowed max-filter".
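Conceptually, the estimate is the maximum delivery-rate sample seen over a sliding window of recent round trips, so that transient dips (e.g. due to delayed ACKs or application stalls) do not depress it. A naive sketch follows; the kernel uses a more efficient O(1) minmax filter, and the window length here is illustrative:

    /* Naive sketch of a windowed max-filter over bandwidth samples.
     * The point is only that the estimate is the maximum over recent
     * samples, so that transient dips are ignored. */
    #define BW_WIN 10  /* window length, in samples (illustrative) */

    struct max_filter {
        double sample[BW_WIN];
        int    next;
    };

    static double max_filter_update(struct max_filter *f, double bw_sample)
    {
        double best = 0.0;

        f->sample[f->next] = bw_sample;
        f->next = (f->next + 1) % BW_WIN;
        for (int i = 0; i < BW_WIN; i++)
            if (f->sample[i] > best)
                best = f->sample[i];
        return best;  /* current bottleneck-bandwidth estimate */
    }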

In the DRAIN phase, the sending (pacing) rate is reduced to get rid of the queue that BBR estimates to have created while probing the bottleneck bandwidth during STARTUP.

In steady state, BBR paces at the estimated bottleneck bandwidth. Periodically it tries to improve its network model by probing (a sketch of this steady-state logic follows the two modes below):

PROBE_BW mode: BBR probes for a bandwidth increase at the bottleneck by temporarily raising the pacing rate, then lowering it below the estimated bandwidth to drain any temporary queuing in case the bottleneck bandwidth hasn't grown.

PROBE_RTT mode: RTT is filtered through a windowed min-filter. When the minimum-RTT estimate has not been refreshed for some time, the algorithm drastically reduces the amount of data in flight to better approximate the base RTT in case queueing ("bufferbloat") is in effect.
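A compact sketch of this steady-state logic follows. The constants are taken from the published BBR description (an eight-phase pacing-gain cycle of 1.25, 0.75, then six phases of 1.0, and a 10-second min-RTT window); the names are illustrative:

    /* Hedged sketch of the steady-state probing described above.
     * PROBE_BW cycles its pacing gain to probe for more bandwidth
     * (1.25), drain any queue the probe created (0.75), and then
     * cruise (1.0); PROBE_RTT is entered when the min-RTT estimate
     * has not been refreshed for 10 seconds. */
    #include <stdbool.h>

    static const double pacing_gain_cycle[8] =
        { 1.25, 0.75, 1, 1, 1, 1, 1, 1 };

    #define MIN_RTT_WIN_SEC 10.0  /* min_rtt considered stale after 10 s */

    /* One gain-cycle phase lasts roughly one min_rtt. */
    static double probe_bw_gain(int phase)
    {
        return pacing_gain_cycle[phase % 8];
    }

    static bool should_enter_probe_rtt(double now_sec, double min_rtt_stamp_sec)
    {
        return now_sec - min_rtt_stamp_sec > MIN_RTT_WIN_SEC;
    }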

State Diagram

A state diagram has been included in the Linux kernel source as a patch. I quote:

Here is a state transition diagram for BBR:

            |
            V
   +---> STARTUP  ----+
   |        |         |
   |        V         |
   |      DRAIN   ----+
   |        |         |
   |        V         |
   +---> PROBE_BW ----+
   |      ^    |      |
   |      |    |      |
   |      +----+      |
   |                  |
   +---- PROBE_RTT <--+

A BBR flow starts in STARTUP, and ramps up its sending rate quickly. When it estimates the pipe is full, it enters DRAIN to drain the queue. In steady state a BBR flow only uses PROBE_BW and PROBE_RTT. A long-lived BBR flow spends the vast majority of its time remaining (repeatedly) in PROBE_BW, fully probing and utilizing the pipe's bandwidth in a fair manner, with a small, bounded queue. If a flow has been continuously sending for the entire min_rtt window, and hasn't seen an RTT sample that matches or decreases its min_rtt estimate for 10 seconds, then it briefly enters PROBE_RTT to cut inflight to a minimum value to re-probe the path's two-way propagation delay (min_rtt). When exiting PROBE_RTT, if we estimated that we reached the full bw of the pipe then we enter PROBE_BW; otherwise we enter STARTUP to try to fill the pipe.
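The transitions in the quoted diagram can be encoded roughly as follows. This is an illustrative reading of the diagram, not the kernel's actual code:

    /* Illustrative encoding of the state transitions above (not the
     * kernel's code).  Any mode may fall into PROBE_RTT when the
     * min-RTT estimate goes stale, matching the arrows on the right
     * of the diagram. */
    #include <stdbool.h>

    enum bbr_mode { BBR_STARTUP, BBR_DRAIN, BBR_PROBE_BW, BBR_PROBE_RTT };

    struct bbr_events {
        bool pipe_full;       /* STARTUP: bandwidth estimate stopped growing */
        bool queue_drained;   /* DRAIN: inflight back down to about one BDP */
        bool min_rtt_stale;   /* no min_rtt refresh for 10 seconds */
        bool probe_rtt_done;  /* spent long enough at minimal inflight */
    };

    static enum bbr_mode bbr_next_mode(enum bbr_mode m, const struct bbr_events *e)
    {
        if (m != BBR_PROBE_RTT && e->min_rtt_stale)
            return BBR_PROBE_RTT;
        switch (m) {
        case BBR_STARTUP:
            return e->pipe_full ? BBR_DRAIN : BBR_STARTUP;
        case BBR_DRAIN:
            return e->queue_drained ? BBR_PROBE_BW : BBR_DRAIN;
        case BBR_PROBE_BW:
            return BBR_PROBE_BW;
        case BBR_PROBE_RTT:
            if (!e->probe_rtt_done)
                return BBR_PROBE_RTT;
            return e->pipe_full ? BBR_PROBE_BW : BBR_STARTUP;
        }
        return m;
    }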

Implementation

Linux kernel implementation

An implementation for the Linux kernel was submitted in September 2016 and merged shortly afterwards. It has been included in official Linux kernel releases since 4.9.

The current kernel implementation relies on a packet scheduler capable of pacing, such as the fq scheduler. In May 2017, Éric Dumazet submitted a patch that implements pacing in TCP itself, removing the dependency on the fq scheduler. Two effects are that BBR becomes simpler to enable, and that it can be used together with other schedulers (such as the popular fq_codel).
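For reference, a single socket can opt into BBR through the standard TCP_CONGESTION socket option, assuming a kernel that has BBR available (system-wide selection goes through the net.ipv4.tcp_congestion_control sysctl):

    /* Selecting BBR for one socket via the TCP_CONGESTION socket
     * option.  Assumes a kernel with BBR available; on kernels
     * without TCP-internal pacing, the fq qdisc must also be
     * installed on the outgoing interface. */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <string.h>
    #include <sys/socket.h>

    static int use_bbr(int fd)
    {
        static const char name[] = "bbr";

        return setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION,
                          name, strlen(name));
    }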

TCP_CC_INFO instrumentation

BBR exports its bandwidth and RTT estimates through the getsockopt(TCP_CC_INFO) interface (see struct tcp_bbr_info). User-space applications can use this information as hints for adapting their use of a given connection, e.g. by selecting appropriate audio or video encodings.
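A minimal sketch of reading these estimates, assuming the UAPI headers that define TCP_CC_INFO and struct tcp_bbr_info (linux/inet_diag.h); the socket must actually be using BBR for the bbr member of the union to be meaningful:

    /* Reading BBR's estimates via getsockopt(TCP_CC_INFO).  The
     * 64-bit bandwidth estimate is split across bbr_bw_lo/bbr_bw_hi;
     * bbr_min_rtt is in microseconds. */
    #include <linux/inet_diag.h>  /* union tcp_cc_info, struct tcp_bbr_info */
    #include <netinet/in.h>
    #include <netinet/tcp.h>      /* TCP_CC_INFO */
    #include <stdio.h>
    #include <sys/socket.h>

    static void print_bbr_info(int fd)
    {
        union tcp_cc_info info;
        socklen_t len = sizeof(info);

        if (getsockopt(fd, IPPROTO_TCP, TCP_CC_INFO, &info, &len) == 0 &&
            len >= sizeof(info.bbr)) {
            unsigned long long bw =
                ((unsigned long long)info.bbr.bbr_bw_hi << 32) |
                info.bbr.bbr_bw_lo;

            printf("bw estimate %llu B/s, min_rtt %u us\n",
                   bw, info.bbr.bbr_min_rtt);
        }
    }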

History/Related Work

Note that there is another variant of "BBR TCP", namely Bufferbloat Resistant TCP, proposed by Michael Nowlan in his 2014 Ph.D. thesis. It shares the basic goals of (Google's) BBR TCP, namely detecting the bottleneck rate and avoiding bufferbloat, but it uses a delay-based measurement approach rather than directly measuring the amount of ACKed data per unit of time, and it does not seem to make use of pacing.

-- Simon Leinen - 2016-09-18–2017-07-04
