
Interaction between BBR ProbeRTT and ProbeBW

news 2025/7/6 13:13:13

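BBR relies on ProbeBW and ProbeRTT to measure two orthogonal quantities, maxbw and minrtt. "Orthogonal" here means the two cannot be measured accurately at the same time, so the two states have to interact: maxbw is measured while a queue is building, and minrtt is measured while the bottleneck has idle bandwidth. ProbeBW disturbs the stability of minrtt. ProbeRTT, besides costing some bandwidth utilization (not much; as the comment on bbr_update_min_rtt puts it, "BBR uses 200ms to approximately bound the performance penalty of PROBE_RTT's cwnd capping to roughly 2% (200ms/10s)"), also affects fairness.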
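The 2% figure in that comment is plain arithmetic, and a few lines make it easy to see how it moves with the two timers. This is only a sketch: the function name and the rtt_ms parameter are illustrative, not kernel identifiers. The 200 ms and 10 s defaults come from the quoted comment. Taking the longer of 200 ms and one round trip as the episode length is an assumption.

```python
def probe_rtt_penalty(probe_rtt_ms=200.0, min_rtt_window_s=10.0, rtt_ms=0.0):
    """Rough upper bound on the share of time a flow spends cwnd-capped in ProbeRTT.

    Assumption: one ProbeRTT episode per min-RTT window, each lasting the
    longer of probe_rtt_ms and one round trip (rtt_ms).
    """
    episode_ms = max(probe_rtt_ms, rtt_ms)
    return episode_ms / (min_rtt_window_s * 1000.0)


if __name__ == "__main__":
    print(f"default timers: {probe_rtt_penalty():.1%}")
    print(f"with a 600 ms RTT: {probe_rtt_penalty(rtt_ms=600.0):.1%}")
```

Under these assumptions the bound only grows beyond 2% when a single round trip is longer than 200 ms.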

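I recently came across an interesting issue in the BBR discussion group about how the phase offset between flows' ProbeRTT periods affects fairness. Looking closer, though, ProbeRTT affects ProbeBW in more ways than that. In summary: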

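Random phase: flows A and B coexist; if A leaves ProbeRTT into the probe phase while B leaves into the cruise phase, A takes bandwidth from B. This is mainly an algorithm issue.
Phase offset: flows A and B coexist; if A leaves ProbeRTT before B, it takes more bandwidth, so keeping ProbeRTT synchronized across BBR flows matters. This is mainly an implementation issue.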

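The first is easy to understand, so I will leave it for last and start with the second. Which flow "leaves first" depends on measurement: BBR relies on RTT measurement, and measurement depends on the implementation. If the measurement is implemented badly, BBR's fairness and stability cannot be guaranteed. This is one reason BBR cannot prove its own stability, whereas AIMD has no such problem.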

Here is simulation code for an exit offset of 3 (that is, y leaves ProbeRTT 3 time units after x):

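    for n in range(1, len(times)):
        if n % rttx == 0 and stx == 0 and pbwx == 0:
            # Normal ProbeBW round for x: probe with gain g1, take a share of C
            x[n] = C*g1*x[n-1]*rttx/(g1*x[n-1]*rttx + wy[n-1])
            wx[n] = x[n]*rttx
            if prex != 0:
                print("Normal X", n, x[n-1], x[n], y[n-1], wy[n-1])
                prex = 0
        elif stx == 1:
            # x is in ProbeRTT: hold the bandwidth, cap inflight at 4
            x[n] = x[n-1]
            wx[n] = 4
            if bakx == 0:
                bakx = x[n-1]
            if n - cntx == CNT:
                # x leaves ProbeRTT after CNT time units
                stx = 0
                pbwx = 0
                x[n] = bakx
                bakx = 0
        elif n % rttx == 0 and stx == 0 and pbwx != 0:
            # x still has pbwx rounds to hold before it may probe again
            x[n] = x[n-1]
            wx[n] = x[n]*rttx
            pbwx -= 1
            prex = 1
        else:
            x[n] = x[n-1]
            wx[n] = wx[n-1]
        #############################
        # Flow y: same structure as flow x, but it leaves ProbeRTT 3 units later
        if n % rtty == 0 and sty == 0 and pbwy == 0:
            ...
        if n - cnty == CNT + 3:
            ...
        #############################
        # Both flows enter ProbeRTT at the same time, every pRTT time units
        if n % pRTT == 0:
            cntx = n
            stx = 1
        if n % pRTT == 0:
            cnty = n
            sty = 1
        # Effective RTT = inflight / bandwidth, never below the base RTT
        r1[n] = wx[n]/x[n]
        r2[n] = wy[n]/y[n]
        if r1[n] < rttx:
            r1[n] = rttx
        if r2[n] < rtty:
            r2[n] = rtty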
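The excerpt leaves out the setup and the y branch, so it cannot be run as is. Below is a self-contained sketch of the same model for readers who want to experiment. Several things in it are assumptions rather than the author's code: all parameter defaults (C, rttx, rtty, pRTT, CNT), the initial equal split, the helper names step and simulate, and the choice to treat y exactly like x apart from its later exit. The refill_x and refill_y arguments play the role of the value assigned to pbwx and pbwy on exit from ProbeRTT.

```python
def step(f, other_w, n, C, g1, CNT):
    """Advance one flow by one time unit; f holds that flow's state."""
    bw, w, rtt = f["bw"], f["w"], f["rtt"]
    on_round = n % rtt == 0
    if on_round and f["st"] == 0 and f["pbw"] == 0:
        # Normal ProbeBW round: probe with gain g1 against the other flow's inflight
        probe = g1 * bw[n - 1] * rtt
        bw[n] = C * probe / (probe + other_w[n - 1])
        w[n] = bw[n] * rtt
    elif f["st"] == 1:
        # ProbeRTT: hold the bandwidth, cap inflight at 4
        bw[n] = bw[n - 1]
        w[n] = 4
        if f["bak"] == 0:
            f["bak"] = bw[n - 1]
        if n - f["cnt"] == CNT + f["delay"]:
            # Leave ProbeRTT; pbw > 0 means "hold this many rounds before probing"
            f["st"], f["pbw"] = 0, f["refill"]
            bw[n] = f["bak"]
            f["bak"] = 0
    elif on_round and f["pbw"] != 0:
        # Hold round: refill the pipe at the old rate without probing
        bw[n] = bw[n - 1]
        w[n] = bw[n] * rtt
        f["pbw"] -= 1
    else:
        bw[n] = bw[n - 1]
        w[n] = w[n - 1]


def simulate(steps=2000, C=100.0, g1=1.25, rttx=2, rtty=2, pRTT=500,
             CNT=20, delay=3, refill_x=0, refill_y=0):
    """Return (x, y): per-tick bandwidth of two flows sharing capacity C."""
    def flow(rtt, exit_delay, refill):
        return {"bw": [C / 2] * steps, "w": [C / 2 * rtt] * steps, "rtt": rtt,
                "st": 0, "pbw": 0, "cnt": 0, "bak": 0.0,
                "delay": exit_delay, "refill": refill}

    fx, fy = flow(rttx, 0, refill_x), flow(rtty, delay, refill_y)
    for n in range(1, steps):
        step(fx, fy["w"], n, C, g1, CNT)
        step(fy, fx["w"], n, C, g1, CNT)
        if n % pRTT == 0:
            # Both flows enter ProbeRTT together; only the exit is offset
            for f in (fx, fy):
                f["cnt"], f["st"] = n, 1
    return fx["bw"], fy["bw"]


if __name__ == "__main__":
    x, y = simulate()
    print("peak x:", round(max(x), 2), "lowest y:", round(min(y), 2))
```

Calling simulate(delay=3) corresponds to the case described here. Whether the curves match the author's plot depends on the assumed parameters.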

The result:
[Figure: simulation result with an exit offset of 3]

It is indeed unfair. So what can be done?

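BBR2's answer is to keep 1/2 of the BDP in flight during ProbeRTT instead of a fixed 4 packets. Then even if one flow leaves ProbeRTT early, the idle bandwidth it can grab is limited, because the other flows have not given up much while in ProbeRTT. Alternatively, the first ProbeBW after ProbeRTT is allowed to last only one minrtt.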
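To get a feel for how much the ProbeRTT inflight floor matters, the share formula from the simulation excerpt can be evaluated for the first probing round of an early-exiting flow. This is a sketch under the same toy model, not a measurement: the function names, the policy labels "fixed4" and "half_bdp", and the sample numbers are all illustrative assumptions.

```python
def probe_rtt_inflight(bw, rtt, policy):
    """Inflight a flow keeps while it sits in ProbeRTT under a given policy."""
    if policy == "fixed4":
        return 4.0                 # fixed floor of 4
    if policy == "half_bdp":
        return 0.5 * bw * rtt      # half of the flow's own BDP
    raise ValueError(f"unknown policy: {policy}")


def first_round_share(C, g, bw, rtt, other_inflight):
    """Bandwidth an early-exiting flow gets in its first probing round.

    Same share rule as the simulation: C * g*bw*rtt / (g*bw*rtt + other inflight).
    """
    mine = g * bw * rtt
    return C * mine / (mine + other_inflight)


if __name__ == "__main__":
    C, g, bw, rtt = 100.0, 1.25, 50.0, 2
    for policy in ("fixed4", "half_bdp"):
        other = probe_rtt_inflight(bw, rtt, policy)
        share = first_round_share(C, g, bw, rtt, other)
        print(f"{policy}: early flow gets {share:.1f} of {C:.0f}")
```

In this model, the larger the inflight the other flow keeps during ProbeRTT, the smaller the share the early flow can take in that first round.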

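But there is also a fix within BBR itself: after leaving ProbeRTT, a flow may not go straight into the ProbeBW 1.25x phase and must first spend at least one round in a refill phase. That round lasts on the order of the flow's RTT, milliseconds outside data centers and around 50 us inside them, which is enough to absorb any microsecond-level scheduling error introduced by the system implementation.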

Adding this behavior to the code takes only a two-line change, as follows:

    for n in range(1, len(times)):
        ....
            if n - cntx == CNT:
                stx = 0
                pbwx = 2
                x[n] = bakx
                bakx = 0
        ....
            if n - cnty == CNT + 3:
                sty = 0
                pbwy = 0
                y[n] = baky
                baky = 0

With the same parameters, the result is now fair:
[Figure: simulation result with the refill round added]

Now back to the first of the two kinds of interference listed at the start: randomizing the phase on exit from ProbeRTT. First the simulation code:

    for n in range(1, len(times)):
        ....
            if n - cntx == CNT:
                stx = 0
                pbwx = random.choice([0, 2])
                x[n] = bakx
                bakx = 0
        ....
            if n - cnty == CNT + 3:
                sty = 0
                pbwy = random.choice([0, 2])
                y[n] = baky
                baky = 0

And the actual result:
[Figure: simulation result with a randomized phase on exit from ProbeRTT]

Here is the stated rationale for randomizing the phase:

Furthermore, to improve mixing and fairness, and to reduce queues when multiple BBR flows share a bottleneck, BBR randomizes the phases of ProbeBW gain cycling by randomly picking an initial phase—from among all but the 3/4 phase—when entering ProbeBW. Why not start cycling with 3/4? The main advantage of the 3/4 pacing_gain is to drain any queue that can be created by running a 5/4 pacing_gain when the pipe is already full. When exiting Drain or ProbeRTT and entering ProbeBW, there is no queue to drain, so the 3/4 gain does not provide that advantage. Using 3/4 in those contexts only has a cost: a link utilization for that round of 3/4 instead of 1. Since starting with 3/4 would have a cost but no benefit, and since entering ProbeBW happens at the start of any connection long enough to have a Drain, BBR uses this small optimization.

In my view, doing one refill round first and only then randomizing the phase would be the more principled choice as far as fairness is concerned.
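The two ideas can be combined in a few lines: pick the initial ProbeBW phase at random from all phases except the 3/4 one, and optionally put refill rounds at gain 1 in front of it. This is a sketch only. The eight-phase gain list (5/4, 3/4, then six rounds at 1) is the cycle commonly described for BBR v1 and is an assumption here, and the names pick_initial_phase, gains_after_probe_rtt and refill_rounds are hypothetical.

```python
import random

# Assumed ProbeBW gain cycle: probe at 5/4, drain at 3/4, then cruise at 1
PACING_GAINS = [1.25, 0.75, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
DRAIN_PHASE = 1  # index of the 3/4 gain, never used as the starting phase


def pick_initial_phase(rng):
    """Pick a starting phase uniformly among all phases but the 3/4 one."""
    candidates = [i for i in range(len(PACING_GAINS)) if i != DRAIN_PHASE]
    return rng.choice(candidates)


def gains_after_probe_rtt(rng, rounds, refill_rounds=0):
    """Pacing gains for the first `rounds` rounds after leaving ProbeRTT.

    refill_rounds > 0 inserts rounds at gain 1.0 before the randomized cycle,
    which is the ordering suggested in the text.
    """
    gains = [1.0] * refill_rounds
    phase = pick_initial_phase(rng)
    while len(gains) < rounds:
        gains.append(PACING_GAINS[phase])
        phase = (phase + 1) % len(PACING_GAINS)
    return gains[:rounds]


if __name__ == "__main__":
    rng = random.Random(2025)
    print("no refill: ", gains_after_probe_rtt(rng, 10))
    print("one refill:", gains_after_probe_rtt(rng, 10, refill_rounds=1))
```

With refill_rounds=1 no flow can start at 1.25 in its very first round after ProbeRTT, whatever phase the random pick lands on.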

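As this article shows, BBR depends on a well-behaved implementation, and many questions remain unproven: how synchronized ProbeBW affects buffer occupancy, how that compounds with packet loss, and how deterministic ProbeRTT's behavior is.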

Leather shoes get wet in Wenzhou, Zhejiang; when it rains the water gets in, but they will not get fat.


http://www.mrgr.cn/news/49884.html