Skip to content

Commit aecfde2

Browse files
koen0607davem330
authored andcommitted
tcp: Ensure DCTCP reacts to losses
RFC8257 §3.5 explicitly states that "A DCTCP sender MUST react to loss episodes in the same way as conventional TCP". Currently, Linux DCTCP performs no cwnd reduction when losses are encountered. Optionally, the dctcp_clamp_alpha_on_loss resets alpha to its maximal value if a RTO happens. This behavior is sub-optimal for at least two reasons: i) it ignores losses triggering fast retransmissions; and ii) it causes unnecessary large cwnd reduction in the future if the loss was isolated as it resets the historical term of DCTCP's alpha EWMA to its maximal value (i.e., denoting a total congestion). The second reason has an especially noticeable effect when using DCTCP in high BDP environments, where alpha normally stays at low values. This patch replace the clamping of alpha by setting ssthresh to half of cwnd for both fast retransmissions and RTOs, at most once per RTT. Consequently, the dctcp_clamp_alpha_on_loss module parameter has been removed. The table below shows experimental results where we measured the drop probability of a PIE AQM (not applying ECN marks) at a bottleneck in the presence of a single TCP flow with either the alpha-clamping option enabled or the cwnd halving proposed by this patch. Results using reno or cubic are given for comparison. | Link | RTT | Drop TCP CC | speed | base+AQM | probability ==================|=========|==========|============ CUBIC | 40Mbps | 7+20ms | 0.21% RENO | | | 0.19% DCTCP-CLAMP-ALPHA | | | 25.80% DCTCP-HALVE-CWND | | | 0.22% ------------------|---------|----------|------------ CUBIC | 100Mbps | 7+20ms | 0.03% RENO | | | 0.02% DCTCP-CLAMP-ALPHA | | | 23.30% DCTCP-HALVE-CWND | | | 0.04% ------------------|---------|----------|------------ CUBIC | 800Mbps | 1+1ms | 0.04% RENO | | | 0.05% DCTCP-CLAMP-ALPHA | | | 18.70% DCTCP-HALVE-CWND | | | 0.06% We see that, without halving its cwnd for all source of losses, DCTCP drives the AQM to large drop probabilities in order to keep the queue length under control (i.e., it repeatedly faces RTOs). Instead, if DCTCP reacts to all source of losses, it can then be controlled by the AQM using similar drop levels than cubic or reno. Signed-off-by: Koen De Schepper <koen.de_schepper@nokia-bell-labs.com> Signed-off-by: Olivier Tilmans <olivier.tilmans@nokia-bell-labs.com> Cc: Bob Briscoe <research@bobbriscoe.net> Cc: Lawrence Brakmo <brakmo@fb.com> Cc: Florian Westphal <fw@strlen.de> Cc: Daniel Borkmann <borkmann@iogearbox.net> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Andrew Shewmaker <agshew@gmail.com> Cc: Glenn Judd <glenn.judd@morganstanley.com> Acked-by: Florian Westphal <fw@strlen.de> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
1 parent fae2708 commit aecfde2

File tree

1 file changed

+18
-18
lines changed

1 file changed

+18
-18
lines changed

net/ipv4/tcp_dctcp.c

Lines changed: 18 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -67,11 +67,6 @@ static unsigned int dctcp_alpha_on_init __read_mostly = DCTCP_MAX_ALPHA;
6767
module_param(dctcp_alpha_on_init, uint, 0644);
6868
MODULE_PARM_DESC(dctcp_alpha_on_init, "parameter for initial alpha value");
6969

70-
static unsigned int dctcp_clamp_alpha_on_loss __read_mostly;
71-
module_param(dctcp_clamp_alpha_on_loss, uint, 0644);
72-
MODULE_PARM_DESC(dctcp_clamp_alpha_on_loss,
73-
"parameter for clamping alpha on loss");
74-
7570
static struct tcp_congestion_ops dctcp_reno;
7671

7772
static void dctcp_reset(const struct tcp_sock *tp, struct dctcp *ca)
@@ -164,21 +159,23 @@ static void dctcp_update_alpha(struct sock *sk, u32 flags)
164159
}
165160
}
166161

167-
static void dctcp_state(struct sock *sk, u8 new_state)
162+
static void dctcp_react_to_loss(struct sock *sk)
168163
{
169-
if (dctcp_clamp_alpha_on_loss && new_state == TCP_CA_Loss) {
170-
struct dctcp *ca = inet_csk_ca(sk);
164+
struct dctcp *ca = inet_csk_ca(sk);
165+
struct tcp_sock *tp = tcp_sk(sk);
171166

172-
/* If this extension is enabled, we clamp dctcp_alpha to
173-
* max on packet loss; the motivation is that dctcp_alpha
174-
* is an indicator to the extend of congestion and packet
175-
* loss is an indicator of extreme congestion; setting
176-
* this in practice turned out to be beneficial, and
177-
* effectively assumes total congestion which reduces the
178-
* window by half.
179-
*/
180-
ca->dctcp_alpha = DCTCP_MAX_ALPHA;
181-
}
167+
ca->loss_cwnd = tp->snd_cwnd;
168+
tp->snd_ssthresh = max(tp->snd_cwnd >> 1U, 2U);
169+
}
170+
171+
static void dctcp_state(struct sock *sk, u8 new_state)
172+
{
173+
if (new_state == TCP_CA_Recovery &&
174+
new_state != inet_csk(sk)->icsk_ca_state)
175+
dctcp_react_to_loss(sk);
176+
/* We handle RTO in dctcp_cwnd_event to ensure that we perform only
177+
* one loss-adjustment per RTT.
178+
*/
182179
}
183180

184181
static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
@@ -190,6 +187,9 @@ static void dctcp_cwnd_event(struct sock *sk, enum tcp_ca_event ev)
190187
case CA_EVENT_ECN_NO_CE:
191188
dctcp_ece_ack_update(sk, ev, &ca->prior_rcv_nxt, &ca->ce_state);
192189
break;
190+
case CA_EVENT_LOSS:
191+
dctcp_react_to_loss(sk);
192+
break;
193193
default:
194194
/* Don't care for the rest. */
195195
break;

0 commit comments

Comments
 (0)