Commit dca145f

Eric Dumazet authored and davem330 committed
tcp: allow for bigger reordering level

While testing upcoming Yaogong patch (converting out of order queue
into an RB tree), I hit the max reordering level of linux TCP stack.

Reordering level was limited to 127 for no good reason, and some
network setups [1] can easily reach this limit and get limited
throughput.

Allow a new max limit of 300, and add a sysctl to allow admins to even
allow bigger (or lower) values if needed.

[1] Aggregation of links, per packet load balancing, fabrics not doing
    deep packet inspections, alternative TCP congestion modules...

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yaogong Wang <wygivan@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1 parent 7aef06d commit dca145f

6 files changed, +23 -12 lines changed

Documentation/networking/bonding.txt

Lines changed: 2 additions & 5 deletions
@@ -2230,11 +2230,8 @@ balance-rr: This mode is the only mode that will permit a single
 
 It is possible to adjust TCP/IP's congestion limits by
 altering the net.ipv4.tcp_reordering sysctl parameter. The
-usual default value is 3, and the maximum useful value is 127.
-For a four interface balance-rr bond, expect that a single
-TCP/IP stream will utilize no more than approximately 2.3
-interface's worth of throughput, even after adjusting
-tcp_reordering.
+usual default value is 3. But keep in mind TCP stack is able
+to automatically increase this when it detects reorders.
 
 Note that the fraction of packets that will be delivered out of
 order is highly variable, and is unlikely to be zero. The level

Documentation/networking/ip-sysctl.txt

Lines changed: 9 additions & 1 deletion
@@ -376,9 +376,17 @@ tcp_orphan_retries - INTEGER
 may consume significant resources. Cf. tcp_max_orphans.
 
 tcp_reordering - INTEGER
-Maximal reordering of packets in a TCP stream.
+Initial reordering level of packets in a TCP stream.
+TCP stack can then dynamically adjust flow reordering level
+between this initial value and tcp_max_reordering
 Default: 3
 
+tcp_max_reordering - INTEGER
+Maximal reordering level of packets in a TCP stream.
+300 is a fairly conservative value, but you might increase it
+if paths are using per packet load balancing (like bonding rr mode)
+Default: 300
+
 tcp_retrans_collapse - BOOLEAN
 Bug-to-bug compatibility with some broken printers.
 On retransmit try to send bigger packets to work around bugs in

include/linux/tcp.h

Lines changed: 2 additions & 2 deletions
@@ -204,10 +204,10 @@ struct tcp_sock {
 
 	u16 urg_data; /* Saved octet of OOB data and control flags */
 	u8 ecn_flags; /* ECN status bits. */
-	u8 reordering; /* Packet reordering metric. */
+	u8 keepalive_probes; /* num of allowed keep alive probes */
+	u32 reordering; /* Packet reordering metric. */
 	u32 snd_up; /* Urgent pointer */
 
-	u8 keepalive_probes; /* num of allowed keep alive probes */
 	/*
 	 * Options received (usually on last packet, some only on SYN packets).
 	 */
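
The hunk above widens `reordering` from `u8` to `u32`. A minimal user-space sketch of why that matters for the new default of 300: an 8-bit field cannot hold it, and an unsigned narrowing store wraps modulo 256. The helper names below are hypothetical and are not part of the kernel; `uint8_t` and `uint32_t` stand in for the kernel's `u8` and `u32`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the value that survives a store into the old
 * 8-bit field. Unsigned narrowing in C is well defined (modulo 256). */
static uint8_t store_in_u8(unsigned int reordering)
{
	return (uint8_t)reordering;
}

/* Hypothetical helper: the same store into the widened 32-bit field. */
static uint32_t store_in_u32(unsigned int reordering)
{
	return (uint32_t)reordering;
}

/* Self-check of the values discussed above. */
static void reordering_width_selfcheck(void)
{
	assert(store_in_u8(127) == 127);  /* old maximum fits */
	assert(store_in_u8(300) == 44);   /* 300 mod 256: new default would wrap */
	assert(store_in_u32(300) == 300); /* widened field keeps it intact */
}
```

Moving `keepalive_probes` next to `ecn_flags` in the same hunk looks intended to keep the small fields grouped ahead of the 32-bit members, though the commit message does not say so.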

include/net/tcp.h

Lines changed: 1 addition & 3 deletions
@@ -70,9 +70,6 @@ void tcp_time_wait(struct sock *sk, int state, int timeo);
 /* After receiving this amount of duplicate ACKs fast retransmit starts. */
 #define TCP_FASTRETRANS_THRESH 3
 
-/* Maximal reordering. */
-#define TCP_MAX_REORDERING 127
-
 /* Maximal number of ACKs sent quickly to accelerate slow-start. */
 #define TCP_MAX_QUICKACKS 16U
 
@@ -252,6 +249,7 @@ extern int sysctl_tcp_abort_on_overflow;
 extern int sysctl_tcp_max_orphans;
 extern int sysctl_tcp_fack;
 extern int sysctl_tcp_reordering;
+extern int sysctl_tcp_max_reordering;
 extern int sysctl_tcp_dsack;
 extern long sysctl_tcp_mem[3];
 extern int sysctl_tcp_wmem[3];

net/ipv4/sysctl_net_ipv4.c

Lines changed: 7 additions & 0 deletions
@@ -495,6 +495,13 @@ static struct ctl_table ipv4_table[] = {
 		.mode = 0644,
 		.proc_handler = proc_dointvec
 	},
+	{
+		.procname = "tcp_max_reordering",
+		.data = &sysctl_tcp_max_reordering,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec
+	},
 	{
 		.procname = "tcp_dsack",
 		.data = &sysctl_tcp_dsack,
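
The table entry above exposes the new limit as an integer sysctl with mode 0644. As a rough user-space sketch, the value could be read back through procfs. The path below is an assumption derived from the `procname` and the usual `net.ipv4` to `/proc/sys/net/ipv4/` mapping, and `read_sysctl_int` is a hypothetical helper, not a kernel or libc API.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed procfs location of the new knob (derived from procname). */
#define TCP_MAX_REORDERING_PATH "/proc/sys/net/ipv4/tcp_max_reordering"

/* Hypothetical helper: read one integer from a sysctl-style file.
 * Returns 0 on success, -1 if the file is missing or unparsable. */
static int read_sysctl_int(const char *path, int *value)
{
	FILE *f;
	int parsed;

	assert(path != NULL && value != NULL);

	f = fopen(path, "r");
	if (!f)
		return -1;

	parsed = (fscanf(f, "%d", value) == 1);
	fclose(f);
	return parsed ? 0 : -1;
}
```

A caller would pass `TCP_MAX_REORDERING_PATH` and treat a return of -1 as "kernel predates this commit or procfs is unavailable". Because the mode is 0644, writing a new value requires root.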

net/ipv4/tcp_input.c

Lines changed: 2 additions & 1 deletion
@@ -81,6 +81,7 @@ int sysctl_tcp_window_scaling __read_mostly = 1;
 int sysctl_tcp_sack __read_mostly = 1;
 int sysctl_tcp_fack __read_mostly = 1;
 int sysctl_tcp_reordering __read_mostly = TCP_FASTRETRANS_THRESH;
+int sysctl_tcp_max_reordering __read_mostly = 300;
 EXPORT_SYMBOL(sysctl_tcp_reordering);
 int sysctl_tcp_dsack __read_mostly = 1;
 int sysctl_tcp_app_win __read_mostly = 31;
@@ -833,7 +834,7 @@ static void tcp_update_reordering(struct sock *sk, const int metric,
 	if (metric > tp->reordering) {
 		int mib_idx;
 
-		tp->reordering = min(TCP_MAX_REORDERING, metric);
+		tp->reordering = min(sysctl_tcp_max_reordering, metric);
 
 		/* This exciting event is worth to be remembered. 8) */
 		if (ts)
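
The one-line change in `tcp_update_reordering()` replaces a compile-time cap with a runtime one. The following is a simplified user-space sketch of that clamp, not the kernel function: `update_reordering` and its parameters are hypothetical names, plain `int` is used throughout, and the MIB accounting that follows in the real code is omitted.

```c
#include <assert.h>

/* Hypothetical stand-in for the clamp in tcp_update_reordering():
 * raise the per-flow reordering level to the observed metric, but
 * never above the configured maximum. */
static int update_reordering(int current, int metric, int max_reordering)
{
	if (metric > current)
		current = metric < max_reordering ? metric : max_reordering;
	return current;
}

/* Self-check contrasting the old fixed cap (127) with the new default (300). */
static void update_reordering_selfcheck(void)
{
	assert(update_reordering(3, 200, 127) == 127); /* old cap limits the flow */
	assert(update_reordering(3, 200, 300) == 200); /* new default lets it grow */
	assert(update_reordering(3, 500, 300) == 300); /* still bounded by the sysctl */
	assert(update_reordering(50, 10, 300) == 50);  /* smaller metric: no change */
}
```

Under these assumptions, a flow that observes a reordering metric of 200 is held at 127 with the old constant and reaches 200 with the new default, which is the throughput limitation the commit message describes.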
