
Commit 9493174

chucklever authored and amschuma-ntap committed
xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers
Send buffer space is shared between the RPC-over-RDMA header and an RPC message. A large RPC-over-RDMA header means less space is available for the associated RPC message, which then has to be moved via an RDMA Read or Write.

As more segments are added to the chunk lists, the header increases in size. Typical modern hardware needs only a few segments to convey the maximum payload size, but some devices and registration modes may need a lot of segments to convey data payload. Sometimes so many are needed that the remaining space in the Send buffer is not enough for the RPC message. Sending such a message usually fails.

To ensure a transport can always make forward progress, cap the number of RDMA segments that are allowed in chunk lists. This prevents less-capable devices and memory registrations from consuming a large portion of the Send buffer by reducing the maximum data payload that can be conveyed with such devices.

For now I choose an arbitrary maximum of 8 RDMA segments. This allows a maximum-size RPC-over-RDMA header to fit nicely in the current 1024-byte inline threshold with over 700 bytes remaining for an inline RPC message.

The current maximum data payload of NFS READ or WRITE requests is one megabyte. To convey that payload on a client with 4KB pages, each chunk segment would need to handle 32 or more data pages. This is well within the capabilities of FMR. For physical registration, the maximum payload size on platforms with 4KB pages is reduced to 32KB.

For FRWR, a device's maximum page list depth would need to be at least 34 to support the maximum 1MB payload. A device with a smaller maximum page list depth means the maximum data payload is reduced when using that device.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
1 parent 29c5542 commit 9493174

File tree

5 files changed: +23 −26 lines

net/sunrpc/xprtrdma/fmr_ops.c

Lines changed: 1 addition & 1 deletion

@@ -48,7 +48,7 @@ static size_t
 fmr_op_maxpages(struct rpcrdma_xprt *r_xprt)
 {
 	return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-		     rpcrdma_max_segments(r_xprt) * RPCRDMA_MAX_FMR_SGES);
+		     RPCRDMA_MAX_HDR_SEGS * RPCRDMA_MAX_FMR_SGES);
 }
 
 static int

net/sunrpc/xprtrdma/frwr_ops.c

Lines changed: 1 addition & 1 deletion

@@ -243,7 +243,7 @@ frwr_op_maxpages(struct rpcrdma_xprt *r_xprt)
 	struct rpcrdma_ia *ia = &r_xprt->rx_ia;
 
 	return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-		     rpcrdma_max_segments(r_xprt) * ia->ri_max_frmr_depth);
+		     RPCRDMA_MAX_HDR_SEGS * ia->ri_max_frmr_depth);
 }
 
 static void

net/sunrpc/xprtrdma/physical_ops.c

Lines changed: 1 addition & 1 deletion

@@ -47,7 +47,7 @@ static size_t
 physical_op_maxpages(struct rpcrdma_xprt *r_xprt)
 {
 	return min_t(unsigned int, RPCRDMA_MAX_DATA_SEGS,
-		     rpcrdma_max_segments(r_xprt));
+		     RPCRDMA_MAX_HDR_SEGS);
 }
 
 static int

net/sunrpc/xprtrdma/verbs.c

Lines changed: 0 additions & 22 deletions

@@ -1271,25 +1271,3 @@ rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *r_xprt, unsigned int count)
 	rpcrdma_recv_buffer_put(rep);
 	return rc;
 }
-
-/* How many chunk list items fit within our inline buffers?
- */
-unsigned int
-rpcrdma_max_segments(struct rpcrdma_xprt *r_xprt)
-{
-	struct rpcrdma_create_data_internal *cdata = &r_xprt->rx_data;
-	int bytes, segments;
-
-	bytes = min_t(unsigned int, cdata->inline_wsize, cdata->inline_rsize);
-	bytes -= RPCRDMA_HDRLEN_MIN;
-	if (bytes < sizeof(struct rpcrdma_segment) * 2) {
-		pr_warn("RPC:       %s: inline threshold too small\n",
-			__func__);
-		return 0;
-	}
-
-	segments = 1 << (fls(bytes / sizeof(struct rpcrdma_segment)) - 1);
-	dprintk("RPC:       %s: max chunk list size = %d segments\n",
-		__func__, segments);
-	return segments;
-}

net/sunrpc/xprtrdma/xprt_rdma.h

Lines changed: 20 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -144,6 +144,26 @@ rdmab_to_msg(struct rpcrdma_regbuf *rb)
144144

145145
#define RPCRDMA_DEF_GFP (GFP_NOIO | __GFP_NOWARN)
146146

147+
/* To ensure a transport can always make forward progress,
148+
* the number of RDMA segments allowed in header chunk lists
149+
* is capped at 8. This prevents less-capable devices and
150+
* memory registrations from overrunning the Send buffer
151+
* while building chunk lists.
152+
*
153+
* Elements of the Read list take up more room than the
154+
* Write list or Reply chunk. 8 read segments means the Read
155+
* list (or Write list or Reply chunk) cannot consume more
156+
* than
157+
*
158+
* ((8 + 2) * read segment size) + 1 XDR words, or 244 bytes.
159+
*
160+
* And the fixed part of the header is another 24 bytes.
161+
*
162+
* The smallest inline threshold is 1024 bytes, ensuring that
163+
* at least 750 bytes are available for RPC messages.
164+
*/
165+
#define RPCRDMA_MAX_HDR_SEGS (8)
166+
147167
/*
148168
* struct rpcrdma_rep -- this structure encapsulates state required to recv
149169
* and complete a reply, asychronously. It needs several pieces of
@@ -456,7 +476,6 @@ struct rpcrdma_regbuf *rpcrdma_alloc_regbuf(struct rpcrdma_ia *,
456476
void rpcrdma_free_regbuf(struct rpcrdma_ia *,
457477
struct rpcrdma_regbuf *);
458478

459-
unsigned int rpcrdma_max_segments(struct rpcrdma_xprt *);
460479
int rpcrdma_ep_post_extra_recv(struct rpcrdma_xprt *, unsigned int);
461480

462481
int frwr_alloc_recovery_wq(void);
