
Commit ccede75

chucklever authored and amschuma-ntap committed
xprtrdma: Spread reply processing over more CPUs
Commit d8f532d ("xprtrdma: Invoke rpcrdma_reply_handler directly from RECV completion") introduced a performance regression for NFS I/O small enough to not need memory registration. In multi-threaded benchmarks that generate primarily small I/O requests, IOPS throughput is reduced by nearly a third. This patch restores the previous level of throughput.

Because workqueues are typically BOUND (in particular ib_comp_wq, nfsiod_workqueue, and rpciod_workqueue), NFS/RDMA workloads tend to aggregate on the CPU that is handling Receive completions.

The usual approach to addressing this problem is to create a QP and CQ for each CPU, and then schedule transactions on the QP for the CPU where you want the transaction to complete. The transaction then does not require an extra context switch during completion to end up on the same CPU where the transaction was started.

This approach doesn't work for the Linux NFS/RDMA client because currently the Linux NFS client does not support multiple connections per client-server pair, and the RDMA core API does not make it straightforward for ULPs to determine which CPU is responsible for handling Receive completions for a CQ.

So for the moment, record the CPU number in the rpcrdma_req before the transport sends each RPC Call. Then during Receive completion, queue the RPC completion on that same CPU.

Additionally, move all RPC completion processing to the deferred handler so that even RPCs with simple small replies complete on the CPU that sent the corresponding RPC Call.

Fixes: d8f532d ("xprtrdma: Invoke rpcrdma_reply_handler ...")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
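In short, the fix records the sending CPU in each rpcrdma_req and then uses queue_work_on() so that deferred reply processing runs on that same CPU. A minimal standalone sketch of that pattern follows; only smp_processor_id() and queue_work_on() are the real kernel APIs involved here, while the example_req structure and helper functions are illustrative names, not the actual xprtrdma symbols.

/*
 * Sketch of the "remember the sending CPU, complete on that CPU" pattern.
 * example_req, example_prepare_call, and example_reply_received are
 * illustrative; only smp_processor_id() and queue_work_on() are the
 * kernel APIs the patch relies on.
 */
#include <linux/smp.h>
#include <linux/workqueue.h>

struct example_req {
	int			cpu;		/* CPU that issued the RPC Call */
	struct work_struct	reply_work;	/* deferred reply processing */
};

/* Call side: runs in the sender's context when the request is set up. */
static void example_prepare_call(struct example_req *req)
{
	/* Record the CPU on which the RPC Call is being built and sent. */
	req->cpu = smp_processor_id();
}

/* Completion side: runs in the Receive completion handler. */
static void example_reply_received(struct workqueue_struct *wq,
				   struct example_req *req)
{
	/*
	 * Instead of queue_work(), which would run the handler wherever
	 * the completion happened to fire, target the CPU that originally
	 * sent the call.
	 */
	queue_work_on(req->cpu, wq, &req->reply_work);
}

Note that queue_work_on() only pins the work item to the requested CPU for a bound workqueue, which is presumably why the verbs.c hunk below drops WQ_UNBOUND from rpcrdma_receive_wq.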
1 parent c156618 commit ccede75

4 files changed, +5 -6 lines changed


net/sunrpc/xprtrdma/rpc_rdma.c

Lines changed: 1 addition & 5 deletions

@@ -1408,11 +1408,7 @@ void rpcrdma_reply_handler(struct rpcrdma_rep *rep)
 	dprintk("RPC: %s: reply %p completes request %p (xid 0x%08x)\n",
 		__func__, rep, req, be32_to_cpu(rep->rr_xid));
 
-	if (list_empty(&req->rl_registered) &&
-	    !test_bit(RPCRDMA_REQ_F_TX_RESOURCES, &req->rl_flags))
-		rpcrdma_complete_rqst(rep);
-	else
-		queue_work(rpcrdma_receive_wq, &rep->rr_work);
+	queue_work_on(req->rl_cpu, rpcrdma_receive_wq, &rep->rr_work);
 	return;
 
 out_badstatus:

net/sunrpc/xprtrdma/transport.c

Lines changed: 2 additions & 0 deletions

@@ -52,6 +52,7 @@
 #include <linux/slab.h>
 #include <linux/seq_file.h>
 #include <linux/sunrpc/addr.h>
+#include <linux/smp.h>
 
 #include "xprt_rdma.h"
 
@@ -656,6 +657,7 @@ xprt_rdma_allocate(struct rpc_task *task)
 		task->tk_pid, __func__, rqst->rq_callsize,
 		rqst->rq_rcvsize, req);
 
+	req->rl_cpu = smp_processor_id();
 	req->rl_connect_cookie = 0;	/* our reserved value */
 	rpcrdma_set_xprtdata(rqst, req);
 	rqst->rq_buffer = req->rl_sendbuf->rg_base;

net/sunrpc/xprtrdma/verbs.c

Lines changed: 1 addition & 1 deletion

@@ -83,7 +83,7 @@ rpcrdma_alloc_wq(void)
 	struct workqueue_struct *recv_wq;
 
 	recv_wq = alloc_workqueue("xprtrdma_receive",
-				  WQ_MEM_RECLAIM | WQ_UNBOUND | WQ_HIGHPRI,
+				  WQ_MEM_RECLAIM | WQ_HIGHPRI,
				  0);
 	if (!recv_wq)
 		return -ENOMEM;

net/sunrpc/xprtrdma/xprt_rdma.h

Lines changed: 1 addition & 0 deletions

@@ -342,6 +342,7 @@ enum {
 struct rpcrdma_buffer;
 struct rpcrdma_req {
 	struct list_head	rl_list;
+	int			rl_cpu;
 	unsigned int		rl_connect_cookie;
 	struct rpcrdma_buffer	*rl_buffer;
 	struct rpcrdma_rep	*rl_reply;
