Commit ec62f40

chucklever authored and amschuma-ntap committed
xprtrdma: Ensure ia->ri_id->qp is not NULL when reconnecting
Devesh Sharma <Devesh.Sharma@Emulex.Com> reports that after a disconnect, his HCA is failing to create a fresh QP, leaving ia_ri->ri_id->qp set to NULL. But xprtrdma still allows RPCs to wake up and post LOCAL_INV as they exit, causing an oops.

rpcrdma_ep_connect() is allowing the wake-up by leaking the QP creation error code (-EPERM in this case) to the RPC client's generic layer. xprt_connect_status() does not recognize -EPERM, so it kills pending RPC tasks immediately rather than retrying the connect.

Re-arrange the QP creation logic so that when it fails on reconnect, it leaves ->qp with the old QP rather than NULL. If pending RPC tasks wake and exit, LOCAL_INV work requests will flush rather than oops.

On initial connect, leaving ->qp == NULL is OK, since there are no pending RPCs that might use ->qp. But be sure not to try to destroy a NULL QP when rpcrdma_ep_connect() is retried.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
1 parent 65866f8 commit ec62f40

1 file changed: +20 -9 lines changed

net/sunrpc/xprtrdma/verbs.c

Lines changed: 20 additions & 9 deletions
@@ -867,6 +867,7 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 	if (ep->rep_connected != 0) {
 		struct rpcrdma_xprt *xprt;
 retry:
+		dprintk("RPC: %s: reconnecting...\n", __func__);
 		rc = rpcrdma_ep_disconnect(ep, ia);
 		if (rc && rc != -ENOTCONN)
 			dprintk("RPC: %s: rpcrdma_ep_disconnect"
@@ -879,7 +880,7 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 		id = rpcrdma_create_id(xprt, ia,
 				(struct sockaddr *)&xprt->rx_data.addr);
 		if (IS_ERR(id)) {
-			rc = PTR_ERR(id);
+			rc = -EHOSTUNREACH;
 			goto out;
 		}
 		/* TEMP TEMP TEMP - fail if new device:
@@ -893,20 +894,30 @@ rpcrdma_ep_connect(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia)
 			printk("RPC: %s: can't reconnect on "
 				"different device!\n", __func__);
 			rdma_destroy_id(id);
-			rc = -ENETDOWN;
+			rc = -ENETUNREACH;
 			goto out;
 		}
 		/* END TEMP */
+		rc = rdma_create_qp(id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC: %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			rdma_destroy_id(id);
+			rc = -ENETUNREACH;
+			goto out;
+		}
 		rdma_destroy_qp(ia->ri_id);
 		rdma_destroy_id(ia->ri_id);
 		ia->ri_id = id;
-	}
-
-	rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
-	if (rc) {
-		dprintk("RPC: %s: rdma_create_qp failed %i\n",
-			__func__, rc);
-		goto out;
+	} else {
+		dprintk("RPC: %s: connecting...\n", __func__);
+		rc = rdma_create_qp(ia->ri_id, ia->ri_pd, &ep->rep_attr);
+		if (rc) {
+			dprintk("RPC: %s: rdma_create_qp failed %i\n",
+				__func__, rc);
+			/* do not update ep->rep_connected */
+			return -ENETUNREACH;
+		}
 	}
 
 	/* XXX Tavor device performs badly with 2K MTU! */
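
The reordering described in the commit message (create the new QP on the new ID first, and tear down the old ID and QP only after that succeeds) can be illustrated outside the kernel with a small user-space sketch. Everything here is hypothetical and only stands in for the real objects: `fake_id`, `fake_qp`, `fake_conn`, `fake_create_qp()`, `fake_destroy_id()` and `fake_reconnect()` are not kernel or RDMA APIs, and the `fail_create` flag merely simulates an HCA that refuses to create a QP with -EPERM. The sketch assumes, as the patch does, that the raw creation error is replaced with -ENETUNREACH before it is returned to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the connection ID and its queue pair. */
struct fake_qp { int serial; };
struct fake_id { struct fake_qp *qp; };
struct fake_conn { struct fake_id *id; };

/* Simulated device: when fail_create is nonzero, QP creation fails. */
static int fail_create;
static int next_serial = 1;

static int fake_create_qp(struct fake_id *id)
{
	if (fail_create)
		return -EPERM;	/* the error seen in the report */
	id->qp = malloc(sizeof(*id->qp));
	if (!id->qp)
		return -ENOMEM;
	id->qp->serial = next_serial++;
	return 0;
}

static void fake_destroy_id(struct fake_id *id)
{
	if (!id)
		return;
	free(id->qp);	/* free(NULL) is a no-op, so a missing QP is safe */
	free(id);
}

/*
 * Build the replacement ID and QP first; swap them in only on success.
 * On failure the old ID and QP stay in place, so conn->id->qp is never
 * left NULL after a failed reconnect.
 */
static int fake_reconnect(struct fake_conn *conn)
{
	struct fake_id *id;
	int rc;

	assert(conn != NULL);

	id = calloc(1, sizeof(*id));
	if (!id)
		return -EHOSTUNREACH;

	rc = fake_create_qp(id);
	if (rc) {
		fake_destroy_id(id);
		return -ENETUNREACH;	/* do not leak the raw error code */
	}

	fake_destroy_id(conn->id);	/* old resources go away only now */
	conn->id = id;
	return 0;
}
```

With `fail_create` set, a call to `fake_reconnect()` returns -ENETUNREACH and leaves the previous `conn->id->qp` pointer untouched, which mirrors the property the patch is after: work posted against the old QP can flush instead of dereferencing NULL.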
