Commit 1535212

GuoqingJiang authored and shligit committed
md-cluster: fix locking when node joins cluster during message broadcast
If a node joins the cluster while a message broadcast is under way, a lock issue can occur as follows.

Consider a cluster of two nodes. Node A has called __sendmsg but has not yet up-converted its CR lock on ack to EX, and node B has released its CR lock on ack. If a new node C joins the cluster at this point, it has not received the message that A sent, yet it can take a CR lock on ack before A up-converts CR to EX on ack.

So a node joining the cluster should first get an EX lock on "token" to ensure that no broadcast is ongoing, and release it only after it holds CR on ack.

Reviewed-by: NeilBrown <neilb@suse.com>
Signed-off-by: Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Shaohua Li <shli@fb.com>
1 parent 5b0fb33 commit 1535212

1 file changed, +10 -3 lines changed

drivers/md/md-cluster.c

Lines changed: 10 additions & 3 deletions
@@ -781,17 +781,24 @@ static int join(struct mddev *mddev, int nodes)
 	cinfo->token_lockres = lockres_init(mddev, "token", NULL, 0);
 	if (!cinfo->token_lockres)
 		goto err;
-	cinfo->ack_lockres = lockres_init(mddev, "ack", ack_bast, 0);
-	if (!cinfo->ack_lockres)
-		goto err;
 	cinfo->no_new_dev_lockres = lockres_init(mddev, "no-new-dev", NULL, 0);
 	if (!cinfo->no_new_dev_lockres)
 		goto err;
 
+	ret = dlm_lock_sync(cinfo->token_lockres, DLM_LOCK_EX);
+	if (ret) {
+		ret = -EAGAIN;
+		pr_err("md-cluster: can't join cluster to avoid lock issue\n");
+		goto err;
+	}
+	cinfo->ack_lockres = lockres_init(mddev, "ack", ack_bast, 0);
+	if (!cinfo->ack_lockres)
+		goto err;
 	/* get sync CR lock on ACK. */
 	if (dlm_lock_sync(cinfo->ack_lockres, DLM_LOCK_CR))
 		pr_err("md-cluster: failed to get a sync CR lock on ACK!(%d)\n",
 				ret);
+	dlm_unlock_sync(cinfo->token_lockres);
 	/* get sync CR lock on no-new-dev. */
 	if (dlm_lock_sync(cinfo->no_new_dev_lockres, DLM_LOCK_CR))
 		pr_err("md-cluster: failed to get a sync CR lock on no-new-dev!(%d)\n", ret);
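The lock ordering this change introduces can be illustrated with a small userspace model. This is only a sketch under loud assumptions: `toy_lock`, `toy_lock_try`, `toy_unlock`, `join_unguarded`, `join_guarded` and `joiner_holds_ack_during_broadcast` are hypothetical names, not kernel or DLM APIs. The model knows only the NL, CR and EX modes, and a conflicting request fails immediately, whereas `dlm_lock_sync` in the patch would wait. It replays the scenario from the commit message: node A is mid-broadcast (EX on "token", CR on "ack"), node B has already released CR on "ack", and node C joins.

```c
#include <assert.h>

/* Toy lock modes: NL (not held), CR (shared), EX (exclusive). */
enum toy_mode { MODE_NL = 0, MODE_CR, MODE_EX };

#define MAX_NODES 8

/* Hypothetical stand-in for one DLM lock resource ("token" or "ack"). */
struct toy_lock {
	enum toy_mode held[MAX_NODES];	/* mode currently held by each node */
};

/* Non-blocking request or conversion: 0 on success, -1 on a mode conflict. */
static int toy_lock_try(struct toy_lock *l, int node, enum toy_mode want)
{
	for (int i = 0; i < MAX_NODES; i++) {
		if (i == node)
			continue;
		/* Another node's EX blocks everything except NL. */
		if (l->held[i] == MODE_EX && want != MODE_NL)
			return -1;
		/* EX cannot be granted while any other node holds CR or EX. */
		if (want == MODE_EX && l->held[i] != MODE_NL)
			return -1;
	}
	l->held[node] = want;
	return 0;
}

static void toy_unlock(struct toy_lock *l, int node)
{
	l->held[node] = MODE_NL;
}

/* Join order before the patch: CR on "ack" without looking at "token". */
static int join_unguarded(struct toy_lock *ack, int node)
{
	return toy_lock_try(ack, node, MODE_CR);
}

/* Join order after the patch: EX on "token", CR on "ack", drop "token". */
static int join_guarded(struct toy_lock *token, struct toy_lock *ack, int node)
{
	int ret;

	if (toy_lock_try(token, node, MODE_EX))
		return -1;	/* broadcast in flight; the kernel code would wait here */
	ret = toy_lock_try(ack, node, MODE_CR);
	toy_unlock(token, node);
	return ret;
}

/* Returns 1 if joiner C ends up holding CR on "ack" while A is still
 * in the middle of its broadcast, 0 otherwise. */
static int joiner_holds_ack_during_broadcast(int guarded)
{
	enum { NODE_A = 0, NODE_B = 1, NODE_C = 2 };
	struct toy_lock token = { .held = { MODE_NL } };
	struct toy_lock ack = { .held = { MODE_NL } };

	/* A is sending: EX on "token", CR on "ack" not yet up-converted. */
	assert(toy_lock_try(&token, NODE_A, MODE_EX) == 0);
	assert(toy_lock_try(&ack, NODE_A, MODE_CR) == 0);
	/* B has already released CR on "ack", so it holds nothing (NL). */

	if (guarded)
		join_guarded(&token, &ack, NODE_C);
	else
		join_unguarded(&ack, NODE_C);

	return ack.held[NODE_C] == MODE_CR;
}
```

In this model, `joiner_holds_ack_during_broadcast(0)` reports that C slipped in with CR on "ack" during the broadcast, while `joiner_holds_ack_during_broadcast(1)` reports that it did not, because the EX request on "token" conflicts with the lock the sender still holds.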
