Skip to content

Commit e339dd8

Browse files
Brian Fosterdjwong
authored andcommitted
xfs: use sync buffer I/O for sync delwri queue submission
If a delwri queue occurs of a buffer that sits on a delwri queue wait list, the queue sets _XBF_DELWRI_Q without changing the state of ->b_list. This occurs, for example, if another thread beats the current delwri waiter thread to the buffer lock after I/O completion. Once the waiter acquires the lock, it removes the buffer from the wait list and leaves a buffer with _XBF_DELWRI_Q set but not populated on a list. This results in a lost buffer submission and in turn can result in assert failures due to _XBF_DELWRI_Q being set on buffer reclaim or filesystem lockups if the buffer happens to cover an item in the AIL. This problem has been reproduced by repeated iterations of xfs/305 on high CPU count (28xcpu) systems with limited memory (~1GB). Dirty dquot reclaim races with an xfsaild push of a separate dquot backed by the same buffer such that the buffer sits on the reclaim wait list at the time xfsaild attempts to queue it. Since the latter dquot has been flush locked but the underlying buffer not submitted for I/O, the dquot pins the AIL and causes the filesystem to livelock. This race is essentially made possible by the buffer lock cycle involved with waiting on a synchronous delwri queue submission. Close the race by using synchronous buffer I/O for respective delwri queue submission. This means the buffer remains locked across the I/O and so is inaccessible from other contexts while in the intermediate wait list state. The sync buffer I/O wait mechanism is factored into a helper such that sync delwri buffer submission and serialization are batched operations. Designed-by: Dave Chinner <dchinner@redhat.com> Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
1 parent eaebb51 commit e339dd8

File tree

1 file changed

+41
-39
lines changed

1 file changed

+41
-39
lines changed

fs/xfs/xfs_buf.c

Lines changed: 41 additions & 39 deletions
Original file line numberDiff line numberDiff line change
@@ -1531,6 +1531,20 @@ xfs_buf_submit(
15311531
xfs_buf_rele(bp);
15321532
}
15331533

1534+
/*
1535+
* Wait for I/O completion of a sync buffer and return the I/O error code.
1536+
*/
1537+
static int
1538+
xfs_buf_iowait(
1539+
struct xfs_buf *bp)
1540+
{
1541+
trace_xfs_buf_iowait(bp, _RET_IP_);
1542+
wait_for_completion(&bp->b_iowait);
1543+
trace_xfs_buf_iowait_done(bp, _RET_IP_);
1544+
1545+
return bp->b_error;
1546+
}
1547+
15341548
/*
15351549
* Synchronous buffer IO submission path, read or write.
15361550
*/
@@ -1553,12 +1567,7 @@ xfs_buf_submit_wait(
15531567
error = __xfs_buf_submit(bp);
15541568
if (error)
15551569
goto out;
1556-
1557-
/* wait for completion before gathering the error from the buffer */
1558-
trace_xfs_buf_iowait(bp, _RET_IP_);
1559-
wait_for_completion(&bp->b_iowait);
1560-
trace_xfs_buf_iowait_done(bp, _RET_IP_);
1561-
error = bp->b_error;
1570+
error = xfs_buf_iowait(bp);
15621571

15631572
out:
15641573
/*
@@ -1961,16 +1970,11 @@ xfs_buf_cmp(
19611970
}
19621971

19631972
/*
1964-
* submit buffers for write.
1965-
*
1966-
* When we have a large buffer list, we do not want to hold all the buffers
1967-
* locked while we block on the request queue waiting for IO dispatch. To avoid
1968-
* this problem, we lock and submit buffers in groups of 50, thereby minimising
1969-
* the lock hold times for lists which may contain thousands of objects.
1970-
*
1971-
* To do this, we sort the buffer list before we walk the list to lock and
1972-
* submit buffers, and we plug and unplug around each group of buffers we
1973-
* submit.
1973+
* Submit buffers for write. If wait_list is specified, the buffers are
1974+
* submitted using sync I/O and placed on the wait list such that the caller can
1975+
* iowait each buffer. Otherwise async I/O is used and the buffers are released
1976+
* at I/O completion time. In either case, buffers remain locked until I/O
1977+
* completes and the buffer is released from the queue.
19741978
*/
19751979
static int
19761980
xfs_buf_delwri_submit_buffers(
@@ -2012,21 +2016,22 @@ xfs_buf_delwri_submit_buffers(
20122016
trace_xfs_buf_delwri_split(bp, _RET_IP_);
20132017

20142018
/*
2015-
* We do all IO submission async. This means if we need
2016-
* to wait for IO completion we need to take an extra
2017-
* reference so the buffer is still valid on the other
2018-
* side. We need to move the buffer onto the io_list
2019-
* at this point so the caller can still access it.
2019+
* If we have a wait list, each buffer (and associated delwri
2020+
* queue reference) transfers to it and is submitted
2021+
* synchronously. Otherwise, drop the buffer from the delwri
2022+
* queue and submit async.
20202023
*/
20212024
bp->b_flags &= ~(_XBF_DELWRI_Q | XBF_WRITE_FAIL);
2022-
bp->b_flags |= XBF_WRITE | XBF_ASYNC;
2025+
bp->b_flags |= XBF_WRITE;
20232026
if (wait_list) {
2024-
xfs_buf_hold(bp);
2027+
bp->b_flags &= ~XBF_ASYNC;
20252028
list_move_tail(&bp->b_list, wait_list);
2026-
} else
2029+
__xfs_buf_submit(bp);
2030+
} else {
2031+
bp->b_flags |= XBF_ASYNC;
20272032
list_del_init(&bp->b_list);
2028-
2029-
xfs_buf_submit(bp);
2033+
xfs_buf_submit(bp);
2034+
}
20302035
}
20312036
blk_finish_plug(&plug);
20322037

@@ -2073,9 +2078,11 @@ xfs_buf_delwri_submit(
20732078

20742079
list_del_init(&bp->b_list);
20752080

2076-
/* locking the buffer will wait for async IO completion. */
2077-
xfs_buf_lock(bp);
2078-
error2 = bp->b_error;
2081+
/*
2082+
* Wait on the locked buffer, check for errors and unlock and
2083+
* release the delwri queue reference.
2084+
*/
2085+
error2 = xfs_buf_iowait(bp);
20792086
xfs_buf_relse(bp);
20802087
if (!error)
20812088
error = error2;
@@ -2121,23 +2128,18 @@ xfs_buf_delwri_pushbuf(
21212128

21222129
/*
21232130
* Delwri submission clears the DELWRI_Q buffer flag and returns with
2124-
* the buffer on the wait list with an associated reference. Rather than
2131+
* the buffer on the wait list with the original reference. Rather than
21252132
* bounce the buffer from a local wait list back to the original list
21262133
* after I/O completion, reuse the original list as the wait list.
21272134
*/
21282135
xfs_buf_delwri_submit_buffers(&submit_list, buffer_list);
21292136

21302137
/*
2131-
* The buffer is now under I/O and wait listed as during typical delwri
2132-
* submission. Lock the buffer to wait for I/O completion. Rather than
2133-
* remove the buffer from the wait list and release the reference, we
2134-
* want to return with the buffer queued to the original list. The
2135-
* buffer already sits on the original list with a wait list reference,
2136-
* however. If we let the queue inherit that wait list reference, all we
2137-
* need to do is reset the DELWRI_Q flag.
2138+
* The buffer is now locked, under I/O and wait listed on the original
2139+
* delwri queue. Wait for I/O completion, restore the DELWRI_Q flag and
2140+
* return with the buffer unlocked and on the original queue.
21382141
*/
2139-
xfs_buf_lock(bp);
2140-
error = bp->b_error;
2142+
error = xfs_buf_iowait(bp);
21412143
bp->b_flags |= _XBF_DELWRI_Q;
21422144
xfs_buf_unlock(bp);
21432145

0 commit comments

Comments
 (0)