Commit 174edb0

xfs: store in-progress CoW allocations in the refcount btree

Due to the way the CoW algorithm in XFS works, there's an interval during which blocks allocated to handle a CoW can be lost -- if the FS goes down after the blocks are allocated but before the block remapping takes place. This is exacerbated by the cowextsz hint -- allocated reservations can sit around for a while, waiting to get used.

Since the refcount btree doesn't normally store records with refcount of 1, we can use it to record these in-progress extents. In-progress blocks cannot be shared because they're not user-visible, so there shouldn't be any conflicts with other programs. This is a better solution than holding EFIs during writeback because (a) EFIs can't be relogged currently, (b) even if they could, EFIs are bound by available log space, which puts an unnecessary upper bound on how much CoW we can have in flight, and (c) we already have a mechanism to track blocks.

At mount time, read the refcount records and free anything we find with a refcount of 1 because those were in-progress when the FS went down.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
1 parent 5e7e605 commit 174edb0
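
The mount-time cleanup described in the commit message can be pictured with a small userspace sketch. This is not the kernel code from this commit: the `demo_*` names, the flat array standing in for the refcount btree, and the callback are all assumptions made for illustration. It only shows the rule the message states, namely that any record with a refcount of 1 is a leftover staging extent and gets freed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory stand-in for one refcount btree record. */
struct demo_refc_rec {
	uint32_t startblock;	/* first block of the extent */
	uint32_t blockcount;	/* length of the extent in blocks */
	uint32_t refcount;	/* 1 marks an in-progress CoW staging extent */
};

/* Hypothetical callback that returns an extent to the free space pool. */
typedef void (*demo_free_fn)(uint32_t startblock, uint32_t blockcount,
			     void *ctx);

/*
 * Walk the records, free every extent whose refcount is 1, and compact
 * the array so that only the remaining records are kept.
 * Returns the number of records kept.
 */
static size_t
demo_recover_cow_leftovers(struct demo_refc_rec *recs, size_t nr,
			   demo_free_fn free_fn, void *ctx)
{
	size_t kept = 0;

	assert(recs != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (recs[i].refcount == 1) {
			/* Staging extent that never got remapped: free it. */
			free_fn(recs[i].startblock, recs[i].blockcount, ctx);
			continue;
		}
		recs[kept++] = recs[i];
	}
	return kept;
}

/* Example callback: tally how many blocks were handed back. */
static void
demo_count_freed(uint32_t startblock, uint32_t blockcount, void *ctx)
{
	(void)startblock;
	*(uint64_t *)ctx += blockcount;
}
```

Passing a `uint64_t` counter as `ctx` together with `demo_count_freed` yields the number of blocks reclaimed. The real recovery path works on the on-disk btree and has to free the blocks transactionally, which this sketch does not attempt to model.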

10 files changed, +551 -11 lines changed

fs/xfs/libxfs/xfs_bmap.c

Lines changed: 11 additions & 0 deletions
@@ -4631,6 +4631,17 @@ xfs_bmapi_write(
 				goto error0;
 			if (bma.blkno == NULLFSBLOCK)
 				break;
+
+			/*
+			 * If this is a CoW allocation, record the data in
+			 * the refcount btree for orphan recovery.
+			 */
+			if (whichfork == XFS_COW_FORK) {
+				error = xfs_refcount_alloc_cow_extent(mp, dfops,
+						bma.blkno, bma.length);
+				if (error)
+					goto error0;
+			}
 		}
 
 		/* Deal with the allocated space we found. */

fs/xfs/libxfs/xfs_format.h

Lines changed: 13 additions & 1 deletion
@@ -1375,7 +1375,8 @@ struct xfs_owner_info {
 #define XFS_RMAP_OWN_INOBT	(-6ULL)	/* Inode btree blocks */
 #define XFS_RMAP_OWN_INODES	(-7ULL)	/* Inode chunk */
 #define XFS_RMAP_OWN_REFC	(-8ULL)	/* refcount tree */
-#define XFS_RMAP_OWN_MIN	(-9ULL)	/* guard */
+#define XFS_RMAP_OWN_COW	(-9ULL)	/* cow allocations */
+#define XFS_RMAP_OWN_MIN	(-10ULL)	/* guard */
 
 #define XFS_RMAP_NON_INODE_OWNER(owner)	(!!((owner) & (1ULL << 63)))
 
@@ -1477,6 +1478,17 @@ unsigned int xfs_refc_block(struct xfs_mount *mp);
  * data) are not tracked here. Free space is also not tracked here.
  * This is consistent with pre-reflink XFS.
  */
+
+/*
+ * Extents that are being used to stage a copy on write are stored
+ * in the refcount btree with a refcount of 1 and the upper bit set
+ * on the startblock. This speeds up mount time deletion of stale
+ * staging extents because they're all at the right side of the tree.
+ */
+#define XFS_REFC_COW_START		((xfs_agblock_t)(1U << 31))
+#define REFCNTBT_COWFLAG_BITLEN		1
+#define REFCNTBT_AGBLOCK_BITLEN		31
+
 struct xfs_refcount_rec {
 	__be32		rc_startblock;	/* starting block number */
 	__be32		rc_blockcount;	/* count of blocks */
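
The comment added above says staging extents carry the upper bit of the start block, so they sort to the right side of the tree. A minimal userspace sketch of that encoding follows. The three macros are copied from the hunk; the `demo_*` helpers are hypothetical and are not functions introduced by this commit. `xfs_agblock_t` is assumed to be a 32-bit unsigned type, as it is in XFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t xfs_agblock_t;	/* AG-relative block number */

#define XFS_REFC_COW_START		((xfs_agblock_t)(1U << 31))
#define REFCNTBT_COWFLAG_BITLEN		1
#define REFCNTBT_AGBLOCK_BITLEN		31

/* Hypothetical helper: tag an AG block number as a CoW staging extent. */
static inline xfs_agblock_t demo_cow_encode(xfs_agblock_t agbno)
{
	/* The block number itself must fit in the lower 31 bits. */
	assert((agbno & XFS_REFC_COW_START) == 0);
	return agbno | XFS_REFC_COW_START;
}

/* Hypothetical helper: does this record key describe a staging extent? */
static inline bool demo_is_cow(xfs_agblock_t startblock)
{
	return (startblock & XFS_REFC_COW_START) != 0;
}

/* Hypothetical helper: recover the real AG block number from the key. */
static inline xfs_agblock_t demo_cow_decode(xfs_agblock_t startblock)
{
	return startblock & ~XFS_REFC_COW_START;
}
```

Because every encoded key is at least `1U << 31`, it compares greater than any ordinary start block. In a btree keyed on the start block, the staging records therefore form one contiguous run at the high end, and a cleanup pass can begin its lookup at `XFS_REFC_COW_START` instead of scanning every record.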
