
Commit 5ce5555

fdmanana authored and kdave committed
Btrfs: fix deadlock when writing out free space caches
When writing out a block group free space cache we can end up deadlocking with ourselves on an extent buffer lock, resulting in a warning like the following:

  [245043.379979] WARNING: CPU: 4 PID: 2608 at fs/btrfs/locking.c:251 btrfs_tree_lock+0x1be/0x1d0 [btrfs]
  [245043.392792] CPU: 4 PID: 2608 Comm: btrfs-transacti Tainted: G W I 4.16.8 #1
  [245043.395489] RIP: 0010:btrfs_tree_lock+0x1be/0x1d0 [btrfs]
  [245043.396791] RSP: 0018:ffffc9000424b840 EFLAGS: 00010246
  [245043.398093] RAX: 0000000000000a30 RBX: ffff8807e20a3d20 RCX: 0000000000000001
  [245043.399414] RDX: 0000000000000001 RSI: 0000000000000002 RDI: ffff8807e20a3d20
  [245043.400732] RBP: 0000000000000001 R08: ffff88041f39a700 R09: ffff880000000000
  [245043.402021] R10: 0000000000000040 R11: ffff8807e20a3d20 R12: ffff8807cb220630
  [245043.403296] R13: 0000000000000001 R14: ffff8807cb220628 R15: ffff88041fbdf000
  [245043.404780] FS: 0000000000000000(0000) GS:ffff88082fc80000(0000) knlGS:0000000000000000
  [245043.406050] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [245043.407321] CR2: 00007fffdbdb9f10 CR3: 0000000001c09005 CR4: 00000000000206e0
  [245043.408670] Call Trace:
  [245043.409977] btrfs_search_slot+0x761/0xa60 [btrfs]
  [245043.411278] btrfs_insert_empty_items+0x62/0xb0 [btrfs]
  [245043.412572] btrfs_insert_item+0x5b/0xc0 [btrfs]
  [245043.413922] btrfs_create_pending_block_groups+0xfb/0x1e0 [btrfs]
  [245043.415216] do_chunk_alloc+0x1e5/0x2a0 [btrfs]
  [245043.416487] find_free_extent+0xcd0/0xf60 [btrfs]
  [245043.417813] btrfs_reserve_extent+0x96/0x1e0 [btrfs]
  [245043.419105] btrfs_alloc_tree_block+0xfb/0x4a0 [btrfs]
  [245043.420378] __btrfs_cow_block+0x127/0x550 [btrfs]
  [245043.421652] btrfs_cow_block+0xee/0x190 [btrfs]
  [245043.422979] btrfs_search_slot+0x227/0xa60 [btrfs]
  [245043.424279] ? btrfs_update_inode_item+0x59/0x100 [btrfs]
  [245043.425538] ? iput+0x72/0x1e0
  [245043.426798] write_one_cache_group.isra.49+0x20/0x90 [btrfs]
  [245043.428131] btrfs_start_dirty_block_groups+0x102/0x420 [btrfs]
  [245043.429419] btrfs_commit_transaction+0x11b/0x880 [btrfs]
  [245043.430712] ? start_transaction+0x8e/0x410 [btrfs]
  [245043.432006] transaction_kthread+0x184/0x1a0 [btrfs]
  [245043.433341] kthread+0xf0/0x130
  [245043.434628] ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
  [245043.435928] ? kthread_create_worker_on_cpu+0x40/0x40
  [245043.437236] ret_from_fork+0x1f/0x30
  [245043.441054] ---[ end trace 15abaa2aaf36827f ]---

This is because at write_one_cache_group(), when we are COWing a leaf from the extent tree, we end up allocating a new block group (chunk) and, because we have hit a threshold on the number of bytes reserved for system chunks, we attempt to finalize the creation of new block groups from the current transaction by calling btrfs_create_pending_block_groups(). However, here we also need to modify the extent tree in order to insert a block group item, and if the location for this new block group item happens to be in the same leaf that we were COWing earlier, we deadlock, since btrfs_search_slot() tries to write lock the extent buffer that we locked before at write_one_cache_group().

We have already hit similar cases in the past, and commit d9a0540 ("Btrfs: fix deadlock when finalizing block group creation") fixed some of those cases by delaying the creation of pending block groups at the known specific spots that could lead to a deadlock. This change reworks that commit to be more generic, so that we don't have to add similar logic to every possible path that can lead to a deadlock. This is done by making __btrfs_cow_block() disallow the creation of new block groups (setting the transaction's can_flush_pending_bgs to false) before it attempts to allocate a new extent buffer for either the extent, chunk or device trees, since those are the trees that pending block group creation modifies. Once the new extent buffer is allocated, it allows creation of pending block groups to happen again.

This change depends on a recent patch from Josef which is not yet in Linus' tree, named "btrfs: make sure we create all new block groups", in order to avoid occasional warnings at btrfs_trans_release_chunk_metadata().

Fixes: d9a0540 ("Btrfs: fix deadlock when finalizing block group creation")
CC: stable@vger.kernel.org # 4.4+
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199753
Link: https://lore.kernel.org/linux-btrfs/CAJtFHUTHna09ST-_EEiyWmDH6gAqS6wa=zMNMBsifj8ABu99cw@mail.gmail.com/
Reported-by: E V <eliventer@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
1 parent 7ed586d commit 5ce5555
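
The self-deadlock and the guard described in the commit message can be illustrated with a small userspace model. This is only a sketch, not kernel code: struct model_trans and every model_* function are hypothetical stand-ins, the only name borrowed from the patch is the can_flush_pending_bgs flag, and the model assumes the worst case where the new block group item lands in the very leaf being COWed and the chunk reservation threshold has already been hit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a transaction handle; not the kernel's type. */
struct model_trans {
	bool can_flush_pending_bgs; /* same name as the flag in the patch */
	int pending_bgs;            /* block groups awaiting item insertion */
	bool leaf_locked;           /* the extent tree leaf being COWed */
	bool deadlocked;            /* set if we try to relock our own leaf */
};

static void model_trans_init(struct model_trans *t)
{
	t->can_flush_pending_bgs = true;
	t->pending_bgs = 0;
	t->leaf_locked = false;
	t->deadlocked = false;
}

/* Models btrfs_create_pending_block_groups(): it needs the leaf lock. */
static void model_create_pending_bgs(struct model_trans *t)
{
	if (!t->can_flush_pending_bgs)
		return; /* deferred: pending block groups stay queued */
	if (t->pending_bgs > 0 && t->leaf_locked) {
		t->deadlocked = true; /* real code would block on its own lock */
		return;
	}
	t->pending_bgs = 0;
}

/* Models a tree block allocation that ends up allocating a new chunk. */
static void model_alloc_tree_block(struct model_trans *t)
{
	t->pending_bgs++;            /* new chunk means a new pending group */
	model_create_pending_bgs(t); /* threshold assumed already reached */
}

/* Models COWing an extent tree leaf; 'guard' toggles the fix. */
static void model_cow_extent_leaf(struct model_trans *t, bool guard)
{
	t->leaf_locked = true;
	if (guard)
		t->can_flush_pending_bgs = false;
	model_alloc_tree_block(t);
	t->can_flush_pending_bgs = true;
	t->leaf_locked = false;
}
```

Calling model_cow_extent_leaf() with guard set to false marks the model as deadlocked, while guard set to true leaves one pending block group queued, which a later model_create_pending_bgs() call can flush once the leaf lock has been dropped.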


2 files changed, +23 -10 lines changed


fs/btrfs/ctree.c

Lines changed: 17 additions & 0 deletions
@@ -1014,9 +1014,26 @@ static noinline int __btrfs_cow_block(struct btrfs_trans_handle *trans,
 	if ((root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID) && parent)
 		parent_start = parent->start;

+	/*
+	 * If we are COWing a node/leaf from the extent, chunk or device trees,
+	 * make sure that we do not finish block group creation of pending block
+	 * groups. We do this to avoid a deadlock.
+	 * COWing can result in allocation of a new chunk, and flushing pending
+	 * block groups (btrfs_create_pending_block_groups()) can be triggered
+	 * when finishing allocation of a new chunk. Creation of a pending block
+	 * group modifies the extent, chunk and device trees, therefore we could
+	 * deadlock with ourselves since we are holding a lock on an extent
+	 * buffer that btrfs_create_pending_block_groups() may try to COW later.
+	 */
+	if (root == fs_info->extent_root ||
+	    root == fs_info->chunk_root ||
+	    root == fs_info->dev_root)
+		trans->can_flush_pending_bgs = false;
+
 	cow = btrfs_alloc_tree_block(trans, root, parent_start,
 			root->root_key.objectid, &disk_key, level,
 			search_start, empty_size);
+	trans->can_flush_pending_bgs = true;
 	if (IS_ERR(cow))
 		return PTR_ERR(cow);

fs/btrfs/extent-tree.c

Lines changed: 6 additions & 10 deletions
@@ -2954,7 +2954,6 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 	struct btrfs_delayed_ref_head *head;
 	int ret;
 	int run_all = count == (unsigned long)-1;
-	bool can_flush_pending_bgs = trans->can_flush_pending_bgs;

 	/* We'll clean this up in btrfs_cleanup_transaction */
 	if (trans->aborted)
@@ -2971,7 +2970,6 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 #ifdef SCRAMBLE_DELAYED_REFS
 	delayed_refs->run_delayed_start = find_middle(&delayed_refs->root);
 #endif
-	trans->can_flush_pending_bgs = false;
 	ret = __btrfs_run_delayed_refs(trans, count);
 	if (ret < 0) {
 		btrfs_abort_transaction(trans, ret);
@@ -3002,7 +3000,6 @@ int btrfs_run_delayed_refs(struct btrfs_trans_handle *trans,
 		goto again;
 	}
 out:
-	trans->can_flush_pending_bgs = can_flush_pending_bgs;
 	return 0;
 }

@@ -4589,11 +4586,9 @@ static int do_chunk_alloc(struct btrfs_trans_handle *trans, u64 flags,
 	 * the block groups that were made dirty during the lifetime of the
 	 * transaction.
 	 */
-	if (trans->can_flush_pending_bgs &&
-	    trans->chunk_bytes_reserved >= (u64)SZ_2M) {
+	if (trans->chunk_bytes_reserved >= (u64)SZ_2M)
 		btrfs_create_pending_block_groups(trans);
-		btrfs_trans_release_chunk_metadata(trans);
-	}
+
 	return ret;
 }

@@ -10132,9 +10127,10 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 	struct btrfs_block_group_item item;
 	struct btrfs_key key;
 	int ret = 0;
-	bool can_flush_pending_bgs = trans->can_flush_pending_bgs;

-	trans->can_flush_pending_bgs = false;
+	if (!trans->can_flush_pending_bgs)
+		return;
+
 	while (!list_empty(&trans->new_bgs)) {
 		block_group = list_first_entry(&trans->new_bgs,
 					       struct btrfs_block_group_cache,
@@ -10159,7 +10155,7 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans)
 next:
 		list_del_init(&block_group->bg_list);
 	}
-	trans->can_flush_pending_bgs = can_flush_pending_bgs;
+	btrfs_trans_release_chunk_metadata(trans);
 }

 int btrfs_make_block_group(struct btrfs_trans_handle *trans, u64 bytes_used,
