
Commit ec35e48

masoncl authored and kdave committed
btrfs: fix refcount_t usage when deleting btrfs_delayed_nodes
refcounts have a generic implementation and an asm optimized one. The
generic version has extra debugging to make sure that once a refcount
goes to zero, refcount_inc won't increase it.

The btrfs delayed inode code wasn't expecting this, and we're tripping
over the warnings when the generic refcounts are used. We ended up with
this race:

    Process A                               Process B
    btrfs_get_delayed_node()
      spin_lock(root->inode_lock)
      radix_tree_lookup()
                                            __btrfs_release_delayed_node()
                                              refcount_dec_and_test(&delayed_node->refs)
                                              our refcount is now zero
      refcount_add(2) <--- warning here,
                           refcount unchanged
                                            spin_lock(root->inode_lock)
                                            radix_tree_delete()

With the generic refcounts, we actually warn again when process B above
tries to release his refcount, because refcount_add() turned into a
no-op.

We saw this in production on older kernels without the asm optimized
refcounts.

The fix used here is to use refcount_inc_not_zero() to detect when the
object is in the middle of being freed and return NULL. This is almost
always the right answer anyway, since we usually end up pitching the
delayed_node if it didn't have fresh data in it.

This also changes __btrfs_release_delayed_node() to remove the extra
check for zero refcounts before radix tree deletion.
btrfs_get_delayed_node() was the only path that was allowing refcounts
to go from zero to one.

Fixes: 6de5f18 ("btrfs: fix refcount_t usage when deleting btrfs_delayed_node")
CC: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: David Sterba <dsterba@suse.com>
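To make the warning described above concrete: with the generic (debug) refcount implementation, a counter that has reached zero saturates there. Below is a minimal sketch of those semantics, assuming the generic refcount_t variant the changelog refers to; the function name is invented for illustration and is not part of the commit.

#include <linux/refcount.h>

/* Illustration only: the debug (generic) refcount_t behavior described above. */
static void refcount_zero_semantics_sketch(void)
{
	refcount_t refs = REFCOUNT_INIT(1);

	if (refcount_dec_and_test(&refs)) {
		/* count is now 0 and saturates there */
		refcount_add(2, &refs);		/* generic version: WARN_ONCE, count unchanged */

		if (!refcount_inc_not_zero(&refs)) {
			/*
			 * Fails cleanly with no warning; the caller can treat
			 * the object as already on its way to being freed.
			 */
		}
	}
}

This is also why process B's later release warns a second time: its decrement finds a count that the earlier refcount_add() never actually raised.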
1 parent: beed926


fs/btrfs/delayed-inode.c

Lines changed: 34 additions & 11 deletions
@@ -87,16 +87,38 @@ static struct btrfs_delayed_node *btrfs_get_delayed_node(
 
 	spin_lock(&root->inode_lock);
 	node = radix_tree_lookup(&root->delayed_nodes_tree, ino);
+
 	if (node) {
 		if (btrfs_inode->delayed_node) {
 			refcount_inc(&node->refs);	/* can be accessed */
 			BUG_ON(btrfs_inode->delayed_node != node);
 			spin_unlock(&root->inode_lock);
 			return node;
 		}
-		btrfs_inode->delayed_node = node;
-		/* can be accessed and cached in the inode */
-		refcount_add(2, &node->refs);
+
+		/*
+		 * It's possible that we're racing into the middle of removing
+		 * this node from the radix tree. In this case, the refcount
+		 * was zero and it should never go back to one. Just return
+		 * NULL like it was never in the radix at all; our release
+		 * function is in the process of removing it.
+		 *
+		 * Some implementations of refcount_inc refuse to bump the
+		 * refcount once it has hit zero. If we don't do this dance
+		 * here, refcount_inc() may decide to just WARN_ONCE() instead
+		 * of actually bumping the refcount.
+		 *
+		 * If this node is properly in the radix, we want to bump the
+		 * refcount twice, once for the inode and once for this get
+		 * operation.
+		 */
+		if (refcount_inc_not_zero(&node->refs)) {
+			refcount_inc(&node->refs);
+			btrfs_inode->delayed_node = node;
+		} else {
+			node = NULL;
+		}
+
 		spin_unlock(&root->inode_lock);
 		return node;
 	}
@@ -254,17 +276,18 @@ static void __btrfs_release_delayed_node(
 	mutex_unlock(&delayed_node->mutex);
 
 	if (refcount_dec_and_test(&delayed_node->refs)) {
-		bool free = false;
 		struct btrfs_root *root = delayed_node->root;
+
 		spin_lock(&root->inode_lock);
-		if (refcount_read(&delayed_node->refs) == 0) {
-			radix_tree_delete(&root->delayed_nodes_tree,
-					  delayed_node->inode_id);
-			free = true;
-		}
+		/*
+		 * Once our refcount goes to zero, nobody is allowed to bump it
+		 * back up. We can delete it now.
+		 */
+		ASSERT(refcount_read(&delayed_node->refs) == 0);
+		radix_tree_delete(&root->delayed_nodes_tree,
+				  delayed_node->inode_id);
 		spin_unlock(&root->inode_lock);
-		if (free)
-			kmem_cache_free(delayed_node_cache, delayed_node);
+		kmem_cache_free(delayed_node_cache, delayed_node);
 	}
 }
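Taken together, the two hunks rest on one invariant: the last reference is only dropped, and the radix tree entry only erased, while holding the same root->inode_lock the lookup takes. Below is a minimal, self-contained sketch of that pairing; the demo_* names are hypothetical and this is not the btrfs code, just the shape of the get/release pattern the commit moves to.

#include <linux/radix-tree.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Sketch only: hypothetical structure standing in for btrfs_delayed_node. */
struct demo_node {
	refcount_t refs;
	unsigned long id;
};

static struct demo_node *demo_get(struct radix_tree_root *tree,
				  spinlock_t *lock, unsigned long id)
{
	struct demo_node *node;

	spin_lock(lock);
	node = radix_tree_lookup(tree, id);
	/*
	 * The final reference is only released under this lock, so a failed
	 * inc_not_zero here means the release path is already tearing the
	 * node down.
	 */
	if (node && !refcount_inc_not_zero(&node->refs))
		node = NULL;	/* racing with demo_put(); treat as not found */
	spin_unlock(lock);
	return node;
}

static void demo_put(struct demo_node *node, struct radix_tree_root *tree,
		     spinlock_t *lock)
{
	if (refcount_dec_and_test(&node->refs)) {
		spin_lock(lock);
		/* count is zero and cannot be resurrected; safe to unlink */
		radix_tree_delete(tree, node->id);
		spin_unlock(lock);
		kfree(node);
	}
}

A lookup that loses the race simply behaves as if the node was never in the tree, which matches the new comment's "just return NULL like it was never in the radix at all".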
