Commit 875f1d0

iov_iter: add ITER_BVEC_FLAG_NO_REF flag
For ITER_BVEC, if we're holding on to kernel pages, the caller doesn't need to grab a reference to the bvec pages, and drop that same reference on IO completion. This is essentially safe for any ITER_BVEC, but some use cases end up reusing pages and unconditionally dropping a page reference on completion. An example of that is sendfile(2), which ends up being a splice_in + splice_out on the pipe pages.

Add a flag that tells us it's fine to not grab a page reference to the bvec pages, since that caller knows not to drop a reference when it's done with the pages.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
1 parent bf33a76 commit 875f1d0
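
The contract described in the commit message can be modelled in userspace. The following is a minimal sketch, not kernel code: `fake_page`, `fake_iter`, `fake_submit` and `fake_complete` are hypothetical stand-ins invented for illustration, and this commit itself only adds the flag and its accessor, not any consumer. The sketch assumes a consumer that pairs every reference it takes at submission with a drop at completion, and skips both when the flag is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct fake_page { int refcount; };
struct fake_iter { unsigned int type; };

enum { FAKE_FLAG_NO_REF = 2 };	/* mirrors the value of ITER_BVEC_FLAG_NO_REF */

/* Same test as iov_iter_bvec_no_ref() in the diff, on the stand-in type. */
static bool fake_iter_no_ref(const struct fake_iter *i)
{
	return (i->type & FAKE_FLAG_NO_REF) != 0;
}

/* Submission side: take a page reference only if the flag is clear. */
static void fake_submit(const struct fake_iter *i, struct fake_page *p)
{
	if (!fake_iter_no_ref(i))
		p->refcount++;
}

/* Completion side: drop the reference only if one was taken at submission. */
static void fake_complete(const struct fake_iter *i, struct fake_page *p)
{
	if (!fake_iter_no_ref(i)) {
		assert(p->refcount > 0);
		p->refcount--;
	}
}
```

With the flag set, the page's reference count is untouched across a submit/complete pair; with it clear, the count goes up by one and back down. Either way the get and the put stay balanced, which is the property the commit message is concerned with.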

2 files changed, 22 insertions(+), 5 deletions(-)

fs/io_uring.c

Lines changed: 3 additions & 0 deletions
@@ -855,6 +855,9 @@ static int io_import_fixed(struct io_ring_ctx *ctx, int rw,
 	iov_iter_bvec(iter, rw, imu->bvec, imu->nr_bvecs, offset + len);
 	if (offset)
 		iov_iter_advance(iter, offset);
+
+	/* don't drop a reference to these pages */
+	iter->type |= ITER_BVEC_FLAG_NO_REF;
 	return 0;
 }

include/linux/uio.h

Lines changed: 19 additions & 5 deletions
@@ -23,14 +23,23 @@ struct kvec {
 };
 
 enum iter_type {
-	ITER_IOVEC = 0,
-	ITER_KVEC = 2,
-	ITER_BVEC = 4,
-	ITER_PIPE = 8,
-	ITER_DISCARD = 16,
+	/* set if ITER_BVEC doesn't hold a bv_page ref */
+	ITER_BVEC_FLAG_NO_REF = 2,
+
+	/* iter types */
+	ITER_IOVEC = 4,
+	ITER_KVEC = 8,
+	ITER_BVEC = 16,
+	ITER_PIPE = 32,
+	ITER_DISCARD = 64,
 };
 
 struct iov_iter {
+	/*
+	 * Bit 0 is the read/write bit, set if we're writing.
+	 * Bit 1 is the BVEC_FLAG_NO_REF bit, set if type is a bvec and
+	 * the caller isn't expecting to drop a page reference when done.
+	 */
 	unsigned int type;
 	size_t iov_offset;
 	size_t count;
@@ -84,6 +93,11 @@ static inline unsigned char iov_iter_rw(const struct iov_iter *i)
 	return i->type & (READ | WRITE);
 }
 
+static inline bool iov_iter_bvec_no_ref(const struct iov_iter *i)
+{
+	return (i->type & ITER_BVEC_FLAG_NO_REF) != 0;
+}
+
 /*
  * Total number of bytes covered by an iovec.
  *
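
The layout of the `type` field after this change can be checked with a small userspace program. This is an illustrative sketch only: the `SK_` constants are local copies of the values in the diff (with READ as 0 and WRITE as 1, as the "bit 0" comment implies), and `sk_kind()` is a hypothetical helper that masks off both low bits; it is not taken from the patch. The sketch also shows why the iter types had to be renumbered: the old `ITER_KVEC` value of 2 occupies the same bit the new flag uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of the values from the diff; userspace illustration only. */
enum { SK_READ = 0, SK_WRITE = 1 };
enum {
	SK_BVEC_FLAG_NO_REF = 2,	/* bit 1 */
	SK_IOVEC = 4,
	SK_KVEC = 8,
	SK_BVEC = 16,
	SK_PIPE = 32,
	SK_DISCARD = 64,
};
enum { SK_OLD_KVEC = 2 };	/* pre-patch ITER_KVEC, same value as the new flag */

/* The flag and the rw bit must not overlap any iter type bit. */
static_assert(((SK_WRITE | SK_BVEC_FLAG_NO_REF) &
	       (SK_IOVEC | SK_KVEC | SK_BVEC | SK_PIPE | SK_DISCARD)) == 0,
	      "low bits must be disjoint from iter type bits");

/* Bit 0: direction, as in iov_iter_rw(). */
static unsigned int sk_rw(unsigned int type)
{
	return type & (SK_READ | SK_WRITE);
}

/* Bit 1: same test as iov_iter_bvec_no_ref(). */
static bool sk_no_ref(unsigned int type)
{
	return (type & SK_BVEC_FLAG_NO_REF) != 0;
}

/* Hypothetical helper: strip direction and flag, leaving the iter type. */
static unsigned int sk_kind(unsigned int type)
{
	return type & ~(unsigned int)(SK_WRITE | SK_BVEC_FLAG_NO_REF);
}
```

For example, a write-direction bvec iterator with the flag set has `type == 16 | 1 | 2`, and each of the three accessors recovers its own piece independently of the other two.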
