Skip to content

Commit c771c14

Browse files
Eryu Guandjwong
authored andcommitted
iomap: invalidate page caches should be after iomap_dio_complete() in direct write
After XFS switching to iomap based DIO (commit acdda3a ("xfs: use iomap_dio_rw")), I started to notice dio29/dio30 tests failures from LTP run on ppc64 hosts, and they can be reproduced on x86_64 hosts with 512B/1k block size XFS too. dio29 diotest3 -b 65536 -n 100 -i 1000 -o 1024000 dio30 diotest6 -b 65536 -n 100 -i 1000 -o 1024000 The failure message is like: bufcmp: offset 0: Expected: 0x62, got 0x0 diotest03 1 TPASS : Read with Direct IO, Write without diotest03 2 TFAIL : diotest3.c:142: comparsion failed; child=98 offset=1425408 diotest03 3 TFAIL : diotest3.c:194: Write Direct-child 98 failed Direct write wrote 0x62 but buffer read got zero. This is because, when doing direct write to a hole or preallocated file, we invalidate the page caches before converting the extent from unwritten state to normal state, which is done by iomap_dio_complete(), thus leave a window for other buffer reader to cache the unwritten state extent. Consider this case, with sub-page blocksize XFS, two processes are direct writing to different blocksize-aligned regions (say 512B) of the same preallocated file, and reading the region back via buffered I/O to compare contents. process A, region [0,512] process B, region [512,1024] xfs_file_write_iter xfs_file_aio_dio_write iomap_dio_rw iomap_apply invalidate_inode_pages2_range xfs_file_write_iter xfs_file_aio_dio_write iomap_dio_rw iomap_apply invalidate_inode_pages2_range iomap_dio_complete xfs_file_read_iter xfs_file_buffered_aio_read generic_file_read_iter do_generic_file_read <readahead fills pagecache with 0> iomap_dio_complete xfs_file_read_iter <read gets 0 from pagecache> Process A first invalidates page caches, at this point the underlying extent is still in unwritten state (iomap_dio_complete not called yet), and process B finishs direct write and populates page caches via readahead, which caches zeros in page for region A, then process A reads zeros from page cache, instead of the actual data. Fix it by invalidating page caches after converting unwritten extent to make sure we read content from disk after extent state changed, as what we did before switching to iomap based dio. Also introduce a new 'start' variable to save the original write offset (iomap_dio_complete() updates iocb->ki_pos), and a 'err' variable for invalidating caches result, cause we can't reuse 'ret' anymore. Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
1 parent c1ae3cf commit c771c14

File tree

1 file changed

+10
-7
lines changed

1 file changed

+10
-7
lines changed

fs/iomap.c

Lines changed: 10 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -846,7 +846,8 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
846846
struct address_space *mapping = iocb->ki_filp->f_mapping;
847847
struct inode *inode = file_inode(iocb->ki_filp);
848848
size_t count = iov_iter_count(iter);
849-
loff_t pos = iocb->ki_pos, end = iocb->ki_pos + count - 1, ret = 0;
849+
loff_t pos = iocb->ki_pos, start = pos;
850+
loff_t end = iocb->ki_pos + count - 1, ret = 0;
850851
unsigned int flags = IOMAP_DIRECT;
851852
struct blk_plug plug;
852853
struct iomap_dio *dio;
@@ -887,12 +888,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
887888
}
888889

889890
if (mapping->nrpages) {
890-
ret = filemap_write_and_wait_range(mapping, iocb->ki_pos, end);
891+
ret = filemap_write_and_wait_range(mapping, start, end);
891892
if (ret)
892893
goto out_free_dio;
893894

894895
ret = invalidate_inode_pages2_range(mapping,
895-
iocb->ki_pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
896+
start >> PAGE_SHIFT, end >> PAGE_SHIFT);
896897
WARN_ON_ONCE(ret);
897898
ret = 0;
898899
}
@@ -941,6 +942,8 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
941942
__set_current_state(TASK_RUNNING);
942943
}
943944

945+
ret = iomap_dio_complete(dio);
946+
944947
/*
945948
* Try again to invalidate clean pages which might have been cached by
946949
* non-direct readahead, or faulted in by get_user_pages() if the source
@@ -949,12 +952,12 @@ iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
949952
* this invalidation fails, tough, the write still worked...
950953
*/
951954
if (iov_iter_rw(iter) == WRITE && mapping->nrpages) {
952-
ret = invalidate_inode_pages2_range(mapping,
953-
iocb->ki_pos >> PAGE_SHIFT, end >> PAGE_SHIFT);
954-
WARN_ON_ONCE(ret);
955+
int err = invalidate_inode_pages2_range(mapping,
956+
start >> PAGE_SHIFT, end >> PAGE_SHIFT);
957+
WARN_ON_ONCE(err);
955958
}
956959

957-
return iomap_dio_complete(dio);
960+
return ret;
958961

959962
out_free_dio:
960963
kfree(dio);

0 commit comments

Comments
 (0)