
Commit ce40be7

Merge branch 'for-3.7/core' of git://git.kernel.dk/linux-block
Pull block IO update from Jens Axboe:
 "Core block IO bits for 3.7. Not a huge round this time, it contains:

   - First series from Kent cleaning up and generalizing bio allocation
     and freeing.

   - WRITE_SAME support from Martin.

   - Mikulas patches to prevent O_DIRECT crashes when someone changes
     the block size of a device.

   - Make bio_split() work on data-less bio's (like trim/discards).

   - A few other minor fixups."

Fixed up silent semantic mis-merge as per Mikulas Patocka and Andrew
Morton. It is due to the VM no longer using a prio-tree (see commit
6b2dbba: "mm: replace vma prio_tree with an interval tree"). So make
set_blocksize() use mapping_mapped() instead of open-coding the
internal VM knowledge that has changed.

* 'for-3.7/core' of git://git.kernel.dk/linux-block: (26 commits)
  block: makes bio_split support bio without data
  scatterlist: refactor the sg_nents
  scatterlist: add sg_nents
  fs: fix include/percpu-rwsem.h export error
  percpu-rw-semaphore: fix documentation typos
  fs/block_dev.c:1644:5: sparse: symbol 'blkdev_mmap' was not declared
  blockdev: turn a rw semaphore into a percpu rw semaphore
  Fix a crash when block device is read and block size is changed at the same time
  block: fix request_queue->flags initialization
  block: lift the initial queue bypass mode on blk_register_queue() instead of blk_init_allocated_queue()
  block: ioctl to zero block ranges
  block: Make blkdev_issue_zeroout use WRITE SAME
  block: Implement support for WRITE SAME
  block: Consolidate command flag and queue limit checks for merges
  block: Clean up special command handling logic
  block/blk-tag.c: Remove useless kfree
  block: remove the duplicated setting for congestion_threshold
  block: reject invalid queue attribute values
  block: Add bio_clone_bioset(), bio_clone_kmalloc()
  block: Consolidate bio_alloc_bioset(), bio_kmalloc()
  ...
2 parents ba0a5a3 + 02f3939

33 files changed: 770 additions, 464 deletions

Documentation/ABI/testing/sysfs-block

Lines changed: 14 additions & 0 deletions
@@ -206,3 +206,17 @@ Description:
                 when a discarded area is read the discard_zeroes_data
                 parameter will be set to one. Otherwise it will be 0 and
                 the result of reading a discarded area is undefined.
+
+What:           /sys/block/<disk>/queue/write_same_max_bytes
+Date:           January 2012
+Contact:        Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+                Some devices support a write same operation in which a
+                single data block can be written to a range of several
+                contiguous blocks on storage. This can be used to wipe
+                areas on disk or to initialize drives in a RAID
+                configuration. write_same_max_bytes indicates how many
+                bytes can be written in a single write same command. If
+                write_same_max_bytes is 0, write same is not supported
+                by the device.
+
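The attribute is a plain decimal value, so userspace can probe for WRITE SAME support directly. A minimal sketch of such a probe (the helper name and the "sda" example are illustrative, not part of this commit):

    /* query_write_same.c - illustrative userspace probe, not from the commit */
    #include <stdio.h>

    /* Returns write_same_max_bytes for a disk, or -1 on error. */
    static long write_same_max_bytes(const char *disk)
    {
            char path[256];
            long val;
            FILE *f;

            snprintf(path, sizeof(path),
                     "/sys/block/%s/queue/write_same_max_bytes", disk);
            f = fopen(path, "r");
            if (!f)
                    return -1;
            if (fscanf(f, "%ld", &val) != 1)
                    val = -1;
            fclose(f);
            return val;
    }

    int main(void)
    {
            long n = write_same_max_bytes("sda");

            if (n > 0)
                    printf("sda: up to %ld bytes per WRITE SAME command\n", n);
            else if (n == 0)
                    printf("sda: WRITE SAME not supported\n");
            return 0;
    }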

Documentation/block/biodoc.txt

Lines changed: 0 additions & 5 deletions
@@ -465,7 +465,6 @@ struct bio {
         bio_end_io_t *bi_end_io;  /* bi_end_io (bio) */
         atomic_t bi_cnt;          /* pin count: free when it hits zero */
         void *bi_private;
-        bio_destructor_t *bi_destructor; /* bi_destructor (bio) */
 };

 With this multipage bio design:
@@ -647,10 +646,6 @@ for a non-clone bio. There are the 6 pools setup for different size biovecs,
 so bio_alloc(gfp_mask, nr_iovecs) will allocate a vec_list of the
 given size from these slabs.

-The bi_destructor() routine takes into account the possibility of the bio
-having originated from a different source (see later discussions on
-n/w to block transfers and kvec_cb)
-
 The bio_get() routine may be used to hold an extra reference on a bio prior
 to i/o submission, if the bio fields are likely to be accessed after the
 i/o is issued (since the bio may otherwise get freed in case i/o completion
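To make the bio_get() discussion in that context concrete, here is a hedged sketch of the pattern, modeled on how blkdev_issue_flush() uses it in this era of the kernel; the example_* names are hypothetical and the code is not part of this commit:

    #include <linux/bio.h>
    #include <linux/blkdev.h>
    #include <linux/completion.h>

    /* Completion callback: wakes the submitter, drops the submission ref. */
    static void example_end_io(struct bio *bio, int err)
    {
            if (err)
                    clear_bit(BIO_UPTODATE, &bio->bi_flags);
            complete(bio->bi_private);
            bio_put(bio);
    }

    static int example_flush(struct block_device *bdev)
    {
            DECLARE_COMPLETION_ONSTACK(wait);
            struct bio *bio;
            int ret = 0;

            bio = bio_alloc(GFP_KERNEL, 0);
            if (!bio)
                    return -ENOMEM;
            bio->bi_end_io = example_end_io;
            bio->bi_bdev = bdev;
            bio->bi_private = &wait;

            bio_get(bio);                /* extra reference taken before submission */
            submit_bio(WRITE_FLUSH, bio);
            wait_for_completion(&wait);

            /* Inspecting the bio here is only safe because of bio_get() above. */
            if (!bio_flagged(bio, BIO_UPTODATE))
                    ret = -EIO;

            bio_put(bio);                /* release the extra reference */
            return ret;
    }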

Documentation/percpu-rw-semaphore.txt

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+Percpu rw semaphores
+--------------------
+
+Percpu rw semaphores is a new read-write semaphore design that is
+optimized for locking for reading.
+
+The problem with traditional read-write semaphores is that when multiple
+cores take the lock for reading, the cache line containing the semaphore
+is bouncing between L1 caches of the cores, causing performance
+degradation.
+
+Locking for reading is very fast, it uses RCU and it avoids any atomic
+instruction in the lock and unlock path. On the other hand, locking for
+writing is very expensive, it calls synchronize_rcu() that can take
+hundreds of milliseconds.
+
+The lock is declared with "struct percpu_rw_semaphore" type.
+The lock is initialized percpu_init_rwsem, it returns 0 on success and
+-ENOMEM on allocation failure.
+The lock must be freed with percpu_free_rwsem to avoid memory leak.
+
+The lock is locked for read with percpu_down_read, percpu_up_read and
+for write with percpu_down_write, percpu_up_write.
+
+The idea of using RCU for optimized rw-lock was introduced by
+Eric Dumazet <eric.dumazet@gmail.com>.
+The code was written by Mikulas Patocka <mpatocka@redhat.com>
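The file above documents the entire API; below is a minimal usage sketch built strictly from it (the example_* names and the protected counter are hypothetical, not from this series):

    #include <linux/percpu-rwsem.h>

    static struct percpu_rw_semaphore example_sem;
    static unsigned long example_counter;        /* hypothetical shared state */

    static int example_setup(void)
    {
            /* Returns 0 on success, -ENOMEM on allocation failure. */
            return percpu_init_rwsem(&example_sem);
    }

    static unsigned long example_read(void)
    {
            unsigned long v;

            percpu_down_read(&example_sem);      /* fast path: RCU, no atomics */
            v = example_counter;
            percpu_up_read(&example_sem);
            return v;
    }

    static void example_update(unsigned long v)
    {
            percpu_down_write(&example_sem);     /* slow path: synchronize_rcu() */
            example_counter = v;
            percpu_up_write(&example_sem);
    }

    static void example_teardown(void)
    {
            percpu_free_rwsem(&example_sem);     /* must be freed to avoid a leak */
    }

This read-mostly pattern is exactly what the design targets: in this series the rare writer is the block-size change path, while every block device read takes the lock for reading.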

block/blk-core.c

Lines changed: 24 additions & 27 deletions
@@ -606,8 +606,8 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
         /*
          * A queue starts its life with bypass turned on to avoid
          * unnecessary bypass on/off overhead and nasty surprises during
-         * init. The initial bypass will be finished at the end of
-         * blk_init_allocated_queue().
+         * init. The initial bypass will be finished when the queue is
+         * registered by blk_register_queue().
          */
         q->bypass_depth = 1;
         __set_bit(QUEUE_FLAG_BYPASS, &q->queue_flags);
@@ -694,7 +694,7 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
         q->request_fn = rfn;
         q->prep_rq_fn = NULL;
         q->unprep_rq_fn = NULL;
-        q->queue_flags = QUEUE_FLAG_DEFAULT;
+        q->queue_flags |= QUEUE_FLAG_DEFAULT;

         /* Override internal queue lock with supplied lock pointer */
         if (lock)
@@ -710,11 +710,6 @@ blk_init_allocated_queue(struct request_queue *q, request_fn_proc *rfn,
         /* init elevator */
         if (elevator_init(q, NULL))
                 return NULL;
-
-        blk_queue_congestion_threshold(q);
-
-        /* all done, end the initial bypass */
-        blk_queue_bypass_end(q);
         return q;
 }
 EXPORT_SYMBOL(blk_init_allocated_queue);
@@ -1657,8 +1652,8 @@ generic_make_request_checks(struct bio *bio)
                 goto end_io;
         }

-        if (unlikely(!(bio->bi_rw & REQ_DISCARD) &&
-                     nr_sectors > queue_max_hw_sectors(q))) {
+        if (likely(bio_is_rw(bio) &&
+                   nr_sectors > queue_max_hw_sectors(q))) {
                 printk(KERN_ERR "bio too big device %s (%u > %u)\n",
                        bdevname(bio->bi_bdev, b),
                        bio_sectors(bio),
@@ -1699,8 +1694,12 @@ generic_make_request_checks(struct bio *bio)

         if ((bio->bi_rw & REQ_DISCARD) &&
             (!blk_queue_discard(q) ||
-             ((bio->bi_rw & REQ_SECURE) &&
-              !blk_queue_secdiscard(q)))) {
+             ((bio->bi_rw & REQ_SECURE) && !blk_queue_secdiscard(q)))) {
+                err = -EOPNOTSUPP;
+                goto end_io;
+        }
+
+        if (bio->bi_rw & REQ_WRITE_SAME && !bdev_write_same(bio->bi_bdev)) {
                 err = -EOPNOTSUPP;
                 goto end_io;
         }
@@ -1810,15 +1809,20 @@ EXPORT_SYMBOL(generic_make_request);
  */
 void submit_bio(int rw, struct bio *bio)
 {
-        int count = bio_sectors(bio);
-
         bio->bi_rw |= rw;

         /*
          * If it's a regular read/write or a barrier with data attached,
          * go through the normal accounting stuff before submission.
          */
-        if (bio_has_data(bio) && !(rw & REQ_DISCARD)) {
+        if (bio_has_data(bio)) {
+                unsigned int count;
+
+                if (unlikely(rw & REQ_WRITE_SAME))
+                        count = bdev_logical_block_size(bio->bi_bdev) >> 9;
+                else
+                        count = bio_sectors(bio);
+
                 if (rw & WRITE) {
                         count_vm_events(PGPGOUT, count);
                 } else {
@@ -1864,11 +1868,10 @@ EXPORT_SYMBOL(submit_bio);
  */
 int blk_rq_check_limits(struct request_queue *q, struct request *rq)
 {
-        if (rq->cmd_flags & REQ_DISCARD)
+        if (!rq_mergeable(rq))
                 return 0;

-        if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
-            blk_rq_bytes(rq) > queue_max_hw_sectors(q) << 9) {
+        if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
                 printk(KERN_ERR "%s: over max size limit.\n", __func__);
                 return -EIO;
         }
@@ -2340,7 +2343,7 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
                 req->buffer = bio_data(req->bio);

         /* update sector only for requests with clear definition of sector */
-        if (req->cmd_type == REQ_TYPE_FS || (req->cmd_flags & REQ_DISCARD))
+        if (req->cmd_type == REQ_TYPE_FS)
                 req->__sector += total_bytes >> 9;

         /* mixed attributes always follow the first bio */
@@ -2781,16 +2784,10 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,
         blk_rq_init(NULL, rq);

         __rq_for_each_bio(bio_src, rq_src) {
-                bio = bio_alloc_bioset(gfp_mask, bio_src->bi_max_vecs, bs);
+                bio = bio_clone_bioset(bio_src, gfp_mask, bs);
                 if (!bio)
                         goto free_and_out;

-                __bio_clone(bio, bio_src);
-
-                if (bio_integrity(bio_src) &&
-                    bio_integrity_clone(bio, bio_src, gfp_mask, bs))
-                        goto free_and_out;
-
                 if (bio_ctr && bio_ctr(bio, bio_src, data))
                         goto free_and_out;

@@ -2807,7 +2804,7 @@ int blk_rq_prep_clone(struct request *rq, struct request *rq_src,

 free_and_out:
         if (bio)
-                bio_free(bio, bs);
+                bio_put(bio);
         blk_rq_unprep_clone(rq);

         return -ENOMEM;

block/blk-lib.c

Lines changed: 103 additions & 1 deletion
@@ -129,6 +129,80 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 }
 EXPORT_SYMBOL(blkdev_issue_discard);

+/**
+ * blkdev_issue_write_same - queue a write same operation
+ * @bdev:     target blockdev
+ * @sector:   start sector
+ * @nr_sects: number of sectors to write
+ * @gfp_mask: memory allocation flags (for bio_alloc)
+ * @page:     page containing data to write
+ *
+ * Description:
+ *    Issue a write same request for the sectors in question.
+ */
+int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+                            sector_t nr_sects, gfp_t gfp_mask,
+                            struct page *page)
+{
+        DECLARE_COMPLETION_ONSTACK(wait);
+        struct request_queue *q = bdev_get_queue(bdev);
+        unsigned int max_write_same_sectors;
+        struct bio_batch bb;
+        struct bio *bio;
+        int ret = 0;
+
+        if (!q)
+                return -ENXIO;
+
+        max_write_same_sectors = q->limits.max_write_same_sectors;
+
+        if (max_write_same_sectors == 0)
+                return -EOPNOTSUPP;
+
+        atomic_set(&bb.done, 1);
+        bb.flags = 1 << BIO_UPTODATE;
+        bb.wait = &wait;
+
+        while (nr_sects) {
+                bio = bio_alloc(gfp_mask, 1);
+                if (!bio) {
+                        ret = -ENOMEM;
+                        break;
+                }
+
+                bio->bi_sector = sector;
+                bio->bi_end_io = bio_batch_end_io;
+                bio->bi_bdev = bdev;
+                bio->bi_private = &bb;
+                bio->bi_vcnt = 1;
+                bio->bi_io_vec->bv_page = page;
+                bio->bi_io_vec->bv_offset = 0;
+                bio->bi_io_vec->bv_len = bdev_logical_block_size(bdev);
+
+                if (nr_sects > max_write_same_sectors) {
+                        bio->bi_size = max_write_same_sectors << 9;
+                        nr_sects -= max_write_same_sectors;
+                        sector += max_write_same_sectors;
+                } else {
+                        bio->bi_size = nr_sects << 9;
+                        nr_sects = 0;
+                }
+
+                atomic_inc(&bb.done);
+                submit_bio(REQ_WRITE | REQ_WRITE_SAME, bio);
+        }
+
+        /* Wait for bios in-flight */
+        if (!atomic_dec_and_test(&bb.done))
+                wait_for_completion(&wait);
+
+        if (!test_bit(BIO_UPTODATE, &bb.flags))
+                ret = -ENOTSUPP;
+
+        return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_write_same);
+
 /**
  * blkdev_issue_zeroout - generate number of zero filed write bios
  * @bdev:     blockdev to issue
@@ -140,7 +214,7 @@ EXPORT_SYMBOL(blkdev_issue_discard);
  *  Generate and issue number of bios with zerofiled pages.
  */

-int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
                         sector_t nr_sects, gfp_t gfp_mask)
 {
         int ret;
@@ -190,4 +264,32 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,

         return ret;
 }
+
+/**
+ * blkdev_issue_zeroout - zero-fill a block range
+ * @bdev:     blockdev to write
+ * @sector:   start sector
+ * @nr_sects: number of sectors to write
+ * @gfp_mask: memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Generate and issue number of bios with zerofiled pages.
+ */
+
+int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+                         sector_t nr_sects, gfp_t gfp_mask)
+{
+        if (bdev_write_same(bdev)) {
+                unsigned char bdn[BDEVNAME_SIZE];
+
+                if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+                                             ZERO_PAGE(0)))
+                        return 0;
+
+                bdevname(bdev, bdn);
+                pr_err("%s: WRITE SAME failed. Manually zeroing.\n", bdn);
+        }
+
+        return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
+}
 EXPORT_SYMBOL(blkdev_issue_zeroout);
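For context, a hypothetical in-kernel caller of the helpers added above; only the blkdev_issue_zeroout() signature and its WRITE SAME fallback behaviour come from this patch, the surrounding function is illustrative:

    #include <linux/blkdev.h>
    #include <linux/gfp.h>

    /* Zero the first 1 MiB of a device, preferring WRITE SAME when available. */
    static int example_wipe_head(struct block_device *bdev)
    {
            sector_t nr_sects = (1024 * 1024) >> 9;   /* 2048 512-byte sectors */

            /*
             * blkdev_issue_zeroout() now tries a WRITE SAME of ZERO_PAGE(0)
             * when bdev_write_same() reports support, and falls back to
             * __blkdev_issue_zeroout()'s zero-filled writes on failure.
             */
            return blkdev_issue_zeroout(bdev, 0, nr_sects, GFP_KERNEL);
    }

Per the shortlog ("block: ioctl to zero block ranges"), the new BLKZEROOUT ioctl exposes the same helper to userspace.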
