Skip to content

Commit 799959d

Browse files
committed
Simplify distance heuristics in read_stream.c.
Make the distance control heuristics simpler and more aggressive in preparation for asynchronous I/O. The v17 version of read_stream.c made a conservative choice to limit the look-ahead distance when streaming sequential blocks, because it couldn't benefit very much from looking ahead further yet. It had a three-behavior model where only random I/O would rapidly increase the look-ahead distance, to support read-ahead advice. Sequential I/O would move it towards the io_combine_limit setting, just enough to build one full-sized synchronous I/O at a time, and then expect kernel read-ahead to avoid I/O stalls. That already left I/O performance on the table with advice-based I/O concurrency, since sequential blocks could be followed by random jumps, eg with the proposed streaming Bitmap Heap Scan patch. It is time to delete the cautious middle option and adjust the distance based on recent I/O needs only, since asynchronous reads will need to be started ahead of time whether random or sequential. It is still limited by io_combine_limit, *_io_concurrency, buffer availability and strategy ring size, as before. Reviewed-by: Andres Freund <andres@anarazel.de> (earlier version) Tested-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGK_%3D4CVmMHvsHjOVrK6t4F%3DLBpFzsrr3R%2BaJYN8kcTfWg%40mail.gmail.com
1 parent 7ea8cd1 commit 799959d

File tree

1 file changed

+16
-52
lines changed

1 file changed

+16
-52
lines changed

src/backend/storage/aio/read_stream.c

Lines changed: 16 additions & 52 deletions
Original file line numberDiff line numberDiff line change
@@ -17,30 +17,12 @@
1717
* pending read. When that isn't possible, the existing pending read is sent
1818
* to StartReadBuffers() so that a new one can begin to form.
1919
*
20-
* The algorithm for controlling the look-ahead distance tries to classify the
21-
* stream into three ideal behaviors:
22-
*
23-
* A) No I/O is necessary, because the requested blocks are fully cached
24-
* already. There is no benefit to looking ahead more than one block, so
25-
* distance is 1. This is the default initial assumption.
26-
*
27-
* B) I/O is necessary, but read-ahead advice is undesirable because the
28-
* access is sequential and we can rely on the kernel's read-ahead heuristics,
29-
* or impossible because direct I/O is enabled, or the system doesn't support
30-
* read-ahead advice. There is no benefit in looking ahead more than
31-
* io_combine_limit, because in this case the only goal is larger read system
32-
* calls. Looking further ahead would pin many buffers and perform
33-
* speculative work for no benefit.
34-
*
35-
* C) I/O is necessary, it appears to be random, and this system supports
36-
* read-ahead advice. We'll look further ahead in order to reach the
37-
* configured level of I/O concurrency.
38-
*
39-
* The distance increases rapidly and decays slowly, so that it moves towards
40-
* those levels as different I/O patterns are discovered. For example, a
41-
* sequential scan of fully cached data doesn't bother looking ahead, but a
42-
* sequential scan that hits a region of uncached blocks will start issuing
43-
* increasingly wide read calls until it plateaus at io_combine_limit.
20+
* The algorithm for controlling the look-ahead distance is based on recent
21+
* cache hit and miss history. When no I/O is necessary, there is no benefit
22+
* in looking ahead more than one block. This is the default initial
23+
* assumption, but when blocks needing I/O are streamed, the distance is
24+
* increased rapidly to try to benefit from I/O combining and concurrency. It
25+
* is reduced gradually when cached blocks are streamed.
4426
*
4527
* The main data structure is a circular queue of buffers of size
4628
* max_pinned_buffers plus some extra space for technical reasons, ready to be
@@ -333,7 +315,7 @@ read_stream_start_pending_read(ReadStream *stream)
333315
/* Remember whether we need to wait before returning this buffer. */
334316
if (!need_wait)
335317
{
336-
/* Look-ahead distance decays, no I/O necessary (behavior A). */
318+
/* Look-ahead distance decays, no I/O necessary. */
337319
if (stream->distance > 1)
338320
stream->distance--;
339321
}
@@ -634,7 +616,7 @@ read_stream_begin_impl(int flags,
634616
/*
635617
* Skip the initial ramp-up phase if the caller says we're going to be
636618
* reading the whole relation. This way we start out assuming we'll be
637-
* doing full io_combine_limit sized reads (behavior B).
619+
* doing full io_combine_limit sized reads.
638620
*/
639621
if (flags & READ_STREAM_FULL)
640622
stream->distance = Min(max_pinned_buffers, stream->io_combine_limit);
@@ -725,10 +707,10 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
725707
#ifndef READ_STREAM_DISABLE_FAST_PATH
726708

727709
/*
728-
* A fast path for all-cached scans (behavior A). This is the same as the
729-
* usual algorithm, but it is specialized for no I/O and no per-buffer
730-
* data, so we can skip the queue management code, stay in the same buffer
731-
* slot and use singular StartReadBuffer().
710+
* A fast path for all-cached scans. This is the same as the usual
711+
* algorithm, but it is specialized for no I/O and no per-buffer data, so
712+
* we can skip the queue management code, stay in the same buffer slot and
713+
* use singular StartReadBuffer().
732714
*/
733715
if (likely(stream->fast_path))
734716
{
@@ -848,28 +830,10 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
848830
if (++stream->oldest_io_index == stream->max_ios)
849831
stream->oldest_io_index = 0;
850832

851-
if (stream->ios[io_index].op.flags & READ_BUFFERS_ISSUE_ADVICE)
852-
{
853-
/* Distance ramps up fast (behavior C). */
854-
distance = stream->distance * 2;
855-
distance = Min(distance, stream->max_pinned_buffers);
856-
stream->distance = distance;
857-
}
858-
else
859-
{
860-
/* No advice; move towards io_combine_limit (behavior B). */
861-
if (stream->distance > stream->io_combine_limit)
862-
{
863-
stream->distance--;
864-
}
865-
else
866-
{
867-
distance = stream->distance * 2;
868-
distance = Min(distance, stream->io_combine_limit);
869-
distance = Min(distance, stream->max_pinned_buffers);
870-
stream->distance = distance;
871-
}
872-
}
833+
/* Look-ahead distance ramps up rapidly after we do I/O. */
834+
distance = stream->distance * 2;
835+
distance = Min(distance, stream->max_pinned_buffers);
836+
stream->distance = distance;
873837

874838
/*
875839
* If we've reached the first block of a sequential region we're

0 commit comments

Comments
 (0)