gh-117151: IO performance improvement, increase io.DEFAULT_BUFFER_SIZE to 128k #118144

Merged
merged 19 commits into from
Mar 7, 2025
8 changes: 4 additions & 4 deletions Doc/library/functions.rst
@@ -1405,10 +1405,10 @@ are always available. They are listed here in alphabetical order.
:func:`io.TextIOWrapper.reconfigure`. When no *buffering* argument is
given, the default buffering policy works as follows:

- * Binary files are buffered in fixed-size chunks; the size of the buffer is
- chosen using a heuristic trying to determine the underlying device's "block
- size" and falling back on :const:`io.DEFAULT_BUFFER_SIZE`. On many systems,
- the buffer will typically be 4096 or 8192 bytes long.
+ * Binary files are buffered in fixed-size chunks; the size of the buffer
+ is ``max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)``
+ when the device block size is available.
+ On most systems, the buffer will typically be 128 kilobytes long.

* "Interactive" text files (files for which :meth:`~io.IOBase.isatty`
returns ``True``) use line buffering. Other text files use the policy
Expand Down
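
As a rough illustration of the new policy, the sketch below evaluates `max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)` for a few block sizes. The helper name `default_binary_buffer_size` and the sample block sizes are hypothetical; only the formula and the two constants come from the diff.

```python
KIB = 1024
MIB = 1024 * KIB

# Values taken from the diff: the new default and the 8 MiB cap.
DEFAULT_BUFFER_SIZE = 128 * KIB
MAX_BLOCK_SIZE = 8 * MIB


def default_binary_buffer_size(blksize: int) -> int:
    """Hypothetical helper mirroring max(min(blksize, 8 MiB), DEFAULT_BUFFER_SIZE)."""
    return max(min(blksize, MAX_BLOCK_SIZE), DEFAULT_BUFFER_SIZE)


# Illustrative block sizes only; real values depend on the device and filesystem.
sizes = {
    blksize: default_binary_buffer_size(blksize)
    for blksize in (4 * KIB, 1 * MIB, 16 * MIB)
}
```

Under these assumptions, a 4 KiB block size is raised to 128 KiB, a 1 MiB block size is kept unchanged, and a 16 MiB block size is capped at 8 MiB.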
15 changes: 8 additions & 7 deletions Lib/_pyio.py
@@ -23,8 +23,9 @@
valid_seek_flags.add(os.SEEK_HOLE)
valid_seek_flags.add(os.SEEK_DATA)

- # open() uses st_blksize whenever we can
- DEFAULT_BUFFER_SIZE = 8 * 1024 # bytes
+ # open() uses max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)
+ # when the device block size is available.
+ DEFAULT_BUFFER_SIZE = 128 * 1024 # bytes

# NOTE: Base classes defined here are registered with the "official" ABCs
# defined in io.py. We don't use real inheritance though, because we don't want
@@ -123,10 +124,10 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
the size of a fixed-size chunk buffer. When no buffering argument is
given, the default buffering policy works as follows:

- * Binary files are buffered in fixed-size chunks; the size of the buffer
- is chosen using a heuristic trying to determine the underlying device's
- "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
- On many systems, the buffer will typically be 4096 or 8192 bytes long.
+ * Binary files are buffered in fixed-size chunks; the size of the buffer
+ is max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)
+ when the device block size is available.
+ On most systems, the buffer will typically be 128 kilobytes long.

* "Interactive" text files (files for which isatty() returns True)
use line buffering. Other text files use the policy described above
@@ -242,7 +243,7 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
buffering = -1
line_buffering = True
if buffering < 0:
- buffering = raw._blksize
+ buffering = max(min(raw._blksize, 8192 * 1024), DEFAULT_BUFFER_SIZE)
if buffering < 0:
raise ValueError("invalid buffering size")
if buffering == 0:
Expand Down
10 changes: 10 additions & 0 deletions Lib/test/test_file.py
@@ -216,6 +216,16 @@ def testSetBufferSize(self):
with self.assertWarnsRegex(RuntimeWarning, 'line buffering'):
self._checkBufferSize(1)

+    def testDefaultBufferSize(self):
+        with self.open(TESTFN, 'wb') as f:
+            blksize = f.raw._blksize
+            f.write(b"\0" * 5_000_000)
+
+        with self.open(TESTFN, 'rb') as f:
+            data = f.read1()
+            expected_size = max(min(blksize, 8192 * 1024), io.DEFAULT_BUFFER_SIZE)
+            self.assertEqual(len(data), expected_size)
+
def testTruncateOnWindows(self):
# SF bug <https://bugs.python.org/issue801631>
# "file.truncate fault on windows"
Expand Down
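
The test above relies on a single `read1()` call returning at most one buffer's worth of data. A minimal standalone sketch of the same observation follows, assuming a regular file in a temporary directory. The helper `first_read1_length` and the constant `EXPLICIT_BUFFER_SIZE` are hypothetical names, and the length observed with the default policy depends on the Python version and platform, so no specific value is claimed for it.

```python
import os
import tempfile

# Illustrative choice that happens to match the new default in this change.
EXPLICIT_BUFFER_SIZE = 128 * 1024
FILE_SIZE = 5_000_000  # same payload size as the test in the diff


def first_read1_length(path: str, buffering: int = -1) -> int:
    """Return how many bytes a single read1() call yields for a binary file."""
    with open(path, "rb", buffering=buffering) as f:
        return len(f.read1())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.bin")
    with open(path, "wb") as f:
        f.write(b"\0" * FILE_SIZE)

    # Default policy: the result varies between interpreter versions.
    default_len = first_read1_length(path)
    # Explicit buffering: at most EXPLICIT_BUFFER_SIZE bytes come back.
    explicit_len = first_read1_length(path, EXPLICIT_BUFFER_SIZE)
```

Passing `buffering` explicitly is one way for code that depends on a particular chunk size to stay independent of the default.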
@@ -0,0 +1,5 @@
Increase ``io.DEFAULT_BUFFER_SIZE`` from 8k to 128k and adjust :func:`open` on
platforms where :meth:`os.fstat` provides a ``st_blksize`` field (such as Linux)
to use ``max(min(blocksize, 8 MiB), io.DEFAULT_BUFFER_SIZE)`` rather
than always using the device block size. This should improve I/O performance.
Patch by Romain Morotti.
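
To see what the policy would yield for a concrete file, the sketch below reads `st_blksize` through `os.fstat` and applies the formula from the entry above. The helper `expected_buffer_size` is a hypothetical name, and the fallback to the default when `st_blksize` is missing or not meaningful is an assumption made for this sketch, not something the diff shows.

```python
import os
import tempfile

DEFAULT_BUFFER_SIZE = 128 * 1024   # new value from this change
MAX_BLOCK_SIZE = 8 * 1024 * 1024   # 8 MiB cap from this change


def expected_buffer_size(fd: int) -> int:
    """Hypothetical helper: estimate the default buffer size for an open fd."""
    # st_blksize is not provided on every platform, hence getattr().
    blksize = getattr(os.fstat(fd), "st_blksize", 0)
    if blksize <= 1:
        # Assumption for this sketch: fall back to the default buffer size.
        blksize = DEFAULT_BUFFER_SIZE
    return max(min(blksize, MAX_BLOCK_SIZE), DEFAULT_BUFFER_SIZE)


with tempfile.TemporaryFile() as tmp:
    estimate = expected_buffer_size(tmp.fileno())
```

Whatever block size the platform reports, the estimate stays between 128 KiB and 8 MiB.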
12 changes: 6 additions & 6 deletions Modules/_io/_iomodule.c
@@ -60,8 +60,7 @@ PyDoc_STRVAR(module_doc,
"DEFAULT_BUFFER_SIZE\n"
"\n"
" An int containing the default buffer size used by the module's buffered\n"
- " I/O classes. open() uses the file's blksize (as obtained by os.stat) if\n"
- " possible.\n"
+ " I/O classes.\n"
);


@@ -132,9 +131,9 @@ the size of a fixed-size chunk buffer. When no buffering argument is
given, the default buffering policy works as follows:

* Binary files are buffered in fixed-size chunks; the size of the buffer
- is chosen using a heuristic trying to determine the underlying device's
- "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
- On many systems, the buffer will typically be 4096 or 8192 bytes long.
+ is max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)
+ when the device block size is available.
+ On most systems, the buffer will typically be 128 kilobytes long.

* "Interactive" text files (files for which isatty() returns True)
use line buffering. Other text files use the policy described above
@@ -200,7 +199,7 @@ static PyObject *
_io_open_impl(PyObject *module, PyObject *file, const char *mode,
int buffering, const char *encoding, const char *errors,
const char *newline, int closefd, PyObject *opener)
- /*[clinic end generated code: output=aefafc4ce2b46dc0 input=cd034e7cdfbf4e78]*/
+ /*[clinic end generated code: output=aefafc4ce2b46dc0 input=28027fdaabb8d744]*/
{
size_t i;

@@ -371,6 +370,7 @@ _io_open_impl(PyObject *module, PyObject *file, const char *mode,
Py_DECREF(blksize_obj);
if (buffering == -1 && PyErr_Occurred())
goto error;
+ buffering = Py_MAX(Py_MIN(buffering, 8192 * 1024), DEFAULT_BUFFER_SIZE);
}
if (buffering < 0) {
PyErr_SetString(PyExc_ValueError,
Expand Down
2 changes: 1 addition & 1 deletion Modules/_io/_iomodule.h
@@ -78,7 +78,7 @@ extern Py_ssize_t _PyIO_find_line_ending(
*/
extern int _PyIO_trap_eintr(void);

- #define DEFAULT_BUFFER_SIZE (8 * 1024) /* bytes */
+ #define DEFAULT_BUFFER_SIZE (128 * 1024) /* bytes */

/*
* Offset type for positioning.
Expand Down
8 changes: 4 additions & 4 deletions Modules/_io/clinic/_iomodule.c.h

Some generated files are not rendered by default.