Commit b1b4f96

morotti and rmmancom authored
gh-117151: IO performance improvement, increase io.DEFAULT_BUFFER_SIZE to 128k (GH-118144)
Co-authored-by: rmorotti <romain.morotti@man.com>
1 parent 4bf25a0 commit b1b4f96

7 files changed, +38 -22 lines changed

Doc/library/functions.rst

Lines changed: 4 additions & 4 deletions
@@ -1405,10 +1405,10 @@ are always available. They are listed here in alphabetical order.
 :func:`io.TextIOWrapper.reconfigure`. When no *buffering* argument is
 given, the default buffering policy works as follows:

-* Binary files are buffered in fixed-size chunks; the size of the buffer is
-  chosen using a heuristic trying to determine the underlying device's "block
-  size" and falling back on :const:`io.DEFAULT_BUFFER_SIZE`. On many systems,
-  the buffer will typically be 4096 or 8192 bytes long.
+* Binary files are buffered in fixed-size chunks; the size of the buffer
+  is ``max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)``
+  when the device block size is available.
+  On most systems, the buffer will typically be 128 kilobytes long.

 * "Interactive" text files (files for which :meth:`~io.IOBase.isatty`
   returns ``True``) use line buffering. Other text files use the policy

Lib/_pyio.py

Lines changed: 8 additions & 7 deletions
@@ -23,8 +23,9 @@
     valid_seek_flags.add(os.SEEK_HOLE)
     valid_seek_flags.add(os.SEEK_DATA)

-# open() uses st_blksize whenever we can
-DEFAULT_BUFFER_SIZE = 8 * 1024 # bytes
+# open() uses max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)
+# when the device block size is available.
+DEFAULT_BUFFER_SIZE = 128 * 1024 # bytes

 # NOTE: Base classes defined here are registered with the "official" ABCs
 # defined in io.py. We don't use real inheritance though, because we don't want
@@ -123,10 +124,10 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
     the size of a fixed-size chunk buffer. When no buffering argument is
     given, the default buffering policy works as follows:

-    * Binary files are buffered in fixed-size chunks; the size of the buffer
-      is chosen using a heuristic trying to determine the underlying device's
-      "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
-      On many systems, the buffer will typically be 4096 or 8192 bytes long.
+    * Binary files are buffered in fixed-size chunks; the size of the buffer
+      is max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)
+      when the device block size is available.
+      On most systems, the buffer will typically be 128 kilobytes long.

     * "Interactive" text files (files for which isatty() returns True)
       use line buffering. Other text files use the policy described above
@@ -242,7 +243,7 @@ def open(file, mode="r", buffering=-1, encoding=None, errors=None,
             buffering = -1
             line_buffering = True
         if buffering < 0:
-            buffering = raw._blksize
+            buffering = max(min(raw._blksize, 8192 * 1024), DEFAULT_BUFFER_SIZE)
         if buffering < 0:
             raise ValueError("invalid buffering size")
         if buffering == 0:

Lib/test/test_file.py

Lines changed: 10 additions & 0 deletions
@@ -216,6 +216,16 @@ def testSetBufferSize(self):
         with self.assertWarnsRegex(RuntimeWarning, 'line buffering'):
             self._checkBufferSize(1)

+    def testDefaultBufferSize(self):
+        with self.open(TESTFN, 'wb') as f:
+            blksize = f.raw._blksize
+            f.write(b"\0" * 5_000_000)
+
+        with self.open(TESTFN, 'rb') as f:
+            data = f.read1()
+            expected_size = max(min(blksize, 8192 * 1024), io.DEFAULT_BUFFER_SIZE)
+            self.assertEqual(len(data), expected_size)
+
     def testTruncateOnWindows(self):
         # SF bug <https://bugs.python.org/issue801631>
         # "file.truncate fault on windows"
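The new test relies on `read1()` returning one buffer's worth of data from a freshly opened binary file. A minimal standalone sketch of the same observation follows. The helper name `observe_default_read_size` is hypothetical and not part of CPython. The number printed depends on the interpreter version and platform, so no particular value is assumed here.

```python
import io
import os
import tempfile


def observe_default_read_size(payload_size=5_000_000):
    """Write payload_size zero bytes, then report what a single read1() returns.

    Hypothetical helper for illustration only; it is not part of CPython.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\0" * payload_size)
        with open(path, "rb") as f:
            # st_blksize is not provided on every platform, so fall back to None.
            blksize = getattr(os.fstat(f.fileno()), "st_blksize", None)
            # With default buffering, read1() returns at most one buffer of data.
            data = f.read1()
        return len(data), blksize
    finally:
        os.remove(path)


read_len, blksize = observe_default_read_size()
print("io.DEFAULT_BUFFER_SIZE:", io.DEFAULT_BUFFER_SIZE)
print("st_blksize:", blksize)
print("bytes returned by read1():", read_len)
```

On an interpreter that includes this commit, the last number should match the `max(min(...))` expression checked by the test. On older interpreters it will more likely track the device block size.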
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+Increase ``io.DEFAULT_BUFFER_SIZE`` from 8k to 128k and adjust :func:`open` on
+platforms where :meth:`os.fstat` provides a ``st_blksize`` field (such as Linux)
+to use ``max(min(blocksize, 8 MiB), io.DEFAULT_BUFFER_SIZE)`` rather
+than always using the device block size. This should improve I/O performance.
+Patch by Romain Morotti.
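As an illustration of the expression in the entry above, here is a small sketch that evaluates it for a few block sizes. The function `default_buffering` and the two constants are hypothetical names chosen for this example. Only the numbers (128 KiB default, 8 MiB cap) come from the patch.

```python
# Hypothetical names for illustration; only the values are taken from the patch.
MAX_BUFFER_SIZE = 8192 * 1024          # 8 MiB cap, written as 8192 * 1024 in the diff
NEW_DEFAULT_BUFFER_SIZE = 128 * 1024   # DEFAULT_BUFFER_SIZE after this commit


def default_buffering(blksize, default=NEW_DEFAULT_BUFFER_SIZE):
    """Return max(min(blksize, 8 MiB), default), the expression used in the patch."""
    return max(min(blksize, MAX_BUFFER_SIZE), default)


for blksize in (4096, 128 * 1024, 1024 * 1024, 64 * 1024 * 1024):
    print(blksize, "->", default_buffering(blksize))
```

A 4096-byte block size is raised to 131072 bytes, a 1 MiB block size is kept as is, and a 64 MiB block size is capped at 8388608 bytes.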

Modules/_io/_iomodule.c

Lines changed: 6 additions & 6 deletions
@@ -60,8 +60,7 @@ PyDoc_STRVAR(module_doc,
 "DEFAULT_BUFFER_SIZE\n"
 "\n"
 " An int containing the default buffer size used by the module's buffered\n"
-" I/O classes. open() uses the file's blksize (as obtained by os.stat) if\n"
-" possible.\n"
+" I/O classes.\n"
 );


@@ -132,9 +131,9 @@ the size of a fixed-size chunk buffer. When no buffering argument is
 given, the default buffering policy works as follows:

 * Binary files are buffered in fixed-size chunks; the size of the buffer
-  is chosen using a heuristic trying to determine the underlying device's
-  "block size" and falling back on `io.DEFAULT_BUFFER_SIZE`.
-  On many systems, the buffer will typically be 4096 or 8192 bytes long.
+  is max(min(blocksize, 8 MiB), DEFAULT_BUFFER_SIZE)
+  when the device block size is available.
+  On most systems, the buffer will typically be 128 kilobytes long.

 * "Interactive" text files (files for which isatty() returns True)
   use line buffering. Other text files use the policy described above
@@ -200,7 +199,7 @@ static PyObject *
 _io_open_impl(PyObject *module, PyObject *file, const char *mode,
               int buffering, const char *encoding, const char *errors,
               const char *newline, int closefd, PyObject *opener)
-/*[clinic end generated code: output=aefafc4ce2b46dc0 input=cd034e7cdfbf4e78]*/
+/*[clinic end generated code: output=aefafc4ce2b46dc0 input=28027fdaabb8d744]*/
 {
     size_t i;

@@ -371,6 +370,7 @@ _io_open_impl(PyObject *module, PyObject *file, const char *mode,
         Py_DECREF(blksize_obj);
         if (buffering == -1 && PyErr_Occurred())
             goto error;
+        buffering = Py_MAX(Py_MIN(buffering, 8192 * 1024), DEFAULT_BUFFER_SIZE);
     }
     if (buffering < 0) {
         PyErr_SetString(PyExc_ValueError,

Modules/_io/_iomodule.h

Lines changed: 1 addition & 1 deletion
@@ -78,7 +78,7 @@ extern Py_ssize_t _PyIO_find_line_ending(
 */
 extern int _PyIO_trap_eintr(void);

-#define DEFAULT_BUFFER_SIZE (8 * 1024) /* bytes */
+#define DEFAULT_BUFFER_SIZE (128 * 1024) /* bytes */

 /*
  * Offset type for positioning.

Modules/_io/clinic/_iomodule.c.h

Lines changed: 4 additions & 4 deletions
Some generated files are not rendered by default.
