gh-117151: IO performance improvement, increase io.DEFAULT_BUFFER_SIZE to 128k #118144

Merged
merged 19 commits into from
Mar 7, 2025

Conversation

morotti
Contributor

@morotti morotti commented Apr 22, 2024

See discussion gh-117151

This patch increases the default buffer size, which gives a 3x to 5x I/O performance improvement on modern hardware.

  • increase io.DEFAULT_BUFFER_SIZE to 128k
  • fix open() to use max(st_blksize, io.DEFAULT_BUFFER_SIZE)
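
The second bullet can be sketched in pure Python as follows. This is a minimal illustration rather than the code from the patch: `choose_buffer_size` is a hypothetical helper name, and the fallback to 0 on platforms that do not report `st_blksize` is an assumption.

```python
import io
import os
import tempfile


def choose_buffer_size(fd):
    """Hypothetical helper: pick a buffer size for an open file descriptor."""
    # st_blksize is not reported on every platform (for example Windows),
    # so fall back to 0 and let io.DEFAULT_BUFFER_SIZE win (assumption).
    blksize = getattr(os.fstat(fd), "st_blksize", 0)
    # Never go below the default, even if the filesystem reports a small block.
    return max(blksize, io.DEFAULT_BUFFER_SIZE)


with tempfile.TemporaryFile() as f:
    size = choose_buffer_size(f.fileno())
    print(size >= io.DEFAULT_BUFFER_SIZE)
```

The point of taking the maximum is that a filesystem reporting a small block size can no longer shrink the buffer below the default.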

@ghost

ghost commented Apr 22, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app

bedevere-app bot commented Apr 22, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@encukou encukou added the performance Performance or resource usage label Apr 22, 2024
@morotti
Contributor Author

morotti commented Apr 25, 2024

@serhiy-storchaka hello, you were kind enough to review my first PR for buffering performance (118037); would you be able to review this PR too?

@eendebakpt
Contributor

eendebakpt commented Apr 30, 2024

@morotti You still need to write a news entry for the PR. For your other PR this was not required, but due to the performance impact of this PR I think we should add one. The most convenient way to do this is to click the Details button next to bedevere/news in the CI checks, which will open a tool to add the news entry. For more information, see also https://devguide.python.org/core-developers/committing/#updating-news-and-what-s-new-in-python

The CLA check is not yet passing, but it worked for the other PR (which was created after this one), so maybe it will resolve itself automatically in a couple of days.

@eendebakpt
Contributor

Pinging @benjaminp as expert on io. Could you review this PR? Are there other benchmarks (besides the one in the corresponding issue) we can perform to test this PR?
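
One simple additional benchmark could look like the sketch below. It is not the benchmark from the linked issue: `time_read` and `make_test_file` are hypothetical helper names, the 16 MiB file size and 4 KiB read size are arbitrary assumptions, and results on a file that is already in the page cache will mostly reflect per-call overhead rather than disk speed.

```python
import os
import tempfile
import time


def time_read(path, buffer_size, chunk_size=4096):
    """Read a whole file in small chunks; return (bytes_read, seconds)."""
    total = 0
    start = time.perf_counter()
    # buffering= sets the size of the BufferedReader's internal buffer.
    with open(path, "rb", buffering=buffer_size) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total, time.perf_counter() - start


def make_test_file(size):
    """Create a temporary file of `size` zero bytes and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\0" * size)
    return path


if __name__ == "__main__":
    path = make_test_file(16 * 1024 * 1024)
    try:
        # 8 KiB was the previous default; 128 KiB is the value proposed here.
        for buffer_size in (8 * 1024, 128 * 1024):
            total, elapsed = time_read(path, buffer_size)
            print(f"buffering={buffer_size:>7}: {total} bytes in {elapsed:.4f}s")
    finally:
        os.remove(path)
```

Running it several times and on different filesystems would give a more reliable picture than a single run.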

@morotti
Contributor Author

morotti commented Apr 30, 2024

Thank you for reviewing,

The CLA check is green now and I added a news entry.

@morotti
Contributor Author

morotti commented May 17, 2024

@eendebakpt @benjaminp could you review?

@eendebakpt
Contributor

@eendebakpt @benjaminp could you review?

@morotti The PR looks good from my side, but I am not a core dev, so I cannot approve it. Many core devs are currently at PyCon US, so I think we should wait a bit longer. If there has been no further response in a few weeks, we can post a message on Discourse (see https://devguide.python.org/getting-started/pull-request-lifecycle/#reviewing).

@itamaro itamaro added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 23, 2024
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @itamaro for commit 3668d1a 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label May 23, 2024
@itamaro itamaro added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 1, 2024
@bedevere-bot

🤖 New build scheduled with the buildbot fleet by @itamaro for commit 28e7bb7 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jun 1, 2024
@erlend-aasland
Contributor

Requesting Serhiy's review, since he reviewed the other linked (and merged) PR #118037.

Member

@serhiy-storchaka serhiy-storchaka left a comment

I was convinced that such a change could be beneficial.

But in case it has unexpected effects, please ask on https://discuss.python.org/. Also update the C implementation of open(), which is the one used by default.

@serhiy-storchaka
Member

Please do not use rebase and force-push. It makes reviewing more difficult.

@morotti
Contributor Author

morotti commented Feb 8, 2025

I think you were looking into the winconsoleio optimization?
you have a bug ticket here #121940
and commit here cmaloney@7adcb0d

I think we should keep it as a separate issue for a separate PR, or we'll never complete anything. ^^

I do have an old branch with comments from when I was investigating too: https://github.com/man-group/cpython/blob/io-buffer-size-archive/Modules/_io/winconsoleio.c#L51 (The comments and the branch are obsolete, as other people have since redone some of the buffering; the 10 year old code was counting characters wrong or something.)

TL;DR: Old versions of Windows were buggy with large writes to the console, theoretically above 65k, practically with as little as 26k, so there is code in winconsole to split large writes into smaller writes. This is a Windows XP era bug. It is no longer an issue in Windows 8 (the oldest Windows supported by the interpreter). I think we're in agreement that the old code can be removed. I think you have branches and a PR to do that. :)
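
The workaround described above, splitting one large write into several smaller ones, can be sketched in Python as follows. This is only an illustration of the idea and not the C code in winconsoleio.c: `write_in_chunks` is a hypothetical helper, and the 32 KiB cap in the usage example is an arbitrary assumption.

```python
def write_in_chunks(write, data, max_chunk):
    """Call write() repeatedly so that no single call exceeds max_chunk bytes.

    `write` is any callable accepting a bytes object (hypothetical interface).
    Returns the total number of bytes handed to `write`.
    """
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    view = memoryview(data)
    for offset in range(0, len(view), max_chunk):
        # Slicing a memoryview avoids copying until bytes() is called.
        write(bytes(view[offset:offset + max_chunk]))
    return len(view)


chunks = []
write_in_chunks(chunks.append, b"x" * 70000, 32 * 1024)
print([len(c) for c in chunks])  # prints [32768, 32768, 4464]
```

Removing such a workaround means a single large write can be passed through unchanged, which matters more once the default buffer grows to 128k.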

@morotti
Contributor Author

morotti commented Feb 10, 2025

alright, I've removed the constants _MAXIMUM_BUFFER_SIZE. build green.

I think we're good to merge

@erlend-aasland
Contributor

(rebasing on master again because CI builds no longer work)

Please use git merge --no-ff main instead of rebasing. This has already been pointed out in earlier review.

@morotti
Contributor Author

morotti commented Feb 11, 2025

I've updated the branch to main, anything else you want?

@morotti
Contributor Author

morotti commented Feb 17, 2025

@cmaloney @gpshead @erlend-aasland the PR is approved and passing builds. can this be merged?

@bedevere-app

bedevere-app bot commented Feb 17, 2025

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@morotti morotti requested a review from gpshead February 18, 2025 18:08
@morotti
Contributor Author

morotti commented Feb 20, 2025

@gpshead I've made the changes, would you be able to review again?

@morotti
Contributor Author

morotti commented Mar 6, 2025

hello @gpshead @cmaloney @serhiy-storchaka @eendebakpt would you be able to review and unblock the PR?

Member

@gpshead gpshead left a comment

the change looks good, but this PR was created with the checkbox allowing committers to make direct edits to the PR branch disabled so we can't take care of trivia that may be blocking it ourselves.

can you merge master so that it reruns modern CI?

@morotti
Contributor Author

morotti commented Mar 7, 2025

@gpshead merged with master.

sorry, I haven't found any checkbox to allow direct edits or I would have ticked it. I suspect it's because the fork is in my work organization, maybe that comes with extra restrictions.

@gpshead gpshead merged commit b1b4f96 into python:main Mar 7, 2025
42 checks passed
seehwan pushed a commit to seehwan/cpython that referenced this pull request Apr 16, 2025
…ER_SIZE to 128k (pythonGH-118144)

Co-authored-by: rmorotti <romain.morotti@man.com>
Labels
performance Performance or resource usage
10 participants