
gh-74028: Introduce a prefetch parameter to Executor.map to handle large iterators #114975

Closed
Jason-Y-Z wants to merge 2 commits from the fix-issue-29842 branch

Conversation

Jason-Y-Z
Contributor

@Jason-Y-Z Jason-Y-Z commented Feb 3, 2024

Introduce a prefetch parameter to Executor.map, so that large and even unbounded iterators can be handled.
This is a continuation of #18566, with backward compatibility: when the new prefetch parameter is not specified, we default to the current behaviour.
cc @graingert @rdarder @kumaraditya303 @brianquinlan


📚 Documentation preview 📚: https://cpython-previews--114975.org.readthedocs.build/
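
With this branch applied, usage would look roughly like the following sketch (the prefetch value of 4 and the simulate function are purely illustrative, not part of the patch):

```python
import itertools
from concurrent.futures import ProcessPoolExecutor

def simulate(seed):
    return seed * seed  # stand-in for an expensive computation

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # Only a bounded window of tasks is submitted ahead of consumption,
        # so the unbounded input below cannot exhaust memory.
        for result in executor.map(simulate, itertools.count(), prefetch=4):
            if result >= 1_000_000:
                break
```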

@Jason-Y-Z Jason-Y-Z force-pushed the fix-issue-29842 branch 7 times, most recently from 46ea84e to 67b7b0f on February 4, 2024 at 11:24
@Jason-Y-Z Jason-Y-Z changed the title bpo-29842: Introduce a prefetch parameter to Executor.map to handle l… gh-114948: Introduce a prefetch parameter to Executor.map to handle l… Feb 6, 2024
@gaogaotiantian
Member

Code itself aside, I don't think this PR solves the issue at hand.

Yes, with the prefetch argument, the executor will only schedule a certain number of tasks at the beginning. However, if you then enumerate the result iterator, it has exactly the same effect: the manager thread gets blocked by a large number of submits.

Also, this design means the worker processes do no work until the user requests results, which is very counter-intuitive. If the user maps a large amount of data, they would expect the executor to process it in the background, not sit idle until results are asked for.

So, from my personal perspective, this design is not what we want for the executor. If you want to fix this, you will need more than this.

Member

@gaogaotiantian gaogaotiantian left a comment

I read the docs again and they clearly state:

the iterables are collected immediately rather than lazily;

So this could break backward compatibility. Also, the original issue is solvable with chunksize. Unfortunately, I don't think this PR is worth more effort.
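
For reference, the chunksize route on ProcessPoolExecutor looks like this (the chunk size of 1000 is arbitrary):

```python
from concurrent.futures import ProcessPoolExecutor

def square(x):
    return x * x

if __name__ == "__main__":
    with ProcessPoolExecutor() as executor:
        # Each submission now carries 1000 items instead of one, which cuts
        # the per-item scheduling and IPC overhead by roughly that factor.
        results = list(executor.map(square, range(1_000_000), chunksize=1000))
```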

@vinismarques

Also the original issue is solvable by chunksize

Unfortunately chunksize is not available for ThreadPoolExecutor, which is best suited for I/O operations. Do you know of any better alternatives that can handle large iterators?

@gaogaotiantian
Member

Unfortunately chunksize is not available for ThreadPoolExecutor, which is best suited for I/O operations. Do you know of any better alternatives that can handle large iterators?

I believe the reason ThreadPoolExecutor does not have that argument is that thread pools do not suffer from the problem. There is no expensive wakeup pipe or queue involved in submit, so you can handle large iterators just fine.
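
That said, if you do want to bound how much of a huge or unbounded iterator is submitted to a ThreadPoolExecutor at once, one workaround today is to feed the pool manually. A minimal sketch follows (the bounded_map name and the limits are illustrative, and results are yielded out of order):

```python
import itertools
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def bounded_map(func, iterable, max_workers=8, in_flight=32):
    """Yield results (unordered) while keeping at most in_flight tasks pending."""
    args = iter(iterable)
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        pending = {executor.submit(func, a)
                   for a in itertools.islice(args, in_flight)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            # Top the pool back up: one new submission per completed task.
            for a in itertools.islice(args, len(done)):
                pending.add(executor.submit(func, a))
            for fut in done:
                yield fut.result()

# Example: consume an unbounded input without submitting everything up front.
for value in bounded_map(lambda n: n * n, itertools.count(), in_flight=16):
    if value > 10_000:
        break
```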

@Jason-Y-Z Jason-Y-Z changed the title gh-114948: Introduce a prefetch parameter to Executor.map to handle l… bpo-29842: Introduce a prefetch parameter to Executor.map to handle l… Feb 10, 2024
@Jason-Y-Z
Contributor Author

Thanks for the discussion! @graingert @rdarder @kumaraditya303 @brianquinlan, would love your thoughts as well.

@Jason-Y-Z
Contributor Author

Jason-Y-Z commented Feb 10, 2024

So this could break backward compatibility.

I'm not sure I'm fully following here, but my intention was that when prefetch is left at its default of None, we submit all tasks at the beginning. I'm fairly sure that's how I implemented it, but I may well have missed something, so I'm very open to feedback here. A concrete example would be really useful. @gaogaotiantian
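
For context, the general shape of this dispatch logic is roughly the following simplified sketch (not the actual diff in this PR; names and defaults are illustrative): prefetch=None keeps today's eager submission, while an integer bounds how many futures are outstanding.

```python
import collections
import itertools

def lazy_map(executor, fn, iterable, prefetch=None):
    args = iter(iterable)
    if prefetch is None:
        # Default: submit everything up front, exactly like the current map.
        fs = collections.deque(executor.submit(fn, a) for a in args)
    else:
        # Submit only the first `prefetch` items to begin with.
        fs = collections.deque(executor.submit(fn, a)
                               for a in itertools.islice(args, prefetch))

    def result_iterator():
        try:
            while fs:
                result = fs.popleft().result()
                if prefetch is not None:
                    # Replace the consumed task with the next input, if any.
                    for a in itertools.islice(args, 1):
                        fs.append(executor.submit(fn, a))
                yield result
        finally:
            for f in fs:
                f.cancel()

    return result_iterator()
```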

Also the original issue is solvable by chunksize.

Do we mean the issue in bpo-29842 as well?

Unfortunately, I don't think this PR is worth more effort.

No worries at all, less work for me if that's the case :)

@gaogaotiantian
Member

I'm not sure I'm fully following here, but my intention was that when prefetch is left at its default of None, we submit all tasks at the beginning. I'm fairly sure that's how I implemented it, but I may well have missed something, so I'm very open to feedback here. A concrete example would be really useful. @gaogaotiantian

My wording in that comment was inaccurate. What I really meant is: with the prefetch argument set, the iterables will not be resolved immediately, contrary to what the current docs state. That is definitely not the end of the world, especially since prefetch itself conveys the idea of "lazily". However, I would still be concerned, because it makes the behavior more complicated than what is currently documented.

Do we mean the issue in bpo-29842 as well?

No, that one will not be solved by chunksize.

I think the fundamental factor we need to weigh before proceeding with this is the gain versus the cost. What do we get from the change?

  • Making Executor.map behave like the built-in map sounds great and consistent, but we can't have that. What we can have is an optional "less eager" mode; that's not the consistency we hoped for.
  • When will prefetch be used? Is there any benefit to using it when the input is not infinite (compared to chunksize)? The current implementation only submits a task when a result is requested, which causes a delay in communication, so doing it lazily might make the whole iterable take much longer to finish (see the sketch after this comment).
  • If this only helps when the iterable is infinite, how common is that? What is the workaround, and is it acceptable?

I'm not saying no; this is just how I think about the issue. It might be why the original issue has been sitting there for a couple of years - it may not be a serious issue for most users.
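
To make the trade-off in the second bullet concrete, here is a small runnable sketch of the current eager behaviour (the sleep durations are arbitrary). Today's Executor.map submits everything up front, so the workers keep running even while the consumer is busy; a prefetch-style lazy map would delay the later submissions until results are consumed, so the same slow consumer stretches the total run time.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def work(i):
    time.sleep(0.1)
    return i

with ThreadPoolExecutor(max_workers=4) as executor:
    start = time.monotonic()
    results = executor.map(work, range(8))  # all 8 tasks submitted eagerly
    time.sleep(1)      # the consumer is busy, but the workers keep running
    list(results)      # the results are already available, no extra wait
    print(f"total: {time.monotonic() - start:.1f}s")  # ~1.0s with eager map
```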

@Jason-Y-Z
Contributor Author

Thanks for the comments! @gaogaotiantian

However, I would still be concerned, because it makes the behavior more complicated than what is currently documented.

Happy to update the documentation if helpful.

When will prefetch be used? Is there any benefit to using it when the input is not infinite (compared to chunksize)? The current implementation only submits a task when a result is requested, which causes a delay in communication, so doing it lazily might make the whole iterable take much longer to finish.

I guess the main rationale is to give users the flexibility to choose how many items from the input iterator they would like to have in flight at a time. This is certainly useful in the infinite-iterator case, and I imagine it is also useful whenever I don't want too many tasks processed at once.
For example, I might be sending requests to downstream services and I don't want those services to be overloaded, so prefetch would come in handy as a simple rate-limiting mechanism. (Might be a poor example, but I hope you get my point.)

@gaogaotiantian
Member

Sorry, but I still do not get the rationale for it. I understand your statements, but they did not convince me that this feature is worth the effort. The main concern is the usage: beyond the prefetched tasks, the current implementation will not submit a task until a result is requested, and that just does not feel right to me.

You'll need some core dev behind this anyway (I'm not one). If you can find someone who likes the idea, you might be able to make some progress on this PR.

@Jason-Y-Z Jason-Y-Z changed the title bpo-29842: Introduce a prefetch parameter to Executor.map to handle l… gh-74028: Introduce a prefetch parameter to Executor.map to handle l… Feb 18, 2024
@Jason-Y-Z
Contributor Author

Jason-Y-Z commented Feb 19, 2024

Hey @gpshead, sorry for tagging, but since this is concurrency-related, I thought you might be interested.

KangOl added a commit to odoo-dev/upgrade-util that referenced this pull request Jun 6, 2024
TLDR: RTFM

Once upon a time, in a countryside farm in Belgium...

At first, the upgrade of databases was straightforward. But, as time
passed, the size of the databases grew, and some CPU-intensive
computations took so much time that a solution needed to be found.
Fortunately, the Python standard library has the perfect module for this
task: `concurrent.futures`.
Then, Python 3.10 appeared, and the usage of `ProcessPoolExecutor`
started to sometimes hang for no apparent reason. Soon, our hero found
out he wasn't the only one to suffer from this issue[^1].
Unfortunately, the proposed solution looked like overkill. Still, it
revealed that the issue had already been known[^2] for a few years.
Although an official patch wasn't ready to be committed, the
discussion about its legitimacy[^3] led our hero to a nicer solution.

By default, `ProcessPoolExecutor.map` submits elements one by one to the
pool. This is pretty inefficient when there are a lot of elements to
process. This can be changed by using a large value for the *chunksize*
argument.

Who would have thought that a bigger chunk size would solve a
performance issue?
As always, the response was in the documentation[^4].

[^1]: https://stackoverflow.com/questions/74633896/processpoolexecutor-using-map-hang-on-large-load
[^2]: python/cpython#74028
[^3]: python/cpython#114975 (review)
[^4]: https://docs.python.org/3/library/concurrent.futures.html#concurrent.futures.Executor.map
robodoo pushed a commit to odoo/upgrade-util that referenced this pull request Jun 6, 2024

closes #94

Signed-off-by: Nicolas Seinlet (nse) <nse@odoo.com>
@hugovk hugovk changed the title gh-74028: Introduce a prefetch parameter to Executor.map to handle l… gh-74028: Introduce a prefetch parameter to Executor.map to handle large iterators Jun 14, 2024
@Jason-Y-Z Jason-Y-Z closed this Aug 3, 2024
@Jason-Y-Z Jason-Y-Z deleted the fix-issue-29842 branch August 3, 2024 19:40
@ebonnal
Contributor

ebonnal commented Oct 17, 2024

Hi, FYI here is a follow-up PR: #125663 🙏🏻
