Expose max_queue_size in ThreadPoolExecutor #73781
Hi! I think ThreadPoolExecutor should allow setting the maximum size of its underlying work queue. The situation I ran into recently was that I used ThreadPoolExecutor to parallelize AWS API calls; I had to move data from one S3 bucket to another (~150M objects). Contrary to what I expected, the underlying queue is unbounded by default. Thus my process ended up consuming gigabytes of memory, because it put items into the queue faster than the threads could work them off: the queue just kept growing. (It ran on K8s and the pod was rightfully killed eventually.) Of course there are ways to work around this. One could use more threads, to some extent. Or you could use your own queue with a defined maximum size. But I think that is more work for users of Python than necessary.
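For anyone hitting the same problem today, here is one way to bound the backlog without touching the executor's internals. This is a minimal sketch, not an existing API: the BoundedExecutor class and its max_pending parameter are made up for illustration. A semaphore caps how many submitted tasks may be pending at once, so submit() blocks instead of letting the queue grow without limit.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedExecutor:
    """Wrap ThreadPoolExecutor so submit() blocks when too much work is pending."""

    def __init__(self, max_workers, max_pending):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._semaphore = threading.Semaphore(max_pending)

    def submit(self, fn, *args, **kwargs):
        self._semaphore.acquire()  # blocks the caller once max_pending tasks are in flight
        try:
            future = self._executor.submit(fn, *args, **kwargs)
        except Exception:
            self._semaphore.release()
            raise
        # Release one slot when the task finishes (or fails, or is cancelled).
        future.add_done_callback(lambda _: self._semaphore.release())
        return future

    def shutdown(self, wait=True):
        self._executor.shutdown(wait=wait)
```

With this, the submitting loop naturally throttles itself to roughly max_pending outstanding tasks, which keeps memory usage flat no matter how fast the producer is.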
Hello again, there's a PR open for this issue that has already been reviewed, but it hasn't received authoritative feedback yet (i.e. whether or not you intend to support this feature at all). I would be very happy if a core dev could look over this change before everyone forgets about it :)
Ping. It's really a two-line change and can easily be reviewed in 15 minutes :)
Prayslayer, please don't shove. Your PR was responded to by Mariatta, so it wasn't ignored. Making decisions about API expansions takes a while (making sure the change fits the intended use, that it isn't a bug factory itself, that it is broadly useful, that it is the best solution to the problem, that it doesn't complicate the implementation or limit future opportunities, and that there are no unforeseen problems). Among the core developers, there are only a couple of part-time contributors who are qualified to make these assessments for this module (those devs don't include me).
In my project we're going into the underlying _work_queue and blocking further puts based on unfinished_tasks to accomplish this; bubbling this up to the API would be a welcome addition.
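A sketch of that idea, with a big caveat: _work_queue is a private attribute of ThreadPoolExecutor, so this may break between CPython versions (in recent versions it is a queue.SimpleQueue). The max_queue_size name below merely mirrors this issue's title; it is not a real parameter of the stdlib class.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


class BoundedThreadPoolExecutor(ThreadPoolExecutor):
    """Sketch: replace the unbounded internal work queue with a bounded one."""

    def __init__(self, max_workers=None, max_queue_size=0, **kwargs):
        super().__init__(max_workers=max_workers, **kwargs)
        # Worker threads are only started lazily on submit(), so swapping
        # the queue here, before anything has been submitted, works in
        # practice -- but it still relies on an implementation detail.
        self._work_queue = queue.Queue(maxsize=max_queue_size)
```

Note that workers dequeue an item as soon as they are free, so total in-flight work can exceed max_queue_size by up to max_workers; that may be why the comment above tracks unfinished_tasks instead of the queue size alone.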
Please note the PR here has some review comments that need addressing. I'm cc'ing Thomas Moreau, who has done a lot of work recently on the concurrent.futures internals. |