-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
Fix for making the initial binning in HGBT parallel #29386
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix for making the initial binning in HGBT parallel #29386
Conversation
Current timings
Timings from the previous PR
|
It should I think, I pushed a commit with FYI we were discussing this with other maintainers and here is a quick summary:
|
So as expected Pyodide CI build is green. I guess now comes the more complicated decision about what do we do about Personally I would slightly lean towards sticking with
Let's ping in no particular order @ogrisel and @jeremiedbb (who were part of the discussion I tried to make a summary of in #29386 (comment)) as well as @betatim @lorentzenchr and @jjerphan (who were involved in the original PR #28064). The alternative to using |
So according to #29386 (comment), using joblib doesn't comes with an overhead, it's the opposite even. So I'm even more in favor of sticking with joblib and not introduce a new unnecessary pattern. Maybe we could add a comment to use the return generator feature from joblib when 1.4 is our min version, but I'm not sure it would have a visible impact given that this task is already a small part of HGBT in terms of time spent. |
OK let's merge this, sticking to We can potentially revisit using |
Thanks @OmarManzoor for the fix! |
While I recognize that this PR fixes a bug, this new pattern was a 1-1 copy-paste from python standard library, see https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example, and as a developer I find it quite strange to disfavor that. The impact on timing is negligible anyway. |
I don't strongly disagree, I think this should be discussed in a separate dedicated issue to try to have the conversation in a single place rather than spread in multiple PRs right now ... |
@lorentzenchr, I was not saying that it's not a valid pattern or that it's new in the general python ecosystem. It's new in scikit-learn, because so far joblib was used for both multi-processing and multi-threading situations. I don't know the historical reason, maybe to have a common syntax. So what I'm saying is that I'd rather stick with one pattern unless it brings a net benefit. |
…#29386) Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Reference Issues/PRs
Follow up of #28064
What does this implement/fix? Explain your changes.
Any other comments?
CC: @lesteve will this work?