Skip to content

Fix for making the initial binning in HGBT parallel #29386

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 3, 2024

Conversation

OmarManzoor
Copy link
Contributor

Reference Issues/PRs

Follow up of #28064

What does this implement/fix? Explain your changes.

  • Attempts to fix an error within Pyodide that occurred due to the introduction of parallelisation in the initial binning in HGBT.
  • Uses Joblib with threading backend instead of ThreadPoolExecutor.

Any other comments?

CC: @lesteve will this work?

Copy link

github-actions bot commented Jul 2, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 84eedc3. Link to the linter CI: here

@OmarManzoor
Copy link
Contributor Author

Current timings

n_threads fit time fit_transformtime
1 316 ms ± 405 μs 417 ms ± 11.8 ms
4 97.9 ms ± 586 μs 125 ms ± 743 μs

Timings from the previous PR

n_threads fit time fit_transformtime
1 378 ms ± 2.57 ms (old 465 ms ± 3.08 ms) 602 ms ± 8.5 ms (old 682 ms ± 4.33 ms)
4 103 ms ± 1.06 ms (old 137 ms ± 3.04 ms) 188 ms ± 9.76 ms (old 227 ms ± 3.85 ms)

@lesteve
Copy link
Member

lesteve commented Jul 2, 2024

CC: @lesteve will this work?

It should I think, I pushed a commit with [pyodide] to trigger the Pyodide build and see what happens.

FYI we were discussing this with other maintainers and here is a quick summary:

  • here is Olivier's comment about why it may make sense to use concurrent.futures directly ENH make initial binning in HGBT parallel #28064 (comment)
  • using joblib would kind of be more consistent with the rest of the codebase, but there may be some overhead. To be honest, benchmarks are hard but it is not like ENH make initial binning in HGBT parallel #28064 made a strong effort at benchmarking. Also, doing micro-benchmarkings, may not be a super useful use of our time ...
  • ENH make initial binning in HGBT parallel #28064 used as_completed to act on the results as soon as they come back. joblib.Parallel will wait that all the task finish before doing anything with the results. joblib 1.4 has returns_as='unordered_generator', see changelog that would do something similar. This feature is kind of new (and as such has probably not been battle-tested yet) and we probably don't want to require joblib>=1.4, released April 2024, just to use it for this particular use case.

@lesteve
Copy link
Member

lesteve commented Jul 2, 2024

So as expected Pyodide CI build is green.

I guess now comes the more complicated decision about what do we do about joblib with backend='threading' vs concurrent.futures?

Personally I would slightly lean towards sticking with joblib. It does feel like we are going to recreate a nano-joblib inside scikit-learn if we go down the concurrent.futures route (but this may still be manageable, not sure) for example:

  • should n_threads=1 mean serial as in joblib, or one worker thread as was implemented in ENH make initial binning in HGBT parallel #28064 (which is what is causing the issue with Pyodide).
  • should we suppport n_threads=-1 like joblib does? I think this will not happen here because there is a n_threads = _openmp_effective_n_threads(self.n_threads) before reaching the _BinMapper.fit code but you could imagine having the issue in another place where you need this additional logic rather than passing n_jobs to joblib.Parallel and not thinking about it too much.

Let's ping in no particular order @ogrisel and @jeremiedbb (who were part of the discussion I tried to make a summary of in #29386 (comment)) as well as @betatim @lorentzenchr and @jjerphan (who were involved in the original PR #28064).

The alternative to using joblib is not doing things in parallel if n_threads == 1. I don't think there is a good reason to create a worker thread when n_threads=1 outside of Pyodide. If we insist we want a "we are inside Pyodide" check, see their doc.

@jeremiedbb
Copy link
Member

So according to #29386 (comment), using joblib doesn't comes with an overhead, it's the opposite even. So I'm even more in favor of sticking with joblib and not introduce a new unnecessary pattern.

Maybe we could add a comment to use the return generator feature from joblib when 1.4 is our min version, but I'm not sure it would have a visible impact given that this task is already a small part of HGBT in terms of time spent.

@lesteve
Copy link
Member

lesteve commented Jul 3, 2024

OK let's merge this, sticking to joblib seems more conservative/less controversial for now and avoids the Pyodide issue.

We can potentially revisit using concurrent.futures in a separate PR with maybe more detailed benchmarks and discussion to motivate it.

@lesteve lesteve merged commit 82404ba into scikit-learn:main Jul 3, 2024
31 checks passed
@lesteve
Copy link
Member

lesteve commented Jul 3, 2024

Thanks @OmarManzoor for the fix!

@OmarManzoor OmarManzoor deleted the fix_for_parallel_binning branch July 3, 2024 15:35
@lorentzenchr
Copy link
Member

not introduce a new unnecessary pattern

While I recognize that this PR fixes a bug, this new pattern was a 1-1 copy-paste from python standard library, see https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example, and as a developer I find it quite strange to disfavor that.

The impact on timing is negligible anyway.

@lesteve
Copy link
Member

lesteve commented Jul 4, 2024

I don't strongly disagree, I think this should be discussed in a separate dedicated issue to try to have the conversation in a single place rather than spread in multiple PRs right now ...

@jeremiedbb
Copy link
Member

this new pattern was a 1-1 copy-paste from python standard library

@lorentzenchr, I was not saying that it's not a valid pattern or that it's new in the general python ecosystem. It's new in scikit-learn, because so far joblib was used for both multi-processing and multi-threading situations. I don't know the historical reason, maybe to have a common syntax. So what I'm saying is that I'd rather stick with one pattern unless it brings a net benefit.

snath-xoc pushed a commit to snath-xoc/scikit-learn that referenced this pull request Jul 5, 2024
…#29386)

Co-authored-by: Loïc Estève <loic.esteve@ymail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants