ENH make initial binning in HGBT parallel #28064
Conversation
```diff
@@ -6,6 +6,7 @@
 approximately the same number of samples.
 """
 # Author: Nicolas Hug
+import concurrent.futures
```
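For context, here is a hedged sketch of the pattern this import enables in `_BinMapper.fit`; `find_thresholds` and `fit_thresholds` are illustrative stand-ins, not the PR's actual code:

```python
# Sketch of the discussed pattern: map per-column threshold finding over a
# thread pool. NumPy releases the GIL inside sort/percentile, so the
# threads can genuinely overlap.
import concurrent.futures

import numpy as np


def find_thresholds(col_data, max_bins=256):
    # Stand-in for the per-feature work: interior percentile midpoints.
    percentiles = np.linspace(0, 100, num=max_bins + 1)[1:-1]
    return np.percentile(col_data, percentiles, method="midpoint")


def fit_thresholds(X, n_threads=4):
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as executor:
        # executor.map over X.T applies the function to each column.
        return list(executor.map(find_thresholds, X.T))


rng = np.random.default_rng(0)
thresholds = fit_thresholds(rng.normal(size=(100_000, 20)))
```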
> Is there a reason we never use a ThreadPoolExecutor?

I think loky is preferred and used as a back-end via joblib instead.
But this here uses `ThreadPoolExecutor`, not `ProcessPoolExecutor`.
I see. I think @ogrisel is much more knowledgeable than me and better placed to make an educated decision.
This pattern is interesting; I wonder whether we could abstract it and reuse it in other places.
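One possible shape for such an abstraction; `threaded_map` is a hypothetical helper, not an existing scikit-learn utility:

```python
# Hypothetical helper abstracting the ThreadPoolExecutor pattern so that
# other estimators could reuse it. A sketch only.
import concurrent.futures


def threaded_map(func, items, n_threads):
    """Apply func to each item on up to n_threads Python threads."""
    if n_threads == 1:
        # Avoid any pool overhead in the sequential case.
        return [func(item) for item in items]
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as executor:
        return list(executor.map(func, items))
```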
I confirm that loky is only useful for process-based parallelism. For Python-level thread-based parallelism, `ThreadPoolExecutor` is perfectly fine.
We traditionally used joblib with the "threading" backend for this kind of thread-based parallelism. Internally, joblib's "threading" backend runs the tasks on plain Python threads as well. In cases where we want to hard-code the use of threads, I think calling `ThreadPoolExecutor` directly is fine.
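For reference, the two spellings side by side; both run the per-column work on Python threads, so they behave similarly for GIL-releasing NumPy operations (a sketch, not a benchmark):

```python
import concurrent.futures

import numpy as np
from joblib import Parallel, delayed

X = np.random.default_rng(0).normal(size=(10_000, 8))

# joblib spelling: threads via the "threading" backend.
res_joblib = Parallel(n_jobs=4, backend="threading")(
    delayed(np.median)(col) for col in X.T
)

# Direct spelling with concurrent.futures.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    res_executor = list(executor.map(np.median, X.T))

assert res_joblib == res_executor
```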
Could you please post the results of a quick ad-hoc timeit for that step alone, to quantify the speed-up, out of curiosity?
Note: prior to making the code more complex for parallelization, maybe we should investigate optimizing the single-threaded variant: at the moment we sort each (subsampled) column's data twice, once in `np.unique` and once in `np.percentile`.
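To illustrate the single-sort idea: sort once, then derive both the distinct values and the quantiles from the already-sorted array. This is a hypothetical rewrite under the assumption that matching `np.percentile`'s "midpoint" method is enough, not the current implementation:

```python
# Hypothetical single-sort variant of per-column threshold finding.
import numpy as np


def thresholds_single_sort(col_data, max_bins=256):
    sorted_data = np.sort(col_data)  # the only sort
    # Unique values without re-sorting: keep entries that differ from
    # their left neighbor.
    keep = np.concatenate(([True], sorted_data[1:] != sorted_data[:-1]))
    distinct = sorted_data[keep]
    if len(distinct) <= max_bins:
        return (distinct[:-1] + distinct[1:]) * 0.5
    # Quantiles by direct indexing into the sorted data, mimicking
    # np.percentile(..., method="midpoint").
    idx = np.linspace(0, len(sorted_data) - 1, num=max_bins + 1)[1:-1]
    lo = sorted_data[np.floor(idx).astype(np.intp)]
    hi = sorted_data[np.ceil(idx).astype(np.intp)]
    return (lo + hi) * 0.5
```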
Edit: update, long after the merge of #28102 (commit 9c5e16d):
```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble._hist_gradient_boosting.binning import _BinMapper

n_samples, n_features = 200_000, 20
n_bins = 256
X, y = make_classification(n_samples=n_samples, n_features=n_features)
categorical_remapped = np.zeros(n_features, dtype=bool)

bin_mapper = _BinMapper(
    n_bins=n_bins,
    is_categorical=categorical_remapped,
    known_categories=None,
    random_state=1,
    n_threads=1,
)
%timeit bin_mapper.fit(X)            # 378 ms ± 2.57 ms (old 465 ms ± 3.08 ms)
%timeit bin_mapper.fit_transform(X)  # 602 ms ± 8.5 ms  (old 682 ms ± 4.33 ms)

bin_mapper = _BinMapper(
    n_bins=n_bins,
    is_categorical=categorical_remapped,
    known_categories=None,
    random_state=1,
    n_threads=4,
)
%timeit bin_mapper.fit(X)            # 103 ms ± 1.06 ms (old 137 ms ± 3.04 ms)
%timeit bin_mapper.fit_transform(X)  # 188 ms ± 9.76 ms (old 227 ms ± 3.85 ms)
```
LGTM. Thanks @lorentzenchr
So this broke Pyodide, very likely because you cannot start a thread in Pyodide; see the stack trace in the build log.
The work-around is to check whether we are inside Pyodide and not use multi-threading in that case? I guess Pyodide is the only case where you cannot create threads. I don't have a strong opinion on using joblib for multi-threading, but using …
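One possible shape for that work-around, assuming detecting Pyodide via `sys.platform == "emscripten"` is sufficient (illustrative, not the fix that actually landed):

```python
# Sketch: fall back to a sequential loop on platforms that cannot spawn
# threads (Pyodide reports sys.platform == "emscripten").
import concurrent.futures
import sys


def map_maybe_threaded(func, items, n_threads):
    if n_threads == 1 or sys.platform in ("emscripten", "wasi"):
        return [func(item) for item in items]
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_threads) as executor:
        return list(executor.map(func, items))
```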
Reference Issues/PRs
None
What does this implement/fix? Explain your changes.
This PR makes the finding of thresholds/quantiles in `_BinMapper.fit` parallel with `concurrent.futures.ThreadPoolExecutor`, see https://docs.python.org/3/library/concurrent.futures.html#threadpoolexecutor-example. This is indeed one of the recommended ways to make numpy code parallel in pure Python, see https://numpy.org/doc/stable/reference/random/multithreading.html.

Any other comments?
Is there a reason we never use a `ThreadPoolExecutor`?
The gain in execution speed is low, as only a fraction of the fit time is spent finding the thresholds.
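To put that in perspective, a rough timing sketch comparing the binning step alone against a full `HistGradientBoostingClassifier` fit; absolute numbers depend on the machine and version:

```python
from time import perf_counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.ensemble._hist_gradient_boosting.binning import _BinMapper

X, y = make_classification(n_samples=200_000, n_features=20, random_state=0)

tic = perf_counter()
_BinMapper(
    n_bins=256,
    is_categorical=np.zeros(20, dtype=bool),
    known_categories=None,
    random_state=1,
    n_threads=1,
).fit(X)
print(f"binning alone: {perf_counter() - tic:.2f} s")

tic = perf_counter()
HistGradientBoostingClassifier(random_state=0).fit(X, y)
print(f"full fit:      {perf_counter() - tic:.2f} s")
```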