PERF set openmp to use only physical cores by default #26082

ogrisel · 2023-04-04T09:49:01Z

Follow-up on #25918.
Related to:

Multicore scalability of the Histogram-based GBDT #14306 (comment)
DOC Add demo on parallelization with context manager using different backends #25714 (comment) (extremely pathological case on small data)
PERF: models with multithreading being slower than the same model with a single thread #25822
I also observed this when running a tutorial using hist gradient boosting in the past.

I think the performance of scikit-learn's OpenMP-enabled Cython routines would be more robust if we disabled non-physical cores by default, even if it can result in a small slowdown in the rare cases where SMT was helpful.

Note that in itself, this is not a final fix for #14306 as for this estimator one might still want to:

experiment with sample-wise chunking and parallelizing over samples, at least when n_samples >> max(n_features, n_threads) (see Multicore scalability of the Histogram-based GBDT #14306 (comment))
more finely adjust n_threads based on n_features and n_samples (and the availability of SMT cores).

Furthermore, for the non-HGBDT estimators, we might also want to conduct an empirical study to see if they can benefit from extra SMT threads robustly for various input data shapes, on a case by case basis.

ogrisel · 2023-04-04T09:52:27Z

@adrinjalali it would be nice if you could re-run your skops tests with this branch to see if it fixes the problem for you.

I expect that it would help at least a bit. Maybe we could refine that later by setting n_threads=1 in estimators that are called with very small training or test data on a case by case basis but this is a lot more work compared to this minimal PR.

jeremiedbb · 2023-04-04T09:55:30Z

@adrinjalali it would be nice if you could re-run your skops tests with this branch to see if it fixes the problem for you.

I suspect that for very small problems like these, completely disabling multi-threading might give even better perf

jeremiedbb

I guess we want an entry in the changelog

ogrisel · 2023-04-04T10:05:59Z

I guess we want an entry in the changelog

I pushed an entry. Let's check if the reference works in the HTML rendered by the CI.

sklearn/utils/_openmp_helpers.pyx

ogrisel · 2023-04-04T12:15:09Z

The reference to the section on "Parallelism" works as expected.

ogrisel · 2023-04-04T13:28:29Z

I discovered that when calling HistGradientBoostingClassifier many times on small data, the overhead of loky.cpu_count(only_physical_cores=only_physical_cores) is quite significant (around 30%), which is sad because the CPU count is unlikely to change during the lifecycle of a Python program. This is visible by running such a program twice: once with OMP_NUM_THREADS set (which disables the calls to cpu_count) and once without it (which triggers a call to cpu_count each time).

If we cache this, then the 30% overhead goes away. Let me do a quick PR to that PR to show the difference:

Cache repeated calls to cpu_count. ogrisel/scikit-learn#15

We could implement this as a new option in loky and joblib (or as a new method with an explicit name such as cached_cpu_count). However since we don't want to make scikit-learn depend on the latest release of joblib and loky we might need a backport anyway.
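A minimal sketch of the caching idea, assuming a hypothetical helper named `cached_cpu_count` (this is not an existing joblib or loky function, and not necessarily how the sub-PR does it). It wraps `joblib.cpu_count` with `functools.lru_cache`; when joblib is not installed it falls back to `os.cpu_count`, which ignores `only_physical_cores`:

```python
import os
from functools import lru_cache

try:
    from joblib import cpu_count as _detect_cpu_count
except ImportError:
    def _detect_cpu_count(only_physical_cores=False):
        # Fallback for illustration only: os.cpu_count() reports logical
        # CPUs and cannot tell physical cores apart.
        return os.cpu_count() or 1


@lru_cache(maxsize=None)
def cached_cpu_count(only_physical_cores=False):
    """Hypothetical helper: detect once per argument value, then reuse."""
    return _detect_cpu_count(only_physical_cores=only_physical_cores)


n_physical = cached_cpu_count(only_physical_cores=True)  # detection runs here
n_again = cached_cpu_count(only_physical_cores=True)     # served from the cache
print(n_physical, cached_cpu_count.cache_info())
```

The cache relies on the assumption stated above that the CPU count does not change during the lifetime of the process.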

adrinjalali · 2023-04-04T14:38:21Z

I installed this PR in editable mode, so that might explain some slower times, but overall, they follow the same trend:

this PR
1:
27.84user 0.22system 0:28.11elapsed 99%CPU (0avgtext+0avgdata 315780maxresident)k
0inputs+440outputs (0major+54221minor)pagefaults 0swaps

6:
39.01user 7.18system 0:29.83elapsed 154%CPU (0avgtext+0avgdata 319744maxresident)k
0inputs+432outputs (5major+53718minor)pagefaults 0swaps

12:
54.00user 13.16system 0:31.99elapsed 209%CPU (0avgtext+0avgdata 346272maxresident)k
10360inputs+5424outputs (7major+54790minor)pagefaults 0swaps

auto:
44.72user 13.67system 0:31.04elapsed 188%CPU (0avgtext+0avgdata 319984maxresident)k
0inputs+448outputs (4major+52854minor)pagefaults 0swaps


1.2.2
1:
23.48user 0.21system 0:23.77elapsed 99%CPU (0avgtext+0avgdata 310256maxresident)k
0inputs+416outputs (0major+54664minor)pagefaults 0swaps

6:
34.46user 7.66system 0:24.75elapsed 170%CPU (0avgtext+0avgdata 328668maxresident)k
0inputs+424outputs (3major+54777minor)pagefaults 0swaps

12:
50.01user 14.23system 0:26.12elapsed 245%CPU (0avgtext+0avgdata 340644maxresident)k
0inputs+424outputs (6major+53719minor)pagefaults 0swaps

auto:
50.48user 14.00system 0:26.66elapsed 241%CPU (0avgtext+0avgdata 320436maxresident)k
0inputs+432outputs (4major+53971minor)pagefaults 0swaps

ogrisel · 2023-04-04T15:30:03Z

Thanks.

It's good to see that auto is no longer the worst offender so it's already beneficial, but as @jeremiedbb suspected, for your workload, we would need to detect small datasets and reduce the number of threads in a data-dependent way to get the best performance.

Could you also quickly check the impact of my cache-cpu_count branch?

adrinjalali · 2023-04-04T17:19:50Z

This is with your cache-cpu_count branch, definitely improves on this PR.

1:
29.82user 0.22system 0:30.11elapsed 99%CPU (0avgtext+0avgdata 315688maxresident)k
0inputs+552outputs (0major+56575minor)pagefaults 0swaps

6:
40.87user 7.63system 0:30.83elapsed 157%CPU (0avgtext+0avgdata 322120maxresident)k
0inputs+304outputs (3major+56236minor)pagefaults 0swaps

12:
57.95user 13.76system 0:33.44elapsed 214%CPU (0avgtext+0avgdata 325436maxresident)k
0inputs+304outputs (5major+53129minor)pagefaults 0swaps

auto:
46.57user 15.30system 0:32.46elapsed 190%CPU (0avgtext+0avgdata 322316maxresident)k
0inputs+304outputs (4major+54861minor)pagefaults 0swaps

jeremiedbb · 2023-04-05T08:04:54Z

I discovered that when calling HistGradientBoostingClassifier many times on small data, the overhead of loky.cpu_count(only_physical_cores=only_physical_cores) is quite significant (around 30%)

@ogrisel what's the duration of these ? I measured loky.cpu_count:

only_physical_cores=False : 1ms
only_physical_cores=True : 27ms

Since loky already caches the result of only_physical_cores=True, all subsequent calls should last no longer than 1ms. Did your calls to HistGradientBoostingClassifier last around 1ms ?

ogrisel · 2023-04-05T09:30:43Z

This is with your cache-cpu_count branch, definitely improves on this PR.

Thanks Adrin. However I barely see any change on your benchmark between the current state of #26082 (this PR) and the cache-cpu_count branch. But that's alright.

Here is what I used for my quick tests:

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from time import perf_counter


X, y = make_classification(n_samples=100, n_features=5)

n_iter = 100
tic = perf_counter()
for i in range(n_iter):
    HistGradientBoostingClassifier().fit(X, y)
toc = perf_counter()

print(f"fitting {n_iter} HGBC: {toc - tic:.3f} s")

It's ~4 seconds for 100 small gradient boosting models so ~40 ms per model.

But I get very different results for joblib.cpu_count() with the vendored loky:

In [4]: from joblib import cpu_count

In [5]: %timeit cpu_count(only_physical_cores=True)
61.4 µs ± 4.25 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [6]: %timeit cpu_count(only_physical_cores=False)
57.5 µs ± 198 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

So I don't understand why I thought caching would be needed in the first place.

ogrisel · 2023-04-05T09:41:31Z

Here is the output of the above script on the 3 branches, on a machine with 2 physical cores and 4 logical cores:

# No cache and use logical cores on main:
(dev) ogrisel@ici:~/code/scikit-learn (main) $ python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 5.882 s
(dev) ogrisel@ici:~/code/scikit-learn (main) $ OMP_NUM_THREADS=4 python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 4.301 s
(dev) ogrisel@ici:~/code/scikit-learn (main) $ OMP_NUM_THREADS=2 python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 3.704 s

# With the cache + limit to only physical cores by default:
(dev) ogrisel@ici:~/code/scikit-learn (main) $ git checkout cache-cpu_count && make in &> /dev/null
(dev) ogrisel@ici:~/code/scikit-learn (cache-cpu_count) $ python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 3.824 s
(dev) ogrisel@ici:~/code/scikit-learn (cache-cpu_count) $ OMP_NUM_THREADS=4 python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 4.629 s
(dev) ogrisel@ici:~/code/scikit-learn (cache-cpu_count) $ OMP_NUM_THREADS=2 python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 3.259 s

# Same without the cache:
(dev) ogrisel@ici:~/code/scikit-learn (cache-cpu_count) $ git checkout openmp-only-physical-cores-true-by-default && make in &> /dev/null
(dev) ogrisel@ici:~/code/scikit-learn (openmp-only-physical-cores-true-by-default) $ python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 4.376 s
(dev) ogrisel@ici:~/code/scikit-learn (openmp-only-physical-cores-true-by-default) $ OMP_NUM_THREADS=4 python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 4.344 s
(dev) ogrisel@ici:~/code/scikit-learn (openmp-only-physical-cores-true-by-default) $ OMP_NUM_THREADS=2 python ~/tmp/bench_small_hgbc.py 
fitting 100 HGBC: 3.880 s

So for this extremely small, repeated workload, caching the CPU count seems to be helpful.

There is a bit of variability between runs though. I should have seeded all the RNGs to remove some of it. But still, both caching and restricting to physical cores seem to help.

jeremiedbb · 2023-04-05T09:45:36Z

The funny thing is that when you set OMP_NUM_THREADS, this value has priority and joblib.cpu_count is not called. So I don't know how to interpret these benchmarks 😄
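To illustrate why setting OMP_NUM_THREADS hides the cost of CPU detection, here is a simplified sketch of such a resolution order. The function `effective_n_threads` and its parameters are hypothetical and are not scikit-learn's actual `_openmp_effective_n_threads`; the sketch also assumes the variable holds a single integer:

```python
import os


def effective_n_threads(n_threads=None, environ=None,
                        detect_cpu_count=os.cpu_count):
    """Hypothetical, simplified resolution order for the thread count."""
    environ = os.environ if environ is None else environ
    if n_threads is not None:
        return max(1, n_threads)
    env_value = environ.get("OMP_NUM_THREADS")
    if env_value:
        # The environment variable wins: no CPU detection happens.
        return max(1, int(env_value))
    return detect_cpu_count() or 1


calls = []


def counting_detector():
    # Records each detection; 4 is a made-up value for the demonstration.
    calls.append(1)
    return 4


with_env = effective_n_threads(environ={"OMP_NUM_THREADS": "2"},
                               detect_cpu_count=counting_detector)
calls_with_env = len(calls)

without_env = effective_n_threads(environ={},
                                  detect_cpu_count=counting_detector)
calls_without_env = len(calls)
print(with_env, calls_with_env, without_env, calls_without_env)
```

With the variable set, the detector is never invoked, so a benchmark run under OMP_NUM_THREADS measures only the effect of the thread count, not the detection overhead.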

jeremiedbb · 2023-04-05T09:49:39Z

Without setting the random seed for your benchmarks, I don't think the comparisons between branches are fair

ogrisel · 2023-04-05T12:21:36Z

Alternatively, we could get the number of physical cores on startup (module load), and use that instead of ever calling this method in the code maybe?

cpu_count does a bunch of extra system calls. I would rather not do those at module import time. I think the cache in _openmp_effective_n_threads is enough.

ogrisel · 2023-04-05T12:22:04Z

I will update the script and benchmark results to seed the RNGs.

ogrisel · 2023-04-05T12:47:36Z

New version of the script (with fixed seeds) with an even smaller dataset and more iterations to better assess all the kinds of overhead.

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from time import perf_counter
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 2)
y = rng.randint(0, 2, size=X.shape[0])

n_iter = 300
tic = perf_counter()
for i in range(n_iter):
    HistGradientBoostingClassifier(random_state=0).fit(X, y)
toc = perf_counter()

print(f"fitting {n_iter} HGBC: {toc - tic:.3f} s")

reference measurements on main:

(main) $ OMP_NUM_THREADS=1 python ~/tmp/bench_small_hgbc.py 
fitting 300 HGBC: 4.656 s
(main) $ OMP_NUM_THREADS=2 python ~/tmp/bench_small_hgbc.py 
fitting 300 HGBC: 4.750 s
(main) $ OMP_NUM_THREADS=4 python ~/tmp/bench_small_hgbc.py 
fitting 300 HGBC: 5.370 s
(main) $ python ~/tmp/bench_small_hgbc.py 
fitting 300 HGBC: 7.748 s

when OMP_NUM_THREADS is set, cpu_count is not called so we only see the impact of the tuning of the number of threads for this task with very little expected parallelism.

when OMP_NUM_THREADS is not set, cpu_count(only_physical_cores=False) is called each time, adding some overhead on top of a suboptimal choice of n_threads.

on this PR, without caching cpu_count:

(openmp-only-physical-cores-true-by-default) $ python ~/tmp/bench_small_hgbc.py 
fitting 300 HGBC: 6.487 s

-> even if the number of threads is now better chosen, the overhead of counting the CPUs is problematic

on cache-cpu_count (updated):

$ python ~/tmp/bench_small_hgbc.py 
fitting 300 HGBC: 4.771 s

So this confirms that both the caching and the protection against oversubscription are useful.

ogrisel · 2023-04-05T12:49:33Z

Let me try on a machine with more cores to confirm.

jeremiedbb · 2023-04-05T12:52:41Z

To avoid the bad interaction with OMP_NUM_THREADS you can call threadpool_limits(1/2/4) instead in bench_small_hgbc.py . This way you can have the bench for all number of threads for the 3 branches

ogrisel · 2023-04-05T13:13:59Z

To avoid the bad interaction with OMP_NUM_THREADS you can call threadpool_limits(1/2/4) instead in bench_small_hgbc.py . This way you can have the bench for all number of threads for the 3 branches

Yes, but then it would no longer allow going above cpu_count(only_physical_cores=True) (with the new code).

jeremiedbb · 2023-04-05T13:30:49Z

Besides caching, the issue is that HistGradientBoostingClassifier calls _openmp_effective_n_threads way more than once per fit. When I run your benchmark script, I see that it's called 306 times per fit ! Oo

jeremiedbb · 2023-04-05T14:16:38Z

I ran my own benchmark only for cpu_count. With caching, _openmp_effective_n_threads takes ~1µs, which is way below the duration of 1 call to hgbt. Even if called 300 times, it has no impact on the duration. So I'm in favor of using a cache.
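A back-of-the-envelope check of these orders of magnitude, using only numbers quoted in this thread. It mixes measurements taken on different machines (the timings come from the 2-core benchmark, the call count and the ~1 µs figure from the comments above), so it should be read as a rough estimate rather than a measurement:

```python
# Numbers quoted in this thread.
n_fits = 300
calls_per_fit = 306     # calls to _openmp_effective_n_threads per fit
t_uncached_s = 6.487    # this PR, cpu_count not cached (2-core machine)
t_cached_s = 4.771      # cache-cpu_count branch (2-core machine)
cached_call_s = 1e-6    # ~1 µs per cached call

# Average duration of one fit with the cache, in milliseconds.
per_fit_cached_ms = t_cached_s / n_fits * 1e3
# Time saved per fit by the cache, in milliseconds.
saved_per_fit_ms = (t_uncached_s - t_cached_s) / n_fits * 1e3
# Implied cost of one uncached call, in microseconds.
saved_per_call_us = saved_per_fit_ms / calls_per_fit * 1e3
# Remaining overhead per fit once calls are cached, in milliseconds.
cached_overhead_per_fit_ms = calls_per_fit * cached_call_s * 1e3

print(f"fit with cache:        {per_fit_cached_ms:.2f} ms")
print(f"saved per fit:         {saved_per_fit_ms:.2f} ms")
print(f"implied uncached call: {saved_per_call_us:.1f} µs")
print(f"cached overhead/fit:   {cached_overhead_per_fit_ms:.3f} ms")
```

Under these assumptions the cached calls account for a small fraction of one fit, which is consistent with the conclusion that the cache removes the detection overhead.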

I'm also still in favor of using only physical cores by default.

thomasjpfan

I'm +1 on using physical cores by default. Out of curiosity, how does cpu_count work with Intel's P+E cores?

For example, the i9-12900k has 8 performance and 8 efficiency cores. Does cpu_count return 16 in this case?

jeremiedbb · 2023-04-05T15:21:09Z

For example, the i9-12900k has 8 performance and 8 efficiency cores. Does cpu_count return 16 in this case?

It relies on lscpu (linux) so I guess it returns 16. It seems that there are 16 physical cores: 8 with hyperthreading and 8 without. So I'd say cpu_count() returns 24 and cpu_count(only_physical_cores=True) returns 16. But the only way to be sure is to run it 😄

ogrisel · 2023-04-05T16:05:40Z

For example, the i9-12900k has 8 performance and 8 efficiency cores. Does cpu_count return 16 in this case?

Probably. On Apple Silicon M1, cpu_count(only_physical_cores=True) returns 8 with a similar 4xP + 4xE configuration.

EDIT: actually this is on macOS. Let me try with linux in docker container.

I confirm loky returns 8 cores on M1 under Linux as well.

Here is the output of lscpu in that VM:

/# lscpu 
Architecture:                    aarch64
CPU op-mode(s):                  64-bit
Byte Order:                      Little Endian
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              1
Core(s) per socket:              8
Socket(s):                       1
Vendor ID:                       0x00
Model:                           0
Stepping:                        0x0
BogoMIPS:                        48.00
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Not affected
Vulnerability Spec store bypass: Vulnerable
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization
Vulnerability Spectre v2:        Not affected
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm jscvt fcma lrcpc dcpop sha3 asimddp sha512 asimdfhm dit uscat ilrcpc flagm ssbs sb paca pacg dcpodp flagm2 frint
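As an illustration of how the fields above relate to the two counts, here is a small sketch that parses the human-readable `lscpu` output. The helper `parse_lscpu` is hypothetical and is not how loky computes its result; it also assumes every core exposes the same number of hardware threads, which would not hold on a hybrid CPU such as the i9-12900k discussed above:

```python
def parse_lscpu(text):
    """Return (physical_cores, logical_cpus) from `lscpu` "Key: value" text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip continuation lines that have no "Key:" prefix
            fields[key.strip()] = value.strip()
    threads_per_core = int(fields["Thread(s) per core"])
    cores_per_socket = int(fields["Core(s) per socket"])
    sockets = int(fields["Socket(s)"])
    physical = cores_per_socket * sockets
    return physical, physical * threads_per_core


# Subset of the lscpu output shown above.
sample = """\
CPU(s):                          8
Thread(s) per core:              1
Core(s) per socket:              8
Socket(s):                       1
"""

physical, logical = parse_lscpu(sample)
print(physical, logical)
```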

ogrisel · 2023-04-05T16:32:00Z

Let me finish a few measurements on another machine with 20 physical cores and 40 logical cores before merging.

jeremiedbb · 2023-04-05T16:33:22Z

Let me finish a few measurements on another machine with 20 physical cores and 40 logical cores before merging.

And merge the caching mechanism in this branch ;)

ogrisel · 2023-04-05T16:45:01Z

I confirm that on a machine with 40 logical cores and 20 physical cores, both effects are important:

on main (with logical cores and no cached cpu counts) this is quite catastrophic. I had to do a Ctrl-C after 1 or 2 minutes!
on this branch (only physical cores) without caching the calls to cpu_count:

python ~/bench_small_hgbdt.py 
fitting 300 HGBC: 14.600 s

so already much better but not ideal either.

on cache-cpu_count (only physical cores and caching):

$ python ~/bench_small_hgbdt.py 
fitting 300 HGBC: 8.114 s

Also for reference, here are the numbers when manually setting the OMP threads (cpu_count is not called in that case) on the same machine:

$ OMP_NUM_THREADS=1 python ~/bench_small_hgbdt.py 
fitting 300 HGBC: 6.305 s
$ OMP_NUM_THREADS=2 python ~/bench_small_hgbdt.py 
fitting 300 HGBC: 6.635 s
$ OMP_NUM_THREADS=20 python ~/bench_small_hgbdt.py 
fitting 300 HGBC: 8.949 s
$ OMP_NUM_THREADS=40 python ~/bench_small_hgbdt.py 
fitting 300 HGBC: 11.638 s

So the combination of only_physical_cores=True + caching cpu_count calls is close enough to optimal. The remaining discrepancy will have to be addressed by setting the number of threads on a case by case basis, by crafting a cost heuristic that is estimator + data shape specific.
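To make the idea of a data-shape-specific heuristic concrete, here is a purely hypothetical sketch. The function name, the use of `n_samples * n_features` as a proxy for work, and the `min_work_per_thread` threshold are all made-up assumptions for illustration; nothing like this is part of this PR:

```python
def heuristic_n_threads(n_samples, n_features, max_threads,
                        min_work_per_thread=10_000):
    """Hypothetical heuristic: grant one thread per chunk of work.

    The threshold is an arbitrary placeholder; a real value would have
    to be tuned empirically for each estimator.
    """
    work = n_samples * n_features
    return max(1, min(max_threads, work // min_work_per_thread))


# Tiny data stays single-threaded; larger data may use all physical cores.
for shape in [(10, 2), (1_000, 50), (100_000, 20)]:
    print(shape, heuristic_n_threads(*shape, max_threads=20))
```

With a rule of this kind, the 10 x 2 benchmark dataset used in this thread would run on a single thread, which is the setting that performed best in the measurements above.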

Cache repeated calls to cpu_count.

ogrisel · 2023-04-05T16:49:55Z

I merged the cached cpu_count sub-PR to this PR. Let's wait for a full CI run before merging to main.

ogrisel · 2023-04-06T08:28:00Z

All is green, merging!

Fan <thomasjpfan@gmail.com> * FIX Keeps namedtuple's class when transform returns a tuple (scikit-learn#26121) * DOC corrected letter case for better readability in sklearn/metrics/_classification.py / (scikit-learn#26169) * MAINT Parameters validation for sklearn.preprocessing.power_transform (scikit-learn#26142) * FIX `roc_auc_score` now uses `y_prob` instead of `y_pred` (scikit-learn#26155) * MAINT Parameters validation for sklearn.datasets.load_iris (scikit-learn#26177) * MAINT Parameters validation for sklearn.datasets.load_diabetes (scikit-learn#26166) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_breast_cancer (scikit-learn#26165) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.metrics.cluster.entropy (scikit-learn#26162) * MAINT Parameters validation for sklearn.datasets.fetch_species_distributions (scikit-learn#26161) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * ASV Fix tol in SGDRegressorBenchmark (scikit-learn#26146) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MNT use api.openml.org URLs for fetch_openml (scikit-learn#26171) * MAINT Parameters validation for sklearn.utils.resample (scikit-learn#26139) * MAINT make it explicit that additive_chi2_kernel does not accept sparse matrix (scikit-learn#26178) * MNT fix circleci link in README.rst (scikit-learn#26183) * CI Fix circleci artifact redirector action (scikit-learn#26181) * GOV introduce rights for groups as discussed in SLEP019 (scikit-learn#25753) Co-authored-by: Julien <git@jjerphan.xyz> Co-authored-by: Thomas J. 
Fan <thomasjpfan@gmail.com> * MAINT Parameters validation for sklearn.neighbors.sort_graph_by_row_values (scikit-learn#26173) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * FIX improve convergence criterion for LogisticRegression(penalty="l1", solver='liblinear') (scikit-learn#25214) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * MAINT Fix several typos in src and doc files (scikit-learn#26187) * PERF fix overhead of _rescale_data in LinearRegression (scikit-learn#26207) * ENH add Huber loss (scikit-learn#25966) * MAINT Refactor GraphicalLasso and graphical_lasso (scikit-learn#26033) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Cython linting (scikit-learn#25861) * DOC Add JupyterLite button in example gallery (scikit-learn#25887) * MAINT Parameters validation for sklearn.covariance.ledoit_wolf_shrinkage (scikit-learn#26200) * MAINT Parameters validation for sklearn.datasets.load_linnerud (scikit-learn#26199) * MAINT Parameters validation for sklearn.datasets.load_wine (scikit-learn#26196) * DOC Added redirect to Provost paper + minor refactor (scikit-learn#26223) * MAINT Parameter Validation for `covariance.graphical_lasso` (scikit-learn#25053) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_digits (scikit-learn#26195) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.preprocessing.quantile_transform (scikit-learn#26144) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.model_selection.cross_validate (scikit-learn#26129) Co-authored-by: 
jeremiedbb <jeremiedbb@yahoo.fr> * DOC Adds TargetEncoder example explaining the internal CV (scikit-learn#26185) Co-authored-by: Tim Head <betatim@gmail.com> * spelling mistake corrected in documentation for script `plot_document_clustering.py` (scikit-learn#26228) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * FIX possible UnboundLocalError in fetch_openml (scikit-learn#26236) * ENH Adds PyTorch support to LinearDiscriminantAnalysis (scikit-learn#25956) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Tim Head <betatim@gmail.com> * MNT Use fixed version of Pyodide (scikit-learn#26247) * MNT Reset transform_output default in example to fix doc build build (scikit-learn#26269) * DOC Update example plot_nearest_centroid.py (scikit-learn#26263) * MNT reduce JupyterLite build size (scikit-learn#26246) * DOC term -> meth in GradientBoosting (scikit-learn#26225) * MNT speed-up html-noplot build (scikit-learn#26245) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> * MNT Use copy=False when creating DataFrames (scikit-learn#26272) * MAINT Parameters validation for sklearn.model_selection.permutation_test_score (scikit-learn#26230) * MAINT Parameters validation for sklearn.datasets.clear_data_home (scikit-learn#26259) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.load_files (scikit-learn#26203) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.datasets.get_data_home (scikit-learn#26260) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC Fix y-axis plot labels in permutation test score example (scikit-learn#26240) * MAINT cython-lint ignores asv_benchmarks (scikit-learn#26282) * MAINT Parameter validation for metrics.cluster._supervised (scikit-learn#26258) Co-authored-by: Jérémie du Boisberranger 
<34657725+jeremiedbb@users.noreply.github.com> * DOC Improve docstring for tol in SequentialFeatureSelector (scikit-learn#26271) * MAINT Parameters validation for sklearn.datasets.load_sample_image (scikit-learn#26226) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * DOC Consistent param type for pos_label (scikit-learn#26237) * DOC Minor grammar fix to imputation docs (scikit-learn#26283) * MAINT Parameters validation for sklearn.calibration.calibration_curve (scikit-learn#26198) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.inspection.partial_dependence (scikit-learn#26209) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MAINT Parameters validation for sklearn.model_selection.validation_curve (scikit-learn#26229) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MAINT Parameters validation for sklearn.model_selection.learning_curve (scikit-learn#26227) Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr> * MNT Remove deprecated pandas.api.types.is_sparse (scikit-learn#26287) * CI Use Trusted Publishers for uploading wheels to PyPI (scikit-learn#26249) * MAINT Parameters validation for sklearn.metrics.pairwise.manhattan_distances (scikit-learn#26122) * PERF revert openmp use in csr_row_norms (scikit-learn#26275) * MAINT Parameters validation for metrics.check_scoring (scikit-learn#26041) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> * MNT Improve error message when checking classification target is of a non-regression type (scikit-learn#26281) Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Thomas J. 
Fan <thomasjpfan@gmail.com> * DOC fix link to User Guide encoder_infrequent_categories (scikit-learn#26309) * MNT remove unused args in _predict_regression_tree_inplace_fast_dense (scikit-learn#26314) * ENH Adds missing value support for trees (scikit-learn#23595) Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> * CLN Clean up logic in validate_data and cast_to_ndarray (scikit-learn#26300) * MAINT refactor scorer using _get_response_values (scikit-learn#26037) Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> * DOC Add HGBDT to "see also" section of random forests (scikit-learn#26319) Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Tim Head <betatim@gmail.com> * MNT Bump Github Action labeler version to use newer Node (scikit-learn#26302) * FIX thresholds should not exceed 1.0 with probabilities in `roc_curve` (scikit-learn#26194) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * ENH Allow for appropriate dtype us in `preprocessing.PolynomialFeatures` for sparse matrices (scikit-learn#23731) Co-authored-by: Aleksandr Kokhaniukov <alexander.kohanyukov@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Thomas J. 
Fan <thomasjpfan@gmail.com> * DOC Fix minor typo (scikit-learn#26327) * MAINT bump minimum version for pytest (scikit-learn#26184) Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> * DOC fix return type in isotonic_regression (scikit-learn#26332) * FIX fix available_if for MultiOutputRegressor.partial_fit (scikit-learn#26333) Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> * FIX make pipeline pass check_estimator (scikit-learn#26325) * FEA Add multiclass support to `average_precision_score` (scikit-learn#24769) Co-authored-by: Geoffrey <geoffrey.bolmier@gmail.com> Co-authored-by: gbolmier <geoffrey.bolmier@volvocars.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> --------- Signed-off-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Meekail Zain <34613774+Micky774@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: zeeshan lone <56621467+still-learning-ev@users.noreply.github.com> Co-authored-by: jeremiedbb <jeremiedbb@yahoo.fr> Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com> Co-authored-by: Shiva chauhan <103742975+Shivachauhan17@users.noreply.github.com> Co-authored-by: AymericBasset <45051041+AymericBasset@users.noreply.github.com> Co-authored-by: Maren Westermann <maren.westermann@gmail.com> Co-authored-by: Nishu Choudhary <51842539+choudharynishu@users.noreply.github.com> Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Loïc Estève <loic.esteve@ymail.com> Co-authored-by: Benedek Harsanyi <80836204+hbenedek@users.noreply.github.com> Co-authored-by: Pooja Subramaniam <poojas2086@gmail.com> Co-authored-by: Rushil Desai <rushildesai01@gmail.com> 
Co-authored-by: Xiao Yuan <yuanx749@gmail.com> Co-authored-by: Omar Salman <omar.salman@arbisoft.com> Co-authored-by: 2357juan <29247195+2357juan@users.noreply.github.com> Co-authored-by: Théophile Baranger <39696928+tbaranger@users.noreply.github.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Andreas Mueller <t3kcit@gmail.com> Co-authored-by: Jovan Stojanovic <62058944+jovan-stojanovic@users.noreply.github.com> Co-authored-by: Rahil Parikh <75483881+rprkh@users.noreply.github.com> Co-authored-by: Bharat Raghunathan <bharatraghunthan9767@gmail.com> Co-authored-by: Sortofamudkip <wishyutp0328@gmail.com> Co-authored-by: Gleb Levitski <36483986+glevv@users.noreply.github.com> Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Ashwin Mathur <97467100+awinml@users.noreply.github.com> Co-authored-by: Sahil Gupta <sahil@Sahils-MBP.lan> Co-authored-by: Veghit <itay.vegh@gmail.com> Co-authored-by: Itay <itayvegh@gmail.com> Co-authored-by: precondition <57645186+precondition@users.noreply.github.com> Co-authored-by: Marc Torrellas Socastro <marc.torsoc@gmail.com> Co-authored-by: Dominic Fox <dominicjfox2@gmail.com> Co-authored-by: futurewarning <36329275+futurewarning@users.noreply.github.com> Co-authored-by: Yao Xiao <108576690+Charlie-XIAO@users.noreply.github.com> Co-authored-by: Joey Ortiz <orangesherbet0@gmail.com> Co-authored-by: Tim Head <betatim@gmail.com> Co-authored-by: Christian Veenhuis <veenhuis@gmail.com> Co-authored-by: adienes <51664769+adienes@users.noreply.github.com> Co-authored-by: Dave Berenbaum <dave.berenbaum@gmail.com> Co-authored-by: Lene Preuss <lene.preuss@gmail.com> Co-authored-by: A.H.Mansouri <83764851+A-H-Mansoury@users.noreply.github.com> Co-authored-by: Boris Feld <lothiraldan@gmail.com> Co-authored-by: Carla J <ca.jancik@gmail.com> Co-authored-by: windiana42 <61181806+windiana42@users.noreply.github.com> Co-authored-by: mdarii <dariimaxim@gmail.com> Co-authored-by: murezzda 
<47388020+murezzda@users.noreply.github.com> Co-authored-by: Peter Piontek <piontek0@gmail.com> Co-authored-by: John Pangas <swiftyxswaggy@outlook.com> Co-authored-by: Dmitry Nesterov <76070534+dmitrylala@users.noreply.github.com> Co-authored-by: Yuchen Zhou <72342196+ROMEEZHOU@users.noreply.github.com> Co-authored-by: Ekaterina Butyugina <102963496+ekaterinabutyugina@users.noreply.github.com> Co-authored-by: Jiawei Zhang <jiawei.zhang@nyu.edu> Co-authored-by: Ansam Zedan <86729068+ansamz@users.noreply.github.com> Co-authored-by: genvalen <genvalen@protonmail.com> Co-authored-by: farhan khan <86480450+BabaYaga1221@users.noreply.github.com> Co-authored-by: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com> Co-authored-by: Jiawei Zhang <jz4721@nyu.edu> Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com> Co-authored-by: Jessicakk0711 <106110789+Jessicakk0711@users.noreply.github.com> Co-authored-by: Ankur Singh <singankur28@gmail.com> Co-authored-by: Seoeun(Sun☀️) Hong <75988952+seoeunHong@users.noreply.github.com> Co-authored-by: Nightwalkx <74856680+xi-jiajun@users.noreply.github.com> Co-authored-by: VIGNESH D <35656793+dvignesh1995@users.noreply.github.com> Co-authored-by: Vincent-violet <130581473+Vincent-violet@users.noreply.github.com> Co-authored-by: Elabonga Atuo <elabongaatuo@gmail.com> Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org> Co-authored-by: André Pedersen <andrped94@gmail.com> Co-authored-by: Ashish Dutt <ashish.dutt8@gmail.com> Co-authored-by: Phil <philsupertramp@users.noreply.github.com> Co-authored-by: Stanislav (Stanley) Modrak <44023416+smith558@users.noreply.github.com> Co-authored-by: hujiahong726 <52920842+hujiahong726@users.noreply.github.com> Co-authored-by: James Dean <24254612+AcylSilane@users.noreply.github.com> Co-authored-by: ArturoAmorQ <arturo.amor-quiroz@polytechnique.edu> Co-authored-by: Aleksandr Kokhaniukov <alexander.kohanyukov@gmail.com> Co-authored-by: c-git <43485962+c-git@users.noreply.github.com> 
Co-authored-by: annegnx <64203599+annegnx@users.noreply.github.com> Co-authored-by: Geoffrey <geoffrey.bolmier@gmail.com> Co-authored-by: gbolmier <geoffrey.bolmier@volvocars.com>

PERF set openmp to use only physical cores by default

6c7c896
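
The default this commit title describes can be illustrated with a small sketch. It is not the code from the PR: `effective_n_threads` and its parameters are hypothetical names, and the core counts are passed in by the caller because the Python standard library only reports logical CPUs. The sketch assumes an explicit thread request is honored unchanged and that only the default is capped at the physical core count.

```python
from typing import Optional


def effective_n_threads(
    n_threads: Optional[int], n_logical_cores: int, n_physical_cores: int
) -> int:
    """Hypothetical helper: choose a thread count for a parallel region."""
    if n_logical_cores < 1 or n_physical_cores < 1:
        raise ValueError("core counts must be positive")
    if n_threads is not None:
        if n_threads < 1:
            raise ValueError("n_threads must be a positive integer or None")
        # Assumption: an explicit request from the caller is honored as-is.
        return n_threads
    # Default: cap at the physical core count so SMT siblings are not used.
    return min(n_logical_cores, n_physical_cores)
```

On a machine with 16 logical and 8 physical cores, the default under these assumptions would be 8 threads, while a caller could still ask for more explicitly.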

ogrisel added the Performance label Apr 4, 2023

ogrisel requested review from NicolasHug, adrinjalali, thomasjpfan, lorentzenchr and jeremiedbb April 4, 2023 09:49

github-actions bot added the module:utils and cython labels Apr 4, 2023

jeremiedbb approved these changes Apr 4, 2023

DOC document the change

78eca65

ogrisel commented Apr 4, 2023

sklearn/utils/_openmp_helpers.pyx Outdated

grammar

6326c9d

ogrisel added the Quick Review and Waiting for Second Reviewer labels Apr 4, 2023

Cache repeated calls to cpu_count.

2451b10
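
One way to cache repeated CPU-count lookups, as this commit title suggests, is to memoize the probe. The following is a minimal sketch and not the code from the commit: `cached_cpu_count` is a hypothetical name, and `os.cpu_count()` stands in for whatever probe the library actually uses. The standard library reports logical CPUs only, so the `only_physical_cores` flag here just selects a separate cache entry.

```python
import os
from functools import lru_cache


@lru_cache(maxsize=None)
def cached_cpu_count(only_physical_cores: bool = True) -> int:
    """Hypothetical helper: return a CPU count, computed once per flag value."""
    # Assumption: os.cpu_count() is a stand-in probe. It cannot tell physical
    # cores from SMT siblings, so the flag only acts as the cache key here.
    count = os.cpu_count() or 1
    return max(1, count)
```

The cache assumes the CPU count does not change during the lifetime of the process, so the probe runs at most once per flag value.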

ogrisel mentioned this pull request Apr 4, 2023

Cache repeated calls to cpu_count. ogrisel/scikit-learn#15

Merged

thomasjpfan approved these changes Apr 5, 2023

adrinjalali approved these changes Apr 5, 2023

Merge pull request #15 from ogrisel/cache-cpu_count

68d86ef

Cache repeated calls to cpu_count.

jeremiedbb approved these changes Apr 5, 2023

Merge branch 'main' into openmp-only-physical-cores-true-by-default

a11bea6

ogrisel merged commit 5b46d01 into scikit-learn:main Apr 6, 2023

ogrisel deleted the openmp-only-physical-cores-true-by-default branch April 6, 2023 08:28

This was referenced Apr 6, 2023

PERF: models with multithreading being slower than the same model with a single thread #25822

Closed

Multicore scalability of the Histogram-based GBDT #14306

Open

Veghit pushed a commit to Veghit/scikit-learn that referenced this pull request Apr 15, 2023

PERF set openmp to use only physical cores by default (scikit-learn#26082)

97b11d7

ogrisel removed the Waiting for Second Reviewer label Jun 21, 2023

ogrisel mentioned this pull request Jun 22, 2023

HistGradientBoosting Oversubscription UMEssen/BOA-Contrast#2

Closed

ogrisel mentioned this pull request Jul 3, 2023

HistGradientBoostingRegressor is slower when torch not imported #26752

Closed

ogrisel mentioned this pull request Jan 17, 2025

HistGradientBoostingClassifier/Regressor 15x slowdown on small data problems compared to disabled OpenMP threading #30662

Open
