
Performance regression in pairwise_distances with the Euclidean metric on sparse data #26097


Closed
ogrisel opened this issue Apr 5, 2023 · 7 comments · Fixed by #26275

Comments

@ogrisel
Member

ogrisel commented Apr 5, 2023

As spotted by our continuous benchmark suite, there is a more than 2x slowdown in pairwise_distances on sparse input data (for the Euclidean metric).

This happened between b4afbee and fabe160.

Could it be ef5c087?

I'm not sure, because there are many other commits in git log b4afbeee..fabe1606 and I did not review them all.
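
For context, a minimal standalone timing sketch of the operation being benchmarked (the data shape, density and n_jobs below are illustrative assumptions, not the asv benchmark's actual configuration):

# Standalone timing sketch; data shape, density and n_jobs are assumptions.
from time import perf_counter

import scipy.sparse as sp

from sklearn.metrics import pairwise_distances

X = sp.random(10_000, 100, density=0.01, format="csr", random_state=0)

tic = perf_counter()
pairwise_distances(X, metric="euclidean", n_jobs=4)
print(f"sparse euclidean pairwise_distances: {perf_counter() - tic:.2f}s")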

@github-actions github-actions bot added the Needs Triage Issue requires triage label Apr 5, 2023
@ogrisel ogrisel added Performance Regression and removed Needs Triage Issue requires triage labels Apr 5, 2023
@jeremiedbb
Member

I guess it's the same reason as for KMeans.transform, explained in #26100.

@glemaitre
Member

I cannot reproduce the slowdown between 1.2.1 and main:

from benchmarks.common import Benchmark
from benchmarks.metrics import PairwiseDistancesBenchmark

# Uses scikit-learn's asv benchmark suite (the benchmarks package under
# asv_benchmarks/); select the "large" dataset size.
Benchmark.data_size = "large"

# Benchmark parameters: (representation, metric, n_jobs).
params = ("sparse", "euclidean", 4)
bench = PairwiseDistancesBenchmark()
bench.setup(*params)


def time_pairwise_distances():
    bench.time_pairwise_distances()


if __name__ == "__main__":
    time_pairwise_distances()

Here are the two profiles, for 1.2.1 and main respectively; they take roughly the same time overall.

[Screenshots attached: profile_1_2_1 and profile_main]
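
For reference, one way to get a comparable text-based profile of the snippet above (a plain cProfile sketch, not necessarily the tool used for the screenshots):

# Appended to the script above: profile the benchmark call and sort by
# cumulative time; run it once per environment (1.2.1 and main) to compare.
import cProfile

cProfile.run("time_pairwise_distances()", sort="cumtime")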

@jeremiedbb
Member

After some digging, I'm not so sure it's due to the validation in gen_batches. The dimensions of the dataset used in the benchmark are such that gen_batches is called only ~10 times. At ~50µs per call, that should not be visible at all in a 2s run.
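
A quick back-of-the-envelope check with the numbers above (assuming nothing beyond what the comment states):

# Numbers taken from the comment above, not re-measured here.
from sklearn.utils import gen_batches

n_calls, per_call = 10, 50e-6  # ~10 calls at ~50µs each, including validation
print(f"gen_batches overhead ≈ {n_calls * per_call * 1e3:.1f} ms of a ~2 s run")

# gen_batches itself only yields slices covering the n samples:
print(list(gen_batches(10, 4)))  # slice(0, 4), slice(4, 8), slice(8, 10)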

@jeremiedbb
Member

I finally found the commit responsible for the regression: it's de67a44 from PR #25598.

I ran the benchmarks for that commit and the one just before it, which confirms this: https://scikit-learn.org/scikit-learn-benchmarks/#metrics.PairwiseDistancesBenchmark.time_pairwise_distances?p-representation='sparse'&p-metric='euclidean'&p-n_jobs=4&commits=de67a442

The reason for the regression is another occurrence of OpenMathLib/OpenBLAS#3187. We now run row_norms (which uses OpenMP) and a matmul (which uses BLAS) alternately in a loop. The slowdown happens when BLAS and OpenMP don't share the same threadpool, i.e. when the BLAS uses pthreads (e.g. OpenBLAS from PyPI) or a different OpenMP runtime (e.g. MKL uses intel-openmp).

@glemaitre, the reason you couldn't reproduce it is probably that on macOS your install is such that the BLAS and scikit-learn use the same OpenMP library.

Unfortunately, I don't know of any easy fix besides reverting this commit.
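
For anyone trying to tell which situation their environment is in, threadpoolctl (already a scikit-learn dependency) can report the threading layers in use; a small diagnostic sketch:

# Prints one entry per detected threadpool: the BLAS (e.g. openblas, mkl) and
# the OpenMP runtime (e.g. libgomp, libomp, libiomp). If the BLAS entry reports
# a pthreads threading layer, or a different OpenMP library than the one
# scikit-learn is built against, the interference described above can occur.
from pprint import pprint

from threadpoolctl import threadpool_info

pprint(threadpool_info())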

@ogrisel
Member Author

ogrisel commented Apr 21, 2023

Thanks for the investigation. Indeed, I don't see any short-term solution beyond advising our users to install packages from conda-forge. In the longer term, we will need to come back to the possibility of shipping numpy/scipy wheels that somehow rely on an OpenBLAS linked to a specific vendored OpenMP for each supported OS.

@thomasjpfan
Member

I am +1 on reverting for now.

Moving forward, we'll need to be more careful when merging PRs that add OpenMP. Currently the best way to catch the issue is to run the benchmarks with a SciPy installed from PyPI.

@jeremiedbb
Member

An alternative is to make row_norms able to use a float64 accumulator. That way we wouldn't have to call row_norms for each chunk in _euclidean_distances_upcast. I created a branch to try that, but since row_norms is public and used in many places, it requires more work. So I think we should revert for now (I opened #26275) and possibly come back to this alternative later.
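
To make the idea concrete, here is a minimal sketch of a float64-accumulating squared row norms for dense float32 input (a hypothetical helper, not the branch mentioned above; the sparse CSR case would need a dedicated Cython kernel):

import numpy as np


def squared_row_norms_f64(X):
    # Accumulate the squared row norms of a float32 array directly in float64,
    # so they can be computed once for the whole array instead of once per
    # upcast chunk inside _euclidean_distances_upcast.
    return np.einsum("ij,ij->i", X, X, dtype=np.float64)


X32 = np.random.default_rng(0).standard_normal((1_000, 100), dtype=np.float32)
assert squared_row_norms_f64(X32).dtype == np.float64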
