TSNE performance regression in 1.5 #29665


Closed
gagandeep987123 opened this issue Aug 13, 2024 · 8 comments · Fixed by #29694

Comments

@gagandeep987123

Describe the bug

TSNE fitting is significantly slower on the newer version than on 1.3.1 when using n_jobs=25.
version 1.3.1

df = np.random.rand(30000, 3)
tsne = TSNE(n_components=2, random_state=42, n_jobs=25, verbose=10, n_iter=1500)

1.5.1

df = np.random.rand(30000, 3)
tsne = TSNE(n_components=2, random_state=42, n_jobs=25, verbose=10, max_iter=1500)

Fit time, 1.3.1 vs 1.5.1: 59 s vs 223 s

Is this intended behavior?

Steps/Code to Reproduce

1.3.1

import numpy as np
from sklearn.manifold import TSNE

df = np.random.rand(30000, 3)
# n_iter was renamed to max_iter in 1.5
tsne = TSNE(n_components=2, random_state=42, n_jobs=25, verbose=10, n_iter=1500)
tsne.fit_transform(df)

1.5.1

df = np.random.rand(30000, 3)
tsne = TSNE(n_components=2, random_state=42, n_jobs=25, verbose=10, max_iter=1500)
tsne.fit_transform(df)

Expected Results

Minimal time discrepancy

Actual Results

Roughly 4x slower on 1.5.1: 59 s on 1.3.1 vs 223 s on 1.5.1

Versions

1.5.1

System:
    python: 3.12.3 (main, Jul 31 2024, 17:43:48) [GCC 13.2.0]
executable: /home/gagan/PycharmProjects/scikit_tsne_test/.venv/bin/python
   machine: Linux-6.8.0-40-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.5.1
          pip: 23.2.1
   setuptools: 72.2.0
        numpy: 1.26.4
        scipy: 1.14.0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 28
         prefix: libscipy_openblas
       filepath: /home/gagan/PycharmProjects/scikit_tsne_test/.venv/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-c128ec02.so
        version: 0.3.27.dev
threading_layer: pthreads
   architecture: Haswell


1.3.1
System:
    python: 3.12.3 (main, Jul 31 2024, 17:43:48) [GCC 13.2.0]
executable: /home/gagan/PycharmProjects/scikit_tsne_test/.venv/bin/python
   machine: Linux-6.8.0-40-generic-x86_64-with-glibc2.39

Python dependencies:
      sklearn: 1.3.1
          pip: 23.2.1
   setuptools: 72.2.0
        numpy: 1.26.4
        scipy: 1.14.0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 28
         prefix: libopenblas
       filepath: /home/gagan/PycharmProjects/scikit_tsne_test/.venv/lib/python3.12/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Prescott

       user_api: openmp
   internal_api: openmp
    num_threads: 28
         prefix: libgomp
       filepath: /home/gagan/PycharmProjects/scikit_tsne_test/.venv/lib/python3.12/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 28
         prefix: libscipy_openblas
       filepath: /home/gagan/PycharmProjects/scikit_tsne_test/.venv/lib/python3.12/site-packages/scipy.libs/libscipy_openblas-c128ec02.so
        version: 0.3.27.dev
threading_layer: pthreads
   architecture: Haswell
@gagandeep987123 gagandeep987123 added Bug Needs Triage Issue requires triage labels Aug 13, 2024
@adrinjalali
Member

Thanks for the report, I confirm that I can reproduce with:

df = np.random.rand(3000, 3)
%timeit tsne = TSNE(n_components=2, random_state=42, n_jobs=7, verbose=10, n_iter=1500).fit(df)

On 1.3.1:

6.15 s ± 82.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

while on main:

13.2 s ± 160 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Sample run on 1.3.1:

[t-SNE] Computing 91 nearest neighbors...
[t-SNE] Indexed 3000 samples in 0.002s...
[t-SNE] Computed neighbors for 3000 samples in 0.048s...
[t-SNE] Computed conditional probabilities for sample 1000 / 3000
[t-SNE] Computed conditional probabilities for sample 2000 / 3000
[t-SNE] Computed conditional probabilities for sample 3000 / 3000
[t-SNE] Mean sigma: 0.079455
[t-SNE] Computed conditional probabilities in 0.048s
[t-SNE] Iteration 50: error = 74.6297531, gradient norm = 0.0126486 (50 iterations in 0.229s)
[t-SNE] Iteration 100: error = 73.1009979, gradient norm = 0.0013591 (50 iterations in 0.222s)
[t-SNE] Iteration 150: error = 73.0318298, gradient norm = 0.0015761 (50 iterations in 0.215s)
[t-SNE] Iteration 200: error = 72.9756241, gradient norm = 0.0007988 (50 iterations in 0.218s)
[t-SNE] Iteration 250: error = 72.9535065, gradient norm = 0.0003498 (50 iterations in 0.215s)
[t-SNE] KL divergence after 250 iterations with early exaggeration: 72.953506
[t-SNE] Iteration 300: error = 1.9057550, gradient norm = 0.0224387 (50 iterations in 0.216s)
[t-SNE] Iteration 350: error = 1.4685167, gradient norm = 0.0172649 (50 iterations in 0.213s)
[t-SNE] Iteration 400: error = 1.2980621, gradient norm = 0.0145089 (50 iterations in 0.199s)
[t-SNE] Iteration 450: error = 1.2117977, gradient norm = 0.0123560 (50 iterations in 0.198s)
[t-SNE] Iteration 500: error = 1.1618414, gradient norm = 0.0108389 (50 iterations in 0.206s)
[t-SNE] Iteration 550: error = 1.1303933, gradient norm = 0.0094740 (50 iterations in 0.200s)
[t-SNE] Iteration 600: error = 1.1095150, gradient norm = 0.0082941 (50 iterations in 0.194s)
[t-SNE] Iteration 650: error = 1.0954374, gradient norm = 0.0072256 (50 iterations in 0.230s)
[t-SNE] Iteration 700: error = 1.0856805, gradient norm = 0.0059221 (50 iterations in 0.193s)
[t-SNE] Iteration 750: error = 1.0789994, gradient norm = 0.0049726 (50 iterations in 0.217s)
[t-SNE] Iteration 800: error = 1.0746853, gradient norm = 0.0037317 (50 iterations in 0.206s)
[t-SNE] Iteration 850: error = 1.0717297, gradient norm = 0.0026455 (50 iterations in 0.197s)
[t-SNE] Iteration 900: error = 1.0694184, gradient norm = 0.0025572 (50 iterations in 0.191s)
[t-SNE] Iteration 950: error = 1.0672817, gradient norm = 0.0023106 (50 iterations in 0.195s)
[t-SNE] Iteration 1000: error = 1.0657473, gradient norm = 0.0017981 (50 iterations in 0.202s)
[t-SNE] Iteration 1050: error = 1.0644907, gradient norm = 0.0015727 (50 iterations in 0.197s)
[t-SNE] Iteration 1100: error = 1.0635033, gradient norm = 0.0014580 (50 iterations in 0.202s)
[t-SNE] Iteration 1150: error = 1.0625070, gradient norm = 0.0012920 (50 iterations in 0.195s)
[t-SNE] Iteration 1200: error = 1.0616151, gradient norm = 0.0012439 (50 iterations in 0.192s)
[t-SNE] Iteration 1250: error = 1.0609202, gradient norm = 0.0011146 (50 iterations in 0.194s)
[t-SNE] Iteration 1300: error = 1.0601972, gradient norm = 0.0011908 (50 iterations in 0.197s)
[t-SNE] Iteration 1350: error = 1.0595425, gradient norm = 0.0010888 (50 iterations in 0.200s)
[t-SNE] Iteration 1400: error = 1.0588894, gradient norm = 0.0009405 (50 iterations in 0.193s)
[t-SNE] Iteration 1450: error = 1.0583086, gradient norm = 0.0010328 (50 iterations in 0.190s)
[t-SNE] Iteration 1500: error = 1.0576378, gradient norm = 0.0010928 (50 iterations in 0.196s)
[t-SNE] KL divergence after 1500 iterations: 1.057638

Sample run on main:

[t-SNE] Computing 91 nearest neighbors...
[t-SNE] Indexed 3000 samples in 0.001s...
[t-SNE] Computed neighbors for 3000 samples in 0.039s...
[t-SNE] Computed conditional probabilities for sample 1000 / 3000
[t-SNE] Computed conditional probabilities for sample 2000 / 3000
[t-SNE] Computed conditional probabilities for sample 3000 / 3000
[t-SNE] Mean sigma: 0.079133
[t-SNE] Computed conditional probabilities in 0.042s
[t-SNE] Iteration 50: error = 74.6860580, gradient norm = 0.0130361 (50 iterations in 0.561s)
[t-SNE] Iteration 100: error = 73.0622330, gradient norm = 0.0016993 (50 iterations in 0.458s)
[t-SNE] Iteration 150: error = 73.0010300, gradient norm = 0.0008503 (50 iterations in 0.440s)
[t-SNE] Iteration 200: error = 72.9877853, gradient norm = 0.0004468 (50 iterations in 0.447s)
[t-SNE] Iteration 250: error = 72.9845200, gradient norm = 0.0002729 (50 iterations in 0.439s)
[t-SNE] KL divergence after 250 iterations with early exaggeration: 72.984520
[t-SNE] Iteration 300: error = 1.9100612, gradient norm = 0.0219436 (50 iterations in 0.426s)
[t-SNE] Iteration 350: error = 1.4687014, gradient norm = 0.0174686 (50 iterations in 0.410s)
[t-SNE] Iteration 400: error = 1.2948278, gradient norm = 0.0145004 (50 iterations in 0.417s)
[t-SNE] Iteration 450: error = 1.2071151, gradient norm = 0.0125211 (50 iterations in 0.420s)
[t-SNE] Iteration 500: error = 1.1559333, gradient norm = 0.0109722 (50 iterations in 0.416s)
[t-SNE] Iteration 550: error = 1.1242188, gradient norm = 0.0094801 (50 iterations in 0.434s)
[t-SNE] Iteration 600: error = 1.1032512, gradient norm = 0.0085060 (50 iterations in 0.447s)
[t-SNE] Iteration 650: error = 1.0888082, gradient norm = 0.0075305 (50 iterations in 0.433s)
[t-SNE] Iteration 700: error = 1.0787106, gradient norm = 0.0061661 (50 iterations in 0.439s)
[t-SNE] Iteration 750: error = 1.0722656, gradient norm = 0.0045898 (50 iterations in 0.433s)
[t-SNE] Iteration 800: error = 1.0677128, gradient norm = 0.0040846 (50 iterations in 0.434s)
[t-SNE] Iteration 850: error = 1.0642205, gradient norm = 0.0034133 (50 iterations in 0.429s)
[t-SNE] Iteration 900: error = 1.0615629, gradient norm = 0.0030121 (50 iterations in 0.427s)
[t-SNE] Iteration 950: error = 1.0593433, gradient norm = 0.0026399 (50 iterations in 0.430s)
[t-SNE] Iteration 1000: error = 1.0574670, gradient norm = 0.0022615 (50 iterations in 0.438s)
[t-SNE] Iteration 1050: error = 1.0560901, gradient norm = 0.0018045 (50 iterations in 0.433s)
[t-SNE] Iteration 1100: error = 1.0549316, gradient norm = 0.0016540 (50 iterations in 0.427s)
[t-SNE] Iteration 1150: error = 1.0540546, gradient norm = 0.0012580 (50 iterations in 0.447s)
[t-SNE] Iteration 1200: error = 1.0533767, gradient norm = 0.0010741 (50 iterations in 0.434s)
[t-SNE] Iteration 1250: error = 1.0526401, gradient norm = 0.0011667 (50 iterations in 0.433s)
[t-SNE] Iteration 1300: error = 1.0518836, gradient norm = 0.0012524 (50 iterations in 0.424s)
[t-SNE] Iteration 1350: error = 1.0511994, gradient norm = 0.0011868 (50 iterations in 0.434s)
[t-SNE] Iteration 1400: error = 1.0507367, gradient norm = 0.0009460 (50 iterations in 0.430s)
[t-SNE] Iteration 1450: error = 1.0502670, gradient norm = 0.0008885 (50 iterations in 0.426s)
[t-SNE] Iteration 1500: error = 1.0498598, gradient norm = 0.0009467 (50 iterations in 0.432s)
[t-SNE] KL divergence after 1500 iterations: 1.049860
13.2 s ± 160 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

This requires more investigation to figure out where the performance hit is coming from.

@adrinjalali adrinjalali added Performance Regression and removed Bug Needs Triage Issue requires triage labels Aug 14, 2024
@adrinjalali
Member

Looking at the git blame and our related PRs, I'm struggling to find the relevant change causing this regression.

cc @scikit-learn/core-devs for help.

@jjerphan
Member

I would recommend using git-bisect.
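A bisect like this can be automated with `git bisect run` and a small benchmark script. The sketch below is hypothetical: the script name, n_jobs, and the 10-second good/bad threshold are all illustrative and would need tuning to sit between the two timing regimes, and each bisect step also needs the compiled extensions rebuilt.

```python
# bisect_tsne.py (hypothetical name) -- a run-script for:
#   git bisect start <bad-ref> <good-ref>
#   git bisect run python bisect_tsne.py
# git bisect treats exit code 0 as "good" and 1-124 as "bad".
import time

import numpy as np
from sklearn.manifold import TSNE

FAST_THRESHOLD_S = 10.0  # illustrative; pick a value between the fast and slow regimes


def classify(elapsed_s, threshold_s=FAST_THRESHOLD_S):
    """Map a measured fit time to the exit code git bisect expects."""
    return 0 if elapsed_s < threshold_s else 1


def run_benchmark():
    """Time one TSNE fit on random data, mirroring the reproducer above."""
    X = np.random.rand(3000, 3)
    start = time.perf_counter()
    TSNE(n_components=2, random_state=42, n_jobs=7).fit(X)
    return time.perf_counter() - start


# As a script, one would end with: sys.exit(classify(run_benchmark()))
print(classify(5.0), classify(15.0))  # -> 0 1
```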

@adrinjalali
Member

adrinjalali commented Aug 14, 2024

Something did NOT go well down that git bisect path 😆

$ git bisect good                                                                               
0e19a4822ff49951d2a7606444a1a6085c32b56b is the first bad commit
commit 0e19a4822ff49951d2a7606444a1a6085c32b56b
Author: Lucy Liu <jliu176@gmail.com>
Date:   Tue Apr 2 23:07:24 2024 +1100

    DOC Fix typo `LogisticRegressionCV` docstring (#28746)

 sklearn/linear_model/_logistic.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

I'll try again

EDIT: pretty sure it's because I didn't handle meson/setuptools switch well while bisecting.

@jjerphan
Member

Might it be due to the use of meson for the build system, with the compiler flags now being different? We have seen this in the past in another issue.

@adrinjalali
Member

Yep. Compiling the 1.5.1 tag with python setup.py develop runs TSNE fast (11-12 s), while the same tag built with meson takes about 20 s.

cc @lesteve

@lesteve
Member

lesteve commented Aug 19, 2024

After looking at this a bit, it does seem linked to OpenMP: we apparently need to add the openmp_dep to every extension that needs it. For TSNE, for example, the following seems to improve the situation (I need to look a bit more to be 100% sure this fixes the issue):

diff --git a/sklearn/manifold/meson.build b/sklearn/manifold/meson.build
index b112f63dd4..ee83e8afc5 100644
--- a/sklearn/manifold/meson.build
+++ b/sklearn/manifold/meson.build
@@ -9,7 +9,7 @@ py.extension_module(
 py.extension_module(
   '_barnes_hut_tsne',
   '_barnes_hut_tsne.pyx',
-  dependencies: [np_dep],
+  dependencies: [np_dep, openmp_dep],
   cython_args: cython_args,
   subdir: 'sklearn/manifold',
   install: true

My guess right now is that the setuptools build added the OpenMP flags globally, whereas with meson they need to be added explicitly to each extension module that needs them.
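This is consistent with the threadpoolctl info in the report: the 1.3.1 environment lists a libgomp (OpenMP) entry, while the 1.5.1 environment shows only BLAS runtimes. A sketch of how to check for a loaded OpenMP runtime directly, using threadpoolctl's public API (whether a runtime actually appears depends on how the installed build was compiled):

```python
# Check whether an OpenMP runtime (e.g. libgomp) is loaded after importing
# scikit-learn's compiled extensions; a build whose extensions were compiled
# without OpenMP support would show no "openmp" entry here.
import sklearn.manifold  # noqa: F401  -- pulls in the Cython extension modules
from threadpoolctl import threadpool_info

openmp_runtimes = [d for d in threadpool_info() if d.get("user_api") == "openmp"]
print("OpenMP runtimes loaded:", [d.get("prefix") for d in openmp_runtimes])
```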

@lesteve
Member

lesteve commented Aug 20, 2024

I have opened #29694 which I think fixes this issue.

@lesteve lesteve changed the title TSNE efficiency 1.3.1 vs 1.5.1 when using n_jobs TSNE performance regression in 1.5 Aug 22, 2024