Skip to content

TST use global_dtype in sklearn/manifold/tests/test_isomap.py #22673

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Nov 29, 2022

Conversation

jjerphan
Copy link
Member

@jjerphan jjerphan commented Mar 3, 2022

Reference Issues/PRs

Partially addresses #22881
Precedes #22590

What does this implement/fix? Explain your changes.

This parametrizes tests from test_isomap.py to run on 32bit datasets.

Any other comments?

We could introduce a mechanism to be able to able to remove tests' execution on 32bit datasets if this takes too much time to complete.

@jjerphan jjerphan marked this pull request as ready for review March 3, 2022 16:26
@jjerphan jjerphan changed the title TST Adapt test_isomap.py to test implementations on 32bit datasets TST use global_dtype in sklearn/manifold/tests/test_isomap.py Mar 24, 2022
@jeremiedbb
Copy link
Member

jeremiedbb commented Jun 8, 2022

According to some irl discussions, such tests should only be added after Isomap preserves float32 (ref #11000).

Let's keep this PR open in the mean time.

@jjerphan jjerphan added Waiting for Reviewer Waiting for Second Reviewer First reviewer is done, need a second one! and removed Waiting for Reviewer labels Oct 21, 2022
@glemaitre glemaitre self-requested a review November 15, 2022 13:50
Comment on lines 24 to 25
# TODO: Isomap fit/fit_transform must preserve dtype

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can now remove this comment

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done as part of 1d72c03.

jjerphan and others added 2 commits November 16, 2022 08:49
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
@jjerphan jjerphan removed the Waiting for Second Reviewer First reviewer is done, need a second one! label Nov 16, 2022
@jjerphan
Copy link
Member Author

Hum, I do not know why some like (mainly test_different_metric) became unstable for py38_conda_defaults_openblas solely. 🤔

@glemaitre
Copy link
Member

And the differences are huge.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failures only show up in one job because we only enable float32 in this job :)

It turns out that Isomap don't really preserve float32. I found that transform doesn't. The common test only checks fit_transform :( I'll open a PR to extend the common test.

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
@glemaitre
Copy link
Member

Upgrading the branch since we merged the fit.transform check from #24982

@jeremiedbb
Copy link
Member

There's 1 remaining failing test: test_isomap_reconstruction_error.

After investigation, it seems that it comes from the KernelCenterer which is used to compute the reconstruction error. When we compute it from float32 data or float64 data (that are close to some tolerance) the results are not always close to a reasonable tolerance. I think it's due to some kind of catastrophic cancellation. Indeed it's supposed to compute (X-Xmean)(X-Xmean).T without computing the centering explicitly.

I'm not sure what to do here. We could:

  • make KernelCenterer more numerically stable on float32 by up-casting (maybe by chunks). It will make it a bit slower though, but we did it for euclidean_distances.
  • don't run this test on float32 and add a comment explaining why.
  • still run this test on float32 but with a very high tol and add a comment to explain why.
  • something else ?

@glemaitre
Copy link
Member

I think that this is a good candidate for skipping the test in 32 bits.

@jeremiedbb
Copy link
Member

I think that this is a good candidate for skipping the test in 32 bits.

I agree. It only concerns the computation of the reconstruction error. The KernelCenterer is probably not widely used and thus not worth the added complexity and maintenance burden.

@jjerphan, let's just not enable global_dtype for test_isomap_reconstruction_error but add a comment instead explaining that we don't run this test on float32 due to numerical instabilities from the KernelCenterer used in the reconstruction_error method.

Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jeremiedbb jeremiedbb merged commit 35b826a into scikit-learn:main Nov 29, 2022
@jjerphan jjerphan deleted the tst/test_isomap-32bit branch November 29, 2022 14:25
@jjerphan jjerphan restored the tst/test_isomap-32bit branch December 2, 2022 15:09
@jjerphan jjerphan deleted the tst/test_isomap-32bit branch December 2, 2022 15:09
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
…-learn#22673)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 20, 2023
…-learn#22673)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
jjerphan added a commit to jjerphan/scikit-learn that referenced this pull request Jan 23, 2023
…-learn#22673)

Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
adrinjalali pushed a commit that referenced this pull request Jan 24, 2023
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Jérémie du Boisberranger <jeremiedbb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants