ENH Preserving dtypes for ICA #22806
Conversation
@JihaneBennis Thanks for the PR. Can you add a test in
LGTM, thanks indeed for the PR and the detective work to identify the culprit line.
A new test for checking the dtypes of specific fitted attributes of the model would indeed be welcome as @jeremiedbb suggested above.
I added the two unit tests @jeremiedbb wanted. I will follow up with more tests that leverage our new fixtures. I think they have detected a numerical stability problem with float32 data.
Please also add a what's new entry.
rng = np.random.RandomState(0)
# XXX: casting X with `.astype(global_dtype)` reveals a rare numerical
# stability problem of FastICA with 32 bit data. It can trigger
# NaN values somewhere in the computation, only for some random seeds.
Note to reviewers: I did not enable global_dtype on this test because that would make it fail for some random seeds by revealing a numerical stability problem of our FastICA implementation on 32 bit data.
I am investigating a bit, and this can happen for instance by setting SKLEARN_TESTS_GLOBAL_RANDOM_SEED=60 and float32.
But I noticed that the model does not converge in float64 for that same seed, while it generally does converge for other seeds.
I fixed the NaN problem in 4464f9d as explained in #22806 (comment).
# XXX: furthermore, even when we do not hit the NaN issue, the
# assert_allclose can only pass with rtol values significantly larger than
# the usual 1e-4 we expect for float32 data.
Similar comment here. The NaNs can appear at fit time, but even when that's not the case, rtol would need to be large (larger than 1e-1) for some seeds.
I think we can safely say that our implementation is not numerically stable enough to work with float32 data without upcasting at this point.
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
sklearn/decomposition/_fastica.py
@@ -54,6 +54,10 @@ def _sym_decorrelation(W):
    i.e. W <- (W * W.T) ^{-1/2} * W
    """
    s, u = linalg.eigh(np.dot(W, W.T))
    # avoid sqrt of negative values or division by zero because of rounding
    # errors:
    s = np.clip(s, a_min=np.finfo(W.dtype).eps, a_max=None)
By extending the existing tests to use the global_dtype and global_random_seed fixtures, I found out that this part of the code could break quite easily with negative eigenvalues. This fixes it and makes the test much more stable. However, I still found fishy things that I marked with XXX comments in the diff of this PR.
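The failure mode can be reproduced with a minimal numpy-only sketch of the clipped decorrelation step. The matrix W below is hypothetical (not taken from the PR), and np.linalg.eigh is used in place of scipy.linalg.eigh to keep the example dependency-free:

```python
import numpy as np

# Hypothetical example: two nearly identical rows make W @ W.T close to
# singular, so eigh can return a smallest eigenvalue that is zero or
# slightly negative due to float32 rounding errors.
rng = np.random.RandomState(0)
w = rng.randn(1, 4).astype(np.float32)
W = np.vstack([w, w + np.float32(1e-4) * rng.randn(1, 4).astype(np.float32)])

# Symmetric decorrelation: W <- (W @ W.T)^{-1/2} @ W
s, u = np.linalg.eigh(np.dot(W, W.T))
# Without this clip, a tiny negative eigenvalue would turn np.sqrt(s)
# into NaN and poison the rest of the fit:
s = np.clip(s, a_min=np.finfo(W.dtype).eps, a_max=None)
W_decorrelated = np.linalg.multi_dot([u * (1.0 / np.sqrt(s)), u.T, W])

assert np.all(np.isfinite(W_decorrelated))
```

Note that a later commit in this PR changed the clip threshold from eps to np.finfo(W.dtype).tiny, with a comment explaining that np.sqrt(tiny) is larger than tiny, so the clip also guards the subsequent division.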
warnings.simplefilter("error", RuntimeWarning)
# XXX: for some seeds, the model does not converge. However this is not
# what we test here.
warnings.simplefilter("ignore", ConvergenceWarning)
I have tried to set max_iter to very large values (e.g. 10_000) and for some values of global_random_seed this will never pass. I think the problem is not float32 specific though. It should be investigated in a dedicated PR.
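The warning-filter pattern quoted above can be sketched in isolation: unexpected RuntimeWarnings (e.g. invalid values turning into NaN) are escalated to errors, while a known, unrelated warning category is silenced. In the real test the ignored category is sklearn's ConvergenceWarning; a plain UserWarning stands in here so the sketch has no sklearn dependency:

```python
import warnings
import numpy as np

with warnings.catch_warnings():
    # Turn any RuntimeWarning (e.g. "invalid value encountered") into an
    # error so that NaNs cannot slip through the computation silently:
    warnings.simplefilter("error", RuntimeWarning)
    # Silence the known, unrelated warning category (ConvergenceWarning
    # in the actual test; UserWarning here as a stand-in):
    warnings.simplefilter("ignore", UserWarning)
    warnings.warn("model did not converge", UserWarning)  # silently ignored
    value = np.sqrt(np.float64(2.0))  # no RuntimeWarning, so no error
```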
I am not sure this PR is mergeable in its current state. While it does fix the most serious numerical stability problem found with the new fixtures, the remaining To move forward we probably need to use the
A macOS build was stalled in the previous commit: I pushed a new commit to see if it can be reproduced.
assert_array_almost_equal(Xt, Xt2)
# XXX: we have to set atol for this test to pass for all seeds when
# fitting with float32 data. Is this revealing a bug?
assert_allclose(Xt, Xt2, atol=1e-6 if global_dtype == np.float32 else 0.0)
I noticed that if we use Student-t distributed data instead of uniformly distributed data in this test, then it's possible to make the test pass with rtol=1e-3 (but not rtol=1e-4) and with a strict atol=0.0.
Also, I have checked that there are no exact zeros in Xt2 when the assertion with the default tols fails (both with uniform and Student-t data).
I do not think it's a bug in FastICA, but a numerical consideration based on the generated datasets.
X values are floats in [0, 1]. If we change them to have a larger range (e.g. to be in [0, 100]), for instance:
diff --git a/sklearn/decomposition/tests/test_fastica.py b/sklearn/decomposition/tests/test_fastica.py
index f352a126f1..38f87caa06 100644
--- a/sklearn/decomposition/tests/test_fastica.py
+++ b/sklearn/decomposition/tests/test_fastica.py
@@ -321,7 +321,7 @@ def test_inverse_transform(
# Test FastICA.inverse_transform
n_samples = 100
rng = np.random.RandomState(global_random_seed)
- X = rng.random_sample((n_samples, 10)).astype(global_dtype)
+ X = 100 * rng.random_sample((n_samples, 10)).astype(global_dtype)
ica = FastICA(n_components=n_components, random_state=rng, whiten=whiten)
with warnings.catch_warnings():
@@ -338,7 +338,7 @@ def test_inverse_transform(
if n_components == X.shape[1]:
# XXX: we have to set atol for this test to pass for all seeds when
# fitting with float32 data. Is this is a revealing a bug?
- assert_allclose(X, X2, atol=1e-6 if global_dtype == np.float32 else 0.0)
+ assert_allclose(X, X2)
# FIXME remove filter in 1.3
then the composition does not suffer from numerical error, and hence all tests pass with:
SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" SKLEARN_RUN_FLOAT32_TESTS=1 pytest sklearn/decomposition/tests/test_fastica.py -k test_fit_transform
To me, this case shows that atol really should depend on the way the data is generated. Personally, I do not want to go down this rabbit hole.
To me, this case shows that atol really should depend on the way data is generated. Personally, I do not want to go down this rabbit hole.
atol does depend on the data! It basically means "I consider elements less than atol to be zero". If you have data with an average magnitude of 1, you might want to set atol to 1e-5 or something like that. If you have data with an average magnitude of 1e-12 (M/EEG data for instance), then you want to set atol to 1e-17 or so.
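That rule of thumb can be sketched with a dependency-free example (the data and tolerances here are illustrative, not taken from the test suite): atol is an absolute threshold, so deriving it from the typical magnitude of the data makes the check robust to rescaling.

```python
import numpy as np
from numpy.testing import assert_allclose

rng = np.random.RandomState(0)
X = rng.random_sample((100, 10)).astype(np.float32)  # entries in [0, 1]

# Simulate a round-trip error of a couple of float32 ulps per entry:
eps = np.float32(np.finfo(np.float32).eps)
X_roundtrip = X * (np.float32(1.0) + np.float32(2.0) * eps)

# Deriving atol from the data magnitude keeps it proportionate:
atol = np.abs(X).mean() / 1e6
assert_allclose(X_roundtrip, X, atol=atol)

# The same rule keeps working after rescaling the data by 100, because
# both the error and the derived atol scale together:
X_big = np.float32(100.0) * X
assert_allclose(X_big * (np.float32(1.0) + np.float32(2.0) * eps), X_big,
                atol=np.abs(X_big).mean() / 1e6)
```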
Apparently this is a non-reproducible event.
Some comments on those numerical edge cases.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
… only tested for float64 and float32
    atol = np.abs(Xt2).mean() / 1e6
else:
    atol = 0.0  # the default rtol is enough for float64 data
assert_allclose(Xt, Xt2, atol=atol)
atol is now set based on the magnitude of Xt2 to make this test more robust to scale changes.
Ok, I think this PR is in good enough shape. Numerical stability of this method is probably not optimal but not catastrophic either, so I think it's good to enable
I think this is ready for review.
LGTM.
Should test_non_square_fastica also be parametrised with the new fixtures?
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jeremiedbb any opinion on this PR? My +1 does not really count anymore because I contributed too much code to this PR :)
not the most enthusiastic approve but okay :)
    i.e. W <- (W * W.T) ^{-1/2} * W
    """
    s, u = linalg.eigh(np.dot(W, W.T))
    # Avoid sqrt of negative values because of rounding errors. Note that
    # np.sqrt(tiny) is larger than tiny and therefore this clipping also
    # prevents division by zero in the next step.
    s = np.clip(s, a_min=np.finfo(W.dtype).tiny, a_max=None)
This operation is only defined when W @ W.T has no 0 eigenvalue. At some point we should read the ref and find out what to do in that case. In the meantime I think this is acceptable.
def test_fastica_attributes_dtypes(global_dtype):
    rng = np.random.RandomState(0)
    X = rng.random_sample((100, 10)).astype(global_dtype, copy=False)
    fica = FastICA(
        n_components=5, max_iter=1000, whiten="unit-variance", random_state=0
    ).fit(X)
    assert fica.components_.dtype == global_dtype
Mixing the preserve_dtypes tag with the global_dtype fixture is a bit brittle. If we decide to extend the fixture to, let's say, float16 at some point, this test will break.
I don't think we plan to do that, so I'm ok to leave it as is, but it worries me a little bit.
Merged!
Thank you very much for the contrib @JihaneBennis!
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
#11000
What does this implement/fix? Explain your changes.
Preserving the type of X in the output (float32 -> float32)
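The kind of bug this fixes can be sketched with a minimal, hypothetical example (the operations below are illustrative, not the actual FastICA code): a single float64 intermediate is enough to silently upcast a float32 pipeline, while dtype-aware code preserves the input precision end to end.

```python
import numpy as np

X = np.random.RandomState(0).random_sample((100, 10)).astype(np.float32)

# A single float64 intermediate array silently upcasts everything
# downstream -- the classic way a float32 input becomes float64 output:
centered_bad = X - X.mean(axis=0, dtype=np.float64)
assert centered_bad.dtype == np.float64

# Keeping intermediates in the input dtype preserves float32 throughout,
# which is what this PR enforces for FastICA's fitted attributes and
# transform output:
centered_good = X - X.mean(axis=0)
assert centered_good.dtype == np.float32
```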
Any other comments?
#pariswimlds