ENH Preserving dtypes for ICA #22806
Conversation
@JihaneBennis Thanks for the PR. Can you add a test in
LGTM, thanks indeed for the PR and the detective work to identify the culprit line.
A new test for checking the dtypes of specific fitted attributes of the model would indeed be welcome as @jeremiedbb suggested above.
I added the two unit tests @jeremiedbb wanted. I will follow up with more tests that leverage our new fixtures. I think they have detected a numerical stability problem with float32 data.
Please also add a what's new entry.
rng = np.random.RandomState(0)
# XXX: casting X with `.astype(global_dtype)` reveals a rare numerical
# stability problem of FastICA with 32 bit data. It can trigger
# NaN values somewhere in the computation, only for some random seeds.
Note to reviewers: I did not enable global_dtype on this test because that would make it fail for some random seeds by revealing a numerical stability problem of our FastICA implementation on 32 bit data.
I am investigating a bit, and this can happen for instance by setting SKLEARN_TESTS_GLOBAL_RANDOM_SEED=60 and float32.
But I noticed that the model does not converge in float64 for that same seed, while it generally does converge for other seeds.
I fixed the NaN problem in 4464f9d as explained in #22806 (comment).
# XXX: furthermore, even when we do not hit the NaN issue, the
# assert_allclose can only pass with rtol values significantly larger than
# the usual 1e-4 we expect for float32 data.
Similar comment here. The NaNs can appear at fit time, but even when that's not the case, rtol would need to be large (larger than 1e-1) for some seeds.
I think we can safely say that our implementation is not numerically stable enough to work with float32 data without upcasting at this point.
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
sklearn/decomposition/_fastica.py
@@ -54,6 +54,10 @@ def _sym_decorrelation(W):
    i.e. W <- (W * W.T) ^{-1/2} * W
    """
    s, u = linalg.eigh(np.dot(W, W.T))
    # avoid sqrt of negative values or division by zero because of rounding
    # errors:
    s = np.clip(s, a_min=np.finfo(W.dtype).eps, a_max=None)
By extending the existing tests to use the global_dtype and global_random_seed fixtures, I found out that this part of the code could break quite easily with negative eigenvalues. This fixes it and makes the test much more stable. However, I still found fishy things that I marked with XXX comments in the diff of this PR.
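The failure mode can be reproduced with a minimal numpy-only sketch of the clipped decorrelation step. The matrix W below is hypothetical (not taken from the PR), and np.linalg.eigh is used in place of scipy.linalg.eigh to keep the example dependency-free:

```python
import numpy as np

# Hypothetical example: two nearly identical rows make W @ W.T close to
# singular, so eigh can return a smallest eigenvalue that is zero or
# slightly negative due to float32 rounding errors.
rng = np.random.RandomState(0)
w = rng.randn(1, 4).astype(np.float32)
W = np.vstack([w, w + np.float32(1e-4) * rng.randn(1, 4).astype(np.float32)])

# Symmetric decorrelation: W <- (W @ W.T)^{-1/2} @ W
s, u = np.linalg.eigh(np.dot(W, W.T))
# Without this clip, a tiny negative eigenvalue would turn np.sqrt(s)
# into NaN and poison the rest of the fit:
s = np.clip(s, a_min=np.finfo(W.dtype).eps, a_max=None)
W_decorrelated = np.linalg.multi_dot([u * (1.0 / np.sqrt(s)), u.T, W])

assert np.all(np.isfinite(W_decorrelated))
```

Note that a later commit in this PR changed the clip threshold from eps to np.finfo(W.dtype).tiny, with a comment explaining that np.sqrt(tiny) is larger than tiny, so the clip also guards the subsequent division.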
warnings.simplefilter("error", RuntimeWarning)
# XXX: for some seeds, the model does not converge. However this is not
# what we test here.
warnings.simplefilter("ignore", ConvergenceWarning)
I have tried to set max_iter to very large values (e.g. 10_000) and for some values of global_random_seed this will never pass. I think the problem is not float32 specific though. It should be investigated in a dedicated PR.
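The warning-filter pattern quoted above can be sketched in isolation: unexpected RuntimeWarnings (e.g. invalid values turning into NaN) are escalated to errors, while a known, unrelated warning category is silenced. In the real test the ignored category is sklearn's ConvergenceWarning; a plain UserWarning stands in here so the sketch has no sklearn dependency:

```python
import warnings
import numpy as np

with warnings.catch_warnings():
    # Turn any RuntimeWarning (e.g. "invalid value encountered") into an
    # error so that NaNs cannot slip through the computation silently:
    warnings.simplefilter("error", RuntimeWarning)
    # Silence the known, unrelated warning category (ConvergenceWarning
    # in the actual test; UserWarning here as a stand-in):
    warnings.simplefilter("ignore", UserWarning)
    warnings.warn("model did not converge", UserWarning)  # silently ignored
    value = np.sqrt(np.float64(2.0))  # no RuntimeWarning, so no error
```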
I am not sure this PR is mergeable in its current state. While it does fix the most serious numerical stability problem found with the new fixtures, the remaining To move forward we probably need to use the
A macOS build was stalled in the previous commit: I pushed a new commit to see if it can be reproduced.
assert_array_almost_equal(Xt, Xt2)
# XXX: we have to set atol for this test to pass for all seeds when
# fitting with float32 data. Is this revealing a bug?
assert_allclose(Xt, Xt2, atol=1e-6 if global_dtype == np.float32 else 0.0)
I noticed that if we use Student-t distributed data instead of uniformly distributed data in this test, then it's possible to make the test pass with rtol=1e-3 (but not rtol=1e-4) and with a strict atol=0.0.
Also, I have checked that there are no exact zeros in Xt2 when the assertion with the default tols fails (both with uniform and Student-t data).
I do not think it's a bug in FastICA, but a numerical consideration based on the generated datasets.
X values are floats in [0, 1]. If we change them to have a larger range (e.g. to be in [0, 100]), for instance:
diff --git a/sklearn/decomposition/tests/test_fastica.py b/sklearn/decomposition/tests/test_fastica.py
index f352a126f1..38f87caa06 100644
--- a/sklearn/decomposition/tests/test_fastica.py
+++ b/sklearn/decomposition/tests/test_fastica.py
@@ -321,7 +321,7 @@ def test_inverse_transform(
# Test FastICA.inverse_transform
n_samples = 100
rng = np.random.RandomState(global_random_seed)
- X = rng.random_sample((n_samples, 10)).astype(global_dtype)
+ X = 100 * rng.random_sample((n_samples, 10)).astype(global_dtype)
ica = FastICA(n_components=n_components, random_state=rng, whiten=whiten)
with warnings.catch_warnings():
@@ -338,7 +338,7 @@ def test_inverse_transform(
if n_components == X.shape[1]:
# XXX: we have to set atol for this test to pass for all seeds when
# fitting with float32 data. Is this is a revealing a bug?
- assert_allclose(X, X2, atol=1e-6 if global_dtype == np.float32 else 0.0)
+ assert_allclose(X, X2)
# FIXME remove filter in 1.3
then the composition does not suffer from numerical error, and hence all tests pass with:
SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" SKLEARN_RUN_FLOAT32_TESTS=1 pytest sklearn/decomposition/tests/test_fastica.py -k test_fit_transform
To me, this case shows that atol really should depend on the way the data is generated. Personally, I do not want to go down this rabbit hole.
To me, this case shows that atol really should depend on the way data is generated. Personally, I do not want to go down this rabbit hole.
atol does depend on the data! It basically means "I consider elements less than atol to be zero". If you have data with an average magnitude of 1, you might want to set atol to 1e-5 or something like that. If you have data with an average magnitude of 1e-12 (M/EEG data for instance), then you want to set atol to 1e-17 or so.
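That rule of thumb can be sketched with a dependency-free example (the data and tolerances here are illustrative, not taken from the test suite): atol is an absolute threshold, so deriving it from the typical magnitude of the data makes the check robust to rescaling.

```python
import numpy as np
from numpy.testing import assert_allclose

rng = np.random.RandomState(0)
X = rng.random_sample((100, 10)).astype(np.float32)  # entries in [0, 1]

# Simulate a round-trip error of a couple of float32 ulps per entry:
eps = np.float32(np.finfo(np.float32).eps)
X_roundtrip = X * (np.float32(1.0) + np.float32(2.0) * eps)

# Deriving atol from the data magnitude keeps it proportionate:
atol = np.abs(X).mean() / 1e6
assert_allclose(X_roundtrip, X, atol=atol)

# The same rule keeps working after rescaling the data by 100, because
# both the error and the derived atol scale together:
X_big = np.float32(100.0) * X
assert_allclose(X_big * (np.float32(1.0) + np.float32(2.0) * eps), X_big,
                atol=np.abs(X_big).mean() / 1e6)
```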
Apparently this is a non-reproducible event.
Some comments on those numerical edge cases.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
… only tested for float64 and float32
    atol = np.abs(Xt2).mean() / 1e6
else:
    atol = 0.0  # the default rtol is enough for float64 data
assert_allclose(Xt, Xt2, atol=atol)
atol is now set based on the magnitude of Xt2 to make this test more robust to scale changes.
Ok, I think this PR is in good enough shape. Numerical stability of this method is probably not optimal but not catastrophic either, so I think it's good to enable
I think this is ready for review.
LGTM.
Should test_non_square_fastica also be parametrised with the new fixtures?
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jeremiedbb any opinion on this PR? My +1 does not really count anymore because I contributed too much code to this PR :)
not the most enthusiastic approve but okay :)
    i.e. W <- (W * W.T) ^{-1/2} * W
    """
    s, u = linalg.eigh(np.dot(W, W.T))
    # Avoid sqrt of negative values because of rounding errors. Note that
    # np.sqrt(tiny) is larger than tiny and therefore this clipping also
    # prevents division by zero in the next step.
    s = np.clip(s, a_min=np.finfo(W.dtype).tiny, a_max=None)
This operation is only defined when W @ W.T has no 0 eigenvalue. At some point we should read the ref and find out what to do in that case. In the meantime I think this is acceptable.
def test_fastica_attributes_dtypes(global_dtype):
    rng = np.random.RandomState(0)
    X = rng.random_sample((100, 10)).astype(global_dtype, copy=False)
    fica = FastICA(
        n_components=5, max_iter=1000, whiten="unit-variance", random_state=0
    ).fit(X)
    assert fica.components_.dtype == global_dtype
Mixing the preserve_dtypes tag with the global_dtype fixture is a bit brittle. If we decide to extend the fixture to, let's say, float16 at some point, this test will break.
I don't think we plan to do that, so I'm ok to leave it as is, but it worries me a little bit.
Merged!
Thank you very much for the contrib @JihaneBennis!
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Reference Issues/PRs
#11000
What does this implement/fix? Explain your changes.
Preserving the type of X in the output (float32 -> float32)
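The kind of bug this fixes can be sketched with a minimal, hypothetical example (the operations below are illustrative, not the actual FastICA code): a single float64 intermediate is enough to silently upcast a float32 pipeline, while dtype-aware code preserves the input precision end to end.

```python
import numpy as np

X = np.random.RandomState(0).random_sample((100, 10)).astype(np.float32)

# A single float64 intermediate array silently upcasts everything
# downstream -- the classic way a float32 input becomes float64 output:
centered_bad = X - X.mean(axis=0, dtype=np.float64)
assert centered_bad.dtype == np.float64

# Keeping intermediates in the input dtype preserves float32 throughout,
# which is what this PR enforces for FastICA's fitted attributes and
# transform output:
centered_good = X - X.mean(axis=0)
assert centered_good.dtype == np.float32
```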
Any other comments?
#pariswimlds