TST Fix test_truncated_svd.py::test_explained_variance_components_10_20 #14178

rth · 2019-06-24T19:07:07Z

test_truncated_svd.py::test_explained_variance_components_10_20 is occasionally failing on master in one job since #14140 was merged with,

   def test_explained_variance_components_10_20(X_sparse, kind, solver):
        X = X_sparse if kind == 'sparse' else X_sparse.toarray()
        svd_10 = TruncatedSVD(10, algorithm=solver).fit(X)
        svd_20 = TruncatedSVD(20, algorithm=solver).fit(X)
    
        # Assert the 1st component is equal
        assert_allclose(
            svd_10.explained_variance_ratio_,
            svd_20.explained_variance_ratio_[:10],
>           rtol=3e-3,
        )
E       AssertionError: 
E       Not equal to tolerance rtol=0.003, atol=0
E       
E       (mismatch 10.0%)
E        x: array([ 0.073484,  0.059621,  0.059073,  0.056352,  0.054605,  0.05335 ,
E               0.04774 ,  0.04553 ,  0.042884,  0.040547])
E        y: array([ 0.073484,  0.059609,  0.059087,  0.056357,  0.054615,  0.05333 ,
E               0.047748,  0.045528,  0.042904,  0.040399])

this previously it was an assert_array_almost_equal(..., decimal=5) which given the scale of the data should correspond approximately assert_allclose(..., rtol=a few 1e-3).

That's the annoying part of moving from assert_array_almost_equal to assert_allclose but I still think it's worth it to understand the actual tolerance used in tests.

cc @thomasjpfan

thomasjpfan · 2019-06-24T20:32:41Z

Would assert_array_almost_equal(..., decimal=5) be the most similar to assert_allclose(..., atol=1.5e-5)?

rth · 2019-06-24T21:11:11Z

Would assert_array_almost_equal(..., decimal=5) be the most similar to assert_allclose(..., atol=1.5e-5)?

It would be equivalent, but that is not recommended for the same reason assert_array_almost_equal is not recommended. Comparing with an absolute tolerance only works if one knows the scale of the data (and is able to compensate for it). Often, a casual reader would not know the scale of the data (e.g. here before I printed those values in the failed test). The risk is then of false positives e.g. if the data has a scale of 1e-5 or lower and we use a tolerance of 1.5e-5, the test will always pass without the user realizing that the test doesn't enforce anything.

For that reason, it's better to use rtol instead, except when compared with 0 (cf PEP 485). In our case it means changing atol to rtol, there is a risk of test failure if the threshold is too low, but it's also an opportunity to check that the tests are behaving as expected.

For instance, I do hope that the data we are comparing with in the following tests is larger than 0.1 (or at least of the order of 1.0),

sklearn/decomposition/tests/test_nmf.py
198:        assert_array_almost_equal(A_fit_tr, A_tr, decimal=1)

sklearn/linear_model/tests/test_least_angle.py
356:    assert_array_almost_equal(lars_coef, lasso_coef2, decimal=1)

sklearn/linear_model/tests/test_logistic.py
1636:    assert_array_almost_equal(sgd.coef_, log.coef_, decimal=1)
1652:        assert_array_almost_equal(coefs[0], coefs[1], decimal=1)
1654:        assert_array_almost_equal(coefs[0], coefs[2], decimal=1)
1656:        assert_array_almost_equal(coefs[1], coefs[2], decimal=1)

sklearn/mixture/tests/test_gaussian_mixture.py
1020:        assert_array_almost_equal(gmm.means_, means_s, decimal=1)

sklearn/svm/tests/test_sparse.py
234:    assert_array_almost_equal(clf.coef_, sp_clf.coef_, decimal=1)
235:    assert_array_almost_equal(clf.intercept_, sp_clf.intercept_, decimal=1)

probably, but hard to be sure just by looking at it.

rth · 2019-06-24T21:22:46Z

Also related #13977

…it-learn#14178)

TXT Fix test_truncated_svd.py::test_explained_variance_components_10_20

a99dc88

rth changed the title ~~TXT Fix test_truncated_svd.py::test_explained_variance_components_10_20~~ TST Fix test_truncated_svd.py::test_explained_variance_components_10_20 Jun 24, 2019

Re-trigger CI

0bca8e2

rth mentioned this pull request Jun 24, 2019

TST refactor test for PCA #14138

Merged

thomasjpfan approved these changes Jun 24, 2019

View reviewed changes

thomasjpfan merged commit a413f88 into scikit-learn:master Jun 24, 2019

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

TST Fix test_truncated_svd.py::test_explained_variance_compone… (scik…

4e65879

…it-learn#14178)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

TST Fix test_truncated_svd.py::test_explained_variance_components_10_20 #14178

TST Fix test_truncated_svd.py::test_explained_variance_components_10_20 #14178

Uh oh!

rth commented Jun 24, 2019 •

edited

Loading

Uh oh!

thomasjpfan commented Jun 24, 2019

Uh oh!

rth commented Jun 24, 2019 •

edited

Loading

Uh oh!

rth commented Jun 24, 2019

Uh oh!

Uh oh!

Uh oh!

TST Fix test_truncated_svd.py::test_explained_variance_components_10_20 #14178

TST Fix test_truncated_svd.py::test_explained_variance_components_10_20 #14178

Uh oh!

Conversation

rth commented Jun 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

thomasjpfan commented Jun 24, 2019

Uh oh!

rth commented Jun 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rth commented Jun 24, 2019

Uh oh!

Uh oh!

rth commented Jun 24, 2019 •

edited

Loading

rth commented Jun 24, 2019 •

edited

Loading