Skip to content

TST Fix test_truncated_svd.py::test_explained_variance_components_10_20 #14178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

rth
Copy link
Member

@rth rth commented Jun 24, 2019

test_truncated_svd.py::test_explained_variance_components_10_20 is occasionally failing on master in one job since #14140 was merged with,

   def test_explained_variance_components_10_20(X_sparse, kind, solver):
        X = X_sparse if kind == 'sparse' else X_sparse.toarray()
        svd_10 = TruncatedSVD(10, algorithm=solver).fit(X)
        svd_20 = TruncatedSVD(20, algorithm=solver).fit(X)
    
        # Assert the 1st component is equal
        assert_allclose(
            svd_10.explained_variance_ratio_,
            svd_20.explained_variance_ratio_[:10],
>           rtol=3e-3,
        )
E       AssertionError: 
E       Not equal to tolerance rtol=0.003, atol=0
E       
E       (mismatch 10.0%)
E        x: array([ 0.073484,  0.059621,  0.059073,  0.056352,  0.054605,  0.05335 ,
E               0.04774 ,  0.04553 ,  0.042884,  0.040547])
E        y: array([ 0.073484,  0.059609,  0.059087,  0.056357,  0.054615,  0.05333 ,
E               0.047748,  0.045528,  0.042904,  0.040399])

this previously it was an assert_array_almost_equal(..., decimal=5) which given the scale of the data should correspond approximately assert_allclose(..., rtol=a few 1e-3).

That's the annoying part of moving from assert_array_almost_equal to assert_allclose but I still think it's worth it to understand the actual tolerance used in tests.

cc @thomasjpfan

@rth rth changed the title TXT Fix test_truncated_svd.py::test_explained_variance_components_10_20 TST Fix test_truncated_svd.py::test_explained_variance_components_10_20 Jun 24, 2019
@thomasjpfan
Copy link
Member

Would assert_array_almost_equal(..., decimal=5) be the most similar to assert_allclose(..., atol=1.5e-5)?

@rth
Copy link
Member Author

rth commented Jun 24, 2019

Would assert_array_almost_equal(..., decimal=5) be the most similar to assert_allclose(..., atol=1.5e-5)?

It would be equivalent, but that is not recommended for the same reason assert_array_almost_equal is not recommended. Comparing with an absolute tolerance only works if one knows the scale of the data (and is able to compensate for it). Often, a casual reader would not know the scale of the data (e.g. here before I printed those values in the failed test). The risk is then of false positives e.g. if the data has a scale of 1e-5 or lower and we use a tolerance of 1.5e-5, the test will always pass without the user realizing that the test doesn't enforce anything.

For that reason, it's better to use rtol instead, except when compared with 0 (cf PEP 485). In our case it means changing atol to rtol, there is a risk of test failure if the threshold is too low, but it's also an opportunity to check that the tests are behaving as expected.

For instance, I do hope that the data we are comparing with in the following tests is larger than 0.1 (or at least of the order of 1.0),

sklearn/decomposition/tests/test_nmf.py
198:        assert_array_almost_equal(A_fit_tr, A_tr, decimal=1)

sklearn/linear_model/tests/test_least_angle.py
356:    assert_array_almost_equal(lars_coef, lasso_coef2, decimal=1)

sklearn/linear_model/tests/test_logistic.py
1636:    assert_array_almost_equal(sgd.coef_, log.coef_, decimal=1)
1652:        assert_array_almost_equal(coefs[0], coefs[1], decimal=1)
1654:        assert_array_almost_equal(coefs[0], coefs[2], decimal=1)
1656:        assert_array_almost_equal(coefs[1], coefs[2], decimal=1)

sklearn/mixture/tests/test_gaussian_mixture.py
1020:        assert_array_almost_equal(gmm.means_, means_s, decimal=1)

sklearn/svm/tests/test_sparse.py
234:    assert_array_almost_equal(clf.coef_, sp_clf.coef_, decimal=1)
235:    assert_array_almost_equal(clf.intercept_, sp_clf.intercept_, decimal=1)

probably, but hard to be sure just by looking at it.

@rth rth mentioned this pull request Jun 24, 2019
@rth
Copy link
Member Author

rth commented Jun 24, 2019

Also related #13977

@thomasjpfan thomasjpfan merged commit a413f88 into scikit-learn:master Jun 24, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants