
TST refactor test_truncated_svd #14140


Merged: 7 commits into scikit-learn:master on Jun 24, 2019

Conversation

@rth (Member) commented on Jun 21, 2019:

Similarly to #14138, this refactors test_truncated_svd to parametrize the tests and avoid assert_array_almost_equal.

This should help with adding a new solver in #12319.

Should be fairly easy to review.

@rth requested a review from glemaitre on June 21, 2019, 16:49.
@rth (Member, Author) left a comment:

A few explanations:



-def test_algorithms():
+@pytest.mark.parametrize("algorithm", ['randomized'])
@rth (Member, Author):

This would help once we add the lobpcg solver.
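
As a sketch, the parametrization could later grow along these lines; the 'lobpcg' entry is hypothetical until #12319 lands:

import pytest

@pytest.mark.parametrize("algorithm", ['randomized', 'lobpcg'])
def test_algorithms(algorithm):
    # the existing test body runs once per solver;
    # 'lobpcg' is a hypothetical future entry (see #12319)
    ...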

-assert_array_almost_equal(apca.singular_values_, rpca.singular_values_, 12)
+pca = TruncatedSVD(n_components=2, algorithm='randomized',
+                   random_state=rng).fit(X)
+assert_allclose(apca.singular_values_, pca.singular_values_, rtol=1e-2)
@rth (Member, Author):

This fails with a tighter relative tolerance, which I assume means that the singular values are very tiny.

@rth (Member, Author):

No, they are not tiny:

>>> apca.singular_values_
[17.57446934 17.47000919]
>>> pca.singular_values_
[17.52238412 17.3596697 ]

but they are certainly not equal to within a 1e-12 relative tolerance.

This was previously passing because, a few lines above, we were comparing the arpack solver against itself by mistake.
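
A sketch of the previous bug, reconstructed from the discussion (the data and seed here are illustrative, not the test's actual values):

import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.decomposition import TruncatedSVD

rng = np.random.RandomState(42)
X = rng.uniform(size=(60, 55))

apca = TruncatedSVD(n_components=2, algorithm='arpack',
                    random_state=42).fit(X)
# bug: 'arpack' again instead of the intended 'randomized', so both
# estimators are identical and the comparison is trivially exact
rpca = TruncatedSVD(n_components=2, algorithm='arpack',
                    random_state=42).fit(X)
assert_array_almost_equal(apca.singular_values_, rpca.singular_values_, 12)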

assert_allclose(np.sum(apca.singular_values_**2.0),
                np.linalg.norm(X_apca, "fro")**2.0, rtol=1e-2)
assert_allclose(np.sum(pca.singular_values_**2.0),
                np.linalg.norm(X_pca, "fro")**2.0, rtol=1e-2)
@rth (Member, Author):

Same as above

@glemaitre (Member) left a comment:

Couple of changes

@glemaitre (Member) left a comment:

LGTM

-# Compare sparse vs. dense
-for svd_sparse, svd_dense in svds_sparse_v_dense:
-    assert_array_almost_equal(svd_sparse.explained_variance_ratio_,
-                              svd_dense.explained_variance_ratio_)
@rth (Member, Author):

This was removed because it is indirectly tested below, by checking that explained_variance_ratio_ for both dense and sparse input equals the expected value.

@pytest.fixture(scope='module')
def X_sparse():
    # Make an X that looks somewhat like a small tf-idf matrix.
    # XXX newer versions of SciPy >0.16 have scipy.sparse.rand for this.
Reviewer (Member):

I think our reason for not using scipy.sparse.rand must now be different: we require scipy > 0.16.

@rth (Member, Author):

Good point. I switched to scipy.sparse.rand here.
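
A minimal sketch of the switch; the shape and density below are assumptions, the PR's fixture may differ:

import pytest
import scipy.sparse as sp

@pytest.fixture(scope='module')
def X_sparse():
    # scipy.sparse.rand replaces the hand-rolled tf-idf-like matrix,
    # now that the minimum supported SciPy is above 0.16
    return sp.rand(60, 55, density=0.2, format='csr', random_state=42)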



@pytest.mark.parametrize('fmt', ("array", "csr", "csc", "coo", "lil"))
-def test_sparse_formats(fmt):
-    Xfmt = Xdense if fmt == "dense" else getattr(X, "to" + fmt)()
+def test_sparse_formats(fmt, X_sparse):
Reviewer (Member):

It's interesting that this is not quite covered by common tests. Apparently we check predict and predict_proba, but not transform.

@rth (Member, Author):

Opened #14176 to address this.
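
For context, a sketch of what the parametrized test might look like after this change; the n_components value and the assertion are assumptions, and X_sparse refers to a module-level fixture like the one sketched above:

import pytest
from sklearn.decomposition import TruncatedSVD

@pytest.mark.parametrize('fmt', ("array", "csr", "csc", "coo", "lil"))
def test_sparse_formats(fmt, X_sparse):
    # "to" + "array" resolves to scipy's toarray(), covering the dense
    # case; the other names map to the tocsr/tocsc/tocoo/tolil converters
    Xfmt = getattr(X_sparse, "to" + fmt)()
    tsvd = TruncatedSVD(n_components=11)
    Xtrans = tsvd.fit_transform(Xfmt)
    assert Xtrans.shape == (X_sparse.shape[0], 11)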


@pytest.fixture(scope='module')
def X_sparse():
Reviewer (Member):

I have always wanted to do this more in our tests (turning data into module-level fixtures). Are we okay with this change?

@rth (Member, Author):

I don't see why not.

The only risk is in-place modification of the data. The common tests that check read-only arrays as input should prevent that somewhat. Still, one way around it would be a module- or session-scoped fixture that loads and prepares the data, combined with a function-scoped fixture that hands each test a copy. Another alternative would be to return the data array with its writeable flag set to False.

In any case, that's something we could look into separately. Here it should be fairly equivalent to using a global variable holding the data.
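
A minimal sketch of both guards mentioned above; the fixture names and data are illustrative:

import numpy as np
import pytest

@pytest.fixture(scope='module')
def _X_shared():
    # expensive preparation, done once per module
    rng = np.random.RandomState(0)
    return rng.uniform(size=(60, 55))

@pytest.fixture
def X_copy(_X_shared):
    # function-scoped: each test receives a fresh copy, so in-place
    # modifications cannot leak between tests
    return _X_shared.copy()

@pytest.fixture(scope='module')
def X_readonly(_X_shared):
    # alternative: share one array but mark it read-only, so any
    # accidental write raises immediately
    _X_shared.setflags(write=False)
    return _X_shared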

-assert_array_almost_equal(comp_a[:9], comp_r[:9])
-assert_array_almost_equal(comp_a[9:], comp_r[9:], decimal=2)
+assert_allclose(comp_a[:9], comp[:9], rtol=1e-3)
+assert_allclose(comp_a[9:], comp[9:], rtol=2e-1, atol=1e-2)
Reviewer (Member):

Is rtol needed here?

@rth (Member, Author) commented on Jun 24, 2019:

Yes, the later coefficients need a higher tolerance. Also, decimal in assert_array_almost_equal is an absolute bound rather than a relative one, so a faithful replacement generally needs both rtol and atol, e.g.,

assert_array_almost_equal(0.01, 0, decimal=1)

passes.
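
A quick sketch contrasting the two tolerance models:

from numpy.testing import assert_allclose, assert_array_almost_equal

# decimal=1 means |actual - desired| < 1.5 * 10**-1: an absolute bound,
# so 0.01 vs 0 passes even though the relative error is unbounded
assert_array_almost_equal(0.01, 0, decimal=1)

# rtol alone scales with |desired| and can never match against zero:
# assert_allclose(0.01, 0, rtol=0.1)  # would raise AssertionError

# an explicit atol handles values near zero
assert_allclose(0.01, 0, atol=0.015)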

Reviewer (Member):

In the check for later coefficients:

assert_allclose(comp_a[9:], comp[9:], rtol=2e-1, atol=1e-2)

with atol=1e-2 it feels like rtol can even be set to zero.

BTW I am fine either way, this is kind of nitpicky. 😅

@rth (Member, Author):

On a second look, you are right -- removed it.

@thomasjpfan merged commit eade48e into scikit-learn:master on Jun 24, 2019.
@thomasjpfan (Member):

Thank you @rth!
