TST refactor test_truncated_svd #14140
Conversation
A few explanations
def test_algorithms():
@pytest.mark.parametrize("algorithm", ['randomized'])
This would help once we add the lobpcg solver
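For context, a minimal sketch of how the parametrized test could grow (the names, data, and tolerance below are assumptions, not the PR's actual code): today the list holds only 'randomized', compared against a fixed arpack reference, and a future lobpcg solver would just be one more entry in the list.

import numpy as np
import pytest
from numpy.testing import assert_allclose
from sklearn.decomposition import TruncatedSVD

# extend to ['randomized', 'lobpcg'] once a lobpcg solver exists
@pytest.mark.parametrize("algorithm", ['randomized'])
def test_algorithms(algorithm):
    rng = np.random.RandomState(0)
    X = rng.uniform(size=(60, 55))  # assumed stand-in for the test data
    apca = TruncatedSVD(n_components=2, algorithm='arpack').fit(X)
    pca = TruncatedSVD(n_components=2, algorithm=algorithm,
                       random_state=0).fit(X)
    # solvers only agree approximately; tolerance borrowed from the diff below
    assert_allclose(apca.singular_values_, pca.singular_values_, rtol=1e-2)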
assert_array_almost_equal(apca.singular_values_, rpca.singular_values_, 12)
pca = TruncatedSVD(n_components=2, algorithm='randomized',
                   random_state=rng).fit(X)
assert_allclose(apca.singular_values_, pca.singular_values_, rtol=1e-2)
This fails with a stricter relative tolerance, which I assume means that the singular values are very tiny.
No, they are not tiny:
>>> apca.singular_values_
[17.57446934 17.47000919]
>>> pca.singular_values_
[17.52238412 17.3596697 ]
but they are also certainly not equal up to a 1e-12 relative tolerance.
This was previously passing because we were comparing the arpack solver against itself by mistake a few lines above.
assert_allclose(np.sum(apca.singular_values_**2.0),
                np.linalg.norm(X_apca, "fro")**2.0, rtol=1e-2)
assert_allclose(np.sum(pca.singular_values_**2.0),
                np.linalg.norm(X_pca, "fro")**2.0, rtol=1e-2)
Same as above
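As a sanity check on the identity these assertions rely on: for the training data, transform returns U @ diag(sigma), and U has orthonormal columns, so the squared Frobenius norm of the transformed data equals the sum of squared singular values. A standalone sketch with assumed data:

import numpy as np
from sklearn.decomposition import TruncatedSVD

rng = np.random.RandomState(0)
X = rng.uniform(size=(60, 55))

apca = TruncatedSVD(n_components=2, algorithm='arpack').fit(X)
X_apca = apca.transform(X)

# transform(X) = U @ diag(sigma) on the training data, so the identity
# holds to machine precision rather than just rtol=1e-2
assert np.isclose(np.sum(apca.singular_values_ ** 2.0),
                  np.linalg.norm(X_apca, "fro") ** 2.0)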
Couple of changes
LGTM
# Compare sparse vs. dense
for svd_sparse, svd_dense in svds_sparse_v_dense:
    assert_array_almost_equal(svd_sparse.explained_variance_ratio_,
                              svd_dense.explained_variance_ratio_)
This was removed as it is indirectly tested by checking that explained_variance_ratio_ for both dense and sparse input equals the expected value below.
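A sketch of that reasoning (data, shapes, and tolerance here are assumptions): if the dense and sparse results are each close to the same expected value, they are close to each other by the triangle inequality, so the direct comparison is redundant.

import numpy as np
import scipy.sparse as sp
from numpy.testing import assert_allclose
from sklearn.decomposition import TruncatedSVD

rng = np.random.RandomState(0)
X_sparse = sp.rand(60, 55, density=0.2, format='csr', random_state=rng)
X_dense = X_sparse.toarray()

svd_s = TruncatedSVD(n_components=5, algorithm='arpack').fit(X_sparse)
svd_d = TruncatedSVD(n_components=5, algorithm='arpack').fit(X_dense)

# one expected value: variance captured per component over total variance
expected = (np.var(svd_d.transform(X_dense), axis=0)
            / np.var(X_dense, axis=0).sum())

# both checks passing implies sparse and dense also agree with each other
assert_allclose(svd_s.explained_variance_ratio_, expected, rtol=1e-3)
assert_allclose(svd_d.explained_variance_ratio_, expected, rtol=1e-3)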
@pytest.fixture(scope='module')
def X_sparse():
    # Make an X that looks somewhat like a small tf-idf matrix.
    # XXX newer versions of SciPy >0.16 have scipy.sparse.rand for this.
I think our reason for not using scipy.sparse.rand must be different now: we require scipy > 0.16.
Good point. I switched to scipy.sparse.rand here.
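For reference, a sketch of what the fixture might look like with scipy.sparse.rand (the exact shape, density, and seed are assumptions):

import numpy as np
import pytest
import scipy.sparse as sp

@pytest.fixture(scope='module')
def X_sparse():
    # random sparse matrix, somewhat like a small tf-idf matrix
    rng = np.random.RandomState(42)
    return sp.rand(60, 55, density=0.2, format='csr', random_state=rng)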
@pytest.mark.parametrize('fmt', ("array", "csr", "csc", "coo", "lil"))
def test_sparse_formats(fmt):
    Xfmt = Xdense if fmt == "dense" else getattr(X, "to" + fmt)()
def test_sparse_formats(fmt, X_sparse):
It's interesting that this is not quite covered by common tests. Apparently we check predict and predict_proba, but not transform.
Opened #14176 to address this.
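A sketch of the parametrized format test built on the fixture (the body is an assumption based on the old code shown above): "array" densifies via toarray(), and every other entry converts through the corresponding to<fmt>() method on the sparse matrix.

import pytest
from sklearn.decomposition import TruncatedSVD

@pytest.mark.parametrize('fmt', ("array", "csr", "csc", "coo", "lil"))
def test_sparse_formats(fmt, X_sparse):
    # "array" densifies; other formats convert, e.g. X_sparse.tocoo()
    Xfmt = (X_sparse.toarray() if fmt == "array"
            else getattr(X_sparse, "to" + fmt)())
    tsvd = TruncatedSVD(n_components=11)
    Xtrans = tsvd.fit_transform(Xfmt)
    assert Xtrans.shape == (X_sparse.shape[0], 11)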
@pytest.fixture(scope='module')
def X_sparse():
I've always wanted to do this more for our tests (make data into module-level fixtures). Are we okay with this change?
I don't see why not.
The only risk is in-place modification of the data. Common tests that check read-only arrays as input should prevent it somewhat. Still, one way around it could be a module- or session-scoped fixture that loads and prepares the data, combined with a function-scoped fixture that makes a copy of it. Another alternative could be to return the data array with a read-only flag set.
Still, that's something we could look into separately. Here it should be fairly equivalent to using a global variable with the data.
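A minimal sketch of the copy-per-test pattern described above (fixture names and data are assumptions):

import numpy as np
import pytest

@pytest.fixture(scope='module')
def _X_shared():
    # expensive setup runs once per module
    rng = np.random.RandomState(42)
    return rng.uniform(size=(60, 55))

@pytest.fixture
def X(_X_shared):
    # each test gets its own copy, so in-place edits cannot leak
    return _X_shared.copy()

The read-only alternative would instead call X.setflags(write=False) on the shared array, so accidental in-place modification raises immediately.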
assert_array_almost_equal(comp_a[:9], comp_r[:9])
assert_array_almost_equal(comp_a[9:], comp_r[9:], decimal=2)
assert_allclose(comp_a[:9], comp[:9], rtol=1e-3)
assert_allclose(comp_a[9:], comp[9:], rtol=2e-1, atol=1e-2)
Is rtol needed here?
Yes, the later coefficients need a higher tolerance. Also, decimal in assert_array_almost_equal accounts for both rtol and atol; e.g., this passes:
assert_array_almost_equal(0.01, 0, decimal=1)
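To illustrate the difference (values assumed): decimal=1 tolerates absolute differences up to 1.5 * 10**-1, so it behaves like an atol near zero, whereas assert_allclose with only rtol can never pass against an exactly-zero reference.

from numpy.testing import assert_allclose, assert_array_almost_equal

# passes: the decimal criterion is abs(actual - desired) < 1.5 * 10**-1
assert_array_almost_equal(0.01, 0, decimal=1)

# would fail: rtol alone gives zero tolerance against a zero reference
# assert_allclose(0.01, 0, rtol=0.2)

# passes: an explicit atol restores an absolute margin
assert_allclose(0.01, 0, rtol=0.2, atol=1e-2)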
In the check for the later coefficients,
assert_allclose(comp_a[9:], comp[9:], rtol=2e-1, atol=1e-2)
with atol=1e-2 it feels like rtol can even be set to zero.
BTW I am fine either way, this is kind of nitpicky. 😅
On a second look, you are right -- removed it.
Thank you @rth!
Similarly to #14138, refactor test_truncated_svd to make the tests more parametrized and avoid assert_array_almost_equal. This would help for adding a new solver in #12319.
Should be fairly easy to review.