-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] Fix missing assert and parametrize some k-means tests #12368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very minor comment, other than that LGTM!
km = KMeans(init=centers.copy(), n_clusters=n_clusters, random_state=42, | ||
n_init=1) | ||
km.fit(X) | ||
@pytest.mark.parametrize('representation', ['dense', 'sparse']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not directly
@pytest.mark.parametrize('data', [X, X_csr])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's just for the readability when you run pytest.
With your proposition tests will appear as
test_whatever_test_name[data0]
test_whatever_test_name[data1]
Here it will appear as
test_whatever_test_name[dense]
test_whatever_test_name[sparse]
I just find it easier to track which parameters make some test fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that's a good point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, it's possible to provide ids
as a workaround,
@pytest.mark.parametrize('data', (X, Xcsr), ids=('dense', 'sparse'))
maybe that's a bit more direct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't know that. This is better ! Thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jeremiedbb this is nice. A few comments below.
km = KMeans(init=centers.copy(), n_clusters=n_clusters, random_state=42, | ||
n_init=1) | ||
km.fit(X) | ||
@pytest.mark.parametrize('representation', ['dense', 'sparse']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, it's possible to provide ids
as a workaround,
@pytest.mark.parametrize('data', (X, Xcsr), ids=('dense', 'sparse'))
maybe that's a bit more direct?
|
||
# sanity check: predict centroid labels | ||
pred = mb_k_means.predict(mb_k_means.cluster_centers_) | ||
assert_array_equal(pred, np.arange(n_clusters)) | ||
|
||
# check that models trained on sparse input also works for dense input at | ||
# predict time | ||
assert_array_equal(mb_k_means.predict(X), mb_k_means.labels_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we still keep this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved it in a new function : test_predict_minibatch_dense_sparse
.
|
||
# sanity check: predict centroid labels | ||
pred = mb_k_means.predict(mb_k_means.cluster_centers_) | ||
assert_array_equal(pred, np.arange(n_clusters)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we keep these 2 lines as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did keep them :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in test_predict_minibatch
, line 559-560
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, true I missed those :)
Thanks @jeremiedbb and thank you for the review @NicolasHug !
BTW, Circle CI doesn't seem to be triggering now. https://status.circleci.com/ looks fine, so I'm not sure what happened. Anyway it should not affect this PR. |
…learn#12368)" This reverts commit 347c272.
…learn#12368)" This reverts commit 347c272.
Noticed a missing
assert
in k-means tests, meaning the test would always pass.I took the opportunity to parametrize some of the k-means test. I did not make any changes to the tests, just avoided code redundancy. I was doing it in #11950 but it will be more reviewable if I do it here, in a separate PR.