-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
TST use global_dtype in sklearn/metrics/tests/test_pairwise.py #22666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST use global_dtype in sklearn/metrics/tests/test_pairwise.py #22666
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's systematically add assertions on the expected dtype for the results of the pairwise distance computation when the dtype of the input is fully specified in the test.
I only did some suggestions via github because I cannot suggestion on folded lines but almost all the newly parametrized tests would deserved such a treatment to make the parametrization more useful.
test_pairwise.py
to test implementations on 32bit datasets@@ -291,7 +296,7 @@ def callable_rbf_kernel(x, y, **kwds): | |||
(pairwise_kernels, callable_rbf_kernel, {"gamma": 0.1}), | |||
], | |||
) | |||
@pytest.mark.parametrize("dtype", [np.float64, int]) | |||
@pytest.mark.parametrize("dtype", [np.float64, np.float32, int]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't weird to not use np.int32
or np.int64
since int
would be platform dependent here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could also be worth to check the output dtype
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking the dtype fails here because it turns out that the returned dtype depends on n_jobs
. This is a bug imo. I'll open an issue about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the issue opened? If so we could link it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that an issue was created.
@jeremiedbb: do you have a snippet which reproduces this bug? This way we could create an issue and resolve it. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened #24502
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. This is already a net improvement even if all the tests have not yet been global_dtype
fixtured.
I locally ran:
SKLEARN_RUN_FLOAT32_TESTS=1 pytest -v sklearn/metrics/tests/test_pairwise.py
and I get 244 passed test instead of 194. No failures and no new warnings.
@@ -291,7 +296,7 @@ def callable_rbf_kernel(x, y, **kwds): | |||
(pairwise_kernels, callable_rbf_kernel, {"gamma": 0.1}), | |||
], | |||
) | |||
@pytest.mark.parametrize("dtype", [np.float64, int]) | |||
@pytest.mark.parametrize("dtype", [np.float64, np.float32, int]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was the issue opened? If so we could link it here.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @jjerphan
Reference Issues/PRs
Partially addresses #22881
Precedes #22590
What does this implement/fix? Explain your changes.
This parametrizes tests from
test_pairwise.py
to run on 32bit datasets.Any other comments?
We could introduce a mechanism to be able to able to remove tests' execution on 32bit datasets if this takes too much time to complete.