TST use global_dtype in sklearn/neighbors/tests/test_neighbors.py #22663

Merged (6 commits, Mar 30, 2022)

Conversation

jjerphan
Member

@jjerphan jjerphan commented Mar 3, 2022

Reference Issues/PRs

Partially addresses #22881
Precedes #22590

What does this implement/fix? Explain your changes.

This parametrizes tests from test_neighbors.py to run on 32bit datasets.

Any other comments?

We could introduce a mechanism to remove tests' execution on 32bit datasets if this takes too much time to complete.

@jjerphan jjerphan marked this pull request as ready for review March 3, 2022 15:01
@jeremiedbb
Member

A general remark regarding the batch of similar PRs: I think we need to be careful not to parametrize too many tests with dtype, because it doubles the time for running them. The test suite already takes a lot of time.

@jjerphan
Member Author

jjerphan commented Mar 3, 2022

Yes -- I left a remark about having a mechanism to generally only test for dtype=np.float64. I think I'll create an issue for this topic.

@thomasjpfan
Member

Yes -- I put a remark regarding having a mechanism to only test generally for dtype=np.float64

I have a mechanism in mind. It comes down to a global fixture that is either triggered with a pytest mark or an ENV variable. (A custom command line option can work but it requires a little more work to make it work with --pyargs)

The ENV variable is easiest, because we can just set it in azure_pipeline.yml without additional logic. The idea looks like:

import pytest
import numpy as np
from os import environ

_SKIP32_MARK = pytest.mark.skipif(
    environ.get("SKLEARN_SKIP_FLOAT32", "1") != "0",
    reason="Set SKLEARN_SKIP_FLOAT32=0 to run float32 dtype tests",
)

# place this in `conftest.py` in scikit-learn
@pytest.fixture(params=[pytest.param(np.float32, marks=_SKIP32_MARK), np.float64])
def dtype(request):
    yield request.param

def test_dtype(dtype):
    a = np.asarray([1, 2], dtype=dtype)
    assert a.dtype == dtype

Skips float32 by default

pytest test_script.py -v

To run float32

SKLEARN_SKIP_FLOAT32=0 pytest test_script.py -v

@jjerphan
Member Author

jjerphan commented Mar 3, 2022

You were faster than me, @thomasjpfan!
I like the idea of using pytest fixtures even if it looks like an automagic trick to me.

In the meantime, I created #22680 to discuss it and to pin the group of PRs for testing on 32bit datasets.

@ogrisel
Member

ogrisel commented Mar 3, 2022

general remark regarding the batch of similar PRs. I think we need to be careful to not parametrize with dtype too many tests because it doubles the time for running the tests. The test suite takes a lot of time already.

On top of this, I think it's important to add extra assertions about the expected impact of fitting a model with a specific input dtype.

For instance, a fitted attribute that is an array of fitted parameters could have a dtype that depends on the input. Or, on the contrary, we can check that the dtype of such a fitted attribute is always float64 even when the input is float32, if there is a good reason for that (e.g. to avoid a known numerical stability problem). We should add a comment in the test to explain when this is the case, since in general I would expect the precision of the fitted attributes to be lower when the input data has lower precision.

Similarly for the dtype of the arrays returned by transform for transformers, predict for regressors, or predict_proba for classifiers.
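The pattern described above can be sketched with a hypothetical minimal estimator (MeanEstimator is invented here for illustration; it is not a scikit-learn class):

```python
import numpy as np

class MeanEstimator:
    """Hypothetical estimator: stores per-feature means as the fitted attribute `mean_`."""

    def fit(self, X):
        # np.mean preserves the floating dtype of X, so `mean_` follows
        # the input dtype (float32 in -> float32 out).
        self.mean_ = np.mean(X, axis=0)
        return self

def test_fitted_attribute_dtype():
    for dtype in (np.float32, np.float64):
        X = np.asarray([[1.0, 2.0], [3.0, 4.0]], dtype=dtype)
        est = MeanEstimator().fit(X)
        # Explicit dtype assertion, in addition to any value checks.
        assert est.mean_.dtype == dtype

test_fitted_attribute_dtype()
```

An estimator that deliberately upcasts for numerical stability would instead assert `mean_.dtype == np.float64` for both input dtypes, with a comment explaining why.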

@ogrisel
Member

ogrisel commented Mar 3, 2022

In particular, assert_array_equal(a, b) and assert_allclose(a, b) can pass even when a.dtype == np.float64 and b.dtype == np.float32, so it's important to explicitly check the expected dtype in those cases.
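A minimal illustration of this pitfall with plain NumPy (equal values, different dtypes):

```python
import numpy as np
from numpy.testing import assert_allclose, assert_array_equal

a = np.asarray([1.0, 2.0], dtype=np.float64)
b = np.asarray([1.0, 2.0], dtype=np.float32)

# Both checks compare values only, so they pass despite the dtype mismatch.
assert_array_equal(a, b)
assert_allclose(a, b)

# Only an explicit dtype assertion catches the mismatch.
assert a.dtype != b.dtype
```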

@jeremiedbb
Member

Similarly for the dtype of the arrays returned by transform for transformers,

For that there's the common check check_transformer_preserve_dtypes

@jeremiedbb
Member

For the attributes it's estimator- and attribute-specific. Some are meant to have the same dtype as the input, but others are integer arrays, scalars, or something else. For estimators that are supposed to preserve some dtypes, we usually have a dedicated test to check the dtype of the appropriate attributes.

It's however possible that we don't have tests for all of them. We should.

@ogrisel
Member

ogrisel commented Mar 3, 2022

For that there's the common check check_transformer_preserve_dtypes

Good point. No need to duplicate this check then. We don't have anything similar for the predict method of regressors or the predict_proba method of classifiers, right?

sklearn/kernel_approximation.py:                "check_transformer_preserve_dtypes": (
sklearn/manifold/tests/test_spectral_embedding.py:    `check_transformer_preserve_dtypes`. However, this test only run
sklearn/utils/estimator_checks.py:        yield check_transformer_preserve_dtypes
sklearn/utils/estimator_checks.py:def check_transformer_preserve_dtypes(name, transformer_orig):

We probably should.

@thomasjpfan
Member

We don't have anything similar for the predict method of regressors or the predict_proba method of classifier, right?

We do not. It likely deserves an issue to define what the behavior should be. For example, after regressor.fit(X_32, y_64), should regressor.predict(X_32) return float32 or float64?
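The ambiguity comes from NumPy's promotion rules when fitted parameters and input have different precisions. A sketch with a hypothetical coefficient vector (coef_64 is invented for illustration, not a real fitted attribute):

```python
import numpy as np

# Hypothetical linear model: predictions are X @ coef_.
X_32 = np.asarray([[1.0, 2.0]], dtype=np.float32)    # float32 input
coef_64 = np.asarray([0.5, 0.25], dtype=np.float64)  # float64 coefficients

pred = X_32 @ coef_64
# NumPy promotes mixed-precision operands, so the predictions come out
# as float64 even though X is float32 -- hence the open question above.
assert pred.dtype == np.float64
```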

@ogrisel
Member

ogrisel commented Mar 3, 2022

@thomasjpfan I created an issue: #22682

Feel free to edit. If you agree with my proposal, please remove the Needs Triage label and add a new Help Wanted label.

@jjerphan jjerphan changed the title TST Adapt test_neighbors.py to test implementations on 32bit datasets TST use global_dtype in sklearn/neighbors/tests/test_neighbors.py Mar 17, 2022
Member

@jeremiedbb jeremiedbb left a comment

Here are some comments. In addition, all astype calls should use copy=False.

@@ -1317,7 +1346,7 @@ def test_kneighbors_graph():
     assert_array_equal(A.toarray(), np.eye(A.shape[0]))

     A = neighbors.kneighbors_graph(X, 1, mode="distance")
-    assert_array_almost_equal(
+    assert_allclose(
Member

Shouldn't this test (test_kneighbors_graph) use the global_dtype?

Co-authored-by: Jérémie du Boisberranger
@jeremiedbb
Member

All the astype calls are missing copy=False. Looks good otherwise.
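The copy=False remark can be illustrated with plain NumPy: astype then returns the input array unchanged when no conversion is actually needed, avoiding a useless copy in the float64 branch of the parametrized tests.

```python
import numpy as np

X = np.asarray([1.0, 2.0], dtype=np.float64)

# With copy=False, astype returns the input array itself when the
# dtype already matches -- no memory is copied.
assert X.astype(np.float64, copy=False) is X

# The default (copy=True) always allocates a new array.
assert X.astype(np.float64) is not X
```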

Member

@ogrisel ogrisel left a comment

I pushed some nitpicks + a fix for warnings raised during the tests.

There are still warnings about invalid values in division when running the tests with -Werror, but this is because _weight_func can generate invalid weights for zero-distance pairs. However, this is unrelated to the scope of this PR (it happens irrespective of the dtype of X) and would be better addressed in a dedicated PR.

Member

@jeremiedbb jeremiedbb left a comment

LGTM

@jeremiedbb jeremiedbb merged commit 7931262 into scikit-learn:main Mar 30, 2022
@jjerphan jjerphan deleted the tst/test_neighbors-32bit branch March 30, 2022 12:59
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022
…ikit-learn#22663)

Co-authored-by: Jérémie du Boisberranger
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>