ENH Add common Array API tests and estimator tag #26372

Merged
merged 20 commits into from
Jun 13, 2023

Conversation


@betatim betatim commented May 15, 2023

Reference Issues/PRs

#26348

What does this implement/fix? Explain your changes.

This adds an "Array API support" estimator tag that allows the common estimator checks to test whether an estimator's Array API support fulfills basic requirements.

The idea is to have these tests as common tests instead of having them duplicated in the tests for each estimator (#26243, #26315, #22554, #25956).
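As a minimal sketch of the opt-in mechanism described above (the exact tag name and structure in the merged diff may differ; this just follows scikit-learn's `_more_tags` convention):

```python
# Sketch only: assumes the tag is named "array_api_support", as in the PR title.
class ArrayAPIEstimator:
    def _more_tags(self):
        return {"array_api_support": True}


def supports_array_api(estimator):
    # A common-check runner could use a predicate like this to decide
    # whether to parametrize the Array API input checks for an estimator.
    tags = getattr(estimator, "_more_tags", dict)()
    return bool(tags.get("array_api_support", False))
```

Estimators without the tag simply fall through to `False` and keep their existing test matrix.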

Any other comments?

What would be a good way to have an additional dtype (float16) in the torch test? We can't just add it because some estimators (like LinearDiscriminantAnalysis) don't support it. But some others do. So we need a way to detect if an estimator should work or not with float16s.
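One purely hypothetical option (neither the tag name nor the helper below exists in scikit-learn) would be to let estimators advertise extra dtypes in their tags, so the common check can extend its dtype matrix per estimator:

```python
def dtypes_to_test(tags):
    """Hypothetical: derive the dtypes a common check should try from tags."""
    base = ["float64", "float32"]
    # An estimator that handles lower precision would advertise it, e.g.
    # {"array_api_extra_dtypes": ["float16"]}; LinearDiscriminantAnalysis
    # would simply omit the tag and keep the base set.
    return base + list(tags.get("array_api_extra_dtypes", []))
```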


@adrinjalali adrinjalali left a comment


we also need tests covering the newly added common checks

"""Check that the array_api Array gives the same results as ndarrays."""
xp = pytest.importorskip(array_namespace)

X, y = make_classification(random_state=42)
Member


don't we use load_iris here instead?

Member Author


I don't understand the question.

Member


the rest of the common tests use the iris dataset, we don't seem to use make_classification anywhere else. Here we could use iris too.

Member


Since we repeat that everywhere, I wanted to see the impact in terms of computational resources and both options are cheap enough.

In [1]: from sklearn.datasets import load_iris

In [2]: %timeit load_iris()
215 µs ± 1.96 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [3]: from sklearn.datasets import make_classification

In [4]: %timeit make_classification(random_state=42)
164 µs ± 721 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

We could switch to load_iris for the sake of consistency but no strong opinion.
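For context on why both options are cheap, both produce small dense classification problems; a quick sketch of their shapes (assuming scikit-learn is installed):

```python
from sklearn.datasets import load_iris, make_classification

# Both options yield small, cheap-to-build classification problems.
X_iris, y_iris = load_iris(return_X_y=True)              # real dataset, 3 classes
X_synth, y_synth = make_classification(random_state=42)  # synthetic, 2 classes

print(X_iris.shape)   # (150, 4)
print(X_synth.shape)  # (100, 20) with default parameters
```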


ogrisel commented May 16, 2023

Would it be possible to consolidate the two check_* functions together to avoid introducing a PyTorch specific function?

It should be possible to call the standard some_xp_array.to_device(ns_specific_device_name_or_object) for namespaces that support multiple devices or alternatively rely on asarray(data, device=ns_specific_device_name_or_object).

EDIT: you might need https://github.com/data-apis/array-api-compat to expose torch as a regular array namespace, so that the checker code stays library-agnostic beyond the namespace name used to create the arrays and obtain an asarray constructor:

>>> import array_api_compat.torch as xp
>>> xp.asarray(np.arange(10), dtype=xp.float32, device="cpu")
tensor([0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])
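A library-agnostic helper along these lines could look like the following (a sketch under the assumptions above, not the actual scikit-learn code; the `TypeError` fallback for single-device namespaces is an assumption):

```python
def asarray_on_device(xp, data, device=None, dtype=None):
    """Create an array in namespace ``xp``, placed on ``device`` when supported."""
    if device is not None:
        try:
            return xp.asarray(data, dtype=dtype, device=device)
        except TypeError:
            # Single-device namespaces (or older versions of them) may not
            # accept a ``device`` keyword; fall back to a plain asarray.
            pass
    return xp.asarray(data, dtype=dtype)
```

With `array_api_compat.torch as xp`, this would cover both the `device="cpu"` and `device="cuda"` parametrizations with one code path.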


est_xp_param_np = _convert_to_numpy(est_xp_param, xp=xp)
assert_allclose(
    attribute, est_xp_param_np, err_msg=f"{key} not the same", atol=1e-3
)
Contributor


I'm actually a bit surprised to see atol=1e-3. Is there a particular reason why we have such a large tolerance? I know it is inherited from the existing tests, but I don't know the reasoning for it originally either. It just seems especially large compared to our tolerances on many other algorithms and implementations.


@thomasjpfan thomasjpfan May 22, 2023


This is comparing the output of results from a GPU and CPU, which can result in bigger differences. For example, PyTorch sets atol=1e-3 and rtol=1e-3 when comparing CPU and GPU operations: https://github.com/pytorch/pytorch/blob/c618093681bfbb7fca6159de5b73ffceb5b4e0f5/test/test_ops.py#L287

Member Author


Is this a good enough explanation @Micky774 or should we dig a bit more?

Worth adding Thomas' answer as a comment for people from the future who have the same question?

Member Author


After seeing Olivier's change to 100 * finfo(dtype).eps, should we still add a comment?

Contributor


It wouldn't hurt to leave a comment noting that these error bounds are tighter than PyTorch's, which is a strong precedent.
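The dtype-scaled tolerance discussed in this thread can be sketched as follows (using numpy's `finfo`; the helper name is illustrative, not the scikit-learn internal):

```python
import numpy as np

def atol_for(dtype):
    # 100 * machine epsilon of the dtype: roughly 1.2e-5 for float32 and
    # 2.2e-14 for float64, so the bound tightens automatically with precision.
    return 100 * np.finfo(dtype).eps
```

This replaces a one-size-fits-all `atol=1e-3` with a bound that adapts to the dtype under test.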


ogrisel commented May 30, 2023

I went ahead and fixed the tests for cupy and removed the now redundant pytorch test.

I also changed the atol value to 100 * finfo(dtype).eps and the tests pass on a machine with a CUDA GPU (both cupy and pytorch):

$ pytest -k check_array_api_input sklearn/tests/test_common.py  -vx
============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.11.3, pytest-7.3.1, pluggy-1.0.0 -- /data/parietal/store3/work/ogrisel/mambaforge/envs/cuda/bin/python3.11
cachedir: .pytest_cache
rootdir: /home/parietal/ogrisel/code/scikit-learn
configfile: setup.cfg
collected 9857 items / 9851 deselected / 6 selected                                                                                                                                                              

sklearn/tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input(array_namespace=numpy.array_api)] PASSED                                                                  [ 16%]
sklearn/tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input(array_namespace=cupy.array_api)] PASSED                                                                   [ 33%]
sklearn/tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input(array_namespace=torch,dtype=float64,device=cpu)] PASSED                                                   [ 50%]
sklearn/tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input(array_namespace=torch,dtype=float32,device=cpu)] PASSED                                                   [ 66%]
sklearn/tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input(array_namespace=torch,dtype=float64,device=cuda)] PASSED                                                  [ 83%]
sklearn/tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input(array_namespace=torch,dtype=float32,device=cuda)] PASSED                                                  [100%]

================================================================================ 6 passed, 9851 deselected, 77 warnings in 6.27s =================================================================================


@ogrisel ogrisel left a comment


Assuming CI is green, LGTM.


ogrisel commented May 30, 2023

CI is not green when array-api-compat is not installed. Will update the tests.


ogrisel commented May 30, 2023

@betatim I also made sure that estimator checks can be executed without the dependency on pytest by raising SkipTest manually instead of relying on pytest.skip or pytest.importorskip.


ogrisel commented May 31, 2023

I missed some pytest imports in sklearn/utils/tests/test_estimator_checks.py, hopefully fixed in the last commit.

@adrinjalali
Member

Shouldn't we have a place where we explain how to create an environment in which these tests are not skipped? We're adding quite a bit of environment complexity here with a few dependencies; it'd be nice to have them documented. Otherwise this LGTM.


ogrisel commented Jun 1, 2023

Shouldn't we have a place where we explain how an environment can be created where these tests are not skipped

As soon as the optional dependencies (array-api-compat and pytorch or cupy) are installed and the matching hardware is available (e.g. CUDA for torch GPU and cupy), the relevant tests are no longer skipped. The skip messages seem explicit enough to know what to install, no? I am worried that adding documentation on how to install such extra packages would be redundant with the installation instructions of those packages, and that our doc might not stay up to date.

Or maybe we could add a list of links to the installation instructions of the aforementioned packages at the bottom of the Array API doc page in scikit-learn?


betatim commented Jun 1, 2023

I added a short section to develop.rst about the dependencies you need, that there are common checks and that things get skipped if not available.

@adrinjalali
Member

The skip tests messages seem explicit enough to know what to install, no?

When I write tests, there are hundreds of skipped tests and tons of xfailed ones; it's not reasonable to expect that one can easily find the steps to make all the tests run. But it's also maybe not super crucial.

I just don't understand where we're testing the GPU code here in our CI. If we have a job doing that, I'd be happier with having the PR merged.

@thomasjpfan
Member

I just don't understand where we're testing the GPU code here in our CI. If we have a job doing that, I'd be happier with having the PR merged.

On the CI, we are only testing Array API support using PyTorch on CPU and NumPy's Array API implementation. For coverage, PyTorch on CPU will test the same code paths as a GPU array. The only thing we are missing from the CI is the correctness of the GPU result.


@thomasjpfan thomasjpfan left a comment


Overall, I think this looks good.


betatim commented Jun 1, 2023

Comments addressed.

Testing on an actual GPU in CI would be nice. Even if we only do it once a week or so. However, it is proving hard to find a CI with a GPU :-/


aktech commented Jun 1, 2023

it is proving hard to find a CI with a GPU :-/

Hey @betatim, not anymore. You can try https://cirun.io; it supports GPU runners for GitHub Actions, spun up on a cloud provider such as AWS.

Here is an example of a sample run: https://github.com/pystatgen/sgkit/actions/runs/5142554146/jobs/9256432281

Here is an example configuration:

https://github.com/pystatgen/sgkit/blob/main/.cirun.yml


betatim commented Jun 2, 2023

My comment about hardness might have factored in the difficulty of finding someone who pays the costs of running CI on GPUs ;)

But while I have an expert here: what alternatives/competitors exist to cirun.io?


aktech commented Jun 2, 2023

My comment about hardness might have factored in the hardness of finding someone who pays for the costs of running CI on GPUs ;)

Yeah, that's the tricky bit.

But while I have an expert here: what alternatives/competitors exist to cirun.io?

None that I'm aware of in the GPU space. I read somewhere that CircleCI supports GPUs as well on enterprise plans, but I have never tried it. There is another called BuildJet in the non-GPU space.


rgommers commented Jun 2, 2023

Cost isn't really the problem, unless you want to run it by default on PRs. Above you (@betatim) said "Even if we only do it once a week or so." - that'd be like $100/yr. It's admin'ing the integration and cloud account that's the painful part.

None that I'm aware of in the GPU space, I read somewhere that CircleCI supports GPUs as well on enterprise plans, but have never tried it.

PyTorch CI used to run completely on CircleCI; it is very good, but also very expensive.


aktech commented Jun 2, 2023

It's admin'ing the integration and cloud account that's the painful part.

That's something I can help with, I already do that with some projects, if I can get a cloud account.


@thomasjpfan thomasjpfan left a comment


I tested the changes on a local machine with CuPy and left some suggestions to get the common test working with CuPy.

ogrisel and others added 5 commits June 4, 2023 17:32
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
The namespace is not exactly the same as the module.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

betatim commented Jun 6, 2023

Addressed the latest round of comments from Thomas.


ogrisel commented Jun 13, 2023

@thomasjpfan I think your comments were taken into account. Merge?


@thomasjpfan thomasjpfan left a comment


Thanks for the update! LGTM

@thomasjpfan thomasjpfan changed the title Add common Array API tests and estimator tag ENH Add common Array API tests and estimator tag Jun 13, 2023
@thomasjpfan thomasjpfan merged commit d29f78e into scikit-learn:main Jun 13, 2023
@betatim betatim deleted the common-test-array-api branch June 21, 2023 15:03
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>