Skip to content

Make test_eye more efficient #200

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

jakevdp
Copy link
Contributor

@jakevdp jakevdp commented Nov 10, 2023

Part of #197

@jakevdp jakevdp force-pushed the array-creation-tests branch from 9decccd to e04bdb7 Compare November 10, 2023 23:33
Copy link
Member

@honno honno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing, makes sense but one major blocker I've put in review comments.

Also test failures look irrelevant to this PR 👀

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

Thanks – I wonder how you'd feel about using from_dlpack to convert the output to numpy and then use something like np.testing to assert that the elements are equal? From what I can tell, from_dlpack is not an optional part of the API, so this should be better all around. What do you think?

@jakevdp jakevdp force-pushed the array-creation-tests branch from e04bdb7 to 243dfea Compare November 13, 2023 18:18
@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

OK, I removed references to nonzero – I opted for a fast path in cases that the tests succeed, falling back to a slow element-by-element test to generate an appropriate error message in case the test fails. What do you think?

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

Hmm, it looks like the approach in the recent commit fails to correctly error on mismatched signed zeros.

It looks like the array API standard doesn't have anything akin to signbit, so I don't think there's any way to express the required logic efficiently via the array API. Seems like an oversight in the array API design, if I'm being honest.

@rgommers
Copy link
Member

It looks like the array API standard doesn't have anything akin to signbit, so I don't think there's any way to express the required logic efficiently via the array API. Seems like an oversight in the array API design, if I'm being honest.

xref data-apis/array-api#670 - a stronger use case would help move that along, and also it looked like it wasn't available in JAX (maybe that was wrong?).

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

jax.numpy.signbit is implemented: https://jax.readthedocs.io/en/latest/_autosummary/jax.numpy.signbit.html

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

Thanks for the link @rgommers! That would help to express this efficiently.

That said, I still think the best solution for the test suite would be to convert things to numpy via dlpack and use existing utilities in np.testing throughout. I understand the desire to make this not depend on numpy, but if that purity prevents the test suite from even being usable in some cases, the purity is not worth the cost.

@rgommers
Copy link
Member

Converting to numpy via DLPack only works well for CPU implementations though, there's no generic way to do it without introducing library-specific code for CuPy, PyTorch CUDA, etc. Of course there would be if x.to_device('cpu') would work, but that method is unfortunately one of the least likely to be available, nor is it guaranteed to accept the 'cpu' string.

That said: performance matters for this test suite. I also noticed last week that it was way slower than I had expected, and it makes it harder to use and contribute to. The numpy.testing utilities don't do anything particularly magical. We already generalized them for SciPy's needs: https://github.com/scipy/scipy/blob/ca135ae16ec6c8870f16b08d1a8eaca50418b1ac/scipy/_lib/_array_api.py#L277. I assume JAX has the same functionality too. How about we extend that and include such a function in the test suite here?

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

The numpy.testing utilities don't do anything particularly magical.

Understood, but numpy is a solid reference implementation, and numpy.testing already has the logic we're recreating here to print useful errors when arrays mismatch. Seems easier to just use that than to re-invent the whole thing, no?

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

I assume JAX has the same functionality too.

No, JAX uses numpy.testing for the reasons I mentioned above.

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

I think the core issue here is

  1. A requirement of the test suite is to ensure that signs of float zeros match in some cases
  2. There is no way to check that within the array API

...and so

  1. The only way to implement these tests is via eager python iteration over array elements, which is sort of slow in numpy and prohibitively slow for JAX and likely for other packages.

We could fix that by either (1) ensuring the Array API includes all elementwise operations required to validate its own specification, or (2) choose to validate those requirements in a more flexible reference implementation like numpy.

(1) is blocked by the lack of signbit in the array API, (2) is blocked by the lack of functionality to copy data to a numpy CPU array.

@rgommers
Copy link
Member

I don't think the lack of signbit should be a blocker here - and if so, let's add it to the standard asap. The signed zeros and other special cases can be separate tests, but should not impede vectorized checking a la assert_allclose for a large majority of test cases. @honno mentioned an issue dealing with nan comparisons too, which should also not be blocking - we can use isnan, ensure all present nan's are in the same locations, and then fill them.

No, JAX uses numpy.testing for the reasons I mentioned above.

Good to know. In SciPy we can easily add JAX support that way (and I assume we'll get to adding JAX support in SciPy at some point in the first half of next year).

Seems easier to just use that than to re-invent the whole thing, no?

I think using it unconditionally is tricky for the non-CPU cases, but if there's no better option than we should. We need vectorized checks. I think we can make it non numpy specific without too much trouble.

Also, as a data point, the SciPy xp_assert_close has led to several recent improvements to numpy.testing, because it turned out that the PyTorch version of assert_close was more strict than the NumPy one (e.g., it checks if .dtype's match), so it can be useful for things to be independent of numpy.


I had a closer look at assert_allclose and assert_array_compare, and while it's ~200 lines of code, I think it's doable to write that with array API standard primitives only. Which will also make the code quite a bit shorter, more readable and possibly faster, because we don't have to deal with ndarray subclasses, object, datetime and other non-numeric dtypes, strict/non-strict flavors, or non-default equal_inf.

We probably need only a couple of functions. In SciPy we found so far that we needed assert_allclose, assert_array_less and assert_equal. I'm tempted to take the relevant parts of https://github.com/numpy/numpy/blob/main/numpy/testing/_private/utils.py and array-api-ify them. Did anyone already try an exercise like that?

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

Sounds good! Thanks for tackling this, @rgommers!

@asmeurer
Copy link
Member

We just need to be careful that we minimize the number of array functions that we actually use in a test helper. Because

  1. Any array library that doesn't implement one of the function wouldn't able to use the test suite at all.
  2. The tests for those functions themselves will need to be implemented in the more naive way to avoid circularity.

If we are adding something like signbit as a new function, we may need to add conditions that fallback to handle the case where it isn't implemented yet. Doing that may be a good idea for other functions as well, especially if they aren't universally implemented (e.g., if there is a helper for it for some libraries in the compat library).

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 13, 2023

Thanks @asmeurer, that makes sense.

If you're worried about circular reasoning and/or complexity in the testing utilities, I think the natural solution would be to find a strategy to export the data to numpy and use numpy.testing in this test suite. That would sidestep all the concerns you have about building the evaluation tools from the building blocks you're trying to evaluate!

I'd point out that performance aside, the current approach has that issue as well: for example, if my array library has a bug where indexing always returns zero, I think that would be enough to make a lot of tests pass, even if the functions being tested are returning garbage!

@asmeurer
Copy link
Member

asmeurer commented Nov 14, 2023

We need to take a pragmatic approach. Weird corner cases that wouldn't actually happen for a real array library where the tests would technically pass aren't that interesting.

But helper functions using an array function that isn't implemented, causing every single test to fail is a very real concern. This sort of thing has already happened and it's why the test suite has some affordances for things like not every dtype being implemented (because pytorch doesn't implement uint64). We want to maintain the ability to use the test suite to produce reasonable results for the main NumPy or Pytorch namespaces (i.e., not just run against the compat library). Some functions, like asarray, are absolutely required, but we want to minimize that set of functions.

Similarly, a test function that just uses itself to test itself would pass with any possible implementation, not just weird corner cases. For example, numpy/numpy#10322, is a wrong result from < (it comes from NumPy's value-based type promotion). We wouldn't be able to catch it except by having a test that actually tests the numerical values in a non-circular way.

But if there's only a handful of these base functions, we can test everything else indirectly using them.

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 14, 2023

I think we’re advocating for the same thing: an “export to numpy” function seems like it would be sufficient for all existing tests, and probably represents the minimum possible requirement in order to validate the API in a way that doesn’t lead to poor performance.

@rgommers
Copy link
Member

I'm tempted to take the relevant parts of https://github.com/numpy/numpy/blob/main/numpy/testing/_private/utils.py and array-api-ify them.

I have half an implementation and will complete it, but it does use a fair amount of API surface, so it doesn't quite meet the requirements @asmeurer and @honno have.

@jakevdp the current diff in this PR looks quite good. If you add signbit to it, can it pass all CI checks? If so, I think that's a good way to go here. If with that plus using --max-examples=1 as commented on in #197 (comment) we can unblock all current development needs, I think we should be in good shape.

@jakevdp
Copy link
Contributor Author

jakevdp commented Nov 14, 2023

Thanks! With the addition of signbit (and some use of isnan) I think I could update this PR in a way that would replicate the previous behavior of the check without having to loop over array elements. How soon would a decision about adding signbit propagate to this package?

@rgommers
Copy link
Member

How soon would a decision about adding signbit propagate to this package?

I think we should be able to sort the decision on that out by Thursday evening, and then we can immediately add it.

@honno
Copy link
Member

honno commented Feb 26, 2024

Closing this PR as its superseded by #236. Thanks for all the work and discussion here—utilising vectorised methods for assertions throughout the test suite is my number 1 priority!

@honno honno closed this Feb 26, 2024
@jakevdp jakevdp deleted the array-creation-tests branch February 26, 2024 17:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants