Add multi-dimensional arrays support for existing elementwise tests #23

Conversation
I actually meant work your way from the bottom up, fixing the existing tests. For trunc, the test should be similar to the tests for floor, ceil, and round (in effect, those four functions implement the four rounding modes: round toward positive infinity, toward negative infinity, to nearest even, and toward zero). Actually, I think the spec text is wrong for it.
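A minimal illustration of the four rounding modes mentioned above, using Python's stdlib `math` module as a stand-in for the array functions: ceil rounds toward positive infinity, floor toward negative infinity, round to nearest even, and trunc toward zero.

```python
import math

# ceil: toward positive infinity
assert math.ceil(1.5) == 2 and math.ceil(-1.5) == -1
# floor: toward negative infinity
assert math.floor(1.5) == 1 and math.floor(-1.5) == -2
# round: to nearest, ties to even ("banker's rounding")
assert round(0.5) == 0 and round(1.5) == 2
# trunc: toward zero
assert math.trunc(1.5) == 1 and math.trunc(-1.5) == -1
```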
Basically, I just want you to update the existing elementwise tests to accept arbitrary arrays. The ones written later (nearer the bottom of the file) will be easier to update because I wrote them with arbitrary arrays in mind, so I did things like use masks and iterate all indices where necessary. But for some of the tests nearer the top I did not, so they will need to be fixed to do that.
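The "iterate all indices" pattern can be sketched like this (a hypothetical `ndindex`-style helper; the dict-backed "array" is a stand-in so the sketch has no array-module dependency):

```python
import itertools

def ndindex(shape):
    # Yield every index tuple for the given shape, row-major order
    return itertools.product(*(range(dim) for dim in shape))

def assert_elementwise(shape, x, out, scalar_func):
    # An elementwise test checks every output element against the scalar
    # result for the corresponding input element, independently.
    for idx in ndindex(shape):
        assert out[idx] == scalar_func(x[idx])

shape = (2, 3)
x = {idx: float(i) for i, idx in enumerate(ndindex(shape))}
out = {idx: -v for idx, v in x.items()}
assert_elementwise(shape, x, out, lambda v: -v)
```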
Force-pushed from bef69cf to 6c5368f (compare)
So I've added ndarrays support for most of the existing tests. Many worked as-is; some I had to rework a tiny bit. I avoided the bitwise ones for now as they only support 0d arrays. Notably I've aliased the helpers module.
The idea, at least in my head, was that array_helper would hold the array functions that are allowed to be used in the tests, which would then be kept as minimal as possible. It's not ended up being as minimal as I had initially hoped, so I'm not sure if it was a great idea. Although we could definitely trim it down.
The bitwise tests will need to just iterate over all the elements using the indexing helper.
Ah I see, it makes sense now, thanks. Will revert this change.

Sweet, I'll take a stab.
```diff
-    x1, x2 = args
+@given(two_mutual_arrays(numeric_dtype_objects))
+def test_add(x1_and_x2):
+    x1, x2 = x1_and_x2
```
This is not ideal, but thinking about it, I think it's the best option. The alternative with shared stuff would just make the strategies really messy. It's really too bad that Python dropped the `def f((x, y))` syntax that used to be allowed in Python 2.
Yep, that was my thinking here. It's a shame that matching the signatures of array module methods to their respective tests is quite tricky for a lot of these tests.
I wonder if two_mutual_arrays could instead be a function that returns two shared strategies. It would then be

```python
@given(*two_mutual_arrays(numeric_dtype_objects))
def test_add(x1, x2):
    ...
```
It would make two_mutual_arrays a little more complex, but at least that complexity would be confined to a single helper function in the hypothesis helpers.
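One hedged sketch of that idea, built on Hypothesis's `st.shared`: both returned strategies draw from one shared dtype strategy, so within any single example the two arguments always agree on dtype. The name `two_mutual_arrays` comes from this thread; the list-backed "arrays" here are stand-ins so the sketch has no array-module dependency.

```python
from hypothesis import given, strategies as st

def two_mutual_arrays(dtype_names):
    # One shared draw: both returned strategies see the same dtype
    # within any one generated example, via the common key.
    shared_dtype = st.shared(st.sampled_from(dtype_names), key="mutual_dtype")

    def build(dtype):
        # Stand-in "array": a (dtype, elements) pair built from a list
        return st.lists(st.integers(0, 9), min_size=1, max_size=3).map(
            lambda elems: (dtype, elems)
        )

    return shared_dtype.flatmap(build), shared_dtype.flatmap(build)

pairs = []

@given(*two_mutual_arrays(["int32", "float64"]))
def test_add(x1, x2):
    assert x1[0] == x2[0]  # dtypes always match thanks to st.shared
    pairs.append((x1, x2))

test_add()
assert pairs and all(a[0] == b[0] for a, b in pairs)
```

The complexity stays inside the helper, and each test keeps the natural `def test_add(x1, x2)` signature.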
This works great, thanks!
Why did GitHub Actions not run on this pull request?

I checked and the elementwise tests on this PR all pass against the NumPy implementation (except for floor_divide, which also has an issue in master, related to data-apis/array-api#264).
Prevents the test failing when using `pytest --max-examples=1`
Force-pushed from e9abb15 to f7838d3 (compare)
I've implemented ndarray support for the bitwise tests. Note I broadcast paired arrays before generating indices, like in test_equal. I've also ended up aliasing the array module and the helpers module, and reworked how the test inputs are created.
Broadcasting as in test_equal is fine; fixing it properly requires generalizing the broadcasting logic. Re the namespacing: being explicit about ah vs. xp is probably a good idea if we really do want to be explicit about the difference between them. The downside to namespacing is more typing, and it also breaks pyflakes. Having the explicit namespaces at least makes the distinction clear.
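For reference, broadcasting paired shapes before generating indices follows NumPy-style rules: align the shapes from the right, and each dimension pair must be equal or contain a 1. A minimal sketch of such a helper (hypothetical name `broadcast_shapes`, not the suite's actual implementation):

```python
def broadcast_shapes(shape1, shape2):
    # Left-pad the shorter shape with 1s, then resolve each dimension pair
    padded1 = (1,) * (len(shape2) - len(shape1)) + tuple(shape1)
    padded2 = (1,) * (len(shape1) - len(shape2)) + tuple(shape2)
    result = []
    for d1, d2 in zip(padded1, padded2):
        if d1 == 1:
            result.append(d2)
        elif d2 == 1 or d1 == d2:
            result.append(d1)
        else:
            raise ValueError("shapes are not broadcast-compatible")
    return tuple(result)

assert broadcast_shapes((3, 1), (1, 4)) == (3, 4)
assert broadcast_shapes((2, 3), (3,)) == (2, 3)
```

Generating indices against the broadcast result shape then covers every element pair of the two inputs.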
```diff
@@ -54,10 +54,11 @@ def test_kwargs():
     results = []

     @given(hh.kwargs(n=st.integers(0, 10), c=st.from_regex("[a-f]")))
+    @settings(max_examples=100)
```
What is the purpose of this? Isn't max_examples already 100 by default? If we set the `--max-examples` flag at the command line (see conftest.py), which takes precedence?
Yep, `@settings` takes precedence over everything else. I do this so this test should pass even when we use `--max-examples=1`. This is actually what a few tests in Hypothesis do, to prevent false negatives.
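The precedence being discussed can be demonstrated directly: a `@settings` decorator on a test overrides the loaded profile, so even a profile with `max_examples=1` (standing in for what a `--max-examples=1` flag might register via conftest.py; the profile name "tiny" is made up here) does not reduce that test's examples.

```python
from hypothesis import given, settings, strategies as st

# Profile simulating a command-line --max-examples=1 run
settings.register_profile("tiny", max_examples=1)
settings.load_profile("tiny")

runs = []

@given(st.integers())
@settings(max_examples=10)  # explicit decorator wins over the profile
def forced(x):
    runs.append(x)

forced()
# The profile's max_examples=1 did not apply: more than one example ran
assert len(runs) >= 2
```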
I don't understand why you would ever pass `--max-examples=1`. And even if you did, why wouldn't you want it to apply to this test? I typically use `--max-examples` to increase the number of examples for a more rigorous test run.
Hmm, a few times now I've run `pytest --max-examples=1` for the whole suite, so I thought forcing `max_examples` here would reduce noise for myself and generally prevent noisy false negatives if someone else had the same idea. Generally I think it clarifies that we're not using Hypothesis to test things, but testing Hypothesis itself, i.e. seeing whether our custom strategy will work for a "typical" run. I imagine both of these reasons are why Hypothesis does this internally too.
Does this test fail if it is run with max_examples=1?
Yep, as expected. In Hypothesis this kind of `max_examples` forcing would be covered by their `find_any` util.
The way this `test_kwargs` test method tries to find that certain things will be generated is essentially what `find_any` does in their own test suite.
I see. I missed that. I didn't notice that this was on an inner test function, not the outer one.
```python
xps.arrays(
    dtype=hh.numeric_dtypes,
    shape=xps.array_shapes(),
    elements={"allow_nan": False, "allow_infinity": False},
)
```
Let's use masks like in the other tests.
We can think about whether restricting the inputs like this makes more sense than using masks. There are advantages and disadvantages to both approaches. The main advantage is that some array libraries might not support masks, which would make the test suite in its current form mostly useless for them. But the disadvantage is that by doing this, we are technically testing less. Elementwise functions should operate on each element of an array independently, and the tests here check that. The exception is if the spec says that a certain value is not supported or may raise an exception. This sort of thing isn't spelled out for the elementwise functions as well as it could be (data-apis/array-api#199), so there's some degree to which the spec needs to be improved here as well.
This PR will implement the stub elementwise tests. It uses stuff from #18, so I just copied its commit history for now; I can rebase this PR to whatever merge/squash/rebase you do with that PR once it's in master.
I'll try starting from the bottom as you suggested. Right now there's just `test_trunc`, if you'd like to take a look.