MAINT: Implement new casting loops based on NEP 42 and 43 #17863
Conversation
```python
raise AssertionError("did not find strategy to fill cast array")

strategy = strategies.tuples(*([strategy] * length))
return data.draw(strategy)
```
@Zac-HD just in case you have time to have a look. These tests work, but are ugly (so I would consider just deleting them if they ever make trouble).
So I am curious whether you have an idea for a better solution. I would like to create a single array with "complicated" elements to test against, and that could be randomized. It then ends up casting, which is tricky, because casting is in general tricky... (So hacking the expected casting result is probably worse than the abuse of hypothesis.)
OTOH, it would be cool to improve the tests so as to conveniently and thoroughly check cast functions for unaligned, strided, etc. inputs (and maybe also have a blueprint for testing ufuncs better in this regard).
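The unaligned and strided inputs mentioned here can be constructed directly with plain numpy; a minimal sketch (the `unaligned_view` helper is illustrative, not part of the PR):

```python
import numpy as np

def unaligned_view(arr):
    # Illustrative helper: place a copy of `arr` at an odd byte offset,
    # so cast loops see unaligned data.
    buf = np.empty(arr.nbytes + 1, dtype=np.uint8)
    out = buf[1:1 + arr.nbytes].view(arr.dtype)
    out[...] = arr
    return out

src = np.arange(10, dtype=np.int64)
unaligned = unaligned_view(src)   # unaligned, contiguous layout
strided = src[::3]                # aligned, but non-contiguous

# Casts must give identical results regardless of memory layout:
assert (unaligned.astype(np.float64) == src.astype(np.float64)).all()
assert (strided.astype(np.float64) == src.astype(np.float64)[::3]).all()
```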
It looks like using Hypothesis' built-in support for Numpy could help - in particular, it looks like you've implemented part of the built-in strategies yourself:

```python
from hypothesis import given
from hypothesis.extra import numpy as npst

@given(
    npst.arrays(
        dtype=npst.nested_dtypes(),  # scalar, array, or structured dtypes; recursively
        shape=npst.array_shapes(min_dims=0, min_side=0),  # any array shape
    )
)
def test_foo(arr):
    # `arr` is a Numpy array which could be *extremely* strange w.r.t. dtype and shape
    ...
```
Alternatively, a more direct translation would be:

```python
@pytest.mark.parametrize("from_dt", simple_dtype_instances())
@given(strategies.data())
def test_simple_direct_casts(self, from_dt, data):
    values = data.draw(
        npst.arrays(
            # Automatically uses `npst.from_dtype()` to get the elements strategy
            dtype=from_dt,
            # Allows us to shrink to an array of just one element on failure
            shape=strategies.integers(1, self.size // from_dt.itemsize),
        )
    )
    ...
```

I'd also be pretty nervous about only generating five examples when these tests can cover so many different combinations, but that's really down to your performance budget. However, this would be a perfect case for using a coverage-guided fuzzer like Atheris (notable for doing both Python and C code coverage) to really shake out any subtle issues (worked example here). I'm in Canberra (UTC+11), but happy to chat in my morning / your evening at your convenience if that would be helpful. My standing offer also covers code review or writing tests for Numpy if there's something specific which would be useful 😁
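Hypothesis aside, the same idea can be approximated with plain numpy, drawing small random values of a source dtype and checking that the strided cast agrees with the contiguous one (the `random_cast_check` helper and the dtype lists here are illustrative, not from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cast_check(from_dt, to_dt, size=32):
    # Small non-negative values cast losslessly between all dtypes below.
    values = rng.integers(0, 100, size=size).astype(from_dt)
    contiguous = values.astype(to_dt)
    strided = values[::2].astype(to_dt)
    # Strided and contiguous casts must agree element-wise.
    assert (strided == contiguous[::2]).all()

for from_dt in (np.int8, np.int32, np.float32, np.float64):
    for to_dt in (np.int64, np.float64, np.complex128):
        random_cast_check(np.dtype(from_dt), np.dtype(to_dt))
```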
Thanks Zac! Let me look a bit closer at the array strategies next week. I guess I did not do that when I wrote this (it was a while ago), but then it probably would make sense to sit down briefly some time. It is plausible that this could also just use a single (slightly) nested dtype strategy. If we manually unnest nested dtypes (with array views), that will effectively test all those complicated cases (strided, unaligned, at least hopefully). And the nested dtypes are not super well tested currently... (The test suite seems to find even subtle bugs, but often in unrelated tests ;))
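A sketch of what "unnesting with array views" could look like: a field view into a packed structured array is automatically strided and unaligned, so casting it exercises exactly those code paths (the dtype here is an illustrative example):

```python
import numpy as np

# Packed structured dtype: "val" sits at byte offset 1, itemsize is 9.
dt = np.dtype([("tag", "u1"), ("val", "f8")])
arr = np.zeros(5, dtype=dt)
arr["val"] = np.arange(5.0)

field = arr["val"]              # unnested view into the structured array
assert field.strides == (9,)    # steps over the whole struct per element
assert not field.flags.aligned  # offset 1 is not 8-byte aligned

# Casting the view hits the unaligned, strided cast path:
assert (field.astype(np.float32) == np.arange(5.0, dtype=np.float32)).all()
```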
That's certainly been my experience applying Hypothesis to Numpy, and why I'm reluctant to write more tests without advice from maintainers! Finding more bugs can be a good thing long term, but it does add a bit of friction to each PR :/
Thanks, I decided to just simplify the tests a lot for now (using only small random numbers). For the structured dtypes, I think hypothesis could be very cool.
But quite frankly, the complexity of such a test is probably so great that I am unsure it is worth the trouble (at least as part of this PR). So what I will do is update the PR in a bit and consider it finished (with the caveat that I will check coverage results and add simple tests to ensure decent code coverage).
This is for 1-d strided loops? What about the more general nd case, when strides on one axis may skip elements?
@mattip not sure what you are aiming at? The core implementation only uses 1-D strided chunks, and the implementation for calling that multiple times is one level higher (and unchanged). I admit there may be a point in adding more tests where many calls to the 1-D loop are necessary, just to make sure there is nothing wrong with that case.
Right, sorry. I was thinking of something else.
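The "1-D strided chunks" split can be observed from Python with `np.nditer` and `external_loop`: the iterator hands out the largest 1-D pieces it can, and the inner loop is simply invoked once per piece (the array shape chosen here is just an example that cannot be coalesced into a single chunk):

```python
import numpy as np

# 2-D view whose rows cannot be merged into one 1-D run.
arr = np.arange(12).reshape(3, 4)[:, :2]

chunks = []
with np.nditer(arr, flags=["external_loop"], order="C") as it:
    for chunk in it:
        chunks.append(chunk.copy())

# Three 1-D chunks of two elements each; the nd case is handled one
# level above the 1-D inner loop.
assert all(c.ndim == 1 for c in chunks)
assert np.concatenate(chunks).tolist() == [0, 1, 4, 5, 8, 9]
```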
On a first pass through, this looks good, thanks. It would be nice to test more of the error conditions, and also to test the auxiliary data mechanism. I only see additional tests; are there tests that are now redundant?
```c
*out_transferdata = NULL;
PyArrayMethodObject *meth = context->method;

int nargs = context->method->nin + context->method->nout;
```
Suggested change:

```diff
-    int nargs = context->method->nin + context->method->nout;
+    int nargs = meth->nin + meth->nout;
```
```c
"_simple_strided_call() takes exactly one tuple with as many "
"arrays as the method takes arguments (%d+%d).", nin, nout);
return NULL;
}
```
Coverage thinks the error condition needs a test
I didn't bother adding tests for this (at least for now, private) API yet, but I can add them here.
```c
NPY_AUXDATA_FREE((NpyAuxData *)res);
return NULL;
}
return (NpyAuxData *)res;
```
Is this covered by a test? Maybe worth trying to think of one that has a use for this code path?
Yeah, the only way to reach this is probably using `np.nditer(...).copy()`, but I can add tests.
Anything else that jumps out at you, aside from a few additional tests? Most of the error paths (aside from those in array_method.c) are probably (almost) impossible to test, though.
No, it all looks quite straightforward although repetitive. It would be nice to try to refactor this somehow, but I don't see how. There is not much that needs careful refcount considerations or malloc/free checking, since it is mostly logic to assign function pointers.
One thing to think about would be to split the large files into smaller pieces.
Did another small update on the tests. Do you have any specific tests in mind that would be good? Testing this is tricky, so there could well be something I forgot.
Oh, I still need to change the "use new code paths" flag back to silence the doctest error. Since that would confuse codecov, I will do that at the very end.
The code itself looks fine. It seems like we are adding a lot of boilerplate, but I don't see how to avoid it. There is a conflict, and coverage is still complaining that some of the test conditions are never hit. Could you write some C-level tests that abuse the API, just to see that the errors are correctly raised with no problems?
Well, the boilerplate has a purpose... It adds a layer of separation between code paths that need to be separated. Ideas welcome, but the only idea I had would be to use a ...
I had hoped I could push that off until that API is actually made public and the errors are more than sanity checks on new additions to NumPy, because that probably also makes the testing nicer (e.g. you had the idea of introducing test-time compilation infrastructure), but yes, I could do that.
Since it is internal, we can put off adding tests for now, but we should somehow mark it with a comment.
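At the Python level, the reachable error conditions can at least be smoke-tested; a sketch (this is not a substitute for the C-level API-abuse tests discussed above):

```python
import numpy as np

src = np.arange(3, dtype=np.int64)

# `casting="no"` forbids any conversion, so this cast must fail cleanly.
raised = None
try:
    src.astype(np.float64, casting="no")
except TypeError as exc:
    raised = exc

assert isinstance(raised, TypeError)

# A valid cast under the same rule still works:
assert (src.astype(np.int64, casting="no") == src).all()
```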
It seems the only reason this did not cause problems before was that the test checking for C warnings only ran on a run where these were always 0 and thus undefined (and so were not redefined later). The `#ifndef` would apparently have to happen at a later time, so just avoid it.
Well, I added such a comment to the private methods and the internal fromspec (in the header). Also squashed and switched the doctest flag (which means codecov will probably be useless now).
Build is failing, typo?
Oh man, should have just run the tests even for a comment change I guess... |
The squashed commits:

- switch to using new casting as default to run full CI
- Initialize casting to silence compile warnings only...
- BUG: As my comment even said, `out_needs_api` is caller-initialized, so I shouldn't init it to 0; it might disable api again.
- Simplify tests again, it seems unnecessary to go this complex
- Simplify `#if` as the variable should always be defined
- TST: Add more tests (and some related fixes). This fixes a few small things (i.e. the decref loop can be NULL, and checking for writeable was incorrect in the private API). Mostly adds new tests for some internal API, as well as for the copy of the auxiliary data. The nditer test, however, is also a heavyweight test for the general multistep casting.
Thanks @seberg |
@seberg see the linked issues: this casts Python objects to strings, which is a huge breaking change.
This PR should be mostly finished, but I will still check code-coverage etc. It is the followup for the initial casting PR, and now changes all casts to go through the `ArrayMethod` machinery (some "legacy" casts will still use the very old machinery though).

There are a few things to note:

- [ ] Some of the headers I added to `numpy/core/src/common/lowlevel_strided_loops.h` should probably move to the new `numpy/core/src/multiarray/dtype_transfer.h`. Maybe it doesn't matter; the `object` ones had to be moved because they use new private API, but the other loops are old-style, so keeping them in the old place is probably just as well.

There are a few things we should possibly discuss, but it is not super pressing right now:

- `move_references` is a flag that we use to say that the source of a casting function (ufuncs would be the same in a sense) is a buffer, and the content of that buffer should be cleared during iteration. Clearing buffers (also on error!) is a bit tricky and already broken; I have not yet attempted to fix all of it. The `move_references` mechanism seems useful for this (although it probably doesn't matter for the "mini buffers" used in `dtype_transfer.c`). I suppose we should just keep it around, and probably can just live with it also for ufuncs... It could even turn out to be useful there. Plausibly, we will need to add a flag so that an `ArrayMethod` can indicate that it supports `move_references`.
- An `itemsize-aligned` flag (basically for complex).

In any case, there should not be much left to do here, so this is ready for review, although maybe not quite for merging.