Clean up the data in test_type_promotion.py #26
If you were to have a look today: note that I'm definitely not happy with the current parametrize refactor heh (but now that I've finally figured everything out, it should be easier to clean up tomorrow). I will also compare the parameters with master so that I don't miss anything.
Your approach to cleaning up the parametrizations seems pretty reasonable. Doing it all in a function with explicit loops instead of list comprehensions, and generating param objects instead of separate parameters and ids, does look better.
I think I'm happy with how I've refactored the parametrization logic and the tests now, although it would be good if I slept on it. I will get to cleaning up the scalar promotion test case at the bottom tomorrow too. Generally, would you mind if I used the word "builtin" to refer to Python-type scalars? e.g. a test method
Also, I just realised I should point all references of
Also also, I should keep a reference to the notes you made regarding internal promotions and
I usually try to call these "Python scalars". I think that's also what the spec calls them, at least right now. "Builtin" is unfortunately an overloaded term in Python: sometimes it means what you'd expect, but sometimes it means "compiled".
This probably isn't super important. Back when the tests were much simpler, an @example was possible. Unless you expect that might be the case again with the new hypothesis strategies, it's probably not that important. I expect hypothesis will always draw 0-dimensional arrays anyway (also, there was a typo; it should be shape
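For reference, pinning a 0-dimensional case with `@example` might look like the sketch below; the test name, dtype, and body are illustrative, not taken from the suite:

```python
import numpy as np
from hypothesis import example, given
from hypothesis.extra.numpy import arrays

# shape=() draws 0-dimensional arrays; @example pins one explicit input
# alongside whatever hypothesis generates.
@given(arrays(np.int8, shape=()), arrays(np.int8, shape=()))
@example(np.asarray(0, dtype=np.int8), np.asarray(0, dtype=np.int8))
def test_add_promotion(x, y):
    assert (x + y).dtype == np.int8
```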
No, we should keep those. The special case tests only test the functions, not the operators. And they also don't test the dtype, only the value.
```python
{
    '__add__': lambda m, M: lambda s: {
        'min_value': max(m - s, m),
        'max_value': min(M - s, M),
    },
    # ... (entries for the other operators elided in this excerpt)
}
```
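To illustrate how these nested lambdas evaluate, here is a worked case; the numbers assume int8 bounds and a hypothetical scalar, not values from the suite:

```python
# For int8, m = -128 and M = 127. With scalar s = 100, any x in
# [min_value, max_value] keeps x + s inside the dtype's range.
bound_factory = lambda m, M: lambda s: {
    'min_value': max(m - s, m),
    'max_value': min(M - s, M),
}
bounds = bound_factory(-128, 127)(100)
assert bounds == {'min_value': -128, 'max_value': 27}  # 27 + 100 == 127
```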
I don't think you need to be worrying about this. Any input value from the dtype range should work, even if the output overflows. I think if some library does something different for integer overflows, we will need to update a large part of the test suite to handle it.
I think I'm happy with this now. I'm now testing NaNs and infs. Note that I wrap potential overflow scenarios with a try-except block. This might be all we need: if other libraries don't take the NumPy route of overflowing without error, they're likely going to throw an `OverflowError` (a self-contained sketch of this pattern follows at the end of this comment).

```python
try:
    out = func(*arrays)
except OverflowError:
    reject()  # hypothesis.reject(): discard this example as invalid
```

FYI, good shout on the dtype sorting flaw. I've now used a method which relies only on dtype object equality. The current minimisation behaviour should be:

**Promotable dtype pairs sorted by minimisation order**
Broadly the minimisation goes:
dtypes are ordered like so:
And for pairs of mixed dtypes, I also prefer the smaller dtype as the first item in the pair. I believe you initially desired float first, then int-uint-bool. Personally I prefer the opposite scenario, which is also how
Also, I can't actually test minimisation behaviour easily without using the MPL-licensed code Hypothesis uses for this. I'll definitely mull it over though, so maybe expect a PR in the future for this.

(the code I used to print the above table)

```python
from rich import print
from rich.table import Table

grid = Table.grid("1st", "2nd", padding=(0, 2))
for key in promotable_dtypes:  # pairs of promotable dtypes from the suite
    grid.add_row(f"{key[0]}", f"{key[1]}")
print("")
print(grid)
```
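A minimal self-contained sketch of the reject-on-overflow pattern above; the operation and strategy here are stand-ins, not code from the suite:

```python
import math

from hypothesis import given, reject
from hypothesis import strategies as st

@given(st.floats(allow_nan=False, allow_infinity=False))
def test_exp_is_nonnegative(x):
    try:
        out = math.exp(x)
    except OverflowError:
        # reject() raises, telling hypothesis to discard this input
        # rather than count it as a failure.
        reject()
    assert out >= 0
```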
Sounds reasonable. It isn't important, but I also wonder if we should prefer the "default floating-point/integer dtype" first. I find it a little confusing that hypothesis always shrinks to float32. In my experience, if only one of float32 or float64 fails, then float32 is the one that fails, because it is more likely to suffer bugs from overflows or upcasting. If anything, seeing "float32" makes me suspect that the shorter type is somehow relevant (although I know this is a general sort of problem with hypothesis shrinking that you just have to be on guard about; see https://hypothesis.works/articles/threshold-problem/). If the floating-point dtype is irrelevant to the failure, I would generally prefer to see the default floating-point dtype (which is float64 for NumPy, but can be either in the spec).
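If that preference were adopted, one way to express it is a sort key that puts the default dtype first; this is just a sketch, assuming the pair strategy draws via `sampled_from`, which shrinks toward the start of its list:

```python
import numpy as np

# Assumed names: DTYPES is the candidate list, DEFAULT_FLOAT the library
# default (float64 for NumPy). sampled_from minimises toward index 0.
DTYPES = [np.float32, np.float64]
DEFAULT_FLOAT = np.float64

shrink_order = sorted(DTYPES, key=lambda d: d != DEFAULT_FLOAT)
assert shrink_order[0] is DEFAULT_FLOAT
```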
Can we just import the internal hypothesis functions?
Great, this works against plain
I was worried about what the error messages would look like with so many @composite strategies, but it looks like they are usable. Here is an example error from
The input arrays, including their shapes and dtypes, are easy to see. My only gripe is that I would include the name of the function in the assertion error message.
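For context, a hypothetical sketch of the kind of `@composite` strategy being discussed, drawing a promotable dtype pair plus matching arrays; all names below are illustrative rather than taken from the suite:

```python
import numpy as np
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays

# Illustrative promotion table; the real suite builds this differently.
PROMOTABLE_DTYPES = [(np.int8, np.int16), (np.uint8, np.int16)]

@st.composite
def promotable_arrays(draw):
    d1, d2 = draw(st.sampled_from(PROMOTABLE_DTYPES))
    shape = (draw(st.integers(min_value=0, max_value=3)),)
    return draw(arrays(d1, shape)), draw(arrays(d2, shape))
```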
The main function I'm referring to is
Right, I see, that isn't pretty heh. I've fixed this now by mapping dtypes to names, so the parameter ids are now consistent.
I see that adding the func names to assertions could be useful. Additionally, it could be useful to have the dtypes involved, à la the parameter ids. How the func names should fit in an error message requires some thought, as it can end up making the messages harder to parse for a user (especially with the op funcs). So I would prefer if we left this out of this PR, as I want to mull it over myself. Generally, I've tried to make error situations better. I've shortened the test method names and I pass func/op names to test cases (so I don't have to pass the func obj and
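The kind of message being weighed up might look something like this; every name below is a stand-in for illustration:

```python
import numpy as np

func_name = "add"                # stand-in for the func/op name
x, y = np.int8(1), np.int16(1)   # stand-in inputs
out, expected = x + y, np.dtype(np.int16)

assert out.dtype == expected, (
    f"{func_name}: {x.dtype} and {y.dtype} promoted to {out.dtype}, "
    f"expected {expected}"
)
```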
You could use
Well, I wish pytest would surface assertion error messages a little better. Right now they are sort of presented in a summary (but they are truncated). I think it was designed around most people not bothering to put messages in their assertions. But I think messages are really useful for a test suite designed to be used by other people. It also has the advantage that hypothesis will treat assertions with distinct messages as distinct failures.
I want to merge this. It's making the other PR hard to review. We can fix any issues in a later PR.
This will resolve #21.