Clean up the data in test_type_promotion.py #26


Merged
merged 47 commits into data-apis:master on Oct 19, 2021

Conversation

honno
Member

@honno honno commented Oct 5, 2021

This will resolve #21.

  • Move the reusable data and helper functions to a central location
  • Change the data to be keyed on the dtype objects themselves, or dtype string names where appropriate
    • Make promote_dtypes() a dict
  • Review more refactors and other niceties
    • dtype objects
    • is_*_dtype()
    • dtype ranges
  • Clean up the parameterization logic in test_type_promotion.py
    • Python scalars

@honno honno force-pushed the type-promotion-refactor branch from 9ec75a5 to 48fd544 Compare October 7, 2021 16:16
@honno honno force-pushed the type-promotion-refactor branch from d6a3379 to ce6a7e8 Compare October 11, 2021 16:37
@honno
Member Author

honno commented Oct 11, 2021

If you have a look today, note that I'm definitely not happy with the current parametrize refactor yet (but now that I've finally figured everything out, it should be easier to clean up tomorrow). I will also compare the parameters with master so that I don't miss anything.

@asmeurer
Member

Your approach to cleaning up the parametrizations seems pretty reasonable. Doing it all in a function with explicit loops instead of list comprehensions, and generating param objects instead of separate parameters and ids, does look better.

@honno honno marked this pull request as ready for review October 12, 2021 16:48
@honno
Member Author

honno commented Oct 12, 2021

I think I'm happy with how I've refactored the parametrization logic and the tests now, although it would be good if I slept on it.

I will get to cleaning up the scalar promotion test case at the bottom tomorrow too. Generally would you mind if I used the word "builtin" to specify Python-type scalars? e.g. a test method test_operator_promotes_scalar would become test_operator_promotes_builtin. I think this would help distinguish from 0d arrays.

Also I just realised I should redirect all references to ah.dtype_objects (and similar) to what's inside dtype_helpers.

Also also I should keep reference to the notes you made regarding internal promotions and (0,) examples somewhere.

@asmeurer
Member

I will get to cleaning up the scalar promotion test case at the bottom tomorrow too. Generally would you mind if I used the word "builtin" to specify Python-type scalars? e.g. a test method test_operator_promotes_scalar would become test_operator_promotes_builtin. I think this would help distinguish from 0d arrays.

I usually try to call these "Python scalars". I think that's also what the spec calls them, at least right now. "Builtin" is unfortunately sometimes an overloaded term in Python, which sometimes means what you'd expect, but also sometimes means "compiled".

Also also I should keep reference to the notes you made regarding internal promotions and (0,) examples somewhere.

This probably isn't super important. Back when the tests were much simpler, an @example was possible. Unless you expect that might be the case again with the new hypothesis strategies, it's probably not that important. I expect hypothesis will always draw dimension 0 arrays anyway (also there was a typo anyway; it should be shape (), not (0,)). See https://data-apis.org/array-api/latest/API_specification/type_promotion.html#notes
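To illustrate the typo being corrected there, a quick sketch (using NumPy purely for demonstration): shape () is a zero-dimensional array, while shape (0,) is a one-dimensional array that happens to be empty.

```python
import numpy as np

x = np.asarray(5)       # shape (): a zero-dimensional (scalar-like) array
empty = np.zeros((0,))  # shape (0,): one-dimensional, but contains no elements

print(x.shape, x.ndim)          # () 0
print(empty.shape, empty.size)  # (0,) 0
```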

@asmeurer
Member

I also don't test NaNs and inf scalars, as I expect you want to keep that all in the generated special cases tests.

No we should keep those. The special case tests only test the functions, not the operators. And they also don't test the dtype, only the value.

{
'__add__': lambda m, M: lambda s: {
'min_value': max(m - s, m),
'max_value': min(M - s, M),
Member

I don't think you need to be worrying about this. Any input value from the dtype range should work, even if the output overflows. I think if some library does something different for integer overflows, we will need to update a large part of the test suite to handle it.

@honno
Member Author

honno commented Oct 14, 2021

I think I'm happy with this now. I'm now testing NaNs and infs.


Note that I wrap potential overflow scenarios with a try-except block. This might be all we need: if other libraries don't take the NumPy route of overflowing-without-error, they're likely going to throw an OverflowError. In any case, this clearly flags to us and others in the future that this code can overflow.

try:
    out = func(*arrays)
except OverflowError:
    reject()

FYI hypothesis.reject() is an alias for hypothesis.assume(False). I use it instead of assume(False) because it plays better with mypy (e.g. flycheck won't complain that out is possibly unbound).
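As a self-contained sketch of the pattern (the add_uint8 function here is a hypothetical stand-in for a library that raises on integer overflow, not anything in the test suite):

```python
from hypothesis import given, reject, strategies as st

def add_uint8(a: int, b: int) -> int:
    # Hypothetical stand-in for a library that raises instead of wrapping.
    result = a + b
    if result > 255:
        raise OverflowError
    return result

@given(st.integers(0, 255), st.integers(0, 255))
def test_add_uint8_dtype_range(a, b):
    try:
        out = add_uint8(a, b)
    except OverflowError:
        reject()  # alias for assume(False): discard this drawn example
    assert 0 <= out <= 255
```

Because reject() raises, execution never reaches the assertion with out unbound, which is what keeps the type checkers quiet.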


Good shout on the dtype sorting flaw. I've now used a method which only relies on dtype object equality. The current minimisation behaviour should be:

Promotable dtype pairs sorted by minimisation order
bool     bool   
uint8    uint8  
uint16   uint16 
uint32   uint32 
uint64   uint64 
int8     int8   
int16    int16  
int32    int32  
int64    int64  
float32  float32
float64  float64
uint8    uint16 
uint16   uint8  
uint8    uint32 
uint32   uint8  
uint8    uint64 
uint16   uint32 
uint64   uint8  
uint32   uint16 
uint16   uint64 
uint64   uint16 
uint32   uint64 
uint64   uint32 
int8     int16  
int16    int8   
int8     int32  
int32    int8   
int8     int64  
int16    int32  
int64    int8   
int32    int16  
int16    int64  
int64    int16  
int32    int64  
int64    int32  
float32  float64
float64  float32
uint8    int8   
int8     uint8  
uint16   int8   
uint8    int16  
int8     uint16 
int16    uint8  
uint32   int8   
uint16   int16  
uint8    int32  
int8     uint32 
int16    uint16 
int32    uint8  
uint32   int16  
uint16   int32  
uint8    int64  
int16    uint32 
int32    uint16 
int64    uint8  
uint32   int32  
uint16   int64  
int32    uint32 
int64    uint16 
uint32   int64  
int64    uint32

Broadly the minimisation goes:

  1. Same dtypes
  2. Dtypes in the same category
  3. Dtypes in mixed categories (i.e. uints and ints)

dtypes are ordered like so:

  1. xp.bool
  2. uints
  3. ints
  4. floats

And for pairs of mixed dtypes, I also prefer the smaller dtype as the first item in the pair.
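That three-tier grouping can be sketched with a plain sort key (illustrative only; the names and the finer within-tier ordering in the actual refactor differ):

```python
# Tier 0: identical dtypes; tier 1: distinct dtypes in the same category;
# tier 2: mixed categories. Within a tier, earlier/smaller dtypes come first.
CATEGORY = {
    "bool": 0,
    "uint8": 1, "uint16": 1, "uint32": 1, "uint64": 1,
    "int8": 2, "int16": 2, "int32": 2, "int64": 2,
    "float32": 3, "float64": 3,
}
ORDER = list(CATEGORY)  # insertion order: bool, uints, ints, floats

def pair_key(pair):
    d1, d2 = pair
    if d1 == d2:
        tier = 0
    elif CATEGORY[d1] == CATEGORY[d2]:
        tier = 1
    else:
        tier = 2
    return (tier, ORDER.index(d1), ORDER.index(d2))

pairs = [("uint8", "int8"), ("uint8", "uint16"), ("int32", "int32"), ("bool", "bool")]
print(sorted(pairs, key=pair_key))
# [('bool', 'bool'), ('int32', 'int32'), ('uint8', 'uint16'), ('uint8', 'int8')]
```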

I believe you initially desired float first, then int-uint-bool. Personally I prefer the opposite scenario, which is also how hypothesis.extra.numpy/hypothesis.extra.array_api does it. I think (bool, bool) better indicates errors which are not related to dtypes at all, just an array of any dtype. (uintX, uintX) likewise better indicates problems with numerical arrays generally, as floats obviously have a larger "elements space" of real numbers, NaNs and infs.

Also, I can't actually test minimisation behaviour easily without using the MPL-licensed code Hypothesis does for this. I'll definitely mull it over though, so maybe expect a PR in the future for this.

(the code I used to print the above table)
from rich import print
from rich.table import Table
grid = Table.grid("1st", "2nd", padding=(0, 2))
for key in promotable_dtypes:
    grid.add_row(f"{key[0]}", f"{key[1]}")
print("")
print(grid)

@honno honno requested a review from asmeurer October 14, 2021 10:04
@asmeurer
Member

I believe you initially desired float first, then int-uint-bool. Personally I prefer the opposite scenario, which is also how hypothesis.extra.numpy/hypothesis.extra.array_api does it. I think (bool, bool) better indicates errors which are not related to dtypes at all, just an array of any dtype. (uintX, uintX) likewise better indicates problems with numerical arrays generally, as floats obviously have a larger "elements space" of real numbers, NaNs and infs.

Sounds reasonable. It isn't important, but I also wonder if we should prefer the "default floating-point/integer dtype" first. I find it a little confusing that hypothesis always shrinks to float32. In my experience, if only one of float32 or float64 fails, then float32 is the one that fails, because it is more likely to suffer bugs from overflows or upcasting. If anything, seeing "float32" makes me suspect that the shorter type is somehow relevant (although I know that this is a general sort of problem with hypothesis shrinking that you just have to be on guard about, see https://hypothesis.works/articles/threshold-problem/). If the floating-point dtype is irrelevant to the failure, I think I would generally prefer to see the default floating-point dtype (which is float64 for NumPy, but can be either in the spec).

Also, I can't actually test minimisation behaviour easily without using the MPL-licensed code Hypothesis does for this. I'll definitely mull it over though, so maybe expect a PR in the future for this.

Can we just import the internal hypothesis functions?

@asmeurer
Member

Great, this works against plain numpy now. When I test it, I get parameter names like array_api_tests/test_type_promotion.py::test_func_returns_array_with_correct_dtype[abs(<class 'numpy.int8'>) -> <class 'numpy.int8'>]. The <class 'numpy.int8'> bit also appears in the error messages, but I think that's fine. But I think we should make the parameterization always give abs(int8) -> int8 like it does with numpy.array_api.

I was worried what the error messages would look like with so many @composite strategies, but it looks like they are usable. Here is an example error from numpy (which does incorrect type promotion on 0-d arrays)

func = <ufunc 'add'>, in_dtypes = (<class 'numpy.int64'>, <class 'numpy.int32'>), out_dtype = <class 'numpy.int64'>
x_filter = <function <lambda>.<locals>.<lambda> at 0x112c7c0d0>, data = data(...)

    @pytest.mark.parametrize('func, in_dtypes, out_dtype, x_filter', gen_func_params())
    @given(data=st.data())
    def test_func_returns_array_with_correct_dtype(
        func, in_dtypes, out_dtype, x_filter, data
    ):
        if len(in_dtypes) == 1:
            x = data.draw(
                xps.arrays(dtype=in_dtypes[0], shape=hh.shapes).filter(x_filter), label='x'
            )
            out = func(x)
        else:
            arrays = []
            shapes = data.draw(
                hh.mutually_broadcastable_shapes(len(in_dtypes)), label='shapes'
            )
            for i, (dtype, shape) in enumerate(zip(in_dtypes, shapes), 1):
                x = data.draw(
                    xps.arrays(dtype=dtype, shape=shape).filter(x_filter), label=f'x{i}'
                )
                arrays.append(x)
            try:
                out = func(*arrays)
            except OverflowError:
                reject()
>       assert out.dtype == out_dtype, f'{out.dtype=!s}, but should be {out_dtype}'
E       AssertionError: out.dtype=int32, but should be <class 'numpy.int64'>
E       assert dtype('int32') == <class 'numpy.int64'>
E         +dtype('int32')
E         -<class 'numpy.int64'>

array_api_tests/test_type_promotion.py:99: AssertionError
------------------------------------------------------------------------------ Hypothesis -------------------------------------------------------------------------------
Falsifying example: test_func_returns_array_with_correct_dtype(
    data=data(...),
    func=<ufunc 'add'>,
    in_dtypes=(numpy.int64, numpy.int32),
    out_dtype=numpy.int64,
    x_filter=<function array_api_tests.test_type_promotion.<lambda>.<locals>.<lambda>>,
)
Draw 1 (shapes): ((), (1,))
Draw 2 (x1): array(0)
Draw 3 (x2): array([0], dtype=int32)

The input arrays, including shapes and dtypes are both easy to see. My only gripe is I would include the name of the function in the assertion error message.

@honno
Member Author

honno commented Oct 18, 2021

I also wonder if we should prefer the "default floating-point/integer dtype" first

Can we just import the internal hypothesis functions?

The main function I'm referring to is minimal(), which unfortunately is part of their internal tests/ utilities as opposed to being part of the hypothesis package. So regarding default ints/floats, that makes a lot of sense now, but it's probably not worth the hassle until I can figure out how to actually test this. Also I believe there's no consistent way of knowing what a library's default int/float is?

When I test it, I get parameter names like array_api_tests/test_type_promotion.py::test_func_returns_array_with_correct_dtype[abs(<class 'numpy.int8'>) -> <class 'numpy.int8'>]. The <class 'numpy.int8'> bit also appears in the error messages, but I think that's fine. But I think we should make the parameterization always give abs(int8) -> int8 like it does with numpy.array_api.

Right I see, that isn't pretty heh. I've fixed this now by mapping dtypes to names so the parameter ids are now consistent.
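A minimal sketch of that mapping idea (the names here are stand-ins, not the test suite's actual helpers):

```python
class int8:  # stand-in for a library's dtype object
    pass

dtype_to_name = {int8: "int8"}

def param_id(func_name, in_dtypes, out_dtype):
    # Build a readable pytest parameter id like "abs(int8) -> int8",
    # instead of letting pytest stringify the raw dtype objects.
    in_names = ", ".join(dtype_to_name[d] for d in in_dtypes)
    return f"{func_name}({in_names}) -> {dtype_to_name[out_dtype]}"

print(param_id("abs", (int8,), int8))  # abs(int8) -> int8
```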

My only gripe is I would include the name of the function in the assertion error message.

I see that adding the func names to assertions could be useful. Additionally it could be useful to have the dtypes involved, à la the parameter ids. How the func names should fit in an error message requires some thought, as they can end up making the messages harder to parse for a user (especially with the op funcs). So I would prefer if we left this out of this PR, as I want to mull it over myself.

Generally I've tried to make error situations better. I've shortened the test method names and pass func/op names to test cases (so I don't have to pass the func obj and x_filter, which clutters error messages). With the readable parameter ids it's not the worst situation currently, and could indeed be enough, e.g.:

______________ test_func_promotion[equal(bool, bool) -> bool] ______________

func_name = 'equal', in_dtypes = (dtype('bool'), dtype('bool'))
out_dtype = dtype('bool')
...
E   AssertionError: out.dtype=float32, but should be bool
...
========================== short test summary info ===========================
FAILED test_type_promotion.py::test_func_promotion[equal(bool, bool) -> bool] AssertionError: out.dtype=float32, but should be bool
...

@honno honno mentioned this pull request Oct 19, 2021
@asmeurer
Member

Also I believe there's no consistent way of knowing what a library's default int/float is?

You could use asarray(0.0).dtype. See https://data-apis.org/array-api/latest/API_specification/creation_functions.html#asarray-obj-dtype-none-device-none-copy-none
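For example (a sketch, using NumPy only to demonstrate the idea):

```python
import numpy as np

# Infer a library's default dtypes by seeing what asarray() assigns to
# Python scalars, per the Array API creation-functions spec.
default_float = np.asarray(0.0).dtype
default_int = np.asarray(0).dtype
default_bool = np.asarray(False).dtype

print(default_float)  # float64 on NumPy
```

(The default integer dtype is platform-dependent on NumPy, so only the float default is asserted here.)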

I see that adding the func names to assertions could be useful. Additionally it could be useful to have the dtypes involved, à la the parameter ids. How the func names should fit in an error message requires some thought, as they can end up making the messages harder to parse for a user (especially with the op funcs). So I would prefer if we left this out of this PR, as I want to mull it over myself.

Well I wish pytest would surface the assertion error messages a little better. Right now, they are sort of presented in a summary (but they are truncated). I think it was designed around most people not bothering putting messages in their assertions. But I think it's really useful for a test suite designed to be used by other people. It also has the advantage that hypothesis will treat assertions with separate messages as distinct failures.

@asmeurer
Member

I want to merge this. It's making the other PR hard to review. We can fix any issues in a later PR.

@asmeurer asmeurer merged commit bb527e6 into data-apis:master Oct 19, 2021
@honno honno deleted the type-promotion-refactor branch October 21, 2021 11:07