MAINT: Implement new casting loops based on NEP 42 and 43 #17863
Conversation
```python
raise AssertionError("did not find strategy to fill cast array")

strategy = strategies.tuples(*([strategy] * length))
return data.draw(strategy)
```
@Zac-HD just in case you have time to have a look. These tests work, but are ugly (so I would consider just deleting them if they ever make trouble).
So I am curious whether you have an idea for a better solution. I would like to create a single array with "complicated" elements to test against, and that could be randomized. It then ends up casting, which is tricky, because casting is in general tricky... (So hacking the expected casting result is probably worse than the abuse of hypothesis.)
OTOH, it would be cool to improve the tests so as to conveniently and thoroughly check cast functions for unaligned, strided, etc. inputs (and maybe also have a blueprint for testing ufuncs better in this regard).
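The unaligned and strided inputs mentioned here can be constructed directly with plain numpy; a minimal sketch (the `unaligned_view` helper is illustrative, not part of the PR):

```python
import numpy as np

def unaligned_view(arr):
    # Illustrative helper: place a copy of `arr` at an odd byte offset,
    # so cast loops see unaligned data.
    buf = np.empty(arr.nbytes + 1, dtype=np.uint8)
    out = buf[1:1 + arr.nbytes].view(arr.dtype)
    out[...] = arr
    return out

src = np.arange(10, dtype=np.int64)
unaligned = unaligned_view(src)   # unaligned, contiguous layout
strided = src[::3]                # aligned, but non-contiguous

# Casts must give identical results regardless of memory layout:
assert (unaligned.astype(np.float64) == src.astype(np.float64)).all()
assert (strided.astype(np.float64) == src.astype(np.float64)[::3]).all()
```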
It looks like using Hypothesis' built-in support for Numpy could help - in particular, it looks like you've implemented part of the built-in strategies yourself:

```python
from hypothesis import given
from hypothesis.extra import numpy as npst

@given(
    npst.arrays(
        dtype=npst.nested_dtypes(),  # scalar, array, or structured dtypes; recursively
        shape=npst.array_shapes(min_dims=0, min_side=0),  # any array shape
    )
)
def test_foo(arr):
    # `arr` is a Numpy array which could be *extremely* strange w.r.t. dtype and shape
    ...
```
Alternatively, a more direct translation would be:

```python
@pytest.mark.parametrize("from_dt", simple_dtype_instances())
@given(strategies.data())
def test_simple_direct_casts(self, from_dt, data):
    values = data.draw(
        npst.arrays(
            # Automatically uses `npst.from_dtype()` to get the elements strategy
            dtype=from_dt,
            # Allows us to shrink to an array of just one element on failure
            shape=strategies.integers(1, self.size // from_dt.itemsize),
        )
    )
    ...
```

I'd also be pretty nervous about only generating five examples when these tests can cover so many different combinations, but that's really down to your performance budget. However, this would be a perfect case for using a coverage-guided fuzzer like Atheris (notable for doing both Python and C code coverage) to really shake out any subtle issues (worked example here). I'm in Canberra (UTC+11), but happy to chat in my morning / your evening at your convenience if that would be helpful. My standing offer also covers code review or writing tests for Numpy if there's something specific which would be useful 😁
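Hypothesis aside, the same idea can be approximated with plain numpy, drawing small random values of a source dtype and checking that the strided cast agrees with the contiguous one (the `random_cast_check` helper and the dtype lists here are illustrative, not from the PR):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cast_check(from_dt, to_dt, size=32):
    # Small non-negative values cast losslessly between all dtypes below.
    values = rng.integers(0, 100, size=size).astype(from_dt)
    contiguous = values.astype(to_dt)
    strided = values[::2].astype(to_dt)
    # Strided and contiguous casts must agree element-wise.
    assert (strided == contiguous[::2]).all()

for from_dt in (np.int8, np.int32, np.float32, np.float64):
    for to_dt in (np.int64, np.float64, np.complex128):
        random_cast_check(np.dtype(from_dt), np.dtype(to_dt))
```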
Thanks Zac! Let me look a bit closer at the array strategies next week. I guess I did not do that when I wrote this (it was a while ago), but then it probably would make sense to sit down briefly some time. It is plausible that this could also just use a single (slightly) nested dtype strategy. If we manually unnest nested dtypes (with array views), that will effectively test all those complicated cases (strided, unaligned, at least hopefully). And the nested dtypes are not super well tested currently... (The test suite seems to find even subtle bugs, but often in unrelated tests ;))
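A sketch of what "unnesting with array views" could look like: a field view into a packed structured array is automatically strided and unaligned, so casting it exercises exactly those code paths (the dtype here is an illustrative example):

```python
import numpy as np

# Packed structured dtype: "val" sits at byte offset 1, itemsize is 9.
dt = np.dtype([("tag", "u1"), ("val", "f8")])
arr = np.zeros(5, dtype=dt)
arr["val"] = np.arange(5.0)

field = arr["val"]              # unnested view into the structured array
assert field.strides == (9,)    # steps over the whole struct per element
assert not field.flags.aligned  # offset 1 is not 8-byte aligned

# Casting the view hits the unaligned, strided cast path:
assert (field.astype(np.float32) == np.arange(5.0, dtype=np.float32)).all()
```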
That's certainly been my experience applying Hypothesis to Numpy, and why I'm reluctant to write more tests without advice from maintainers! Finding more bugs can be a good thing long term, but it does add a bit of friction to each PR :/
Thanks, I decided to just simplify the tests a lot for now (using only small random numbers). For the structured dtypes, I think hypothesis could be very cool.
But quite frankly, the complexity of such a test is probably so great that I am unsure it is worth the trouble (at least as part of this PR). So what I will do is update the PR in a bit and consider it finished (with the caveat that I will check coverage results and add simple tests to ensure decent code coverage).
This is for 1-d strided loops? What about the more general nd case, when strides on one axis may skip elements?
@mattip not sure what you are aiming at? The core implementation only uses 1-D strided chunks, and the implementation for calling that multiple times is one level higher (and unchanged). I admit there may be a point in adding more tests where many calls to the 1-D loop are necessary, just to make sure there is nothing wrong with that case.
Right, sorry. I was thinking of something else.
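The "1-D strided chunks" split can be observed from Python with `np.nditer` and `external_loop`: the iterator hands out the largest 1-D pieces it can, and the inner loop is simply invoked once per piece (the array shape chosen here is just an example that cannot be coalesced into a single chunk):

```python
import numpy as np

# 2-D view whose rows cannot be merged into one 1-D run.
arr = np.arange(12).reshape(3, 4)[:, :2]

chunks = []
with np.nditer(arr, flags=["external_loop"], order="C") as it:
    for chunk in it:
        chunks.append(chunk.copy())

# Three 1-D chunks of two elements each; the nd case is handled one
# level above the 1-D inner loop.
assert all(c.ndim == 1 for c in chunks)
assert np.concatenate(chunks).tolist() == [0, 1, 4, 5, 8, 9]
```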
On a first pass through, this looks good, thanks. It would be nice to test more of the error conditions, and also to test the auxiliary data mechanism. I only see additional tests; are there tests that are now redundant?
```c
*out_transferdata = NULL;
PyArrayMethodObject *meth = context->method;

int nargs = context->method->nin + context->method->nout;
```
Suggested change:

```diff
-    int nargs = context->method->nin + context->method->nout;
+    int nargs = meth->nin + meth->nout;
```
```c
"_simple_strided_call() takes exactly one tuple with as many "
"arrays as the method takes arguments (%d+%d).", nin, nout);
return NULL;
}
```
Coverage thinks the error condition needs a test
I didn't bother adding tests for this (at least for now, private) API yet, but I can add them here.
```c
NPY_AUXDATA_FREE((NpyAuxData *)res);
return NULL;
}
return (NpyAuxData *)res;
```
Is this covered by a test? Maybe worth trying to think of one that has a use for this code path?
Yeah, the only way to reach this is probably using `np.nditer(...).copy()`, but I can add tests.
Anything else that jumps out at you, aside from a few additional tests? Most of the error paths (aside from those in array_method.c) are probably (almost) impossible to test, though.
No, it all looks quite straightforward although repetitive. It would be nice to try to refactor this somehow, but I don't see how. There is not much that needs careful refcount considerations or malloc/free checking, since it is mostly logic to assign function pointers.
One thing to think about would be to split the large files into smaller pieces.
Did another small update on the tests. Do you have any specific tests in mind that would be good? Testing this is tricky, so there could well be something I forgot.
Oh, I still need to change the "use new code paths" flag back to silence the doctest error. Since that would confuse codecov, I will do that at the very end.
The code itself looks fine. It seems like we are adding a lot of boilerplate, but I don't see how to avoid it. There is a conflict, and coverage is still complaining that some of the test conditions are never hit. Could you write some C-level tests that abuse the API, just to see that the errors are correctly raised with no problems?
Well, the boilerplate has a purpose... It adds a layer of separation between code paths that need to be separated. Ideas welcome, but the only idea I had would be to use a ...
I had hoped I could push that off until that API is actually made public and the errors are more than sanity checks on new additions to NumPy, because that probably also makes the testing nicer (e.g. you had the idea of introducing test-time compilation infrastructure), but yes, I could do that.
Since it is internal, we can put off adding tests for now, but we should somehow mark it with a comment.
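At the Python level, the reachable error conditions can at least be smoke-tested; a sketch (this is not a substitute for the C-level API-abuse tests discussed above):

```python
import numpy as np

src = np.arange(3, dtype=np.int64)

# `casting="no"` forbids any conversion, so this cast must fail cleanly.
raised = None
try:
    src.astype(np.float64, casting="no")
except TypeError as exc:
    raised = exc

assert isinstance(raised, TypeError)

# A valid cast under the same rule still works:
assert (src.astype(np.int64, casting="no") == src).all()
```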
It seems the only reason this did not cause problems before was that the test checking for C warnings only ran on a run where these were always 0 and thus undefined (and so were not redefined later). The `#ifndef` would apparently have to happen at a later time, so just avoid it.
Well, I added such a comment to the private methods and the internal fromspec (in the header). Also squashed and switched the doctest flag (which means codecov will probably be useless now).
Build is failing, typo?
Oh man, should have just run the tests even for a comment change I guess... |
The squashed commits:

- switch to using new casting as default to run full CI
- Initialize casting to silence compile warnings only...
- BUG: As my comment even said, `out_needs_api` is caller-initialized, so I shouldn't init it to 0; it might disable api again.
- Simplify tests again, it seems unnecessary to go this complex
- Simplify `#if` as the variable should always be defined
- TST: Add more tests (and some related fixes). This fixes a few small things (i.e. the decref loop can be NULL, and checking for writeable was incorrect in the private API). Mostly adds new tests for some internal API, as well as for the copy of the auxiliary data. The nditer test, however, is also a heavyweight test for the general multistep casting.
Thanks @seberg |
@seberg see the linked issues: this casts Python objects to strings, which is a huge breaking change.
This PR should be mostly finished, but I will still check code-coverage etc. It is the followup for the initial casting PR, and now changes all casts to go through the `ArrayMethod` machinery (some "legacy" casts will still use the very old machinery though).

There are a few things to note:

- [ ] Some of the headers I added to `numpy/core/src/common/lowlevel_strided_loops.h` should probably move to the new `numpy/core/src/multiarray/dtype_transfer.h`. Maybe it doesn't matter; the `object` ones had to be moved because they use new private API, but the other loops are old-style, so keeping them in the old place is probably just as well.

There are a few things we should possibly discuss, but it is not super pressing right now:

- `move_references` is a flag that we use to say that the source of a casting function (ufuncs would be the same in a sense) is a buffer, and the content of that buffer should be cleared during iteration. Clearing buffers (also on error!) is a bit tricky and already broken; I have not yet attempted to fix all of it. The `move_references` mechanism seems useful for this (although it probably doesn't matter for the "mini buffers" used in `dtype_transfer.c`). I suppose we should just keep it around, and probably can just live with it also for ufuncs... It could even turn out to be useful there. Plausibly, we will need to add a flag so that an `ArrayMethod` can indicate that it supports `move_references`.
- An `itemsize-aligned` flag (basically for complex).

In any case, there should not be much left to do here, so this is ready for review, although maybe not quite for merging.