-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
MAINT, ENH: Implement calling pocketfft via gufunc and allow out argument #25536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@mhvk thanks for looking at this :-) |
e9844e2
to
8271cfc
Compare
@mhvk sorry, I have been focusing on other work (and was lazy on the holidays). The default casting is (still) "same-kind", so if you force the double loop it should be accepted, however, you need to force it of course (sorry tautology :)). There are two ways to do this in principle:
But we want the first, I think. For this ufunc, it might also be interesting to use the new-style inner-loop. That way, you would be able to cache the fft plan over multiple calls (But I am not too picky about actually doing it). |
@seberg - that sounds good in principle. I'm still confused, though, why it is not good enough to just pass in |
No, casting doesn't change how promotion works, and it shouldn't! (And it never did) It would be enough to pass |
(Sorry, to make that one step further: it is should also be enough to pass just dtype, as the default is already |
Ah, that works indeed, though instead of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some comments throughout to help review.
@@ -698,22 +736,22 @@ def _cook_nd_args(a, s=None, axes=None, invreal=0): | |||
return s, axes | |||
|
|||
|
|||
def _raw_fftnd(a, s=None, axes=None, function=fft, norm=None): | |||
def _raw_fftnd(a, s=None, axes=None, function=fft, norm=None, out=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this loop, out
is either never used, or automatically updated in-place. In principle, one could use in-place any time the shape does not change, but that would need extra logic, so perhaps out of scope here.
@@ -46,33 +46,36 @@ | |||
# divided. This replaces the original, more intuitive 'fct` parameter to avoid | |||
# divisions by zero (or alternatively additional checks) in the case of | |||
# zero-length axes during its computation. | |||
def _raw_fft(a, n, axis, is_real, is_forward, inv_norm): | |||
def _raw_fft(a, n, axis, is_real, is_forward, inv_norm, out=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note the simplification because axis
can just get passed on to the gufunc
.
} | ||
|
||
/* | ||
* For the forward real, we cannot know what the requested number of points is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the python side assumes one can pass in npts
larger or smaller than nin
- only the output size is related to npts
.
@@ -9,18 +9,12 @@ | |||
* Copyright (C) 2004-2018 Max-Planck-Society | |||
* \author Martin Reinecke | |||
*/ | |||
#define NPY_NO_DEPRECATED_API NPY_API_VERSION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in this file are just stripping out the previous numpy call code, making the file as close as possible to the original (e.g., removing the added static
).
@@ -0,0 +1,34 @@ | |||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The header file is now needed, since compilation is done separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using templating might make some things a bit nicer (the copy function avoiding memcpy()
and also the static void data passing). But both seems so small that it's not worth to get out C++ for.
Overall looks good, a couple of small comments, need to have another closer look (especially at the Python side), but I doubt I will notice anything.
@seberg - on the templating, I may still try it, but the new API seemed not entirely trivial without an easy example of a gufunc to work from. That said, perhaps these gufuncs can be the good example... |
I've had a bunch of experience writing new-style ufuncs for stringdtype, although never with the gufunc machinery. Happy to help out in north american time zones if needed. |
I don't think gufuncs change anything at all: The only difference between the two is that the input strides differ. Might have a look at it, overall, it might be easiest to just get started/push a start. But: I am totally fine with not doing it! |
87a911b
to
e907f3a
Compare
@seberg, @ngoldbaum - I still like the idea of using the new API, but don't really see myself having time for it this week -- term has started again... I think this PR is in pretty good shape, though it needs a squash of most of the commits (I'd prefer not to squash all, but keep the changes to pocketfft separate). |
I went ahead and rebased, squashing to 3 commits: separate out to upstream pocketfft, apply the small number of numpy patches, and the addition/use of the gufuncs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll just approve this to be clear about thinking that it is a good idea. I still didn't read the tests super carefully, but that should be OK.
There is the small symbol issue I noticed looking through, I wouldn't mind reading things once more more carefully, but overall that time would probably better spend making it C++ ;).
Let's get this in, if nobody beats me to it: next time I come across, I may push a fix for the symbols and just merge.
numpy/fft/pocketfft/pocketfft.h
Outdated
void destroy_rfft_plan (rfft_plan plan); | ||
int rfft_backward(rfft_plan plan, double c[], double fct); | ||
int rfft_forward(rfft_plan plan, double c[], double fct); | ||
size_t rfft_length(rfft_plan plan); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think these should all have a NPY_NO_EXPORT
ideally. Small thing, but we do it everywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My logic was that we're vendoring the pocketfft files, so ideally we really change as little as possible. Does it matter if it is only included in the ufunc generation code? If you think it does matter, even a little, then I'll change it. But unlike the other changes to pocketfft, we could not upstream these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not super sure what the advantags are beyond making them more private (i.e. if it can avoid clashs or has other advantages).
So all else being equal, it is the right thing to add it I think. But it probably doesn't matter a whole lot, so 🤷.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I just added NPY_VISIBILITY_HIDDEN
(which is the same as NPY_NO_EXPORT
, but defined in numpy/numpyconfig.h
rather than somewhere in multiarray
, so a bit more logical for a vendored routine).
This commit ensures the pockefft.c file is identical to upstream.
These numpy changes could in principle be added to upstream, as they add robustness.
These changes make no sense for upstream.
The ufuncs taken npts from the output, so that using less or more than the full input remains possible.
Thanks Marten, let's get this in. It isn't like we are going to find substantive improvements at this point (beyond the discussed refactors), at least I hope ;). @mhvk if you want to add a small release note fragment, just make a new PR. |
It was pointed out to me that we made So I will lean to consider it a good 2.0 change that shouldn't really affect many (you currently rightly get the annoying complex warning). (I.e. the only real improvement here would be to improve that error message, IMO.) |
I agree that if people call p.s. Right now,
|
Ah, that particular error is really more a "no loop found" path but in an old code path. So it should be pretty straight forward to the input dtype(s) here, but you can't really say which cast "failed", because we may still be in the promotion step rather than casting step (and at least I think the code path can be used for either step). |
So, maybe the TODO here would be to upgrade to the new machinery, which would then give a clearer error message? |
Yeah, I guess it doesn't matter enough to fix this quickly. Upgrading to the new machinery is a big chunk of work and impossible to back-port (we have to remove the "disable nep 50" option first), but will be a big churn that would be nice to at least get started on in the next months. |
EDITED as no longer a draft.
Inspired by #25399, where @serge-sans-paille aimed to add an
out
parameter to the variousfft
routines, but where it turned out to be very hard to get it right, I instead turned to calling the fft routines via gufuncs, where the iterator takes care of things more automatically.It all works well (with one small exception; see below), and I could even remove some of the copying of the whole array done for strange
n
(now doing much smaller copies in the ufunc). I also added the tests for passing inout
from #25399 and udated them.Note on the commits: I can merge these later, but for now kept them separate so one can see the logic a bit more. In particular, the first commit separates out the numpy-specific code from what one gets from upstream pocketfft, with the second adding a few small changes (that we probably should push upstream, if it is really a problem to call
malloc(0)
).All test pass except one where it is checked that
longdouble
input gets dealt with properly. @seberg - might you be able to advice? I thought that withcasting='unsafe'
(orcasting='same_kind'
) I would be able to force the use of thedouble
loops that I defined, but somehow I still aTypeError
that no suitable loop is found forcasting='safe'
(i.e., what I pass in forcasting
seems to be ignored). Trying to debug that (see #25535 for the fix that was needed), I found that the error is raised atnumpy/numpy/_core/src/umath/ufunc_object.c
Lines 4713 to 4724 in ce3e462