Skip to content

MAINT, ENH: Implement calling pocketfft via gufunc and allow out argument #25536

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jan 19, 2024

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Jan 4, 2024

EDITED as no longer a draft.

Inspired by #25399, where @serge-sans-paille aimed to add an out parameter to the various fft routines, but where it turned out to be very hard to get it right, I instead turned to calling the fft routines via gufuncs, where the iterator takes care of things more automatically.

It all works well (with one small exception; see below), and I could even remove some of the copying of the whole array done for strange n (now doing much smaller copies in the ufunc). I also added the tests for passing in out from #25399 and udated them.

Note on the commits: I can merge these later, but for now kept them separate so one can see the logic a bit more. In particular, the first commit separates out the numpy-specific code from what one gets from upstream pocketfft, with the second adding a few small changes (that we probably should push upstream, if it is really a problem to call malloc(0)).

All test pass except one where it is checked that longdouble input gets dealt with properly. @seberg - might you be able to advice? I thought that with casting='unsafe' (or casting='same_kind') I would be able to force the use of the double loops that I defined, but somehow I still a TypeError that no suitable loop is found for casting='safe' (i.e., what I pass in for casting seems to be ignored). Trying to debug that (see #25535 for the fix that was needed), I found that the error is raised at

/*
* Note that part of the promotion is to the complete the signature
* (until here it only represents the fixed part and is usually NULLs).
*
* After promotion, we could push the following logic into the ArrayMethod
* in the future. For now, we do it here. The type resolution step can
* be shared between the ufunc and gufunc code.
*/
PyArrayMethodObject *ufuncimpl = promote_and_get_ufuncimpl(ufunc,
operands, signature,
operand_DTypes, force_legacy_promotion, allow_legacy_promotion,
promoting_pyscalars, NPY_FALSE);

@serge-sans-paille
Copy link
Contributor

@mhvk thanks for looking at this :-)

@mhvk mhvk force-pushed the fft-as-gufunc branch 2 times, most recently from e9844e2 to 8271cfc Compare January 4, 2024 21:39
@mhvk mhvk marked this pull request as ready for review January 4, 2024 23:51
@mhvk mhvk changed the title DRAFT Implement calling pocketfft via gufunc DRAFT Implement calling pocketfft via gufunc and allow out argument Jan 4, 2024
@seberg
Copy link
Member

seberg commented Jan 5, 2024

@mhvk sorry, I have been focusing on other work (and was lazy on the holidays). The default casting is (still) "same-kind", so if you force the double loop it should be accepted, however, you need to force it of course (sorry tautology :)).

There are two ways to do this in principle:

  1. You install a promoter as here (and its use)
  2. You use the new style loops (also that file) and just guirilla add it for longdouble->double explicitly. (although the first seems clearer)

But we want the first, I think. For this ufunc, it might also be interesting to use the new-style inner-loop. That way, you would be able to cache the fft plan over multiple calls (But I am not too picky about actually doing it).

@mhvk
Copy link
Contributor Author

mhvk commented Jan 5, 2024

@seberg - that sounds good in principle. I'm still confused, though, why it is not good enough to just pass in casting='unsafe'. Shouldn't that work?

@seberg
Copy link
Member

seberg commented Jan 5, 2024

No, casting doesn't change how promotion works, and it shouldn't! (And it never did) It would be enough to pass dtype=np.double, casting="unsafe", thought.

@seberg
Copy link
Member

seberg commented Jan 5, 2024

(Sorry, to make that one step further: it is should also be enough to pass just dtype, as the default is already same-kind casting)

@mhvk
Copy link
Contributor Author

mhvk commented Jan 5, 2024

Ah, that works indeed, though instead of dtype, I have to pass in signature=ufunc.types[0], so that inputs and outputs are both covered.

@mhvk mhvk changed the title DRAFT Implement calling pocketfft via gufunc and allow out argument MAINT, ENH: Implement calling pocketfft via gufunc and allow out argument Jan 5, 2024
Copy link
Contributor Author

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some comments throughout to help review.

@@ -698,22 +736,22 @@ def _cook_nd_args(a, s=None, axes=None, invreal=0):
return s, axes


def _raw_fftnd(a, s=None, axes=None, function=fft, norm=None):
def _raw_fftnd(a, s=None, axes=None, function=fft, norm=None, out=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this loop, out is either never used, or automatically updated in-place. In principle, one could use in-place any time the shape does not change, but that would need extra logic, so perhaps out of scope here.

@@ -46,33 +46,36 @@
# divided. This replaces the original, more intuitive 'fct` parameter to avoid
# divisions by zero (or alternatively additional checks) in the case of
# zero-length axes during its computation.
def _raw_fft(a, n, axis, is_real, is_forward, inv_norm):
def _raw_fft(a, n, axis, is_real, is_forward, inv_norm, out=None):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note the simplification because axis can just get passed on to the gufunc.

}

/*
* For the forward real, we cannot know what the requested number of points is
Copy link
Contributor Author

@mhvk mhvk Jan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the python side assumes one can pass in npts larger or smaller than nin - only the output size is related to npts.

@@ -9,18 +9,12 @@
* Copyright (C) 2004-2018 Max-Planck-Society
* \author Martin Reinecke
*/
#define NPY_NO_DEPRECATED_API NPY_API_VERSION
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in this file are just stripping out the previous numpy call code, making the file as close as possible to the original (e.g., removing the added static).

@@ -0,0 +1,34 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The header file is now needed, since compilation is done separately.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using templating might make some things a bit nicer (the copy function avoiding memcpy() and also the static void data passing). But both seems so small that it's not worth to get out C++ for.

Overall looks good, a couple of small comments, need to have another closer look (especially at the Python side), but I doubt I will notice anything.

@mhvk
Copy link
Contributor Author

mhvk commented Jan 7, 2024

@seberg - on the templating, I may still try it, but the new API seemed not entirely trivial without an easy example of a gufunc to work from. That said, perhaps these gufuncs can be the good example...

@ngoldbaum
Copy link
Member

I've had a bunch of experience writing new-style ufuncs for stringdtype, although never with the gufunc machinery. Happy to help out in north american time zones if needed.

@seberg
Copy link
Member

seberg commented Jan 8, 2024

I don't think gufuncs change anything at all: The only difference between the two is that the input strides differ.
That said, the static void *data field is not passed in currently, so if that isn't solved with templating, that might be annoying.

Might have a look at it, overall, it might be easiest to just get started/push a start. But: I am totally fine with not doing it!

@mhvk mhvk force-pushed the fft-as-gufunc branch 3 times, most recently from 87a911b to e907f3a Compare January 9, 2024 01:28
@mhvk
Copy link
Contributor Author

mhvk commented Jan 9, 2024

@seberg, @ngoldbaum - I still like the idea of using the new API, but don't really see myself having time for it this week -- term has started again... I think this PR is in pretty good shape, though it needs a squash of most of the commits (I'd prefer not to squash all, but keep the changes to pocketfft separate).

@mhvk
Copy link
Contributor Author

mhvk commented Jan 9, 2024

I went ahead and rebased, squashing to 3 commits: separate out to upstream pocketfft, apply the small number of numpy patches, and the addition/use of the gufuncs.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll just approve this to be clear about thinking that it is a good idea. I still didn't read the tests super carefully, but that should be OK.

There is the small symbol issue I noticed looking through, I wouldn't mind reading things once more more carefully, but overall that time would probably better spend making it C++ ;).


Let's get this in, if nobody beats me to it: next time I come across, I may push a fix for the symbols and just merge.

void destroy_rfft_plan (rfft_plan plan);
int rfft_backward(rfft_plan plan, double c[], double fct);
int rfft_forward(rfft_plan plan, double c[], double fct);
size_t rfft_length(rfft_plan plan);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these should all have a NPY_NO_EXPORT ideally. Small thing, but we do it everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My logic was that we're vendoring the pocketfft files, so ideally we really change as little as possible. Does it matter if it is only included in the ufunc generation code? If you think it does matter, even a little, then I'll change it. But unlike the other changes to pocketfft, we could not upstream these.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not super sure what the advantags are beyond making them more private (i.e. if it can avoid clashs or has other advantages).

So all else being equal, it is the right thing to add it I think. But it probably doesn't matter a whole lot, so 🤷.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I just added NPY_VISIBILITY_HIDDEN (which is the same as NPY_NO_EXPORT, but defined in numpy/numpyconfig.h rather than somewhere in multiarray, so a bit more logical for a vendored routine).

mhvk added 4 commits January 19, 2024 11:10
This commit ensures the pockefft.c file is identical to upstream.
These numpy changes could in principle be added to upstream,
as they add robustness.
These changes make no sense for upstream.
The ufuncs taken npts from the output, so that using less or more
than the full input remains possible.
@seberg
Copy link
Member

seberg commented Jan 19, 2024

Thanks Marten, let's get this in. It isn't like we are going to find substantive improvements at this point (beyond the discussed refactors), at least I hope ;).

@mhvk if you want to add a small release note fragment, just make a new PR.

@seberg
Copy link
Member

seberg commented Apr 10, 2024

It was pointed out to me that we made rfft(imaginary_array) fail in this PR. TBH, I think that is probably just as well (it is a bit annoying that the error doesn't report the dtypes to make it more obvious, though).

So I will lean to consider it a good 2.0 change that shouldn't really affect many (you currently rightly get the annoying complex warning). (I.e. the only real improvement here would be to improve that error message, IMO.)

@mhvk
Copy link
Contributor Author

mhvk commented Apr 10, 2024

I agree that if people call rfft on a complex array it is almost certainly a mistake, so an error is fine. I wouldn't mind a better error message, though I think it might be the ufunc message one could improve, by listing both the input types and the accepted types, which would help all ufuncs. But perhaps relatively low priority?

p.s. Right now,

np.fft.rfft(np.arange(10)+1j)
TypeError: ufunc 'rfft_n_even' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

@seberg
Copy link
Member

seberg commented Apr 10, 2024

Ah, that particular error is really more a "no loop found" path but in an old code path. So it should be pretty straight forward to the input dtype(s) here, but you can't really say which cast "failed", because we may still be in the promotion step rather than casting step (and at least I think the code path can be used for either step).

@mhvk
Copy link
Contributor Author

mhvk commented Apr 10, 2024

So, maybe the TODO here would be to upgrade to the new machinery, which would then give a clearer error message?

@seberg
Copy link
Member

seberg commented Apr 12, 2024

Yeah, I guess it doesn't matter enough to fix this quickly. Upgrading to the new machinery is a big chunk of work and impossible to back-port (we have to remove the "disable nep 50" option first), but will be a big churn that would be nice to at least get started on in the next months.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants