ENH: Add pocketfft sources to numpy for testing, benchmarks, etc. #11888
I ran the same scripts as I did for #10625 to compare the accuracy and speed. The accuracy issue for real-valued FFTs is fixed, and in general pocketfft is faster than the current implementation in numpy but still slower than scipy's fftpack (not taking the additional twiddle factor creation into account). For prime sizes it is vastly faster due to using an algorithm with lower complexity. All seems good to me. The only thing I could think of is that maybe the implementation of pocketfft could be sped up further. EDIT: Removed speed benchmarks as they did not reflect the current status of the PR. See below.
Thanks a lot for these tests, @ncgeib! Concerning performance, I don't fully understand yet why scipy's implementation is faster for regular sizes ... the algorithms are basically the same. Hmm, one difference is that pocketfft does its own memory management, i.e. for every transform it calls malloc() and free() a handful of times. This could cause a slowdown, especially for short transforms. Maybe that's what we are seeing here.
Sorry, I'm being stupid ... of course I know why pocketfft is currently slower than scipy! At the moment, pocketfft is not creating plans beforehand and caching them for re-use. That means that for every repetition of the transform it recomputes the twiddle factors all over again. This can be fixed of course, but I'll probably need help writing a Python class that caches the pocketfft plans.
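The plan-caching idea can be sketched in pure Python. This is a hypothetical illustration, not the PR's code: the names `PlanCache` and `cached_dft` are made up, and the expensive twiddle-factor setup is simulated with a full DFT matrix so the example is runnable.

```python
# Hypothetical sketch of a plan cache: compute per-length setup data once,
# then reuse it for every repetition of a transform of that length.
import threading
import numpy as np

class PlanCache:
    """Cache expensive per-length setup (e.g. twiddle factors) for reuse."""

    def __init__(self, maxsize=16):
        self._plans = {}                 # length -> precomputed data
        self._lock = threading.Lock()    # makes lookups safe across threads
        self._maxsize = maxsize

    def get(self, n):
        with self._lock:
            plan = self._plans.get(n)
            if plan is None:
                # Stand-in for twiddle-factor computation: the full matrix
                # of exponentials exp(-2*pi*i*j*k/n).
                k = np.arange(n)
                plan = np.exp(-2j * np.pi * np.outer(k, k) / n)
                if len(self._plans) >= self._maxsize:
                    # Simple eviction: drop the oldest entry.
                    self._plans.pop(next(iter(self._plans)))
                self._plans[n] = plan
            return plan

cache = PlanCache()

def cached_dft(x):
    x = np.asarray(x, dtype=complex)
    return cache.get(len(x)) @ x   # O(n^2) DFT, but the plan is reused
```

Repeated calls with the same length then skip the setup cost entirely, which is the effect being discussed here.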
I wonder whether some small Cython classes would help here ...
@mreineck Yeah, sorry if that was not clear. The comparison is not really fair as the timings exclude the twiddle factor calculation and cache creation for numpy and scipy and include the twiddle factor calculation for pocketfft.
The code now does some caching. If someone could tell me the canonical way to pass a C pointer to Python, I'd be grateful. Currently I'm casting its value to a "long" and feel really dirty about it.
|
I re-ran the speed tests. The comparison should now be fair (and more or less accurate). We see that pocketfft is still just a little bit slower than scipy's fftpack for most test cases. Maybe there is some overhead that could be reduced? @rgommers If I find the time I will do that. The benchmarks basically copy the ones FFTW does. For the meantime, they can be found at https://github.com/ncgeib/benchmark-ffts
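A minimal timing sketch in the spirit of the "fair" comparison described above (this is not the linked benchmark-ffts code): warm the transform once so any internal cache or twiddle-factor setup is excluded, then time repeated transforms.

```python
# Time repeated FFTs after a warm-up call, so one-time setup cost
# (plan/twiddle creation) is excluded from the per-transform figure.
import timeit
import numpy as np

def bench(n, repeats=100):
    x = np.random.default_rng(0).standard_normal(n) + 0j
    np.fft.fft(x)                        # warm-up: populate any internal cache
    t = timeit.timeit(lambda: np.fft.fft(x), number=repeats)
    return t / repeats                   # seconds per transform

for n in (128, 1024, 10007):             # 10007 is prime: a hard size for FFTPACK
    print(f"n={n:6d}: {bench(n) * 1e6:8.1f} us per transform")
```

The same loop body can be pointed at `scipy.fftpack.fft` to reproduce the kind of side-by-side numbers discussed in this thread.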
Oh okay, if you mean an actual copy then never mind - FFTW is GPL-licensed so we cannot use that code. A standalone repo is still useful though; we can then run it again in case of other PRs with major changes like this one.
@rgommers Sorry, I meant they copy the methodology the FFTW authors use. No code was copied in any way - I simply found the way they test the accuracy of FFT implementations to be good (comparing against a high-precision calculation of the DFT). This makes it possible to compare the graphs to the ones on their webpage. I specifically wrote the benchmark scripts while hoping that at some point they could be used when numpy finally gains a better FFT implementation. In that sense I am quite happy now ;)
@ncgeib I feel much more comfortable with the new benchmarks! The remaining performance difference between scipy and pocketfft is largest for small transform sizes and almost vanishes for longer ones, so my hope is that we can still gain a little bit by optimizing the wrapper code on the Python side.
Hmmm ... I don't see any locking/unlocking going on in the scipy code when the caches are updated. Could this be the reason for the performance difference? And why is scipy not subject to race conditions?
From scipy/scipy#1419: ... This means that SciPy's FFTs are not threadsafe, and thus the GIL cannot be released. Thus, SciPy has to lock up the interpreter while doing FFTs. ...
Maybe there's our performance overhead. Also, scipy is doing cache management on the C side, which could also help. But anyway, I think this is a detail we can look into once we are convinced that everything else works as intended.
There seem to be unnecessary array copies in fftpack.py (and via copy/paste also in pocketfft.py): the function [...] (i.e. potentially there is no copy), while [...] This looks conspicuously inconsistent...
BTW, I think that most of numpy.fft's slowness compared to scipy is explained by the many calls to [...]
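The copy-semantics point above can be illustrated directly: `np.array` copies its input by default, while `np.asarray` returns the input object unchanged when it already matches the requested dtype and layout, so mixing the two in wrapper code produces inconsistent (and sometimes unnecessary) copies.

```python
# np.array copies by default; np.asarray avoids the copy when it can.
import numpy as np

x = np.arange(8, dtype=complex)

a = np.array(x)     # always makes a copy
b = np.asarray(x)   # already a complex ndarray: returned as-is, no copy

assert a is not x   # distinct object, independent memory
assert b is x       # the very same object
```

For FFT wrappers this matters because an avoidable copy of the input array is pure overhead on every call, most visibly for small transforms.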
After some more measurements I'm convinced that the remaining overhead is caused by cache lookup and management. If I use a primitive cache which stores just a single transform, I get speeds that lie practically on top of the scipy data points everywhere.
I can confirm this on my machine. In the latest state of the PR, the runtime of pocketfft is basically the same as scipy's fftpack for regular numbers.
Great! All Travis builds fail (even those for Python 3), because C89 is enforced. Do we need to request a change in policy here?
We will try to move to C99 after Python 2.7 is dropped. Note that other parts of NumPy will need some minor fixes, or at least that was my impression last time I tried with the C99 flag.
I'm thinking of branching 1.16.0 around the end of November.
You can try changing the build/test environment in this PR to see what needs doing, but it would be best to keep those changes as separate commits as we might want to split the resulting PR when we move up. In particular, you can drop the 2.7 tests and look into making the rest of NumPy C99-safe. If some of that can be done without breaking 2.7 on Windows, so much the better.
[skip appveyor] C99 looks to work with gcc for all the Python versions we currently support. Testing with C99 is a step forward in testing numpy#11888 before it gets merged once we drop Python 2.7 support.
I've put up #11905 to start testing with C99 on travisCI.
Looks like we need to modify lines 344-352 in runtests.py to remove [...]
See #12610.
This all looks great, but does need a release note! Also, having not looked in any detail: the old [...]
@mreineck I've made a separate PR for the [...]
Yes, we still have a bunch of C++ (mostly because Agg is templated C++).
@mreineck Might also want to squash the 84 commits :)
What would that look like? Some addition in [...]? I'll fetch the latest changes from master and squash as soon as the tester runs through.
Not at the moment. I think it could be added, but I don't really have the experience to know where we have to keep double precision to avoid losing too much accuracy. If memory usage is the concern, people will be better off with an in-place implementation that computes twiddle factors on the fly. Performance-wise, I don't expect any speedup potential unless explicit vectorization support is added.
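The accuracy concern about a single-precision variant can be illustrated numerically. This is a hedged sketch (the exact place where double precision must be kept in pocketfft is not established here): storing the twiddle factors in single precision and accumulating in single precision loses many digits relative to the double-precision path.

```python
# Compare a DFT computed with double- vs single-precision twiddle factors,
# using np.fft.fft (double precision) as the reference.
import numpy as np

n = 1024
k = np.arange(n)
W64 = np.exp(-2j * np.pi * np.outer(k, k) / n)   # double-precision twiddles
W32 = W64.astype(np.complex64)                    # single-precision twiddles

rng = np.random.default_rng(0)
x = rng.standard_normal(n)
ref = np.fft.fft(x)

# Relative max error of the O(n^2) DFT against the reference FFT.
err64 = np.max(np.abs(W64 @ x - ref)) / np.max(np.abs(ref))
err32 = np.max(np.abs(W32 @ x.astype(np.complex64) - ref)) / np.max(np.abs(ref))

print(err64, err32)   # err32 is many orders of magnitude larger
```

This is why a float32 code path needs deliberate choices about which intermediate quantities stay in double, rather than a mechanical search-and-replace of the floating type.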
@mreineck Put the notes [...]
@mreineck - with single precision, all calculations would be single precision; in the old [...] and then all code is defined in terms of [...]
Simply replacing [...]
OK, I think I'm done.
OK, let's get this in. I haven't reviewed the sources, but I think the FFT is not difficult to test, so I will rely on the tests and user feedback leading up to the 1.17.0 release. Thanks @mreineck.
@mreineck We should also add some benchmarks for this. There are currently some in scipy that can probably be modified to cover things like large(ish) prime sizes.
@charris Thanks a lot for merging! I agree that correctness testing for FFTs is not hard to do. The previously existing unit tests have been kept, and I added identity tests (real and complex) for all lengths between 1 and 512. That should catch most potential problems. Concerning benchmarks, I think @ncgeib's work (#11888 (comment)) would be the best start.
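The identity tests described above amount to a round-trip check for every length: a forward transform followed by its inverse must reproduce the input. A sketch of that style of test (not the PR's actual test code):

```python
# Round-trip identity tests: ifft(fft(x)) == x for complex input and
# irfft(rfft(x), n) == x for real input, for every length in the range.
import numpy as np

rng = np.random.default_rng(42)
for n in range(1, 513):                  # the PR covers lengths 1..512
    xc = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    assert np.allclose(np.fft.ifft(np.fft.fft(xc)), xc)

    xr = rng.standard_normal(n)
    # Pass n explicitly so odd lengths round-trip correctly through irfft.
    assert np.allclose(np.fft.irfft(np.fft.rfft(xr), n), xr)
```

Sweeping every length exercises all the internal code paths (powers of two, composite sizes, and primes), which is what makes this simple check surprisingly thorough.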
This pull request was opened at the request of @matthew-brett and is related to issues #11885, #11886 and #10625.