Skip to content

ENH: Optimize the FFT implementation #17839

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
Qiyu8 opened this issue Nov 24, 2020 · 9 comments
Open

ENH: Optimize the FFT implementation #17839

Qiyu8 opened this issue Nov 24, 2020 · 9 comments

Comments

@Qiyu8
Copy link
Member

Qiyu8 commented Nov 24, 2020

Fast Fourier transforms are widely used for applications in engineering, music, science, and mathematics.The current fft implementation was introduced two years ago(#11885, #11888 ), which is not the best implementation as far as I know, The drawbacks includes:

  • Don't supports single precision transforms.
  • Don't support for multi-D transforms.
  • Don't make use of vector instructions for FFTs, The main reason that It's performance is not good enough(At least better than fftpack).

Although the author published the C++ based pypocketfft later, which has overcame these shortcomings, but It's was not compatible with C based numpy, I found out that there has several backend projects that worth considering.

project worth introducing?
fftw3 Most popular FFT library, which nowadays is the "gold standard" for FFT implementations. It's also the default fft algorithm of matlab, there has a python wrapper pyfftw, can't integrated due to it's GPL Licence.
Intel MKL/IPP Significantly faster than FFTW with intel processors, already integrated as a third party lib.
KFR Claims to be faster than FFTW, can't integrated due to it's commecial Licence.
FFTS reported to be faster than FFTW because the use of SIMD instructions, at least in some cases. worth considering.
FFTE reported to be faster than FFTW, but the source code is Fortran-based.
Ooura FFT provide C and Fortran version implementation, but It hasn't been updated in a long time.
muFFT/pffft/PGFFT have performance comparable to FFTW.depends strongly on the SIMD instructions. but with limited features.
KissFFT/PocketFFT The simplest but also the slowest one here, which is the current solution.

Now we have two options:

  • choose a new backend system such as FFTS, which is painful because of the trival adaptation process.
  • optimize the current algorithm based on universal intrinsics, which is more practical and operable.
@mattip
Copy link
Member

mattip commented Nov 24, 2020

There is another option, which is to defer to scipy for fft. That path would be defensible since fft is a gateway function to signal processing, and the signal processing tools in scipy are, like its fft, very good.

@leofang
Copy link
Contributor

leofang commented Nov 24, 2020

Isn't FFTS unmaintained? The last commit was 3 years ago, even older than pocketfft.

NumPy has been the reference implementation for fundamental FFT functionalities, and I expect it to do things right (accuracies, coverage of all existing kinds of transforms, etc). Without showing a solid and thorough benchmark for how "slow" the current pocketfft solution is (which I doubt) for every kind of transforms as a basis for further discussion, I find it hard to justify another migration within such a short period.

@mreineck
Copy link
Contributor

numpy pocketfft is very close in performance to the scipy version for 1D transforms. For multi-D, it's "slowish" in comparison, by factors up to maybe 4, depending on transform size and hardware. Accuracy is basically the same.
Missing features: multithreading, sine/cosine transforms and multi-accuracy support.
As a tool to get the fundamental functionality right, it's pretty OK, I'd say, although I admit I don't like it that much anymore since the C++ version came out :-)

@Qiyu8
Copy link
Member Author

Qiyu8 commented Nov 25, 2020

@mattip Scipy is using pypocketfft and support backend systems such as pyfftw and numpy's pocketfft, Do you mean we should follow that strategy? BTW, The current fft module is lack of benchmark cases, which can borrowed from scipy if we want to optimize the current solution further.
@leofang I agree with you on the thorough benchmark tests, Here is the simple benchmark result.

  • bench Script
import numpy as np
import random
import matplotlib.pyplot as plt
from simple_benchmark import benchmark
import pyfftw

def pocketfft(it):
    np.fft.fft(it)

def fftw(it):
    pyfftw.interfaces.numpy_fft.fft(it)

def bench_sum_fftwVSpocketfft():
    print (np.__version__)
    np.show_config()
    # Turn on the cache for optimum performance
    pyfftw.interfaces.cache.enable()
    b_array = benchmark(
        [pocketfft, fftw],
        arguments={2**i: np.random.randn(2**i)+1j*np.random.randn(2**i) for i in range(2, 20)},
        argument_name='array size',
        function_aliases={pocketfft: 'numpy.fft', fftw: 'pyfftw.fft'}
    )
    b_array.plot()
    plt.show()
bench_sum_fftwVSpocketfft()

image

As we can see, fftw3 is better than pocketfft with large arrays.

@mreineck Glad to hear your pertinent comment, would you like to vectorize the current pocketfft if we don't want follow scipy's strategy? I think you are the best candidate to finish the optimize task.

@mattip
Copy link
Member

mattip commented Nov 25, 2020

I meant we should not touch the fft module in NumPy, rather we should consider removing it from NumPy altogether or at least clearly document that SciPy has a better implementation and more complete support for signal-processing tools than NumPy ever will.

@mreineck
Copy link
Contributor

@mreineck Glad to hear your pertinent comment, would you like to vectorize the current pocketfft if we don't want follow scipy's strategy? I think you are the best candidate to finish the optimize task.

Sorry, but this is a task I won't have time for. Please understand that even in C++ pypocketfft there is no vectorization for 1D FFTs; this comes only with multidimensions. So the benefits of the C++ implementation could only be carried over to numpy if we introduced the native multi-D FFTs there as well. And dong this in C, without the help of templates, is a gigantic and horrible task...

In my ideal world, scipy.fft would move to numpy at some point, and since scipy depends on numpy anyway, it will always be available to any numpy or scipy user; there would be no duplication of code and no slightly different interfaces. Of course that can only happen when numpy accepts C++ extensions.

@mreineck
Copy link
Contributor

BTW, if I read the benchmark script correctly, then you do not actually execute any FFTs with FFTW, you just build plans...
If I change the line
pyfftw.builders.fft(it)
to
pyfftw.interfaces.numpy_fft.fft(it),

the plot changes to this:

bench

This result feels much more realistic to me.

@mreineck
Copy link
Contributor

Here is a very similar benchmark in 2D, including scipy.fft:

import numpy as np
import random
import matplotlib.pyplot as plt
from simple_benchmark import benchmark
import pyfftw
import scipy

def pocketfft(it):
    np.fft.fftn(it, axes=(0,1))

def scipyfft(it):
    scipy.fft.fftn(it, axes=(0,1))

def fftw(it):
    pyfftw.interfaces.numpy_fft.fftn(it, axes=(0,1))
#    pyfftw.builders.fft(it)

def shp(len):
    return (len, len)

def bench_sum_fftwVSpocketfft():
    print (np.version)
    np.show_config()
# Turn on the cache for optimum performance
    pyfftw.interfaces.cache.enable()
    b_array = benchmark(
        [pocketfft, fftw, scipyfft],
        arguments={2**i: np.random.uniform(size=shp(2**i))+1j*np.random.uniform(size=shp(2**i)) for i in range(2, 12)},
        argument_name='array size',
        function_aliases={pocketfft: 'numpy.fft', fftw: 'pyfftw.fft', scipyfft: 'scipy.fft'}
        )
    b_array.plot()
    plt.savefig("bench.png")
    plt.show()
bench_sum_fftwVSpocketfft()

bench

Here, FFTW already starts to lose for large sizes. (It would probably still win if it could do full planning, but then the interface becomes much more complicated). At small sizes one can see the vectorization advantage of scipy.fft over numpy.fft... nice, but not exactly earth-shaking.

@Qiyu8
Copy link
Member Author

Qiyu8 commented Nov 25, 2020

Thanks for pointing that out, modified to pyfftw.interfaces.numpy_fft.fft(it) and get simililar result like yours.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants