Skip to content

[Bug]: Possible problem with parameter NFFT and pad_to of mlab._spectral_helper #24822

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gapplef opened this issue Dec 27, 2022 · 5 comments

Comments

@gapplef
Copy link
Contributor

gapplef commented Dec 27, 2022

Bug summary

If my understanding is correct:

  • NFFT of mlab._spectral_helper corresponds to nperseg of spectral funtions of scipy.signal.
    It's the actual length of signal (segment) for DFT.
  • pad_to of mlab._spectral_helper corresponds to nfft of spectral funtions of scipy.signal.
    It's the length of DFT, i.e. parameter n of fft funtion.
    result = np.fft.fft(result, n=pad_to, axis=0)[:numFreqs, :]

But the implementation of Matplotlib is really confusion:

  • If length of signal is less than NFFT, signal will pad to length of NFFT.

    # zero pad x and y up to NFFT if they are shorter than NFFT
    if len(x) < NFFT:
    n = len(x)
    x = np.resize(x, NFFT)
    x[n:] = 0
    if not same_data and len(y) < NFFT:
    n = len(y)
    y = np.resize(y, NFFT)
    y[n:] = 0

    And the window is determined by NFFT instead of actual length of signal. This would cause wrong window correction.

    if not np.iterable(window):
    window = window(np.ones(NFFT, x.dtype))
    if len(window) != NFFT:
    raise ValueError(
    "The window length must match the data's first dimension")

  • The Nyquist frequency is NFFT/2 instead of pad_to/2.

    # if we have a even number of frequencies, don't scale NFFT/2
    if not NFFT % 2:
    slc = slice(1, -1, None)
    # if we have an odd number, just don't scale DC
    else:
    slc = slice(1, None, None)

Code for reproduction

import numpy as np
from matplotlib import mlab
fs = 1000
t = np.arange(0, 1, 1/fs)
rng = np.random.default_rng(42)
s = np.sin(2 * np.pi * 100 * t) + 0.5 * rng.normal(size=t.shape)

nfft = 2048
nsignal = len(s) #1000
psd, freqs = mlab.psd(s, NFFT=nfft, Fs=fs, window=mlab.window_none)

mag = np.abs(np.fft.rfft(s, n=nfft))
psd1 = (mag)**2/nsignal/fs
if not nfft % 2:
  psd1[1:-1] *= 2
else: 
  psd1[1:] *= 2
relative_error = np.abs( 2 * (psd-psd1)/(psd + psd1) )
print(relative_error.max())

Actual outcome

0.6876640419947717

Expected outcome

0

@jklymak
Copy link
Member

jklymak commented Dec 27, 2022

This implementation is pretty clunky, and we strongly recommend just using scipy. Indeed, I was hoping to deprecate the whole thing, but there wasn't sufficient support (#22920)

You are not quite using the parameters correctly, but I agree the results are still incorrect:

import numpy as np
from matplotlib import mlab
import matplotlib.pyplot as plt

fs = 1000
t = np.arange(0, 1, 1/fs)
rng = np.random.default_rng(42)
s = np.sin(2 * np.pi * 100 * t) + 0.5 * rng.normal(size=t.shape)

nfft = 2048
nsignal = len(s) #1000
psd, freqs = mlab.psd(s, pad_to=nfft, NFFT=nfft, Fs=fs, window=mlab.window_none,
                      scale_by_freq=True)
psd2, freqs2 = mlab.psd(s, pad_to=nsignal, NFFT=nsignal, Fs=fs, window=mlab.window_none,
scale_by_freq=True)

fig, ax = plt.subplots()
df = np.median(np.diff(freqs))
ax.loglog(freqs, psd, label=f'NFFT = 2048: {np.sum(psd[1:] * df)}')

df2 = np.median(np.diff(freqs2))
ax.loglog(freqs2, psd2, label=f'NFFT=len(data): {np.sum(psd2[1:] * df2)}')
ax.legend()
ax.set_title(f'Signal variance: {np.var(s)}')
plt.show()

yields:

BadNorm

So, the padded normalization is incorrect by a factor of 1000/2048.

@gapplef
Copy link
Contributor Author

gapplef commented Dec 28, 2022

If NFFT was meant to be the actual length of signal (segment) for DFT, I think the following code is the source of the bug.

# zero pad x and y up to NFFT if they are shorter than NFFT
if len(x) < NFFT:
n = len(x)
x = np.resize(x, NFFT)
x[n:] = 0
if not same_data and len(y) < NFFT:
n = len(y)
y = np.resize(y, NFFT)
y[n:] = 0

And the problem should be fixed with:

# Check if x and y are the same length, zero-pad if necessary
if not same_data:
    if len(x) != len(y):
        if len(x) < len(y):
            n = len(x) 
            x = np.resize(x, len(y)) 
            x[n:] = 0
        else:
            n = len(y) 
            y = np.resize(y, len(x)) 
            y[n:] = 0

if len(x) < NFFT: 
   NFFT = len(x) 

Also the following code need be fixed for correct one-sided psd:

# if we have a even number of frequencies, don't scale NFFT/2
if not NFFT % 2:
slc = slice(1, -1, None)
# if we have an odd number, just don't scale DC
else:
slc = slice(1, None, None)

 # if we have a even number of frequencies, don't scale pad_to/2 
 if not pad_to % 2: 
     slc = slice(1, -1, None) 
 # if we have an odd number, just don't scale DC 
 else: 
     slc = slice(1, None, None) 

@jklymak
Copy link
Member

jklymak commented Dec 28, 2022

@gapplef Maybe? _spectral_helper is written so that pad_to ends up as n to FFT. That is not what I would have done, and I don't understand why it is that way. However, as above, I think we should not be dragging around a different API from scipy.signal, and should just remove all this, the practical problem is that trying to fix the inconsistencies will introduce fixes that are not necessarily back-compatible.

@QuLogic
Copy link
Member

QuLogic commented Dec 28, 2022

cc @oscargus

@misjob
Copy link

misjob commented Jan 9, 2024

Also the zero padding if NFFT is less then the signal length should be performed after the detrend. Otherwise the zero padding would screw up the "mean" and "linear" calculations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants