-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: allow sliding_window_view to return an empty array for large windows #25942
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This does make a lot of sense to me, but on first glance, I am surprisingly unsure if that generalization is maybe surprising after all. Not sure why it should matter whether the window size is exactly 1 smaller at most conceptually. Why not use A helpful thing would be to see what the prior art is in pandas/scipy? E.g. pandas rolling window operations and scipy.ndimage filters? |
The Python standard library has >>> import itertools
>>> list(itertools.pairwise([0, 1, 2]))
[(0, 1), (1, 2)]
>>> list(itertools.pairwise([0, 1]))
[(0, 1)]
>>> list(itertools.pairwise([0]))
[]
If (length of window) > (length of array) + 1 both raising an exception, the current behaviour, and your suggestion, which agrees with Whether (length of result) = (length of array) - (length of window) + 1 or (length of result) = max((length of array) - length of window) + 1, 0), if (length of window) = (length of array) + 1 then (length of result) = 0. |
I don't understand the difference in why one should be a bug and the other not, to me they seem the same: In either case there is valid "boundary" data that is not part of the result. |
Both
are reasonable. However, the current behaviour, that (length of result) = (length of array) - (length of window) + 1 unless (length of result) would equal 0 is not reasonable. There ought not to be such an arbitrary exception. |
What doth hinder this to be merged? |
@seberg, the following table shows how the length of the result in a dimension depends on the lengths of the array and the window in that dimension, under the three behaviours discussed.
Both behaviours for window > array + 1 are reasonable; this request does not change the behaviour in that case; it simply corrects the mistake in the case that window = array + 1. |
I desire a review of this request. |
Sorry, most of our reviewer bandwidth is going toward fixing urgent problems required for 2.0.0rc1. |
FWIW, I still strongly disagree that the window-size calculation matters. In my view the only thing that matters is that there are data points which are not part of the result.
I am not sure if there is much surprise with an empty result, so I am OK with it, but I wouldn't like to hear one more opinion clearly in favor. Unlike many other "empty array" results which I would find obvious, I still think this one has a slightly different quality. |
ValueError
if in some dimension the window is longer than the array by 1ValueError
if in some dimension the window is longer than the array by 1
ValueError
if in some dimension the window is longer than the array by 1
>>> import numpy as np
>>> np.diff(np.arange(4))
array([1, 1, 1])
>>> np.diff(np.arange(3))
array([1, 1])
>>> np.diff(np.arange(2))
array([1])
>>> np.diff(np.arange(1))
array([], dtype=int64)
>>> np.diff(np.arange(0))
array([], dtype=int64) It is surprising that def diff(a, axis=-1):
pairs = np.lib.stride_tricks.sliding_window_view(a, 2, axis=axis)
return pairs[..., 1] - pairs[..., 0] at least whenever Between n values there are n-1 pairs. It is a bug for |
Good point!
the current choice of returning an empty result for "n=1" but not "n=0", at least to me doesn't make sense. What does make sense is to call it an enhancement to allow empty results in general. |
@seberg, no. Consider the case where the window is of length 2. |
The proposal simply treats n=1 the same as n=2, n=3, n=4 and so on. |
Returning an empty array for n=0 is different, because then it is no longer true that (result length) = (array length) - (window length) + 1. |
@JDawson-Camlin let's just agree to disagree. I simply don't think the formula |
Describe the issue:
Observe the shapes of the following arrays:
If
x
is a 1-dimensional array andwindow_shape
is a counting number less than or equal tolen(x) + 1
thensliding_window_view(x, window_shape)
ought to have shape(len(x) - window_shape + 1, window_shape)
.However, there is a bug if the result would have length 0:
The result ought to be
Instead we get
This bug is present in higher-dimensional cases also.
Finally, note that it is correct to raise
ValueError
if in some dimension the window shape is longer than the array by 2 or more.Reproduce the code example:
Error message:
Python and NumPy Versions:
1.26.4
3.12.2 (main, Feb 25 2024, 16:36:57) [GCC 9.4.0]
Runtime Environment:
[{'numpy_version': '1.26.4',
'python': '3.12.2 (main, Feb 25 2024, 16:36:57) [GCC 9.4.0]',
'uname': uname_result(system='Linux', node='5420', release='5.15.0-97-generic', version='#107~20.04.1-Ubuntu SMP Fri Feb 9 14:20:11 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': '/home/jdawson/Documents/bugs/numpy-sliding/.venv/lib/python3.12/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23.dev'}]
None
Context for the issue:
The bug necessitates extra code around
sliding_window_view
to handle this case.