Skip to content

ENH: allow sliding_window_view to return an empty array for large windows #25942

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

ENH: allow sliding_window_view to return an empty array for large windows #25942

wants to merge 2 commits into from

Conversation

ghost
Copy link

@ghost ghost commented Mar 6, 2024

Describe the issue:

Observe the shapes of the following arrays:

>>> from numpy.lib.stride_tricks import sliding_window_view
>>> sliding_window_view([0, 1, 2, 3], 2)
array([[0, 1],
       [1, 2],
       [2, 3]])
>>> sliding_window_view([0, 1, 2], 2)
array([[0, 1],
       [1, 2]])
>>> sliding_window_view([0, 1], 2)
array([[0, 1]])

If x is a 1-dimensional array and window_shape is a counting number less than or equal to len(x) + 1 then sliding_window_view(x, window_shape) ought to have shape (len(x) - window_shape + 1, window_shape).

However, there is a bug if the result would have length 0:

>>> sliding_window_view([0], 2)

The result ought to be

array([], shape=(0, 2), dtype=int64)

Instead we get

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jdawson/Documents/bugs/numpy-sliding/.venv/lib/python3.12/site-packages/numpy/lib/stride_tricks.py", line 332, in sliding_window_view
    raise ValueError(
ValueError: window shape cannot be larger than input array shape

This bug is present in higher-dimensional cases also.

Finally, note that it is correct to raise ValueError if in some dimension the window shape is longer than the array by 2 or more.

Reproduce the code example:

import numpy as np

np.lib.stride_tricks.sliding_window_view([0], 2)

Error message:

Traceback (most recent call last):
  File "/home/jdawson/Documents/bugs/numpy-sliding/sliding_bug.py", line 4, in <module>
    np.lib.stride_tricks.sliding_window_view([0], 2)
  File "/home/jdawson/Documents/bugs/numpy-sliding/.venv/lib/python3.12/site-packages/numpy/lib/stride_tricks.py", line 332, in sliding_window_view
    raise ValueError(
ValueError: window shape cannot be larger than input array shape

Python and NumPy Versions:

1.26.4
3.12.2 (main, Feb 25 2024, 16:36:57) [GCC 9.4.0]

Runtime Environment:

[{'numpy_version': '1.26.4',
'python': '3.12.2 (main, Feb 25 2024, 16:36:57) [GCC 9.4.0]',
'uname': uname_result(system='Linux', node='5420', release='5.15.0-97-generic', version='#107~20.04.1-Ubuntu SMP Fri Feb 9 14:20:11 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': '/home/jdawson/Documents/bugs/numpy-sliding/.venv/lib/python3.12/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23.dev'}]
None

Context for the issue:

The bug necessitates extra code around sliding_window_view to handle this case.

@ghost ghost added the 00 - Bug label Mar 5, 2024
@seberg
Copy link
Member

seberg commented Mar 8, 2024

This does make a lot of sense to me, but on first glance, I am surprisingly unsure if that generalization is maybe surprising after all.
I think the right question is whether there are use-cases where not using a value at all would be wrong (i.e. normally you use all values at least once). I can't think of any, but I am not sure; clearly your use-case not using the values, that is what you want.

Not sure why it should matter whether the window size is exactly 1 smaller at most conceptually. Why not use max(0, ...) for the new dimension?

A helpful thing would be to see what the prior art is in pandas/scipy? E.g. pandas rolling window operations and scipy.ndimage filters?

@ghost
Copy link
Author

ghost commented Mar 8, 2024

A helpful thing would be to see what the prior art is in pandas/scipy? E.g. pandas rolling window operations and scipy.ndimage filters?

The Python standard library has itertools.pairwise, for windows of length 2. For an input of length 1 its output has length 0:

>>> import itertools
>>> list(itertools.pairwise([0, 1, 2]))
[(0, 1), (1, 2)]
>>> list(itertools.pairwise([0, 1]))
[(0, 1)]
>>> list(itertools.pairwise([0]))
[]

pandas.Series.rolling does a different kind of operation from sliding_window_view: its output is as long as its input, padded with NaNs as necessary.

Not sure why it should matter whether the window size is exactly 1 smaller at most conceptually. Why not use max(0, ...) for the new dimension?

If (length of window) > (length of array) + 1 both raising an exception, the current behaviour, and your suggestion, which agrees with itertools.pairwise, are reasonable. However, changing from the former to the latter is a change of functionality, whereas my pull request for the case that (length of window) = (length of array) + 1 resolves a bug and so should be less controversial.

Whether

(length of result) = (length of array) - (length of window) + 1

or

(length of result) = max((length of array) - length of window) + 1, 0),

if

(length of window) = (length of array) + 1

then

(length of result) = 0.

@seberg
Copy link
Member

seberg commented Mar 8, 2024

resolves a bug and so should be less controversial

I don't understand the difference in why one should be a bug and the other not, to me they seem the same: In either case there is valid "boundary" data that is not part of the result.
The pairwise is indeed prior art, it seems ndimage is not (like rolling it always returns the same length as the input, and has a parameter which might mask boundary values min_periods).

@ghost
Copy link
Author

ghost commented Mar 8, 2024

I don't understand the difference in why one should be a bug and the other not, to me they seem the same: In either case there is valid "boundary" data that is not part of the result.

Both

  1. The behaviour that (length of result) = (length of array) - (length of window) + 1; and
  2. The behaviour that (length of result) = max((length of array) - length of window) + 1, 0)

are reasonable.

However, the current behaviour, that (length of result) = (length of array) - (length of window) + 1 unless (length of result) would equal 0 is not reasonable. There ought not to be such an arbitrary exception.

@ghost
Copy link
Author

ghost commented Mar 15, 2024

What doth hinder this to be merged?

@ghost
Copy link
Author

ghost commented Mar 15, 2024

@seberg, the following table shows how the length of the result in a dimension depends on the lengths of the array and the window in that dimension, under the three behaviours discussed.

window < array + 1 window = array + 1 window > array + 1
now array - window + 1 ValueError ValueError
this request array - window + 1 0 ValueError
max(0, ...) array - window + 1 0 0

Both behaviours for window > array + 1 are reasonable; this request does not change the behaviour in that case; it simply corrects the mistake in the case that window = array + 1.

@ghost
Copy link
Author

ghost commented Mar 21, 2024

I desire a review of this request.

@mattip
Copy link
Member

mattip commented Mar 21, 2024

Sorry, most of our reviewer bandwidth is going toward fixing urgent problems required for 2.0.0rc1.

@seberg
Copy link
Member

seberg commented Apr 9, 2024

FWIW, I still strongly disagree that the window-size calculation matters. In my view the only thing that matters is that there are data points which are not part of the result.

list(itertools.pairwise([])) is indeed some prior art which just discards the result (and also works for fully empty, btw.).
Maybe short of more prior art, you can explain the use-case as a motivation?

I am not sure if there is much surprise with an empty result, so I am OK with it, but I wouldn't like to hear one more opinion clearly in favor. Unlike many other "empty array" results which I would find obvious, I still think this one has a slightly different quality.

@seberg seberg changed the title BUG: sliding_window_view raises ValueError if in some dimension the window is longer than the array by 1 ENH: sliding_window_view raises ValueError if in some dimension the window is longer than the array by 1 Apr 10, 2024
@seberg seberg changed the title ENH: sliding_window_view raises ValueError if in some dimension the window is longer than the array by 1 ENH: allow sliding_window_view to return an empty array for large windows Apr 10, 2024
@ghost
Copy link
Author

ghost commented Apr 10, 2024

numpy.diff behaves like itertools.pairwise:

>>> import numpy as np
>>> np.diff(np.arange(4))
array([1, 1, 1])
>>> np.diff(np.arange(3))
array([1, 1])
>>> np.diff(np.arange(2))
array([1])
>>> np.diff(np.arange(1))
array([], dtype=int64)
>>> np.diff(np.arange(0))
array([], dtype=int64)

It is surprising that numpy.diff is not equivalent to

def diff(a, axis=-1):
    pairs = np.lib.stride_tricks.sliding_window_view(a, 2, axis=axis)
    return pairs[..., 1] - pairs[..., 0]

at least whenever a.shape[axis] >= 1.

Between n values there are n-1 pairs. It is a bug for sliding_window_view to have an ad hoc exception for n=1.

@seberg
Copy link
Member

seberg commented Apr 10, 2024

Good point! np.diff pushes me slightly in favor. I am still adament about dropping the idea that this was ever an "ad hoc exception for n=1". I don't see it that way, n=0 is no different (also not in diff as you said). There are two choices:

  1. You raise an error (what we have)
  2. You always return an empty result.

the current choice of returning an empty result for "n=1" but not "n=0", at least to me doesn't make sense. What does make sense is to call it an enhancement to allow empty results in general.

@ghost
Copy link
Author

ghost commented Apr 10, 2024

@seberg, no. Consider the case where the window is of length 2.
Figure_1
As you can see, returning a result of length 0 when the array is of length 1 follows the pattern for arrays of length 2 or more. There is no reason whatever to raise an exception. Why should n=1 be treated differently from n=2, n=3 and so on?

@ghost
Copy link
Author

ghost commented Apr 10, 2024

the current choice of returning an empty result for "n=1" but not "n=0", at least to me doesn't make sense.

The proposal simply treats n=1 the same as n=2, n=3, n=4 and so on.

@ghost
Copy link
Author

ghost commented Apr 10, 2024

@seberg,

n=0 is no different

Returning an empty array for n=0 is different, because then it is no longer true that (result length) = (array length) - (window length) + 1.

@seberg
Copy link
Member

seberg commented Apr 10, 2024

@JDawson-Camlin let's just agree to disagree. I simply don't think the formula result_length = array_length - window_length + 1 is conceptually important to what the result should be for these.

@ghost ghost closed this by deleting the head repository Jun 3, 2024
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants