ENH: allow sliding_window_view to return an empty array for large windows #25942

ghost · 2024-03-06T15:28:13Z

Describe the issue:

Observe the shapes of the following arrays:

>>> from numpy.lib.stride_tricks import sliding_window_view
>>> sliding_window_view([0, 1, 2, 3], 2)
array([[0, 1],
       [1, 2],
       [2, 3]])
>>> sliding_window_view([0, 1, 2], 2)
array([[0, 1],
       [1, 2]])
>>> sliding_window_view([0, 1], 2)
array([[0, 1]])

If x is a 1-dimensional array and window_shape is a counting number less than or equal to len(x) + 1 then sliding_window_view(x, window_shape) ought to have shape (len(x) - window_shape + 1, window_shape).

However, there is a bug if the result would have length 0:

>>> sliding_window_view([0], 2)

The result ought to be

array([], shape=(0, 2), dtype=int64)

Instead we get

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jdawson/Documents/bugs/numpy-sliding/.venv/lib/python3.12/site-packages/numpy/lib/stride_tricks.py", line 332, in sliding_window_view
    raise ValueError(
ValueError: window shape cannot be larger than input array shape

This bug is present in higher-dimensional cases also.

Finally, note that it is correct to raise ValueError if in some dimension the window shape is longer than the array by 2 or more.

Reproduce the code example:

import numpy as np

np.lib.stride_tricks.sliding_window_view([0], 2)

Error message:

Traceback (most recent call last):
  File "/home/jdawson/Documents/bugs/numpy-sliding/sliding_bug.py", line 4, in <module>
    np.lib.stride_tricks.sliding_window_view([0], 2)
  File "/home/jdawson/Documents/bugs/numpy-sliding/.venv/lib/python3.12/site-packages/numpy/lib/stride_tricks.py", line 332, in sliding_window_view
    raise ValueError(
ValueError: window shape cannot be larger than input array shape

Python and NumPy Versions:

1.26.4
3.12.2 (main, Feb 25 2024, 16:36:57) [GCC 9.4.0]

Runtime Environment:

[{'numpy_version': '1.26.4',
'python': '3.12.2 (main, Feb 25 2024, 16:36:57) [GCC 9.4.0]',
'uname': uname_result(system='Linux', node='5420', release='5.15.0-97-generic', version='#107~20.04.1-Ubuntu SMP Fri Feb 9 14:20:11 UTC 2024', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'SkylakeX',
'filepath': '/home/jdawson/Documents/bugs/numpy-sliding/.venv/lib/python3.12/site-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so',
'internal_api': 'openblas',
'num_threads': 8,
'prefix': 'libopenblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.23.dev'}]
None

Context for the issue:

The bug necessitates extra code around sliding_window_view to handle this case.

Close #25942.

seberg · 2024-03-08T07:45:24Z

This does make a lot of sense to me, but on first glance, I am surprisingly unsure if that generalization is maybe surprising after all.
I think the right question is whether there are use-cases where not using a value at all would be wrong (i.e. normally you use all values at least once). I can't think of any, but I am not sure; clearly your use-case not using the values, that is what you want.

Not sure why it should matter whether the window size is exactly 1 smaller at most conceptually. Why not use max(0, ...) for the new dimension?

A helpful thing would be to see what the prior art is in pandas/scipy? E.g. pandas rolling window operations and scipy.ndimage filters?

ghost · 2024-03-08T10:51:12Z

A helpful thing would be to see what the prior art is in pandas/scipy? E.g. pandas rolling window operations and scipy.ndimage filters?

The Python standard library has itertools.pairwise, for windows of length 2. For an input of length 1 its output has length 0:

>>> import itertools
>>> list(itertools.pairwise([0, 1, 2]))
[(0, 1), (1, 2)]
>>> list(itertools.pairwise([0, 1]))
[(0, 1)]
>>> list(itertools.pairwise([0]))
[]

pandas.Series.rolling does a different kind of operation from sliding_window_view: its output is as long as its input, padded with NaNs as necessary.

Not sure why it should matter whether the window size is exactly 1 smaller at most conceptually. Why not use max(0, ...) for the new dimension?

If (length of window) > (length of array) + 1 both raising an exception, the current behaviour, and your suggestion, which agrees with itertools.pairwise, are reasonable. However, changing from the former to the latter is a change of functionality, whereas my pull request for the case that (length of window) = (length of array) + 1 resolves a bug and so should be less controversial.

Whether

(length of result) = (length of array) - (length of window) + 1

or

(length of result) = max((length of array) - length of window) + 1, 0),

if

(length of window) = (length of array) + 1

then

(length of result) = 0.

seberg · 2024-03-08T11:52:57Z

resolves a bug and so should be less controversial

I don't understand the difference in why one should be a bug and the other not, to me they seem the same: In either case there is valid "boundary" data that is not part of the result.
The pairwise is indeed prior art, it seems ndimage is not (like rolling it always returns the same length as the input, and has a parameter which might mask boundary values min_periods).

ghost · 2024-03-08T13:39:36Z

I don't understand the difference in why one should be a bug and the other not, to me they seem the same: In either case there is valid "boundary" data that is not part of the result.

Both

The behaviour that (length of result) = (length of array) - (length of window) + 1; and
The behaviour that (length of result) = max((length of array) - length of window) + 1, 0)

are reasonable.

However, the current behaviour, that (length of result) = (length of array) - (length of window) + 1 unless (length of result) would equal 0 is not reasonable. There ought not to be such an arbitrary exception.

ghost · 2024-03-15T05:16:19Z

What doth hinder this to be merged?

ghost · 2024-03-15T05:42:42Z

@seberg, the following table shows how the length of the result in a dimension depends on the lengths of the array and the window in that dimension, under the three behaviours discussed.

	window < array + 1	window = array + 1	window > array + 1
now	array - window + 1	ValueError	ValueError
this request	array - window + 1	0	ValueError
`max(0, ...)`	array - window + 1	0	0

Both behaviours for window > array + 1 are reasonable; this request does not change the behaviour in that case; it simply corrects the mistake in the case that window = array + 1.

ghost · 2024-03-21T08:11:08Z

I desire a review of this request.

mattip · 2024-03-21T08:18:47Z

Sorry, most of our reviewer bandwidth is going toward fixing urgent problems required for 2.0.0rc1.

seberg · 2024-04-09T12:27:37Z

FWIW, I still strongly disagree that the window-size calculation matters. In my view the only thing that matters is that there are data points which are not part of the result.

list(itertools.pairwise([])) is indeed some prior art which just discards the result (and also works for fully empty, btw.).
Maybe short of more prior art, you can explain the use-case as a motivation?

I am not sure if there is much surprise with an empty result, so I am OK with it, but I wouldn't like to hear one more opinion clearly in favor. Unlike many other "empty array" results which I would find obvious, I still think this one has a slightly different quality.

ghost · 2024-04-10T11:30:53Z

numpy.diff behaves like itertools.pairwise:

>>> import numpy as np
>>> np.diff(np.arange(4))
array([1, 1, 1])
>>> np.diff(np.arange(3))
array([1, 1])
>>> np.diff(np.arange(2))
array([1])
>>> np.diff(np.arange(1))
array([], dtype=int64)
>>> np.diff(np.arange(0))
array([], dtype=int64)

It is surprising that numpy.diff is not equivalent to

def diff(a, axis=-1):
    pairs = np.lib.stride_tricks.sliding_window_view(a, 2, axis=axis)
    return pairs[..., 1] - pairs[..., 0]

at least whenever a.shape[axis] >= 1.

Between n values there are n-1 pairs. It is a bug for sliding_window_view to have an ad hoc exception for n=1.

seberg · 2024-04-10T12:05:43Z

Good point! np.diff pushes me slightly in favor. I am still adament about dropping the idea that this was ever an "ad hoc exception for n=1". I don't see it that way, n=0 is no different (also not in diff as you said). There are two choices:

You raise an error (what we have)
You always return an empty result.

the current choice of returning an empty result for "n=1" but not "n=0", at least to me doesn't make sense. What does make sense is to call it an enhancement to allow empty results in general.

ghost · 2024-04-10T14:56:32Z

@seberg, no. Consider the case where the window is of length 2.

As you can see, returning a result of length 0 when the array is of length 1 follows the pattern for arrays of length 2 or more. There is no reason whatever to raise an exception. Why should n=1 be treated differently from n=2, n=3 and so on?

ghost · 2024-04-10T15:02:05Z

the current choice of returning an empty result for "n=1" but not "n=0", at least to me doesn't make sense.

The proposal simply treats n=1 the same as n=2, n=3, n=4 and so on.

ghost · 2024-04-10T15:06:00Z

@seberg,

n=0 is no different

Returning an empty array for n=0 is different, because then it is no longer true that (result length) = (array length) - (window length) + 1.

seberg · 2024-04-10T15:14:56Z

@JDawson-Camlin let's just agree to disagree. I simply don't think the formula result_length = array_length - window_length + 1 is conceptually important to what the result should be for these.

ghost added the 00 - Bug label Mar 5, 2024

John Dawson added 2 commits March 6, 2024 15:13

BUG: increase sliding_window_view's maximum window length by 1

3ddfc10

Close #25942.

Correct commit 3ddfc10 by breaking a line of length 80

760221c

seberg changed the title ~~BUG: sliding_window_view raises ValueError if in some dimension the window is longer than the array by 1~~ ENH: sliding_window_view raises ValueError if in some dimension the window is longer than the array by 1 Apr 10, 2024

seberg changed the title ~~ENH: sliding_window_view raises ValueError if in some dimension the window is longer than the array by 1~~ ENH: allow sliding_window_view to return an empty array for large windows Apr 10, 2024

seberg added 01 - Enhancement and removed 00 - Bug labels Apr 10, 2024

ghost closed this by deleting the head repository Jun 3, 2024

This pull request was closed.

Uh oh!

ENH: allow sliding_window_view to return an empty array for large windows #25942

ENH: allow sliding_window_view to return an empty array for large windows #25942

Uh oh!

Conversation

ghost commented Mar 6, 2024

Describe the issue:

Reproduce the code example:

Error message:

Python and NumPy Versions:

Runtime Environment:

Context for the issue:

Uh oh!

seberg commented Mar 8, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ghost commented Mar 8, 2024 • edited by ghost Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

seberg commented Mar 8, 2024

Uh oh!

ghost commented Mar 8, 2024

Uh oh!

ghost commented Mar 15, 2024

Uh oh!

ghost commented Mar 15, 2024

Uh oh!

ghost commented Mar 21, 2024

Uh oh!

mattip commented Mar 21, 2024

Uh oh!

seberg commented Apr 9, 2024

Uh oh!

ghost commented Apr 10, 2024

Uh oh!

seberg commented Apr 10, 2024

Uh oh!

ghost commented Apr 10, 2024

Uh oh!

ghost commented Apr 10, 2024

Uh oh!

ghost commented Apr 10, 2024

Uh oh!

seberg commented Apr 10, 2024

Uh oh!

Uh oh!

seberg commented Mar 8, 2024 •

edited

Loading

ghost commented Mar 8, 2024 •

edited by ghost

Loading