Skip to content

Add out of bound check for as_strided #8315

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

uchida
Copy link

@uchida uchida commented Nov 25, 2016

numpy.lib.stride_trick.as_strided is useful but dangerous function, it may lead out of bounds memory access when one give illegal shape and strides.

This pull request checks out of bounds by calculating/comparing actual bytes of original and returned arrays in memory, and raises ValueError when out of bounds.

@uchida uchida force-pushed the add-out-of-bound-check-for-as_strided branch from bfb0ce8 to 7c668a1 Compare November 25, 2016 16:08
@uchida uchida changed the title Add out of bound check for as strided Add out of bound check for numpy.lib.stride_trick.as_strided Nov 25, 2016
@uchida uchida changed the title Add out of bound check for numpy.lib.stride_trick.as_strided Add out of bound check for numpy.lib.stride_trick.as_strided Nov 25, 2016
@uchida uchida changed the title Add out of bound check for numpy.lib.stride_trick.as_strided Add out of bound check for as_strided Nov 25, 2016
@uchida uchida force-pushed the add-out-of-bound-check-for-as_strided branch from 7c668a1 to 05e9be8 Compare November 25, 2016 16:20
@uchida uchida force-pushed the add-out-of-bound-check-for-as_strided branch from 05e9be8 to 9a5b27f Compare November 25, 2016 16:29
@uchida
Copy link
Author

uchida commented Nov 25, 2016

There are some error on tests, so one may want to keep no out of bounds checks.
To keep backward compatibility, out of bounds check should be an optional feature.
In 9a401db, check_bounds=False is added and can switch with keyword arguments.

@pv
Copy link
Member

pv commented Nov 25, 2016 via email

@uchida
Copy link
Author

uchida commented Nov 25, 2016

@pv Thank you for notifying me about negative strides.

If negative stride is given, the under-reaching address: some element head (new) < array head (original), will occur and most users should avoid these cases.
I think the possible use-case for negative strides is as_strided(a[offset:], shape=shape, strides=negative_strides)
However, only numpy experts, understanding how as_strided works and able to use check_bound=False, could use them.

Therefore I've added check code for negative stride and raise ValueError when check_bound=True and negative stride is given.

@uchida
Copy link
Author

uchida commented Nov 25, 2016

I noticed that a.view[::-1] has negative strides, so ValueError for negative stride may gives false negative.
I've added more explicit check head (original) <= head (new) and tail(new) <= tail (original) in dc2c98b.

@@ -35,7 +35,19 @@ def _maybe_view_as_subclass(original_array, new_array):
return new_array


def as_strided(x, shape=None, strides=None, subok=False, writeable=True):
def _actual_range(x):
if not isinstance(x, np.ndarray):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x is guaranteed to be an ndarray instance here, so this test can be omitted.

def _actual_range(x):
if not isinstance(x, np.ndarray):
x = np.array(x)
head = np.sum((np.array(x.shape) - 1) *
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For small lists, creating arrays creates a large overhead:

In [27]: x = np.ones((4,5,3,8,6))

In [28]: %timeit np.sum((np.array(x.shape) - 1) * np.array([s if s > 0 else 0 for s in x.strides]))
10000 loops, best of 3: 23.7 µs per loop

In [29]: %timeit sum((sh - 1) * (s if s > 0 else 0) for sh, s in zip(x.shape, x.strides))
100000 loops, best of 3: 2.28 µs per loop

So, I'd replace with the example given.

@@ -105,6 +127,13 @@ def as_strided(x, shape=None, strides=None, subok=False, writeable=True):
# This should only happen if x.dtype is [('', 'Vx')]
array.dtype = x.dtype

head_orig, tail_orig = _actual_range(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two calculations should be moved inside the if check_bounds statement.

@mhvk
Copy link
Contributor

mhvk commented Nov 26, 2016

@uchida - I think this is a good addition and my sense would be to just add the checks, or at least let the default be True. However, you noted there were test failures when you initially added the checks. Is that still the case with your current version? If so, which tests fail?

@uchida
Copy link
Author

uchida commented Nov 27, 2016

@mhvk Thank you for giving review comments, reflecting your review comments in f003415.
And still some tests failed in 57af7f0, but it seems no problem to pass check_bounds=False.
In 997747f and ef7ddae, I've changed check_bounds=False for failed call of as_strided.

A bit off-topic from implementation, I've used BUG acronyms in prior commits, 9a5b27f to c1676c1, but if to use MAINT is much appropriate, I will rewrite these commits or resend ea pull request, how about them?

@uchida
Copy link
Author

uchida commented Nov 27, 2016

@@ -35,7 +35,18 @@ def _maybe_view_as_subclass(original_array, new_array):
return new_array


def as_strided(x, shape=None, strides=None, subok=False, writeable=True):
def _actual_head(x):
return sum((sh - 1) * (st if st < 0 else 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dunno, but this code smells a bit like it might go bad if you have an empty array. There is also np.byte_bounds which seems similar, just to note though, dunno if its worth using it or so.

@uchida uchida force-pushed the add-out-of-bound-check-for-as_strided branch from 81093c8 to ef7ddae Compare November 27, 2016 14:02
@uchida
Copy link
Author

uchida commented Dec 3, 2016

I've added following changes:

  • add arr.size check, zero size array or array has a zero dim, raise ValueError when check_bounds=True
  • revert check_bounds=True for default and related testcases
  • introduce numpy.lib.stride_tricks.resample, alias function to numpy.lib.stride_tricks.as_strided with check_bounds=True, (but naming could be an rolling_window or other good name)
  • add description in release notes

@uchida
Copy link
Author

uchida commented Dec 3, 2016

Well, @seberg, the real roling_window solution means introducing higer level function like Multidimensional rolling_window for numpy.

@seberg
Copy link
Member

seberg commented Dec 3, 2016

The rolling window stuff is a different story anyway...

a_low, a_high = np.byte_bounds(array)
if x.size == 0 or a_low < x_low or x_high < a_high:
raise ValueError(("given shape and strides will cause "
"out of bounds of original array"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the extra set of brackets, but OK.

@seberg
Copy link
Member

seberg commented Jan 5, 2017

Sorry for forgetting about this a bit. @mhvk whats your opinion on this? I think we should add this. I am unsure about the resample function though. It seems to me we might be blocking a name that may be more useful for something else/more high level.

Could have a few more tests for things like half of the dtype is out of bounds, but since it uses byte bounds, etc. it should be OK.

@seberg
Copy link
Member

seberg commented Jan 5, 2017

We could also think about setting the default to None (or "warn"). Then add a FutureWarning if it is an out of bound array and ask the user to explicitly set it to False. If very few people use it for huge arrays, that should be OK, otherwise, we probably would have to default to False for now, and wait a bit until thinking about it again.

@mhvk
Copy link
Contributor

mhvk commented Jan 9, 2017

I like the addition, but feel we should not add the resample function in this PR, as I agree that we then might as well try to get a really nice rolling_window function. Instead, stick here with just providing a building block for that (and the option for better security already for those who want it).

As for the default, I like the idea of giving a FutureWarning by default for out-of-bound arrays.

@uchida
Copy link
Author

uchida commented Jan 30, 2017

Sorry for being late to react, I've removed resample function and change check_bounds default to warn, checking Out of Bounds and warn FutureWarning.

@uchida uchida force-pushed the add-out-of-bound-check-for-as_strided branch from f7e3e90 to 1ccc80a Compare January 30, 2017 14:41
@uchida uchida force-pushed the add-out-of-bound-check-for-as_strided branch from 1ccc80a to 15ec56b Compare January 30, 2017 14:49
@uchida uchida force-pushed the add-out-of-bound-check-for-as_strided branch 2 times, most recently from ea7a11c to 819bb65 Compare January 30, 2017 16:40
@uchida uchida force-pushed the add-out-of-bound-check-for-as_strided branch from 819bb65 to 293d21e Compare January 30, 2017 16:45
if check_bounds:
x_low, x_high = np.byte_bounds(x)
a_low, a_high = np.byte_bounds(array)
if x.size == 0 or a_low < x_low or x_high < a_high:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the size == 0 check here for? Isn't this ok providing that a.size == 0 as well, and don't the next two conditions cover that?

assert_raises(ValueError, as_strided,
a, a.shape, (-a.itemsize,), check_bounds=True)
a = np.empty((2, 0))
assert_raises(ValueError, as_strided, a, a.shape, a.strides, check_bounds=True)
Copy link
Member

@eric-wieser eric-wieser Feb 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it should be fine to me (should not error)

@homu
Copy link
Contributor

homu commented Feb 22, 2017

☔ The latest upstream changes (presumably #8446) made this pull request unmergeable. Please resolve the merge conflicts.

@charris
Copy link
Member

charris commented Jan 25, 2021

Is this still relevant after recent changes to stride_tricks and the addition of a rolling window?

@seberg
Copy link
Member

seberg commented Jan 25, 2021

Since it wold require a rebase, lets close it. I think there may still be some value in it, so if anyone wants to pick it up, that sounds reasonable to me. But I agree that in the majority of cases the new sliding window function or broadcast_to probably will do the trick better. So the need should be much lower, and a keyword argument is probably not as convenient as the alternatives.

@seberg seberg closed this Jan 25, 2021
@charris
Copy link
Member

charris commented Jan 25, 2021

@uchida Thanks for making the PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants