-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Add out of bound check for as_strided #8315
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add out of bound check for as_strided #8315
Conversation
bfb0ce8
to
7c668a1
Compare
numpy.lib.stride_trick.as_strided
numpy.lib.stride_trick.as_strided
7c668a1
to
05e9be8
Compare
05e9be8
to
9a5b27f
Compare
There are some error on tests, so one may want to keep no out of bounds checks. |
Note that strides can also be negative.
|
@pv Thank you for notifying me about negative strides. If negative stride is given, the under-reaching address: Therefore I've added check code for negative stride and raise |
I noticed that |
bd72bc8
to
c1676c1
Compare
@@ -35,7 +35,19 @@ def _maybe_view_as_subclass(original_array, new_array): | |||
return new_array | |||
|
|||
|
|||
def as_strided(x, shape=None, strides=None, subok=False, writeable=True): | |||
def _actual_range(x): | |||
if not isinstance(x, np.ndarray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x
is guaranteed to be an ndarray
instance here, so this test can be omitted.
def _actual_range(x): | ||
if not isinstance(x, np.ndarray): | ||
x = np.array(x) | ||
head = np.sum((np.array(x.shape) - 1) * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For small lists, creating arrays creates a large overhead:
In [27]: x = np.ones((4,5,3,8,6))
In [28]: %timeit np.sum((np.array(x.shape) - 1) * np.array([s if s > 0 else 0 for s in x.strides]))
10000 loops, best of 3: 23.7 µs per loop
In [29]: %timeit sum((sh - 1) * (s if s > 0 else 0) for sh, s in zip(x.shape, x.strides))
100000 loops, best of 3: 2.28 µs per loop
So, I'd replace with the example given.
@@ -105,6 +127,13 @@ def as_strided(x, shape=None, strides=None, subok=False, writeable=True): | |||
# This should only happen if x.dtype is [('', 'Vx')] | |||
array.dtype = x.dtype | |||
|
|||
head_orig, tail_orig = _actual_range(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two calculations should be moved inside the if check_bounds
statement.
@uchida - I think this is a good addition and my sense would be to just add the checks, or at least let the default be |
@mhvk Thank you for giving review comments, reflecting your review comments in f003415. A bit off-topic from implementation, I've used |
In ef7ddae, "numpy/core/tests/test_nditer.py", line 2070, in get_params still fails. |
@@ -35,7 +35,18 @@ def _maybe_view_as_subclass(original_array, new_array): | |||
return new_array | |||
|
|||
|
|||
def as_strided(x, shape=None, strides=None, subok=False, writeable=True): | |||
def _actual_head(x): | |||
return sum((sh - 1) * (st if st < 0 else 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dunno, but this code smells a bit like it might go bad if you have an empty array. There is also np.byte_bounds
which seems similar, just to note though, dunno if its worth using it or so.
81093c8
to
ef7ddae
Compare
I've added following changes:
|
Well, @seberg, the real |
The rolling window stuff is a different story anyway... |
a_low, a_high = np.byte_bounds(array) | ||
if x.size == 0 or a_low < x_low or x_high < a_high: | ||
raise ValueError(("given shape and strides will cause " | ||
"out of bounds of original array")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the extra set of brackets, but OK.
Sorry for forgetting about this a bit. @mhvk whats your opinion on this? I think we should add this. I am unsure about the resample function though. It seems to me we might be blocking a name that may be more useful for something else/more high level. Could have a few more tests for things like half of the dtype is out of bounds, but since it uses byte bounds, etc. it should be OK. |
We could also think about setting the default to |
I like the addition, but feel we should not add the As for the default, I like the idea of giving a |
Sorry for being late to react, I've removed |
f7e3e90
to
1ccc80a
Compare
1ccc80a
to
15ec56b
Compare
ea7a11c
to
819bb65
Compare
819bb65
to
293d21e
Compare
if check_bounds: | ||
x_low, x_high = np.byte_bounds(x) | ||
a_low, a_high = np.byte_bounds(array) | ||
if x.size == 0 or a_low < x_low or x_high < a_high: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the size == 0
check here for? Isn't this ok providing that a.size == 0
as well, and don't the next two conditions cover that?
assert_raises(ValueError, as_strided, | ||
a, a.shape, (-a.itemsize,), check_bounds=True) | ||
a = np.empty((2, 0)) | ||
assert_raises(ValueError, as_strided, a, a.shape, a.strides, check_bounds=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it should be fine to me (should not error)
☔ The latest upstream changes (presumably #8446) made this pull request unmergeable. Please resolve the merge conflicts. |
Is this still relevant after recent changes to |
Since it wold require a rebase, lets close it. I think there may still be some value in it, so if anyone wants to pick it up, that sounds reasonable to me. But I agree that in the majority of cases the new sliding window function or |
@uchida Thanks for making the PR. |
numpy.lib.stride_trick.as_strided
is useful but dangerous function, it may lead out of bounds memory access when one give illegal shape and strides.This pull request checks out of bounds by calculating/comparing actual bytes of original and returned arrays in memory, and raises
ValueError
when out of bounds.