-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
DOC: correction to numpy.pad docstring #13149
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
LGTM, the docstring check passes so it seems the documentation fix is correct. Is there a reason the design is to modify the vector in-place? Is it more efficient that way? Perhaps the docstring could hint at the motivation for the (in my opinion) non-standard design. |
I am not entirely sure if that is intentional, but the relevant usage of the provided function occurs with Lines 1223 to 1236 in 17b31fd
In that section of code, the output of I guess now the question is whether to regard this as an error in documentation or as a bug in I came across the issue when working on PyWavelets/pywt#478. I was surprised to find that it was necessary to avoid an internal copy, which made me suspect that the return value was not being used. |
I started looking at |
I did some preliminary testing and did not see any performance difference for padding a relatively large array of size (256, 256, 256) by 2 with or without assignment to However, if we do keep the requirement for in-place modification, it is possible to get a substantial performance improvement (I saw around 30-40% for my test case) if we continue to rely on in-place modification, but replace calls to i.e. using def _apply_along_axis_inplace(func1d, axis, arr, *args, **kwargs):
"""
Apply a function to 1-D slices along the given axis.
This is like apply_along_axis, but relies on the modification to arr being
done in-place.
"""
# handle negative axes
arr = asanyarray(arr)
nd = arr.ndim
axis = normalize_axis_index(axis, nd)
# arr, with the iteration axis at the end
in_dims = list(range(nd))
inarr_view = transpose(arr, in_dims[:axis] + in_dims[axis+1:] + [axis])
# compute indices for the iteration axes, and append a trailing ellipsis to
# prevent 0d arrays decaying to scalars, which fixes gh-8642
inds = ndindex(inarr_view.shape[:-1])
inds = (ind + (Ellipsis,) for ind in inds)
# invoke the function on the first item
try:
ind0 = next(inds)
except StopIteration:
raise ValueError('Cannot apply_along_axis when any iteration dimensions are 0')
func1d(inarr_view[ind0], *args, **kwargs)
for ind in inds:
func1d(inarr_view[ind], *args, **kwargs)
return I suspect use of such user-defined functions is relatively rare, though, so I'm not sure it is worth optimizing this case? |
@lagru: I know you have been working on reducing memory allocations in |
First of, I have largely ignored that part of the function.
I prefer this solution and think it would be worth to further discuss how to avoid the unnecessary allocation in |
Yeah, I agree that it would be better to support it within If we don't care to make the functionality public, a hack to avoid modifying the signature could be to check for some "private" keyword such as |
It looks like this in-place assumption has been around since at least numpy 1.7 (that was the oldest tag I looked at). Assuming there is agreement to document it as requiring in-place operation, the modified test case here should make sure this is not inadvertently changed going forward. We can decide about potentially optimizing |
Thanks @grlee77, it would be nice to open a follow-on issue to optimize |
* DOC: fix mistatement in numpy.pad docstring
When a custom function is provided to
numpy.pad
, the docstring states that this function should return a rank 1 vector. However, it actually seems that it is required that the rank 1 vector input be modified in-place and it is not necessary to return an array.I also updated a test case to make sure it is consistent with the modified description proposed here.
Two slight modifications to the example from the docstring illustrate this behavior.
1.) Padding fails if a copy of
vector
is made internally:2.) Padding works if modification is done in-place, even without any return value: