BUG: Fix padding with large integers #11033

Merged
merged 2 commits into from
May 2, 2018

Contributor

@lagru lagru commented May 2, 2018

Closes #11027

The old way of creating the padded array produced wrong values for
large integers because the new prepended / appended array was implicitly
created with dtype float64 and only afterwards cast back to the integer dtype:

>>> (np.zeros(1) + (2 ** 64 - 1)).astype(np.uint64)
array([0], dtype=uint64)
>>> (np.zeros(1) + (2 ** 63 - 1)).astype(np.int64)
array([-9223372036854775808])
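
As a minimal sketch of why the float64 detour loses information, the snippet below checks that the largest int64 value is not exactly representable as a float, and that building the fill block directly in the target dtype keeps the value intact. The variable names (`big`, `pad_block`, and so on) are illustrative and not taken from the PR.

```python
import numpy as np

big = 2 ** 63 - 1  # largest int64 value

# float64 has a 53-bit significand, so this value is rounded up to 2**63.
assert float(big) == float(2 ** 63)
assert int(float(big)) != big

# Creating the fill block directly in the integer dtype avoids the float detour.
pad_block = np.full(3, big, dtype=np.int64)
assert pad_block.dtype == np.int64
assert int(pad_block[0]) == big

# Same check for the largest uint64 value.
big_u = 2 ** 64 - 1
pad_block_u = np.full(3, big_u, dtype=np.uint64)
assert int(pad_block_u[0]) == big_u
```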

cc @mhvk

Contributor

@mhvk mhvk left a comment

Looks good; happy to see an even simpler solution than the one I suggested! I'll let it sit for the day just to let other people have a chance to chime in.

@@ -138,8 +138,8 @@ def _append_const(arr, pad_amt, val, axis=-1):
return np.concatenate((arr, np.zeros(padshape, dtype=arr.dtype)),
Member

While you're here, you could remove this branch too

Contributor Author

@lagru lagru May 2, 2018

Sure. Although np.full(padshape, 0) seems to be a little slower than np.zeros(padshape)...
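
A rough sketch of what a branch-free constant append could look like, assuming the fill block is created with `np.full` in the array's own dtype so that no special case for `val == 0` is needed. The function name `append_const_sketch` is hypothetical and this is not presented as the exact code merged in the PR.

```python
import numpy as np

def append_const_sketch(arr, pad_amt, val, axis=-1):
    """Hypothetical helper: append `pad_amt` copies of `val` along `axis`."""
    if pad_amt == 0:
        return arr
    # Same shape as `arr`, except that the padded axis has length `pad_amt`.
    target_axis = axis % arr.ndim
    padshape = tuple(pad_amt if i == target_axis else n
                     for i, n in enumerate(arr.shape))
    # np.full allocates the block in arr.dtype directly, for any `val`.
    block = np.full(padshape, val, dtype=arr.dtype)
    return np.concatenate((arr, block), axis=axis)

a = np.arange(3, dtype=np.uint64)
padded = append_const_sketch(a, 2, 2 ** 64 - 1)
print(padded.dtype, padded)
```

Because the block is never created as float64, the largest uint64 value should survive the round trip unchanged.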

Contributor Author

lagru commented May 2, 2018

Just to make this comment more visible: Replacing np.zeros(...) with np.full(..., fill_value=0) in a5f94a9 seems to come with a small performance penalty.
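
One way to check the claimed difference locally is a quick `timeit` comparison like the sketch below. The shape and repetition count are arbitrary choices for illustration, and the result will depend on the machine and NumPy version, so no expected numbers are given.

```python
import timeit
import numpy as np

shape = (1000, 1000)  # arbitrary shape chosen for illustration

# Time repeated allocation of an all-zero block with each function.
t_zeros = timeit.timeit(lambda: np.zeros(shape, dtype=np.int64), number=20)
t_full = timeit.timeit(lambda: np.full(shape, 0, dtype=np.int64), number=20)

print(f"np.zeros: {t_zeros:.4f} s, np.full: {t_full:.4f} s")
```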

Contributor

mhvk commented May 2, 2018

@lagru - given that pad does unnecessary copies and that concatenating is not that fast anyway, I would not currently worry about performance too much... Overall, I think the improved code clarity is well worth a small loss in performance.

Contributor Author

lagru commented May 2, 2018

@mhvk

> Overall, I think the improved code clarity is well worth a small loss in performance.

Sounds reasonable.

> given that pad does unnecessary copies

If you don't mind me asking, where exactly are unnecessary copies made? This statement sounds like there would be a faster option to achieve the same thing pad(..., mode="constant") does. If so, how?

Contributor

mhvk commented May 2, 2018

From just a quick look, copies are made implicitly by passing any input through np.array at https://github.com/numpy/numpy/blob/master/numpy/lib/arraypad.py#L1300, explicitly at https://github.com/numpy/numpy/blob/master/numpy/lib/arraypad.py#L1369, and then implicitly again by using concatenate.
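
To illustrate the kinds of copies mentioned here (not the actual code paths in arraypad.py), the following sketch uses `np.shares_memory` to show that `np.array` copies its input by default and that `np.concatenate` always allocates a new result. The comparison with `np.asarray` is an added illustration, not something proposed in this thread.

```python
import numpy as np

a = np.arange(5)

# np.array copies its input by default, even if it is already an ndarray.
b = np.array(a)
assert not np.shares_memory(a, b)

# np.asarray returns the input itself when no conversion is needed.
c = np.asarray(a)
assert c is a

# np.concatenate writes into a freshly allocated output array.
d = np.concatenate((a, np.zeros(2, dtype=a.dtype)))
assert not np.shares_memory(a, d)
```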

Some cleanup might be good... (but definitely in a separate PR!)

Member

eric-wieser commented May 2, 2018

I've a couple of patches in the works for np.pad (#11011 + #11012 + some offline) that would likely conflict with changes that remove some redundant copies, so it might be nice to get those merged first before taking that on.

Member

@eric-wieser eric-wieser left a comment

Looks great! Feel free to merge, @mhvk.

Regarding full being slower - that sounds like a possible optimization that could be made inside full.

Contributor

mhvk commented May 2, 2018

OK, merging. Thanks, @lagru!

@mhvk mhvk merged commit b946795 into numpy:master May 2, 2018
@lagru lagru deleted the pad branch May 2, 2018 21:21
3 participants