Skip to content

don't copy data when changing shape #145

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

Conversation

tecki
Copy link
Contributor

@tecki tecki commented Aug 24, 2011

When changing the shape of an array, by writing, say,

a.shape = 10,
an AttributeError is raised if the new shape is incompatible
with the old one. This is the intended behaviour and no
problem.

Under the hood, however, what's happening is that
in this case the array is copied as it would be
if one had written

a = a.reshape((10,))
and once it is realized that this doesn't work, the
result is discarded again, and the said AttributeError
is raised.

I discovered that while working with humongous
matrices, which nearly fill my memory. I used
the a.shape=something method because it
was supposed to be a memory no-op. Unfortunately
it isn't, as numpy tried and failed to copy the array
and raised a MemoryError instead of an AttributeError.

I consider this all a bug, not a feature, so I fixed it.
With this pull request, no data is ever copied when
changing the shape of an array.

tecki added 3 commits August 24, 2011 11:04
as per the comment in shape.c, the use of PyArray_Reshape is not
recommended. Change set_shape accordingly in getset.c.
If a shape of an array is set, but the shape is incompatible,
the array is copied, and the copy immediately discarded.

This commit changes this behaviour so that no copy is ever
generated.
Move code of PyArray_Newshape and PyArray_Newshape_Nocopy into
one function.
if (ret == NULL) {
return -1;
if (!PyArray_IntpConverter(val, &newdims)) {
return -1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation. Are you using tabs by any chance? Wrong indent setting?

@charris
Copy link
Member

charris commented Oct 2, 2011

Can this be done by adding a function that checks if the array can be reshaped in place? Something like

int can_reshape_inplace(...)

I suppose there also needs to be a function that also checks if the array can be reshaped with a copy.

tecki added 3 commits October 2, 2011 19:28
Accidentally, I was using tabs not spaces. (Yes, I do have
correct tab settings, tab is eight characters, but yes,
spaces are better anyhow).
Instead of introducing a new function PyArray_Reshape_Nocopy,
the original function PyArray_Reshape gets extended.
The parameter `order` may be set to a new value `NPY_NOCOPY`,
which prevents PyArray_Reshape from copying.
@tecki
Copy link
Contributor Author

tecki commented Oct 2, 2011

Can this be done by adding a function that checks if the array can be reshaped in place? Something like

   int can_reshape_inplace(...)

certainly, it can be done this way, but I don't think it's a good
idea. We would have to maintain two
places that basically do the same: the function
_attempt_nocopy_reshape and this new function.
Typically, everytime we change one we will forget to change the other one...

NPY_KEEPORDER=2,
/* Don't copy data, so order obviously is kept.
* used as special value in PyArray_Newshape */
NPY_NOCOPY=3
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this may be a bad idea. This is a general enum in the publicly exposed API, so a special value for one API function isn't appropriate. Further, NOCOPY seems to have roughly the same meaning as KEEPORDER.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, NOCOPY has a very different meaning from KEEPORDER. The latter means that the order ought to
be kept (hence the name) while copying, while NOCOPY means that the data should not be copied.

@mwiebe
Copy link
Member

mwiebe commented Feb 5, 2012

This looks like a worthy bug to fix, but I don't think the way it's being fixed is good. Fixing it should not require changing the exposed NumPy API.

@tecki
Copy link
Contributor Author

tecki commented Mar 14, 2012

So, I just changed this pull request not to expose NPY_NOCOPY into
the API, by just declaring NPY_NOCOPY internally.

@teoliphant
Copy link
Member

I agree with tecki that this implementation of array_shape_set needs improvement so that the copy does not happen if it doesn't need to. But, there should be some function that tests this (it looks like there is a related function to such a function in current master _attempt_nocopy_reshape).

The NOCOPY flag to NewShape is an approach --- perhaps called KEEPORDER_AND_DATA

@charris
Copy link
Member

charris commented Apr 1, 2013

@tecki Needs rebase.
@seberg Sound like you thought this worthy. It does seem that there might be a function that performs the check available.

@seberg
Copy link
Member

seberg commented Apr 1, 2013

I do think it makes sense to fix. The _attempt_nocopy_reshape could be called more directly maybe, since it does almost exactly what is needed (it does need a slight addition, but that is easy and I actually have it already). To be honest, I don't know what approach is best. Just a note for the direct call approach: you have to remember finalizing the array, or some temporary one like it happens at the moment (needed for matrix support).

@charris
Copy link
Member

charris commented Mar 28, 2014

Closing on account of stale. @seberg Thoughts?

@charris charris closed this Mar 28, 2014
luyahan pushed a commit to plctlab/numpy that referenced this pull request Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants