-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: void .item() doesn't hold reference to original array #8157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cf49728
to
84bbfc7
Compare
Hmm, looks like you have researched the problem pretty deeply. Maybe @pv has some ideas. |
Wow, that issue died as a python issue long ago. Maybe give them a ping. |
npy_intp dims[1]; | ||
dims[0] = 1; | ||
Py_INCREF(descr); | ||
u = PyArray_NewFromDescr(&PyArray_Type, descr, 1, dims, NULL, ip, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could make this PEP8 compliant. Starting a new line after the opening (
might be the easiest way.
EDIT: Or just declare a flags
variable, would make it more understandable...
PyObject *ret, *u; | ||
|
||
/* first create a size-1 view of this element */ | ||
npy_intp dims[1]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a blank line after the declaration. I'd move it up and move the comment down after the declarations.
numpy/core/tests/test_regression.py
Outdated
va = np.zeros(300000, 'V4') | ||
x = va[:1].item() | ||
del va | ||
x.tobytes() # segfault? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Always segfaults, or just now and then? Might move the trailing comment onto its own line above.
LGTM modulo some nits. I wonder a bit if returning memoryviews could cause backward compatibility problems, although it isn't clear that things were working before. I'm leaning to put this off to the next release so that things have a chance to settle. Maybe we should warn in the release notes? |
Yeah, I want to think a little more anyway, and I also want to try running pandas/astropy/scipy unit tests with this change. There might be a workaround to the performance problem through the fact that numpy usually considers unstructured void scalars to be "immutable", so often makes them copies instead of views. So maybe I can avoid the call to |
☔ The latest upstream changes (presumably #8235) made this pull request unmergeable. Please resolve the merge conflicts. |
84bbfc7
to
28b35d6
Compare
Rebased. Also, realized a simple solution to the problems above: Just return a memoryview of Note this PR will have a significant effect on the string representation of Old str behavior: >>> np.zeros(6, 'V4')
array([, , , , , ],
dtype='|V4') New behavior: >>> np.zeros(6, 'V4')
array([<memory at 0x7fd9bd3a6808>, <memory at 0x7fd9bd3a6808>,
<memory at 0x7fd9bd3a6808>, <memory at 0x7fd9bd3a6808>,
<memory at 0x7fd9bd3a6808>, <memory at 0x7fd9bd3a6808>],
dtype='|V4') I suggest that the new behavior is more correct, although wordier. This way we don't risk cluttering the terminal with invalid characters, and we don't end up with mystery spaces like in the old representation. Edit: Although... the memory addresses printed are a bit misleading. I think they are the addresses of the temporary void scalars created in array2string. Void scalars are always copies of the array elements, so the memory locations are not those of the original array. Probably the memory locations are sometimes repeated because one temporary scalar is desctroyed before the next is created. I could hide the addresses by defining |
Does the following work fine under this patch?
|
Your example is just printed differently: >>> dt = np.dtype([('i', int), ('v', 'V0')])
>>> np.zeros(6, dt)
array([(0, <memory at 0x7ffa3f09d240>), (0, <memory at 0x7ffa3f09d240>),
(0, <memory at 0x7ffa3f09d240>), (0, <memory at 0x7ffa3f09d240>),
(0, <memory at 0x7ffa3f09d240>), (0, <memory at 0x7ffa41accb70>)],
dtype=[('i', '<i8'), ('v', 'V')])
>>> m = _[0].item()[1]
>>> m.format, m.itemsize
('0x', 0) (also, this example uses a structured type, but this patch mainly affects unstructured void types) |
My concern was handling of Sure looks like a bug in python, the way that |
Also, I don't agree with your old behaviour - on python 3, I get:
Are you using 2.7? |
It is showing the address being pointed to. The problem is that during the process of printing the array, numpy makes a temporary copy of the memory and shows that instead. More in detail, numpy goes through each element, and makes a void scalar from it, and unstructured void scalars (unlike structured voids) always make a copy (not a view) of the array element they correspond to. So the printed memory refer's to an unstructured void scalar's buffer. We might fix that (in another PR?) by making unstructured void scalars point to the original array's memory (be views). That would also fix the problem that writing to an unstructured void scalar's memory currently doesn't affect the original array, unlike structured void scalars. The place where this all happens is in PyArray_Scalar. |
I'm using Python 3.6.0 in the printout above.... curious yours is different. |
Can you prove that to me with:
I see your point though, that in our case, neither address is at all useful. Since |
>>> arr = np.zeros(6, 'V4')
...: x = arr[0]
...: print(repr(x))
...: print(object.__repr__(x))
...:
<memory at 0x7f22d876b240>
<numpy.void object at 0x7f22d8790570> |
ee9b94a
to
360de17
Compare
5d05e5b
to
5ff80f5
Compare
Ah, fair enough, done. |
doc/release/1.14.0-notes.rst
Outdated
@@ -416,6 +416,10 @@ The printing style of ``np.void`` arrays is now independently customizable | |||
using the ``formatter`` argument to ``np.set_printoptions``, using the | |||
``'void'`` key, instead of the catch-all ``numpystr`` key as before. | |||
|
|||
unstructured void array's ``.item`` method now returns a bytes object | |||
--------------------------------------------------------------------- | |||
``.item`` now returns a ``bytes`` object instead of a buffer or byte array. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's probably worth drawing attention to the fact that this new object is readonly, unlike the previous buffer or byte array that was read/write
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For that reason, this maybe should be under Compatibility
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the other hand, the previous buffer or byte array wasn't a view or the original array or scalar. So writing to it had no effect. I'll add a note though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, but it means that users who expected to get a mutable scratch buffer no longer do so.
I suppose the alternative would be to return a bytearray
object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I considered that too. But I suspect we would actually be causing bugs/confusion if we return a bytearray, since it vaguely implies it is a view when it isn't. Returning an immutable bytes
makes clear you only got a copy.
Updated the note, by the way.
5ff80f5
to
982ed77
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, but I'd like someone else's opinion on whether changing the mutability is too backwards-incompatible.
I have no idea if the change in mutability will have much impact. Probably the only way to find out is to try. I suppose we could issue a |
I doubt it will have much impact, since the old behavior was almost totally non-functional:
Edit: that last part isn't completely true... in python 2 the returned array was indeed a view if you do |
So, the choice comes down to picking between
Either of these is better than leaving things how they are. |
As an aside, this change will make |
If there is any doubt, then it's probably safest to issue a DeprecationWarning for 1.14 and leave the behavior alone, and then do the actual change to I just tried adding the warning, and running |
I don't see a good way to give a silenceable warning here |
I pushed this off to 1.15, will go with the warning first. |
@ahaldane Needs rebase. I'm looking to branch 1.15 pretty soon, should we push this off to the next release to allow the FutureWarning more time or just go ahead. I had the impression that this change was not expected to cause problems. |
982ed77
to
3ec8875
Compare
whoops, what happened here? |
@charris: To be clear, you want this to target 1.16? |
I think @charris was asking me to rebase for 1.15, which I did. So this can stay tagged for 1.15. |
OK, I'm looking to put this in. @ahaldane @eric-wieser Is that OK with you? |
Thanks Allan. |
Thanks Chuck! Yes it was OK. |
Fixes #8129 by making
void.item()
return a memoryview which holds a reference to the original array.Note this has very poor performance (eg in the example in #8129). The problem is that the only way I can see to create a
memoryview
that holds a reference to the base object is withPyMemoryView_FromObject
(and notPyMemoryView_FromMemory
orPyMemoryView_FromBuffer
). So I am forced to create a size-1 ndarray for every element of the array to pass toPyMemoryView_FromObject
and convert it to a memoryview, which is slow.Is there a better way? If not, hopefully this is a rare enough scenario that we don't need to care too much about performance.
Here is a list of relevant docs, for reference: