BUG: void .item() doesn't hold reference to original array #8157

ahaldane · 2016-10-14T21:32:49Z

Fixes #8129 by making void.item() return a memoryview which holds a reference to the original array.

Note that this has very poor performance (e.g. in the example in #8129). The problem is that the only way I can see to create a memoryview that holds a reference to the base object is with PyMemoryView_FromObject (and not PyMemoryView_FromMemory or PyMemoryView_FromBuffer). So I am forced to create a size-1 ndarray for every element of the array and pass it to PyMemoryView_FromObject to convert it to a memoryview, which is slow.

Is there a better way? If not, hopefully this is a rare enough scenario that we don't need to care too much about performance.
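
For readers less familiar with the buffer protocol, the lifetime guarantee being sought here can be illustrated in pure Python. This is only an analogy that uses a bytearray as the exporting object, not the NumPy C code in this PR, and `view_outlives_name` is a made-up helper name.

```python
def view_outlives_name():
    """Show that a memoryview keeps its exporting object alive."""
    buf = bytearray(b"\x01\x02\x03\x04")
    view = memoryview(buf)

    # The view records the object that exported the buffer...
    holds_reference = view.obj is buf

    # ...so dropping our own name for it does not free the memory.
    del buf
    return holds_reference, view.tobytes()


print(view_outlives_name())
```

A memoryview built from a raw pointer has no such exporter to hold on to, which is the gap the comment above describes.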

Here is a list of relevant docs, for reference:

py3 buffer docs,
py3 memoryview docs, and code
py2 buffer+memoryview docs
Python issue discussing how Python's current memoryview C api is unsatisfactory.

charris · 2016-10-16T23:31:13Z

Hmm, looks like you have researched the problem pretty deeply. Maybe @pv has some ideas.

charris · 2016-10-16T23:35:30Z

Wow, that issue died as a python issue long ago. Maybe give them a ping.

charris · 2016-10-16T23:50:02Z

numpy/core/src/multiarray/arraytypes.c.src

+        npy_intp dims[1];
+        dims[0] = 1;
+        Py_INCREF(descr);
+        u = PyArray_NewFromDescr(&PyArray_Type, descr, 1, dims, NULL, ip,


Could make this PEP8 compliant. Starting a new line after the opening ( might be the easiest way.

EDIT: Or just declare a flags variable, would make it more understandable...

charris · 2016-10-16T23:58:46Z

numpy/core/src/multiarray/arraytypes.c.src

+        PyObject *ret, *u;
+
+        /* first create a size-1 view of this element */
+        npy_intp dims[1];


Needs a blank line after the declaration. I'd move it up and move the comment down after the declarations.

charris · 2016-10-17T00:02:00Z

numpy/core/tests/test_regression.py

+        va = np.zeros(300000, 'V4')
+        x = va[:1].item()
+        del va
+        x.tobytes()  # segfault?


Always segfaults, or just now and then? Might move the trailing comment onto its own line above.

charris · 2016-10-17T00:10:16Z

LGTM modulo some nits. I wonder a bit if returning memoryviews could cause backward compatibility problems, although it isn't clear that things were working before. I'm leaning to put this off to the next release so that things have a chance to settle. Maybe we should warn in the release notes?

ahaldane · 2016-10-19T17:38:08Z

Yeah, I want to think a little more anyway, and I also want to try running pandas/astropy/scipy unit tests with this change.

There might be a workaround to the performance problem through the fact that numpy usually considers unstructured void scalars to be "immutable", so it often makes them copies instead of views. So maybe I can avoid the call to PyArray_NewFromDescr by making a cheap copy of the element of some sort. E.g., getting a void scalar might be cheaper than getting a view.

homu · 2016-11-25T15:58:57Z

☔ The latest upstream changes (presumably #8235) made this pull request unmergeable. Please resolve the merge conflicts.

ahaldane · 2017-04-22T05:16:44Z

Rebased.

Also, realized a simple solution to the problems above: Just return a memoryview of ap. I think this PR is ready now.

Note this PR will have a significant effect on the string representation of V arrays, since V items will now always show up as <memoryview>:

Old str behavior:

>>> np.zeros(6, 'V4')
array([, , , , , ], 
      dtype='|V4')

New behavior:

>>> np.zeros(6, 'V4')
array([<memory at 0x7fd9bd3a6808>, <memory at 0x7fd9bd3a6808>,
       <memory at 0x7fd9bd3a6808>, <memory at 0x7fd9bd3a6808>,
       <memory at 0x7fd9bd3a6808>, <memory at 0x7fd9bd3a6808>],
      dtype='|V4')

I suggest that the new behavior is more correct, although wordier. This way we don't risk cluttering the terminal with invalid characters, and we don't end up with mystery spaces like in the old representation.

Edit: Although... the memory addresses printed are a bit misleading. I think they are the addresses of the temporary void scalars created in array2string. Void scalars are always copies of the array elements, so the memory locations are not those of the original array. Probably the memory locations are sometimes repeated because one temporary scalar is destroyed before the next is created. I could hide the addresses by defining voidtype_repr, in this PR or a later one...

eric-wieser · 2017-04-22T09:05:37Z

Does the following work fine under this patch?

dt = np.dtype([('i', int), ('v', 'V0')])
np.zeros(6, dt)

ahaldane · 2017-04-22T14:29:58Z

Your example is just printed differently:

>>> dt = np.dtype([('i', int), ('v', 'V0')])
>>> np.zeros(6, dt)
array([(0, <memory at 0x7ffa3f09d240>), (0, <memory at 0x7ffa3f09d240>),
       (0, <memory at 0x7ffa3f09d240>), (0, <memory at 0x7ffa3f09d240>),
       (0, <memory at 0x7ffa3f09d240>), (0, <memory at 0x7ffa41accb70>)],
      dtype=[('i', '<i8'), ('v', 'V')])
>>> m = _[0].item()[1]
>>> m.format, m.itemsize
('0x', 0)

(also, this example uses a structured type, but this patch mainly affects unstructured void types)

eric-wieser · 2017-04-22T14:31:34Z

My concern was handling of V0, but it seems it was without reason.

Sure looks like a bug in python, the way that memoryview is being shown - shouldn't that show the address of the memory it points to, not the address of the pyobject holding it?

eric-wieser · 2017-04-22T14:35:42Z

Also, I don't agree with your old behaviour - on python 3, I get:

>>> np.zeros(6, 'V4')
array([[0 0 0 0], [0 0 0 0], [0 0 0 0], [0 0 0 0], [0 0 0 0], [0 0 0 0]], 
      dtype='|V4')

Are you using 2.7?

ahaldane · 2017-04-22T14:47:25Z

It is showing the address being pointed to.

The problem is that during the process of printing the array, numpy makes a temporary copy of the memory and shows that instead.

In more detail, numpy goes through each element and makes a void scalar from it, and unstructured void scalars (unlike structured voids) always make a copy (not a view) of the array element they correspond to. So the printed memory address refers to an unstructured void scalar's buffer.

We might fix that (in another PR?) by making unstructured void scalars point to the original array's memory (be views). That would also fix the problem that writing to an unstructured void scalar's memory currently doesn't affect the original array, unlike structured void scalars.

The place where this all happens is in PyArray_Scalar.

ahaldane · 2017-04-22T14:48:18Z

I'm using Python 3.6.0 in the printout above.... curious yours is different.

eric-wieser · 2017-04-22T14:50:27Z

"It is showing the address being pointed to."

Can you prove that to me with:

arr = np.zeros(6, 'V4')
x = arr[0]
print(repr(x))
print(object.__repr__(x))

I see your point though, that in our case, neither address is at all useful.

Since void is returned by copy, I think it's very important that the repr reflects this. Right now, it's very indicative of a view.

ahaldane · 2017-04-22T14:51:41Z

>>> arr = np.zeros(6, 'V4')
...: x = arr[0]
...: print(repr(x))
...: print(object.__repr__(x))
...: 
<memory at 0x7f22d876b240>
<numpy.void object at 0x7f22d8790570>

ahaldane · 2017-11-13T22:36:31Z

Ah, fair enough, done.

eric-wieser · 2017-11-13T22:37:54Z

doc/release/1.14.0-notes.rst

@@ -416,6 +416,10 @@ The printing style of ``np.void`` arrays is now independently customizable
 using the ``formatter`` argument to ``np.set_printoptions``, using the
 ``'void'`` key, instead of the catch-all ``numpystr`` key as before.

+unstructured void array's ``.item`` method now returns a bytes object
+---------------------------------------------------------------------
+``.item`` now returns a ``bytes`` object instead of a buffer or byte array.


It's probably worth drawing attention to the fact that this new object is readonly, unlike the previous buffer or byte array that was read/write

For that reason, this maybe should be under Compatibility

ahaldane · 2017-11-13

On the other hand, the previous buffer or byte array wasn't a view of the original array or scalar. So writing to it had no effect. I'll add a note though.

eric-wieser · 2017-11-13

Indeed, but it means that users who expected to get a mutable scratch buffer no longer do so.

I suppose the alternative would be to return a bytearray object.

ahaldane · 2017-11-13

I considered that too. But I suspect we would actually be causing bugs/confusion if we return a bytearray, since it vaguely implies it is a view when it isn't. Returning an immutable bytes makes clear you only got a copy.

Updated the note, by the way.
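
As a rough illustration of the behavior the release note describes, the snippet below checks that the item is a `bytes` copy that does not track later writes to the array. It assumes NumPy 1.15 or later is installed (the milestone this PR ended up under); `item_is_independent_copy` is a made-up helper name.

```python
import numpy as np


def item_is_independent_copy():
    """Return the item taken before and after overwriting the array."""
    va = np.zeros(3, 'V4')
    before = va[:1].item()

    # Overwrite the array's memory through a uint8 view of the same buffer.
    va.view(np.uint8)[:] = 1
    after = va[:1].item()

    # Dropping the array is safe: both items own their own bytes.
    del va
    return before, after


before, after = item_is_independent_copy()
print(type(before).__name__, before, after)
```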

eric-wieser

LGTM, but I'd like someone else's opinion on whether changing the mutability is too backwards-incompatible.

charris · 2017-11-14T17:30:26Z

I have no idea if the change in mutability will have much impact. Probably the only way to find out is to try. I suppose we could issue a FutureWarning first.

ahaldane · 2017-11-14T18:17:24Z

I doubt it will have much impact, since the old behavior was almost totally non-functional:

void.item() gave back different types of objects in python 2 vs python 3 (memoryview vs array), easily caused segfaults, and, confusingly, appeared to be a view (since it was mutable) but was actually just a mutable copy, thus making it usually pointless to modify.

Edit: that last part isn't completely true... in python 2 the returned array was indeed a view if you do arr[:1].item()... but was a copy if you did arr[0].item().

eric-wieser · 2017-11-17T07:39:10Z

So, the choice comes down to picking between

Return bytes, which could break old code using np.void as a scratch pad, but accurately indicates that the object is not a view
Return bytearray, which is (fully?) compatible with old code, but gives the illusion that it might be a view (just like it always did)

Either of these is better than leaving things how they are.
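
The trade-off between the two options can be sketched with plain Python objects. Here a bytearray named `source` merely stands in for the array's memory, and `try_write` is a hypothetical helper; none of this is NumPy code.

```python
source = bytearray(b"\x00\x00\x00\x00")  # stands in for the array's memory

as_bytes = bytes(source)          # option 1: immutable copy
as_bytearray = bytearray(source)  # option 2: mutable copy


def try_write(obj):
    """Return True if the first byte could be overwritten, else False."""
    try:
        obj[0] = 0xFF
    except TypeError:
        return False
    return True


print(try_write(as_bytes))      # bytes rejects the write outright
print(try_write(as_bytearray))  # bytearray accepts it...
print(source)                   # ...but the source is unchanged either way
```

With `bytes`, old scratch-pad code fails loudly at the write. With `bytearray`, it keeps running, and the write silently stays in the copy.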

eric-wieser · 2017-11-17T07:39:48Z

As an aside, this change will make int(np.array(b'10', np.void)) work, for better or worse

ahaldane · 2017-11-17T17:18:31Z

If there is any doubt, then it's probably safest to issue a DeprecationWarning for 1.14 and leave the behavior alone, and then do the actual change to bytes in 1.15. There are definitely times I wish I had been more cautious and warned first in the past!

I just tried adding the warning, and running runtests.py --raise-warnings develop, and I don't get any warnings, so adding a warning doesn't seem too noisy.
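
The warn-first approach could look roughly like the following pure-Python sketch. It is not the code from any NumPy PR: `void_item`, its bytearray return value, and the message text are invented for illustration, and FutureWarning is used only because the linked follow-up PR's title mentions it.

```python
import warnings


def void_item(raw):
    """Hypothetical stand-in for the old item(): warn, keep old behavior."""
    warnings.warn(
        "item() on an unstructured void will return bytes in a future release",
        FutureWarning,
        stacklevel=2,
    )
    return bytearray(raw)  # stand-in for the old mutable return value


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = void_item(b"\x00\x00\x00\x00")

print(type(result).__name__, [w.category.__name__ for w in caught])
```

Running a test suite with such warnings turned into errors is one way to find callers that still depend on the old return type.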

eric-wieser · 2017-11-17T17:27:07Z

I don't see a good way to give a silenceable warning here

charris · 2017-11-25T23:55:00Z

I pushed this off to 1.15, will go with the warning first.

charris · 2018-04-09T19:09:00Z

@ahaldane Needs rebase. I'm looking to branch 1.15 pretty soon. Should we push this off to the next release to allow the FutureWarning more time, or just go ahead? I had the impression that this change was not expected to cause problems.

ahaldane · 2018-04-13T04:13:05Z

whoops, what happened here?

eric-wieser · 2018-05-01T07:02:14Z

@charris: To be clear, you want this to target 1.16?

ahaldane · 2018-05-01T16:09:56Z

I think @charris was asking me to rebase for 1.15, which I did. So this can stay tagged for 1.15.

charris · 2018-05-11T22:44:49Z

OK, I'm looking to put this in. @ahaldane @eric-wieser Is that OK with you?

charris · 2018-05-12T17:52:28Z

Thanks Allan.

ahaldane · 2018-05-13T14:57:59Z

Thanks Chuck! Yes it was OK.

ahaldane force-pushed the void_item_memview branch 2 times, most recently from cf49728 to 84bbfc7 (October 15, 2016)

charris added the labels "00 - Bug" and "component: numpy._core" (Oct 16, 2016)

charris reviewed (Oct 16, 2016)

charris reviewed (Oct 17, 2016)

ahaldane force-pushed the void_item_memview branch from 84bbfc7 to 28b35d6 (April 22, 2017)

ahaldane force-pushed the void_item_memview branch 6 times, most recently from ee9b94a to 360de17 (April 23, 2017)

ahaldane mentioned this pull request (Apr 25, 2017): ENH: fix str/repr for 0d-arrays and int* scalars #8983 (Merged)

charris mentioned this pull request (May 5, 2017): Structured arrays containing NaNs are not considered equal by numpy.testing.assert_array_equal() #8192 (Closed)

ahaldane mentioned this pull request (Sep 10, 2017): ENH: remove unneeded spaces in float/bool reprs, fixes 0d str #9139 (Merged)

ahaldane force-pushed the void_item_memview branch from 5d05e5b to 5ff80f5 (November 13, 2017)

eric-wieser reviewed (Nov 13, 2017)

ahaldane force-pushed the void_item_memview branch from 5ff80f5 to 982ed77 (November 13, 2017)

eric-wieser approved these changes (Nov 13, 2017)

ahaldane mentioned this pull request (Nov 17, 2017): DEP: FutureWarning for void.item(): Will return bytes #10044 (Merged)

eric-wieser mentioned this pull request (Nov 18, 2017): BUG: Allow int to be called on nested object arrays, fix np.str_.__int__ #10042 (Merged)

charris changed the milestone from 1.14.0 release to 1.15.0 release (Nov 25, 2017)

ahaldane mentioned this pull request (Nov 26, 2017): Constructing object array from void array leads to use-after-free #8129 (Closed)

ahaldane closed this (Apr 13, 2018)

ahaldane force-pushed the void_item_memview branch from 982ed77 to 3ec8875 (April 13, 2018)

Commit a83af93: MAINT: make unstructured void .item() return byte

ahaldane reopened this (Apr 13, 2018)

charris merged commit 88e5cb7 into numpy:master (May 12, 2018)

charris mentioned this pull request (Aug 3, 2018): Strange problem when creating a pandas.Series from void ndarray since 1.15.0 #11668 (Closed)
