ENH: add back the multifield copy->view change #12447

ahaldane · 2018-11-24T23:30:05Z

This is the (hopefully) last of a long-running series of PRs attepting to make multifield-indexes return views instead of copies, which has been planned since about Numpy 1.6 or 1.7. I have some writeup of the motivation here: https://gist.github.com/ahaldane/6cd44886efb449f9c8d5ea012747323b#multifield-views and in the linked mailing list discussions there.

This is an "un-revert" of #10411, which was a revert of part of #6053. The reason we can do the unreversion now is that: 1. We have fixed #8100, which was the main source of bugs that caused us to revert this change originally, and 2. We have merged #11526 which provides helper functions for cases which this change will break.

In the mailing list discussions we did get some pushback on this backcompat break. So please consider this PR an opportunity to debate whether we should make the change at all. Myself, I'd say we go ahead with it, since it seemed enough devs supported the change on the mailing list, because I think the new helper functions are better and safer than the old "trick" using views in any case, plus the other reasons in the introductory maling list post: https://mail.python.org/pipermail/numpy-discussion/2018-January/077624.html

eric-wieser · 2018-11-24T23:32:31Z

doc/release/1.16.0-notes.rst

+To help users update their code to account for these changes, a number of
+functions have been added to the ``numpy.lib.recfunctions`` module which
+safely allow such operations. For instance, the above view can be achieved
+with ``structured_to_unstructured(arr[['f1', 'f3']], dtype='float64')``.


It would be neat if it did produce a view, but right now this produces a copy not a view, right?

This particular example produces a copy, because even before this PR a copy was unavoidable.

However, it was actually your suggestion to rework the code so structured_to_unstructured returns a view in many cases, for instance structured_to_unstructured(arr[['f1', 'f2', 'f3']], dtype='float64') actually makes no copies.

This sentence should be reworked to avoid the word view though...

for instance structured_to_unstructured(arr[['f1', 'f2', 'f3']], dtype='float64') actually makes no copies.

Is that a statement about what actually happens in master, or just what I was suggesting?

I was proposing something stronger, where the following would also produce a view:

structured_to_unstructured(arr[['f1', 'f3']], dtype='float64')

structured_to_unstructured(arr[['f3', 'f1']], dtype='float64')

structured_to_unstructured(arr[['f3', 'f2', 'f1']], dtype='float64')

For clarity, let's say we're dealing with the array arr = np.ones(3, dtype='f8,f8,f8'). Then arr[['f0', 'f2']] returns a 24-byte structured array where each element is organized as FxF where F is 8 bytes of float memory, and x is 8 bytes of padding. The 3-element array as a whole is FxFFxFFxF (72 bytes).

Is it possible to view this as an unstructured float64 array with the right stride? I suppose this particular 3-field array might be viewed with shape (3, 2) and stride (24, 16). np.ndarray((3,2), strides=(24, 16), dtype='f8', buffer=arr) works.

But it seems it would be quite an involved computation to determine more generally whether appropriate strides exist, given arbitrary field offsets. Fo three fields it is always possible, but with 4 fields it is not, eg np.ones(3, 'f8,f8,f8,f8')[['f0', 'f1, 'f3']] cannot be viewed as an unstructured array.

My feeling is it's too difficult to do the calculation generally, it needs some complex gcd computation.

Fo three fields it is always possible

Not if they're not contiguous. You're right that it's only possible in some special cases - but reshape can only avoid copies in some special cases, and we decided that was worth doing - so it might also be worthwhile here

But it seems it would be quite an involved computation to determine more generally whether appropriate strides exist, given arbitrary field offsets

The computation is straightforward - pseudo-code:

if len(offsets) == 0: stride = 0 offset = 0 elif len(offsets) == 1: stride = 0 offset = offsets[0] else: offset = offsets[0] stride = offsets[1] - offsets[0] for i, o in enumerate(offsets[2:],2): if o != offset + stride * i: raise NotViewable return offset, stride

Which should give enough information to be passed to as_strided somehow

That should work, indeed. I would have to mostly rewrite structured_to_unstructured to account for it, and it seems like not enough of a good enough cost/benefit ratio for the effort, to me. I'd rather leave that to a separate PR if it's desired.

As long as the contract of structured_to_unstructured allows us to make that change in a later release, I'm fine with not doing it now

numpy/doc/structured_arrays.py

Fixes numpy#10409 Closes numpy#11530

ahaldane · 2018-11-26T21:27:04Z

I'd like us to consider this one for 1.16.0, possibly, so I added the tag.

eric-wieser · 2018-11-26T23:42:49Z

That seems fair, given that the warning messages say it would change in 1.16

charris · 2018-11-29T15:59:37Z

Ping. What is the current status of this?

ahaldane · 2018-11-30T15:53:54Z

It is ready-to-go by my reading. There is just a social question about whether we should actually make this change, since we did get some pushback on the mailing list.

But in my opinion, we should go ahead with it, since we have now mitigated the back-compat break, and the change has been "promised" since numpy 1.7 and the current Warnings say it will happen in 1.16.

We already implemented this change in 1.14.0 (reverted in 1.14.1), and almost all the bug reports we got last time should not re-occur now.

mhvk · 2018-11-30T18:22:07Z

Just a 👍 to merging this.

charris · 2018-12-01T18:29:40Z

Thanks Allan.

Numpy attempted to release a feature in 1.14.0, numpy/numpy#6053 which was then reverted in 1.14.1 with numpy/numpy#10411 And then was finally released in 1.16 by numpy/numpy#12447 This also makes a minor enhancement to determines which numpy function should be used at import time instead of run time. The function name was also then updated from _make_sliced_dtype_np_ge_14 to _make_sliced_dtype_np_ge_16 as thats the primary version the feature was released in

eric-wieser reviewed Nov 24, 2018

View reviewed changes

numpy/doc/structured_arrays.py Outdated Show resolved Hide resolved

ahaldane force-pushed the unrevert_multifield_view branch 2 times, most recently from ecd6c48 to 79e0e6a Compare November 26, 2018 19:48

ahaldane mentioned this pull request Nov 26, 2018

ENH: add multi-field assignment helpers in np.lib.recfunctions #11526

Merged

ENH: add back the multifield copy->view change

845def0

Fixes numpy#10409 Closes numpy#11530

ahaldane force-pushed the unrevert_multifield_view branch from 79e0e6a to 845def0 Compare November 26, 2018 19:55

ahaldane added this to the 1.16.0 release milestone Nov 26, 2018

ahaldane mentioned this pull request Dec 1, 2018

MAINT: cancel multifield copy->view and copy-by-position plans (alternate proposal) #11530

Closed

charris added the 01 - Enhancement label Dec 1, 2018

charris merged commit b637e65 into numpy:master Dec 1, 2018

detrout mentioned this pull request Jan 5, 2019

Numpy 1.16.0 test failure, dtype construction seems wrong dask/dask#4352

Closed

ahaldane mentioned this pull request Mar 24, 2019

ENH: Deprecate writeable broadcast_array #12609

Merged

seberg mentioned this pull request Apr 17, 2022

BUG: numpy.lib.recfunctions.structured_to_unstructured is not thread-safe #21320

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH: add back the multifield copy->view change #12447

ENH: add back the multifield copy->view change #12447

ahaldane commented Nov 24, 2018

eric-wieser Nov 24, 2018

ahaldane Nov 24, 2018

ahaldane Nov 24, 2018

eric-wieser Nov 25, 2018

eric-wieser Nov 25, 2018 •

edited

Loading

ahaldane Nov 25, 2018 •

edited

Loading

eric-wieser Nov 25, 2018 •

edited

Loading

ahaldane Nov 26, 2018

eric-wieser Nov 26, 2018

ahaldane commented Nov 26, 2018

eric-wieser commented Nov 26, 2018

charris commented Nov 29, 2018

ahaldane commented Nov 30, 2018

mhvk commented Nov 30, 2018

charris commented Dec 1, 2018

ENH: add back the multifield copy->view change #12447

ENH: add back the multifield copy->view change #12447

Conversation

ahaldane commented Nov 24, 2018

eric-wieser Nov 24, 2018

Choose a reason for hiding this comment

ahaldane Nov 24, 2018

Choose a reason for hiding this comment

ahaldane Nov 24, 2018

Choose a reason for hiding this comment

eric-wieser Nov 25, 2018

Choose a reason for hiding this comment

eric-wieser Nov 25, 2018 • edited Loading

Choose a reason for hiding this comment

ahaldane Nov 25, 2018 • edited Loading

Choose a reason for hiding this comment

eric-wieser Nov 25, 2018 • edited Loading

Choose a reason for hiding this comment

ahaldane Nov 26, 2018

Choose a reason for hiding this comment

eric-wieser Nov 26, 2018

Choose a reason for hiding this comment

ahaldane commented Nov 26, 2018

eric-wieser commented Nov 26, 2018

charris commented Nov 29, 2018

ahaldane commented Nov 30, 2018

mhvk commented Nov 30, 2018

charris commented Dec 1, 2018

eric-wieser Nov 25, 2018 •

edited

Loading

ahaldane Nov 25, 2018 •

edited

Loading

eric-wieser Nov 25, 2018 •

edited

Loading