Skip to content

ENH: add back the multifield copy->view change #12447

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 1, 2018

Conversation

ahaldane
Copy link
Member

Fixes #10409 Closes #11530

This is the (hopefully) last of a long-running series of PRs attepting to make multifield-indexes return views instead of copies, which has been planned since about Numpy 1.6 or 1.7. I have some writeup of the motivation here: https://gist.github.com/ahaldane/6cd44886efb449f9c8d5ea012747323b#multifield-views and in the linked mailing list discussions there.

This is an "un-revert" of #10411, which was a revert of part of #6053. The reason we can do the unreversion now is that: 1. We have fixed #8100, which was the main source of bugs that caused us to revert this change originally, and 2. We have merged #11526 which provides helper functions for cases which this change will break.

In the mailing list discussions we did get some pushback on this backcompat break. So please consider this PR an opportunity to debate whether we should make the change at all. Myself, I'd say we go ahead with it, since it seemed enough devs supported the change on the mailing list, because I think the new helper functions are better and safer than the old "trick" using views in any case, plus the other reasons in the introductory maling list post: https://mail.python.org/pipermail/numpy-discussion/2018-January/077624.html

To help users update their code to account for these changes, a number of
functions have been added to the ``numpy.lib.recfunctions`` module which
safely allow such operations. For instance, the above view can be achieved
with ``structured_to_unstructured(arr[['f1', 'f3']], dtype='float64')``.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be neat if it did produce a view, but right now this produces a copy not a view, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This particular example produces a copy, because even before this PR a copy was unavoidable.

However, it was actually your suggestion to rework the code so structured_to_unstructured returns a view in many cases, for instance structured_to_unstructured(arr[['f1', 'f2', 'f3']], dtype='float64') actually makes no copies.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sentence should be reworked to avoid the word view though...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for instance structured_to_unstructured(arr[['f1', 'f2', 'f3']], dtype='float64') actually makes no copies.

Is that a statement about what actually happens in master, or just what I was suggesting?

Copy link
Member

@eric-wieser eric-wieser Nov 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was proposing something stronger, where the following would also produce a view:

  • structured_to_unstructured(arr[['f1', 'f3']], dtype='float64')
  • structured_to_unstructured(arr[['f3', 'f1']], dtype='float64')
  • structured_to_unstructured(arr[['f3', 'f2', 'f1']], dtype='float64')

Copy link
Member Author

@ahaldane ahaldane Nov 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity, let's say we're dealing with the array arr = np.ones(3, dtype='f8,f8,f8'). Then arr[['f0', 'f2']] returns a 24-byte structured array where each element is organized as FxF where F is 8 bytes of float memory, and x is 8 bytes of padding. The 3-element array as a whole is FxFFxFFxF (72 bytes).

Is it possible to view this as an unstructured float64 array with the right stride? I suppose this particular 3-field array might be viewed with shape (3, 2) and stride (24, 16). np.ndarray((3,2), strides=(24, 16), dtype='f8', buffer=arr) works.

But it seems it would be quite an involved computation to determine more generally whether appropriate strides exist, given arbitrary field offsets. Fo three fields it is always possible, but with 4 fields it is not, eg np.ones(3, 'f8,f8,f8,f8')[['f0', 'f1, 'f3']] cannot be viewed as an unstructured array.

My feeling is it's too difficult to do the calculation generally, it needs some complex gcd computation.

Copy link
Member

@eric-wieser eric-wieser Nov 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fo three fields it is always possible

Not if they're not contiguous. You're right that it's only possible in some special cases - but reshape can only avoid copies in some special cases, and we decided that was worth doing - so it might also be worthwhile here

But it seems it would be quite an involved computation to determine more generally whether appropriate strides exist, given arbitrary field offsets

The computation is straightforward - pseudo-code:

if len(offsets) == 0:
    stride = 0
    offset = 0
elif len(offsets) == 1:
    stride = 0
    offset = offsets[0]
else:
    offset = offsets[0]
    stride = offsets[1] - offsets[0]
    for i, o in enumerate(offsets[2:],2):
        if o != offset + stride * i:
            raise NotViewable
return offset, stride

Which should give enough information to be passed to as_strided somehow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work, indeed. I would have to mostly rewrite structured_to_unstructured to account for it, and it seems like not enough of a good enough cost/benefit ratio for the effort, to me. I'd rather leave that to a separate PR if it's desired.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As long as the contract of structured_to_unstructured allows us to make that change in a later release, I'm fine with not doing it now

@ahaldane ahaldane force-pushed the unrevert_multifield_view branch 2 times, most recently from ecd6c48 to 79e0e6a Compare November 26, 2018 19:48
@ahaldane ahaldane force-pushed the unrevert_multifield_view branch from 79e0e6a to 845def0 Compare November 26, 2018 19:55
@ahaldane ahaldane added this to the 1.16.0 release milestone Nov 26, 2018
@ahaldane
Copy link
Member Author

I'd like us to consider this one for 1.16.0, possibly, so I added the tag.

@eric-wieser
Copy link
Member

That seems fair, given that the warning messages say it would change in 1.16

@charris
Copy link
Member

charris commented Nov 29, 2018

Ping. What is the current status of this?

@ahaldane
Copy link
Member Author

It is ready-to-go by my reading. There is just a social question about whether we should actually make this change, since we did get some pushback on the mailing list.

But in my opinion, we should go ahead with it, since we have now mitigated the back-compat break, and the change has been "promised" since numpy 1.7 and the current Warnings say it will happen in 1.16.

We already implemented this change in 1.14.0 (reverted in 1.14.1), and almost all the bug reports we got last time should not re-occur now.

@mhvk
Copy link
Contributor

mhvk commented Nov 30, 2018

Just a 👍 to merging this.

@charris
Copy link
Member

charris commented Dec 1, 2018

Thanks Allan.

detrout added a commit to detrout/dask that referenced this pull request Jan 10, 2019
Numpy attempted to release a feature in 1.14.0,
numpy/numpy#6053 which was then reverted in
1.14.1 with numpy/numpy#10411 And then was
finally released in 1.16 by numpy/numpy#12447

This also makes a minor enhancement to determines which numpy function
should be used at import time instead of run time.

The function name was also then updated from
_make_sliced_dtype_np_ge_14 to _make_sliced_dtype_np_ge_16 as thats
the primary version the feature was released in
TomAugspurger pushed a commit to dask/dask that referenced this pull request Jan 14, 2019
Numpy attempted to release a feature in 1.14.0,
numpy/numpy#6053 which was then reverted in
1.14.1 with numpy/numpy#10411 And then was
finally released in 1.16 by numpy/numpy#12447

This also makes a minor enhancement to determines which numpy function
should be used at import time instead of run time.

The function name was also then updated from
_make_sliced_dtype_np_ge_14 to _make_sliced_dtype_np_ge_16 as thats
the primary version the feature was released in
jorge-pessoa pushed a commit to jorge-pessoa/dask that referenced this pull request May 14, 2019
Numpy attempted to release a feature in 1.14.0,
numpy/numpy#6053 which was then reverted in
1.14.1 with numpy/numpy#10411 And then was
finally released in 1.16 by numpy/numpy#12447

This also makes a minor enhancement to determines which numpy function
should be used at import time instead of run time.

The function name was also then updated from
_make_sliced_dtype_np_ge_14 to _make_sliced_dtype_np_ge_16 as thats
the primary version the feature was released in
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants