WIP: make views of different type checks less strict #5508

jaimefrio · 2015-01-27T11:21:13Z

This PR allows to take views of a different type even if they are not in a single memory block, as long as they have a dimension along which the stride is the same as the element size and this is unique. The following examples would raise an error in current master, but work now:

In [4]: a = np.arange(8).astype(float)

In [5]: a[::-1].view(complex)
Out[5]: array([ 6.+7.j,  4.+5.j,  2.+3.j,  0.+1.j])

In [6]: a = np.arange(0, 7, 2) + np.arange(1, 8, 2)*1.j

In [7]: a[::-1].view(float)
Out[7]: array([ 7.,  6.,  5.,  4.,  3.,  2.,  1.,  0.])

In [8]: np.ones((2, 4), complex)[:, :2].view(float)
Out[8]:
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  0.,  1.,  0.]])

I would like to have some more eyes on the logic of what is and is not possible: while this is much more flexible than before, it may be possible to push the boundary a little further out, e.g. the requirement that the minium stride be equal to the element size may not be necessary.

jaimefrio · 2015-01-27T20:08:09Z

It should now pass all the tests, lets see what Travis thinks, and I think I have the logic pretty much figured out. Still missing tests and probably a rewrite of the docs.

jaimefrio · 2015-01-27T20:19:44Z

There don't seem to be any specific tests for view, any recommendation on where to stick them?

mbyt · 2015-01-27T22:49:03Z

numpy/core/src/multiarray/getset.c

+        int j, axis;
+        int nd = PyArray_NDIM(self);
+
+        if (nd == 0 || PyDataType_HASSUBARRAY(newtype)) {


As far as I understand this forbids structured arrays, e.g. fails the following code:

dt_head = np.dtype([('dx', '>u4'), ('dy', '>u4'), ('dz', '>u4')]) dt = np.dtype([('head', dt_head), ('data', '>u4', 10)]) y = np.zeros(1, dt) y[0]['head'].view('>u4')[:] = 5

Any reason not to support it? Apart from that, that it is not working in master or previous releases either ;o).

That's definitely out of the scope of this PR, but yes, it is a good question for which I don't have a definite answer. It would seem that, as long as there are no Python objects, all you should care about is the element size. Actually, with this PR you can do the following, which removes the [0] (but would work with a 1 item slice like [:1]) and uses an intermediate conversion to a void dtype:

>>> y['head'].view('|V12').view('>u4')[:] = 5 >>> y array([((5L, 5L, 5L), [0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])], dtype=[('head', [('dx', '>u4'), ('dy', '>u4'), ('dz', '>u4')]), ('data', '>u4', (10,))])

So unless I am missing something horribly bad that could be happening, that check should also be revisited.

That's definitely out of the scope of this PR

Any way, thanks for consideration! And of course, I can fully understand that it is out of scope. Was worth a try :o).

My current work around which works in master and previous versions is

>>> y[0]['head'].view('>3u4')[:] = 5 >>> array([((5L, 5L, 5L), [0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])], dtype=[('head', [('dx', '>u4'), ('dy', '>u4'), ('dz', '>u4')]), ('data', '>u4', (10,))])

Still it would be nice to do that directly without the need to calculate the size (here the hardcode 3) in advance via size = dt_head.itemsize / np.dtype(">u4").itemsize...

@ahaldane is also looking into this, in case you want to chip in the discussion we started on #3256 and are continuing on his work here. It seems to me more and more likely that in the end this PR will be closed, and we will have a newer one properly dealing with views of different types.

thanks for letting me know, I'll have a look into your links!

charris · 2015-07-21T01:23:30Z

@jaimefrio Needs a rebase.

ahaldane · 2015-07-21T02:20:50Z

Read thought it, and it makes sense, though there are still a couple lines I need to go over. I have some more conceptual questions/comments:

Does it often happen in real life that the minimal stride axis is not the first or last axis? What's an example of that?

Also, I am curious if anyone knows examples in the wild where it is useful to view using a different itemsize (eg np.zeros(3, 'p,p').view('p,p,p')). Even without this PR, understanding how that works is bit non-obvious, since the affected axis depends on the strides. For example,

>>> a = zeros((3,3), dtype='p,p')
>>> b = asfortranarray(a)
>>> a.view('p,p,p').shape
(3, 2)
>>> b.view('p,p,p').shape
(2, 3)

By the way, I looked into the history of the code which does this (in array_descr_set) ... it hasn't changed much from 2005, when it was cut+paste in from an SVN repository (so eariler history is lost).

Relax the previous condition (that the whole array be contiguous) to allow a view with a dtype of a different size to be produced. See the code comment for details, basically now only the dimension with the smallest absolute value stride is required to be contiguous.

jaimefrio · 2015-07-21T03:58:37Z

I have rebased, and I think this should now pass all tests. Before anyone gets tempted to merge it:

There are no tests!
This mailing list post is the origin of me working on this.
Neither @njsmith nor @seberg were too happy about my fine disregard for the rules of taking views, there was quite some discussion on the mailing list. The only conclusion I get from re-reading that is that we did not find any behavior that made us all happy.

seberg · 2015-07-21T08:11:15Z

Not sure. Ian seems to have the proposal of adding dimensions for the view, which somewhat makes sense but I am not quite sure that the logic actually is very consistent with fractional size diffs and larger vs. smaller types.
Is it worth a shot to either deprecate Fortran order views, or introduce an order argument and switch it to C as default with a warning for now (currently I am not sure axis is sufficient, unless you allow multiple axes)?
Either of that would allow you to do anything you like pretty much in terms of generality, including even `order='K', without making things so general that nobody would understand the result when they suddenly have a transposed array.

ahaldane · 2015-07-23T00:47:34Z

2 comments:

First, in that list discussion it was suggested we could add an extra 'order' or 'axis' keyword arg to ndarray.view to resolve ambiguities in how to reshape.

What about having a shape argument instead? I like that because changing the shape of an ndarray is sort of like giving the same bytes a different interpretation, just like viewing with a dtype. This way the user could resolve ambiguity about which shape they want for the dtype change, and the expected shape is explicit in the code (easier to read).

>>> arr.view(shape=(2,3)) 
>>> a = zeros((20,1), dtype='i8')
>>> a.view(dtype='i4', shape=(40,1))
>>> a.view(dtype='i4', shape=(20,2))

Unlike reshape this would raise an error if the new shape is incompatible. This would actually be generally useful for people who want to reshape an array without possibility of getting a copy, eg the first two lines here could be replaced with the last

>>> a = myArrayCreator()
>>> a.shape = (2,3)

>>> a = myArrayCreator().view(shape=(2,3))

(and implementation in PyArray_View would be just like for dtype, it assigns to the attribute)

ahaldane · 2015-07-23T00:50:12Z

Second, I like this PR as is. (...maybe with corrections for size-1 axes).

It might be a little unintuitive but that's quickly fixed by a brief section in the docs about "views with different itemsizes", along with a big warning that you have to think carefully about memory layout if you want to change the itemsize. I think we should have a new section in the user guide on views (right before byteswapping), the last section of which could be something like this:

Views of different datatype itemsize 
------------------------------------

Taking a view which changes the itemsize (eg `np.zeros(2,'i4').view('i8')` ) is
possible but complicated since it depends on the memory layout of the array and
will change the shape of the array. In order to use such views sensibly you
must understand the array's memory layout (see
`arrays.ndarray.html#internal-memory-layout-of-an-ndarray`) and whether the
array is C-contiguous, Fortran-contiguous, or neither. 

When a view changes the itemsize of an array, numpy looks for the axis over
which the elements are contiguous in memory (this is equivalent to choosing the
axis whose stride is equal to the old itemsize) and allows the dtype change as
long as an integer number of items of the new itemsize would fit along that
axis, and resizes the shape at that axis accordingly. For C-contiguous arrays
this will be the last axis. Beware that, for example, transposed matrices
(`arr.T`) are not C-contiguous, but they can be made so with
`ascontiguousarray`.

Note that if there are dimensions of size 1 and the itemsize decreases it can
be ambiguous which axis should be resized. In this case it is recommended to
specify the desired shape as an argument to the view. By default numpy will
choose the leftmost such axis. [or something else?]

    >>> a = zeros((3,1), dtype='i8') 
    >>> a.view('i4').shape
    (3, 2)
    >>> b = a.view('i4', shape=(6, 1)) 
    >>> c = a.view('i4', shape=(3, 2))

Note that if the itemsize increases or if it changes by a fractional amount (eg
'i,i,i' to 'i,i') there can be no dimensions of size 1, and the choice of 
contiguous axis will be unambiguous.

homu · 2015-10-18T20:19:05Z

☔ The latest upstream changes (presumably #6208) made this pull request unmergeable. Please resolve the merge conflicts.

charris · 2015-10-18T20:29:41Z

@jaimefrio Needs rebase.

mattip · 2019-02-13T18:59:34Z

Should we bring views of non-contiguous arrays up again on the mailing list?

seberg · 2021-09-08T17:09:12Z

This seems somewhat interesting, but after looking at it and considering that it was still a work in progress, it seems to me that the best thing may be to close it until it is brought up again (with more use-cases maybe). Especially, since it does not seem we were able to quite hash out the desired behaviour and some of it is pretty surprising!

EDIT: If anyone disagrees, please just comment.

jaimefrio force-pushed the view_as_type branch from f7457d8 to f71c964 Compare January 27, 2015 20:05

mbyt reviewed Jan 27, 2015
View reviewed changes

jaimefrio mentioned this pull request Jan 28, 2015

recarray.__getitem__ with field list gives "TypeError: Cannot change data-type for object array." when dtype contains object #3256

Closed

ahaldane mentioned this pull request Jan 30, 2015

BUG: Incorrect assignment to recarray items #3561

Closed

ahaldane mentioned this pull request Feb 9, 2015

ENH: structured datatype safety checks #5548

Merged

rgommers added 01 - Enhancement component: numpy._core labels Mar 8, 2015

jaimefrio mentioned this pull request Mar 24, 2015

Possible bug with multiple views and basic slicing #5716

Closed

ahaldane mentioned this pull request Jul 20, 2015

ENH: Allow views using smaller coprime itemsizes #6097

Closed

jaimefrio force-pushed the view_as_type branch from f71c964 to 17e57ca Compare July 21, 2015 03:29

jaimefrio force-pushed the view_as_type branch from 17e57ca to ea43503 Compare July 21, 2015 03:47

ahaldane mentioned this pull request Nov 13, 2015

getting unexpected error from: X.dtype = complex (or Y = X.view(complex)) #6678

Closed

seberg mentioned this pull request Jan 21, 2017

ENH: Make it possible to call .view on object arrays #8514

Closed

jaimefrio mentioned this pull request Jul 30, 2017

BUG/MAINT/ENH: Allow viewing where only the last axis is contiguous #9497

Closed

eric-wieser mentioned this pull request Jul 31, 2017

Can't change dtype for non-continuous array #9496

Open

Base automatically changed from master to main March 4, 2021 02:03

github-actions bot added the 25 - WIP label Mar 4, 2021

seberg added the 54 - Needs decision label Sep 8, 2021

seberg closed this Sep 8, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP: make views of different type checks less strict #5508

WIP: make views of different type checks less strict #5508

jaimefrio commented Jan 27, 2015

jaimefrio commented Jan 27, 2015

jaimefrio commented Jan 27, 2015

mbyt Jan 27, 2015

jaimefrio Jan 27, 2015

mbyt Jan 28, 2015

jaimefrio Jan 28, 2015

mbyt Jan 29, 2015

charris commented Jul 21, 2015

ahaldane commented Jul 21, 2015

jaimefrio commented Jul 21, 2015

seberg commented Jul 21, 2015

ahaldane commented Jul 23, 2015

ahaldane commented Jul 23, 2015

homu commented Oct 18, 2015

charris commented Oct 18, 2015

mattip commented Feb 13, 2019

seberg commented Sep 8, 2021 •

edited

Loading

WIP: make views of different type checks less strict #5508

WIP: make views of different type checks less strict #5508

Conversation

jaimefrio commented Jan 27, 2015

jaimefrio commented Jan 27, 2015

jaimefrio commented Jan 27, 2015

mbyt Jan 27, 2015

Choose a reason for hiding this comment

jaimefrio Jan 27, 2015

Choose a reason for hiding this comment

mbyt Jan 28, 2015

Choose a reason for hiding this comment

jaimefrio Jan 28, 2015

Choose a reason for hiding this comment

mbyt Jan 29, 2015

Choose a reason for hiding this comment

charris commented Jul 21, 2015

ahaldane commented Jul 21, 2015

jaimefrio commented Jul 21, 2015

seberg commented Jul 21, 2015

ahaldane commented Jul 23, 2015

ahaldane commented Jul 23, 2015

homu commented Oct 18, 2015

charris commented Oct 18, 2015

mattip commented Feb 13, 2019

seberg commented Sep 8, 2021 • edited Loading

seberg commented Sep 8, 2021 •

edited

Loading