Skip to content

WIP: make views of different type checks less strict #5508

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

jaimefrio
Copy link
Member

This PR allows to take views of a different type even if they are not in a single memory block, as long as they have a dimension along which the stride is the same as the element size and this is unique. The following examples would raise an error in current master, but work now:

In [4]: a = np.arange(8).astype(float)

In [5]: a[::-1].view(complex)
Out[5]: array([ 6.+7.j,  4.+5.j,  2.+3.j,  0.+1.j])

In [6]: a = np.arange(0, 7, 2) + np.arange(1, 8, 2)*1.j

In [7]: a[::-1].view(float)
Out[7]: array([ 7.,  6.,  5.,  4.,  3.,  2.,  1.,  0.])

In [8]: np.ones((2, 4), complex)[:, :2].view(float)
Out[8]:
array([[ 1.,  0.,  1.,  0.],
       [ 1.,  0.,  1.,  0.]])

I would like to have some more eyes on the logic of what is and is not possible: while this is much more flexible than before, it may be possible to push the boundary a little further out, e.g. the requirement that the minium stride be equal to the element size may not be necessary.

@jaimefrio
Copy link
Member Author

It should now pass all the tests, lets see what Travis thinks, and I think I have the logic pretty much figured out. Still missing tests and probably a rewrite of the docs.

@jaimefrio
Copy link
Member Author

There don't seem to be any specific tests for view, any recommendation on where to stick them?

int j, axis;
int nd = PyArray_NDIM(self);

if (nd == 0 || PyDataType_HASSUBARRAY(newtype)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I understand this forbids structured arrays, e.g. fails the following code:

dt_head = np.dtype([('dx', '>u4'), ('dy', '>u4'), ('dz', '>u4')])
dt = np.dtype([('head', dt_head), ('data', '>u4', 10)])

y = np.zeros(1, dt)
y[0]['head'].view('>u4')[:] = 5

Any reason not to support it? Apart from that, that it is not working in master or previous releases either ;o).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely out of the scope of this PR, but yes, it is a good question for which I don't have a definite answer. It would seem that, as long as there are no Python objects, all you should care about is the element size. Actually, with this PR you can do the following, which removes the [0] (but would work with a 1 item slice like [:1]) and uses an intermediate conversion to a void dtype:

>>> y['head'].view('|V12').view('>u4')[:] = 5
>>> y
array([((5L, 5L, 5L), [0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])],
      dtype=[('head', [('dx', '>u4'), ('dy', '>u4'), ('dz', '>u4')]), ('data', '>u4', (10,))])

So unless I am missing something horribly bad that could be happening, that check should also be revisited.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's definitely out of the scope of this PR

Any way, thanks for consideration! And of course, I can fully understand that it is out of scope. Was worth a try :o).

My current work around which works in master and previous versions is

>>> y[0]['head'].view('>3u4')[:] = 5
>>>
array([((5L, 5L, 5L), [0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L])],
      dtype=[('head', [('dx', '>u4'), ('dy', '>u4'), ('dz', '>u4')]), ('data', '>u4', (10,))])

Still it would be nice to do that directly without the need to calculate the size (here the hardcode 3) in advance via size = dt_head.itemsize / np.dtype(">u4").itemsize...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ahaldane is also looking into this, in case you want to chip in the discussion we started on #3256 and are continuing on his work here. It seems to me more and more likely that in the end this PR will be closed, and we will have a newer one properly dealing with views of different types.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for letting me know, I'll have a look into your links!

@charris
Copy link
Member

charris commented Jul 21, 2015

@jaimefrio Needs a rebase.

@ahaldane
Copy link
Member

Read thought it, and it makes sense, though there are still a couple lines I need to go over. I have some more conceptual questions/comments:

Does it often happen in real life that the minimal stride axis is not the first or last axis? What's an example of that?

Also, I am curious if anyone knows examples in the wild where it is useful to view using a different itemsize (eg np.zeros(3, 'p,p').view('p,p,p')). Even without this PR, understanding how that works is bit non-obvious, since the affected axis depends on the strides. For example,

>>> a = zeros((3,3), dtype='p,p')
>>> b = asfortranarray(a)
>>> a.view('p,p,p').shape
(3, 2)
>>> b.view('p,p,p').shape
(2, 3)

By the way, I looked into the history of the code which does this (in array_descr_set) ... it hasn't changed much from 2005, when it was cut+paste in from an SVN repository (so eariler history is lost).

Relax the previous condition (that the whole array be contiguous) to
allow a view with a dtype of a different size to be produced. See
the code comment for details, basically now only the dimension with
the smallest absolute value stride is required to be contiguous.
@jaimefrio
Copy link
Member Author

I have rebased, and I think this should now pass all tests. Before anyone gets tempted to merge it:

  1. There are no tests!
  2. This mailing list post is the origin of me working on this.
  3. Neither @njsmith nor @seberg were too happy about my fine disregard for the rules of taking views, there was quite some discussion on the mailing list. The only conclusion I get from re-reading that is that we did not find any behavior that made us all happy.

@seberg
Copy link
Member

seberg commented Jul 21, 2015

Not sure. Ian seems to have the proposal of adding dimensions for the view, which somewhat makes sense but I am not quite sure that the logic actually is very consistent with fractional size diffs and larger vs. smaller types.
Is it worth a shot to either deprecate Fortran order views, or introduce an order argument and switch it to C as default with a warning for now (currently I am not sure axis is sufficient, unless you allow multiple axes)?
Either of that would allow you to do anything you like pretty much in terms of generality, including even `order='K', without making things so general that nobody would understand the result when they suddenly have a transposed array.

@ahaldane
Copy link
Member

2 comments:

First, in that list discussion it was suggested we could add an extra 'order' or 'axis' keyword arg to ndarray.view to resolve ambiguities in how to reshape.

What about having a shape argument instead? I like that because changing the shape of an ndarray is sort of like giving the same bytes a different interpretation, just like viewing with a dtype. This way the user could resolve ambiguity about which shape they want for the dtype change, and the expected shape is explicit in the code (easier to read).

>>> arr.view(shape=(2,3)) 
>>> a = zeros((20,1), dtype='i8')
>>> a.view(dtype='i4', shape=(40,1))
>>> a.view(dtype='i4', shape=(20,2))

Unlike reshape this would raise an error if the new shape is incompatible. This would actually be generally useful for people who want to reshape an array without possibility of getting a copy, eg the first two lines here could be replaced with the last

>>> a = myArrayCreator()
>>> a.shape = (2,3)

>>> a = myArrayCreator().view(shape=(2,3))

(and implementation in PyArray_View would be just like for dtype, it assigns to the attribute)

@ahaldane
Copy link
Member

Second, I like this PR as is. (...maybe with corrections for size-1 axes).

It might be a little unintuitive but that's quickly fixed by a brief section in the docs about "views with different itemsizes", along with a big warning that you have to think carefully about memory layout if you want to change the itemsize. I think we should have a new section in the user guide on views (right before byteswapping), the last section of which could be something like this:

Views of different datatype itemsize 
------------------------------------

Taking a view which changes the itemsize (eg `np.zeros(2,'i4').view('i8')` ) is
possible but complicated since it depends on the memory layout of the array and
will change the shape of the array. In order to use such views sensibly you
must understand the array's memory layout (see
`arrays.ndarray.html#internal-memory-layout-of-an-ndarray`) and whether the
array is C-contiguous, Fortran-contiguous, or neither. 

When a view changes the itemsize of an array, numpy looks for the axis over
which the elements are contiguous in memory (this is equivalent to choosing the
axis whose stride is equal to the old itemsize) and allows the dtype change as
long as an integer number of items of the new itemsize would fit along that
axis, and resizes the shape at that axis accordingly. For C-contiguous arrays
this will be the last axis. Beware that, for example, transposed matrices
(`arr.T`) are not C-contiguous, but they can be made so with
`ascontiguousarray`.

Note that if there are dimensions of size 1 and the itemsize decreases it can
be ambiguous which axis should be resized. In this case it is recommended to
specify the desired shape as an argument to the view. By default numpy will
choose the leftmost such axis. [or something else?]

    >>> a = zeros((3,1), dtype='i8') 
    >>> a.view('i4').shape
    (3, 2)
    >>> b = a.view('i4', shape=(6, 1)) 
    >>> c = a.view('i4', shape=(3, 2))

Note that if the itemsize increases or if it changes by a fractional amount (eg
'i,i,i' to 'i,i') there can be no dimensions of size 1, and the choice of 
contiguous axis will be unambiguous.

@homu
Copy link
Contributor

homu commented Oct 18, 2015

☔ The latest upstream changes (presumably #6208) made this pull request unmergeable. Please resolve the merge conflicts.

@charris
Copy link
Member

charris commented Oct 18, 2015

@jaimefrio Needs rebase.

@mattip
Copy link
Member

mattip commented Feb 13, 2019

Should we bring views of non-contiguous arrays up again on the mailing list?

Base automatically changed from master to main March 4, 2021 02:03
@seberg
Copy link
Member

seberg commented Sep 8, 2021

This seems somewhat interesting, but after looking at it and considering that it was still a work in progress, it seems to me that the best thing may be to close it until it is brought up again (with more use-cases maybe). Especially, since it does not seem we were able to quite hash out the desired behaviour and some of it is pretty surprising!

EDIT: If anyone disagrees, please just comment.

@seberg seberg closed this Sep 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants