Skip to content

ENH: add subok flag to stride_tricks (and thus broadcast_arrays) #4622

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 26, 2014

Conversation

mhvk
Copy link
Contributor

@mhvk mhvk commented Apr 16, 2014

As is, np.broadcast_arrays does not preserve subclasses, even though for the shape manipulation subclasses should not be a problem. This PR adds a subok option that allows one to change that, but is False by default, so that existing code will continue to behave as is.

@seberg
Copy link
Member

seberg commented Apr 16, 2014

I like this idea in general. I am wondering if it is too much noise, but in general it seems a good idea if we (or those that could make use of it like astropy ;)), add that keyword to other functions as well.

@@ -31,9 +31,15 @@ def as_strided(x, shape=None, strides=None):
# Make sure dtype is correct in case of custom dtype
if array.dtype.kind == 'V':
array.dtype = x.dtype
if subok:
array = array.view(x.__class__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know that the input actually is an array subclass. Also we could use __array_wrap__ which might add support for things like the pandas stuff which are not arrays at all. I think that may be better, but I did not think about it in depth. Other then that, I would prefer if you write type=x.__class__ just to be clear (as opposed to dtype).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, will change to type=. I thought about __array_wrap__, but since what broadcast_arrays does is very similar to what happens with, say, reshape, just handling this as if it is a new view seemed most appropriate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant with not being an array subclass is that this must also work when x is for example a list. If we check for __array_attribute__ that is fine though. If we want to add this to a lot of functions, we really should add more helpers maybe. There already is a np.get_wrap or similar helper I think.

@mhvk
Copy link
Contributor Author

mhvk commented Apr 16, 2014

@seberg, yes, it would be useful in other functions as well, the various stack operations being prime examples. But for those subclasses might in principle need to check that they are consistent, so I thought I would start with a case where I could not see any reason not to allow it. See also the discussion in #4164.

@seberg
Copy link
Member

seberg commented Apr 16, 2014

Not sure about that ;)

In [9]: class a(object):
    def __array__(self):
        return np.arange(10)
    def __array_wrap__(self, arr):
        print "array wrap"
        return self
    def __array_finalize__(self):
        print "finalize"
   ...:         

In [10]: np.reshape(a(), (5, 2))
array wrap
Out[10]: <__main__.a at 0x3f5be50>

@mhvk
Copy link
Contributor Author

mhvk commented Apr 16, 2014

Am much amused by your example! According to the documentation, __array_wrap__ is only called at the end of every ufunc, and a quick test shows that it is not called when the class is an ndarray subclass. But clearly there is more to it...
I guess I partially see what you mean with the class: for as_strided we can not necessarily be sure that x is actually an array, just that it has an __array_interface__. I rewrote to make it safer.

@seberg
Copy link
Member

seberg commented Apr 16, 2014

Not sure, but the default __array_wrap__ will itself call __array_finalize__ I believe, so that might add to the confusion. Array wrap is there for ufuncs, but it is called elsewhere (though I don't know exactly where, might be pretty much everywhere).

if array.dtype.kind == 'V':
array.dtype = x.dtype
if subok and isinstance(x, np.ndarray) and type(x) is not type(array):
array = array.view(type=type(x))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, but still got to figure this out exactly first (including me). view already calls array finalize, so no need to finalize after the view. However, wrapping might still be better since it allows array likes such as pandas things to work. Double checking, it seems to me that reshape finalizes if we got a subclass and wraps otherwise, but wrapping should be more general. Still thinking :)

@mhvk
Copy link
Contributor Author

mhvk commented Apr 16, 2014

@seberg - carrying on in the main thread so we don't loose the discussion once I push the next installment...
view indeed calls __array_finalize__, but it will be passed array as the object, which does not carry information that the subclass might need. Probably we should mimic what happens with np.reshape. Looking at fromnumeric.py, if the input is an ndarray subclass, it goes to the reshape method, which is defined in PyArray_NewShape (in multiarray/shape.c), where at the end a call is made to PyArray_NewFromDescr (in ctors.c), which goes to PyArray_NewFromDescr_int (ibid), which does call __array_finalize__. If np.reshape encounters something that is not an ndarray, it uses _wrapit, which does check for the existence of __array_wrap__. The idea here seems to be able to deal with other classes that are array-like.

My sense, though, is that we're making it slightly too hard: subok should probably just be what it is in np.array: indicate that it passes through ndarray subclasses, not that it tries to behave for everything that has an __array_wrap__. (And for the specific, most common, case of np.broadcast_arrays, we are already guaranteed to get an ndarray or, with this PR, a subclass.)

p.s. The travis error seems spurious (a build hanging).

@mhvk
Copy link
Contributor Author

mhvk commented Apr 18, 2014

Updated a comment; travis now passes.

@mhvk
Copy link
Contributor Author

mhvk commented Apr 22, 2014

@seberg - did you have further comments on this?

@seberg
Copy link
Member

seberg commented Apr 22, 2014

Well, still not sure if I don't prefer wrap. We have precedence of wrapping for example in insert and delete, though these are old and it is probably OK to rethink... If we see this more as array creation functions, I think not wrapping is better. Otherwise I think wrapping is better.

@mhvk
Copy link
Contributor Author

mhvk commented Apr 22, 2014

@seberg - looking at the documentation for __array_finalize__ and __array_wrap__ [1], it would seem the description of the former (this method is called whenever the system internally allocates a new array from obj, where obj is a subclass (subtype) of the ndarray) is far more appropriate than that of the latter (At the end of every ufunc, this method is called on the input...), so I think __array_finalize__ makes most sense from the least-surprise principle.

That said, I can see the case for x.__array_wrap__(obj) being slightly more obvious than obj.__array_finalize__(x), even if it does the right thing in offering obj the opportunity to copy over relevant attributes from x.

Of course, in principle, we could behave more closely similar to np.reshape and use __array_finalize__ for ndarray subclasses and __array_wrap__ for others. But perhaps that should wait for someone actually requesting that behaviour?

[1] http://docs.scipy.org/doc/numpy/reference/arrays.classes.html

@mhvk
Copy link
Contributor Author

mhvk commented May 4, 2014

@seberg - what shall we do with this one? It would be nice to reach some conclusion, so I can also provide this in astropy/astropy#2327

@seberg
Copy link
Member

seberg commented May 4, 2014

I still prefer using array wrap a little, but if we view it as an array creation function I am fine with either really. At some point it would be nice to move this into C in any case, but just wishing ;)

@mhvk
Copy link
Contributor Author

mhvk commented Jul 9, 2014

@seberg - can we stick to __array_finalize__, by analogy with what np.reshape does? (see comment above). It would be nice to get this in (and not mimick #31)

@charris
Copy link
Member

charris commented Jul 29, 2014

@seberg @mhvk It would be nice to get this finished off.

@seberg
Copy link
Member

seberg commented Jul 29, 2014

Hmmm, array finalize is probably good. But now I am wondering if it isn't cleaner to call x = np.array(x, copy=False, subok=subok) to begin with, and then just view the resulting array result.view(type=class(x))? Have to think about it a little.

@charris
Copy link
Member

charris commented Aug 23, 2014

@seberg @mhvk Knock, knock ;) This also needs a rebase.

@mhvk mhvk force-pushed the lib/stride_tricks/subclasses branch from 37e4a9d to 2d90fe6 Compare August 25, 2014 15:58
@mhvk
Copy link
Contributor Author

mhvk commented Aug 25, 2014

OK, I rebased. @seberg - I'm not quite sure where in as_strided you were thinking of calling x = np.array(x, copy=False, subok=subok).

@mhvk mhvk force-pushed the lib/stride_tricks/subclasses branch from 2d90fe6 to 26a02cd Compare August 25, 2014 17:08
@@ -20,7 +20,7 @@ def __init__(self, interface, base=None):
self.__array_interface__ = interface
self.base = base

def as_strided(x, shape=None, strides=None):
def as_strided(x, shape=None, strides=None, subok=False):
""" Make an ndarray from the given array with the given shape and strides.
"""
interface = dict(x.__array_interface__)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant was this. __array_interface__ is basically just one way to define an "array-like". So this function currently does not support most array likes out there at all (for example memoryviews, etc.), since they implement the buffer interface instead. To really support this, I think we should (first thing), call:

x = np.asanyarray(x)

here. After you have done that, __array_finalize__ is guaranteed to be defined. We will guarantee to return an array and not some array-like, which can be seen as an advantage or disadvantage. But I think I am fine with returning always an array (or array subclass here). On the other hand, the pandas guys might disagree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, pandas implements array wrap but not array finalize. I kind of think it is fine to say that as_strided will always return an array, like np.asanyarray, but honestly this is a jungle, so if someone disagrees....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg - I guess adding x = np.asanyarray(x) deals with a somewhat separate issue, but obviously it is good to ensure the code is as general as possible, and this would allow one at least to remove the isinstance(x, np.ndarray) stanza from the if statement below.

I'll do that, but also a question: since with this change, we know that x is an ndarray subclass, might this also allow one to change the strides and shape more easily than through the DummyArray mechanism, so that the dtype and subclass issues get handled more elegantly as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From C, you could do that very easily, but from python there is no other way really.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And yeah, you are right, it deals with a seperate issue... I somewhat thought it might make this one simpler, too, but that doesn't change much.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, actually if you really hate DummyArray, you can also use the np.ndarray class constructor. You still will have to manually call finalize, etc. since that does not allow to set a parent.

@seberg
Copy link
Member

seberg commented Aug 27, 2014

Sorry for the delays. Just need to figure out the array-like but not subclass thing, but lets finish it up before the week is over :)

@mhvk
Copy link
Contributor Author

mhvk commented Aug 27, 2014

@seberg - OK, I made the changes, but in a slightly different way than you suggested, so please have a look.

@mhvk mhvk force-pushed the lib/stride_tricks/subclasses branch from 3a6c849 to 2b9064d Compare August 27, 2014 19:50
array = array.view(type=type(x))
# Since we have done something akin to a view from x, we should let
# the subclass finalize (if it has it implemented).
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since x is a subclass now, I think you don't need the try/except at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I did the wrong test: if the subclass does not define __array_finalize__, then it is None and hence not callable. I now replaced it with an if statement.

@mhvk mhvk force-pushed the lib/stride_tricks/subclasses branch from 2b9064d to e859031 Compare August 27, 2014 21:04
@mhvk
Copy link
Contributor Author

mhvk commented Aug 27, 2014

Thanks for the comment; addressed, and a test added to ensure subclasses without __array_finalize__ work as well.

@mhvk
Copy link
Contributor Author

mhvk commented Aug 27, 2014

On DummyArray, now that we are doing the np.array(x), I would think it would indeed be nicer to also use

if shape is None:
    shape = x.shape
if strides is None:
    strides = x.strides
array = np.ndarray(buffer=x, shape=shape, strides=strides, dtype=x.dtype)

It would allow one to remove the check on dtype.kind, but not, as you said, the __array_finalize__.

Do you think I should still do this here, or would this be better for a different PR, since it is getting a bit off-topic? (In princple, I can also back out the np.array(x).)

@seberg
Copy link
Member

seberg commented Aug 28, 2014

Actually, have to check what/if this does anynthing to object arrays (or void arrays with object fields).

@mhvk
Copy link
Contributor Author

mhvk commented Aug 28, 2014

@seberg - I had almost started adding tests for record types at least, but felt too lazy... But can do so, as it would be good to improve the test coverage.

For object arrays, would a test like the following be OK? Could you give me an example of "void arrays with object fields"?

a = np.array([None, True, False])
as_strided(a, strides=(0,))
# seems to work: array([None, None, None], dtype=object)

@seberg
Copy link
Member

seberg commented Aug 28, 2014

Actually, I am surprised it isn't an error :), it certainly is good, just need to see if it changed.

@mhvk
Copy link
Contributor Author

mhvk commented Sep 18, 2014

@seberg - I came back to this again, and while a test with an object dtype works, a complex one does not in current master:

import numpy as np
dt = np.dtype([('x', 'i8'), ('y', '?'), ('z', 'O')])
b = np.array([(1, True, None), (2, False, [3, 4, 5])], dtype=dt)
c = np.array([[-1], [-2]])
b_strided, c_strided = np.broadcast_arrays(b, c)
TypeError                                 Traceback (most recent call last)
...
/usr/lib/python3/dist-packages/numpy/lib/stride_tricks.py in as_strided(x, shape, strides)
     31     # Make sure dtype is correct in case of custom dtype
     32     if array.dtype.kind == 'V':
---> 33         array.dtype = x.dtype
     34     return array
     35 

TypeError: Cannot change data-type for object array.

This error disappears if instead of the DummyArray class, I use the

array = np.ndarray(buffer=x, shape=shape, strides=strides, dtype=x.dtype)

assignment suggested above. However, this does not work for non-contiguous data and thus fails one of the tests.

Since the void arrays with object fields fail in current master as well, however, I think this is not really relevant for the current PR, which just aims to get subclasses to work properly. Maybe we can agree that this is OK as is, and raise an issue for the above?

@seberg
Copy link
Member

seberg commented Sep 18, 2014

We don't need that kind of error here. Since we don't mess with the dtype and the original array is in charge of doing handling reference counting, it is always safe to do. As long as we have less errors then before things are good.

It fails a test which originally did not exist is what you mean and which never worked? In that case I guess we can call it a minor issue that can be fixed another time.

@mhvk
Copy link
Contributor Author

mhvk commented Sep 18, 2014

@seberg - indeed, both with and without this PR as_strided does not work for a void array with object fields, since one cannot directly set the dtype. I raised a new issue for this (#5081), even though this is rather a corner case. Does this leave anything for the present PR?

@seberg
Copy link
Member

seberg commented Sep 21, 2014

Yeah, should be good for merging. Won't do it right now, but if I forget about it in the next couple of days, just ping me. Thanks for putting up with all my small worries :).

@mhvk
Copy link
Contributor Author

mhvk commented Sep 26, 2014

@seberg - the promised ping to remind you to merge this!

@seberg
Copy link
Member

seberg commented Sep 26, 2014

Ack :), lets put it in then. At some point we should maybe just move this whole stuff to C...

seberg added a commit that referenced this pull request Sep 26, 2014
ENH: add subok flag to stride_tricks (and thus broadcast_arrays)
@seberg seberg merged commit 22ae193 into numpy:master Sep 26, 2014
@seberg
Copy link
Member

seberg commented Sep 26, 2014

Thanks for the patience :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants