-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: add subok flag to stride_tricks (and thus broadcast_arrays) #4622
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I like this idea in general. I am wondering if it is too much noise, but in general it seems a good idea if we (or those that could make use of it like astropy ;)), add that keyword to other functions as well. |
@@ -31,9 +31,15 @@ def as_strided(x, shape=None, strides=None): | |||
# Make sure dtype is correct in case of custom dtype | |||
if array.dtype.kind == 'V': | |||
array.dtype = x.dtype | |||
if subok: | |||
array = array.view(x.__class__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know that the input actually is an array subclass. Also we could use __array_wrap__
which might add support for things like the pandas stuff which are not arrays at all. I think that may be better, but I did not think about it in depth. Other then that, I would prefer if you write type=x.__class__
just to be clear (as opposed to dtype
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will change to type=
. I thought about __array_wrap__
, but since what broadcast_arrays
does is very similar to what happens with, say, reshape
, just handling this as if it is a new view seemed most appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant with not being an array subclass is that this must also work when x
is for example a list. If we check for __array_attribute__
that is fine though. If we want to add this to a lot of functions, we really should add more helpers maybe. There already is a np.get_wrap
or similar helper I think.
@seberg, yes, it would be useful in other functions as well, the various |
Not sure about that ;)
|
Am much amused by your example! According to the documentation, |
Not sure, but the default |
if array.dtype.kind == 'V': | ||
array.dtype = x.dtype | ||
if subok and isinstance(x, np.ndarray) and type(x) is not type(array): | ||
array = array.view(type=type(x)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, but still got to figure this out exactly first (including me). view
already calls array finalize, so no need to finalize after the view. However, wrapping might still be better since it allows array likes such as pandas things to work. Double checking, it seems to me that reshape
finalizes if we got a subclass and wraps otherwise, but wrapping should be more general. Still thinking :)
@seberg - carrying on in the main thread so we don't loose the discussion once I push the next installment... My sense, though, is that we're making it slightly too hard: p.s. The travis error seems spurious (a build hanging). |
Updated a comment; travis now passes. |
@seberg - did you have further comments on this? |
Well, still not sure if I don't prefer wrap. We have precedence of wrapping for example in |
@seberg - looking at the documentation for That said, I can see the case for Of course, in principle, we could behave more closely similar to [1] http://docs.scipy.org/doc/numpy/reference/arrays.classes.html |
@seberg - what shall we do with this one? It would be nice to reach some conclusion, so I can also provide this in astropy/astropy#2327 |
I still prefer using array wrap a little, but if we view it as an array creation function I am fine with either really. At some point it would be nice to move this into C in any case, but just wishing ;) |
Hmmm, array finalize is probably good. But now I am wondering if it isn't cleaner to call |
37e4a9d
to
2d90fe6
Compare
OK, I rebased. @seberg - I'm not quite sure where in |
2d90fe6
to
26a02cd
Compare
@@ -20,7 +20,7 @@ def __init__(self, interface, base=None): | |||
self.__array_interface__ = interface | |||
self.base = base | |||
|
|||
def as_strided(x, shape=None, strides=None): | |||
def as_strided(x, shape=None, strides=None, subok=False): | |||
""" Make an ndarray from the given array with the given shape and strides. | |||
""" | |||
interface = dict(x.__array_interface__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant was this. __array_interface__
is basically just one way to define an "array-like". So this function currently does not support most array likes out there at all (for example memoryviews, etc.), since they implement the buffer interface instead. To really support this, I think we should (first thing), call:
x = np.asanyarray(x)
here. After you have done that, __array_finalize__
is guaranteed to be defined. We will guarantee to return an array and not some array-like, which can be seen as an advantage or disadvantage. But I think I am fine with returning always an array (or array subclass here). On the other hand, the pandas guys might disagree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmmm, pandas implements array wrap but not array finalize. I kind of think it is fine to say that as_strided
will always return an array, like np.asanyarray
, but honestly this is a jungle, so if someone disagrees....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seberg - I guess adding x = np.asanyarray(x)
deals with a somewhat separate issue, but obviously it is good to ensure the code is as general as possible, and this would allow one at least to remove the isinstance(x, np.ndarray)
stanza from the if statement below.
I'll do that, but also a question: since with this change, we know that x is an ndarray
subclass, might this also allow one to change the strides and shape more easily than through the DummyArray
mechanism, so that the dtype
and subclass
issues get handled more elegantly as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From C, you could do that very easily, but from python there is no other way really.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yeah, you are right, it deals with a seperate issue... I somewhat thought it might make this one simpler, too, but that doesn't change much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, actually if you really hate DummyArray, you can also use the np.ndarray
class constructor. You still will have to manually call finalize, etc. since that does not allow to set a parent.
Sorry for the delays. Just need to figure out the |
@seberg - OK, I made the changes, but in a slightly different way than you suggested, so please have a look. |
3a6c849
to
2b9064d
Compare
array = array.view(type=type(x)) | ||
# Since we have done something akin to a view from x, we should let | ||
# the subclass finalize (if it has it implemented). | ||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since x is a subclass now, I think you don't need the try/except at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I did the wrong test: if the subclass does not define __array_finalize__
, then it is None
and hence not callable. I now replaced it with an if statement.
2b9064d
to
e859031
Compare
Thanks for the comment; addressed, and a test added to ensure subclasses without |
On
It would allow one to remove the check on Do you think I should still do this here, or would this be better for a different PR, since it is getting a bit off-topic? (In princple, I can also back out the |
Actually, have to check what/if this does anynthing to object arrays (or void arrays with object fields). |
@seberg - I had almost started adding tests for record types at least, but felt too lazy... But can do so, as it would be good to improve the test coverage. For object arrays, would a test like the following be OK? Could you give me an example of "void arrays with object fields"?
|
Actually, I am surprised it isn't an error :), it certainly is good, just need to see if it changed. |
@seberg - I came back to this again, and while a test with an object dtype works, a complex one does not in current master:
This error disappears if instead of the
assignment suggested above. However, this does not work for non-contiguous data and thus fails one of the tests. Since the void arrays with object fields fail in current master as well, however, I think this is not really relevant for the current PR, which just aims to get subclasses to work properly. Maybe we can agree that this is OK as is, and raise an issue for the above? |
We don't need that kind of error here. Since we don't mess with the dtype and the original array is in charge of doing handling reference counting, it is always safe to do. As long as we have less errors then before things are good. It fails a test which originally did not exist is what you mean and which never worked? In that case I guess we can call it a minor issue that can be fixed another time. |
Yeah, should be good for merging. Won't do it right now, but if I forget about it in the next couple of days, just ping me. Thanks for putting up with all my small worries :). |
@seberg - the promised ping to remind you to merge this! |
Ack :), lets put it in then. At some point we should maybe just move this whole stuff to C... |
ENH: add subok flag to stride_tricks (and thus broadcast_arrays)
Thanks for the patience :) |
As is,
np.broadcast_arrays
does not preserve subclasses, even though for the shape manipulation subclasses should not be a problem. This PR adds asubok
option that allows one to change that, but isFalse
by default, so that existing code will continue to behave as is.