-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: automatically convert recarray dtype to np.record #5943
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
bf2597f
to
742801f
Compare
ugh there are still two test failures, deep in MaskedArray code. I'll work on this later. |
My sympathies ;) |
@ahaldane - I hadn't thought about the reversibility. Since that cannot possibly be guaranteed anyway, I think it is not a major problem, but perhaps it is good in the documentation to mention how one would get a regular structured array out (i.e., reverse the dtype change). Is it just |
Update, here's what's new
to take into account non-structured (non-void) record arrays. IE, both of the following record arrays
The user probably wants the final result to be a record array, but the middle view currently converts to ndarray. There are already a fair number of places non-structured recarrays turn up (see above), so no reason to hide them.
As for the MaskedArray ssues, they pass tests now but I think there may still be something wrong. I still need to think more about it. |
@ahaldane - I had a quick look through the latest PR and it seems OK (but as I don't use |
dc5161c
to
73bb3e1
Compare
Updated with extra tweaks to the MaskedArray part. (I think the recarray part is done). I feel better about the MaskedArray changes now. As a reminder, this needed to happen to pass unit tests related to masked record arrays. A problem was that the masked singleton's shape was being overwritten if I wasn't careful (another example of #5806). I think my changes are safe wrt this now. As an extra precaution I also tweaked I also removed code that updates the mask's shape after the dtype is updated. I am not sure what its purpose was, and removing it doesn't affect any unit tests. Maskedarray doesn't support dtype changes that change the shape anyway. Maybe I have to think more about it, but right now it seems OK to me, and is easily changed back. |
def __array_finalize__(self, obj): | ||
if type(obj) is not type(self): | ||
# invoke __setattr__ | ||
self.dtype = self.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahaldane - In principle, at this point self.dtype is obj.dtype
-- I think it would read better to just write self.dtype = obj.dtype
, and expand your comment. e.g.,
# invoke __setattr__, which for void dtypes will do
# self.dtype = sb.dtype((record, obj.dtype))
self.dtype = obj.dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I expanded the comment, but I still have to use self.dtype
since the condition is "self.dtype is not obj.dtype" ;)
73bb3e1
to
cfd3dac
Compare
Updated with better comment + expanded tests, plus added the shape checks mentioned above back in. When adding the new tests I finally figured out why the shape manipultion was there - it raises a ValueError if the dtype change cannot apply to the mask. Eg, when changing from 'f4,f4' to 'f8', the mask would change from |
cfd3dac
to
f8b5308
Compare
if type(obj) is not type(self): | ||
# invoke __setattr__, which for void dtypes will do | ||
# self.dtype = sb.dtype((record, self.dtype)) | ||
self.dtype = self.dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ahaldane - it should be:
class myarray(np.ndarray):
def __array_finalize__(self, obj):
print(self, obj)
print(type(self), type(obj), type(self) is type(obj))
if obj is not None:
print(self.dtype, obj.dtype, self.dtype is obj.dtype)
# Now the three ways we can get to __array_finalize__
myarray((4,), buffer=np.arange(4.))
# [ 0. 1. 2. 3.] None
# <class '__main__.myarray'> <class 'NoneType'>
# --> myarray([ 0., 1., 2., 3.])
mya[:2]
# [ 0. 1.] [ 0. 1. 2. 3.]
# <class '__main__.myarray'> <class '__main__.myarray'>
# float64 float64 True
# --> myarray([ 0., 1.])
np.arange(4.).view(myarray)
# [ 0. 1. 2. 3.] [ 0. 1. 2. 3.]
# <class '__main__.myarray'> <class 'numpy.ndarray'> False
# float64 float64 True
# --> Out[15]: myarray([ 0., 1., 2., 3.])
I think this is just a problem with your if
statement; it should be
if obj is not None and type(self) is not type(obj):
self.dtype = obj.dtype
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm being too clever, but I think we want the dtype to be set to np.record
in the case of an explicit constructor. I still think the clause I have is the right one.
class TestClass(np.ndarray):
def __array_finalize__(self, obj):
print(obj is not None and type(self) is not type(obj),
type(obj) is not type(self))
a = np.arange(10)
t1 = a.view(TestClass) # view fron ndarray
t2 = TestClass(4) # explicit constructor
t3 = t1[:2] # slice
t4 = t1.view(TestClass) # view from same class
This prints out (left is your if, right is my if)
(True, True)
(False, True)
(False, False)
(False, False)
I think the right is what I want. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, if you need it to be set also for explicit construction then, yes, your case is the right one. Since this was not obvious to me, and therefore may not be obvious to a future person looking at the code, maybe write the comment as:
# Both when constructing a new instance (obj=None) and when viewing
# from another class -- obj.view(type(self)) -- we invoke __setattr__,
# which for void dtypes will do self.dtype = sb.dtype((record, self.dtype)).
@ahaldane - nice catch on the mask change with incompatible dtype. While |
f8b5308
to
867ed71
Compare
@mhvk the detailed comment is a good idea. Done. |
Great, looks all OK to me! |
Don't merge, I found a problem:
will fix soon. |
Oh never mind, that is a scalar recarray. It is working as expected, I just confused myself. But it did make me notice, perhaps a better if clause in array_finalize (as we were discussing above) is
I think that's clearer, right? |
Agree. |
867ed71
to
698a8d2
Compare
Viewing an ndarray as a np.recarray now automatically converts the dtype to np.record. This commit also fixes assignment to MaskedArray's dtype attribute, fixes the repr of recarrays with non-structured dtype, and removes recarray.view so that viewing a recarray as a non-structured dtype no longer converts to ndarray type. Fixes numpy#3581
698a8d2
to
a93b862
Compare
Yes, better, now the test is much more self-explanatory. |
BUG: automatically convert recarray dtype to np.record
Thanks, and thanks @mhvk for helpful comments & feedback. |
Record array views were updated in numpy#5943 to return np.record dtype where possible, but forgot about the case of sub-arrays. That's fixed here, so accessing subarray fields by attribute or index works sensibly, as well as viewing a record array as a subarray dtype, and printing subarrays. This also happens to fix numpy#6459, since it affects the same lines. Fixes numpy#6497 numpy#6459
Record array views were updated in numpy#5943 to return np.record dtype where possible, but forgot about the case of sub-arrays. That's fixed here, so accessing subarray fields by attribute or index works sensibly, as well as viewing a record array as a subarray dtype, and printing subarrays. This also happens to fix numpy#6459, since it affects the same lines. Fixes numpy#6497 numpy#6459
Record array views were updated in numpy#5943 to return np.record dtype where possible, but forgot about the case of sub-arrays. That's fixed here, so accessing subarray fields by attribute or index works sensibly, as well as viewing a record array as a subarray dtype, and printing subarrays. This also happens to fix numpy#6459, since it affects the same lines. Fixes numpy#6497 numpy#6459
As discussed in #3581, this PR automatically converts the dtype of
np.recarray
s tonp.record
.To get unit tests to pass I also had to add a setattr to the
MaskedArray
class: MaskedArrays did not support assignment to dtype attribute, as demonstrated in the following examples:MaskedArray.setattr now catches assignments to dtype and updates the mask accordingly.
Possible issues with this PR, if anyone has any ideas:
Viewing as
a.view(np.recarray)
now changes the dtype. It's not totally clear to me this won't break anything. (Example: It broke maskedarrays above, but due to a maskedaray bug).Another example of the last point: Views are not reversible.
np.record
(since it is not possible to do so). This is not new to this PR.Benefits:
I still need to finish writing unit tests