BUG: automatically convert recarray dtype to np.record #5943

ahaldane · 2015-06-05T16:39:24Z

As discussed in #3581, this PR automatically converts the dtype of np.recarrays to np.record.

To get unit tests to pass I also had to add a __setattr__ method to the MaskedArray class: MaskedArrays did not support assignment to the dtype attribute, as demonstrated in the following examples:

>>> a = np.zeros(4, dtype='f4,i4')
>>> m = np.ma.array(a)
>>> m.dtype = np.dtype('f8')
>>> m   # Exception
>>> a.view(dtype='f8', type=np.ma.MaskedArray) #Exception

MaskedArray.__setattr__ now catches assignments to dtype and updates the mask accordingly.

Possible issues with this PR, if anyone has any ideas:

Viewing as a.view(np.recarray) now changes the dtype. It's not totally clear to me that this won't break anything. (Example: it broke MaskedArrays above, though that was due to a MaskedArray bug.)
Another example of the last point: views are not reversible.

>>> a = np.zeros(4, 'f4,i4')
>>> b = a.view(np.recarray).view(np.ndarray)
>>> b.dtype
dtype((numpy.record, [('f0', '<f4'), ('f1', '<i4')]))

Note (not really a problem): Attempt to create record arrays of non-structured type does not set the dtype to np.record (since it is not possible to do so). This is not new to this PR.

>>> a = np.zeros(5, 'f8')
>>> np.rec.array(a)
array([ 0.,  0.,  0.,  0.,  0.],
      dtype=float64).view(numpy.recarray)
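
The two observations above can be checked with a short script. This is only a sketch: it assumes a NumPy build that includes this change (1.10 or later), and the variable names are illustrative.

```python
import numpy as np

# Structured case: viewing as recarray converts the dtype to np.record,
# and viewing back as a plain ndarray does not undo that conversion.
a = np.zeros(4, dtype='f4,i4')
b = a.view(np.recarray).view(np.ndarray)
print(type(b).__name__, b.dtype.type.__name__)

# Non-structured case: the result is a recarray, but the dtype stays
# float64 because a plain float dtype cannot be wrapped in np.record.
r = np.rec.array(np.zeros(5, dtype='f8'))
print(type(r).__name__, r.dtype)
```

The original array `a` is untouched; only the views carry the np.record dtype.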

Benefits:

The goal is to fix the behaviour reported in #3581, "record array is made up of numpy.void, not numpy.record".

I still need to finish writing unit tests.

ahaldane · 2015-06-05T18:10:55Z

ugh there are still two test failures, deep in MaskedArray code. I'll work on this later.

charris · 2015-06-05T18:19:26Z

My sympathies ;)

mhvk · 2015-06-05T18:38:28Z

@ahaldane - I hadn't thought about the reversibility. Since that cannot possibly be guaranteed anyway, I think it is not a major problem, but perhaps it is good in the documentation to mention how one would get a regular structured array out (i.e., reverse the dtype change). Is it just rec.view(dtype=np.dtype(np.void, rec.dtype), np.ndarray)?

ahaldane · 2015-06-08T00:47:49Z

Update, here's what's new:

Updated docs for how to reverse the view. @mhvk That makes sense. And I think the reverse view needs to be

rec.view(rec.dtype.fields or rec.dtype, np.ndarray)

to take into account non-structured (non-void) record arrays. I.e., it has to work for both of the following record arrays:

rec = np.rec.array(np.ones(4, dtype='f4,i4')) # structured, dtype is np.void
rec = np.rec.array(np.ones(4, dtype='f8'))    # unstructured, dtype is np.float64

While working out the view above, I noticed that non-void record arrays can be created/represented using rec.array despite not having np.record dtype, so I modified recarray.__repr__ accordingly.
I removed the recarray.view method. The only case it differed from ndarray.view was for non-structured dtype, in which case it returned an ndarray instead of a (non-structured) recarray. But this breaks chained views:

>>> rec.view('f8').view('f4,i4')

The user probably wants the final result to be a record array, but the middle view currently converts to ndarray. There are already a fair number of places non-structured recarrays turn up (see above), so no reason to hide them.

Hopefully clarified the docs for the dtype((base_dtype, new_dtype)) form of dtype specification.
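
The reverse-view recipe and the chained-view case from the list above can be exercised as follows. This is a sketch assuming a NumPy version that includes this PR; names such as `plain` and `back` are made up for illustration.

```python
import numpy as np

rec = np.rec.array(np.ones(4, dtype='f4,i4'))

# Reverse view: reset both the dtype and the array type.
# `fields` is None for non-structured dtypes, hence the `or` fallback.
plain = rec.view(rec.dtype.fields or rec.dtype, np.ndarray)
print(type(plain).__name__, plain.dtype.type.__name__)

# The same expression also handles a non-structured record array.
rec_f8 = np.rec.array(np.ones(4, dtype='f8'))
plain_f8 = rec_f8.view(rec_f8.dtype.fields or rec_f8.dtype, np.ndarray)
print(type(plain_f8).__name__, plain_f8.dtype)

# Chained views: the intermediate non-structured view is still a recarray,
# so the final structured view comes back with a record dtype.
mid = rec.view('f8')
back = mid.view('f4,i4')
print(type(mid).__name__, type(back).__name__, back.dtype.type.__name__)
```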

As for the MaskedArray issues, they pass tests now but I think there may still be something wrong. I still need to think more about it.

mhvk · 2015-06-08T01:53:34Z

@ahaldane - I had a quick look through the latest PR and it seems OK (but as I don't use recarray much myself, someone else should definitely look as well).

ahaldane · 2015-06-13T17:43:21Z

Updated with extra tweaks to the MaskedArray part. (I think the recarray part is done).

I feel better about the MaskedArray changes now. As a reminder, this needed to happen to pass unit tests related to masked record arrays.

A problem was that the masked singleton's shape was being overwritten if I wasn't careful (another example of #5806). I think my changes are safe wrt this now. As an extra precaution I also tweaked MaskedArray.anom by quitting early if the result is masked, to avoid any extra manipulation of masked.

I also removed code that updates the mask's shape after the dtype is updated. I am not sure what its purpose was, and removing it doesn't affect any unit tests. Maskedarray doesn't support dtype changes that change the shape anyway. Maybe I have to think more about it, but right now it seems OK to me, and is easily changed back.

mhvk · 2015-06-18T18:54:53Z

numpy/core/records.py

+    def __array_finalize__(self, obj):
+        if type(obj) is not type(self):
+            # invoke __setattr__
+            self.dtype = self.dtype


@ahaldane - In principle, at this point self.dtype is obj.dtype -- I think it would read better to just write self.dtype = obj.dtype, and expand your comment. e.g.,

# invoke __setattr__, which for void dtypes will do
# self.dtype = sb.dtype((record, obj.dtype))
self.dtype = obj.dtype

I expanded the comment, but I still have to use self.dtype since the condition is "self.dtype is not obj.dtype" ;)

ahaldane · 2015-06-19T14:41:35Z

Updated with a better comment and expanded tests, and added the shape checks mentioned above back in. When adding the new tests I finally figured out why the shape manipulation was there: it raises a ValueError if the dtype change cannot apply to the mask.

Eg, when changing from 'f4,f4' to 'f8', the mask would change from [('f0', '?'), ('f1', '?')] to bool, but it is unclear how to construct the new mask values from the old mask - should you OR the old mask? Currently MA just disallows that dtype change. 'f4,f4' to 'f4' on the other hand has no such problem.
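
A rough way to see the mismatch is to compare element counts for the data and the mask after reinterpreting the same bytes. The sketch below uses `np.ma.make_mask_descr` to get the mask dtype; `mask_lengths_match` is a hypothetical helper written for this illustration and is not what MaskedArray does internally.

```python
import numpy as np

def mask_lengths_match(old, new, n=3):
    """Hypothetical helper: if n elements of dtype `old` are reinterpreted
    as dtype `new`, do the data and the mask end up the same length?"""
    old, new = np.dtype(old), np.dtype(new)
    old_mask = np.ma.make_mask_descr(old)
    new_mask = np.ma.make_mask_descr(new)
    data_len = n * old.itemsize // new.itemsize
    mask_len = n * old_mask.itemsize // new_mask.itemsize
    return data_len == mask_len

# Mask dtypes paired with a structured and a plain data dtype.
print(np.ma.make_mask_descr(np.dtype('f4,f4')))
print(np.ma.make_mask_descr(np.dtype('f8')))

# 'f4,f4' -> 'f8': 3 data elements but 6 mask booleans, so no clean mapping.
print(mask_lengths_match('f4,f4', 'f8'))
# 'f4,f4' -> 'f4': 6 data elements and 6 mask booleans.
print(mask_lengths_match('f4,f4', 'f4'))
```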

mhvk · 2015-06-19T15:18:29Z

numpy/core/records.py

+        if type(obj) is not type(self):
+            # invoke __setattr__, which for void dtypes will do
+            # self.dtype = sb.dtype((record, self.dtype))
+            self.dtype = self.dtype


@ahaldane - it should be:

class myarray(np.ndarray):
    def __array_finalize__(self, obj):
        print(self, obj)
        print(type(self), type(obj), type(self) is type(obj))
        if obj is not None:
            print(self.dtype, obj.dtype, self.dtype is obj.dtype)

# Now the three ways we can get to __array_finalize__
mya = myarray((4,), buffer=np.arange(4.))
# [ 0. 1. 2. 3.] None
# <class '__main__.myarray'> <class 'NoneType'>
# --> myarray([ 0., 1., 2., 3.])
mya[:2]
# [ 0. 1.] [ 0. 1. 2. 3.]
# <class '__main__.myarray'> <class '__main__.myarray'>
# float64 float64 True
# --> myarray([ 0., 1.])
np.arange(4.).view(myarray)
# [ 0. 1. 2. 3.] [ 0. 1. 2. 3.]
# <class '__main__.myarray'> <class 'numpy.ndarray'> False
# float64 float64 True
# --> myarray([ 0., 1., 2., 3.])

I think this is just a problem with your if statement; it should be

if obj is not None and type(self) is not type(obj):
    self.dtype = obj.dtype

Maybe I'm being too clever, but I think we want the dtype to be set to np.record in the case of an explicit constructor. I still think the clause I have is the right one.

class TestClass(np.ndarray):
    def __array_finalize__(self, obj):
        print(obj is not None and type(self) is not type(obj),
              type(obj) is not type(self))

a = np.arange(10)
t1 = a.view(TestClass)   # view from ndarray
t2 = TestClass(4)        # explicit constructor
t3 = t1[:2]              # slice
t4 = t1.view(TestClass)  # view from same class

This prints out (left is your if, right is my if)

(True, True)
(False, True)
(False, False)
(False, False)

I think the right is what I want. Is that correct?

OK, if you need it to be set also for explicit construction then, yes, your case is the right one. Since this was not obvious to me, and therefore may not be obvious to a future person looking at the code, maybe write the comment as:

# Both when constructing a new instance (obj=None) and when viewing
# from another class -- obj.view(type(self)) -- we invoke __setattr__,
# which for void dtypes will do self.dtype = sb.dtype((record, self.dtype)).
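
The outcome of this discussion can be checked from the outside, without touching __array_finalize__ itself. A minimal sketch, assuming a NumPy version with this PR merged; the variable names are illustrative:

```python
import numpy as np

dt = 'f4,i4'
constructed = np.recarray((4,), dtype=dt)          # explicit constructor (obj is None)
viewed = np.zeros(4, dtype=dt).view(np.recarray)   # view from another class
sliced = viewed[:2]                                # slice of a recarray

# In each case the structured dtype should be wrapped in np.record.
for name, arr in [('constructed', constructed),
                  ('viewed', viewed),
                  ('sliced', sliced)]:
    print(name, arr.dtype.type is np.record)
```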

mhvk · 2015-06-19T15:21:51Z

@ahaldane - nice catch on the mask change with incompatible dtype. While OR-ing the mask seems OK for the example you give, it does seem safest to just disallow it, at least for now!

ahaldane · 2015-06-19T19:00:13Z

@mhvk the detailed comment is a good idea. Done.

mhvk · 2015-06-19T19:09:51Z

Great, looks all OK to me!

ahaldane · 2015-06-19T20:02:58Z

Don't merge, I found a problem:

>>> r = np.rec.array(np.ones(4, dtype='f4,f4'))
>>> r.view('f8').dtype
dtype('float64')

will fix soon.

ahaldane · 2015-06-19T20:35:08Z

Oh never mind, that is a scalar recarray. It is working as expected, I just confused myself.

But it did make me notice that perhaps a better if clause in __array_finalize__ (as we were discussing above) is

if self.dtype.type is not record

I think that's clearer, right?

charris · 2015-06-19T20:41:17Z

Agree.

Viewing an ndarray as a np.recarray now automatically converts the dtype to np.record. This commit also fixes assignment to MaskedArray's dtype attribute, fixes the repr of recarrays with non-structured dtype, and removes recarray.view so that viewing a recarray as a non-structured dtype no longer converts to ndarray type. Fixes numpy#3581

mhvk · 2015-06-19T20:58:43Z

Yes, better, now the test is much more self-explanatory.


charris · 2015-06-19T21:39:15Z

@ahaldane @mhvk Thanks, merged. I hope it is correct ;)

ahaldane · 2015-06-19T21:55:13Z

Thanks, and thanks @mhvk for helpful comments & feedback.

Record array views were updated in numpy#5943 to return np.record dtype where possible, but forgot about the case of sub-arrays. That's fixed here, so accessing subarray fields by attribute or index works sensibly, as well as viewing a record array as a subarray dtype, and printing subarrays. This also happens to fix numpy#6459, since it affects the same lines. Fixes numpy#6497 numpy#6459

ahaldane force-pushed the record_finalize branch 4 times, most recently from bf2597f to 742801f Compare June 5, 2015 18:10

ahaldane force-pushed the record_finalize branch from 742801f to dc5161c Compare June 8, 2015 00:28

charris added 00 - Bug component: numpy._core component: numpy.lib labels Jun 13, 2015

ahaldane force-pushed the record_finalize branch from dc5161c to 73bb3e1 Compare June 13, 2015 17:27

This was referenced Jun 14, 2015

Numpy 1.10 issues for io.fits astropy/astropy#3854

Closed

BUG Ensure masked object arrays can always return single items. #5962

Merged

charris added this to the 1.10.0 release milestone Jun 17, 2015

mhvk reviewed Jun 18, 2015

ahaldane force-pushed the record_finalize branch from 73bb3e1 to cfd3dac Compare June 19, 2015 14:33

ahaldane force-pushed the record_finalize branch from cfd3dac to f8b5308 Compare June 19, 2015 15:00

mhvk reviewed Jun 19, 2015

ahaldane force-pushed the record_finalize branch from f8b5308 to 867ed71 Compare June 19, 2015 18:58

ahaldane force-pushed the record_finalize branch from 867ed71 to 698a8d2 Compare June 19, 2015 20:42

ahaldane force-pushed the record_finalize branch from 698a8d2 to a93b862 Compare June 19, 2015 20:43

charris added a commit that referenced this pull request Jun 19, 2015

Merge pull request #5943 from ahaldane/record_finalize

e42bea5

BUG: automatically convert recarray dtype to np.record

charris merged commit e42bea5 into numpy:master Jun 19, 2015

ahaldane mentioned this pull request Oct 18, 2015

BUG: recarrays viewed as subarrays don't convert to np.record type #6500

Merged

charris mentioned this pull request Oct 18, 2015

backport 6500: BUG: recarrays viewed as subarrays don't convert to np.record type #6504

Merged

eendebakpt mentioned this pull request May 7, 2025

DEP: Deprecate setting the strides and dtype of a numpy array #28901

Draft
