Skip to content

BUG: recarray __repr__ gives inaccurate representation #5523

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 30, 2015

Conversation

ahaldane
Copy link
Member

This is a followup to #5483, with the goal of solving #3581,

In #5483, I solved the problem that a "recarray" and a "record array" (nomenclature defined in #5482) looked identical by making sure that a type's subclass was listed in the repr. However, there is still the problem that recarrays are represented using the function 'rec.array' even though this function technically creates record arrays, not recarrays.

So I have updated recarray.__repr__ to give accurate representations. Demonstration:

Setup:

>>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')])
>>> recordarr = np.rec.array(a)
>>> recarr = a.view(np.recarray)
>>> recordview = a.view(dtype((np.record, a.dtype)))

(Very) Old Behavior:

>>> recordarr
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', 'S4')])
>>> recarr
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', 'S4')])
>>> recordview
array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', 'S4')])

Behavior after #5483:

>>> recordarr
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')]))
>>> recarr
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', 'S4')])
>>> recordview
array([(1, 'ABC'), (2, 'DEF')], 
      dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')]))

New Behavior:

>>> recordarr
rec.array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', '|S4')])
>>> recarr
array([(1, 'ABC'), (2, 'DEF')], 
      dtype=[('foo', '<i8'), ('bar', 'S4')]).view(numpy.recarray)
>>> recordview
array([(1, 'ABC'), (2, 'DEF')], 
      dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')]))

Representing the recarr using array plus a view seems a little clunky, but there aren't many other options. The only other option I can think of is to use the recarray constructor, but that seems even clunkier since it has a totally different signature.

However, perhaps the clunkier description will be motivation for people to always use np.rec.array, and help emphasize that taking view(np.recarray) gives you a weird hybrid.

Also, I now get some extra | in the dtype since I use dtype.descr, hopefully that's acceptable.

I also fixed some missing parentheses in the record array docs.

@ahaldane ahaldane force-pushed the recarray_fixrepr branch 3 times, most recently from ed10d2b to d802e09 Compare January 29, 2015 19:54
# show zero-length shape unless it is (0,)
lst = "[], shape=%s" % (repr(self.shape),)

if self.dtype.type == record:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should maybe be self.dtype.type is record.

@charris
Copy link
Member

charris commented Jan 30, 2015

Well, that representation is certainly a bit strange, but, I suppose, accurate ;)

LGTM modulo possible nitpick.

In numpy#5483, I solved the problem that a
"recarray" and a "record array" (nomenclature defined in
numpy#5482) looked identical by making
sure that a type's subclass was listed in the repr. However, recarrays
are still represented using the function 'rec.array' even though this
function technically creates record arrays, not recarrays.

So I have updated recarray.__repr__.

Setup:

    >>> a = np.array([(1,'ABC'), (2, "DEF")], dtype=[('foo', int), ('bar', 'S4')])
    >>> recordarr = np.rec.array(a)
    >>> recarr = a.view(np.recarray)

Behavior after numpy#5483:

    >>> recordarr
    rec.array([(1, 'ABC'), (2, 'DEF')],
          dtype=(numpy.record, [('foo', '<i8'), ('bar', 'S4')]))
    >>> recarr
    rec.array([(1, 'ABC'), (2, 'DEF')],
          dtype=[('foo', '<i8'), ('bar', 'S4')])

New Behavior:

    >>> recordarr
    rec.array([(1, 'ABC'), (2, 'DEF')],
          dtype=[('foo', '<i8'), ('bar', '|S4')])
    >>> recarr
    array([(1, 'ABC'), (2, 'DEF')],
          dtype=[('foo', '<i8'), ('bar', 'S4')]).view(numpy.recarray)
@ahaldane
Copy link
Member Author

Done.

Thanks for checking over everything!

I think this fixes #3581. At least, it explains why that behavior is expected.

charris added a commit that referenced this pull request Jan 30, 2015
BUG: recarray __repr__ gives inaccurate representation
@charris charris merged commit ea52028 into numpy:master Jan 30, 2015
@charris
Copy link
Member

charris commented Jan 30, 2015

Merged, thanks @ahaldane .

@juliantaylor
Copy link
Contributor

this breaks scipy/io/tests/test_idl.py

ERROR: test_idl.TestStructures.test_scalars_replicated_3d
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/usr/lib/python2.7/dist-packages/scipy/io/tests/test_idl.py", line 190, in test_scalars_replicated_3d
    s = readsav(path.join(DATA_PATH, 'struct_scalars_replicated_3d.sav'), verbose=False)
  File "/usr/lib/python2.7/dist-packages/scipy/io/idl.py", line 813, in readsav
    replace, new = _replace_heap(r['data'], heap)
  File "/usr/lib/python2.7/dist-packages/scipy/io/idl.py", line 599, in _replace_heap
    for ir, record in enumerate(variable):
  File "/tmp/local/lib/python2.7/site-packages/numpy/core/records.py", line 481, in __getitem__
    return obj.view(dtype=(self.dtype.type, obj.dtype.descr))
  File "/tmp/local/lib/python2.7/site-packages/numpy/core/records.py", line 540, in view
    return ndarray.view(self, dtype)
  File "/tmp/local/lib/python2.7/site-packages/numpy/core/records.py", line 457, in __setattr__
    raise exctype(value)
TypeError: Cannot change data-type for object array.

@ahaldane
Copy link
Member Author

ahaldane commented Mar 3, 2015

I'll take a closer look tonight.

The purpose of the line that raises the error is to make accessing a structured field of an array with np.record dtype, also be returned as an np.record. I think the idea is correct and should be kept.

I don't think the bug is in this code, but rather I think this exposes the problem seen in a number of other issues, that you can't view object arrays. I happen to be trying to solve this problem in #5548 (see part 1 there for a list of similar problems with recarrays).

I haven't had enough time to work on that PR lately, and the scope of changes is getting a little too ambitious. However, I think the portion of that PR that fixes object array views is good to go. I could submit that as a smaller PR on its own, which I think would solve this and related issues.

@rgommers
Copy link
Member

gh-5762 was opened for this and marked as a blocker for 1.10.

@ahaldane
Copy link
Member Author

I don't think the bug is in this PR, exactly. Rather, this PR uses some views of the original array, and this PR merely exposes the fact that currently views into object arrays do not work properly.

This (gh-5762) would be solved by my other PR #5548, which allows views into object arrays. I just ran the test case in gh-5762 with that PR and it works fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants