recarray.__getitem__ with field list gives "TypeError: Cannot change data-type for object array." when dtype contains object #3256

Closed
benmoran opened this issue Apr 17, 2013 · 11 comments

Comments

@benmoran

This code works for me on numpy 1.6.2 but on 1.7.1 I get an exception, because in the new version numpy.core._internal._index_fields calls ary.view() and this doesn't work on object arrays.

import numpy as np
ra = np.recarray((2,), dtype=[('x', object), ('y', float), ('z', int)]) 
ra[['x','y']]

Example on 1.7.1:

In [13]: ra[['x','y']]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

.../python-271-20110303-2/lib/python2.7/site-packages/numpy/core/records.pyc in __getitem__(self, indx)
    455 
    456     def __getitem__(self, indx):
--> 457         obj = ndarray.__getitem__(self, indx)
    458         if (isinstance(obj, ndarray) and obj.dtype.isbuiltin):
    459             return obj.view(ndarray)

.../lib/python2.7/site-packages/numpy/core/_internal.pyc in _index_fields(ary, fields)
    294 
    295     view_dtype = {'names':names, 'formats':formats, 'offsets':offsets, 'itemsize':dt.itemsize}
--> 296     view = ary.view(dtype=view_dtype)
    297 
    298     # Return a copy for now until behavior is fully deprecated


.../lib/python2.7/site-packages/numpy/core/records.pyc in view(self, dtype, type)
    495             if dtype.fields is None:
    496                 return self.__array__().view(dtype)
--> 497             return ndarray.view(self, dtype)
    498         else:
    499             return ndarray.view(self, dtype, type)

.../lib/python2.7/site-packages/numpy/core/records.pyc in __setattr__(self, attr, val)
    437             if attr not in fielddict:
    438                 exctype, value = sys.exc_info()[:2]
--> 439                 raise exctype, value
    440         else:
    441             fielddict = ndarray.__getattribute__(self, 'dtype').fields or {}

TypeError: Cannot change data-type for object array.

@ndawe

ndawe commented Sep 5, 2013

Input from the devs would be great here. I see the same problem in 1.7.1.

@risa2000

I wonder if it was fixed since I am seeing the same problem on NumPy 1.8.0:

Python 2.7.6 (default, Nov 10 2013, 19:24:18) [MSC v.1500 32 bit (Intel)]
Type "copyright", "credits" or "license" for more information.

IPython 1.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import numpy as np

In [2]: ra = np.recarray((2,), dtype=[('x', object), ('y', float), ('z', int)])

In [3]: ra[['x','y']]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-5fa8448ed344> in <module>()
----> 1 ra[['x','y']]

C:\Python27\lib\site-packages\numpy\core\records.pyc in __getitem__(self, indx)
    457
    458     def __getitem__(self, indx):
--> 459         obj = ndarray.__getitem__(self, indx)
    460         if (isinstance(obj, ndarray) and obj.dtype.isbuiltin):
    461             return obj.view(ndarray)

C:\Python27\lib\site-packages\numpy\core\_internal.pyc in _index_fields(ary, fields)
    299
    300     view_dtype = {'names':names, 'formats':formats, 'offsets':offsets, 'itemsize':dt.itemsize}
--> 301     view = ary.view(dtype=view_dtype)
    302
    303     # Return a copy for now until behavior is fully deprecated

C:\Python27\lib\site-packages\numpy\core\records.pyc in view(self, dtype, type)
    497             if dtype.fields is None:
    498                 return self.__array__().view(dtype)
--> 499             return ndarray.view(self, dtype)
    500         else:
    501             return ndarray.view(self, dtype, type)

C:\Python27\lib\site-packages\numpy\core\records.pyc in __setattr__(self, attr, val)
    439             if attr not in fielddict:
    440                 exctype, value = sys.exc_info()[:2]
--> 441                 raise exctype(value)
    442         else:
    443             fielddict = ndarray.__getattribute__(self, 'dtype').fields or {}

TypeError: Cannot change data-type for object array.

In [4]:

@seberg
Member

seberg commented Jan 21, 2014

No, nobody worked on that. You would need to add some extra logic to PyArray_View to relax the constraints. Also, I think more non-contiguous array views could be allowed.

Edit: That said, I am not sure it is easy to actually check the legality generally, but allowing it at least for this kind of index usage should not be very hard.

@yamins81
Copy link

Has this been addressed somewhere?

@ahaldane
Member

I've taken a look. First, this is not a bug in recarray but in ndarray, since it happens with plain ndarrays:

>>> a = np.array([(1,2)], dtype=[('x', object), ('y', float)])
>>> a[['x','y']]
TypeError: Cannot change data-type for object array.

What is going on?

As pointed out above, numpy.core._internal._index_fields is used to retrieve multiple fields from ndarrays at once, which it does using a view. This fails because you cannot take views of ndarrays with Python Objects in them.
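To make the mechanism concrete, here is a rough Python sketch of a view-based multi-field selection. The dict layout mirrors the `view_dtype` line visible in the tracebacks above; the function name `index_fields_view` is made up for illustration, and the example array deliberately has no object fields so that the view is legal regardless of version.

```python
import numpy as np

def index_fields_view(ary, fields):
    """Hypothetical helper: select several fields through a view, no copy."""
    dt = ary.dtype
    # Same names/formats/offsets/itemsize layout as in the traceback:
    # the selected fields keep their original byte offsets, and the
    # itemsize is unchanged so the unselected bytes are simply skipped.
    view_dtype = np.dtype({
        'names': list(fields),
        'formats': [dt.fields[name][0] for name in fields],
        'offsets': [dt.fields[name][1] for name in fields],
        'itemsize': dt.itemsize,
    })
    return ary.view(view_dtype)

a = np.array([(1, 2.0, 3)], dtype=[('x', 'i4'), ('y', 'f8'), ('z', 'i4')])
v = index_fields_view(a, ['x', 'z'])

# Because v is a view, writing through it changes the original array.
v['z'][0] = 9
print(v.dtype.names, a['z'][0])
```

It is this `ary.view(...)` call that is rejected when any field of the original dtype is a Python object.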

That limitation makes sense to me because ndarray.view allows the user to reinterpret the array memory as whatever datatype they want, and in the case of a Python object (which is stored as a pointer to other parts of memory) it would allow the user to overwrite arbitrary memory.

But indexing with a single field works (e.g., a['x']). How come? Because the function for getting a single field, PyArray_GetField, is written in C and "manually" gets a view using C functions (specifically, PyArray_NewFromDescr) which are not limited in this way. There is no memory-access risk here because PyArray_GetField only takes a "limited" view which does not change the datatype annotation of the underlying memory: Objects stay as Objects. In contrast, ndarray.view is more powerful as it also allows you to change the datatype, and it is therefore banned from doing anything with Objects.
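The contrast can be seen from Python. This is only an illustrative sketch: the variable names are arbitrary, and the exact wording of the error message may differ between NumPy versions, so the code checks only the exception type.

```python
import numpy as np

a = np.array([(1, 2.0)], dtype=[('x', object), ('y', float)])

# Single-field indexing returns a view whose element type is still
# "object", so writing through it updates the original array.
x = a['x']
x[0] = 'changed'
print(a['x'][0])

# Reinterpreting object memory as integers would expose raw pointers,
# so NumPy refuses the request.
objs = np.array([None, None], dtype=object)
try:
    objs.view(np.intp)
    refused = False
except TypeError:
    refused = True
print(refused)
```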

How to fix this?

One way is to rewrite numpy.core._internal._index_fields in C so that it can also "manually" take a view.

A more ambitious idea is that there should be a less powerful version of ndarray.view which is only allowed to 'mask' the datatype without changing the type annotations of the memory, but is then free to work with Object datatypes. Actually, now that I've read this bug report, I realize there are two lines in records.py which fail for Python Object datatypes for a similar (but slightly different) reason, and which could also be fixed with such a function.

Finally, a bad idea I include for completeness: It looks like numpy.core._internal._index_fields currently makes a copy of the view anyway. It could be easily rewritten to copy the data without taking a view in the first place. However, judging from the comments in that function and from the FutureWarning I get messing around with it, the fact that it makes a copy is not desired.
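For anyone stuck on an affected release, the copy-based idea can at least be applied from user code. A minimal sketch, assuming a hypothetical helper name `select_fields` that is not part of NumPy:

```python
import numpy as np

def select_fields(arr, names):
    """Hypothetical helper: copy the given fields into a new structured array.

    No view of the original memory is taken, so object fields are fine.
    """
    sub_dtype = np.dtype([(name, arr.dtype.fields[name][0]) for name in names])
    out = np.empty(arr.shape, dtype=sub_dtype)
    for name in names:
        # Single-field access works even when the dtype contains objects.
        out[name] = arr[name]
    return out

ra = np.recarray((2,), dtype=[('x', object), ('y', float), ('z', int)])
ra['x'] = ['a', None]
ra['y'] = [1.5, 2.5]
ra['z'] = [1, 2]

sub = select_fields(ra, ['x', 'y'])
print(sub.dtype.names)
```

The result is always a copy, so writes to `sub` do not reach `ra`. That is the drawback noted above for doing this inside NumPy itself.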

@ahaldane
Member

Hmm. Further investigation into datatype views uncovers a new possible bug: ndarray.getfield allows the user to reinterpret arbitrary data as Python Objects.

>>> a = np.array([1,2,3], dtype='i8')
>>> a.getfield(np.dtype('O'), 0)
zsh: segmentation fault (core dumped)  python2

So what I said above is wrong - PyArray_GetField is not safe.

I'm thinking a solution to both bugs is to fundamentally change ndarray.getfield (and PyArray_GetField). First, these should not be able to change the datatype. Instead of taking arguments getfield(dtype, offset), maybe it should be getfield(fieldname, subclass=None). This would allow you to extract a field without changing datatype, but also allow you to subclass the datatype (to solve my problems in records.py). Second, it should probably be getfields, not getfield, i.e., it can handle multiple fields, not just one.

But this probably breaks the API pretty badly. I see PyArray_GetField listed in numpy/core/code_generators/numpy_api.py, which I assume lists things that shouldn't be changed.

@jaimefrio
Member

Maybe I am missing something, but it seems to me that we are treating views of structured arrays, and arrays with objects, with more respect than they deserve. Given any dtype, no matter how deeply nested, one could recursively search it and extract the offsets of all 'O' entries. This is the relevant information needed to not mess things up when taking views; e.g., there should be no reason why one shouldn't be allowed to take a view of a dtype like [('', '<u4'), ('', 'O'), ('', '<u4')] as [('', '<u4'), ('', 'O'), ('', '>f8'), ('', 'O'), ('', '>u2'), ('', '<u2')].

If you actually create those:

dta = np.dtype([('', '<u4'), ('', 'O'), ('', '<u4')])
dtb = np.dtype([('', '<u4'), ('', 'O'), ('', '>f8'), ('', 'O'), ('', '>u2'), ('', '<u2')])

>>> dta.itemsize
12
>>> dta.fields
dict_proxy({'f0': (dtype('uint32'), 0), 'f1': (dtype('O'), 4), 'f2': (dtype('uint32'), 8)})

>>> dtb.itemsize
24
>>> dtb.fields
dict_proxy({'f0': (dtype('uint32'), 0), 'f1': (dtype('O'), 4), 'f2': (dtype('>f8'), 8),
             'f3': (dtype('O'), 16), 'f4': (dtype('>u2'), 20), 'f5': (dtype('uint16'), 22)})

It should be easy to see that they indeed are compatible. A general algorithm to figure this out will be a little more elaborate, but the implementation should be relatively straightforward.
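The idea of collecting the 'O' offsets recursively can be prototyped in a few lines of Python. This is a simplified sketch, not the algorithm that was eventually implemented in NumPy: the function names are invented here, overlapping fields are not considered, and two dtypes are treated as compatible only if their object pointers land on exactly the same byte offsets over a common span (the least common multiple of the two item sizes).

```python
import numpy as np
from math import gcd

def object_offsets(dtype, base=0):
    """Byte offsets of every object pointer inside one item of `dtype`."""
    dtype = np.dtype(dtype)
    if dtype.subdtype is not None:            # subarray such as ('O', (3,))
        sub, shape = dtype.subdtype
        offsets = []
        for i in range(int(np.prod(shape))):
            offsets.extend(object_offsets(sub, base + i * sub.itemsize))
        return offsets
    if dtype.fields is not None:              # structured dtype: recurse
        offsets = []
        for name in dtype.names:
            field_dtype, field_offset = dtype.fields[name][:2]
            offsets.extend(object_offsets(field_dtype, base + field_offset))
        return offsets
    return [base] if dtype.kind == 'O' else []

def tiled_object_offsets(dtype, span):
    """Object offsets of consecutive items of `dtype` covering `span` bytes."""
    dtype = np.dtype(dtype)
    per_item = object_offsets(dtype)
    return {rep * dtype.itemsize + off
            for rep in range(span // dtype.itemsize)
            for off in per_item}

def object_layouts_match(old, new):
    """True if both dtypes place object pointers at the same byte offsets."""
    old, new = np.dtype(old), np.dtype(new)
    span = old.itemsize * new.itemsize // gcd(old.itemsize, new.itemsize)
    return tiled_object_offsets(old, span) == tiled_object_offsets(new, span)

dta = np.dtype([('', '<u4'), ('', 'O'), ('', '<u4')])
dtb = np.dtype([('', '<u4'), ('', 'O'), ('', '>f8'), ('', 'O'), ('', '>u2'), ('', '<u2')])
print(object_layouts_match(dta, dtb))
print(object_layouts_match(np.dtype([('a', 'O')]), np.dtype('i8')))
```

Tiling over a common span is what lets the check handle the case above, where two items of the first dtype occupy the same number of bytes as one item of the second.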

If PyArray_View (or, more precisely, array_descr_set) were to implement such an algorithm, then a lot of these type changes could be done fearlessly. See #5508 for my current attempt at relaxing the requirements.

Does this make any sense, or am I completely wrong?

@ahaldane
Member

Hi @jaimefrio,

Since I wrote my last comments I came to the same conclusion. I've half written up the code to do this (non working branch here, most recent commit here).

It changes the same region of code you're working on in #5508 (which looks good btw), so hopefully we won't step on each other's feet.

@jaimefrio
Member

Feel free to hack away with total disregard for #5508 if that helps you arrive at a better solution: I have been thinking about this a little bit, and I am pretty sure structured and unstructured dtypes can be handled with a common code base.

I was actually planning on prototyping the algorithm to check dtype compatibility in Python, and sharing it with the list to see what some of the more knowledgeable folks think of it. Will probably still do it, if only to see how it compares with what you have in mind.

@ahaldane
Member

Sounds good to me. Actually, in that case I'll leave this for a bit and go back to finish what I was working on in records.py, which I still need to do!

FWIW, here's the latest algorithm I wrote up which I think is compatible with your PR. It builds and runs, and seems to work in the one or two cases I could test.

@jaimefrio
Member

I have reworked your two functions a bit and added a few tests here, and I think it now takes care of every imaginable corner case. Please take a look if you have some time.

I am also writing an e-mail for the numpy list, in case someone wants to chime in.

ahaldane added a commit to ahaldane/numpy that referenced this issue Jun 4, 2015
Previously views of structured arrays containing objects were completely
disabled.  This commit adds more lenient check for whether an object-array
view is allowed, and adds similar checks to getfield/setfield

Fixes numpy#2346. Fixes numpy#3256. Fixes numpy#2599. Fixes numpy#3253. Fixes numpy#3286.
Fixes numpy#5762.
ahaldane added a commit to ahaldane/numpy that referenced this issue Jun 5, 2015
Previously views of structured arrays containing objects were completely
disabled.  This commit adds more lenient check for whether an object-array
view is allowed, and adds similar checks to getfield/setfield

Fixes numpy#2346. Fixes numpy#3256. Fixes numpy#2599. Fixes numpy#3253. Fixes numpy#3286.
Fixes numpy#5762.
zar1 added a commit to dssg/diogenes that referenced this issue Oct 22, 2015
jasonrwang pushed a commit to jasonrwang/EMAworkbench that referenced this issue Jan 29, 2021