ENH: Allow dictproxy objects to be used in place of dicts when creating a dtype #5920

embray · 2015-05-27T14:17:46Z

arraydescr_new allows passing in a dict to represent a multi-field
dtype. However, the .fields attribute on such dtypes is returned
as a dictproxy object. Therefore it is convenient, for copying
existing multi-field dtypes, to be able to pass in a dictproxy object
as well.

I have another PR I will post shortly that motivated this, and that
demonstrates an example of the (admittedly uncommon) case
where this is useful.

This is a followup to PR numpy#5505, which didn't go quite far enough. This fixes two issues in particular:

1) The record class also needs an updated `__getitem__` that works analogously to its `__getattribute__` so that a nested record is returned as a `record` object and not a plain `np.void`. In other words the old behavior is:

```python
>>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
...       dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]), ('baz', int)])
>>> rec[0].bar
(1, 1)
>>> type(rec[0].bar)
<class 'numpy.record'>
>>> type(rec[0]['bar'])
<type 'numpy.void'>
```

This demonstrates the inconsistency between `.bar` and `['bar']` on the record object. The new behavior is:

```python
>>> type(rec[0]['bar'])
<class 'numpy.record'>
```

2) The second issue is more subtle. The fix to numpy#5505 used the `obj.dtype.descr` attribute to create a new dtype of type `record`. However, this does not recreate the correct type if the fields are not aligned. To demonstrate:

```python
>>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)})
>>> dt.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
>>> dt.descr
[('C', '|S5'), ('', '|V1'), ('D', '|S5')]
>>> new_dt = np.dtype((np.record, dt.descr))
>>> new_dt
dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')]))
>>> new_dt.fields
dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
```

Using the `fields` dict to construct the new type reconstructs the correct type with the correct offsets:

```python
>>> new_dt2 = np.dtype((np.record, dt.fields))
>>> new_dt2.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
```

(Note: This is based on numpy#5920 for convenience, but I could decouple the changes if that's preferable.)

jaimefrio · 2015-05-27T17:13:32Z

I'm probably missing some corner case, but... Why don't we simply rename _convert_from_dict to _convert_from_mapping and handle all of those directly through the PyMapping_* family of functions? Is there any object we don't want to see here that would slip through a PyMapping_Check?

embray · 2015-05-27T18:30:12Z

I considered that at first, but I don't know what the performance implications would be for using the generic methods over the dict-specific functions (which is more common), though maybe since it's just type construction it would all be in the margins anyways. I could try it and see.

However, PyMapping_Check is unreliable (see for instance https://bugs.python.org/issue5945) and would not be useful for accepting generic mapping-like objects. In fact, there isn't a great way in pure C to check whether an object satisfies the mapping ABC. If nothing else, PyDict_Check should be tried first for the most common case, since that's quick and reliable.
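To make the reliability concern concrete, here is a minimal sketch that calls the real `PyMapping_Check` through `ctypes` and compares it with the `collections.abc.Mapping` ABC. It assumes CPython 3 (where `ctypes.pythonapi` is available); the `candidates` and `results` names are purely illustrative and are not part of the patch.

```python
import ctypes
import types
from collections.abc import Mapping

# CPython-only: bind the C-API function directly.
mapping_check = ctypes.pythonapi.PyMapping_Check
mapping_check.argtypes = [ctypes.py_object]
mapping_check.restype = ctypes.c_int

# Illustrative objects: two real mappings, two sequences, one scalar.
candidates = {
    "dict": {"a": 1},
    "mappingproxy": types.MappingProxyType({"a": 1}),
    "list": [1, 2, 3],
    "str": "abc",
    "int": 5,
}

# For each object record (PyMapping_Check result, isinstance(obj, Mapping)).
results = {
    name: (bool(mapping_check(obj)), isinstance(obj, Mapping))
    for name, obj in candidates.items()
}

for name, (c_says, abc_says) in results.items():
    print(f"{name:13s} PyMapping_Check={c_says!s:5s} Mapping ABC={abc_says}")
```

On CPython 3 the list and the string pass `PyMapping_Check` even though neither is a `Mapping`, because the check only looks for a subscript slot on the type.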

charris · 2015-05-27T20:28:52Z

Commit messages need some work ;)

embray · 2015-05-27T20:35:56Z

Squashed to a single commit (since the followup commits were just for a couple cases I missed before I added the tests).

jaimefrio · 2015-05-28T01:59:19Z

Even if we have to stick with the current type check, I still think using PyMapping_GetItemString is better.

It doesn't look like there should be much of a performance hit:

PyMapping_GetItemString calls PyObject_GetItem, which calls the mp_subscript function of the tp_as_mapping member of the type, which calls the function that fetches the return object.
PyDict_GetItemString calls PyDict_GetItem, which calls the function that fetches the return object.

And it would let you get rid of that ugly extra branch at the bottom of the function, which is almost an exact duplicate of the previous one.

embray · 2015-05-28T15:29:04Z

@jaimefrio By getting rid of that duplicate branch, do you mean changing all the `if (PyDict_Check(obj))` to `if (PyDict_Check(obj) || Py_TYPE(obj) == &PyDictProxy_Type)`?

(I suppose I could also include a macro for the latter check).
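For readers more at home in Python than C, the combined check can be sketched roughly as follows. This is only an analogy, assuming Python 3, where the dictproxy type is exposed as `types.MappingProxyType`; the helper name `is_dict_or_dictproxy` is hypothetical and does not appear in the patch.

```python
import collections
import types


def is_dict_or_dictproxy(obj):
    """Rough Python analog of
    PyDict_Check(obj) || Py_TYPE(obj) == &PyDictProxy_Type.
    """
    # PyDict_Check accepts dict and its subclasses, hence isinstance().
    # The proxy test is an exact type comparison, hence `type(obj) is ...`.
    return isinstance(obj, dict) or type(obj) is types.MappingProxyType


plain = {"C": ("S5", 0)}
proxy = types.MappingProxyType(plain)              # read-only view of `plain`
ordered = collections.OrderedDict(plain)           # a dict subclass
user_dict = collections.UserDict(plain)            # mapping, but not a dict

print(is_dict_or_dictproxy(plain))      # True
print(is_dict_or_dictproxy(proxy))      # True
print(is_dict_or_dictproxy(ordered))    # True
print(is_dict_or_dictproxy(user_dict))  # False
```

The last case shows the limit of this approach: other mapping-like objects are still rejected, which matches the narrow scope of the enhancement.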

jaimefrio · 2015-05-28T15:47:00Z

Yes, that's what I had in mind.

charris · 2015-05-30T03:38:39Z

@jaimefrio This has been updated.
@embray Could you mention this enhancement in the 1.10 release notes? There is probably some dtype documentation that needs updating also.

jaimefrio · 2015-05-30T03:56:03Z

I don't think it has...

embray · 2015-06-01T11:44:27Z

I'll take a look--I still plan to update this with @jaimefrio's suggestion.

…ne wrinkle this revealed is that PyMapping_GetItemString sets a KeyError exception on lookup failure, while PyDict_GetItemString does not--this bug could have surfaced anyways if using the PyMapping method alongside the PyDict method.

embray · 2015-06-01T13:09:32Z

I've pushed an updated version of the patch using just PyMapping_GetItemString. One difference that appeared between it and PyDict_GetItemString is that the latter does not set an exception on lookup failure, while the former does (since it is considered equivalent to the syntax obj[key], and so goes through exception creation, etc.)

This could have been a problem in the original version of the patch too, since the error state wasn't being cleared for all failed dict lookups.
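The difference in error behavior can be observed from Python by calling both C-API functions through `ctypes`. This is a sketch assuming CPython, not code from the patch; `ctypes.pythonapi` checks the interpreter's error flag after each call and re-raises any pending exception, which is what makes the difference visible. The `spec` dict is a stand-in for a dtype specification that lacks the looked-up key.

```python
import ctypes

api = ctypes.pythonapi  # CPython only

# Treat the returned PyObject* as a raw pointer so ctypes does not
# try to interpret or reference-count the result.
api.PyDict_GetItemString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyDict_GetItemString.restype = ctypes.c_void_p
api.PyMapping_GetItemString.argtypes = [ctypes.py_object, ctypes.c_char_p]
api.PyMapping_GetItemString.restype = ctypes.c_void_p

spec = {"names": ["a"]}  # illustrative: no "formats" key present

# Dict-specific lookup: returns NULL (seen here as None), no exception set.
dict_result = api.PyDict_GetItemString(spec, b"formats")

# Generic lookup: behaves like spec["formats"], so a KeyError is set.
try:
    api.PyMapping_GetItemString(spec, b"formats")
    mapping_raised = False
except KeyError:
    mapping_raised = True

print("PyDict_GetItemString result:", dict_result)
print("PyMapping_GetItemString raised KeyError:", mapping_raised)
```

In C the caller has to clear that pending `KeyError` itself (for example with `PyErr_Clear`) before treating the missing key as "not provided".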

embray · 2015-06-01T13:10:45Z

I looked in the dtype docs for anything that this could conflict with, but didn't see anything. It seems like too rare a corner case to be worth calling out explicitly (users will rarely be creating their own dictproxy objects). I think in this case it's best that there's just less surprise when it does work.

jaimefrio · 2015-06-01T13:24:32Z

numpy/core/src/multiarray/descriptor.c

@@ -1165,20 +1175,23 @@ _convert_from_dict(PyObject *obj, int align)
        }
        /* Set the itemsize */
        new->elsize = itemsize;
+    } else {


This would be easier to follow if this branch came first, as you did for 'aligned'.

jaimefrio · 2015-06-01T13:32:13Z

Almost there! ;-)

embray · 2015-06-01T13:56:36Z

I'll squash again once this is ready to merge, if desired.

jaimefrio · 2015-06-01T14:31:35Z

Squashing would be nice; it is ready to go now.

embray · 2015-06-01T15:17:31Z

Rebased on master and squashed.

jaimefrio · 2015-06-01T16:36:42Z

Thanks Erik!

embray · 2015-06-01T16:49:53Z

Great, thanks for reviewing and the quick merge.

embray mentioned this pull request May 27, 2015

BUG: Further fixes to record and recarray getitem/getattr #5921

Merged

charris added 01 - Enhancement component: numpy._core labels May 27, 2015

embray force-pushed the descr-fields-dictproxy branch from d901ef3 to 4771f10 Compare May 27, 2015 20:35

jaimefrio reviewed Jun 1, 2015

embray force-pushed the descr-fields-dictproxy branch from 03a9e8b to 1acd14c Compare June 1, 2015 15:17

jaimefrio added a commit that referenced this pull request Jun 1, 2015

Merge pull request #5920 from embray/descr-fields-dictproxy

9e7a0b2

ENH: Allow dictproxy objects to be used in place of dicts when creating a dtype

jaimefrio merged commit 9e7a0b2 into numpy:master Jun 1, 2015

embray deleted the descr-fields-dictproxy branch June 1, 2015 16:49

ahaldane mentioned this pull request Nov 6, 2015

memory leak in nested dtypes in numpy.recarray #6636

Closed

ahaldane mentioned this pull request Nov 6, 2015

BUG: Fix memleak in _convert_from_dict #6642

Merged

ahaldane added a commit to ahaldane/numpy that referenced this pull request Nov 6, 2015

BUG: Fix memleak in _convert_from_dict

81e50a3

Fixes a memleak introduced in numpy#5920, where PyDict_GetItemString was replaced by PyMapping_GetItemString which returns a new ref. Fixes numpy#6636
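The reference-counting difference behind this leak can be reproduced from Python with `ctypes`. The following is a sketch assuming CPython; it is not the code from numpy#6642, and the `spec` and `value` names are illustrative. The return type is declared as a raw pointer so that `ctypes` leaves the reference count exactly as the C function left it.

```python
import ctypes
import sys

api = ctypes.pythonapi  # CPython only

for fn in (api.PyDict_GetItemString, api.PyMapping_GetItemString):
    fn.argtypes = [ctypes.py_object, ctypes.c_char_p]
    fn.restype = ctypes.c_void_p  # raw pointer: no automatic refcount handling
api.Py_DecRef.argtypes = [ctypes.c_void_p]
api.Py_DecRef.restype = None

value = object()
spec = {"names": value}

base = sys.getrefcount(value)

# Borrowed reference: the refcount of `value` is unchanged.
api.PyDict_GetItemString(spec, b"names")
after_dict = sys.getrefcount(value)

# New reference: the refcount goes up by one and stays there
# until the caller releases it.
ptr = api.PyMapping_GetItemString(spec, b"names")
after_mapping = sys.getrefcount(value)

# Releasing the new reference is the step a caller must not forget.
api.Py_DecRef(ptr)
after_release = sys.getrefcount(value)

print("borrowed lookup delta:", after_dict - base)
print("new-reference lookup delta:", after_mapping - base)
print("after release delta:", after_release - base)
```

Swapping one function for the other without adding the matching decref leaves one extra reference per lookup, which is the kind of leak described in the commit message above.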

charris pushed a commit to charris/numpy that referenced this pull request Nov 6, 2015

BUG: Fix memleak in _convert_from_dict

2cc1975

jaimefrio pushed a commit to jaimefrio/numpy that referenced this pull request Mar 22, 2016

BUG: Fix memleak in _convert_from_dict

f0442df
