Skip to content

ENH: Allow dictproxy objects to be used in place of dicts when creating a dtype #5920

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 1, 2015

Conversation

embray
Copy link
Contributor

@embray embray commented May 27, 2015

arraydescr_new allows passing in a dict to represent a multi-field
dtype. However, the .fields attribute on such dtypes is returned
as a dictproxy object. Therefore it is convenient, for copying
existing multi-field dtypes, to be able to pass in a dictproxy object
as well.

I have another PR I will post shortly that motivated this, and that
demonstrates an example of the (admittedly uncommon) case
where this is useful.

embray added a commit to embray/numpy that referenced this pull request May 27, 2015
This is a followup to PR numpy#5505, which didn't go quite far
enough.  This fixes two issues in particular:

1) The record class also needs an updated `__getitem__` that works
analogously to its `__getattribute__` so that a nested record is
returned as a `record` object and not a plain `np.void`.  In other
words the old behavior is:

```python
>>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
...       dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]),
('baz', int)])
>>> rec[0].bar
(1, 1)
>>> type(rec[0].bar)
<class 'numpy.record'>
>>> type(rec[0]['bar'])
<type 'numpy.void'>
```

demonstrated inconsistency between `.bar` and `['bar']` on the record
object.  The new behavior is:

```python
>>> type(rec[0]['bar'])
<class 'numpy.record'>
```

2) The second issue is more subtle.  The fix to numpy#5505 used the
`obj.dtype.descr` attribute to create a new dtype of type `record`.
However, this does not recreate the correct type if the fields are
not aligned.  To demonstrate:

```python
>>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)})
>>> dt.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
>>> dt.descr
[('C', '|S5'), ('', '|V1'), ('D', '|S5')]
>>> new_dt = np.dtype((np.record, dt.descr))
>>> new_dt
dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')]))
>>> new_dt.fields
dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D':
(dtype('S5'), 6)})
```

Using the `fields` dict to construct the new type reconstructs the
correct type with the correct offsets:

```python
>>> new_dt2 = np.dtype((np.record, dt.fields))
>>> new_dt2.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
```

(Note: This is based on numpy#5920 for convenience, but I could decouple
the changes if that's preferable.)
@jaimefrio
Copy link
Member

I'm probably missing some corner case, but... Why don't we simply rename _convert_from_dict to -convert_from_mapping and handle all of those directly through the PyMapping_* family of functions? Is there any object we don't want to see here that would slip through a PyMapping_Check?

@embray
Copy link
Contributor Author

embray commented May 27, 2015

I considered that at first, but I don't know what the performance implications would be for using the generic methods over the dict-specific functions (which is more common), though maybe since it's just type construction it would all be in the margins anyways. I could try it and see.

However, PyMapping_Check is unreliable (see for instance https://bugs.python.org/issue5945) and would not be useful trying to accept generic mapping-like objects. In fact, there isn't a great way in pure C to check if an object satisfies the mapping ABC--if nothing else use of PyDict_Check should be used first for the most common case, since that's quick and reliable.

@charris
Copy link
Member

charris commented May 27, 2015

Commit messages need some work ;)

@embray embray force-pushed the descr-fields-dictproxy branch from d901ef3 to 4771f10 Compare May 27, 2015 20:35
@embray
Copy link
Contributor Author

embray commented May 27, 2015

Squashed to a single commit (since the followup commits were just for a couple cases I missed before I added the tests).

embray added a commit to embray/numpy that referenced this pull request May 27, 2015
This is a followup to PR numpy#5505, which didn't go quite far
enough.  This fixes two issues in particular:

1) The record class also needs an updated `__getitem__` that works
analogously to its `__getattribute__` so that a nested record is
returned as a `record` object and not a plain `np.void`.  In other
words the old behavior is:

```python
>>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
...       dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]),
('baz', int)])
>>> rec[0].bar
(1, 1)
>>> type(rec[0].bar)
<class 'numpy.record'>
>>> type(rec[0]['bar'])
<type 'numpy.void'>
```

demonstrated inconsistency between `.bar` and `['bar']` on the record
object.  The new behavior is:

```python
>>> type(rec[0]['bar'])
<class 'numpy.record'>
```

2) The second issue is more subtle.  The fix to numpy#5505 used the
`obj.dtype.descr` attribute to create a new dtype of type `record`.
However, this does not recreate the correct type if the fields are
not aligned.  To demonstrate:

```python
>>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)})
>>> dt.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
>>> dt.descr
[('C', '|S5'), ('', '|V1'), ('D', '|S5')]
>>> new_dt = np.dtype((np.record, dt.descr))
>>> new_dt
dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')]))
>>> new_dt.fields
dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D':
(dtype('S5'), 6)})
```

Using the `fields` dict to construct the new type reconstructs the
correct type with the correct offsets:

```python
>>> new_dt2 = np.dtype((np.record, dt.fields))
>>> new_dt2.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
```

(Note: This is based on numpy#5920 for convenience, but I could decouple
the changes if that's preferable.)
@jaimefrio
Copy link
Member

Even if we have to stick with the current type check, I still think using PyMapping_GetItemString is better.

It doesn't look like there should be much of a performance hit:

  • PyMapping_GetItemString calls PyObject_GetItem, which calls the mp_subscript function of the tp_as_mapping member of the type, which calls the function that fetches the return object.
  • PyDict_GetItemString calls PyDict_GetItem, which calls the function that fetches the return object.

And it would let you get rid of that ugly, almost an exact duplicate of the previous one, extra branch at the bottom of the function.

@embray
Copy link
Contributor Author

embray commented May 28, 2015

@jaimefrio By getting rid of that duplicate branch, do you mean changing all the if (PyDict_Check(obj)) to if (PyDict_Check(obj) || Py_TYPE(obj) == &PyDictProxy_Type) ?

(I suppose I could also include a macro for the latter check).

@jaimefrio
Copy link
Member

Yes, that's what I had in mind.

@charris
Copy link
Member

charris commented May 30, 2015

@jaimefrio This has been updated.
@embray Could you mention this enhancement in the 1.10 release notes? There is probably some dtype documentation that needs updating also.

@jaimefrio
Copy link
Member

I don't think it has...

@embray
Copy link
Contributor Author

embray commented Jun 1, 2015

I'll take a look--I still plan to update this with @jaimefrio's suggestion.

embray added a commit to embray/numpy that referenced this pull request Jun 1, 2015
…ne wrinkle this revealed is that PyMapping_GetItemString sets a KeyError exception on lookup failure, while PyDict_GetItemString does not--this bug could have surfaced anyways if using the PyMapping method alongside the PyDict method.
@embray
Copy link
Contributor Author

embray commented Jun 1, 2015

I've pushed an updated version of the patch using just PyMapping_GetItemString. One difference that appeared between it and PyDict_GetItemString is that the latter does not set an exception on lookup failure, while the former does (since it is considered equivalent to the syntax obj[key], and so goes through exception creation, etc.)

This could have been a problem in the original version of the patch too, since the error state wasn't been cleared for all failed dict lookups.

@embray
Copy link
Contributor Author

embray commented Jun 1, 2015

I looked in the dtype docs for anything that this could conflict with, but didn't see anything. It seems like a bit much of a rare corner-case to be worth calling out explicitly (users will rarely be creating their own dictproxy objects). I think in this case it's best that there's just less surprise when it does work.

@@ -1165,20 +1175,23 @@ _convert_from_dict(PyObject *obj, int align)
}
/* Set the itemsize */
new->elsize = itemsize;
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be easier to follow if this branch came first, as you did for 'aligned'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

@jaimefrio
Copy link
Member

Almost there! ;-)

@embray
Copy link
Contributor Author

embray commented Jun 1, 2015

I'll squash again before ready to merge, if desired.

@jaimefrio
Copy link
Member

Squashing would be nice, it is ready to go now.

a dtype

arraydescr_new allows passing in a dict to represent a multi-field
dtype.  However, the .fields attribute on such dtypes is returned
as a dictproxy object.  Therefore it is convenient, for copying
existing multi-field dtypes, to be able to pass in a dictproxy object
as well.
@embray embray force-pushed the descr-fields-dictproxy branch from 03a9e8b to 1acd14c Compare June 1, 2015 15:17
@embray
Copy link
Contributor Author

embray commented Jun 1, 2015

Rebased on master and squashed.

jaimefrio added a commit that referenced this pull request Jun 1, 2015
ENH: Allow dictproxy objects to be used in place of dicts when creating a dtype
@jaimefrio jaimefrio merged commit 9e7a0b2 into numpy:master Jun 1, 2015
@jaimefrio
Copy link
Member

Thanks Erik!

@embray embray deleted the descr-fields-dictproxy branch June 1, 2015 16:49
@embray
Copy link
Contributor Author

embray commented Jun 1, 2015

Great, thanks for reviewing and the quick merge.

embray added a commit to embray/numpy that referenced this pull request Jun 1, 2015
This is a followup to PR numpy#5505, which didn't go quite far
enough.  This fixes two issues in particular:

1) The record class also needs an updated `__getitem__` that works
analogously to its `__getattribute__` so that a nested record is
returned as a `record` object and not a plain `np.void`.  In other
words the old behavior is:

```python
>>> rec = np.rec.array([('abc ', (1,1), 1), ('abc', (2,3), 1)],
...       dtype=[('foo', 'S4'), ('bar', [('A', int), ('B', int)]),
('baz', int)])
>>> rec[0].bar
(1, 1)
>>> type(rec[0].bar)
<class 'numpy.record'>
>>> type(rec[0]['bar'])
<type 'numpy.void'>
```

demonstrated inconsistency between `.bar` and `['bar']` on the record
object.  The new behavior is:

```python
>>> type(rec[0]['bar'])
<class 'numpy.record'>
```

2) The second issue is more subtle.  The fix to numpy#5505 used the
`obj.dtype.descr` attribute to create a new dtype of type `record`.
However, this does not recreate the correct type if the fields are
not aligned.  To demonstrate:

```python
>>> dt = np.dtype({'C': ('S5', 0), 'D': ('S5', 6)})
>>> dt.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
>>> dt.descr
[('C', '|S5'), ('', '|V1'), ('D', '|S5')]
>>> new_dt = np.dtype((np.record, dt.descr))
>>> new_dt
dtype((numpy.record, [('C', 'S5'), ('f1', 'V1'), ('D', 'S5')]))
>>> new_dt.fields
dict_proxy({'f1': (dtype('V1'), 5), 'C': (dtype('S5'), 0), 'D':
(dtype('S5'), 6)})
```

Using the `fields` dict to construct the new type reconstructs the
correct type with the correct offsets:

```python
>>> new_dt2 = np.dtype((np.record, dt.fields))
>>> new_dt2.fields
dict_proxy({'C': (dtype('S5'), 0), 'D': (dtype('S5'), 6)})
```

(Note: This is based on numpy#5920 for convenience, but I could decouple
the changes if that's preferable.)
ahaldane added a commit to ahaldane/numpy that referenced this pull request Nov 6, 2015
Fixes a memleak introduced in numpy#5920, where PyDict_GetItemString
was replaced by PyMapping_GetItemString which returns a new ref.

Fixes numpy#6636
charris pushed a commit to charris/numpy that referenced this pull request Nov 6, 2015
Fixes a memleak introduced in numpy#5920, where PyDict_GetItemString
was replaced by PyMapping_GetItemString which returns a new ref.

Fixes numpy#6636
jaimefrio pushed a commit to jaimefrio/numpy that referenced this pull request Mar 22, 2016
Fixes a memleak introduced in numpy#5920, where PyDict_GetItemString
was replaced by PyMapping_GetItemString which returns a new ref.

Fixes numpy#6636
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants