Skip to content

MAINT: Move dtype string functions to python #10602

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 11, 2018

Conversation

eric-wieser
Copy link
Member

@eric-wieser eric-wieser commented Feb 16, 2018

Doing proper error checking of string concatenation in python super tedious.
As a results, we're not doing it, making us prone to exceptions turning into segfaults.
This moves all of the __str__ and __repr__ functions, where speed is irrelevant, into np.core._dtype.


Minimal refactors made during the conversion from C - mainly just using str.join instead of manual loops. I'm sure more could be done here, but now that's it's python it should be easier for new contributors to modify.

CircleCI failures are because this is working off an earlier branch point

""" Normalize byteorder to '<' or '>' """
# hack to obtain the native and swapped byte order characters
swapped = np.dtype(int).newbyteorder('s')
native = swapped.newbyteorder('s')
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better way to get hold of these?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is np.little_endianor sys.byteorder, but then you have to decide '<' or '>' yourself.

from a list of the field names and dtypes with no additional
dtype parameters.

Duplicates the C `is_dtype_struct_simple_unaligned_layout` functio.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should add this as dtype.ispacked in another PR

}
PyUString_ConcatAndDel(&ret, PyObject_Repr(title));
if (i != names_size - 1) {
PyUString_ConcatAndDel(&ret, PyUString_FromString(","));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of thing segfaults if PyUString_FromString(...) returns NULL (ie, we run out of memory). This type of thing what motivated the switch to python - getting this right everywhere is too hard and too verbose in C, and for no gain.

@eric-wieser
Copy link
Member Author

Any thoughts on this?

@mattip
Copy link
Member

mattip commented Aug 19, 2018

Could you rebase this after the coverage integration? I would like to see that we are checking all the new code in tests.

@eric-wieser eric-wieser force-pushed the dtype-__str__-__repr__-in-python branch from 73dece0 to 73e328c Compare August 19, 2018 18:26
@eric-wieser
Copy link
Member Author

Good idea - done

if dtype.fields is not None:
return _struct_str(dtype, include_align=True)
elif dtype.subdtype:
return _subarray_str(dtype)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line not hit in tests, see coverage

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Test added

if byteorder == '=':
return native.byteorder
if byteorder == 's':
return swapped.byteorder
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line not hit in tests

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it's possible to hit - but this same test existed in the old C code.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a TODO

@mattip
Copy link
Member

mattip commented Aug 19, 2018

Coverage seems pretty good, only two blocks (other than exceptions) not covered

Doing proper error checking of string concatenation in python super tedious.
As a results, we're not doing it, making us prone to exceptions turning into segfaults.
This moves all of the __str__ and __repr__ functions, where speed is irrelevant, into np.core._dtype.
@eric-wieser eric-wieser force-pushed the dtype-__str__-__repr__-in-python branch from 73e328c to fa616a8 Compare August 20, 2018 04:40
@eric-wieser
Copy link
Member Author

@mattip: Coverage improved, I think

@charris
Copy link
Member

charris commented Sep 11, 2018

I agree the this sort of thing should be coded in Python, I'd like to do more like this. Given that we have made a whole new module, I suppose we could use Cython at some point. @mattip Does PyPy speed up this sort of thing? @eric-wieser I suppose if this has a time effect in practice that it will show up in some of the benchmarks. I don't know that timing dtype creation by itself is useful.

@charris charris merged commit 371cc4d into numpy:master Sep 11, 2018
@charris
Copy link
Member

charris commented Sep 11, 2018

Thanks Eric.

@mattip
Copy link
Member

mattip commented Sep 11, 2018

PyPy will JIT after ~1000 calls, if someone is calling repr(dtype) that many times I thing something is probably wrong with their code.

@ahaldane
Copy link
Member

This seems to have triggered a circular import problem when using the dtype-transer debug mode. (set NPY_DT_DBG_TRACING to 1 at the top of dtype_transfer.c).

The problem seems to be that debug tracing tries to print a dtype before the dtype object has been created, I'm still trying to see what happened. It involves getlimits.py which has been the source of circular import issues in the past.

>>> import numpy as np
Calculating dtype transfer from  to 
Traceback (most recent call last):
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/_dtype.py", line 23, in __repr__
    arg_str = _construction_repr(dtype, include_align=False)
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/_dtype.py", line 77, in _construction_repr
    return _scalar_str(dtype, short=short)
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/_dtype.py", line 81, in _scalar_str
    byteorder = _byte_order_str(dtype)
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/_dtype.py", line 151, in _byte_order_str
    swapped = np.dtype(int).newbyteorder('s')
AttributeError: module 'numpy' has no attribute 'dtype'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 200, in _lock_unlock_module
  File "<frozen importlib._bootstrap>", line 163, in _get_module_lock
SystemError: <built-in function acquire_lock> returned a result with an error set

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/__init__.py", line 142, in <module>
    from . import core
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/__init__.py", line 72, in <module>
    from . import getlimits
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/getlimits.py", line 125, in <module>
    tiny=_f16(2 ** -14))
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/getlimits.py", line 75, in __init__
    self.epsilon = self.eps = float_to_float(kwargs.pop('eps'))
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/getlimits.py", line 70, in <lambda>
    float_to_float = lambda v : _fr1(float_conv(v))
  File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/getlimits.py", line 29, in _fr1
    a = a.copy()
SystemError: <method 'copy' of 'numpy.ndarray' objects> returned a result with an error set

@charris
Copy link
Member

charris commented Sep 30, 2018

@ahaldane Could you open an issue for this?

else:
return "'%sU%d'" % (byteorder, dtype.itemsize / 4)

elif dtype.type == np.void:
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously this caught subclasses of void too. Do we actually want that behavior?

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Oct 22, 2018
Fixes numpy#12206, which was a regression introduced by numpy#10602

Frankly I'd consider the results expected by `test_void_subclass_unsized` and `test_void_subclass_sized` undesirable, but they match the behavior in 1.12.x
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants