-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
MAINT: Move dtype string functions to python #10602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT: Move dtype string functions to python #10602
Conversation
5a5426f
to
73dece0
Compare
""" Normalize byteorder to '<' or '>' """ | ||
# hack to obtain the native and swapped byte order characters | ||
swapped = np.dtype(int).newbyteorder('s') | ||
native = swapped.newbyteorder('s') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a better way to get hold of these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is np.little_endian
or sys.byteorder
, but then you have to decide '<' or '>' yourself.
from a list of the field names and dtypes with no additional | ||
dtype parameters. | ||
|
||
Duplicates the C `is_dtype_struct_simple_unaligned_layout` functio. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should add this as dtype.ispacked
in another PR
} | ||
PyUString_ConcatAndDel(&ret, PyObject_Repr(title)); | ||
if (i != names_size - 1) { | ||
PyUString_ConcatAndDel(&ret, PyUString_FromString(",")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This kind of thing segfaults if PyUString_FromString(...)
returns NULL
(ie, we run out of memory). This type of thing what motivated the switch to python - getting this right everywhere is too hard and too verbose in C, and for no gain.
Any thoughts on this? |
Could you rebase this after the coverage integration? I would like to see that we are checking all the new code in tests. |
73dece0
to
73e328c
Compare
Good idea - done |
if dtype.fields is not None: | ||
return _struct_str(dtype, include_align=True) | ||
elif dtype.subdtype: | ||
return _subarray_str(dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line not hit in tests, see coverage
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test added
numpy/core/_dtype.py
Outdated
if byteorder == '=': | ||
return native.byteorder | ||
if byteorder == 's': | ||
return swapped.byteorder |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line not hit in tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if it's possible to hit - but this same test existed in the old C code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a TODO
Coverage seems pretty good, only two blocks (other than exceptions) not covered |
Doing proper error checking of string concatenation in python super tedious. As a results, we're not doing it, making us prone to exceptions turning into segfaults. This moves all of the __str__ and __repr__ functions, where speed is irrelevant, into np.core._dtype.
73e328c
to
fa616a8
Compare
@mattip: Coverage improved, I think |
I agree the this sort of thing should be coded in Python, I'd like to do more like this. Given that we have made a whole new module, I suppose we could use Cython at some point. @mattip Does PyPy speed up this sort of thing? @eric-wieser I suppose if this has a time effect in practice that it will show up in some of the benchmarks. I don't know that timing dtype creation by itself is useful. |
Thanks Eric. |
PyPy will JIT after ~1000 calls, if someone is calling repr(dtype) that many times I thing something is probably wrong with their code. |
This seems to have triggered a circular import problem when using the dtype-transer debug mode. (set The problem seems to be that debug tracing tries to print a dtype before the dtype object has been created, I'm still trying to see what happened. It involves >>> import numpy as np
Calculating dtype transfer from to
Traceback (most recent call last):
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/_dtype.py", line 23, in __repr__
arg_str = _construction_repr(dtype, include_align=False)
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/_dtype.py", line 77, in _construction_repr
return _scalar_str(dtype, short=short)
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/_dtype.py", line 81, in _scalar_str
byteorder = _byte_order_str(dtype)
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/_dtype.py", line 151, in _byte_order_str
swapped = np.dtype(int).newbyteorder('s')
AttributeError: module 'numpy' has no attribute 'dtype'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<frozen importlib._bootstrap>", line 200, in _lock_unlock_module
File "<frozen importlib._bootstrap>", line 163, in _get_module_lock
SystemError: <built-in function acquire_lock> returned a result with an error set
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/__init__.py", line 142, in <module>
from . import core
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/__init__.py", line 72, in <module>
from . import getlimits
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/getlimits.py", line 125, in <module>
tiny=_f16(2 ** -14))
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/getlimits.py", line 75, in __init__
self.epsilon = self.eps = float_to_float(kwargs.pop('eps'))
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/getlimits.py", line 70, in <lambda>
float_to_float = lambda v : _fr1(float_conv(v))
File "/home/allan/nmpy_dev/build/testenv/lib/python3.7/site-packages/numpy/core/getlimits.py", line 29, in _fr1
a = a.copy()
SystemError: <method 'copy' of 'numpy.ndarray' objects> returned a result with an error set |
@ahaldane Could you open an issue for this? |
else: | ||
return "'%sU%d'" % (byteorder, dtype.itemsize / 4) | ||
|
||
elif dtype.type == np.void: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this caught subclasses of void too. Do we actually want that behavior?
Fixes numpy#12206, which was a regression introduced by numpy#10602 Frankly I'd consider the results expected by `test_void_subclass_unsized` and `test_void_subclass_sized` undesirable, but they match the behavior in 1.12.x
Doing proper error checking of string concatenation in python super tedious.
As a results, we're not doing it, making us prone to exceptions turning into segfaults.
This moves all of the
__str__
and__repr__
functions, where speed is irrelevant, into np.core._dtype.Minimal refactors made during the conversion from C - mainly just using
str.join
instead of manual loops. I'm sure more could be done here, but now that's it's python it should be easier for new contributors to modify.CircleCI failures are because this is working off an earlier branch point