Add a float80 signature under valgrind. #21813
Conversation
Without this fix, when running `import tensorflow` under valgrind 3.18:

```
/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py:492: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.float128'> True does not match any known type: falling back to type probe function
  machar = _get_machar(dtype)
Traceback (most recent call last):
  File "/home/feyu_google_com/dtensor_bert_train.py", line 23, in <module>
    import tensorflow as tf
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python.feature_column import feature_column_lib as feature_column
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column_lib.py", line 18, in <module>
    from tensorflow.python.feature_column.feature_column import *
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column.py", line 143, in <module>
    from tensorflow.python.layers import base
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/layers/base.py", line 16, in <module>
    from tensorflow.python.keras.legacy_tf_layers import base
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/__init__.py", line 25, in <module>
    from tensorflow.python.keras import models
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/models.py", line 22, in <module>
    from tensorflow.python.keras.engine import functional
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/engine/functional.py", line 32, in <module>
    from tensorflow.python.keras.engine import training as training_lib
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/engine/training.py", line 54, in <module>
    from tensorflow.python.keras.saving import hdf5_format
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 37, in <module>
    import h5py
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/h5py/__init__.py", line 45, in <module>
    from ._conv import register_converters as _register_converters, \
  File "h5py/_conv.pyx", line 1, in init h5py._conv
  File "h5py/h5t.pyx", line 278, in init h5py.h5t
  File "h5py/h5t.pyx", line 273, in h5py.h5t._get_available_ftypes
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py", line 485, in __new__
    obj = object.__new__(cls)._init(dtype)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py", line 512, in _init
    self._str_smallest_normal = machar._str_smallest_normal.strip()
AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'. Did you mean: 'smallest_normal'?
```
I always just ignore these issues, to be honest. valgrind lacks proper longdouble support, so anything using longdouble is prone to failures. That error should probably never happen, though, whether or not we find a matching signature. Maybe we can add a hack like this, but I think I would want at least a …
This fixes the missing attributes. I tested the warning and fix on valgrind itself. These attributes were added in gh-18536 but the fallback path was not checked there. Replaces gh-21813, although something more like it may make sense if it allows us to just delete MachAr completely.
I have opened gh-21815 as an alternative/replacement. That is the right thing in any case to avoid the error (that was always wrong). Whether it is enough for you, I am not quite certain.
```diff
@@ -266,6 +266,8 @@ def _register_known_types():
                            tiny=tiny_f80)
     # float80, first 10 bytes containing actual storage
     _register_type(float80_ma, b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf')
+    # The following signature is produced under valgrind.
+    _register_type(float80_ma, b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf')
```
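Assuming the registered signature is the little-endian byte pattern of -0.1 stored in the x87 80-bit format (consistent with the existing `b'\xcd\xcc...'` entry, whose 64-bit mantissa is -0.1 correctly rounded), the valgrind pattern can be reproduced with pure stdlib code: valgrind carries out long-double arithmetic in 64-bit double precision, so the 80-bit value is just the 53-bit double mantissa of -0.1 widened with trailing zeros. A sketch of that re-encoding:

```python
import struct

# Raw IEEE-754 bits of -0.1 as a 64-bit double.
bits = struct.unpack('<Q', struct.pack('<d', -0.1))[0]
frac52 = bits & ((1 << 52) - 1)          # 52 stored fraction bits
exp11 = (bits >> 52) & 0x7FF             # biased 11-bit double exponent
sign = bits >> 63

# Widen to the x87 80-bit format: explicit integer bit, left-justified
# 64-bit mantissa, 15-bit exponent re-biased from 1023 to 16383.
mant64 = (1 << 63) | (frac52 << 11)
exp15 = exp11 - 1023 + 16383
sig80 = struct.pack('<Q', mant64) + struct.pack('<H', (sign << 15) | exp15)

print(sig80)  # b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf'
```

The low `\x00\xd0` bytes are exactly where the double mantissa runs out of precision, which supports the review comment below that this signature really denotes float64-level precision.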
For posterity (or if we go with this): if we do this, the signature should be registered as `float64_ma`, because that is effectively the precision we expect if we see it.
The alternative looks good to me. My only (slight) concern is that with the alternative fix, the valgrind and non-valgrind runs exercise two different code paths, which may come as a surprise. However, as you pointed out, valgrind already causes divergent behavior anyway (due to its lack of long double support). Let's close this one and go with your #21815. Thanks!
I am not sure if this is exactly the right way to fix it, but it allowed my valgrind analysis to proceed.