
Add a float80 signature under valgrind. #21813


Closed
wants to merge 1 commit into from

Conversation

rainwoodman
Contributor

@rainwoodman rainwoodman commented Jun 21, 2022

Without this fix, when running import tensorflow under valgrind 3.18:

/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py:492: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.float128'> True does not match any known type: falling back to type probe function
  machar = _get_machar(dtype)
Traceback (most recent call last):
  File "/home/feyu_google_com/dtensor_bert_train.py", line 23, in <module>
    import tensorflow as tf
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python.feature_column import feature_column_lib as feature_column
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column_lib.py", line 18, in <module>
    from tensorflow.python.feature_column.feature_column import *
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column.py", line 143, in <module>
    from tensorflow.python.layers import base
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/layers/base.py", line 16, in <module>
    from tensorflow.python.keras.legacy_tf_layers import base
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/__init__.py", line 25, in <module>
    from tensorflow.python.keras import models
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/models.py", line 22, in <module>
    from tensorflow.python.keras.engine import functional
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/engine/functional.py", line 32, in <module>
    from tensorflow.python.keras.engine import training as training_lib
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/engine/training.py", line 54, in <module>
    from tensorflow.python.keras.saving import hdf5_format
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 37, in <module>
    import h5py
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/h5py/__init__.py", line 45, in <module>
    from ._conv import register_converters as _register_converters, \
  File "h5py/_conv.pyx", line 1, in init h5py._conv
  File "h5py/h5t.pyx", line 278, in init h5py.h5t
  File "h5py/h5t.pyx", line 273, in h5py.h5t._get_available_ftypes
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py", line 485, in __new__
    obj = object.__new__(cls)._init(dtype)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py", line 512, in _init
    self._str_smallest_normal = machar._str_smallest_normal.strip()
AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'. Did you mean: 'smallest_normal'?

I am not sure if this is exactly the right way to fix it, but it allowed my valgrind analysis to proceed.

@seberg
Member

seberg commented Jun 21, 2022

I always just ignore these issues to be honest. valgrind lacks proper longdouble support, so anything using longdouble is prone to failures.

Now, that error should probably never happen, whether we find longdouble or not, so that may be a small bug-fix waiting to be done.
However, the "magic" value added here is the double value cast to long double. If that signature is picked up, we know that floating-point support is broken (at the very least strtold, which is used here).

Maybe we can add a hack like this, but I would want at least a UserWarning telling users that their longdouble support is broken.
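The point about the "magic" value can be checked by hand. The sketch below is an editor's addition, not code from the PR; it assumes the standard x87 80-bit extended layout (64-bit significand with an explicit integer bit, then a 15-bit biased exponent and sign bit). The native signature comes from the correctly rounded strtold("-0.1"); the valgrind one is the 53-bit double significand of -0.1 zero-padded to 64 bits, which reproduces the bytes registered in the diff:

```python
import struct

def x87_bytes(significand, biased_exp, sign=1):
    # 80-bit x87 little-endian layout: 8 significand bytes
    # (explicit integer bit included), then 2 sign/exponent bytes
    return struct.pack('<QH', significand, (sign << 15) | biased_exp)

# -0.1 is 1.100110011...b * 2**-4; the extended-precision bias is 16383.
# strtold("-0.1"): the significand rounded to the full 64 bits.
native = x87_bytes(0xCCCCCCCCCCCCCCCD, 16383 - 4)
# Under valgrind: the 53-bit double significand of -0.1, shifted left
# 11 places so the low bits are zero padding.
valgrind = x87_bytes(0x1999999999999A << 11, 16383 - 4)

print(native.hex())    # cdccccccccccccccfbbf
print(valgrind.hex())  # 00d0ccccccccccccfbbf
```

The two hex strings match the signatures registered as float80 in the diff and reported in the UserWarning above (modulo the zero bytes that pad a float128 slot to 16 bytes).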

seberg added a commit to seberg/numpy that referenced this pull request Jun 21, 2022
This fixes the missing attributes.  I tested the warning and fix
on valgrind itself.
These attributes were added in gh-18536 but the fallback path was
not checked there.

Replaces gh-21813, although something more like it may make sense
if it allows us to just delete MachAr completely.
@seberg
Member

seberg commented Jun 21, 2022

I have opened gh-21815 as an alternative/replacement. That is the right fix in any case, since the error itself was always a bug. Whether it is enough for your use case, I am not quite certain.

@@ -266,6 +266,8 @@ def _register_known_types():
tiny=tiny_f80)
# float80, first 10 bytes containing actual storage
_register_type(float80_ma, b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf')
# The following signature is produced under valgrind.
_register_type(float80_ma, b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf')
Member
For posterity (or in case we go with this): if we do this, the signature should be registered as float64_ma, because that is effectively the precision we expect when we see this signature.
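The float64_ma remark can be illustrated with a small check (an editor's addition, not part of the review): the valgrind signature's 64-bit significand carries only a double's 53 bits of precision, while the native signature uses all 64:

```python
# Decode the 8 significand bytes of each registered signature
# (little-endian, exponent/sign bytes excluded).
sig = int.from_bytes(b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc', 'little')
assert sig == 0xCCCCCCCCCCCCD000
# Stripping the 11 bits of zero padding leaves exactly the 53
# significant bits of a float64 significand.
print((sig >> 11).bit_length())  # 53

# The native longdouble('-0.1') significand needs the full 64 bits.
native = int.from_bytes(b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc', 'little')
print(native.bit_length())  # 64
```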

@rainwoodman
Contributor Author

The alternative looks good to me.
In my particular situation, the error at import numpy simply stopped me from obtaining the full valgrind log -- the model I am debugging doesn't use long double.

My only (slight) concern is that with the alternative fix, the valgrind and non-valgrind runs exercise two different code paths, which may be somewhat surprising. However, as you pointed out, valgrind already causes divergent behavior anyway (due to the lack of long double support).

Let's close this one and go with your #21815. Thanks!

charris pushed a commit to charris/numpy that referenced this pull request Jun 28, 2022