
Add a float80 signature under valgrind. #21813


Closed
wants to merge 1 commit into from

Conversation

rainwoodman
Contributor

@rainwoodman rainwoodman commented Jun 21, 2022

Without this fix, when running import tensorflow under valgrind 3.18:

/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py:492: UserWarning: Signature b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf\x00\x00\x00\x00\x00\x00' for <class 'numpy.float128'> True does not match any known type: falling back to type probe function
  machar = _get_machar(dtype)
Traceback (most recent call last):
  File "/home/feyu_google_com/dtensor_bert_train.py", line 23, in <module>
    import tensorflow as tf
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/__init__.py", line 37, in <module>
    from tensorflow.python.tools import module_util as _module_util
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/__init__.py", line 45, in <module>
    from tensorflow.python.feature_column import feature_column_lib as feature_column
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column_lib.py", line 18, in <module>
    from tensorflow.python.feature_column.feature_column import *
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/feature_column/feature_column.py", line 143, in <module>
    from tensorflow.python.layers import base
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/layers/base.py", line 16, in <module>
    from tensorflow.python.keras.legacy_tf_layers import base
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/__init__.py", line 25, in <module>
    from tensorflow.python.keras import models
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/models.py", line 22, in <module>
    from tensorflow.python.keras.engine import functional
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/engine/functional.py", line 32, in <module>
    from tensorflow.python.keras.engine import training as training_lib
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/engine/training.py", line 54, in <module>
    from tensorflow.python.keras.saving import hdf5_format
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/tensorflow/python/keras/saving/hdf5_format.py", line 37, in <module>
    import h5py
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/h5py/__init__.py", line 45, in <module>
    from ._conv import register_converters as _register_converters, \
  File "h5py/_conv.pyx", line 1, in init h5py._conv
  File "h5py/h5t.pyx", line 278, in init h5py.h5t
  File "h5py/h5t.pyx", line 273, in h5py.h5t._get_available_ftypes
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py", line 485, in __new__
    obj = object.__new__(cls)._init(dtype)
  File "/opt/conda/envs/py310/lib/python3.10/site-packages/numpy/core/getlimits.py", line 512, in _init
    self._str_smallest_normal = machar._str_smallest_normal.strip()
AttributeError: 'MachAr' object has no attribute '_str_smallest_normal'. Did you mean: 'smallest_normal'?

I am not sure if this is exactly the right way to fix it, but it allowed my valgrind analysis to proceed.

@seberg
Member

seberg commented Jun 21, 2022

I always just ignore these issues to be honest. valgrind lacks proper longdouble support, so anything using longdouble is prone to failures.

Now, that error should probably never happen, whether we find longdouble or not, so that may be a small bug-fix waiting to be done.
However, the "magic" value added here is the double value cast to long double. If that signature is picked up, we know that floating-point support is broken (at the very least strtold, which is used here).

Maybe we can add a hack like this, but I would want at least a UserWarning telling users that their longdouble support is broken.
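The point about the "magic" value can be checked by hand. The sketch below is an editor's addition, not code from the PR; it assumes the standard x87 80-bit extended layout (64-bit significand with an explicit integer bit, then a 15-bit biased exponent and sign bit). The native signature comes from the correctly rounded strtold("-0.1"); the valgrind one is the 53-bit double significand of -0.1 zero-padded to 64 bits, which reproduces the bytes registered in the diff:

```python
import struct

def x87_bytes(significand, biased_exp, sign=1):
    # 80-bit x87 little-endian layout: 8 significand bytes
    # (explicit integer bit included), then 2 sign/exponent bytes
    return struct.pack('<QH', significand, (sign << 15) | biased_exp)

# -0.1 is 1.100110011...b * 2**-4; the extended-precision bias is 16383.
# strtold("-0.1"): the significand rounded to the full 64 bits.
native = x87_bytes(0xCCCCCCCCCCCCCCCD, 16383 - 4)
# Under valgrind: the 53-bit double significand of -0.1, shifted left
# 11 places so the low bits are zero padding.
valgrind = x87_bytes(0x1999999999999A << 11, 16383 - 4)

print(native.hex())    # cdccccccccccccccfbbf
print(valgrind.hex())  # 00d0ccccccccccccfbbf
```

The two hex strings match the signatures registered as float80 in the diff and reported in the UserWarning above (modulo the zero bytes that pad a float128 slot to 16 bytes).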

seberg added a commit to seberg/numpy that referenced this pull request Jun 21, 2022
This fixes the missing attributes.  I tested the warning and fix
on valgrind itself.
These attributes were added in gh-18536 but the fallback path was
not checked there.

Replaces gh-21813, although something more like it may make sense
if it allows us to just delete MachAr completely.
@seberg
Member

seberg commented Jun 21, 2022

I have opened gh-21815 as an alternative/replacement. That is the right fix in any case, since the error itself was always a bug. Whether it is enough for your use case, I am not quite certain.

@@ -266,6 +266,8 @@ def _register_known_types():
tiny=tiny_f80)
# float80, first 10 bytes containing actual storage
_register_type(float80_ma, b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf')
# The following signature is produced under valgrind.
_register_type(float80_ma, b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc\xfb\xbf')
Member
For posterity (or in case we go with this): if we do this, the signature should be registered as float64_ma, because that is effectively the precision we expect when we see this signature.
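The float64_ma remark can be illustrated with a small check (an editor's addition, not part of the review): the valgrind signature's 64-bit significand carries only a double's 53 bits of precision, while the native signature uses all 64:

```python
# Decode the 8 significand bytes of each registered signature
# (little-endian, exponent/sign bytes excluded).
sig = int.from_bytes(b'\x00\xd0\xcc\xcc\xcc\xcc\xcc\xcc', 'little')
assert sig == 0xCCCCCCCCCCCCD000
# Stripping the 11 bits of zero padding leaves exactly the 53
# significant bits of a float64 significand.
print((sig >> 11).bit_length())  # 53

# The native longdouble('-0.1') significand needs the full 64 bits.
native = int.from_bytes(b'\xcd\xcc\xcc\xcc\xcc\xcc\xcc\xcc', 'little')
print(native.bit_length())  # 64
```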

@rainwoodman
Contributor Author

The alternative looks good to me.
In my particular situation, the error at import numpy simply stopped me from obtaining the full valgrind log -- the model I am debugging doesn't use long double.

My only (slight) concern is that with the alternative fix, the valgrind and non-valgrind runs exercise two different code paths, which may be somewhat surprising. However, as you pointed out, valgrind already causes divergent behavior anyway (due to the lack of long double support).

Let's close this one and go with your #21815. Thanks!

charris pushed a commit to charris/numpy that referenced this pull request Jun 28, 2022