Description
In a number of places in the tests, `numpy.testing.assert_allclose` is used with the default absolute tolerance, which is `atol=0`.
This means in particular that `np.testing.assert_allclose(0, 1e-16)` will fail. More context on the reasons behind this choice can be found in numpy/numpy#3183 (comment) and PEP 485, which can be summed up as,

> If, for a given use case, a user needs to compare to zero, the test will be guaranteed to fail the first time, and the user can select an appropriate value.
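A minimal illustration of this behavior (the tolerance value `1e-9` below is just a plausible choice, not one prescribed by numpy):

```python
import numpy as np

# With the default atol=0, any nonzero value compared against exact zero
# fails, since the check is |actual - desired| <= atol + rtol * |desired|
# and both terms on the right are zero when desired == 0.
try:
    np.testing.assert_allclose(0, 1e-16)
except AssertionError:
    print("fails with default atol=0")

# Supplying an explicit absolute tolerance makes the comparison pass.
np.testing.assert_allclose(0, 1e-16, atol=1e-9)
print("passes with atol=1e-9")
```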
The issue is that occasionally the tests will pass, but then may fail on some other platform.
For instance, this test in `estimator_checks.py` passes CI on master, but then randomly fails on OSX: [1], [2] correspond to the same commit, one for the PR, one in the forked repo (this test fails with the message below, and not necessarily for the same Python versions),
```
________________________ test_non_meta_estimators[1045] ________________________
args = ('GaussianProcess', GaussianProcess(beta0=None, corr='squared_exponential', normalize=True,
...orage_mode='full', theta0=0.1, thetaL=None, thetaU=None,
verbose=False))
[...]
> assert_allclose(y_pred.ravel(), y_pred_2d.ravel())
E AssertionError:
E Not equal to tolerance rtol=1e-07, atol=0
E
E (mismatch 40.0%)
E x: array([ -1.089129e-13, 1.000000e+00, 2.000000e+00, 1.272316e-13,
E 1.000000e+00, 2.000000e+00, 5.051515e-14, 1.000000e+00,
E 2.000000e+00, 9.214851e-15])
E y: array([ -1.086908e-13, 1.000000e+00, 2.000000e+00, 1.155742e-13,
E 1.000000e+00, 2.000000e+00, 5.062617e-14, 1.000000e+00,
E 2.000000e+00, 9.325873e-15])
```
- this test in `test_gradient_boosting_loss_functions.py` compares floats to 0 with `atol=0` but doesn't fail.
- #10561 (`sklearn/cluster/tests/test_mean_shift.py::test_estimate_bandwidth_1sample` failing on the ppc64le platform) is another manifestation of this.
When `atol` is used, its value is not very consistent across tests.
As for `sklearn.utils.testing.assert_allclose_dense_sparse`, its default is `atol=1e-9`, not 0.
While the necessary absolute tolerance is test dependent, it might still be useful to

- have a default value (e.g. 1e-9) when it's needed (e.g. a `DEFAULT_ATOL` in `sklearn.utils.testing`), except for the cases when it has to be increased for specific reasons,
- use `atol` when it's definitely reasonable to do so (e.g. in probability equalities),
- make `sklearn.utils.testing.assert_allclose_dense_sparse` and `sklearn.utils.testing.assert_allclose` have the same default `atol`,
- check that we don't have floating point equalities with 0.0 even if CI tests pass ([1], [2], [3] ...).
This might help improve the numerical stability of the tests and prevent some of the test failures on less common platforms. cc @yarikoptic

What do you think?