⚠️ CI failed on Linux_nogil.pylatest_pip_nogil ⚠️ #24221

scikit-learn-bot · 2022-08-22T02:54:32Z

CI failed on Linux_nogil.pylatest_pip_nogil (Aug 22, 2022)

test_kernel_gradient[kernel11]

lesteve · 2022-08-22T08:20:13Z

I cannot reproduce this one locally; let's see if it happens consistently in the CI ...

ogrisel · 2022-08-22T10:13:51Z

Note that this test is also decorated with @fails_if_pypy (unfortunately without any comment on the what or the why), so it might trigger some low-level quirkiness.

For the record, here it fails only with:

kernel = 1.41**2 * Matern(length_scale=[0.5, 2], nu=0.5)

ogrisel · 2022-08-22T10:16:41Z

I will manually retrigger this CI build in Azure to see if it's deterministic or not.

Here is the traceback of the original failure:

________________________ test_kernel_gradient[kernel11] ________________________
[gw1] linux -- Python 3.9.10 /home/vsts/work/1/s/testvenv/bin/python

kernel = 1.41**2 * Matern(length_scale=[0.5, 2], nu=0.5)

    @fails_if_pypy
    @pytest.mark.parametrize("kernel", kernels)
    def test_kernel_gradient(kernel):
        # Compare analytic and numeric gradient of kernels.
        K, K_gradient = kernel(X, eval_gradient=True)
    
        assert K_gradient.shape[0] == X.shape[0]
        assert K_gradient.shape[1] == X.shape[0]
        assert K_gradient.shape[2] == kernel.theta.shape[0]
    
        def eval_kernel_for_theta(theta):
            kernel_clone = kernel.clone_with_theta(theta)
            K = kernel_clone(X, eval_gradient=False)
            return K
    
        K_gradient_approx = _approx_fprime(kernel.theta, eval_kernel_for_theta, 1e-10)
    
>       assert_almost_equal(K_gradient, K_gradient_approx, 4)
E       AssertionError: 
E       Arrays are not almost equal to 4 decimals
E       
E       Mismatched elements: 8 / 75 (10.7%)
E       Max absolute difference: 71.55003086
E       Max relative difference: 0.67469822
E        x: array([[[2.0000e+000, 1.3853e-309, 9.3709e-310],
        [2.38801479e-01, 4.98647235e-01, 8.87095952e-03],
        [2.00000017e+00, 0.00000000e+00, 0.00000000e+00]]])
eval_kernel_for_theta = <function test_kernel_gradient.<locals>.eval_kernel_for_theta at 0x4c915408d50>
kernel     = 1.41**2 * Matern(length_scale=[0.5, 2], nu=0.5)
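For readers unfamiliar with this kind of test: it compares an analytic kernel gradient against a finite-difference approximation. The following is only a minimal sketch of that idea, assuming a hand-written RBF kernel in plain NumPy instead of scikit-learn's Matern, and a hypothetical helper `approx_fprime_forward` that stands in for (and is not) scikit-learn's private `_approx_fprime`. The step size 1e-6 is also an assumption of this sketch; the test above uses 1e-10.

```python
import numpy as np


def rbf_kernel(X, theta):
    """Toy RBF kernel (hypothetical, not scikit-learn's) with theta = log(length_scale).

    Returns the kernel matrix K and its analytic gradient with respect to theta.
    """
    length_scale = np.exp(theta[0])
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-0.5 * sq_dists / length_scale**2)
    # dK/dtheta = dK/dlength_scale * length_scale = K * sq_dists / length_scale**2
    K_gradient = (K * sq_dists / length_scale**2)[:, :, None]
    return K, K_gradient


def approx_fprime_forward(theta, f, epsilon):
    """Forward-difference approximation of the gradient of f at theta."""
    f0 = f(theta)
    grad = np.zeros(f0.shape + (theta.shape[0],))
    for j in range(theta.shape[0]):
        step = np.zeros_like(theta)
        step[j] = epsilon
        grad[..., j] = (f(theta + step) - f0) / epsilon
    return grad


rng = np.random.RandomState(0)
X = rng.normal(size=(5, 2))
theta = np.log(np.array([1.5]))

K, K_gradient = rbf_kernel(X, theta)
K_gradient_approx = approx_fprime_forward(
    theta, lambda t: rbf_kernel(X, t)[0], 1e-6
)

# The analytic and numeric gradients should agree to about 4 decimals.
max_abs_diff = np.abs(K_gradient - K_gradient_approx).max()
np.testing.assert_almost_equal(K_gradient, K_gradient_approx, 4)
```

If the analytic gradient contained uninitialized entries, as in the traceback above, a check of this kind would fail only on the runs where those entries happen to hold non-zero garbage.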

scikit-learn-bot · 2022-08-22T10:43:11Z

CI is no longer failing! ✅

Successful run on Aug 22, 2022

ogrisel · 2022-08-22T14:04:29Z

Ok, so apparently this is not deterministic. It could be a race condition of some sort, but given that it's not easy to reproduce, I think we can close for now. If it shows up again, we will give it a higher priority and try to investigate the root cause.

lesteve · 2022-08-23T11:25:47Z

I looked a bit more into this one and I can reproduce the failure with pypy. We are using code like:

result = np.divide(numerator, denominator, where=denominator != 0)

The values of result[denominator == 0] are undefined when using this: no out array is passed, so the entries where the where mask is False are left uninitialized. For CPython, it seems like the np.empty allocated for the return value reuses the memory of the temporary array created for denominator != 0 in the where argument, so that result[denominator == 0] only contains 0.

A snippet to reproduce a similar behaviour:

import numpy as np

numerator = np.array([1., 1., 1.])
denominator = np.array([0., 1., 1.])
where = denominator != 0

for i in range(5):
    # this temporary variable impacts the values for denominator == 0 ...
    # probably by reusing the memory :-(
    tmp = numerator * (i + denominator)
    print(f"tmp address: {tmp.__array_interface__['data'][0]}")
    del tmp
    divide_result = np.divide(numerator, denominator, where=where)
    print(f"div address: {divide_result.__array_interface__['data'][0]}")
    print(f"{divide_result}")

Output with CPython:

tmp address: 94334388189392
div address: 94334388189392
[0. 1. 1.]
tmp address: 94334388576960
div address: 94334388576960
[1. 1. 1.]
tmp address: 94334388189392
div address: 94334388189392
[2. 1. 1.]
tmp address: 94334388576960
div address: 94334388576960
[3. 1. 1.]
tmp address: 94334388189392
div address: 94334388189392
[4. 1. 1.]

Output with pypy:

tmp address: 94000129327360
div address: 94000126685472
[4.64422333e-310 1.00000000e+000 1.00000000e+000]
tmp address: 94000126688656
div address: 94000129178512
[4.64422345e-310 1.00000000e+000 1.00000000e+000]
tmp address: 94000129180688
div address: 94000129179344
[4.64422345e-310 1.00000000e+000 1.00000000e+000]
tmp address: 94000135025200
div address: 94000129181024
[4.64422374e-310 1.00000000e+000 1.00000000e+000]
tmp address: 94000127349824
div address: 94000135025536
[4.64422336e-310 1.00000000e+000 1.00000000e+000]

lesteve · 2022-08-24T09:53:16Z

Opened #24245 to fix the problem for pypy and probably the intermittent nogil one too.
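The exact change made in #24245 is not shown in this thread. As a minimal sketch of one common way to make the masked entries well defined, assuming that 0 is the desired value where the denominator is 0, an initialized out array can be passed alongside where. The helper name `safe_divide` is hypothetical and only used for illustration.

```python
import numpy as np


def safe_divide(numerator, denominator):
    """Divide element-wise, returning 0 where denominator == 0 (hypothetical helper)."""
    # `out` is pre-filled with zeros, so entries skipped by `where`
    # keep a defined value instead of whatever was in the memory.
    return np.divide(
        numerator,
        denominator,
        out=np.zeros_like(numerator),
        where=denominator != 0,
    )


numerator = np.array([1.0, 1.0, 1.0])
denominator = np.array([0.0, 1.0, 1.0])

results = []
for i in range(5):
    # Same temporary as in the reproducer above: it no longer affects the result.
    tmp = numerator * (i + denominator)
    del tmp
    results.append(safe_divide(numerator, denominator))

print(results[-1])  # [0. 1. 1.]
```

With this pattern the output no longer depends on which memory block the allocator hands back, so the result is the same on every iteration.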

github-actions bot added the Needs Triage label Aug 22, 2022

lesteve added the Build / CI label and removed the Needs Triage label Aug 22, 2022

ogrisel closed this as completed Aug 22, 2022

lesteve mentioned this issue Aug 24, 2022

FIX np.divide undefined behaviour with where in gaussian processes #24245

Merged
