⚠️ CI failed on Linux_nogil.pylatest_pip_nogil ⚠️ #24221

Closed
scikit-learn-bot opened this issue Aug 22, 2022 · 7 comments

Comments

@scikit-learn-bot
Contributor

CI failed on Linux_nogil.pylatest_pip_nogil (Aug 22, 2022)

  • test_kernel_gradient[kernel11]
github-actions bot added the "Needs Triage" label on Aug 22, 2022
lesteve added the "Build / CI" label and removed the "Needs Triage" label on Aug 22, 2022
@lesteve
Member

lesteve commented Aug 22, 2022

I cannot reproduce this one locally. Let's see if it happens consistently in the CI ...

@ogrisel
Member

ogrisel commented Aug 22, 2022

Note that this test is also decorated with @fails_if_pypy (unfortunately without any comment on what fails or why), so it might be triggering some low-level quirk.

For the record, here it fails only with:

kernel = 1.41**2 * Matern(length_scale=[0.5, 2], nu=0.5)

@ogrisel
Member

ogrisel commented Aug 22, 2022

I will manually retrigger this CI build in Azure to see if it's deterministic or not.

Here is the traceback of the original failure:

________________________ test_kernel_gradient[kernel11] ________________________
[gw1] linux -- Python 3.9.10 /home/vsts/work/1/s/testvenv/bin/python

kernel = 1.41**2 * Matern(length_scale=[0.5, 2], nu=0.5)

    @fails_if_pypy
    @pytest.mark.parametrize("kernel", kernels)
    def test_kernel_gradient(kernel):
        # Compare analytic and numeric gradient of kernels.
        K, K_gradient = kernel(X, eval_gradient=True)
    
        assert K_gradient.shape[0] == X.shape[0]
        assert K_gradient.shape[1] == X.shape[0]
        assert K_gradient.shape[2] == kernel.theta.shape[0]
    
        def eval_kernel_for_theta(theta):
            kernel_clone = kernel.clone_with_theta(theta)
            K = kernel_clone(X, eval_gradient=False)
            return K
    
        K_gradient_approx = _approx_fprime(kernel.theta, eval_kernel_for_theta, 1e-10)
    
>       assert_almost_equal(K_gradient, K_gradient_approx, 4)
E       AssertionError: 
E       Arrays are not almost equal to 4 decimals
E       
E       Mismatched elements: 8 / 75 (10.7%)
E       Max absolute difference: 71.55003086
E       Max relative difference: 0.67469822
E        x: array([[[2.0000e+000, 1.3853e-309, 9.3709e-310],
        [2.38801479e-01, 4.98647235e-01, 8.87095952e-03],
        [2.00000017e+00, 0.00000000e+00, 0.00000000e+00]]])
eval_kernel_for_theta = <function test_kernel_gradient.<locals>.eval_kernel_for_theta at 0x4c915408d50>
kernel     = 1.41**2 * Matern(length_scale=[0.5, 2], nu=0.5)
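
For context, the check in this test boils down to comparing an analytic gradient with a finite-difference one. Below is a minimal numpy-only sketch of that idea. It assumes a hypothetical toy kernel (`toy_kernel`, `toy_kernel_gradient`) and a hand-written `approx_fprime` helper; it is not scikit-learn's private `_approx_fprime` nor the Matern kernel from the traceback.

```python
import numpy as np


def approx_fprime(theta, f, epsilon):
    """Forward-difference approximation of d f(theta) / d theta.

    f maps a 1-D parameter vector to an (n, n) matrix; the result has
    shape (n, n, len(theta)), the same layout the test asserts on.
    """
    theta = np.asarray(theta, dtype=float)
    f0 = f(theta)
    grad = np.zeros(f0.shape + (theta.size,))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = epsilon
        grad[..., j] = (f(theta + step) - f0) / epsilon
    return grad


# Hypothetical 1-D inputs and their pairwise distances.
x = np.array([0.0, 0.5, 2.0])
dist = np.abs(x[:, None] - x[None, :])


def toy_kernel(theta):
    # theta holds log-amplitude and log-length-scale (illustrative only).
    return np.exp(theta[0]) * np.exp(-dist / np.exp(theta[1]))


def toy_kernel_gradient(theta):
    K = toy_kernel(theta)
    d_amplitude = K  # derivative w.r.t. theta[0]
    d_length = K * dist / np.exp(theta[1])  # derivative w.r.t. theta[1]
    return np.stack([d_amplitude, d_length], axis=-1)


theta = np.log([2.0, 0.5])
analytic = toy_kernel_gradient(theta)
numeric = approx_fprime(theta, toy_kernel, 1e-10)
np.testing.assert_almost_equal(analytic, numeric, 4)
```

If the analytic gradient contained uninitialized entries, as in the traceback above, the final comparison would fail in the same way.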

@scikit-learn-bot
Contributor Author

CI is no longer failing! ✅

Successful run on Aug 22, 2022

@ogrisel
Member

ogrisel commented Aug 22, 2022

OK, so apparently this is not deterministic. It could be a race condition of some sort, but given that it is not easy to reproduce, I think we can close for now. If it shows up again, we will give it a higher priority and try to investigate the root cause.

ogrisel closed this as completed on Aug 22, 2022
@lesteve
Member

lesteve commented Aug 23, 2022

I looked a bit more into this one and I can reproduce the failure with pypy. We are using code like:

result = np.divide(numerator, denominator, where=denominator != 0)

The values of result[denominator == 0] are undefined when using this, because np.divide leaves the output untouched wherever the where mask is False. With CPython, it seems that the np.empty buffer allocated for the return value reuses the memory of the temporary array created for denominator != 0 in the where argument, so that result[denominator == 0] happens to contain only 0.

A snippet to reproduce a similar behaviour:

import numpy as np

numerator = np.array([1., 1., 1.])
denominator = np.array([0., 1., 1.])
where = denominator != 0

for i in range(5):
    # this temporary variable impacts the values for denominator == 0 ...
    # probably by reusing the memory :-(
    tmp = numerator * (i + denominator)
    print(f"tmp address: {tmp.__array_interface__['data'][0]}")
    del tmp
    divide_result = np.divide(numerator, denominator, where=where)
    print(f"div address: {divide_result.__array_interface__['data'][0]}")
    print(f"{divide_result}")

Output with CPython:

tmp address: 94334388189392
div address: 94334388189392
[0. 1. 1.]
tmp address: 94334388576960
div address: 94334388576960
[1. 1. 1.]
tmp address: 94334388189392
div address: 94334388189392
[2. 1. 1.]
tmp address: 94334388576960
div address: 94334388576960
[3. 1. 1.]
tmp address: 94334388189392
div address: 94334388189392
[4. 1. 1.]

Output with pypy:

tmp address: 94000129327360
div address: 94000126685472
[4.64422333e-310 1.00000000e+000 1.00000000e+000]
tmp address: 94000126688656
div address: 94000129178512
[4.64422345e-310 1.00000000e+000 1.00000000e+000]
tmp address: 94000129180688
div address: 94000129179344
[4.64422345e-310 1.00000000e+000 1.00000000e+000]
tmp address: 94000135025200
div address: 94000129181024
[4.64422374e-310 1.00000000e+000 1.00000000e+000]
tmp address: 94000127349824
div address: 94000135025536
[4.64422336e-310 1.00000000e+000 1.00000000e+000]
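
One way to make the result independent of memory reuse is to pass an explicitly initialized `out` array. This is only a sketch. It assumes that 0 is the desired value wherever the division is skipped, and it is not necessarily how the actual fix is written.

```python
import numpy as np

numerator = np.array([1.0, 1.0, 1.0])
denominator = np.array([0.0, 1.0, 1.0])
where = denominator != 0

# Assumption: entries where the division is skipped should be 0.
# Pre-filling `out` means those entries no longer depend on whatever
# happened to be in the freshly allocated buffer.
out = np.zeros_like(numerator)
result = np.divide(numerator, denominator, out=out, where=where)
print(result)  # [0. 1. 1.]
```

With this variant the first entry is 0 on every iteration, whichever interpreter or allocator is used.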

@lesteve
Member

lesteve commented Aug 24, 2022

Opened #24245 to fix the problem for pypy and probably the intermittent nogil one too.
