Skip to content

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Aug 24, 2022

Triggered by investigating #24221.

We are using code like:

result = np.divide(numerator, denominator, where=denominator!=0)

result[denominator == 0] values are undefined when using this. For CPython it seems like the np.empty allocated for the return value reuses a temporary array created in a previous line that computes (X[:, np.newaxis, :] - X[np.newaxis, :, :]) ** 2 so that result[denominator == 0] only contains 0. For PyPy this is not the case and at one point we get a 4. rather than a 0. (don't ask me where the 4. comes from ...).

A snippet to reproduce a similar behaviour:

import numpy as np

numerator = np.array([1., 1., 1.])
denominator = np.array([0., 1., 1.])
where = denominator != 0

tmp = np.array([-999., -999., -999.])
print(f"tmp address: {tmp.__array_interface__['data'][0]}")
del tmp
divide_result = np.divide(numerator, denominator, where=where)
print(f"div address: {divide_result.__array_interface__['data'][0]}")
print(f"{divide_result}")

Output with CPython (address is the same so the -999. of the tmp array is reused):

tmp address: 94515367946736
div address: 94515367946736
[-999.    1.    1.]

Output with pypy (address is not the same first value is random):

tmp address: 94369592900336
div address: 94369602251488
[4.66247785e-310 1.00000000e+000 1.00000000e+000]

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR and the explanation!

Besides a minor comment on the whats new placement, LGTM

@@ -198,6 +198,13 @@ Changelog
:mod:`sklearn.feature_selection`
................................

:mod:`sklearn.gaussian_process`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section seems to be out of place in the whats new.

@thomasjpfan thomasjpfan added the Quick Review For PRs that are quick to review label Aug 27, 2022
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @lesteve. We should almost always be using the out arg of divide.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:gaussian_process Quick Review For PRs that are quick to review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants