TST use global_random_seed in sklearn/_loss/tests/test_link.py #23751


Merged

Conversation

marenwestermann
Member

Reference Issues/PRs

towards #22827

What does this implement/fix? Explain your changes.

I used global_random_seed in the functions test_link_inverse_identity and test_link_out_argument. With respect to both tests, this caused a number of test failures, all related to the default value of rtol in assert_allclose. I increased this value in both tests which made all tests pass. Please check if the increased tolerance is acceptable.

Any other comments?

ping @lorentzenchr

test_link_inverse_identity
test_link_out_argument

@lorentzenchr lorentzenchr left a comment


@marenwestermann Thanks for working on this.

@@ -73,15 +73,15 @@ def test_link_inverse_identity(link):
         # we do not need to distinguish.
         raw_prediction = rng.normal(loc=0, scale=10, size=(n_samples))
 
-    assert_allclose(link.link(link.inverse(raw_prediction)), raw_prediction)
+    assert_allclose(link.link(link.inverse(raw_prediction)), raw_prediction, rtol=1e-2)
Member


This rtol is a bit high. Is it specific to a certain link function?

Member Author


Agreed. Good point regarding the link function. I checked and all test failures (in both tests) are related to the LogitLink link function. Should it therefore maybe be tested separately?

Member


I think raw_prediction = rng.normal(loc=0, scale=10) sometimes creates very large (or very negative) values. The logit link then takes the exponential and the logarithm, which leads to a loss of precision.

We can use different random numbers, how about rng.uniform(-30, 30)?

Member


I think this is worth trying. We should also document this behaviour with a comment and explain why we make this choice. This is not a trivial thing.

Member Author


I tried your suggestion but it results in the same test failures as with rng.normal, i.e. all test failures are related to the LogitLink function and they occur with the same random seeds.

Member


Let's go with rng.uniform(-20, 20). The reason is that for the logit link

from scipy.special import expit, logit

x = 20
logit(expit(x))

the term expit(x) comes very close to 1 for large positive x and therefore loses precision. For large negative values, expit(x) is close to zero and therefore does not lose precision.
With float64, logit(expit(20)) is correct to about rtol 2e-9.
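To illustrate this point, here is a minimal sketch (using scipy's expit/logit directly rather than scikit-learn's LogitLink) that measures the round-trip relative error for increasing x:

```python
import numpy as np
from scipy.special import expit, logit

# Round-trip relative error of logit(expit(x)) for increasing x.
# expit(x) approaches 1.0 as x grows, so float64 loses precision
# in the 1 - expit(x) term that logit effectively needs.
for x in (5.0, 20.0, 30.0):
    rel_err = abs(logit(expit(x)) - x) / abs(x)
    print(f"x = {x:5.1f}  relative error = {rel_err:.2e}")
```

For moderate x the round trip is accurate to near machine precision, while the error grows by orders of magnitude as x increases, consistent with the rtol 2e-9 figure quoted above for x = 20.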

@glemaitre glemaitre self-requested a review June 28, 2022 09:10
@glemaitre glemaitre removed their request for review June 28, 2022 09:50
@marenwestermann
Member Author

I found a solution by using global_random_seed % 10 and setting rtol=1e-06 in both tests. I'm aware that this is not an ideal solution.

@marenwestermann
Member Author

Also, to help debug the issue, here's a minimal reproducible example of one of the failing tests:

import numpy as np
import pandas as pd
from numpy.testing import assert_allclose
from sklearn._loss.link import (
    _LINKS,
    MultinomialLogit,
)

rng = np.random.RandomState(8)
link = _LINKS['logit']()
n_samples, n_classes = 100, None
if link.is_multiclass:
    n_classes = 10
    raw_prediction = rng.normal(loc=0, scale=10, size=(n_samples, n_classes))
    if isinstance(link, MultinomialLogit):
        raw_prediction = link.symmetrize_raw_prediction(raw_prediction)
else:
    # So far, the valid interval of raw_prediction is (-inf, inf) and
    # we do not need to distinguish.
    raw_prediction = rng.normal(loc=0, scale=10, size=(n_samples))

print(f"stats: {pd.DataFrame(raw_prediction).describe()}")
assert_allclose(link.link(link.inverse(raw_prediction)), raw_prediction)

which fails with:

stats:                 0
count  100.000000
mean     0.616100
std     11.310588
min    -31.349198
25%     -6.079379
50%      0.878574
75%      7.340690
max     25.450872
Traceback (most recent call last):
  File "/Users/maren/tmp/1.py", line 23, in <module>
    assert_allclose(link.link(link.inverse(raw_prediction)), raw_prediction)
  File "/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1530, in assert_allclose
    assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
  File "/Users/maren/mambaforge/envs/sklearn-dev/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 844, in assert_array_compare
    raise AssertionError(msg)
AssertionError:
Not equal to tolerance rtol=1e-07, atol=0

Mismatched elements: 3 / 100 (3%)
Max absolute difference: 4.60124204e-06
Max relative difference: 1.85067717e-07
 x: array([ 9.120472e-01,  1.091283e+01, -1.946970e+01, -1.386350e+01,
       -2.296492e+01,  2.409834e+01,  1.727836e+01,  2.204556e+01,
        7.948276e+00,  9.764211e+00, -1.183427e+01,  1.916364e+01,...
 y: array([ 9.120472e-01,  1.091283e+01, -1.946970e+01, -1.386350e+01,
       -2.296492e+01,  2.409834e+01,  1.727836e+01,  2.204556e+01,
        7.948276e+00,  9.764211e+00, -1.183427e+01,  1.916364e+01,...

The maximum number of mismatched elements in the failing tests is 3 (as shown in this example).
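For comparison, a quick sketch of the bounded-range suggestion from the review, assuming LogitLink's link/inverse correspond to scipy's logit/expit: drawing raw predictions from uniform(-20, 20) with the same failing seed keeps the round trip well within a modest tolerance.

```python
import numpy as np
from numpy.testing import assert_allclose
from scipy.special import expit, logit

# Same seed as the failing reproduction above, but using the bounded
# uniform draw suggested in the review instead of rng.normal.
rng = np.random.RandomState(8)
raw_prediction = rng.uniform(low=-20, high=20, size=100)

# On (-20, 20), expit never rounds all the way to 1.0, so the logit
# round trip stays accurate; atol guards samples very close to 0.
assert_allclose(logit(expit(raw_prediction)), raw_prediction, rtol=1e-8, atol=1e-12)
print("round trip within rtol=1e-8 for all 100 samples")
```

The rtol=1e-8 here is an illustrative choice with some margin over the ~2e-9 worst case at |x| = 20, not the tolerance used in the merged test.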


@lorentzenchr lorentzenchr left a comment


Almost there 😉

@glemaitre glemaitre merged commit 96b91d7 into scikit-learn:main Jun 30, 2022
@glemaitre
Member

Thanks @marenwestermann LGTM.

@marenwestermann
Member Author

Thanks a lot for the reviews, I learned a lot! :)

@lorentzenchr
Member

Thanks @marenwestermann 🙏

ogrisel pushed a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022
…t-learn#23751)

Co-authored-by: Maren Westermann <maren.westermann@free-now.com>
@marenwestermann marenwestermann deleted the test-link-global-random-seed branch March 7, 2023 20:27

3 participants