
MAINT Make multioutput tests deterministic #20468


Merged

Conversation

@ogrisel (Member) commented Jul 5, 2021

To avoid rare random failures such as the one that occurred in the latest nightly builds:

https://github.com/scikit-learn/scikit-learn/runs/2986642144?check_suite_focus=true

Note, however, that the failure observed in this test run is weird:

      def test_multi_target_regression():
          X, y = datasets.make_regression(n_targets=3)
          X_train, y_train = X[:50], y[:50]
          X_test, y_test = X[50:], y[50:]
      
          references = np.zeros_like(y_test)
          for n in range(3):
              rgr = GradientBoostingRegressor(random_state=0)
              rgr.fit(X_train, y_train[:, n])
              references[:, n] = rgr.predict(X_test)
      
          rgr = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
          rgr.fit(X_train, y_train)
          y_pred = rgr.predict(X_test)
      
  >       assert_almost_equal(references, y_pred)
  E       AssertionError: 
  E       Arrays are not almost equal to 7 decimals
  E       
  E       Mismatched elements: 25 / 150 (16.7%)
  E       Max absolute difference: 0.18403063
  E       Max relative difference: 0.06804465
  E        x: array([[-146.7381045,   54.9331913,  -68.1286592],
  E              [ -70.7373213,   19.757404 , -188.343298 ],
  E              [ -50.9623975,  102.3977605,   29.7333162],...
  E        y: array([[-146.793342 ,   54.9331913,  -68.1286592],
  E              [ -70.7902073,   19.757404 , -188.343298 ],
  E              [ -50.9623975,  102.3977605,   29.7333162],...

It might be caused by a numerical stability problem that only appears with very specific data. However, I could not reproduce it locally by scanning the random_state of the make_regression function over range(500)...
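
For reference, a local scan along those lines could look like the following sketch (not the exact script used; it just re-runs the test body for many dataset seeds and reports mismatches):

    # Hypothetical reproduction loop (not the exact script used here).
    import numpy as np
    from numpy.testing import assert_almost_equal
    from sklearn import datasets
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.multioutput import MultiOutputRegressor

    for seed in range(500):
        X, y = datasets.make_regression(n_targets=3, random_state=seed)
        X_train, y_train = X[:50], y[:50]
        X_test, y_test = X[50:], y[50:]

        # Reference: one independent regressor per target.
        references = np.zeros_like(y_test)
        for n in range(3):
            rgr = GradientBoostingRegressor(random_state=0)
            rgr.fit(X_train, y_train[:, n])
            references[:, n] = rgr.predict(X_test)

        # MultiOutputRegressor should reproduce the per-target fits exactly.
        rgr = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
        rgr.fit(X_train, y_train)
        try:
            assert_almost_equal(references, rgr.predict(X_test))
        except AssertionError:
            print(f"mismatch for seed={seed}")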

Still, the magnitude of the failure is worrying...

@ogrisel added the Build / CI and module:test-suite (everything related to our tests) labels on Jul 5, 2021

@ogrisel (Member, Author) commented Jul 5, 2021

Let me try to push a follow-up debug commit, which I plan to revert, to check whether we can reproduce the problem on the CI with a large number of seeds.

@ogrisel (Member, Author) commented Jul 5, 2021

Two of the ARM runs failed on Travis because the debug test with 1000 random seeds ran for more than 10 minutes without producing output.

The latest failure (https://app.travis-ci.com/github/scikit-learn/scikit-learn/jobs/521780414) seems unrelated, with many numpy.linalg.LinAlgError: Eigenvalues did not converge errors in other tests (which is worrying but not in the scope of this PR...).

@ogrisel (Member, Author) commented Jul 5, 2021

The macOS builds failed with:

wget https://homebrew.bintray.com/bottles/libomp-11.0.0.high_sierra.bottle.tar.gz
--2021-07-05 12:40:00--  https://homebrew.bintray.com/bottles/libomp-11.0.0.high_sierra.bottle.tar.gz
Resolving homebrew.bintray.com (homebrew.bintray.com)... 52.88.131.165, 34.210.204.177
Connecting to homebrew.bintray.com (homebrew.bintray.com)|52.88.131.165|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-07-05 12:40:01 ERROR 403: Forbidden.

which is annoying but also not related to this PR.

@ogrisel (Member, Author) commented Jul 5, 2021

The Linux 32-bit Python builds failed for some seeds, such as:

  seed = 95
  
      @pytest.mark.parametrize("seed", range(1000))
      def test_multi_target_regression(seed):
          X, y = datasets.make_regression(n_targets=3, random_state=seed)
          X_train, y_train = X[:50], y[:50]
          X_test, y_test = X[50:], y[50:]
      
          references = np.zeros_like(y_test)
          for n in range(3):
              rgr = GradientBoostingRegressor(random_state=0)
              rgr.fit(X_train, y_train[:, n])
              references[:, n] = rgr.predict(X_test)
      
          rgr = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
          rgr.fit(X_train, y_train)
          y_pred = rgr.predict(X_test)
      
  >       assert_almost_equal(references, y_pred)
  E       AssertionError: 
  E       Arrays are not almost equal to 7 decimals
  E       
  E       Mismatched elements: 2 / 150 (1.33%)
  E       Max absolute difference: 0.03136504
  E       Max relative difference: 0.00051924
  E        x: array([[ -72.3890004,  157.8077157,   34.6056844],
  E              [  -4.8907783,   40.3488598,    3.7400021],
  E              [  46.3881095, -160.8625623,    3.3398913],...
  E        y: array([[ -72.3890004,  157.8077157,   34.6056844],
  E              [  -4.8907783,   40.3488598,    3.7400021],
  E              [  46.3881095, -160.8625623,    3.3398913],...

but the failing seeds are not the same across builds that use different versions of the dependencies (Python, numpy...).

So maybe some datasets generate splits with near-machine-precision ties, which would cause rare platform-specific behaviors...
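
To illustrate the kind of tie I mean (a toy example, not data from the failing test): tree splitters place thresholds at the midpoint of two consecutive sorted feature values, and when those values differ by about one ulp the midpoint can round back onto one of them, so which side of the split a sample falls on depends on rounding behavior:

    # Toy illustration (not from the PR): the midpoint of two nearly-tied
    # float64 values rounds back onto one of them, so an "x <= threshold"
    # test can flip with the platform's floating-point behavior.
    import numpy as np

    a = 1.0
    b = a + np.finfo(np.float64).eps  # next representable double after 1.0
    threshold = (a + b) / 2           # midpoint between two "tied" values
    print(threshold == a)  # True under round-to-nearest-even float64
    print(b <= threshold)  # False: b ends up on the other side of the split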

@ogrisel (Member, Author) commented Jul 5, 2021

I am not sure why this problem would impact 32-bit Python more than 64-bit Python though...

@ogrisel (Member, Author) commented Jul 5, 2021

I reverted the debug commit to make it possible to merge this PR, even if we do not understand the precise cause of the unstable test.
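
Concretely, the fix boils down to seeding the dataset generation in the test, along these lines (a sketch of the idea; the exact seed value here is illustrative):

    # Before: new random data on every run, hence a non-deterministic test.
    X, y = datasets.make_regression(n_targets=3)

    # After: fixing random_state makes the data, and thus the test,
    # deterministic across runs.
    X, y = datasets.make_regression(n_targets=3, random_state=0)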

@glemaitre (Member) commented:

Maybe related to: #8853

We could use an estimator other than GradientBoostingRegressor.

@ogrisel (Member, Author) commented Jul 5, 2021

> We could use an estimator other than GradientBoostingRegressor.

Fixing the random seed is enough for now, but it would still be interesting to investigate #8853 in more detail.

@glemaitre (Member) commented:

And it could also be linked to #12259, where, if I recall correctly, a random feature is picked with some tie-breaking even if the random_state is fixed.

@jeremiedbb (Member) left a comment:

I agree that we need to dig further, but that should be done in a separate PR. This one makes the tests deterministic, which was not the case before and should have been from the start. So LGTM.

@glemaitre merged commit 2bcbc91 into scikit-learn:main on Jul 13, 2021

@glemaitre (Member) commented:

Merging. We would need to further investigate the underlying cause and the other failures.

samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021