MAINT Make multioutput tests deterministic #20468
Conversation
Let me try to push a follow-up debug commit, which I plan to revert, to check whether we can reproduce the problem on the CI with a large number of seeds. |
2 of the ARM runs failed on Travis because the debug test with 1000 random seeds lasted more than 10 minutes without output. The latest failure https://app.travis-ci.com/github/scikit-learn/scikit-learn/jobs/521780414 seems unrelated, with many |
The macos builds failed with:
which is annoying but also not related to this PR. |
The Linux 32-bit Python builds failed for some seeds, such as seed = 95:
```
@pytest.mark.parametrize("seed", range(1000))
def test_multi_target_regression(seed):
    X, y = datasets.make_regression(n_targets=3, random_state=seed)
    X_train, y_train = X[:50], y[:50]
    X_test, y_test = X[50:], y[50:]
    references = np.zeros_like(y_test)
    for n in range(3):
        rgr = GradientBoostingRegressor(random_state=0)
        rgr.fit(X_train, y_train[:, n])
        references[:, n] = rgr.predict(X_test)
    rgr = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
    rgr.fit(X_train, y_train)
    y_pred = rgr.predict(X_test)
>   assert_almost_equal(references, y_pred)
E   AssertionError:
E   Arrays are not almost equal to 7 decimals
E
E   Mismatched elements: 2 / 150 (1.33%)
E   Max absolute difference: 0.03136504
E   Max relative difference: 0.00051924
E   x: array([[ -72.3890004,  157.8077157,   34.6056844],
E          [  -4.8907783,   40.3488598,    3.7400021],
E          [  46.3881095, -160.8625623,    3.3398913],...
E   y: array([[ -72.3890004,  157.8077157,   34.6056844],
E          [  -4.8907783,   40.3488598,    3.7400021],
E          [  46.3881095, -160.8625623,    3.3398913],...
```

However, the seeds with failures are not the same for different builds with different versions of the dependencies (Python, numpy...). So maybe there are datasets that generate splits with near machine-precision ties that cause rare platform-specific behaviors... |
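This kind of platform-dependent behavior is easy to demonstrate with plain floating-point arithmetic (a generic illustration, not taken from the PR):

```python
# Floating-point addition is not associative: the same reduction
# evaluated in a different order (for instance on a 32-bit vs a
# 64-bit build, or SIMD vs scalar code paths) can differ in the
# last ulp. A split threshold landing exactly on such a value can
# then tie-break differently depending on the platform.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False: the two evaluation orders differ by one ulp
```

A decision tree whose best split threshold falls on such a boundary can send a sample down a different branch on different platforms, which would explain failures on only some builds.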
I am not sure why this problem would impact 32-bit Python more than 64-bit Python, though... |
…uild]" This reverts commit c009f8d.
I reverted the debug commit to make it possible to merge this PR even if we do not understand the precise cause of the unstable test. |
Maybe related to #8853. We could use another estimator than the |
Fixing the random seed is enough for now. But it would still be interesting to investigate #8853 in more detail. |
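A minimal sketch of what fixing the seed looks like, using `LinearRegression` as a fast stand-in for the `GradientBoostingRegressor` of the real test (the structure mirrors the test body quoted above; this is an illustration, not the PR's diff):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Pin the dataset seed instead of parametrizing over range(1000).
X, y = datasets.make_regression(n_targets=3, random_state=0)
X_train, y_train = X[:50], y[:50]
X_test, y_test = X[50:], y[50:]

# Reference: one single-output regressor fitted per target column.
references = np.zeros_like(y_test)
for n in range(3):
    est = LinearRegression()
    est.fit(X_train, y_train[:, n])
    references[:, n] = est.predict(X_test)

# MultiOutputRegressor fits one clone of the estimator per target,
# so its predictions should match the per-target references.
multi = MultiOutputRegressor(LinearRegression())
multi.fit(X_train, y_train)
y_pred = multi.predict(X_test)
np.testing.assert_allclose(references, y_pred, rtol=1e-7)
```

With a pinned seed the test exercises a single, reproducible dataset, so a platform-specific tie on some other seed can no longer make CI flaky.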
And it could also be linked to #12259, where we pick a random feature with some tie-breaking even if the |
I agree that we need to dig further, but that should be done in a separate PR. This one makes the tests deterministic, which was not the case and should have been from the start, so LGTM.
Merging. We still need to further investigate the underlying cause and the other failures. |
To avoid rare random failures such as the one that occurred when running the last nightly builds:
https://github.com/scikit-learn/scikit-learn/runs/2986642144?check_suite_focus=true
Note however that the failure observed in this test run is weird: it might be caused by a numerical stability problem that only appears with very specific data. However, I could not reproduce it locally by scanning the `random_state` of the `make_regression` function in `range(500)`... Still, the magnitude of the failure is worrying...
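A hypothetical harness for such a local seed scan might look like this (`check_seed`, the `Ridge` stand-in for the much slower `GradientBoostingRegressor`, and the tolerance are all illustrative, not from the PR):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

def check_seed(seed):
    """Max absolute gap between per-target fits and MultiOutputRegressor."""
    X, y = datasets.make_regression(n_targets=3, random_state=seed)
    X_train, y_train = X[:50], y[:50]
    X_test = X[50:]
    # One single-output fit per target column, as in the flaky test.
    refs = np.column_stack([
        Ridge().fit(X_train, y_train[:, n]).predict(X_test)
        for n in range(y.shape[1])
    ])
    pred = MultiOutputRegressor(Ridge()).fit(X_train, y_train).predict(X_test)
    return np.max(np.abs(refs - pred))

# The report scanned range(500); a handful of seeds keeps this sketch fast.
failing = [seed for seed in range(20) if check_seed(seed) > 1e-7]
print(failing)  # expected to be [] when the two fits agree on every seed
```

Running the same scan on the affected 32-bit builds, rather than locally, would be the more direct way to catch the platform-specific seeds.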