MAINT Make multioutput tests deterministic #20468
Conversation
Let me try to push a follow-up debug commit, which I plan to revert, to check whether we can reproduce the problem on the CI with a large number of seeds. |
2 of the ARM runs failed on Travis because the debug test with 1000 random seeds lasted more than 10 minutes without output. The latest failure https://app.travis-ci.com/github/scikit-learn/scikit-learn/jobs/521780414 seems unrelated, with many |
The macos builds failed with:
which is annoying but also not related to this PR. |
The Linux 32-bit Python builds failed for some seeds, such as seed = 95:
```
@pytest.mark.parametrize("seed", range(1000))
def test_multi_target_regression(seed):
    X, y = datasets.make_regression(n_targets=3, random_state=seed)
    X_train, y_train = X[:50], y[:50]
    X_test, y_test = X[50:], y[50:]
    references = np.zeros_like(y_test)
    for n in range(3):
        rgr = GradientBoostingRegressor(random_state=0)
        rgr.fit(X_train, y_train[:, n])
        references[:, n] = rgr.predict(X_test)
    rgr = MultiOutputRegressor(GradientBoostingRegressor(random_state=0))
    rgr.fit(X_train, y_train)
    y_pred = rgr.predict(X_test)
>   assert_almost_equal(references, y_pred)
E   AssertionError:
E   Arrays are not almost equal to 7 decimals
E
E   Mismatched elements: 2 / 150 (1.33%)
E   Max absolute difference: 0.03136504
E   Max relative difference: 0.00051924
E   x: array([[ -72.3890004,  157.8077157,   34.6056844],
E          [  -4.8907783,   40.3488598,    3.7400021],
E          [  46.3881095, -160.8625623,    3.3398913],...
E   y: array([[ -72.3890004,  157.8077157,   34.6056844],
E          [  -4.8907783,   40.3488598,    3.7400021],
E          [  46.3881095, -160.8625623,    3.3398913],...
```

However, the seeds with failures are not the same for different builds with different versions of the dependencies (Python, numpy...). So maybe there are datasets that generate splits with near machine-precision ties that cause rare platform-specific behaviors... |
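This kind of platform-dependent behavior is easy to demonstrate with plain floating-point arithmetic (a generic illustration, not taken from the PR):

```python
# Floating-point addition is not associative: the same reduction
# evaluated in a different order (for instance on a 32-bit vs a
# 64-bit build, or SIMD vs scalar code paths) can differ in the
# last ulp. A split threshold landing exactly on such a value can
# then tie-break differently depending on the platform.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False: the two evaluation orders differ by one ulp
```

A decision tree whose best split threshold falls on such a boundary can send a sample down a different branch on different platforms, which would explain failures on only some builds.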
I am not sure why this problem would impact 32-bit Python more than 64-bit Python, though... |
…uild]" This reverts commit c009f8d.
I reverted the debug commit to make it possible to merge this PR even if we do not understand the precise cause of the unstable test. |
Maybe related to #8853. We could use another estimator than the |
Fixing the random seed is enough for now. But it would still be interesting to investigate #8853 in more detail. |
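A minimal sketch of what fixing the seed looks like, using `LinearRegression` as a fast stand-in for the `GradientBoostingRegressor` of the real test (the structure mirrors the test body quoted above; this is an illustration, not the PR's diff):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

# Pin the dataset seed instead of parametrizing over range(1000).
X, y = datasets.make_regression(n_targets=3, random_state=0)
X_train, y_train = X[:50], y[:50]
X_test, y_test = X[50:], y[50:]

# Reference: one single-output regressor fitted per target column.
references = np.zeros_like(y_test)
for n in range(3):
    est = LinearRegression()
    est.fit(X_train, y_train[:, n])
    references[:, n] = est.predict(X_test)

# MultiOutputRegressor fits one clone of the estimator per target,
# so its predictions should match the per-target references.
multi = MultiOutputRegressor(LinearRegression())
multi.fit(X_train, y_train)
y_pred = multi.predict(X_test)
np.testing.assert_allclose(references, y_pred, rtol=1e-7)
```

With a pinned seed the test exercises a single, reproducible dataset, so a platform-specific tie on some other seed can no longer make CI flaky.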
And it could also be linked to #12259, where we pick a random feature with some tie-breaking even if the |
I agree that we need to dig further, but that should be done in a separate PR. This one makes the tests deterministic, which was not the case and should have been from the start, so LGTM.
Merging. We still need to further investigate the underlying cause and the other failures. |
To avoid rare random failures such as the one that occurred when running the last nightly builds:
https://github.com/scikit-learn/scikit-learn/runs/2986642144?check_suite_focus=true
Note however that the failure observed in this test run is weird: it might be caused by a numerical stability problem that only appears with very specific data. However, I could not reproduce it locally by scanning the `random_state` of the `make_regression` function in `range(500)`... Still, the magnitude of the failure is worrying...
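A hypothetical harness for such a local seed scan might look like this (`check_seed`, the `Ridge` stand-in for the much slower `GradientBoostingRegressor`, and the tolerance are all illustrative, not from the PR):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import Ridge
from sklearn.multioutput import MultiOutputRegressor

def check_seed(seed):
    """Max absolute gap between per-target fits and MultiOutputRegressor."""
    X, y = datasets.make_regression(n_targets=3, random_state=seed)
    X_train, y_train = X[:50], y[:50]
    X_test = X[50:]
    # One single-output fit per target column, as in the flaky test.
    refs = np.column_stack([
        Ridge().fit(X_train, y_train[:, n]).predict(X_test)
        for n in range(y.shape[1])
    ])
    pred = MultiOutputRegressor(Ridge()).fit(X_train, y_train).predict(X_test)
    return np.max(np.abs(refs - pred))

# The report scanned range(500); a handful of seeds keeps this sketch fast.
failing = [seed for seed in range(20) if check_seed(seed) > 1e-7]
print(failing)  # expected to be [] when the two fits agree on every seed
```

Running the same scan on the affected 32-bit builds, rather than locally, would be the more direct way to catch the platform-specific seeds.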