
[MRG] TST use global_random_seed in sklearn/ensemble/tests/test_gradient_boosting_loss_functions.py #23559


Conversation

haochunchang
Contributor

Reference Issues/PRs

Towards #22827

What does this implement/fix? Explain your changes.

  • Add global_random_seed fixture to 3 tests.
  • Change the test_lad_equals_quantiles seed range to 0-99 to cover all possible seeds.
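
For readers unfamiliar with the fixture: scikit-learn's global_random_seed is parametrized from the SKLEARN_TESTS_GLOBAL_RANDOM_SEED environment variable, which accepts a single seed, a range such as "0-99", or the special values "all" and "any". The helper below is a hypothetical plain-Python sketch of that expansion, not scikit-learn's actual conftest code:

```python
import random

def parse_seed_spec(spec):
    """Hypothetical sketch: expand a seed specification into a seed list.

    Mimics the "42" / "40-42" / "all" / "any" forms accepted by
    SKLEARN_TESTS_GLOBAL_RANDOM_SEED; admissible seeds are 0-99.
    """
    if spec == "all":
        return list(range(100))         # run every admissible seed
    if spec == "any":
        return [random.randint(0, 99)]  # one arbitrary seed per session
    if "-" in spec:
        start, stop = spec.split("-")
        return list(range(int(start), int(stop) + 1))
    return [int(spec)]                  # a single fixed seed

print(parse_seed_spec("0-99")[:3])  # -> [0, 1, 2]
```

pytest then parametrizes the global_random_seed fixture over the resulting list, so any test that takes this argument runs once per seed.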

Any other comments?

@glemaitre
Member

Could you change the following test:

@pytest.mark.parametrize("loss", ("log_loss", "exponential"))
def test_classification_synthetic(loss, global_random_seed):
    # Test GradientBoostingClassifier on synthetic dataset used by
    # Hastie et al. in ESLII - Figure 10.9
    X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=global_random_seed)

    X_train, X_test = X[:2000], X[2000:]
    y_train, y_test = y[:2000], y[2000:]

    # Increasing the number of trees should decrease the test error
    common_params = {
        "max_depth": 1,
        "learning_rate": 1.0,
        "loss": loss,
        "random_state": global_random_seed,
    }
    gbrt_100_stumps = GradientBoostingClassifier(n_estimators=100, **common_params)
    gbrt_100_stumps.fit(X_train, y_train)

    gbrt_200_stumps = GradientBoostingClassifier(n_estimators=200, **common_params)
    gbrt_200_stumps.fit(X_train, y_train)

    assert gbrt_100_stumps.score(X_test, y_test) < gbrt_200_stumps.score(X_test, y_test)

    # Decision stumps are better suited for this dataset with a large number of
    # estimators.
    common_params = {
        "n_estimators": 200,
        "learning_rate": 1.0,
        "loss": loss,
        "random_state": global_random_seed,
    }
    gbrt_stumps = GradientBoostingClassifier(max_depth=1, **common_params)
    gbrt_stumps.fit(X_train, y_train)

    gbrt_10_nodes = GradientBoostingClassifier(max_leaf_nodes=10, **common_params)
    gbrt_10_nodes.fit(X_train, y_train)

    assert gbrt_stumps.score(X_test, y_test) > gbrt_10_nodes.score(X_test, y_test)

I checked the ESL book and it was not obvious what this test was checking. When I changed the random state, the test failed.

On closer inspection, the book advocates that stumps are more suitable than deeper trees for this dataset (p. 363). So I modified the test accordingly and made sure that it passes with all the different random states.
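
For context on the data used above, here is a plain-Python sketch of the generating process behind datasets.make_hastie_10_2 (a mimic, not sklearn's implementation): 10 i.i.d. standard-normal features, with y = ±1 depending on whether the squared norm exceeds 9.34, roughly the median of a chi-squared(10) variable, so the two classes are approximately balanced.

```python
import random

def make_hastie_like(n_samples, seed):
    # Sketch of the make_hastie_10_2 generating process:
    # 10 i.i.d. standard-normal features, and
    #   y = +1 if sum(x_j ** 2) > 9.34 else -1,
    # where 9.34 is roughly the median of a chi-squared(10) variable.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(10)]
        X.append(x)
        y.append(1.0 if sum(v * v for v in x) > 9.34 else -1.0)
    return X, y

X, y = make_hastie_like(1000, 0)
```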

@glemaitre
Member

You can add the global random seed to test_regression_dataset; you will need to relax the MSE threshold from 0.04 to 0.045.

@glemaitre
Member

Another test to modify, relaxing the MSE thresholds:

def test_regression_synthetic(global_random_seed):
    # Test on synthetic regression datasets used in Leo Breiman,
    # "Bagging Predictors". Machine Learning 24(2): 123-140 (1996).
    random_state = check_random_state(global_random_seed)
    regression_params = {
        "n_estimators": 100,
        "max_depth": 4,
        "min_samples_split": 2,
        "learning_rate": 0.1,
        "loss": "squared_error",
    }

    # Friedman1
    X, y = datasets.make_friedman1(n_samples=1200, random_state=random_state, noise=1.0)
    X_train, y_train = X[:200], y[:200]
    X_test, y_test = X[200:], y[200:]

    clf = GradientBoostingRegressor()
    clf.fit(X_train, y_train)
    mse = mean_squared_error(y_test, clf.predict(X_test))
    assert mse < 5.5

    # Friedman2
    X, y = datasets.make_friedman2(n_samples=1200, random_state=random_state)
    X_train, y_train = X[:200], y[:200]
    X_test, y_test = X[200:], y[200:]

    clf = GradientBoostingRegressor(**regression_params)
    clf.fit(X_train, y_train)
    mse = mean_squared_error(y_test, clf.predict(X_test))
    assert mse < 2500.0

    # Friedman3
    X, y = datasets.make_friedman3(n_samples=1200, random_state=random_state)
    X_train, y_train = X[:200], y[:200]
    X_test, y_test = X[200:], y[200:]

    clf = GradientBoostingRegressor(**regression_params)
    clf.fit(X_train, y_train)
    mse = mean_squared_error(y_test, clf.predict(X_test))
    assert mse < 0.025
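
One detail worth noting in the snippet above: a single RandomState instance (random_state) is reused across the three make_friedman* calls, so each call consumes a fresh portion of the generator's stream and the three datasets differ even though they share one seed. A stdlib analogy of that behavior (using random.Random rather than NumPy's RandomState):

```python
import random

# Reusing one seeded generator advances its internal state between
# calls, so consecutive batches of draws differ...
rng = random.Random(0)
first = [rng.random() for _ in range(3)]
second = [rng.random() for _ in range(3)]
assert first != second

# ...while a freshly re-seeded generator reproduces the original
# stream, which is what makes the whole test deterministic per seed.
rng2 = random.Random(0)
assert [rng2.random() for _ in range(3)] == first
```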

@glemaitre
Member

You can add it to the test test_max_feature_regression

@glemaitre
Member

glemaitre commented Jun 30, 2022

Oops, I see that I did not put my comments on the right test file.
@haochunchang you can use these comments for your other PR: #23549

@haochunchang
Contributor Author

> Oops, I see that I did not put my comments on the right test file. @haochunchang you can use these comments for your other PR: #23549

No problem, thank you

@haochunchang requested a review from ogrisel on July 2, 2022, 02:34
Tests updated with the global_random_seed fixture:
  • test_multinomial_deviance
  • test_init_raw_predictions_values
  • test_lad_equals_quantiles
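
The last of these, test_lad_equals_quantiles, rests on the identity |d| = 2 · pinball_0.5(d): least absolute deviation is twice the quantile (pinball) loss at alpha = 0.5. A plain-Python sketch of that identity (not sklearn's actual loss classes):

```python
def lad(y_true, y_pred):
    # mean absolute deviation
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def pinball(y_true, y_pred, alpha):
    # quantile (pinball) loss: alpha * d for under-predictions (d > 0),
    # (alpha - 1) * d for over-predictions (d <= 0)
    total = 0.0
    for t, p in zip(y_true, y_pred):
        d = t - p
        total += alpha * d if d > 0 else (alpha - 1) * d
    return total / len(y_true)

y_true = [3.0, -1.0, 2.5, 0.0]
y_pred = [2.0, 1.0, 2.5, -0.5]
assert abs(lad(y_true, y_pred) - 2 * pinball(y_true, y_pred, 0.5)) < 1e-12
```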
Member

@jeremiedbb left a comment


Thanks for the PR @haochunchang. LGTM

@jeremiedbb merged commit 1964f3d into scikit-learn:main on Sep 25, 2022