ENH add sample weight support to MLP #30155

Merged: 14 commits into scikit-learn:main from the mlp_sw branch on Jan 16, 2025

Conversation

@lorentzenchr (Member) commented Oct 26, 2024

Reference Issues/PRs

Supersedes and closes #25646 (which seems stalled).

What does this implement/fix? Explain your changes.

This adds sample_weight support to MLPClassifier and MLPRegressor.

Any other comments?

The objective function being minimized reads $obj = \frac{1}{n} \sum_i loss_i + \frac{\alpha}{2n}\lVert coef\rVert^2$. This is then extended to $obj = \frac{1}{\sum_i s_i} \sum_i s_i \cdot loss_i + \frac{\alpha}{2\sum_i s_i}\lVert coef\rVert^2$ with sample weights $s_i$.
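
For illustration only, a minimal NumPy sketch of this weighted objective (not the PR's actual implementation; `per_sample_loss`, `coef`, and the numeric values are made-up placeholders):

import numpy as np

# Placeholder per-sample losses, sample weights s_i and flattened coefficients,
# purely to illustrate the weighted objective written above.
per_sample_loss = np.array([0.3, 0.1, 0.7])
s = np.array([1.0, 2.0, 0.5])
coef = np.array([0.2, -0.4, 0.1])
alpha = 1e-4  # L2 penalty strength

sw_sum = s.sum()
obj = (s * per_sample_loss).sum() / sw_sum + alpha / (2 * sw_sum) * np.dot(coef, coef)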

Tests rely on the common tests (so far).


github-actions bot commented Oct 26, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3f9d493.

@zshu115x commented

Hi @lorentzenchr, I think I addressed all the comments for PR #25646 and was just waiting for approval or further review. If you plan to review and approve it, I can get it ready again.

@glemaitre self-requested a review November 6, 2024 19:39
@glemaitre (Member) left a comment

It looks good. Just a couple of comments regarding fully using the common tests.

@glemaitre requested review from glemaitre and removed request for glemaitre November 6, 2024 20:31
@OmarManzoor (Contributor) left a comment

Generally looks good, just a few comments

@@ -542,6 +542,17 @@
]
},
MDS: {"check_dict_unchanged": dict(max_iter=5, n_components=1, n_init=2)},
MLPClassifier: {
Member

It is good enough for me. Thanks @lorentzenchr

@ogrisel mentioned to me that stochastic solvers and the equivalence test might not be best friends, so SGD/Adam + early-stopping should be omitted.

Member Author

Exactly what I did in 432bcc0.

@ogrisel (Member) commented Nov 7, 2024

Note I would like to develop a similar test that can handle estimator/hyper-param combinations that lead to a stochastic fit (e.g. with internal resampling operations like bagging / CV, non-convex objective functions sensitive to random inits, ...).

However, this involves running the fit operation many times (typically 30 times or more) with different random seeds and then doing a KS-test on predictions for estimators fitted with weighted data vs repeated data. As a result, this is much more expensive to run and setting a non-noisy yet informative pvalue threshold might be operationally tricky in the context of a CI that runs many such tests. So for now, this is done manually in a side notebook as referenced in #16298 (comment).
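
For concreteness, a hedged sketch of the weighted-vs-repeated comparison described above (this is not the notebook referenced in #16298; the dataset, number of seeds, query point, and p-value threshold are all illustrative, and it assumes the sample_weight support added in this PR):

import numpy as np
from scipy.stats import ks_2samp
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(30, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=30) > 0).astype(int)
sw = rng.randint(0, 3, size=30)
# Equivalent dataset where each row is repeated according to its integer weight.
X_rep, y_rep = np.repeat(X, sw, axis=0), np.repeat(y, sw)

preds_weighted, preds_repeated = [], []
x_query = np.zeros((1, 3))
for seed in range(30):  # many seeds, because the SGD fit is stochastic
    # small max_iter keeps the sketch fast; it may emit ConvergenceWarning
    clf_w = MLPClassifier(solver="sgd", random_state=seed, max_iter=500)
    clf_w.fit(X, y, sample_weight=sw)
    clf_r = MLPClassifier(solver="sgd", random_state=seed, max_iter=500)
    clf_r.fit(X_rep, y_rep)
    preds_weighted.append(clf_w.predict_proba(x_query)[0, 1])
    preds_repeated.append(clf_r.predict_proba(x_query)[0, 1])

# The two prediction distributions should be statistically indistinguishable
# if weighting is handled correctly; the threshold below is illustrative.
assert ks_2samp(preds_weighted, preds_repeated).pvalue > 0.05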

@glemaitre self-requested a review November 7, 2024 09:26
@OmarManzoor (Contributor) left a comment

LGTM. Thanks @lorentzenchr and @zshu115x

@glemaitre (Member) left a comment

Here is a test to check that we pass sample_weight to the underlying scoring function during early-stopping:

@pytest.mark.parametrize(
    "MLPEstimator, score_func",
    [(MLPClassifier, accuracy_score), (MLPRegressor, r2_score)],
)
@pytest.mark.parametrize("solver", ["sgd", "adam"])
def test_mlp_early_stopping_score_sample_weight(
    MLPEstimator, score_func, monkeypatch, solver
):
    """Check that sample_weight is passed to scoring functions when using
    early-stopping."""
    rng = np.random.RandomState(42)
    sample_weight = rng.randint(1, 6, size=y_iris.size)

    def mock_score(y_true, y_pred, sample_weight=None):
        mock_score.last_sample_weight = sample_weight
        return score_func(y_true, y_pred, sample_weight=sample_weight)

    mock_score.last_sample_weight = None

    if isinstance(MLPEstimator(), MLPClassifier):
        monkeypatch.setattr(
            "sklearn.neural_network._multilayer_perceptron.accuracy_score", mock_score
        )
    else:
        monkeypatch.setattr(
            "sklearn.neural_network._multilayer_perceptron.r2_score", mock_score
        )

    mlp = MLPEstimator(
        hidden_layer_sizes=(10,),
        solver=solver,
        max_iter=10,
        early_stopping=True,
        validation_fraction=0.5,
        random_state=42,
    )

    mlp.fit(X_iris, y_iris, sample_weight=sample_weight)

    assert mock_score.last_sample_weight is not None
    assert mock_score.last_sample_weight.shape == (sample_weight.size // 2,)

Comment on lines 545 to 555
    MLPClassifier: {
        "check_sample_weight_equivalence": [
            dict(solver="lbfgs"),
            dict(solver="lbfgs", early_stopping=True, n_iter_no_change=1, tol=1e-2),
        ]
    },
    MLPRegressor: {
        "check_sample_weight_equivalence": [
            dict(solver="sgd", tol=1e-2, random_state=42),
        ]
    },
Member

Stupid me, there is no early-stopping for lbfgs so we are actually not testing anything. So my proposal was useless here.

Suggested change
-    MLPClassifier: {
-        "check_sample_weight_equivalence": [
-            dict(solver="lbfgs"),
-            dict(solver="lbfgs", early_stopping=True, n_iter_no_change=1, tol=1e-2),
-        ]
-    },
-    MLPRegressor: {
-        "check_sample_weight_equivalence": [
-            dict(solver="sgd", tol=1e-2, random_state=42),
-        ]
-    },
+    MLPClassifier: {
+        "check_sample_weight_equivalence": dict(solver="lbfgs")
+    },
+    MLPRegressor: {
+        "check_sample_weight_equivalence": dict(solver="sgd", tol=1e-2, random_state=42)
+    },

I'll provide a test that at least checks that we pass sample_weight to the underlying scorer when computing early-stopping.

@ogrisel (Member) commented Nov 7, 2024

We could indeed have an estimator-specific test with a fixed random seed to check that it's doing something reasonable on a synthetic dataset crafted specifically to lead to very different predictions if the weights were not handled properly:

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.asarray([0, 0, 0, 1, 1, 1, 1]).reshape(-1, 1)
y = np.asarray([0, 1, 1, 2, 2, 2, 1])
sw = np.asarray([2, 0, 0, 0.1, 0.1, 0.1, 3])

global_random_seed = None  # TODO: set me when adapting this to pytest.
solver = "sgd"  # TODO: adapt to use pytest.parametrize instead.
clf = MLPClassifier(
    random_state=global_random_seed, solver=solver, max_iter=10_000
).fit(X, y, sample_weight=sw)
np.testing.assert_array_equal(clf.predict([[0], [1]]), [0, 1])

I would expect this to pass for all solvers and for any value of global_random_seed.

For MLPRegressor it might be slightly trickier to write a similar test, but I suppose this is possible as well.

@lorentzenchr (Member, Author) commented Nov 7, 2024

@glemaitre Do we really need that? You can read the code and convince yourself that sample weights are passed. I have the impression that we are getting neurotic if we go for that kind of rigor.
Such a test would also be unprecedented; we don't have one for SGD, GBT, or HGBT, ...

Member

For me the benefit is twofold:

  • In the past, we had this bug in multiple estimators where we didn't pass the sample weight to the internal evaluation metric (for early-stopping or internal CV). There is no reason this issue won't happen again in the future if we don't have a non-regression test.
  • IMO, the test also serves as documentation: e.g. I could find out about it when going through the test suite of the histogram gradient boosting.

So even though we might never need to touch this code and it might seem overkill, I still find it useful.

Member Author

I hesitate to commit this.

Member Author

I would rather test values of best_validation_score_ than the monkeypatch solution.

Member Author

I have to stop for today.

Member

I would rather test values of best_validation_score_ than the monkeypatch solution.

This is a viable alternative as well. I might get some time to make a small test tonight in this direction.
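
One possible shape of such a best_validation_score_-based check, sketched here under stated assumptions (the dataset, solver, and 0.8 threshold are illustrative and would likely need tuning; it relies on the sample_weight support added in this PR and on the existing best_validation_score_ attribute):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=0)
y_noisy = y.copy()
flipped = np.arange(0, 200, 4)  # deliberately mislabel every 4th sample
y_noisy[flipped] = 1 - y_noisy[flipped]
sw = np.ones(200)
sw[flipped] = 0.0  # zero weights effectively remove the mislabeled samples

clf = MLPClassifier(
    solver="adam",
    early_stopping=True,
    validation_fraction=0.3,
    max_iter=500,
    random_state=0,
).fit(X, y_noisy, sample_weight=sw)

# If sample_weight were ignored by the internal validation scoring, the
# mislabeled quarter of the validation split would cap the accuracy near 0.75;
# with weighted scoring, the best validation score should be clearly higher.
assert clf.best_validation_score_ > 0.8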

@glemaitre self-requested a review November 7, 2024 23:44
@ogrisel (Member) commented Nov 8, 2024

Reposting an earlier suggestion that disappeared because it was posted on a thread for an outdated piece of code:

check_sample_weight_equivalence can currently only properly test weight semantics on estimators with hyperparameter settings that lead to a deterministic fit method.

So as a complement, we could add a dedicated test that sample_weight is respected when fitting with a stochastic solver. This test could use fixed random seeds to check that the solver converges to something reasonable on a synthetic dataset crafted specifically to lead to very different predictions if the weights were not handled properly:

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.asarray([0, 0, 0, 1, 1, 1, 1]).reshape(-1, 1)
y = np.asarray([0, 1, 1, 2, 2, 2, 1])
sw = np.asarray([2, 0, 0, 0.1, 0.1, 0.1, 3])

global_random_seed = None  # TODO: set me when adapting this to pytest.
solver = "sgd"  # TODO: adapt to use pytest.parametrize instead.
clf = MLPClassifier(
    random_state=global_random_seed, solver=solver, max_iter=10_000
).fit(X, y, sample_weight=sw)
np.testing.assert_array_equal(clf.predict([[0], [1]]), [0, 1])

I would expect this to pass for all solvers and for any value of global_random_seed.

For MLPRegressor it might be slightly trickier to write a similar test, but I suppose this is possible as well (or maybe even by reusing this code with an assert_allclose and a suitable atol in the last line).

EDIT: rephrased and fixed meaningless long sentence in the second paragraph.
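
For reference, a hedged sketch of what the MLPRegressor analog mentioned above could look like, reusing the same inputs with assert_allclose as suggested (the expected values and atol are illustrative and may need adjustment):

import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.asarray([0, 0, 0, 1, 1, 1, 1]).reshape(-1, 1)
y = np.asarray([0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 1.0])
sw = np.asarray([2, 0, 0, 0.1, 0.1, 0.1, 3])

global_random_seed = None  # TODO: set me when adapting this to pytest.
solver = "sgd"  # TODO: adapt to use pytest.parametrize instead.
reg = MLPRegressor(
    random_state=global_random_seed, solver=solver, max_iter=10_000
).fit(X, y, sample_weight=sw)
# With these weights, the weighted target means are 0.0 at x=0 and
# (3 * 0.1 * 2 + 3 * 1) / 3.3, about 1.09, at x=1.
np.testing.assert_allclose(reg.predict([[0], [1]]), [0.0, 3.6 / 3.3], atol=0.3)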

@lorentzenchr (Member, Author) commented

Since check_sample_weight_equivalence can currently only properly test weight semantics on estimator with hyperparameter settings that lead to a deterministic fit method could indeed have an estimator-specific test with a fixed random seed to check that it's doing something reasonable on a synthetic dataset crafted specifically to lead to very different predictions if the weights were not handled properly:

Firstly, I don't understand this long sentence. Secondly, yes we have stochastic solvers with check_sample_weight_equivalence as specified in PER_ESTIMATOR_CHECK_PARAMS. Thirdly, I don't get what your additional test should actually check (what is not yet tested).

@glemaitre added this to the 1.6 milestone Nov 8, 2024
@ogrisel (Member) commented Nov 12, 2024

Firstly, I don't understand this long sentence.

Sorry, I must have been distracted when writing this comment. I fixed it by splitting it into several sentences.

Secondly, yes we have stochastic solvers with check_sample_weight_equivalence as specified in PER_ESTIMATOR_CHECK_PARAMS.

I had not realized that the common estimator check would pass for MLPRegressor with solver="sgd". That's quite unexpected. I suppose this might be a bit brittle and might start to fail if we change check_sample_weight_equivalence to make it more sensitive, as is being explored in #30143.

Thirdly, I don't get what your additional test should actually check (what is not yet tested).

I agree that if the common check is stable enough to not be marked XFAIL, then it's good enough. Let's keep this PR as is for now and maybe reevaluate alternative testing only when actually needed.

@jeremiedbb modified the milestones: 1.6, 1.7 Dec 3, 2024
@adamjstewart (Contributor) commented

Anything I can do to help get this merged? Not sure where it's stuck in the review process.

@lorentzenchr (Member, Author) commented

@adamjstewart You could review it if you judge yourself knowledgeable enough for this PR.

@adamjstewart (Contributor) commented

I wouldn't exactly judge myself as knowledgeable of sklearn internals, but I have been using your branch locally for my experiments and haven't discovered any errors yet.

@lorentzenchr (Member, Author) commented

@adamjstewart A review would really help.

@adamjstewart (Contributor) left a comment

Ran this and it didn't crash. Would love to see type hints, but the existing code is also missing type hints.

@ogrisel self-requested a review January 10, 2025 15:08
@ogrisel (Member) left a comment

Thanks, @lorentzenchr. I finally reviewed this PR. Just a few minor renaming suggestions below. I can take care of them and merge if you wish.

@lorentzenchr (Member, Author) commented

I can take care of them and merge if you wish.

Yes, please - and thank you for the offer.

@ogrisel enabled auto-merge (squash) January 16, 2025 14:02
@ogrisel (Member) commented Jan 16, 2025

BTW, I tried merging this branch into my local branch for #30143, and I confirmed that the refined equivalence check will fail for the SGD solver. Lowering the tol of the SGD solver would not be enough to make the check pass: a more costly statistical test over several random state values would be required to thoroughly check the equivalence with the stochastic gradient solver.

But let's do that in a follow-up PR (e.g. in #30143 itself or another one).

@ogrisel merged commit 311bf6b into scikit-learn:main Jan 16, 2025
29 checks passed
@lorentzenchr deleted the mlp_sw branch January 16, 2025 17:06