ENH add sample weight support to MLP #30155

Merged: 14 commits into scikit-learn:main from the mlp_sw branch on Jan 16, 2025

Conversation

@lorentzenchr (Member) commented Oct 26, 2024

Reference Issues/PRs

Supersedes and closes #25646 (which seems stalled).

What does this implement/fix? Explain your changes.

This adds sample_weight support to MLPClassifier and MLPRegressor.

Any other comments?

The objective function being minimized reads $obj = \frac{1}{n} \sum_i loss_i + \frac{\alpha}{2n}\lVert coef\rVert^2$. This is then extended to $obj = \frac{1}{\sum_i s_i} \sum_i s_i \cdot loss_i + \frac{\alpha}{2\sum_i s_i}\lVert coef\rVert^2$ with sample weights $s_i$.
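
For illustration only, a minimal NumPy sketch of this weighted objective (not the PR's actual implementation; `per_sample_loss`, `coef`, and the numeric values are made-up placeholders):

import numpy as np

# Placeholder per-sample losses, sample weights s_i and flattened coefficients,
# purely to illustrate the weighted objective written above.
per_sample_loss = np.array([0.3, 0.1, 0.7])
s = np.array([1.0, 2.0, 0.5])
coef = np.array([0.2, -0.4, 0.1])
alpha = 1e-4  # L2 penalty strength

sw_sum = s.sum()
obj = (s * per_sample_loss).sum() / sw_sum + alpha / (2 * sw_sum) * np.dot(coef, coef)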

Tests rely on the common tests (so far).


github-actions bot commented Oct 26, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3f9d493.

@zshu115x commented

Hi @lorentzenchr, I think I addressed all the comments for PR #25646 and was just waiting for approval or further review. If you plan to review and approve it, I can get it ready again.

@glemaitre self-requested a review November 6, 2024 19:39
@glemaitre (Member) left a comment

It looks good. Just a couple of comments regarding fully using the common tests.

@glemaitre requested review from glemaitre and removed request for glemaitre November 6, 2024 20:31
@OmarManzoor (Contributor) left a comment

Generally looks good, just a few comments

@@ -542,6 +542,17 @@
]
},
MDS: {"check_dict_unchanged": dict(max_iter=5, n_components=1, n_init=2)},
MLPClassifier: {
Member

It is good enough for me. Thanks @lorentzenchr

@ogrisel mentioned to me that stochastic solvers and the equivalence test might not be best friends, so SGD/Adam + early-stopping should be omitted.

Member Author

Exactly what I did in 432bcc0.

@ogrisel (Member) commented Nov 7, 2024

Note I would like to develop a similar test that can handle estimator/hyper-param combinations that lead to a stochastic fit (e.g. with internal resampling operations like bagging / CV, non-convex objective functions sensitive to random inits, ...).

However, this involves running the fit operation many times (typically 30 times or more) with different random seeds and then doing a KS-test on predictions for estimators fitted with weighted data vs repeated data. As a result, this is much more expensive to run and setting a non-noisy yet informative pvalue threshold might be operationally tricky in the context of a CI that runs many such tests. So for now, this is done manually in a side notebook as referenced in #16298 (comment).
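
For concreteness, a hedged sketch of the weighted-vs-repeated comparison described above (this is not the notebook referenced in #16298; the dataset, number of seeds, query point, and p-value threshold are all illustrative, and it assumes the sample_weight support added in this PR):

import numpy as np
from scipy.stats import ks_2samp
from sklearn.neural_network import MLPClassifier

rng = np.random.RandomState(0)
X = rng.normal(size=(30, 3))
y = (X[:, 0] + 0.5 * rng.normal(size=30) > 0).astype(int)
sw = rng.randint(0, 3, size=30)
# Equivalent dataset where each row is repeated according to its integer weight.
X_rep, y_rep = np.repeat(X, sw, axis=0), np.repeat(y, sw)

preds_weighted, preds_repeated = [], []
x_query = np.zeros((1, 3))
for seed in range(30):  # many seeds, because the SGD fit is stochastic
    # small max_iter keeps the sketch fast; it may emit ConvergenceWarning
    clf_w = MLPClassifier(solver="sgd", random_state=seed, max_iter=500)
    clf_w.fit(X, y, sample_weight=sw)
    clf_r = MLPClassifier(solver="sgd", random_state=seed, max_iter=500)
    clf_r.fit(X_rep, y_rep)
    preds_weighted.append(clf_w.predict_proba(x_query)[0, 1])
    preds_repeated.append(clf_r.predict_proba(x_query)[0, 1])

# The two prediction distributions should be statistically indistinguishable
# if weighting is handled correctly; the threshold below is illustrative.
assert ks_2samp(preds_weighted, preds_repeated).pvalue > 0.05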

@glemaitre self-requested a review November 7, 2024 09:26
@OmarManzoor (Contributor) left a comment

LGTM. Thanks @lorentzenchr and @zshu115x

@glemaitre (Member) left a comment

Here is a test to check that we pass sample_weight to the underlying scoring function during early-stopping:

@pytest.mark.parametrize(
    "MLPEstimator, score_func",
    [(MLPClassifier, accuracy_score), (MLPRegressor, r2_score)],
)
@pytest.mark.parametrize("solver", ["sgd", "adam"])
def test_mlp_early_stopping_score_sample_weight(
    MLPEstimator, score_func, monkeypatch, solver
):
    """Check that sample_weight is passed to scoring functions when using
    early-stopping."""
    rng = np.random.RandomState(42)
    sample_weight = rng.randint(1, 6, size=y_iris.size)

    def mock_score(y_true, y_pred, sample_weight=None):
        mock_score.last_sample_weight = sample_weight
        return score_func(y_true, y_pred, sample_weight=sample_weight)

    mock_score.last_sample_weight = None

    if isinstance(MLPEstimator(), MLPClassifier):
        monkeypatch.setattr(
            "sklearn.neural_network._multilayer_perceptron.accuracy_score", mock_score
        )
    else:
        monkeypatch.setattr(
            "sklearn.neural_network._multilayer_perceptron.r2_score", mock_score
        )

    mlp = MLPEstimator(
        hidden_layer_sizes=(10,),
        solver=solver,
        max_iter=10,
        early_stopping=True,
        validation_fraction=0.5,
        random_state=42,
    )

    mlp.fit(X_iris, y_iris, sample_weight=sample_weight)

    assert mock_score.last_sample_weight is not None
    assert mock_score.last_sample_weight.shape == (sample_weight.size // 2,)

Comment on lines 545 to 555
    MLPClassifier: {
        "check_sample_weight_equivalence": [
            dict(solver="lbfgs"),
            dict(solver="lbfgs", early_stopping=True, n_iter_no_change=1, tol=1e-2),
        ]
    },
    MLPRegressor: {
        "check_sample_weight_equivalence": [
            dict(solver="sgd", tol=1e-2, random_state=42),
        ]
    },
Member

Stupid me, there is no early-stopping for lbfgs so we are actually not testing anything. So my proposal was useless here.

Suggested change
-    MLPClassifier: {
-        "check_sample_weight_equivalence": [
-            dict(solver="lbfgs"),
-            dict(solver="lbfgs", early_stopping=True, n_iter_no_change=1, tol=1e-2),
-        ]
-    },
-    MLPRegressor: {
-        "check_sample_weight_equivalence": [
-            dict(solver="sgd", tol=1e-2, random_state=42),
-        ]
-    },
+    MLPClassifier: {
+        "check_sample_weight_equivalence": dict(solver="lbfgs")
+    },
+    MLPRegressor: {
+        "check_sample_weight_equivalence": dict(solver="sgd", tol=1e-2, random_state=42)
+    },

I'll provide a test that at least checks that we pass sample_weight to the underlying scorer when computing early-stopping.

@ogrisel (Member) commented Nov 7, 2024

We could indeed have an estimator-specific test with a fixed random seed to check that it's doing something reasonable on a synthetic dataset crafted specifically to lead to very different predictions if the weights were not handled properly:

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.asarray([0, 0, 0, 1, 1, 1, 1]).reshape(-1, 1)
y = np.asarray([0, 1, 1, 2, 2, 2, 1])
sw = np.asarray([2, 0, 0, 0.1, 0.1, 0.1, 3])

global_random_seed = None  # TODO: set me when adapting this to pytest.
solver = "sgd"  # TODO: adapt to use pytest.parametrize instead.
clf = MLPClassifier(
    random_state=global_random_seed, solver=solver, max_iter=10_000
).fit(X, y, sample_weight=sw)
np.testing.assert_array_equal(clf.predict([[0], [1]]), [0, 1])

I would expect this to pass for all solvers and for any value of global_random_seed.

For MLPRegressor it might be slightly trickier to write a similar test, but I suppose this is possible as well.

@lorentzenchr (Member, Author) commented Nov 7, 2024

@glemaitre Do we really need that? You can read the code and convince yourself that sample weights are passed. I have the impression that we are getting neurotic if we go for that kind of rigor.
Such a test would also be unprecedented; we don't have one for SGD, GBT, or HGBT, ...

Member

For me the benefit is twofold:

  • In the past, we had this bug in multiple estimators where we didn't pass the sample weight to the internal evaluation metric (for early-stopping or internal CV). There is no reason this issue won't happen again in the future if we don't have a non-regression test.
  • IMO, the test also serves as documentation: e.g. I could find out about it when going through the test suite of the histogram gradient boosting.

So even though we might never need to touch this code and it might seem overkill, I still find it useful.

Member Author

I hesitate to commit this.

Member Author

I would rather test values of best_validation_score_ than the monkeypatch solution.

Member Author

I have to stop for today.

Member

I would rather test values of best_validation_score_ than the monkeypatch solution.

This is a viable alternative as well. I might get some time to make a small test tonight in this direction.
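
One possible shape of such a best_validation_score_-based check, sketched here under stated assumptions (the dataset, solver, and 0.8 threshold are illustrative and would likely need tuning; it relies on the sample_weight support added in this PR and on the existing best_validation_score_ attribute):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, random_state=0)
y_noisy = y.copy()
flipped = np.arange(0, 200, 4)  # deliberately mislabel every 4th sample
y_noisy[flipped] = 1 - y_noisy[flipped]
sw = np.ones(200)
sw[flipped] = 0.0  # zero weights effectively remove the mislabeled samples

clf = MLPClassifier(
    solver="adam",
    early_stopping=True,
    validation_fraction=0.3,
    max_iter=500,
    random_state=0,
).fit(X, y_noisy, sample_weight=sw)

# If sample_weight were ignored by the internal validation scoring, the
# mislabeled quarter of the validation split would cap the accuracy near 0.75;
# with weighted scoring, the best validation score should be clearly higher.
assert clf.best_validation_score_ > 0.8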

@glemaitre self-requested a review November 7, 2024 23:44
@ogrisel (Member) commented Nov 8, 2024

Reposting an earlier suggestion that disappeared because it was posted on a thread for an outdated piece of code:

check_sample_weight_equivalence can currently only properly test weight semantics on estimators with hyperparameter settings that lead to a deterministic fit method.

So as a complement, we could add a dedicated test that sample_weight is respected when fitting with a stochastic solver. This test could use fixed random seeds to check that the solver converges to something reasonable on a synthetic dataset crafted specifically to lead to very different predictions if the weights were not handled properly:

import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.asarray([0, 0, 0, 1, 1, 1, 1]).reshape(-1, 1)
y = np.asarray([0, 1, 1, 2, 2, 2, 1])
sw = np.asarray([2, 0, 0, 0.1, 0.1, 0.1, 3])

global_random_seed = None  # TODO: set me when adapting this to pytest.
solver = "sgd"  # TODO: adapt to use pytest.parametrize instead.
clf = MLPClassifier(
    random_state=global_random_seed, solver=solver, max_iter=10_000
).fit(X, y, sample_weight=sw)
np.testing.assert_array_equal(clf.predict([[0], [1]]), [0, 1])

I would expect this to pass for all solvers and for any value of global_random_seed.

For MLPRegressor it might be slightly trickier to write a similar test, but I suppose this is possible as well (or maybe even by reusing this code with an assert_allclose and a suitable atol in the last line).

EDIT: rephrased and fixed meaningless long sentence in the second paragraph.
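
For reference, a hedged sketch of what the MLPRegressor analog mentioned above could look like, reusing the same inputs with assert_allclose as suggested (the expected values and atol are illustrative and may need adjustment):

import numpy as np
from sklearn.neural_network import MLPRegressor

X = np.asarray([0, 0, 0, 1, 1, 1, 1]).reshape(-1, 1)
y = np.asarray([0.0, 1.0, 1.0, 2.0, 2.0, 2.0, 1.0])
sw = np.asarray([2, 0, 0, 0.1, 0.1, 0.1, 3])

global_random_seed = None  # TODO: set me when adapting this to pytest.
solver = "sgd"  # TODO: adapt to use pytest.parametrize instead.
reg = MLPRegressor(
    random_state=global_random_seed, solver=solver, max_iter=10_000
).fit(X, y, sample_weight=sw)
# With these weights, the weighted target means are 0.0 at x=0 and
# (3 * 0.1 * 2 + 3 * 1) / 3.3, about 1.09, at x=1.
np.testing.assert_allclose(reg.predict([[0], [1]]), [0.0, 3.6 / 3.3], atol=0.3)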

@lorentzenchr (Member, Author) commented

Since check_sample_weight_equivalence can currently only properly test weight semantics on estimator with hyperparameter settings that lead to a deterministic fit method could indeed have an estimator-specific test with a fixed random seed to check that it's doing something reasonable on a synthetic dataset crafted specifically to lead to very different predictions if the weights were not handled properly:

Firstly, I don't understand this long sentence. Secondly, yes we have stochastic solvers with check_sample_weight_equivalence as specified in PER_ESTIMATOR_CHECK_PARAMS. Thirdly, I don't get what your additional test should actually check (what is not yet tested).

@glemaitre added this to the 1.6 milestone Nov 8, 2024
@ogrisel (Member) commented Nov 12, 2024

Firstly, I don't understand this long sentence.

Sorry, I must have been distracted when writing this comment. I fixed it by splitting it into several sentences.

Secondly, yes we have stochastic solvers with check_sample_weight_equivalence as specified in PER_ESTIMATOR_CHECK_PARAMS.

I had not realized that the common estimator check would pass for MLPRegressor with solver="sgd". That's quite unexpected. I suppose this might be a bit brittle and might start to fail if we change check_sample_weight_equivalence to make it more sensitive, as is being explored in #30143.

Thirdly, I don't get what your additional test should actually check (what is not yet tested).

I agree that if the common check is stable enough to not be marked XFAIL, then it's good enough. Let's keep this PR as is for now and maybe reevaluate alternative testing only when actually needed.

@jeremiedbb modified the milestones: 1.6, 1.7 Dec 3, 2024
@adamjstewart (Contributor) commented

Anything I can do to help get this merged? Not sure where it's stuck in the review process.

@lorentzenchr (Member, Author) commented

@adamjstewart You could review it if you judge yourself knowledgeable enough for this PR.

@adamjstewart (Contributor) commented

I wouldn't exactly judge myself as knowledgeable of sklearn internals, but I have been using your branch locally for my experiments and haven't discovered any errors yet.

@lorentzenchr (Member, Author) commented

@adamjstewart A review would really help.

@adamjstewart (Contributor) left a comment

Ran this and it didn't crash. Would love to see type hints, but the existing code is also missing type hints.

@ogrisel self-requested a review January 10, 2025 15:08
@ogrisel (Member) left a comment

Thanks, @lorentzenchr. I finally reviewed this PR. Just a few minor renaming suggestions below. I can take care of them and merge if you wish.

@lorentzenchr (Member, Author) commented

I can take care of them and merge if you wish.

Yes, please - and thank you for the offer.

@ogrisel enabled auto-merge (squash) January 16, 2025 14:02
@ogrisel (Member) commented Jan 16, 2025

BTW, I tried merging this branch into my local branch for #30143, and I confirmed that the refined equivalence check will fail for the SGD solver. Lowering the tol of the SGD solver would not be enough to make the check pass: a more costly statistical test over several random state values would be required to thoroughly check the equivalence with the stochastic gradient solver.

But let's do that in a follow-up PR (e.g. in #30143 itself or another one).

@ogrisel merged commit 311bf6b into scikit-learn:main Jan 16, 2025
29 checks passed
@lorentzenchr deleted the mlp_sw branch January 16, 2025 17:06