[MRG] Sample weights for ElasticNetCV #16449


Merged: 61 commits into scikit-learn:main on Jun 26, 2021

Conversation

@lorentzenchr (Member) commented on Feb 16, 2020:

Reference Issues/PRs

Partially solves #3702: adds sample_weight to ElasticNetCV and LassoCV, but only for a dense feature array X (a usage sketch follows below).
It is a follow-up of PR #15436.

Any other comments?

DO NOT MERGE BEFORE #15436 as it is based on that branch.
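
For illustration, here is a minimal usage sketch of what this PR enables; the data, weights, and cv value below are made up for the example, not taken from the PR:

    import numpy as np
    from sklearn.linear_model import ElasticNetCV

    rng = np.random.RandomState(42)
    X = rng.rand(50, 3)  # dense X only; sparse input is out of scope here
    y = X @ np.array([1.0, 2.0, 0.0]) + 0.1 * rng.randn(50)
    sw = rng.uniform(low=0.5, high=2.0, size=50)

    # With this PR, fit accepts sample_weight for dense X.
    reg = ElasticNetCV(cv=3)
    reg.fit(X, y, sample_weight=sw)
    print(reg.alpha_, reg.coef_)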

@lorentzenchr (Member, Author):

@agramfort How would you like to calculate the cross-validated mean squared error: weighted by sample_weight or unweighted? I think this was discussed somewhere else. I would prefer the weighted version.

Furthermore, just as a side note: the current approach of rescaling X by sqrt(sw) may require more memory copies than the unweighted version.
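
For context, here is a minimal sketch of that sqrt(sw) rescaling trick (illustrative data, not the PR's actual code): weighted least squares on (X, y) is equivalent to unweighted least squares on (sqrt(sw) * X, sqrt(sw) * y), and the rescaled copies of X and y are where the extra memory goes.

    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.RandomState(0)
    X = rng.rand(30, 2)
    y = rng.rand(30)
    sw = rng.uniform(low=0.5, high=2.0, size=30)

    # fit_intercept=False keeps the equivalence exact in this sketch;
    # with an intercept, the centering would need to be weighted as well.
    weighted = LinearRegression(fit_intercept=False).fit(X, y, sample_weight=sw)
    rescaled = LinearRegression(fit_intercept=False).fit(
        X * np.sqrt(sw)[:, None],  # rescaled copy of X
        y * np.sqrt(sw),           # rescaled copy of y
    )
    np.testing.assert_allclose(weighted.coef_, rescaled.coef_)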

@rth (Member) commented on Feb 16, 2020:

The diff on GitHub doesn't seem to take the merged #15436 PR into account; maybe merging master would help?

> How would you like to calculate the cross-validated mean squared error: weighted by sample_weight or unweighted? I think this was discussed somewhere else. I would prefer the weighted version.

I couldn't find the corresponding issue, maybe it could be worth opening one. There was #15651 but it's a different topic.

In the short term, I think we want to be consistent with GridSearchCV(ElasticNet()).fit(X, y, sample_weight). As far as I can tell from

    out = parallel(delayed(_fit_and_score)(clone(base_estimator),

and the corresponding _fit_and_score function, sample_weight is not used for scoring there. We could discuss what the right thing to do is (or whether there should be an option to take sample weights into account for scoring) in a separate issue.
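
To illustrate, here is a rough, hypothetical simplification of that behaviour (not scikit-learn's actual code): sample_weight is routed to fit, but each test-fold score is computed unweighted.

    import numpy as np
    from sklearn.base import clone
    from sklearn.linear_model import ElasticNet
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import KFold

    def fit_and_score(estimator, X, y, sw, train, test):
        est = clone(estimator).fit(X[train], y[train], sample_weight=sw[train])
        # No sample_weight here: the fold score is a plain, unweighted MSE.
        return mean_squared_error(y[test], est.predict(X[test]))

    rng = np.random.RandomState(0)
    X, y = rng.rand(40, 2), rng.rand(40)
    sw = rng.uniform(low=0.5, high=2.0, size=40)
    scores = [fit_and_score(ElasticNet(alpha=0.1), X, y, sw, train, test)
              for train, test in KFold(n_splits=5).split(X)]
    print(np.mean(scores))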

@lorentzenchr (Member, Author):

@jjerphan If you feel comfortable, I'd very much appreciate your review approval. :smirk:

@jjerphan (Member) left a comment:

Hi @lorentzenchr,

Here are a few last suggestions before approval. 🙂

@jjerphan (Member) left a comment:

LGTM, thanks @lorentzenchr!

@lorentzenchr (Member, Author) commented on May 26, 2021:

Decision in yesterday's dev meeting (2021-05-25): remove use_weights_in_cv. This functionality will (hopefully) come with SLEP006, and removing it makes this PR non-controversial.

@lorentzenchr (Member, Author):

To have it easily referenced, here is the code that would be needed for use_weights_in_cv:

  • LinearModelCV
    """
    use_weights_in_cv : bool, default=False
        If `True`, the MSE over test folds is calculated as a weighted average,
        weighted by the sum of `sample_weight` of each test fold. Here,
        `sample_weight=None` acts like `sample_weight=1`, which means the sum
        of weights in the test fold is the number of observations.
    """
  • fit
    if not self.use_weights_in_cv:
        # The mean is computed over folds.
        mean_mse = np.mean(mse_paths, axis=1)
    else:
        if sample_weight is None:
            # Note that both len(test) and sample_weight[test].sum() can
            # have different values for different folds.
            sw_paths = [len(test) for train, test in folds]
        else:
            sw_paths = [sample_weight[test].sum() for train, test in folds]
        # The average is computed over folds.
        mean_mse = np.average(mse_paths, axis=1, weights=sw_paths)
  • test_enet_cv_sample_weight_correctness
    @pytest.mark.parametrize("use_weights_in_cv", [True, False])
    @pytest.mark.parametrize("fit_intercept", [True, False])
    def test_enet_cv_sample_weight_correctness(use_weights_in_cv, fit_intercept):
        """Test that ElasticNetCV with sample weights gives correct results."""
        ...
        if use_weights_in_cv:
            # Note: The order of groups from LeaveOneGroupOut is the same for both.
            assert_allclose(reg_sw.mse_path_, reg.mse_path_)

@agramfort (Member) left a comment:

Besides that, LGTM.

Thanks a lot @lorentzenchr for taking the time to dive deep into this.

@lorentzenchr (Member, Author):

@rth Do you want to merge? @agramfort and @jjerphan have already approved. This PR only adds sample_weight to some fit methods, nothing more API-wise. test_enet_cv_grid_search ensures equivalence with GridSearchCV.
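
For reference, a sketch of the kind of check test_enet_cv_grid_search performs (the data and grid below are illustrative, not the actual test): with the same folds, the same alpha grid, and MSE-based scoring, both estimators should select the same alpha.

    import numpy as np
    from sklearn.linear_model import ElasticNet, ElasticNetCV
    from sklearn.model_selection import GridSearchCV

    rng = np.random.RandomState(42)
    X = rng.rand(60, 3)
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.randn(60)
    sw = rng.uniform(low=0.5, high=2.0, size=60)

    reg = ElasticNetCV(cv=5).fit(X, y, sample_weight=sw)
    grid = GridSearchCV(
        ElasticNet(),
        param_grid={"alpha": reg.alphas_},  # reuse ElasticNetCV's alpha grid
        cv=5,
        scoring="neg_mean_squared_error",
    ).fit(X, y, sample_weight=sw)
    assert np.isclose(grid.best_params_["alpha"], reg.alpha_)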

@agramfort (Member):

@lorentzenchr You need to rebase this one.

@rth (Member) left a comment:

Thanks a lot! LGTM. Please merge main in to resolve conflicts.

@lorentzenchr (Member, Author):

@rth Main merged and all CI green.

@rth merged commit 0b1070c into scikit-learn:main on Jun 26, 2021.
@lorentzenchr deleted the enet_cv_sw branch on June 26, 2021 at 11:47.
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request on Nov 30, 2021:
Co-authored-by: Alexandre Gramfort <alexandre.gramfort@m4x.org>
Co-authored-by: Christian Lorentzen <lorentzen.ch@googlemail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>