
Add sample_weight to the calculation of alphas in enet_path and LinearModelCV #22933


Closed
s-banach wants to merge 67 commits

Conversation

s-banach
Contributor

Reference Issues/PRs

Fixes #22914.

What does this implement/fix? Explain your changes.

Modifies the _alpha_grid function in linear_model._coordinate_descent to accept a sample_weight argument.
In addition to adding this argument, I have removed the copy_X argument, which no longer seems to be needed.

The function _alpha_grid is called in two places: enet_path and LinearModelCV.
The new sample_weight argument is not used by enet_path, but it is used by LinearModelCV.

Any other comments?

Thanks for your patience, this is my first PR and I'm trying my best.
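
For context, here is a minimal sketch of the weighted computation this PR is after (my own illustration, assuming sample weights are rescaled to sum to n_samples, mirroring how ElasticNet.fit normalizes them; the helper name is hypothetical and this is not the exact code in the PR). The unweighted grid starts from alpha_max = max|X^T y| / (n_samples * l1_ratio); the weights enter through weighted centering and the product X^T (w * y):

```python
import numpy as np


def weighted_alpha_max(X, y, sample_weight=None, l1_ratio=1.0, fit_intercept=True):
    """Sketch: smallest alpha for which all ElasticNet coefficients are zero.

    Dense X, single-target y. The rescaling convention (weights summing to
    n_samples) is an assumption for illustration.
    """
    n_samples = X.shape[0]
    if sample_weight is None:
        sample_weight = np.ones(n_samples)
    # Rescale so that sum(sw) == n_samples, mirroring the unweighted
    # formula alpha_max = max|X^T y| / (n_samples * l1_ratio).
    sw = sample_weight * (n_samples / np.sum(sample_weight))
    if fit_intercept:
        # Weighted centering of X and y.
        X = X - np.average(X, axis=0, weights=sw)
        y = y - np.average(y, weights=sw)
    Xy = X.T @ (sw * y)
    return np.max(np.abs(Xy)) / (n_samples * l1_ratio)
```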

The old formula for alpha_max was correct; I misunderstood the way it handles multi-task regression.
Try to deal with multi-target and single-target regression
@s-banach
Contributor Author

s-banach commented Mar 24, 2022

I have never heard of multi-task regression before, so I didn't write the sample_weight code to take it into account. On the other hand, it doesn't seem like the MultiTask regression classes take a sample_weight parameter, so maybe it's not a problem?
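
(Indeed, the multi-task estimators don't expose sample_weight at all at the time of writing; a quick check:)

```python
import inspect

from sklearn.linear_model import MultiTaskElasticNet

# MultiTaskElasticNet.fit has no sample_weight parameter.
print(inspect.signature(MultiTaskElasticNet.fit))  # (self, X, y)
```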

@lorentzenchr
Member

@s-banach Thank you very much for giving it a try: solving this bug as well as opening your first PR. I'll try to review as time permits.

@lorentzenchr
Member

It would be a good idea to write a test in sklearn/linear_model/tests/test_coordinate_descent.py where we check that the largest generated alpha gives all coefficients equal to zero. I couldn't find such a test, so we need to write a new one.
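
(As a sketch, something along these lines, assuming the sample_weight parameter this PR adds to _alpha_grid, and using Lasso so that the fit matches _alpha_grid's default l1_ratio=1.0; whether the coefficients come out exactly zero or only on the order of 1e-16 is discussed below, hence the tolerance:)

```python
import numpy as np

from sklearn.linear_model import Lasso
from sklearn.linear_model._coordinate_descent import _alpha_grid


def test_alpha_max_forces_zero_coefficients():
    # The largest generated alpha should shrink every coefficient to
    # (approximately) zero, both with and without sample weights.
    rng = np.random.RandomState(42)
    X = rng.randn(30, 5)
    y = rng.randn(30)

    for sw in [None, rng.uniform(1.0, 2.0, size=30)]:
        alphas = _alpha_grid(X, y, l1_ratio=1.0, fit_intercept=True, sample_weight=sw)
        reg = Lasso(alpha=np.max(alphas), fit_intercept=True)
        reg.fit(X, y, sample_weight=sw)
        np.testing.assert_allclose(reg.coef_, 0.0, atol=1e-14)
```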

adam2392 and others added 7 commits March 25, 2022 10:41
Another case I didn't think about, because I never use multi-target regression.
I don't need to make the sparse diagonal matrix here.
I had this (1 - mu) factor that I copied from somewhere, but it doesn't seem to make sense.
@s-banach
Contributor Author

Dr. Lorentzen,
I do apologize for making so many commits.
My formula was originally wrong because I copied it from a Stack Exchange post.
I believe it is correct now.

I can try writing a test. (Or I can leave that to you, if you no longer trust my code after seeing this big mess I've made!)
I think sometimes the coefficient will be exactly zero, and sometimes the value of alpha I compute results in a coefficient on the order of 1e-16. Is it important that it be exactly zero, or just approximately zero?

If product is left as a scipy matrix, then product**2 will not be "element-wise power"
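
(For the record, the pitfall: on scipy.sparse matrices, ** follows np.matrix semantics and is a matrix power, so element-wise squaring needs an explicit method:)

```python
import numpy as np
from scipy import sparse

A = sparse.csr_matrix(np.array([[1.0, 2.0], [3.0, 4.0]]))

# Matrix power: A**2 is A @ A for sparse matrices.
print((A**2).toarray())      # [[ 7. 10.]  [15. 22.]]

# Element-wise power needs .power (or .multiply(A)).
print(A.power(2).toarray())  # [[ 1.  4.]  [ 9. 16.]]
```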
@lorentzenchr
Member

@s-banach No problem at all. In the end, you need to write such a test, because without it we can't merge this PR. It is good practice (e.g. test-driven development) to write such a test early. Tests are there to prove that the implementation does exactly what it is supposed to do.

What I notice is that you changed the code quite a lot, and I fear that this way it might be hard to detect errors. I would suggest making smaller changes, i.e. only introducing the additional functionality, and a test for it, nothing more.

GaelVaroquaux and others added 2 commits March 25, 2022 14:36
Now the logic should be unchanged from the original if sample_weight is None
@s-banach
Contributor Author

s-banach commented Mar 25, 2022

@lorentzenchr
In the latest commit, I have followed your suggestion.
The code should be identical to the old code, except in the case where sample_weight is not None and fit_intercept is True.
I will think about a test.

(P.S. In the end, _alpha_grid probably shouldn't have any of its own scaling/centering logic. It should probably just involve a single call to the same _pre_fit function that ElasticNet itself uses. When _pre_fit is eventually rewritten to support sample_weight with sparse data, perhaps someone can take a look at this.)
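
(A quick numpy sanity check of the "unchanged when sample_weight is None" claim: weighted centering with uniform weights reduces exactly to the plain column means the old code used:)

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(10, 3)

# Unweighted centering (the old code path).
X_mean = X.mean(axis=0)

# Weighted centering with uniform weights coincides with it.
X_wmean = np.average(X, axis=0, weights=np.ones(10))

np.testing.assert_allclose(X_mean, X_wmean)
```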

jnothman and others added 26 commits March 29, 2022 14:42
It merely asserts that alpha_max is large enough to force the coefficients to zero. It doesn't test anything else about alpha_max or _alpha_grid.
@s-banach
Contributor Author

s-banach commented Apr 4, 2022

Closing because I copy-pasted from your repo into my repo and now it says I'm making 66 commits.

@s-banach s-banach closed this Apr 4, 2022
@lorentzenchr
Member

You could just restore commit c01e629; see e.g. https://stackoverflow.com/questions/4114095/how-do-i-revert-a-git-repository-to-a-previous-commit.

If you do so, this would need a force push, which we usually avoid, but in this situation it might be ok.

I don't know which steps led to the many commits. What we usually do is merge the branch main into the current PR's branch, but only if GitHub shows merge conflicts.

@s-banach
Contributor Author

s-banach commented Apr 4, 2022

Thanks, I should learn more about using git.
Since I didn't know what to do, I opened a new PR #23045.
The code is simpler now, since you've added sample_weight to _preprocess_data.

Development

Successfully merging this pull request may close these issues.

Calculation of alphas in ElasticNetCV doesn't use sample_weight