
[WIP] RidgeGCV with sample weights is broken #4490


Closed

Conversation

@eickenberg (Contributor) commented Apr 2, 2015

What the title says.

The way it is right now, the sample weights weight the eigenspaces of the Gram matrix, which doesn't seem sensible.
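For reference: applying sample weights to a squared-loss data term is equivalent to rescaling the rows of X and y by sqrt(sample_weight), so the eigendecomposition RidgeGCV relies on should be computed on the rescaled data rather than having the weights applied to the eigenspaces afterwards. A minimal sketch of that equivalence (illustrative sizes and values, not the RidgeGCV internals):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(20, 5)
y = rng.randn(20)
w = rng.rand(20) + 0.5  # positive sample weights

# weighted ridge: minimize sum_i w_i * (y_i - x_i @ beta)**2 + alpha * ||beta||**2
weighted = Ridge(alpha=1.0, fit_intercept=False).fit(X, y, sample_weight=w)

# equivalent unweighted ridge on sqrt(w)-rescaled data
sw = np.sqrt(w)
rescaled = Ridge(alpha=1.0, fit_intercept=False).fit(X * sw[:, np.newaxis], y * sw)

np.testing.assert_allclose(weighted.coef_, rescaled.coef_, rtol=1e-6)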



@landscape-bot

Code Health
Code quality remained the same when pulling 2496476 on eickenberg:ridge_gcv_sample_weights into 6e54079 on scikit-learn:master.

@amueller (Member) commented Apr 2, 2015

Should we add a common test that sample weights correspond to duplicated data points?

@eickenberg (Contributor, Author)

Good point! How global can this be made? (I am not sure what exceptions there are. Right now it looks to me that estimators finding an exact optimum generally qualify. SGDRegressor with its class weights does not: although repeating each sample class_weight times consecutively should yield the same result at batch_size=1, I think this type of estimator should be excluded. We should take this discussion elsewhere; I'll try to submit a PR of what I imagine it could look like.)
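A minimal sketch of what such a duplication-equivalence check could look like, with Ridge standing in for the estimator under test (the actual common test would have to loop over estimators and handle the exclusions discussed above):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(30, 4)
y = rng.randn(30)
w = rng.randint(1, 4, size=30)  # integer weights so they map to row repetitions

est_weighted = Ridge(alpha=1.0).fit(X, y, sample_weight=w)

# the same data with each row repeated w_i times
X_dup = np.repeat(X, w, axis=0)
y_dup = np.repeat(y, w)
est_dup = Ridge(alpha=1.0).fit(X_dup, y_dup)

np.testing.assert_allclose(est_weighted.coef_, est_dup.coef_, rtol=1e-6, atol=1e-10)
np.testing.assert_allclose(est_weighted.intercept_, est_dup.intercept_, rtol=1e-6, atol=1e-10)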

@landscape-bot

Code Health
Repository health decreased by 0.01% when pulling 89299dc on eickenberg:ridge_gcv_sample_weights into 6e54079 on scikit-learn:master.

def test_ridge_gcv_with_sample_weights():
    n_samples, n_features, n_targets = 20, 5, 7
    X, Y, W, _, _ = make_noisy_forward_data(n_samples, n_features, n_targets)
Vlad Niculae (Member):

Couldn't this use sklearn.datasets.make_regression?

@eickenberg (Contributor, Author)

Indeed, sorry. I copied this in from an ages-old PR where I didn't know about that ;)

Will update.
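For reference, a sketch of what the make_regression replacement could look like (parameter values here are illustrative, not the ones the test eventually uses):

from sklearn.datasets import make_regression

# generate multi-target regression data; coef=True also returns the true weights
n_samples, n_features, n_targets = 20, 5, 7
X, Y, W = make_regression(n_samples=n_samples, n_features=n_features,
                          n_informative=n_features, n_targets=n_targets,
                          noise=1.0, coef=True, random_state=42)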


@coveralls

Coverage Status

Coverage increased (+0.04%) to 95.16% when pulling 5e6c2ea on eickenberg:ridge_gcv_sample_weights into 6e54079 on scikit-learn:master.

@landscape-bot

Code Health
Repository health decreased by 0.02% when pulling 5e6c2ea on eickenberg:ridge_gcv_sample_weights into 6e54079 on scikit-learn:master.

@jnothman (Member)

> Should we add a common test that sample weights correspond to duplicated data points?

We have such invariance testing of sample weights in metrics (also scale invariance of weights, and that sample_weight=None <=> sample_weight=np.ones(...); it's possible, though, that there are algorithms, such as clusterers, where this is not true). I'd like to see more invariance testing of sample weights and class weights, including the test from #4838.
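For illustration, a minimal sketch of those two invariances for a single metric (the real common tests cover many metrics and are parameterized differently):

import numpy as np
from sklearn.metrics import accuracy_score

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=50)
y_pred = rng.randint(0, 2, size=50)
w = rng.rand(50) + 0.5

# sample_weight=None is equivalent to unit weights
assert np.isclose(accuracy_score(y_true, y_pred),
                  accuracy_score(y_true, y_pred, sample_weight=np.ones(50)))

# rescaling all weights by a constant does not change the score
assert np.isclose(accuracy_score(y_true, y_pred, sample_weight=w),
                  accuracy_score(y_true, y_pred, sample_weight=10 * w))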

@amueller (Member)

I'm no expert here but this seems like something that would be good to have. Maybe not for the RC but for the release?

@eickenberg (Contributor, Author)

was thinking of sprinting on adding some more of these. is that too late?


@amueller (Member)

Some more tests? I guess only the RC will be before the sprint, so we can still cherry-pick bugfixes from the sprint into the release.

@eickenberg force-pushed the ridge_gcv_sample_weights branch from 5e6c2ea to d3a44e7 on October 19, 2015 08:49
@eickenberg (Contributor, Author)

Have rebased this. Will attempt a common test for a certain number of classifiers/regressors concerning the effect of sample_weight in a different PR.

@@ -715,3 +716,57 @@ def test_ridge_fit_intercept_sparse():
assert_warns(UserWarning, sparse.fit, X_csr, y)
assert_almost_equal(dense.intercept_, sparse.intercept_)
assert_array_almost_equal(dense.coef_, sparse.coef_)


def make_noisy_forward_data(
Review comment (Member):

The comment should be at the top and start with #.

@amueller (Member)

@agramfort do you want to review this?

@ogrisel (Member) commented Oct 21, 2015

The AppVeyor failure looks real:

======================================================================
FAIL: tests if the sag regressor performs well
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python35-x64\lib\site-packages\nose\case.py", line 198, in runTest
    self.test(*self.arg)
  File "C:\Python35-x64\lib\site-packages\sklearn\utils\testing.py", line 317, in wrapper
    return fn(*args, **kwargs)
  File "C:\Python35-x64\lib\site-packages\sklearn\linear_model\tests\test_sag.py", line 442, in test_sag_regressor
    assert_greater(score2, 0.99)
AssertionError: 0.98925947044880735 not greater than 0.99
----------------------------------------------------------------------

n_targets=10,
train_frac=.8,
noise_levels=None,
random_state=42):
Review comment (Member):

Please use a more classical formatting for the args; it's confusing this way.

For instance:

def make_noisy_forward_data(n_samples=100, n_features=200, n_targets=10,
                            train_frac=.8, noise_levels=None, random_state=42):
     ...

@eickenberg (Contributor, Author)

The rebase must have gone wrong; I had removed this in favor of make_regression.

@ogrisel changed the title from "[BUG] RidgeGCV with sample weights is broken" to "[WIP] RidgeGCV with sample weights is broken" on Oct 22, 2015
@lesteve modified the milestones: 0.17, 0.18 on Jul 27, 2016
@lesteve modified the milestones: 0.18, 0.17 on Jul 27, 2016
@amueller modified the milestones: 0.18, 0.19 on Sep 22, 2016
@jnothman (Member)

@eickenberg are you intending to complete this?

@ogrisel (Member) commented May 10, 2019

Fixed by #13350.

@ogrisel closed this on May 10, 2019