[WIP] RidgeGCV with sample weights is broken #4490
Conversation
Should we add a common test that sample weights correspond to duplicated data points?

Good point! How global can this be made? (I am not sure what exceptions there are. Right now it looks to me that estimators finding an exact optimum generally qualify for this.)
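A minimal sketch of the duplication property under discussion, assuming integer weights so the equivalence is exact (Ridge is used here only as one concrete estimator; the proposed common test would loop over many):

import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.randn(10, 3)
y = rng.randn(10)
sw = rng.randint(1, 4, size=10)  # integer weights <-> whole-row duplication

# Fitting with sample_weight should match fitting on duplicated rows.
weighted = Ridge(alpha=1.0).fit(X, y, sample_weight=sw)
repeated = Ridge(alpha=1.0).fit(np.repeat(X, sw, axis=0), np.repeat(y, sw))
np.testing.assert_allclose(weighted.coef_, repeated.coef_)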
def test_ridge_gcv_with_sample_weights():

    n_samples, n_features, n_targets = 20, 5, 7
    X, Y, W, _, _ = make_noisy_forward_data(n_samples, n_features, n_targets)
Couldn't this use sklearn.datasets.make_regression?
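A sketch of the suggested replacement, matching the shapes used by the hand-rolled helper above (the parameter values are illustrative):

from sklearn.datasets import make_regression

# coef=True also returns the (n_features, n_targets) ground-truth weights.
X, Y, W = make_regression(n_samples=20, n_features=5, n_targets=7,
                          coef=True, noise=1.0, random_state=42)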
indeed, sorry. i copied this in from an ages old pr where i didn't know about that ;) will update.
On Thursday, April 2, 2015, Vlad Niculae notifications@github.com wrote:

In sklearn/linear_model/tests/test_ridge.py (#4490 (comment)):

+    test = slice(n_train, None)
+    X = rng.randn(n_samples, n_features)
+    W = rng.randn(n_features, n_targets)
+    Y_clean = X.dot(W)
+    if noise_levels is None:
+        noise_levels = rng.randn(n_targets) ** 2
+    noise_levels = np.atleast_1d(noise_levels) * np.ones(n_targets)
+    noise = rng.randn(*Y_clean.shape) * noise_levels * Y_clean.std(0)
+    Y = Y_clean + noise
+    return X, Y, W, train, test
+
+
+def test_ridge_gcv_with_sample_weights():
+
+    n_samples, n_features, n_targets = 20, 5, 7
+    X, Y, W, _, _ = make_noisy_forward_data(n_samples, n_features, n_targets)

Couldn't this use sklearn.datasets.make_regression?

—
Reply to this email directly or view it on GitHub:
https://github.com/scikit-learn/scikit-learn/pull/4490/files#r27685564
We have such invariance testing of sample weights in metrics (also scale invariance of weights, and that …)
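A sketch of the scale-invariance property referenced here, using mean_squared_error only as a concrete example of a normalized metric: multiplying all weights by a constant should leave the score unchanged.

import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
y_true, y_pred = rng.randn(30), rng.randn(30)
w = rng.rand(30)

# Rescaling all sample weights by 3 must not change the weighted average.
a = mean_squared_error(y_true, y_pred, sample_weight=w)
b = mean_squared_error(y_true, y_pred, sample_weight=3.0 * w)
np.testing.assert_allclose(a, b)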
I'm no expert here but this seems like something that would be good to have. Maybe not for the RC but for the release?
was thinking of sprinting on adding some more of these. is that too late?
some more tests? I guess only the RC will be before the sprint, so we can still cherry-pick bugfixes from the sprint into the release.
Force-pushed from 5e6c2ea to d3a44e7.
have rebased this. Will attempt a common test for a certain number of classifiers / regressors concerning the effect of sample weights.
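A rough shape for such a common test, as a sketch (the estimator list and tolerance are illustrative, not the final scikit-learn test):

import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.RandomState(42)
X, y = rng.randn(15, 4), rng.randn(15)
sw = rng.randint(1, 4, size=15)

for est in [LinearRegression(), Ridge(alpha=0.5)]:
    # Exact-optimum estimators should treat weight k like k row copies.
    weighted = clone(est).fit(X, y, sample_weight=sw)
    repeated = clone(est).fit(np.repeat(X, sw, axis=0), np.repeat(y, sw))
    np.testing.assert_allclose(weighted.coef_, repeated.coef_, rtol=1e-6)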
@@ -715,3 +716,57 @@ def test_ridge_fit_intercept_sparse():
     assert_warns(UserWarning, sparse.fit, X_csr, y)
     assert_almost_equal(dense.intercept_, sparse.intercept_)
     assert_array_almost_equal(dense.coef_, sparse.coef_)
+
+
+def make_noisy_forward_data(
the comment should be at the top and start with #
@agramfort do you want to review this?
The AppVeyor failure looks real: …
        n_targets=10,
        train_frac=.8,
        noise_levels=None,
        random_state=42):
please use a more classical formatting for the args. It's confusing this way. For instance:

def make_noisy_forward_data(n_samples=100, n_features=200, n_targets=10,
                            train_frac=.8, noise_levels=None, random_state=42):
    ...
rebase must have gone wrong. I had removed this in favor of a make_regression call.
@eickenberg are you intending to complete this?
Fixed by #13350.
What the title says.
The way it is right now, the sample weights weight the eigenspaces of the Gram matrix, which doesn't seem sensible.
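For contrast, a minimal sketch of the intended semantics under the usual weighted least-squares formulation (an assumption about what the fix should compute, not the patch itself): weighting sample i by w_i is equivalent to rescaling row i of X and y by sqrt(w_i) before an unweighted fit.

import numpy as np

def weighted_ridge_coef(X, y, sample_weight, alpha):
    # Hypothetical helper: closed-form weighted ridge without intercept,
    # solving (X^T W X + alpha I) beta = X^T W y via sqrt-weight rescaling.
    sw = np.sqrt(sample_weight)
    Xw, yw = X * sw[:, None], y * sw
    n_features = X.shape[1]
    return np.linalg.solve(Xw.T @ Xw + alpha * np.eye(n_features), Xw.T @ yw)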