-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] Add sample_weight support to RidgeClassifier #4838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -487,6 +487,35 @@ def test_class_weights(): | |
assert_array_almost_equal(clf.intercept_, clfa.intercept_) | ||
|
||
|
||
def test_class_weight_vs_sample_weight(): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is a nice test, but is more appropriate as a common test than one specific to Ridge, and I think it's time for there to be common tests regarding sample weight. They would best rely on:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Agreed @jnothman , nothing special to Ridge here, purely a test for classifiers with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'd wondered whether it was copied from elsewhere. That's upsetting. On 10 June 2015 at 14:28, Trevor Stephens notifications@github.com wrote:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Upsetting how? That it's propagating outside of common tests? Best I can trace right now was @dsullivan7 's #3931 (somewhat similar test on multiplicative weights), then RF & D-Tree in #3961 .. Now here in Ridge. Cannot recall now the exact lineage I used from the RF/tree PR tests. Don't see any duplicated inline comments outside of RF/Tree/Ridge on git FWIW. Note that in tree-based classifier's I tested for equivalency in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Something similar in #4215 coming soon enough too, I imagine, for the rest of the ensemble tests. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Hence the use of assert_same_model On 10 June 2015 at 14:51, Joel Nothman joel.nothman@gmail.com wrote:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Gotcha There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There are already common tests for class_weights, and special ones for linear classifiers: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L1035 https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/estimator_checks.py#L1087 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Adding one that checks if There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True... as long as we don't have any classification-oriented transformers On 11 June 2015 at 03:46, Andreas Mueller notifications@github.com wrote:
|
||
"""Check class_weights resemble sample_weights behavior.""" | ||
for clf in (RidgeClassifier, RidgeClassifierCV): | ||
|
||
# Iris is balanced, so no effect expected for using 'balanced' weights | ||
clf1 = clf() | ||
clf1.fit(iris.data, iris.target) | ||
clf2 = clf(class_weight='balanced') | ||
clf2.fit(iris.data, iris.target) | ||
assert_almost_equal(clf1.coef_, clf2.coef_) | ||
|
||
# Inflate importance of class 1, check against user-defined weights | ||
sample_weight = np.ones(iris.target.shape) | ||
sample_weight[iris.target == 1] *= 100 | ||
class_weight = {0: 1., 1: 100., 2: 1.} | ||
clf1 = clf() | ||
clf1.fit(iris.data, iris.target, sample_weight) | ||
clf2 = clf(class_weight=class_weight) | ||
clf2.fit(iris.data, iris.target) | ||
assert_almost_equal(clf1.coef_, clf2.coef_) | ||
|
||
# Check that sample_weight and class_weight are multiplicative | ||
clf1 = clf() | ||
clf1.fit(iris.data, iris.target, sample_weight ** 2) | ||
clf2 = clf(class_weight=class_weight) | ||
clf2.fit(iris.data, iris.target, sample_weight) | ||
assert_almost_equal(clf1.coef_, clf2.coef_) | ||
|
||
|
||
def test_class_weights_cv(): | ||
# Test class weights for cross validated ridge classifier. | ||
X = np.array([[-1.0, -1.0], [-1.0, 0], [-.8, -1.0], | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also modify this line:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L285
so that
has_sw
isFalse
ifsample_weight == 1.0
.sample_weight
is currently implemented with a rescaling of the data. We can avoid copying the data in the casesample_weight == 1.0
.Another option is to check for
sample_weight == 1.0
directly in_rescale_data
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sample_weight == 1.0 is semantically equivalent to sample_weight.ndim == 0?
On 10 June 2015 at 16:11, Mathieu Blondel notifications@github.com wrote:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For scalar values other than 1, sample_weight has the same effect as changing C or 1/ alpha. So this is not very useful, we could potentially raise an exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added #4846 to track this issue.