[WIP] sample_weight support #1574
Conversation
ndawe commented on Jan 14, 2013
- metrics (see [MRG+1] sample_weight support in metrics #3043)
- scorer interface (see [WIP] scorer: add sample_weight support #3098 and continued in [MRG] scorer: add sample_weight support (+test) #3401)
- grid_search
- cross_validation
- learning_curve
- rfe
I'd really like to get #1381 in first, hopefully shortly after the release. Not sure how this fits into the interface implemented there.
sklearn/grid_search.py (outdated diff)
    sample_weight_test = sample_weight[safe_mask(sample_weight, test)]
    fit_params['sample_weight'] = sample_weight_train
    score_params['sample_weight'] = sample_weight_test

    if y is not None:
        y_test = y[safe_mask(y, test)]
        y_train = y[safe_mask(y, train)]
        clf.fit(X_train, y_train, **fit_params)
If there are sample weights, shouldn't we be training with them too? Or do we need separate parameters for fit_sample_weight and score_sample_weight? (Perhaps such prefixing is a general solution to this trouble.)
Especially when fit is called with a sample_weight arg for best_estimator_.
@jnothman yes, sample_weight is used in both the calls to fit and score in fit_grid_point. There is no need for separate sample_weight arrays since it is treated the same way X and y are (split according to the train and test subarrays). Also, if refit is true, the sample weights are used again.
Ahh. I missed the modification of fit_params below.
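A minimal sketch of the behaviour described in this thread, assuming a hypothetical helper (split_sample_weight is not part of the PR; the argument names mirror the diff above): the weights are sliced exactly like X and y and injected into both the fit and the score keyword arguments, so fitting and scoring are both weighted.

    import numpy as np

    def split_sample_weight(sample_weight, train, test, fit_params, score_params):
        # Slice the weights the same way X and y are sliced, then pass the
        # train slice to fit and the test slice to the scorer.
        if sample_weight is not None:
            sample_weight = np.asarray(sample_weight)
            fit_params = dict(fit_params, sample_weight=sample_weight[train])
            score_params = dict(score_params, sample_weight=sample_weight[test])
        return fit_params, score_params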
sklearn/metrics/metrics.py (outdated diff)
    if sample_weight is not None:
        tps = (y_true * sample_weight).cumsum()[threshold_idxs]
    else:
        tps = y_true.cumsum()[threshold_idxs]
    fps = 1 + threshold_idxs - tps
I think this line must also account for sample_weight, or else fps might be less than 0.

The test for this function will be that pr_curve outputs values corresponding to naively calculating precision_recall_fscore_support at each threshold. It would be a nice test to have anyway, to check that our implementation is sane.
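One way to address this point, sketched as a hypothetical helper rather than the code that was eventually merged (it assumes y_true is a 0/1 array already sorted by decreasing score): derive fps from the weights as well, mirroring tps, so it stays non-negative.

    import numpy as np

    def weighted_tps_fps(y_true, sample_weight, threshold_idxs):
        # Accumulate both true and false positives from the sample weights.
        if sample_weight is None:
            sample_weight = np.ones_like(y_true, dtype=float)
        tps = (y_true * sample_weight).cumsum()[threshold_idxs]
        fps = ((1 - y_true) * sample_weight).cumsum()[threshold_idxs]
        return tps, fps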
It probably comes down to personal preference, but I generally dislike the metrics being calculated multiple times depending on the parameters. I.e. I would much rather have something like this at the top, and then treat both cases the same afterwards:

    if sample_weight is None:
        sample_weight = 1
        total_weight = len(y)
    else:
        total_weight = np.sum(sample_weight)

(assuming …)
sklearn/grid_search.py (outdated diff)
    @@ -190,7 +190,8 @@ def __len__(self):
            return self.n_iter


    -def fit_grid_point(X, y, base_clf, clf_params, train, test, scorer,
    +def fit_grid_point(X, y, sample_weight, base_clf, clf_params,
I guess for clean code, handling sample_weight specially makes some sense, but I wonder: what other parameters will require being sliced per fold that users will expect similar support for?
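To make the question concrete, one generic approach (a hypothetical sketch, not part of this PR) would be to slice every per-sample, array-like fit parameter by the fold indices instead of special-casing sample_weight:

    import numpy as np

    def slice_per_sample_params(params, indices, n_samples):
        # Slice any parameter with one entry per sample; pass the rest through.
        sliced = {}
        for key, value in params.items():
            if hasattr(value, '__len__') and len(value) == n_samples:
                sliced[key] = np.asarray(value)[indices]
            else:
                sliced[key] = value
        return sliced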
Couple of questions:
In particular, how does weighting interact with multilabel outputs? I assume the weight is applied to the instance as a whole, spread uniformly over the proposed labels for that instance (where that's meaningful).
You also need a more general invariance test: for any metric, an integer sample_weight array should be equivalent to repeating each sample a number of times equal to its weight.
I wrote:
sklearn/grid_search.py (outdated diff)
    @@ -683,8 +706,11 @@ def fit(self, X, y=None, **params):
             Target vector relative to X for classification;
             None for unsupervised learning.

    +    sample_weight : array-like, shape = [n_samples], optional
    +        Sample weights.
This comment should specify that the weights are applied both in fitting and scoring, and that estimator.fit and scoring must support the sample_weight parameter. (And the same in RandomizedSearchCV, of course.)
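A possible wording for the expanded docstring, sketched from this suggestion (not the exact text in the PR):

    sample_weight : array-like, shape = [n_samples], optional
        Sample weights. Applied both when fitting each candidate estimator
        and when scoring it: the array is sliced along with X and y for each
        CV fold, so the estimator's fit method and the scoring function must
        both accept a sample_weight argument.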
This may be close to all the testing you need:

    def test_sample_weight_invariance():
        y1, y2, _ = make_prediction(binary=True)
        int_weights = np.random.randint(10, size=y1.shape)
        for name, metric in METRICS_WITH_SAMPLE_WEIGHT:
            unweighted_score = metric(y1, y2, sample_weight=None)
            assert_equal(unweighted_score,
                         metric(y1, y2, sample_weight=np.ones(shape=y1.shape)),
                         msg='For %s sample_weight=None is not equivalent to '
                             'sample_weight=ones' % name)
            weighted_score = metric(np.repeat(y1, int_weights),
                                    np.repeat(y2, int_weights))
            assert_not_equal(unweighted_score, weighted_score,
                             msg='Unweighted and weighted scores are '
                                 'unexpectedly equal for %s' % name)
            assert_equal(weighted_score,
                         metric(y1, y2, sample_weight=int_weights),
                         msg='Weighting %s is not equal to repeating samples'
                             % name)
            for scaling in [2, 0.3]:
                assert_equal(weighted_score,
                             metric(y1, y2,
                                    sample_weight=int_weights * scaling),
                             msg='%s sample_weight is not invariant '
                                 'under scaling' % name)
                assert_equal(weighted_score,
                             metric(y1, y2,
                                    sample_weight=list(int_weights * scaling)),
                             msg='%s sample_weight is not invariant to '
                                 'list vs array' % name)
@jnothman thanks for your help! I finally had some time to get back to this. This PR should now be ready. Regarding your questions:

Yes. Performance on samples with larger weights matters more than performance on samples with smaller weights, so each sample should contribute to the overall score in proportion to its weight.

Probably, but as you can see in the PR the changes are quite trivial.
In other words, I am unaware of existing implementations of all of these metrics that account for sample weights. That would be helpful though. But, again, the modifications required here are not too serious. In my field of research (high energy particle physics), samples are almost always weighted and the weights can vary a lot. So any classifier I use must account for the sample weights and any evaluation of a classifier's performance must also account for the sample weights.
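To make the weighted-evaluation point concrete, a weighted score can be written as a weighted average of per-sample contributions. A minimal illustration (not code from this PR), using accuracy as the example:

    import numpy as np

    def weighted_accuracy(y_true, y_pred, sample_weight=None):
        # Average each sample's correct/incorrect outcome with its weight;
        # weights=None falls back to the plain unweighted mean.
        correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
        return np.average(correct, weights=sample_weight)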
One thing that would be worth thinking about is the integration with the new scorer interface.
Hm, we could have reviewed two PRs: one with the metrics, and one on top of it with the grid search and cross-validation scoring.
sklearn/metrics/metrics.py (outdated diff)
    @@ -1812,7 +1839,7 @@ def recall_score(y_true, y_pred, labels=None, pos_label=1, average='weighted'):

     @deprecated("Function zero_one_score has been renamed to "
                 "'accuracy_score' and will be removed in release 0.15.")
    -def zero_one_score(y_true, y_pred):
    +def zero_one_score(y_true, y_pred, sample_weight=None):
This function is deprecated and will be removed in the next release.
Maybe you are looking for zero_one_loss.
Ah, thanks.
@arjoly yes, true. But I need the weighted score metrics from this PR (for the grid_search unit tests), and I don't want to have a bunch of cherry-picked commits in the other PR.
I see... would it be fine to keep rebasing a grid_search PR on this one? I can split them again... Although all I have left is to add sample_weight to the scorers and then add unit tests.
For some metrics, such as the mean squared error, I agree that taking the weight into account is straightforward. But for other metrics, such as the …
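The straightforward case mentioned here, mean squared error, reduces to a weighted average of per-sample squared errors. A minimal sketch (illustrative only, not the PR's implementation):

    import numpy as np

    def weighted_mse(y_true, y_pred, sample_weight=None):
        # Weighted average of per-sample squared errors.
        errors = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
        return np.average(errors, weights=sample_weight)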
@arjoly, I think the definitions are pretty straightforward in terms of invariance with integer weights.
@ndawe, is this still a WIP, or is it ready to be reviewed for merging?
Changes Unknown when pulling 0ce2180 on ndawe:weighted_scores into scikit-learn:master.
Cleaned up the history. Will now start a PR for sample_weight in the scorer interface.
The grid_search, rfe, learning_curve and cross_validation parts of this PR should wait for #3340, which will conflict in many ways.
Thanks Noel! I was thinking that the refactoring in #3340 might be a bit too big (and therefore slow to get in) while the remaining parts of this PR are much more manageable, so I'd rather try to get #3340 broken into two and merge this faster. IMO these are important usability issues. We'll take this up tomorrow at the sprint. Ping @amueller, @arjoly and @GaelVaroquaux
@jnothman, @ndawe I really need this right now because of sample groups (i.e. query_id). I would also prefer the convention of giving them names that start with sample:
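A sketch of the naming convention suggested here (a hypothetical helper, not code from this PR or from scikit-learn): any fit parameter whose name starts with sample_ is assumed to have one entry per sample and is sliced along with X and y for each CV split, e.g. sample_weight or a per-sample group/query id.

    import numpy as np

    def slice_sample_params(params, indices):
        # Parameters named sample_* are per-sample and get sliced per fold;
        # everything else is passed through unchanged.
        return {key: (np.asarray(value)[indices]
                      if key.startswith('sample_') else value)
                for key, value in params.items()}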
Closing as superseded by #13432. This code is too outdated to be useful, I think.