[WIP] sample_weight support #1574

Closed · ndawe wants to merge 5 commits into scikit-learn:master from ndawe:weighted_scores

Conversation

@ndawe (Member) commented Jan 14, 2013

@amueller (Member):

I'd really like to get #1381 in first, hopefully shortly after the release. I'm not sure how this fits into the interface implemented there.

sample_weight_train = sample_weight[safe_mask(sample_weight, train)]
sample_weight_test = sample_weight[safe_mask(sample_weight, test)]
fit_params['sample_weight'] = sample_weight_train
score_params['sample_weight'] = sample_weight_test

if y is not None:
    y_test = y[safe_mask(y, test)]
    y_train = y[safe_mask(y, train)]
    clf.fit(X_train, y_train, **fit_params)
@jnothman (Member):

If there are sample weights, shouldn't we be training with them too? Or do we need separate parameters for fit_sample_weight and score_sample_weight? (Perhaps such prefixing is a general solution to this trouble.)

@jnothman (Member):

Especially when fit is called with a sample_weight arg for best_estimator_.

@ndawe (Member, Author):

@jnothman yes, sample_weight is used in both the calls to fit and score in fit_grid_point. There is no need for separate sample_weight arrays, since it is treated the same way X and y are (split according to the train and test subarrays). Also, if refit is True, the sample weights are used again.

@jnothman (Member):
Ahh. I missed the modification of fit_params below.

@ndawe (Member, Author) commented Jun 2, 2013

@jnothman @amueller I will bring this PR back to life now. I need to rebase and resolve a few conflicts.

@ndawe (Member, Author) commented Jun 2, 2013

@jnothman @amueller I've rebased and cleaned up a few things. A few of the metrics now support sample_weight, but I think someone more knowledgeable about the other metrics should implement it for those. This PR should otherwise be ready for review, particularly the grid search with sample weights.

if sample_weight is not None:
    tps = (y_true * sample_weight).cumsum()[threshold_idxs]
else:
    tps = y_true.cumsum()[threshold_idxs]
fps = 1 + threshold_idxs - tps
Member:
I think this line must also account for sample_weight, or else fps might be less than 0.

The test for this function will be that pr_curve outputs values corresponding to naively calculating precision_recall_fscore_support at each threshold. It would be a nice test to have anyway to check our implementation is sane.
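
One way the fps line could account for sample_weight (a sketch of an assumed fix, not this PR's actual code): with weights, the cumulative sample count 1 + threshold_idxs becomes the cumulative weight, which keeps fps non-negative.

if sample_weight is not None:
    tps = (y_true * sample_weight).cumsum()[threshold_idxs]
    # cumulative weight of all samples up to each threshold, minus the
    # weighted true positives, gives the weighted false positives
    fps = sample_weight.cumsum()[threshold_idxs] - tps
else:
    tps = y_true.cumsum()[threshold_idxs]
    fps = 1 + threshold_idxs - tps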

@jnothman (Member) commented Jun 2, 2013

It probably comes down to personal preference, but I generally dislike having metrics computed along separate code paths depending on the parameters. I.e., I would much rather have something like this at the top, and then treat having sample_weight as the normal case:

if sample_weight is None:
    sample_weight = 1
    total_weight = len(y)
else:
    total_weight = np.sum(sample_weight)

(assuming sample_weight is being used as a multiplier; where it's passed to np.average, np.bincount, etc, it's fine to leave sample_weight=None.)
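
As a concrete illustration of that style (a hypothetical toy metric, not code from this PR), np.average already treats weights=None as uniform, so no branching is needed at all:

import numpy as np

def toy_accuracy(y_true, y_pred, sample_weight=None):
    # np.average treats weights=None as uniform weighting, so the
    # unweighted case needs no separate code path
    return np.average(np.asarray(y_true) == np.asarray(y_pred),
                      weights=sample_weight)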

@@ -190,7 +190,8 @@ def __len__(self):
         return self.n_iter


-def fit_grid_point(X, y, base_clf, clf_params, train, test, scorer,
+def fit_grid_point(X, y, sample_weight, base_clf, clf_params,
Member:
I guess for clean code, handling sample_weight specially makes some sense, but I wonder: what other parameters will require being sliced per fold that users will expect similar support for?

@jnothman (Member) commented Jun 2, 2013

Couple of questions:

  1. Does wanting to weight samples when fitting necessarily imply weighting should be used in scoring, and vice-versa?
  2. Are there references or reference implementations of the metrics with weights? Or are we following our intuitions?

@jnothman (Member) commented Jun 2, 2013

In particular, how does weighting interact with multilabel outputs? I assume the weight is applied to the instance as a whole, spread uniformly over the proposed labels for that instance (where that's meaningful).

@jnothman (Member) commented Jun 2, 2013

You also need a more general invariance test: for any metric, integer array sample_weight, and real k, the following should be equal: metric(y1, y2, sample_weight * k) == metric(np.repeat(y1, sample_weight), np.repeat(y2, sample_weight)).
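
A concrete instance of that invariance, using accuracy as the example metric (illustrative only; any metric accepting sample_weight would do):

import numpy as np
from sklearn.metrics import accuracy_score

y1 = np.array([0, 1, 1, 0])
y2 = np.array([0, 1, 0, 0])
w = np.array([1, 2, 3, 1])  # integer sample weights

# weighting by w scaled by any k should equal repeating each sample
# w[i] times
weighted = accuracy_score(y1, y2, sample_weight=w * 0.5)
repeated = accuracy_score(np.repeat(y1, w), np.repeat(y2, w))
assert np.isclose(weighted, repeated)  # both equal 4/7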

@jnothman (Member) commented Jun 3, 2013

I wrote:

> Does wanting to weight samples when fitting necessarily imply weighting should be used in scoring, and vice-versa?

Thinking further: assuming weighting at both fit and score time is the most common use case, the user who really wants this flexibility can wrap their scorer or estimator to discard the sample_weight parameter.
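
For instance, a minimal sketch of such a wrapper (hypothetical, not part of this PR):

def unweighted(scorer):
    # wrap a scorer so that any sample_weight passed in is silently
    # dropped, keeping scoring unweighted even when fitting is weighted
    def wrapped(estimator, X, y, sample_weight=None, **kwargs):
        return scorer(estimator, X, y, **kwargs)
    return wrapped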

@@ -683,8 +706,11 @@ def fit(self, X, y=None, **params):
             Target vector relative to X for classification;
             None for unsupervised learning.
 
+        sample_weight : array-like, shape = [n_samples], optional
+            Sample weights.
Member:
This comment should specify that the weights are applied both in fitting and scoring, and that estimator.fit and scoring must support the sample_weight parameter. (And same in RandomizedSearchCV of course.)

@jnothman (Member) commented Jun 5, 2013

This may be close to all the testing you need:

def test_sample_weight_invariance():
    y1, y2, _ = make_prediction(binary=True)
    int_weights = np.random.randint(10, size=y1.shape)
    for name, metric in METRICS_WITH_SAMPLE_WEIGHT:
        unweighted_score = metric(y1, y2, sample_weight=None)
        assert_equal(unweighted_score,
                     metric(y1, y2, sample_weight=np.ones(shape=y1.shape)),
                     msg='For %s sample_weight=None is not equivalent to '
                         'sample_weight=ones' % name)

        weighted_score = metric(np.repeat(y1, int_weights),
                                np.repeat(y2, int_weights))
        assert_not_equal(unweighted_score, weighted_score,
                         msg='Unweighted and weighted scores are unexpectedly '
                             'equal for %s' % name)
        assert_equal(weighted_score,
                     metric(y1, y2, sample_weight=int_weights),
                     msg='Weighting %s is not equal to repeating samples'
                         % name)
        for scaling in [2, 0.3]:
            assert_equal(weighted_score,
                         metric(y1, y2, sample_weight=int_weights * scaling),
                         msg='%s sample_weight is not invariant under scaling'
                             % name)
        assert_equal(weighted_score,
                     metric(y1, y2, sample_weight=list(int_weights)),
                     msg='%s sample_weight is not invariant to list vs array'
                         % name)

@ndawe (Member, Author) commented Aug 22, 2013

@jnothman thanks for your help! I finally had some time to get back to this. This PR should now be ready.

Regarding your questions:

  1. Does wanting to weight samples when fitting necessarily imply weighting should be used in scoring, and vice-versa?

Yes. Performance on samples with larger weights matters more than performance on samples with smaller weights. So each sample should contribute by its weight to the overall score.

  2. Are there references or reference implementations of the metrics with weights? Or are we following our intuitions?

Probably, but as you can see in the PR the changes are quite trivial.

@ndawe (Member, Author) commented Aug 22, 2013

In other words, I am unaware of existing implementations of all of these metrics that account for sample weights. That would be helpful though. But, again, the modifications required here are not too serious.

In my field of research (high energy particle physics), samples are almost always weighted and the weights can vary a lot. So any classifier I use must account for the sample weights and any evaluation of a classifier's performance must also account for the sample weights.

@arjoly (Member) commented Aug 23, 2013

One thing that would be worth thinking about is the integration with the new scorer interface.

@arjoly (Member) commented Aug 23, 2013

Hm, we could have reviewed two PRs: one with the metrics, and one on top of it with grid search and cross-validation scoring.

@@ -1812,7 +1839,7 @@ def recall_score(y_true, y_pred, labels=None, pos_label=1, average='weighted'):

 @deprecated("Function zero_one_score has been renamed to "
             "'accuracy_score' and will be removed in release 0.15.")
-def zero_one_score(y_true, y_pred):
+def zero_one_score(y_true, y_pred, sample_weight=None):
Member:

This function is deprecated and will be removed in the next release.

Member:
Maybe you are looking for zero_one_loss.

@ndawe (Member, Author):
Ah, thanks.

@ndawe (Member, Author) commented Aug 23, 2013

@arjoly yes, true. But I need the weighted score metrics from this PR (for the grid_search unit tests), and I don't want to have a bunch of cherry-picked commits in the other PR.

@ndawe (Member, Author) commented Aug 23, 2013

I see... would it be fine to keep rebasing a grid_search PR on this one? I can split them again... Although all I have left is to add sample_weight to the scorers and then add unit tests.

@arjoly (Member) commented Aug 23, 2013

> Are there references or reference implementations of the metrics with weights? Or are we following our intuitions?

> Probably, but as you can see in the PR the changes are quite trivial.

For some metrics, I agree that taking the weights into account is straightforward, such as for the mean squared error. But for other metrics such as roc_auc_score or r2_score, I would be glad to see some references.
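
For r2_score, one commonly used weighted definition (stated here as an assumption rather than a cited reference) weights both the residual and total sums of squares:

import numpy as np

def weighted_r2(y_true, y_pred, sample_weight):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    ss_res = np.sum(w * (y_true - y_pred) ** 2)
    # the mean is itself weighted before computing the total sum of squares
    ss_tot = np.sum(w * (y_true - np.average(y_true, weights=w)) ** 2)
    return 1.0 - ss_res / ss_tot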

@jnothman (Member):

@arjoly, I think the definitions are pretty straightforward in terms of invariance with integer weights.

@coveralls: Coverage remained the same when pulling 1ea2c7e on ndawe:weighted_scores into 613cf8e on scikit-learn:master.

@jnothman (Member):
@ndawe, is this still a WIP, or is it ready to be reviewed for merging?

@ndawe changed the title from "[MRG] sample_weight support" to "[WIP] sample_weight support" on Apr 9, 2014
@coveralls: Changes unknown when pulling 0ce2180 on ndawe:weighted_scores into scikit-learn:master.

@ndawe (Member, Author) commented Apr 22, 2014

Cleaned up the history. Will now start a PR for sample_weight in the scorer interface.

@coveralls: Coverage remained the same when pulling bf4cd4c on ndawe:weighted_scores into d62971d on scikit-learn:master.

@vene (Member) commented Jul 16, 2014

The grid_search, rfe, learning_curve and cross_validation parts of this PR should wait for #3340, which will conflict with them in many ways.

@ndawe (Member, Author) commented Jul 16, 2014

Thanks @vene. Sounds good. I've updated the description with a link to your #3401.

@vene (Member) commented Jul 16, 2014

Thanks Noel! I was thinking that the refactoring in #3340 might be a bit too big (and therefore slow to get in), while the remaining parts of this PR are much more manageable, so I'd rather try to get #3340 broken in two and merge this faster. IMO these are important usability issues. We'll take this up tomorrow at the sprint. Ping @amueller, @arjoly and @GaelVaroquaux

@vene (Member) commented Aug 1, 2014

In the grid search code, would it be conceivable to use a simple convention: all fit_params that start with the prefix sample_ are passed masked to the estimator being tuned? This would allow handling sample_weight and sample_group (for learning to rank).

The current convention in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cross_validation.py#L1184 is more general in assuming anything with length n_samples should be masked. The trickier part is differentially routing these to fit, decision_function and the metric.
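
A sketch of that more general convention (a hypothetical helper loosely following the cross_validation.py logic linked above, not code from this PR):

import numpy as np

def _mask_fit_params(fit_params, n_samples, indices):
    # slice any parameter that looks per-sample (length == n_samples)
    # to the current fold; pass everything else through unchanged
    masked = {}
    for key, value in fit_params.items():
        if hasattr(value, '__len__') and len(value) == n_samples:
            masked[key] = np.asarray(value)[indices]
        else:
            masked[key] = value
    return masked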

@jnothman, @ndawe
Actually there's no question of routing: the convention only applies to the fit_params, which naturally should only be sent to fit. We would have a separate scorer_params or something: one for every function of an estimator that _fit_and_score can call.

I really need this right now because of sample groups (i.e. query_id).

I would also prefer the convention of giving them names that start with sample_: sample_weights, sample_groups, etc.; it's more explicit and less bug-prone (what if I make an off-by-one error when building the sample_weights vector?)

@amueller (Member):

Closing as superseded by #13432. This code is too outdated to be useful, I think.

@amueller closed this on Jul 14, 2019