[WIP] sample_weight support #1574

Closed · ndawe wants to merge 5 commits into scikit-learn:master from ndawe:weighted_scores

Conversation

@ndawe (Member) commented Jan 14, 2013

@amueller (Member):

I'd really like to get #1381 in first, hopefully shortly after the release. I'm not sure how this fits into the interface implemented there.

sample_weight_train = sample_weight[safe_mask(sample_weight, train)]
sample_weight_test = sample_weight[safe_mask(sample_weight, test)]
fit_params['sample_weight'] = sample_weight_train
score_params['sample_weight'] = sample_weight_test

if y is not None:
    y_test = y[safe_mask(y, test)]
    y_train = y[safe_mask(y, train)]
    clf.fit(X_train, y_train, **fit_params)
@jnothman (Member):

If there are sample weights, shouldn't we be training with them too? Or do we need separate parameters for fit_sample_weight and score_sample_weight? (Perhaps such prefixing is a general solution to this trouble.)

@jnothman (Member):

Especially when fit is called with a sample_weight arg for best_estimator_.

@ndawe (Member, Author):

@jnothman yes, sample_weight is used in both the calls to fit and score in fit_grid_point. There is no need for separate sample_weight arrays, since it is treated the same way X and y are (split according to the train and test subarrays). Also, if refit is True, the sample weights are used again.

@jnothman (Member):
Ahh. I missed the modification of fit_params below.

@ndawe (Member, Author) commented Jun 2, 2013

@jnothman @amueller I will bring this PR back to life now. I need to rebase and resolve a few conflicts.

@ndawe (Member, Author) commented Jun 2, 2013

@jnothman @amueller I've rebased and cleaned up a few things. A few of the metrics now support sample_weight, but I think someone more knowledgeable about the other metrics should implement it for those. This PR should otherwise be ready for review, particularly the grid search with sample weights.

if sample_weight is not None:
    tps = (y_true * sample_weight).cumsum()[threshold_idxs]
else:
    tps = y_true.cumsum()[threshold_idxs]
fps = 1 + threshold_idxs - tps
Member:
I think this line must also account for sample_weight, or else fps might be less than 0.

The test for this function will be that pr_curve outputs values corresponding to naively calculating precision_recall_fscore_support at each threshold. It would be a nice test to have anyway to check our implementation is sane.
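
One way the fps line could account for sample_weight (a sketch of an assumed fix, not this PR's actual code): with weights, the cumulative sample count 1 + threshold_idxs becomes the cumulative weight, which keeps fps non-negative.

if sample_weight is not None:
    tps = (y_true * sample_weight).cumsum()[threshold_idxs]
    # cumulative weight of all samples up to each threshold, minus the
    # weighted true positives, gives the weighted false positives
    fps = sample_weight.cumsum()[threshold_idxs] - tps
else:
    tps = y_true.cumsum()[threshold_idxs]
    fps = 1 + threshold_idxs - tps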

@jnothman (Member) commented Jun 2, 2013

It probably comes down to personal preference, but I generally dislike having metrics computed along separate code paths depending on the parameters. I.e., I would much rather have something like this at the top, and then treat having sample_weight as the normal case:

if sample_weight is None:
    sample_weight = 1
    total_weight = len(y)
else:
    total_weight = np.sum(sample_weight)

(assuming sample_weight is being used as a multiplier; where it's passed to np.average, np.bincount, etc, it's fine to leave sample_weight=None.)
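
As a concrete illustration of that style (a hypothetical toy metric, not code from this PR), np.average already treats weights=None as uniform, so no branching is needed at all:

import numpy as np

def toy_accuracy(y_true, y_pred, sample_weight=None):
    # np.average treats weights=None as uniform weighting, so the
    # unweighted case needs no separate code path
    return np.average(np.asarray(y_true) == np.asarray(y_pred),
                      weights=sample_weight)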

@@ -190,7 +190,8 @@ def __len__(self):
         return self.n_iter


-def fit_grid_point(X, y, base_clf, clf_params, train, test, scorer,
+def fit_grid_point(X, y, sample_weight, base_clf, clf_params,
Member:
I guess for clean code, handling sample_weight specially makes some sense, but I wonder: what other parameters will require being sliced per fold that users will expect similar support for?

@jnothman (Member) commented Jun 2, 2013

Couple of questions:

  1. Does wanting to weight samples when fitting necessarily imply weighting should be used in scoring, and vice-versa?
  2. Are there references or reference implementations of the metrics with weights? Or are we following our intuitions?

@jnothman (Member) commented Jun 2, 2013

In particular, how does weighting interact with multilabel outputs? I assume the weight is applied to the instance as a whole, spread uniformly over the proposed labels for that instance (where that's meaningful).

@jnothman (Member) commented Jun 2, 2013

You also need a more general invariance test: for any metric, integer array sample_weight, and real k, the following should be equal: metric(y1, y2, sample_weight * k) == metric(np.repeat(y1, sample_weight), np.repeat(y2, sample_weight)).
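
A concrete instance of that invariance, using accuracy as the example metric (illustrative only; any metric accepting sample_weight would do):

import numpy as np
from sklearn.metrics import accuracy_score

y1 = np.array([0, 1, 1, 0])
y2 = np.array([0, 1, 0, 0])
w = np.array([1, 2, 3, 1])  # integer sample weights

# weighting by w scaled by any k should equal repeating each sample
# w[i] times
weighted = accuracy_score(y1, y2, sample_weight=w * 0.5)
repeated = accuracy_score(np.repeat(y1, w), np.repeat(y2, w))
assert np.isclose(weighted, repeated)  # both equal 4/7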

@jnothman (Member) commented Jun 3, 2013

I wrote:

> Does wanting to weight samples when fitting necessarily imply weighting should be used in scoring, and vice-versa?

Thinking further: assuming weighting at both fit and score time is the most common use case, the user who really wants this flexibility can wrap their scorer or estimator to discard the sample_weight parameter.
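
For instance, a minimal sketch of such a wrapper (hypothetical, not part of this PR):

def unweighted(scorer):
    # wrap a scorer so that any sample_weight passed in is silently
    # dropped, keeping scoring unweighted even when fitting is weighted
    def wrapped(estimator, X, y, sample_weight=None, **kwargs):
        return scorer(estimator, X, y, **kwargs)
    return wrapped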

@@ -683,8 +706,11 @@ def fit(self, X, y=None, **params):
             Target vector relative to X for classification;
             None for unsupervised learning.
 
+        sample_weight : array-like, shape = [n_samples], optional
+            Sample weights.
Member:
This comment should specify that the weights are applied both in fitting and scoring, and that estimator.fit and scoring must support the sample_weight parameter. (And same in RandomizedSearchCV of course.)

@jnothman (Member) commented Jun 5, 2013

This may be close to all the testing you need:

def test_sample_weight_invariance():
    y1, y2, _ = make_prediction(binary=True)
    int_weights = np.random.randint(10, size=y1.shape)
    for name, metric in METRICS_WITH_SAMPLE_WEIGHT:
        unweighted_score = metric(y1, y2, sample_weight=None)
        assert_equal(unweighted_score,
                     metric(y1, y2, sample_weight=np.ones(shape=y1.shape)),
                     msg='For %s sample_weight=None is not equivalent to '
                         'sample_weight=ones' % name)

        weighted_score = metric(np.repeat(y1, int_weights),
                                np.repeat(y2, int_weights))
        assert_not_equal(unweighted_score, weighted_score,
                         msg='Unweighted and weighted scores are unexpectedly '
                             'equal for %s' % name)
        assert_equal(weighted_score,
                     metric(y1, y2, sample_weight=int_weights),
                     msg='Weighting %s is not equal to repeating samples'
                         % name)
        for scaling in [2, 0.3]:
            assert_equal(weighted_score,
                         metric(y1, y2, sample_weight=int_weights * scaling),
                         msg='%s sample_weight is not invariant under scaling'
                             % name)
        assert_equal(weighted_score,
                     metric(y1, y2, sample_weight=list(int_weights)),
                     msg='%s sample_weight is not invariant to list vs array'
                         % name)

@ndawe (Member, Author) commented Aug 22, 2013

@jnothman thanks for your help! I finally had some time to get back to this. This PR should now be ready.

Regarding your questions:

  1. Does wanting to weight samples when fitting necessarily imply weighting should be used in scoring, and vice-versa?

Yes. Performance on samples with larger weights matters more than performance on samples with smaller weights. So each sample should contribute by its weight to the overall score.

  2. Are there references or reference implementations of the metrics with weights? Or are we following our intuitions?

Probably, but as you can see in the PR the changes are quite trivial.

@ndawe (Member, Author) commented Aug 22, 2013

In other words, I am unaware of existing implementations of all of these metrics that account for sample weights. That would be helpful though. But, again, the modifications required here are not too serious.

In my field of research (high energy particle physics), samples are almost always weighted and the weights can vary a lot. So any classifier I use must account for the sample weights and any evaluation of a classifier's performance must also account for the sample weights.

@arjoly (Member) commented Aug 23, 2013

One thing that would be worth thinking about is the integration with the new scorer interface.

@arjoly (Member) commented Aug 23, 2013

Hm, we could have reviewed two PRs: one with the metrics, and one on top of it with grid search and cross-validation scoring.

@@ -1812,7 +1839,7 @@ def recall_score(y_true, y_pred, labels=None, pos_label=1, average='weighted'):

 @deprecated("Function zero_one_score has been renamed to "
             "'accuracy_score' and will be removed in release 0.15.")
-def zero_one_score(y_true, y_pred):
+def zero_one_score(y_true, y_pred, sample_weight=None):
Member:

This function is deprecated and will be removed in the next release.

Member:
Maybe you are looking for zero_one_loss.

@ndawe (Member, Author):
Ah, thanks.

@ndawe (Member, Author) commented Aug 23, 2013

@arjoly yes, true. But I need the weighted score metrics from this PR (for the grid_search unit tests), and I don't want to have a bunch of cherry-picked commits in the other PR.

@ndawe (Member, Author) commented Aug 23, 2013

I see... would it be fine to keep rebasing a grid_search PR on this one? I can split them again... Although all I have left is to add sample_weight to the scorers and then add unit tests.

@arjoly (Member) commented Aug 23, 2013

> Are there references or reference implementations of the metrics with weights? Or are we following our intuitions?

> Probably, but as you can see in the PR the changes are quite trivial.

For some metrics, I agree that taking the weights into account is straightforward, such as for the mean squared error. But for other metrics such as roc_auc_score or r2_score, I would be glad to see some references.
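
For r2_score, one commonly used weighted definition (stated here as an assumption rather than a cited reference) weights both the residual and total sums of squares:

import numpy as np

def weighted_r2(y_true, y_pred, sample_weight):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    w = np.asarray(sample_weight, dtype=float)
    ss_res = np.sum(w * (y_true - y_pred) ** 2)
    # the mean is itself weighted before computing the total sum of squares
    ss_tot = np.sum(w * (y_true - np.average(y_true, weights=w)) ** 2)
    return 1.0 - ss_res / ss_tot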

@jnothman (Member):

@arjoly, I think the definitions are pretty straightforward in terms of invariance with integer weights.

@coveralls: Coverage remained the same when pulling 1ea2c7e on ndawe:weighted_scores into 613cf8e on scikit-learn:master.

@jnothman (Member):
@ndawe, is this still a WIP, or is it ready to be reviewed for merging?

@ndawe changed the title from "[MRG] sample_weight support" to "[WIP] sample_weight support" on Apr 9, 2014
@coveralls: Changes unknown when pulling 0ce2180 on ndawe:weighted_scores into scikit-learn:master.

@ndawe (Member, Author) commented Apr 22, 2014

Cleaned up the history. Will now start a PR for sample_weight in the scorer interface.

@coveralls: Coverage remained the same when pulling bf4cd4c on ndawe:weighted_scores into d62971d on scikit-learn:master.

@vene (Member) commented Jul 16, 2014

The grid_search, rfe, learning_curve and cross_validation parts of this PR should wait for #3340, which will conflict with them in many ways.

@ndawe (Member, Author) commented Jul 16, 2014

Thanks @vene. Sounds good. I've updated the description with a link to your #3401.

@vene (Member) commented Jul 16, 2014

Thanks Noel! I was thinking that the refactoring in #3340 might be a bit too big (and therefore slow to get in), while the remaining parts of this PR are much more manageable, so I'd rather try to get #3340 broken in two and merge this faster. IMO these are important usability issues. We'll take this up tomorrow at the sprint. Ping @amueller, @arjoly and @GaelVaroquaux

@vene (Member) commented Aug 1, 2014

In the grid search code, would it be conceivable to use a simple convention: all fit_params that start with the prefix sample_ are passed masked to the estimator being tuned? This would allow handling sample_weight and sample_group (for learning to rank).

The current convention in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/cross_validation.py#L1184 is more general in assuming anything with length n_samples should be masked. The trickier part is differentially routing these to fit, decision_function and the metric.
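
A sketch of that more general convention (a hypothetical helper loosely following the cross_validation.py logic linked above, not code from this PR):

import numpy as np

def _mask_fit_params(fit_params, n_samples, indices):
    # slice any parameter that looks per-sample (length == n_samples)
    # to the current fold; pass everything else through unchanged
    masked = {}
    for key, value in fit_params.items():
        if hasattr(value, '__len__') and len(value) == n_samples:
            masked[key] = np.asarray(value)[indices]
        else:
            masked[key] = value
    return masked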

@jnothman, @ndawe
Actually there's no question of routing: the convention only applies to the fit_params, which naturally should only be sent to fit. We would have a separate scorer_params or something: one for every function of an estimator that _fit_and_score can call.

I really need this right now because of sample groups (i.e. query_id).

I would also prefer the convention of giving them names that start with sample_: sample_weights, sample_groups, etc.; it's more explicit and less bug-prone (what if I make an off-by-one error when building the sample_weights vector?)

@amueller (Member):

Closing as superseded by #13432. This code is too outdated to be useful, I think.

@amueller closed this on Jul 14, 2019