[WIP] Choose out of bag scoring metric. Fixes #3455 (#3723)
Conversation
sklearn/ensemble/forest.py (Outdated)
```python
class _DummyPredictor:
    """ Private class returning a precomputed prediction. Used to provide out-
    of-bag training predictions to a scorer.
    """
```
IMO, the fact that you have to use this hack shows that we have a problem with our scorer API.
You might be interested in my PR #3720, which introduces OOB-aware grid search. As a matter of separation of concerns, I would be inclined to just deprecate …
In the classification case, the input type differs depending on the scorer: class predictions, decision function, probabilities. To handle this, a lot of scorer code will leak into the forest code. Hence my separation-of-concerns suggestion.
@mblondel, thanks for your comments. For the sake of completeness, I expanded the existing WIP implementation to provide prediction probabilities and the decision function to the classification scorer; thanks for pointing out the additional data required by some classification scorers. The forest estimators now also expose all the oob prediction data externally, via attributes. The implementation here still uses a _DummyPredictor to integrate with the scoring interface, which currently works with predictors rather than precomputed values. Since this is my first exposure to the scikit-learn codebase, I don't really feel qualified to weigh in on the question of computing oob_score_ in a forest estimator vs. exposing oob_prediction_ along with the decision function and probabilities so that scoring can be computed externally. Is the oob score typically going to be used by an external component such as your GridSearchOOB, or do you think human users are interested in accessing this score easily as well? Would you be willing to discuss this design decision with @arjoly in the original ticket? (#3455)
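To make the wrapping idea concrete, here is a minimal sketch of a dummy predictor, assuming nothing beyond what is described above; the class name _PrecomputedPredictor is illustrative and this is not the code from the patch:

```python
class _PrecomputedPredictor:
    """Illustrative stand-in for the _DummyPredictor idea: it hands back
    predictions that were computed elsewhere (e.g. out of bag during fit)
    instead of predicting anything itself."""

    def __init__(self, y_pred, y_proba=None, y_decision=None):
        self._y_pred = y_pred
        self._y_proba = y_proba
        self._y_decision = y_decision

    def predict(self, X):
        # X is ignored: the out-of-bag predictions already exist.
        return self._y_pred

    def predict_proba(self, X):
        return self._y_proba

    def decision_function(self, X):
        return self._y_decision


# A scorer only ever calls predict / predict_proba / decision_function on the
# object it is given, so an unmodified scorer can be pointed at the wrapper,
# roughly:
#   sklearn.metrics.get_scorer("accuracy")(_PrecomputedPredictor(oob_pred), X, y)
```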
@staple I don't think it is difficult to do …
@mblondel, Sure, understood about r2_score. I would guess that the metrics currently implemented using _ThresholdScorer (such as roc_auc) would be slightly more work for the user to figure out, though definitely still doable.
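For instance, in the binary case a user can already compute ROC AUC themselves from the exposed out-of-bag attributes of a fitted forest (a sketch, assuming enough trees that every sample receives an out-of-bag prediction):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)

# oob_decision_function_ holds the out-of-bag class probabilities;
# column 1 is the probability of the positive class in the binary case.
print(roc_auc_score(y, forest.oob_decision_function_[:, 1]))
```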
I like the idea of removing the scorer-specific code in forests and relying instead on our scorer API. To make the user experience a bit easier though, we could add an OOB equivalent of cross_val_score. Something like …
+1 to @glouppe 's proposal. One problem is that our current scorer API doesn't let us use pre-computed predictions (it recomputes the predictions from the estimator). Something like:

```python
def get_score(scoring, y_true, y_pred=None, y_decision=None, y_proba=None):
    if scoring == "roc_auc":
        if y_decision is None and y_proba is None:
            raise ValueError("roc_auc needs either y_decision or y_proba.")
        if y_decision is not None:
            return roc_auc_score(y_true, y_decision)
        else:
            return roc_auc_score(y_true, y_proba)
    [...]
```

The advantage of this approach is that it raises a clear error message when the wrong type of prediction is provided.
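With a helper along those lines, out-of-bag scoring would reduce to something like get_score("roc_auc", y, y_proba=oob_proba) on the probabilities exposed by the forest, with no scorer-specific code left in the forest itself.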
Hi, thanks for the suggestions. I removed oob_score from forest, added an independent oob_score function and also get_score, and refactored scoring a bit to facilitate get_score. This is still at the sketch / proof-of-concept stage. Please let me know what you think.
@mblondel The get_score I added tries to reuse the existing scoring infrastructure. If this approach looks promising, I can try to add more specific error messages per your suggestion. But let me know what you think about the overall approach, thanks.
```python
    Out of bag score of the estimator.
    """
    # ...
    estimator.oob_predict = True
```
Use the set_params API.
Would it make sense for me to create a separate PR specifically for looking at scorer API changes, to facilitate discussion of these changes?

To summarize the situation: the current _BaseScorer interface accepts a predictor and some input data X. The various _BaseScorer implementations compute a score by calling predict(X), predict_proba(X), or decision_function(X) on the predictor as needed, and pass the results (with possible transformations) to a score function. In the development of out-of-bag scoring and of multiple-metric grid search, we have seen that predictions may be made via special procedures that do not strictly adhere to the predictor interface, and we would like to be able to score these externally computed predictions. As I see it there are 5 choices for implementing scoring of externally computed predictions.
Let me know if I missed any options :) In the most recent patch I went with option #3. To my mind it might make sense to decide which of these approaches (1-5) is most appropriate, and proceed with design from there.
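For reference, the scorer call pattern described above looks roughly like the following simplified sketch (not the actual scikit-learn classes); it shows why precomputed predictions cannot be handed to a scorer directly:

```python
class _PredictScorerSketch:
    """Simplified sketch of a _BaseScorer-style scorer: it receives a fitted
    predictor plus the raw inputs X, and recomputes the predictions itself."""

    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def __call__(self, estimator, X, y_true):
        # The scorer, not the caller, produces the predictions; other scorer
        # variants call predict_proba(X) or decision_function(X) instead.
        y_pred = estimator.predict(X)
        return self._sign * self._score_func(y_true, y_pred, **self._kwargs)
```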
It seems like the scorer design question will need to be answered in order to implement this oob scoring feature. Is there anything else I can do to help make progress on this issue, per my previous comment?
@staple It seems to me that option 4) would be the best solution if we want to add more flexibility while remaining backward compatible. Perhaps bringing this discussion to the mailing list will give it more attention. In particular, I would like to have @larsmans and @amueller's opinion, as they are the original designers of the scorer API.
Can't comment on the globality of this topic, because it is too vast and central to this software project, and I don't have the full overview. What I can point out is: as can be seen in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L792, it is solved via 2) there, and it makes 2) look like quite an ugly hack ... As for 5), up to the proposed callback structure, this is almost a reversion to the …
I agree with #3723 (comment) that this comes down to a separate issue about a scoring API with precalculated predictions. I'm not sure what the correct solution is, but I personally don't see much point in this PR and the issue it is trying to solve (#3455), and would consider closing both.
I agree.
Those parts of the code have changed a lot; a fresh look might be a better approach than trying to continue this one.
Enhances the oob_score parameter of forest-related predictors (e.g. RandomForestClassifier) to accept a string naming a scoring method or a callable scorer.
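For illustration, usage under this proposal would look roughly like the following; the exact accepted values are still WIP and may change:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer

# Backwards-compatible boolean, plus the proposed string / callable forms.
clf_bool = RandomForestClassifier(n_estimators=100, oob_score=True)
clf_string = RandomForestClassifier(n_estimators=100, oob_score="roc_auc")
clf_scorer = RandomForestClassifier(n_estimators=100, oob_score=make_scorer(f1_score))
```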
The existing out-of-bag scoring implementations use custom prediction logic in order to predict, for each estimator, only its out-of-bag input rows. As in the existing functionality, the out-of-bag predictions are saved in documented attributes of the predictor (oob_decision_function_, oob_prediction_). To simplify integration of this functionality with the existing scorer interface, the out-of-bag predictions are passed to a _DummyPredictor, which in turn makes them available to the scorer.
For backwards compatibility, the oob_score argument is not renamed, and it continues to accept True / False (default still False). When passed True, the oob_score calculation is unchanged in the single-output case (r2 for regression, accuracy for classification).
In the multi-output case, the score calculations have changed. For regression, individual r-squared scores were formerly computed for each output and averaged across all outputs; now a single r-squared score is calculated across all outputs using the r2_score implementation. For classification, individual accuracy scores were formerly computed for each output and averaged; now a single accuracy score is calculated with the accuracy_score implementation, which counts an input row as correct only if its complete set of outputs is correct.
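The classification change is the most visible one; a small sketch of the difference, with illustrative values that are not taken from the patch:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Two binary outputs per sample; y_pred stands in for the OOB predictions.
y_true = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])
y_pred = np.array([[0, 1], [1, 0], [1, 0], [1, 0]])

# Former behaviour: accuracy per output column, then averaged.
per_output = [accuracy_score(y_true[:, k], y_pred[:, k])
              for k in range(y_true.shape[1])]
print(np.mean(per_output))             # (0.75 + 0.75) / 2 = 0.75

# New behaviour: a row counts as correct only if every output matches.
print(accuracy_score(y_true, y_pred))  # 2 of 4 rows fully correct = 0.5
```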
This is a WIP; some known TODOs: