[WIP] Choose out of bag scoring metric. Fixes #3455 (#3723)
Conversation
sklearn/ensemble/forest.py (Outdated)
```python
class _DummyPredictor:
    """ Private class returning a precomputed prediction. Used to provide out-
    of-bag training predictions to a scorer.
    """
```
IMO, the fact that you have to use this hack shows that we have a problem with our scorer API.
You might be interested in my PR #3720, which introduces OOB-aware grid search. As a matter of separation of concerns, I would be inclined to just deprecate …
In the classification case, the input type differs depending on the scorer: class predictions, decision function, probabilities. To handle this, a lot of scorer code will leak into the forest code. Hence my separation-of-concerns suggestion.
@mblondel, thanks for your comments. For the sake of completeness, I expanded the existing WIP implementation to provide prediction probabilities and the decision function to the classification scorer; thanks for pointing out the additional data required by some classification scorers. The forest estimators now also expose all the oob prediction data externally, via attributes. The implementation here still uses a _DummyPredictor to integrate with the scoring interface, which currently works with predictors rather than precomputed values. Since this is my first exposure to the scikit-learn codebase, I don't really feel qualified to weigh in on the question of computing oob_score_ in a forest estimator vs. exposing oob_prediction_ along with the decision function and probabilities so that scoring can be computed externally. Is the oob score typically going to be used by an external component such as your GridSearchOOB, or do you think human users are interested in accessing this score easily as well? Would you be willing to discuss this design decision with @arjoly in the original ticket? (#3455)
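To make the wrapping idea concrete, here is a minimal sketch of a dummy predictor, assuming nothing beyond what is described above; the class name _PrecomputedPredictor is illustrative and this is not the code from the patch:

```python
class _PrecomputedPredictor:
    """Illustrative stand-in for the _DummyPredictor idea: it hands back
    predictions that were computed elsewhere (e.g. out of bag during fit)
    instead of predicting anything itself."""

    def __init__(self, y_pred, y_proba=None, y_decision=None):
        self._y_pred = y_pred
        self._y_proba = y_proba
        self._y_decision = y_decision

    def predict(self, X):
        # X is ignored: the out-of-bag predictions already exist.
        return self._y_pred

    def predict_proba(self, X):
        return self._y_proba

    def decision_function(self, X):
        return self._y_decision


# A scorer only ever calls predict / predict_proba / decision_function on the
# object it is given, so an unmodified scorer can be pointed at the wrapper,
# roughly:
#   sklearn.metrics.get_scorer("accuracy")(_PrecomputedPredictor(oob_pred), X, y)
```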
@staple I don't think it is difficult to do …
@mblondel, Sure, understood about r2_score. I would guess that the metrics currently implemented using _ThresholdScorer (such as roc_auc) would be slightly more work for the user to figure out, though definitely still doable.
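For instance, in the binary case a user can already compute ROC AUC themselves from the exposed out-of-bag attributes of a fitted forest (a sketch, assuming enough trees that every sample receives an out-of-bag prediction):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=500, random_state=0)
forest = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)

# oob_decision_function_ holds the out-of-bag class probabilities;
# column 1 is the probability of the positive class in the binary case.
print(roc_auc_score(y, forest.oob_decision_function_[:, 1]))
```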
I like the idea of removing the scorer-specific code in forests and relying instead on our scorer API. To make the user experience a bit easier though, we could add an OOB equivalent of cross_val_score. Something like …
+1 to @glouppe 's proposal. One problem is that our current scorer API doesn't let us use pre-computed predictions (it recomputes the predictions from the estimator). Something like:

```python
def get_score(scoring, y_true, y_pred=None, y_decision=None, y_proba=None):
    if scoring == "roc_auc":
        if y_decision is None and y_proba is None:
            raise ValueError("roc_auc needs either y_decision or y_proba.")
        if y_decision is not None:
            return roc_auc_score(y_true, y_decision)
        else:
            return roc_auc_score(y_true, y_proba)
    [...]
```

The advantage of this approach is that it raises a clear error message when the wrong type of prediction is provided.
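With a helper along those lines, out-of-bag scoring would reduce to something like get_score("roc_auc", y, y_proba=oob_proba) on the probabilities exposed by the forest, with no scorer-specific code left in the forest itself.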
Hi, thanks for the suggestions. I removed oob_score from forest, added an independent oob_score function and also get_score, and refactored scoring a bit to facilitate get_score. This is still at the sketch / proof-of-concept stage. Please let me know what you think.
@mblondel The get_score I added tries to reuse the existing scoring infrastructure. If this approach looks promising, I can try to add more specific error messages per your suggestion. But let me know what you think about the overall approach, thanks.
```python
    Out of bag score of the estimator.
    """
    # ...
    estimator.oob_predict = True
```
Use the set_params API.
Would it make sense for me to create a separate PR specifically for looking at scorer API changes, to facilitate discussion of these changes?

To summarize the situation: the current _BaseScorer interface accepts a predictor and some input data X. The various _BaseScorer implementations compute a score by calling predict(X), predict_proba(X), or decision_function(X) on the predictor as needed, and pass the results (with possible transformations) to a score function. In the development of out-of-bag scoring and of multiple-metric grid search, we have seen that predictions may be made via special procedures that do not strictly adhere to the predictor interface, and we would like to be able to score these externally computed predictions. As I see it there are 5 choices for implementing scoring of externally computed predictions.
Let me know if I missed any options :) In the most recent patch I went with option #3. To my mind it might make sense to decide which of these approaches (1-5) is most appropriate, and proceed with design from there.
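For reference, the scorer call pattern described above looks roughly like the following simplified sketch (not the actual scikit-learn classes); it shows why precomputed predictions cannot be handed to a scorer directly:

```python
class _PredictScorerSketch:
    """Simplified sketch of a _BaseScorer-style scorer: it receives a fitted
    predictor plus the raw inputs X, and recomputes the predictions itself."""

    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def __call__(self, estimator, X, y_true):
        # The scorer, not the caller, produces the predictions; other scorer
        # variants call predict_proba(X) or decision_function(X) instead.
        y_pred = estimator.predict(X)
        return self._sign * self._score_func(y_true, y_pred, **self._kwargs)
```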
It seems like the scorer design question will need to be answered in order to implement this oob scoring feature. Is there anything else I can do to help make progress on this issue, per my previous comment?
@staple It seems to me that option 4) would be the best solution if we want to add more flexibility while remaining backward compatible. Perhaps bringing this discussion to the mailing list will give it more attention. In particular, I would like to have @larsmans and @amueller's opinion, as they are the original designers of the scorer API.
Can't comment on the globality of this topic, because it is too vast and central to this software project, and I don't have the full overview. What I can point out is: as can be seen in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py#L792, it is solved via 2) there, and it makes 2) look like quite an ugly hack ... As for 5), up to the proposed callback structure, this is almost a reversion to the …
I agree with #3723 (comment) that this comes down to a separate issue about a scoring API with precalculated predictions. I'm not sure what the correct solution is, but I personally don't see much point in this PR and the issue it is trying to solve (#3455), and would consider closing both.
I agree.
Those parts of the code have changed a lot; a fresh look might be a better approach than trying to continue this one.
Enhances the oob_score parameter of forest-related predictors (e.g. RandomForestClassifier) to accept a string naming a scoring method or a callable scorer.
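For illustration, usage under this proposal would look roughly like the following; the exact accepted values are still WIP and may change:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, make_scorer

# Backwards-compatible boolean, plus the proposed string / callable forms.
clf_bool = RandomForestClassifier(n_estimators=100, oob_score=True)
clf_string = RandomForestClassifier(n_estimators=100, oob_score="roc_auc")
clf_scorer = RandomForestClassifier(n_estimators=100, oob_score=make_scorer(f1_score))
```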
The existing out-of-bag scoring implementations use custom prediction logic in order to predict, for each estimator, only its out-of-bag input rows. As in the existing functionality, the out-of-bag predictions are saved in documented attributes of the predictor (oob_decision_function_, oob_prediction_). To simplify integration of this functionality with the existing scorer interface, the out-of-bag predictions are passed to a _DummyPredictor, which in turn makes them available to the scorer.
For backwards compatibility, the oob_score argument is not renamed, and it continues to accept True / False (default still False). When passed True, the oob_score calculation is unchanged in the single-output case (r2 for regression, accuracy for classification).
In the multi-output case, the score calculations have changed. For regression, individual r-squared scores were formerly computed for each output and averaged across all outputs; now a single r-squared score is calculated across all outputs using the r2_score implementation. For classification, individual accuracy scores were formerly computed for each output and averaged; now a single accuracy score is calculated with the accuracy_score implementation, which counts an input row as correct only if its complete set of outputs is correct.
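The classification change is the most visible one; a small sketch of the difference, with illustrative values that are not taken from the patch:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Two binary outputs per sample; y_pred stands in for the OOB predictions.
y_true = np.array([[0, 1], [1, 1], [1, 0], [0, 0]])
y_pred = np.array([[0, 1], [1, 0], [1, 0], [1, 0]])

# Former behaviour: accuracy per output column, then averaged.
per_output = [accuracy_score(y_true[:, k], y_pred[:, k])
              for k in range(y_true.shape[1])]
print(np.mean(per_output))             # (0.75 + 0.75) / 2 = 0.75

# New behaviour: a row counts as correct only if every output matches.
print(accuracy_score(y_true, y_pred))  # 2 of 4 rows fully correct = 0.5
```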
This is a WIP; some known TODOs: