FEA add TunedThresholdClassifier meta-estimator to post-tune the cut-off threshold #26120
Conversation
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
…_classifier_again
Some more feedback.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
"max_recall_at_precision_constraint", | ||
} | ||
), | ||
StrOptions(set(get_scorer_names())), |
We should limit to classification metrics
Uhm, not sure how we should handle that. We don't have any public mechanism available (it's only present in the tests). I recall having opened a PR for this at some point in time that would be useful here: #17930
let's figure this out separately then :)
objective_metric : {"max_tpr_at_tnr_constraint", "max_tnr_at_tpr_constraint", \
    "max_precision_at_recall_constraint", "max_recall_at_precision_constraint"} \
    , str, dict or callable, default="balanced_accuracy"
objective_metric : str, dict or callable, default="balanced_accuracy"
Shouldn't this be called scoring, the same as *SearchCV? We're optimizing using a cv object, optimizing for one or more metrics, exactly like *SearchCV; it would be nice to have a consistent name for this.
I agree that it makes the API more consistent
I was trying to find where we had this discussion in the past, but I could not find it.
I think the fact that we had constrained metrics made this different enough from the "scoring" parameter that we decided to change the name.
I'm pretty optimistic that we can find an API which we can share with *SearchCV, one that is both user friendly and capable of handling the use cases we want. And as @jeremiedbb says in #26120 (review), we're not far from nicely supporting it. So I'd say we can safely rename the parameter.
Now I have a question: do we need the pos_label and response_method parameters? Those two can be delegated to make_scorer: response_method is a parameter of that function, while pos_label is a parameter of the metric function provided to make_scorer and is forwarded to it.
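As a small illustration of that delegation (this relies on make_scorer's response_method parameter, which the thread refers to; the metric and pos_label value are placeholders):

```python
from sklearn.metrics import make_scorer, recall_score

scorer = make_scorer(
    recall_score,
    response_method="predict",  # parameter of make_scorer: which estimator method to call
    pos_label=0,                # forwarded to recall_score
)
```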
The downside comes down to how many people are actually using make_scorer; the people who don't are fine with the defaults.
Currently, if I want to tune a LogisticRegression with a grid search and recall as scoring, either I pass scoring="recall" and rely on the defaults, or I pass a scorer made out of recall_score if I want to change the pos_label. I think it's fine to mimic this behavior.
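For concreteness, a minimal sketch of the two options described above (the dataset and grid are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

X, y = make_classification(random_state=0)
param_grid = {"C": [0.1, 1.0, 10.0]}

# option 1: rely on the defaults (pos_label=1)
GridSearchCV(LogisticRegression(), param_grid, scoring="recall").fit(X, y)

# option 2: build a scorer out of recall_score to change the positive label
recall_neg = make_scorer(recall_score, pos_label=0)
GridSearchCV(LogisticRegression(), param_grid, scoring=recall_neg).fit(X, y)
```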
But they will still be required for the FixedThresholdClassifier because we don't have any metric there.
For this one, I think we indeed need to keep them, because if a user passes a threshold of, say, 0.3, we need to know whether it is meant for a probability or for a decision function.
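To make the point concrete, a hedged sketch assuming a FixedThresholdClassifier with threshold and response_method parameters as discussed in this thread (the import path is an assumption):

```python
from sklearn.linear_model import LogisticRegression
# assumed import location for the fixed-threshold meta-estimator
from sklearn.model_selection import FixedThresholdClassifier

base = LogisticRegression()

# 0.3 interpreted as a cut-off on predict_proba
clf_proba = FixedThresholdClassifier(
    base, threshold=0.3, response_method="predict_proba"
)

# 0.3 interpreted as a cut-off on decision_function (a margin, not a probability)
clf_decision = FixedThresholdClassifier(
    base, threshold=0.3, response_method="decision_function"
)
```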
Actually, the response_method of the estimator and the response_method for the scorer are not the same, so we need to keep it.
Indeed. response_method is dispatched to _CurveScorer and not to the scorer. It would be weird to require our users to pass make_scorer(balanced_accuracy_score, response_method="predict_proba") while such a scorer on its own does not make any sense.
I took a quick look and references to constrained metrics seem to have all been removed.
Also, I think the use case of a constrained metric is still doable with the current API. I expect that something like the following works:

import numpy as np
from sklearn.metrics import confusion_matrix, make_scorer

def max_tpr_at_tnr_constraint_score(y_true, y_pred, min_tnr):
    # compute TPR and TNR from the binary confusion matrix
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    # discard cut-offs that do not meet the TNR constraint (a lower bound on TNR)
    if tnr < min_tnr:
        return -np.inf
    return tpr

my_scorer = make_scorer(max_tpr_at_tnr_constraint_score, min_tnr=0.5)
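Such a scorer could then presumably be passed as the scoring/objective metric of the meta-estimator, so the constrained use case reduces to a plain custom scorer.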
I think that this is good for another round of review.
LGTM
Deja-vu :)
LGTM. Could you please open separate issues for remaining work? (some refactoring and some doc concerns seem to be open).
🎉
The issue about refactoring is already open here: #28941. For the documentation, I'll open an issue to see how to articulate the constrained part. The comment raised by @amueller is no longer relevant since we don't really allow choosing a point on the PR or ROC curve in a straightforward manner. I'll also revive #17930.
🚀 @glemaitre 🎉 In a way, I like the explicit …
Maybe, but that would entail adding at least 2 new constructor parameters to all classifiers in scikit-learn that implement …
supersedes #16525
closes #16525
closes #8614
closes #10117
supersedes #10117
builds upon #26037
relates to #4813
Summary
We introduce a TunedThresholdClassifier that intends to post-tune the cut-off point used to convert the soft decisions of decision_function or predict_proba into the hard decisions provided by predict.

Important features to have in mind:

- objective_metric: the objective metric is either a metric to be maximized or a pair of metrics, one to be optimized under a constraint on the other (to find a trade-off). Additionally, we can pass a cost/gain matrix that could be used to optimize a business metric. For this case, we are limited to constant costs/gains. In the future, we can think of costs/gains that depend on the matrix X, but we would need to be able to forward metadata to the scorer (a good additional use case for SLEP006 @adrinjalali).
- cv and refit: we provide some flexibility to pass a refitted model and a single train/test split. We document the limitations and caveats with an example.
Point to discuss

- The name TunedThresholdClassifier: shall we instead have something about "threshold" (e.g. ThresholdTuner)?
- The parameters objective_metric, constraint_value and objective_score: is the naming fine? An alternative to "objective" might be "utility"?

Further work
I currently implemented a single example that shows the feature in the context of post-tuning the decision threshold.
The current example uses a single train/test split for the figure, and I think it would be nice to also show ROC/precision-recall curves obtained from cross-validation to be complete. However, this requires some new features to be implemented first.
I am also planning to analyse the usage of this feature on the problem of calibration for imbalanced classification problems. The feeling on this topic is that resampling strategies involve an implicit tuning of the decision threshold at the cost of a badly calibrated model. It might be better to learn a model on the imbalanced problem directly, make sure that it is well calibrated, and then post-tune the decision threshold for "hard" predictions. In this case, you get the best of both worlds: a calibrated model if the output of predict_proba is important to you, and an optimal hard predictor for your specific utility metric. However, this is going to need some investigation and will be better suited for another PR.
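A rough sketch of that workflow (calibrate first, then post-tune the threshold); the threshold tuner's class name, import path and parameters are this PR's proposal and purely illustrative here:

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression
# assumed import location for the class introduced by this PR
from sklearn.model_selection import TunedThresholdClassifier

# 1) learn on the imbalanced problem directly and calibrate the probabilities
calibrated = CalibratedClassifierCV(LogisticRegression(), method="isotonic", cv=5)

# 2) post-tune the decision threshold of the calibrated model for the target metric
model = TunedThresholdClassifier(calibrated, objective_metric="balanced_accuracy")
# predict_proba remains that of the calibrated model; predict uses the tuned cut-off
```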