FEA add TunedThresholdClassifier meta-estimator to post-tune the cut-off threshold #26120


Merged
228 commits merged into scikit-learn:main on May 3, 2024

Conversation

@glemaitre glemaitre (Member) commented Apr 7, 2023

superseded #16525
closes #16525
closes #8614
closes #10117
supersedes #10117

build upon #26037

relates to #4813

Summary

We introduce a TunedThresholdClassifier that post-tunes the cut-off point used to convert the soft decisions of decision_function or predict_proba into the hard decisions returned by predict.

Important features to have in mind:

objective_metric: the objective metric can be set either to a single metric to be maximized, or to a pair of metrics, one to be optimized under a constraint on the other (to find a trade-off). Additionally, we can pass a cost/gain matrix that can be used to optimize a business metric. In this case, we are limited to constant costs/gains. In the future, we could support costs/gains that depend on the data matrix X, but we would need to be able to forward metadata to the scorer (a good additional use case for SLEP006 @adrinjalali).

cv and refit: we provide some flexibility to pass an already-fitted model or a single train/test split. We document the associated limitations and caveats with an example. A minimal usage sketch follows below.
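
To make the above concrete, here is a minimal usage sketch. It assumes the API as described in this PR description (class name TunedThresholdClassifier, parameters objective_metric and constraint_value, exposed under sklearn.model_selection); the exact names evolved during review, so treat it as illustrative rather than as the merged API.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# Hypothetical import path for the meta-estimator proposed in this PR:
from sklearn.model_selection import TunedThresholdClassifier

X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune the cut-off to maximize a single metric.
model = TunedThresholdClassifier(
    LogisticRegression(), objective_metric="balanced_accuracy"
).fit(X_train, y_train)
y_pred = model.predict(X_test)  # hard predictions use the tuned threshold

# Trade-off variant: maximize TPR under a constraint on TNR.
constrained = TunedThresholdClassifier(
    LogisticRegression(),
    objective_metric="max_tpr_at_tnr_constraint",
    constraint_value=0.7,
).fit(X_train, y_train)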

Point to discuss

  • Are we fine with the name TunedThresholdClassifier? Shall we instead have something about "threshold" (e.g. ThresholdTuner)?
  • We are using the terms objective_metric, constraint_value and objective_score. Is the naming fine? An alternative to "objective" might be "utility".

Further work

I currently implemented a single example that shows the feature in the context of post-tuning the decision threshold.

The current example uses a single train/test split for the figure, and I think it would be nice to also have ROC/precision-recall curves obtained from cross-validation to be complete. However, this requires some new features to be implemented.

I am also planning to analyse the usage of this feature for calibration on imbalanced classification problems. My feeling on this topic is that resampling strategies involve an implicit tuning of the decision threshold at the cost of a badly calibrated model. It might be better to learn a model on the imbalanced problem directly, make sure that it is well calibrated, and then post-tune the decision threshold for "hard" predictions. In this case, you get the best of both worlds: a calibrated model if the output of predict_proba matters to you, and an optimal hard predictor for your specific utility metric. However, this needs some investigation and is better suited for another PR.
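
A hedged sketch of that workflow, reusing the hypothetical TunedThresholdClassifier API from the summary above: fit a calibrated model on the imbalanced data as-is, then post-tune only the decision threshold used by predict.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# X_train, y_train: an imbalanced binary classification training set.
# Step 1: calibrate the probabilistic model without any resampling, so that
# predict_proba remains trustworthy.
calibrated = CalibratedClassifierCV(LogisticRegression(), method="isotonic")

# Step 2: post-tune the cut-off used by predict for the utility metric of
# interest; predict_proba of the inner model is left untouched.
tuned = TunedThresholdClassifier(calibrated, objective_metric="balanced_accuracy")
tuned.fit(X_train, y_train)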

@glemaitre glemaitre changed the title Cutoff classifier again FEA add CutOffClassifier meta-estimator to post-tune the decision threshold Apr 7, 2023
@glemaitre glemaitre marked this pull request as draft April 7, 2023 12:38
@ogrisel ogrisel (Member) left a comment

Some more feedback.

"max_recall_at_precision_constraint",
}
),
StrOptions(set(get_scorer_names())),
Member

We should limit to classification metrics

Member Author

Uhm, not sure how we should handle that. We don't have any public mechanism available (it only exists in the tests). I recall having opened such a PR at some point: #17930

Member

let's figure this out separately then :)

objective_metric : {"max_tpr_at_tnr_constraint", "max_tnr_at_tpr_constraint", \
"max_precision_at_recall_constraint, "max_recall_at_precision_constraint"} \
, str, dict or callable, default="balanced_accuracy"
objective_metric : str, dict or callable, default="balanced_accuracy"
Member

Shouldn't this be called scoring, the same as in *SearchCV? We're optimizing with a cv object for one or more metrics, exactly like *SearchCV, so it would be nice to have a consistent name for this.

Member

I agree that it makes the API more consistent

Member Author

I was trying to find where we had this discussion in the past but I could not find it.
I think the fact that we had constraint metrics was different enough from the "scoring" parameter that we decided to change the name.

Member

I'm pretty optimistic that we can find an API shared with *SearchCV that is both user-friendly and capable of handling the use cases we want. And as @jeremiedbb says in #26120 (review), we're not far from nicely supporting it. So I'd say we can safely rename the parameter.

Member Author

Now I have a question: do we need the pos_label and response_method parameters? Those two could be delegated to make_scorer: response_method is a parameter of that function, while pos_label is a parameter of the metric function passed to make_scorer and is forwarded to it.

Member

The downside comes down to "how many people are using make_scorer?"

The people who don't are fine with the defaults.

@jeremiedbb jeremiedbb (Member) May 3, 2024

Currently, if I want to tune a LogisticRegression with a grid search and recall as scoring, I either pass scoring="recall" and rely on the defaults, or I pass a scorer made from recall_score if I want to change the pos_label. I think it's fine to mimic this behavior.
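
For reference, a minimal sketch of the two options described above with the existing GridSearchCV API (parameter values are arbitrary):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

param_grid = {"C": [0.1, 1.0, 10.0]}

# Option 1: rely on the defaults (positive label 1, hard predictions).
search = GridSearchCV(LogisticRegression(), param_grid, scoring="recall")

# Option 2: build a scorer from recall_score to change the positive label.
recall_pos_0 = make_scorer(recall_score, pos_label=0)
search = GridSearchCV(LogisticRegression(), param_grid, scoring=recall_pos_0)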

Member

But they will be requested for the FixedThresholdClassifier because we don't have any metric there.

For that one I think we need to keep them, because if a user passes a threshold of, say, 0.3, we need to know whether it applies to a probability or to a decision function.
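
A sketch of that point, assuming the FixedThresholdClassifier signature discussed in this thread (threshold, response_method, pos_label); the import path is an assumption as well:

from sklearn.linear_model import LogisticRegression
# Hypothetical import path, assuming the class ships next to the tuned variant:
from sklearn.model_selection import FixedThresholdClassifier

# A cut-off of 0.3 is ambiguous on its own: it could apply to predict_proba
# (values in [0, 1]) or to decision_function (unbounded scores), so the
# wrapper has to be told which response it thresholds.
clf = FixedThresholdClassifier(
    LogisticRegression(),
    threshold=0.3,
    response_method="predict_proba",
    pos_label=1,
)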

Member

Actually, the response_method of the estimator and the response_method for the scorer are not the same, so we need to keep it.

Member Author

Indeed. response_method is dispatched to _CurveScorer and not to the scorer. It would be weird to ask users to pass make_scorer(balanced_accuracy_score, response_method="predict_proba") when that scorer on its own does not make sense.

@jeremiedbb jeremiedbb (Member) left a comment

I took a quick look and all references to constrained metrics seem to have been removed.

Also, I think the constrained-metric use case is still doable with the current API. I expect that something like the following works:

import numpy as np
from sklearn.metrics import confusion_matrix, make_scorer

def max_tpr_at_tnr_constraint_score(y_true, y_pred, min_tnr):
    # Compute TPR and TNR from the confusion matrix.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    # Reject thresholds whose TNR falls below the constraint.
    if tnr < min_tnr:
        return -np.inf
    return tpr

my_scorer = make_scorer(max_tpr_at_tnr_constraint_score, min_tnr=0.5)
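
Assuming the parameter ends up being called scoring as discussed earlier in this thread, such a scorer could then be passed to the meta-estimator; a hypothetical usage sketch:

from sklearn.linear_model import LogisticRegression

# Hypothetical: plug the custom constrained scorer into the threshold tuner
# (class name and `scoring` parameter as discussed in this thread).
model = TunedThresholdClassifier(LogisticRegression(), scoring=my_scorer)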

@glemaitre (Member Author)

I think that this is good for another round of review.

@jeremiedbb jeremiedbb (Member) left a comment

LGTM

@glemaitre (Member Author)

LGTM

Deja-vu :)

@adrinjalali adrinjalali (Member) left a comment

LGTM. Could you please open separate issues for the remaining work? (Some refactoring and some documentation concerns are still open.)

@adrinjalali adrinjalali enabled auto-merge (squash) May 3, 2024 16:11
@adrinjalali adrinjalali merged commit 1e49c34 into scikit-learn:main May 3, 2024
28 checks passed
@jeremiedbb (Member)

🎉

@glemaitre (Member Author)

The issue about refactoring is already open here: #28941.

For the documentation, I'll open an issue to see how to articulate the constrained part. The comment raised by @amueller is no longer meaningful since we don't allow choosing a point on the precision-recall or ROC curve in a straightforward manner anymore.

I'll also revive #17930

@lorentzenchr (Member)

🚀 @glemaitre 🎉
Thanks for this great addition that many have been longing for.

In a way, I like the explicit FixedThresholdClassifier. Out of curiosity: is there a path forward for this to end up as a mixin class in every classifier, without the user needing to wrap the estimator in a meta-estimator?

@ogrisel (Member) commented May 6, 2024

Out of curiosity: is there a path forward for this to end up as a mixin class in every classifier, without the user needing to wrap the estimator in a meta-estimator?

Maybe but that would entail adding at least 2 new constructor parameters to all classifiers in scikit-learn that implement predict_proba or decision_function. That's kind of an invasive API change...


Successfully merging this pull request may close these issues:

  • Add wrapper class that changes threshold value for predict