FEA add TunedThresholdClassifier meta-estimator to post-tune the cut-off threshold #26120


Merged
228 commits merged into scikit-learn:main on May 3, 2024

Conversation

@glemaitre glemaitre (Member) commented Apr 7, 2023

superseded #16525
closes #16525
closes #8614
closes #10117
supersedes #10117

build upon #26037

relates to #4813

Summary

We introduce a TunedThresholdClassifier that post-tunes the cut-off point used to convert the soft decisions of decision_function or predict_proba into the hard decisions returned by predict.

Important features to have in mind:

objective_metric: the objective metric can be set either to a single metric to be maximized, or to a pair of metrics, one to be optimized under a constraint on the other (to find a trade-off). Additionally, we can pass a cost/gain matrix that can be used to optimize a business metric. In this case, we are limited to constant costs/gains. In the future, we could support costs/gains that depend on the data matrix X, but we would need to be able to forward metadata to the scorer (a good additional use case for SLEP006 @adrinjalali).

cv and refit: we provide some flexibility to pass an already-fitted model or a single train/test split. We document the associated limitations and caveats with an example. A minimal usage sketch follows below.
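
To make the above concrete, here is a minimal usage sketch. It assumes the API as described in this PR description (class name TunedThresholdClassifier, parameters objective_metric and constraint_value, exposed under sklearn.model_selection); the exact names evolved during review, so treat it as illustrative rather than as the merged API.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
# Hypothetical import path for the meta-estimator proposed in this PR:
from sklearn.model_selection import TunedThresholdClassifier

X, y = make_classification(n_samples=1_000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tune the cut-off to maximize a single metric.
model = TunedThresholdClassifier(
    LogisticRegression(), objective_metric="balanced_accuracy"
).fit(X_train, y_train)
y_pred = model.predict(X_test)  # hard predictions use the tuned threshold

# Trade-off variant: maximize TPR under a constraint on TNR.
constrained = TunedThresholdClassifier(
    LogisticRegression(),
    objective_metric="max_tpr_at_tnr_constraint",
    constraint_value=0.7,
).fit(X_train, y_train)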

Point to discuss

  • Are we fine with the name TunedThresholdClassifier? Shall we instead have something about "threshold" (e.g. ThresholdTuner)?
  • We are using the terms objective_metric, constraint_value and objective_score. Is the naming fine? An alternative to "objective" might be "utility".

Further work

I currently implemented a single example that shows the feature in the context of post-tuning the decision threshold.

The current example uses a single train/test split for the figure, and I think it would be nice to also have ROC/precision-recall curves obtained from cross-validation to be complete. However, this requires some new features to be implemented.

I am also planning to analyse the usage of this feature for calibration on imbalanced classification problems. My feeling on this topic is that resampling strategies involve an implicit tuning of the decision threshold at the cost of a badly calibrated model. It might be better to learn a model on the imbalanced problem directly, make sure that it is well calibrated, and then post-tune the decision threshold for "hard" predictions. In this case, you get the best of both worlds: a calibrated model if the output of predict_proba matters to you, and an optimal hard predictor for your specific utility metric. However, this needs some investigation and is better suited for another PR.
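
A hedged sketch of that workflow, reusing the hypothetical TunedThresholdClassifier API from the summary above: fit a calibrated model on the imbalanced data as-is, then post-tune only the decision threshold used by predict.

from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

# X_train, y_train: an imbalanced binary classification training set.
# Step 1: calibrate the probabilistic model without any resampling, so that
# predict_proba remains trustworthy.
calibrated = CalibratedClassifierCV(LogisticRegression(), method="isotonic")

# Step 2: post-tune the cut-off used by predict for the utility metric of
# interest; predict_proba of the inner model is left untouched.
tuned = TunedThresholdClassifier(calibrated, objective_metric="balanced_accuracy")
tuned.fit(X_train, y_train)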

@glemaitre glemaitre changed the title Cutoff classifier again FEA add CutOffClassifier meta-estimator to post-tune the decision threshold Apr 7, 2023
@glemaitre glemaitre marked this pull request as draft April 7, 2023 12:38
@ogrisel ogrisel (Member) left a comment

Some more feedback.

"max_recall_at_precision_constraint",
}
),
StrOptions(set(get_scorer_names())),
Member

We should limit to classification metrics

Member Author

Uhm, not sure how we should handle that. We don't have any public mechanism available (it only exists in the tests). I recall having opened such a PR at some point: #17930

Member

let's figure this out separately then :)

objective_metric : {"max_tpr_at_tnr_constraint", "max_tnr_at_tpr_constraint", \
"max_precision_at_recall_constraint, "max_recall_at_precision_constraint"} \
, str, dict or callable, default="balanced_accuracy"
objective_metric : str, dict or callable, default="balanced_accuracy"
Member

Shouldn't this be called scoring, the same as in *SearchCV? We're optimizing with a cv object for one or more metrics, exactly like *SearchCV, so it would be nice to have a consistent name for this.

Member

I agree that it makes the API more consistent

Member Author

I was trying to find where we had this discussion in the past but I could not find it.
I think the fact that we had constraint metrics was different enough from the "scoring" parameter that we decided to change the name.

Member

I'm pretty optimistic that we can find an API shared with *SearchCV that is both user-friendly and capable of handling the use cases we want. And as @jeremiedbb says in #26120 (review), we're not far from nicely supporting it. So I'd say we can safely rename the parameter.

Member Author

Now I have a question: do we need the pos_label and response_method parameters? Those two could be delegated to make_scorer: response_method is a parameter of that function, while pos_label is a parameter of the metric function passed to make_scorer and is forwarded to it.

Member

The downside comes down to "how many people are using make_scorer?"

The people who don't are fine with the defaults.

@jeremiedbb jeremiedbb (Member) May 3, 2024

Currently, if I want to tune a LogisticRegression with a grid search and recall as scoring, I either pass scoring="recall" and rely on the defaults, or I pass a scorer made from recall_score if I want to change the pos_label. I think it's fine to mimic this behavior.
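
For reference, a minimal sketch of the two options described above with the existing GridSearchCV API (parameter values are arbitrary):

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV

param_grid = {"C": [0.1, 1.0, 10.0]}

# Option 1: rely on the defaults (positive label 1, hard predictions).
search = GridSearchCV(LogisticRegression(), param_grid, scoring="recall")

# Option 2: build a scorer from recall_score to change the positive label.
recall_pos_0 = make_scorer(recall_score, pos_label=0)
search = GridSearchCV(LogisticRegression(), param_grid, scoring=recall_pos_0)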

Member

But they will be requested for the FixedThresholdClassifier because we don't have any metric there.

For that one I think we need to keep them, because if a user passes a threshold of, say, 0.3, we need to know whether it applies to a probability or to a decision function.
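
A sketch of that point, assuming the FixedThresholdClassifier signature discussed in this thread (threshold, response_method, pos_label); the import path is an assumption as well:

from sklearn.linear_model import LogisticRegression
# Hypothetical import path, assuming the class ships next to the tuned variant:
from sklearn.model_selection import FixedThresholdClassifier

# A cut-off of 0.3 is ambiguous on its own: it could apply to predict_proba
# (values in [0, 1]) or to decision_function (unbounded scores), so the
# wrapper has to be told which response it thresholds.
clf = FixedThresholdClassifier(
    LogisticRegression(),
    threshold=0.3,
    response_method="predict_proba",
    pos_label=1,
)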

Member

Actually, the response_method of the estimator and the response_method for the scorer are not the same, so we need to keep it.

Member Author

Indeed. response_method is dispatched to _CurveScorer and not to the scorer. It would be weird to ask users to pass make_scorer(balanced_accuracy_score, response_method="predict_proba") when that scorer on its own does not make sense.

@jeremiedbb jeremiedbb (Member) left a comment

I took a quick look and all references to constrained metrics seem to have been removed.

Also, I think the constrained-metric use case is still doable with the current API. I expect that something like the following works:

import numpy as np
from sklearn.metrics import confusion_matrix, make_scorer

def max_tpr_at_tnr_constraint_score(y_true, y_pred, min_tnr):
    # Compute TPR and TNR from the confusion matrix.
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    # Reject thresholds whose TNR falls below the constraint.
    if tnr < min_tnr:
        return -np.inf
    return tpr

my_scorer = make_scorer(max_tpr_at_tnr_constraint_score, min_tnr=0.5)
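
Assuming the parameter ends up being called scoring as discussed earlier in this thread, such a scorer could then be passed to the meta-estimator; a hypothetical usage sketch:

from sklearn.linear_model import LogisticRegression

# Hypothetical: plug the custom constrained scorer into the threshold tuner
# (class name and `scoring` parameter as discussed in this thread).
model = TunedThresholdClassifier(LogisticRegression(), scoring=my_scorer)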

@glemaitre (Member Author)

I think that this is good for another round of review.

@jeremiedbb jeremiedbb (Member) left a comment

LGTM

@glemaitre (Member Author)

LGTM

Deja-vu :)

@adrinjalali adrinjalali (Member) left a comment

LGTM. Could you please open separate issues for the remaining work? (Some refactoring and some documentation concerns are still open.)

@adrinjalali adrinjalali enabled auto-merge (squash) May 3, 2024 16:11
@adrinjalali adrinjalali merged commit 1e49c34 into scikit-learn:main May 3, 2024
28 checks passed
@jeremiedbb (Member)

🎉

@glemaitre (Member Author)

The issue about refactoring is already open here: #28941.

For the documentation, I'll open an issue to see how to articulate the constrained part. The comment raised by @amueller is no longer meaningful since we don't allow choosing a point on the precision-recall or ROC curve in a straightforward manner anymore.

I'll also revive #17930

@lorentzenchr (Member)

🚀 @glemaitre 🎉
Thanks for this great addition that many have been longing for.

In a way, I like the explicit FixedThresholdClassifier. Out of curiosity: is there a path forward for this to end up as a mixin class in every classifier, without the user needing to wrap the estimator in a meta-estimator?

@ogrisel (Member) commented May 6, 2024

Out of curiosity: is there a path forward for this to end up as a mixin class in every classifier, without the user needing to wrap the estimator in a meta-estimator?

Maybe but that would entail adding at least 2 new constructor parameters to all classifiers in scikit-learn that implement predict_proba or decision_function. That's kind of an invasive API change...


Successfully merging this pull request may close these issues:

  • Add wrapper class that changes threshold value for predict