
TunedThresholdClassifierCV: add other metrics #29061


Closed
koaning opened this issue May 21, 2024 · 10 comments
Labels
Needs Triage, New Feature

Comments

@koaning

koaning commented May 21, 2024

Describe the workflow you want to enable

I figured I might use the new tuned thresholder to turn code like this into something more like a grid search, with all the parallelism benefits.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import FixedThresholdClassifier, train_test_split
from tqdm import trange


X, y = make_classification(
    n_samples=10_000, weights=[0.9, 0.1], class_sep=0.8, random_state=42
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

classifier = LogisticRegression(random_state=0).fit(X_train, y_train)

n_steps = 200
metrics = []
for i in trange(1, n_steps):
    classifier_other_threshold = FixedThresholdClassifier(
        classifier, threshold=i/n_steps, response_method="predict_proba"
    ).fit(X_train, y_train)
    
    y_pred = classifier_other_threshold.predict(X_train)
    metrics.append({
        'threshold': i/n_steps,
        'f1': f1_score(y_train, y_pred),
        'precision': precision_score(y_train, y_pred),
        'recall': recall_score(y_train, y_pred),
        'accuracy': accuracy_score(y_train, y_pred)
    })

This data can give me a very pretty plot with a lot of information.
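For context, the plot itself can be produced with something along these lines (a minimal sketch, assuming pandas and matplotlib are available):

import matplotlib.pyplot as plt
import pandas as pd

# Turn the collected records into a frame indexed by threshold and plot
# every metric as its own line.
df = pd.DataFrame(metrics).set_index("threshold")
df.plot(xlabel="threshold", ylabel="score", title="Metrics per decision threshold")
plt.show()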

[Screenshot: plot of f1, precision, recall, and accuracy as a function of the decision threshold]

But I think I can't make this chart with the new tuned thresholder in the 1.5 release candidate.

I can do this:

from sklearn.model_selection import TunedThresholdClassifierCV
from sklearn.metrics import make_scorer

classifier_other_threshold = TunedThresholdClassifierCV(
    classifier,  
    scoring=make_scorer(f1_score), 
    response_method="predict_proba", 
    thresholds=200, 
    n_jobs=-1, 
    store_cv_results=True
)
classifier_other_threshold.fit(X_train, y_train)

And this gives me data for a pretty plot as well, but it only contains the f1 score. There is no way to add extra metrics in the current implementation.
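For reference, a minimal sketch of how that single-metric plot can be drawn from the stored results (assuming the 1.5 cv_results_ layout, a dict with "thresholds" and "scores" keys, only available with store_cv_results=True):

import matplotlib.pyplot as plt

results = classifier_other_threshold.cv_results_  # dict with "thresholds" and "scores"
plt.plot(results["thresholds"], results["scores"], label="f1 (objective)")
plt.axvline(classifier_other_threshold.best_threshold_, linestyle="--", label="best threshold")
plt.xlabel("threshold")
plt.ylabel("score")
plt.legend()
plt.show()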

[Screenshot: plot of the f1 score as a function of the decision threshold]

Describe your proposed solution

Maybe it makes sense to also allow a metrics input to the tuned cv estimator. That way, it can still collect any extra metrics that we might be interested in.

Describe alternatives you've considered, if relevant

The aforementioned code produces the chart that I am mainly interested in. A single metric never tells the full story, and the extra metrics help prevent me from overfitting on a single variable. Reality tends to be more complex than what a single metric can capture, so I like to add extra context with some (custom) metrics.

Additional context

No response

koaning added the Needs Triage and New Feature labels on May 21, 2024
@glemaitre
Member

Duplicate of #21391
The metric side is being implemented in #25639.
Then, the next step will be to implement the display that calls the metric under the hood.

@glemaitre
Member

glemaitre commented May 21, 2024

Maybe it makes sense to also allow a metrics input to the tuned cv estimator. That way, it can still collect any extra metrics that we might be interested in.

I would be against this API. We should limit the internal attributes to the modelling itself (currently, optimizing a single metric). However, if we have a display, then we can pass in the model and compute and store the metrics there. So in terms of usability, I think the display should be in charge of storing the metric log.

@glemaitre
Member

@koaning Do you think this is fine to close this issue in favor of #21391?

@koaning
Author

koaning commented May 21, 2024

Oh, just to be clear, I am not worried about the display/chart. This is more just an example of a manual chart that I might be interested in making. I am mainly concerned with being able to dive into the effect that a different threshold may have. There can be multiple concerns, and a single metric usually doesn't capture that.

What I want is the ability to add metrics, some of which are custom, that can help ensure that optimizing for one thing, say the f1 score, does not cause issues elsewhere, say on a fairness metric. I am totally fine with optimizing a single metric; that totally makes sense! But I would prefer to also be able to report on other metrics while doing so.

@glemaitre
Member

glemaitre commented May 21, 2024

What I want is the ability to add metrics, some of which are custom, that can help ensure that optimizing for one thing, say the f1 score, does not cause issues elsewhere, say on a fairness metric. I am totally fine with optimizing a single metric; that totally makes sense! But I would prefer to also be able to report on other metrics while doing so.

But once you have the model, this is just an evaluation issue. So providing a function sklearn.metrics.decision_threshold_curve that takes any callable (maybe even a list of callables) should allow for both custom and already defined metrics.

However, this requires an additional line of code because it is not called within the TunedThresholdClassifierCV but afterwards:

model = TunedThresholdClassifierCV(.., scoring=business_metrics, ...).fit(X_train, y_train)
metrics_log = decision_threshold_curve(
    y_test, model.predict_proba(X_test), scoring=[business_metrics, f1_score, ...]
)

The advantage here is that you can compute the score on a provided dataset and not only on an internal validation set. So you will be able to compare train/test or run it through cross-validation.
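Until such a function exists, the same evaluation can be sketched by hand; manual_threshold_curve below is a hypothetical helper, not scikit-learn API:

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

def manual_threshold_curve(y_true, y_proba, scorers, n_thresholds=100):
    # Evaluate each metric at evenly spaced probability thresholds.
    thresholds = np.linspace(0.01, 0.99, n_thresholds)
    scores = {
        name: [scorer(y_true, (y_proba >= t).astype(int)) for t in thresholds]
        for name, scorer in scorers.items()
    }
    return thresholds, scores

# Computed on a held-out set, not only on an internal validation split.
thresholds, scores = manual_threshold_curve(
    y_test,
    classifier.predict_proba(X_test)[:, 1],
    {"f1": f1_score, "precision": precision_score, "recall": recall_score},
)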

@koaning
Author

koaning commented May 21, 2024

I was not aware of the decision_threshold_curve but I also could not find it. Just to make sure, is that a typo?

[Screenshot: documentation search showing no results for decision_threshold_curve]

@glemaitre
Member

I was not aware of the decision_threshold_curve but I also could not find it. Just to make sure, is that a typo?

Nope, this is my proposal in #25639.

@koaning
Author

koaning commented May 21, 2024

Ahhhh sorry, now I see. Yeah, ok, with a visualisation feature like that I suppose you could always do that exercise without using an automated tuner, and I also see how that is a separate problem.

Fair enough. Closing this one! Thanks for the response.

@koaning koaning closed this as completed May 21, 2024
@skanskan

Is TunedThresholdClassifierCV with refit=True the same as GridSearchCV using the threshold as a hyperparameter?
Do they refit all parameters of the model at the same time as the hyperparameters?

@glemaitre
Member

glemaitre commented Sep 17, 2024

refit=True means that once you have picked the best set of parameters, you retrain the underlying estimator on the full dataset with the selected parameters, which is similar to the GridSearchCV policy.
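For illustration, a rough sketch of the analogy (wrapping FixedThresholdClassifier in GridSearchCV; treat the equivalence as conceptual, not exact):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import (
    FixedThresholdClassifier,
    GridSearchCV,
    TunedThresholdClassifierCV,
)

# Tuning the threshold with TunedThresholdClassifierCV and refit=True ...
tuned = TunedThresholdClassifierCV(
    LogisticRegression(), scoring="f1", refit=True
).fit(X_train, y_train)

# ... is conceptually close to grid-searching the `threshold` parameter of a
# FixedThresholdClassifier: both select the threshold by cross-validation and
# then refit the underlying estimator on the full training set.
grid = GridSearchCV(
    FixedThresholdClassifier(LogisticRegression(), response_method="predict_proba"),
    param_grid={"threshold": np.linspace(0.01, 0.99, 99)},
    scoring="f1",
).fit(X_train, y_train)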
