
[WIP] ENH create a generator of applicable metrics depending on the target y #17889


Open
wants to merge 44 commits into base: main

Conversation

glemaitre (Member)

closes #12385
builds on #15126

Create a generator of the metrics applicable to a problem, inferred from the target y.
The resulting dictionary can be passed directly to GridSearchCV.
Additional parameters can be passed to the utility and will be used when generating the metrics.
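As a rough illustration of the intended usage (the helper name `get_applicable_scorers` and its dispatch logic are hypothetical, not an existing scikit-learn API), something along these lines:

```python
# Hypothetical sketch: infer applicable scorers from the target y and pass
# the resulting dict straight to GridSearchCV.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.utils.multiclass import type_of_target


def get_applicable_scorers(y, **scorer_params):
    """Return a dict {name: scorer} of metrics applicable to ``y``."""
    # **scorer_params would forward options (e.g. pos_label) to the
    # generated scorers; unused in this minimal sketch.
    target_type = type_of_target(y)
    if target_type in ("binary", "multiclass"):
        names = ["accuracy", "balanced_accuracy", "f1_macro", "recall_macro"]
    elif target_type in ("continuous", "continuous-multioutput"):
        names = ["r2", "neg_mean_squared_error", "neg_mean_absolute_error"]
    else:
        raise ValueError(f"no scorers registered for target type {target_type!r}")
    return {name: get_scorer(name) for name in names}


X, y = load_iris(return_X_y=True)
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1, 10]},
    scoring=get_applicable_scorers(y),
    refit="accuracy",
)
grid.fit(X, y)
```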

@glemaitre glemaitre changed the title ENH create a generator of applicable metrics depending on the target y [WIP] ENH create a generator of applicable metrics depending on the target y Jul 10, 2020
@jnothman (Member)

Thanks, @glemaitre, I have also considered working on this of late. I think it has the potential to provide many benefits to usability. Thanks a lot for getting it started!

If the helper returns a dict mapping names to scorers, #15126 is not actually needed. #15126 allows you to provide a callable which returns a dict once called. This has the advantage that it could compute multiple score results efficiently, such as each cell of a confusion matrix. So you need to decide which approach you are taking.
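For concreteness, a sketch of that #15126-style approach, assuming the `scoring` callable takes the usual `(estimator, X, y)` signature and may return a dict of scores:

```python
# Sketch of a single callable computing several related scores at once,
# e.g. every cell of a (binary) confusion matrix, instead of one scorer
# per value. Assumes the dict-returning scoring callable proposed in #15126.
from sklearn.metrics import confusion_matrix


def confusion_matrix_scorer(estimator, X, y):
    y_pred = estimator.predict(X)
    cm = confusion_matrix(y, y_pred)  # binary case: 2x2 matrix
    return {"tn": cm[0, 0], "fp": cm[0, 1], "fn": cm[1, 0], "tp": cm[1, 1]}
```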

I think we need to separate classification and regression (and perhaps other) metrics into different factories. Initially, I think we should disregard multioutput classification (and maybe other cases). Indeed, it might be easiest to focus on regression initially, although we get more benefit for classification.

Classification comes with many challenges. I want to avoid returning "precision", "recall" and "f1" unqualified; they need to say which class is being evaluated, since they are asymmetric binary metrics. I don't mind if we return these metrics considering each class in turn as the positive class, or just one specified by the user. I think we should return each cell of a confusion matrix as a different scorer, as these can be valuable diagnostics. We need to make sure that classification metrics are passed labels when possible. But I know that these ideas are leaning towards over-engineering, so feel free to try to make it simpler. :)
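As an illustration of such "qualified" per-class scorers (the helper name and the naming scheme below are made up for the example):

```python
# Build one qualified scorer per class by restricting each metric to a single
# label; the labels seen in y are passed explicitly to the metric.
import numpy as np
from sklearn.metrics import f1_score, make_scorer, precision_score, recall_score


def per_class_scorers(y):
    """Return e.g. {'recall_class_0': scorer, ...} for every class in y."""
    scorers = {}
    for label in np.unique(y):
        for name, metric in [
            ("precision", precision_score),
            ("recall", recall_score),
            ("f1", f1_score),
        ]:
            scorers[f"{name}_class_{label}"] = make_scorer(
                metric, labels=[label], average="macro", zero_division=0
            )
    return scorers
```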

It also might be easiest to implement this as a callable class, rather than a function.

@glemaitre (Member, Author)

To be honest, I still feel unfamiliar with the _search internals :)

> If the helper returns a dict mapping names to scorers, #15126 is not actually needed. #15126 allows you to provide a callable which returns a dict once called. This has the advantage that it could compute multiple score results efficiently, such as each cell of a confusion matrix. So you need to decide which approach you are taking.

One thing that I am not sure I get is why the two PRs would be mutually exclusive. Couldn't one use a callable as in #15126 and pass this callable (which for now is a function)?

> I think we need to separate classification and regression (and perhaps other) metrics into different factories. Initially, I think we should disregard multioutput classification (and maybe other cases). Indeed, it might be easiest to focus on regression initially, although we get more benefit for classification.

I agree. Regarding multioutput classification, do you refer to multioutput-multiclass classification (for which we don't have metrics) or multilabel classification?

@glemaitre (Member, Author)

> Classification comes with many challenges. I want to avoid returning "precision", "recall" and "f1" unqualified; they need to say which class is being evaluated, since they are asymmetric binary metrics. I don't mind if we return these metrics considering each class in turn as the positive class, or just one specified by the user. I think we should return each cell of a confusion matrix as a different scorer, as these can be valuable diagnostics. We need to make sure that classification metrics are passed labels when possible. But I know that these ideas are leaning towards over-engineering, so feel free to try to make it simpler. :)
> It also might be easiest to implement this as a callable class, rather than a function.

A callable class seems better indeed.

I am thinking that we could pass a list of metrics to filter on. In case we don't have any parameters (pos_label, average, etc.), we should probably return all possible qualified metrics, and restrict to the ones possible when the parameters are passed to the constructor.
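A rough sketch of that callable-class idea (all names and candidate lists below are hypothetical):

```python
# Callable-class sketch: the constructor takes optional filters; calling the
# instance with y returns the matching dict of scorers.
from sklearn.metrics import get_scorer
from sklearn.utils.multiclass import type_of_target


class ScorerFactory:
    def __init__(self, metrics=None, average=None):
        self.metrics = metrics  # optional list of metric names to keep
        self.average = average  # e.g. "macro"; None means "all variants"

    def __call__(self, y):
        if type_of_target(y) not in ("binary", "multiclass"):
            raise ValueError("sketch limited to classification targets")
        # Without an explicit average, return every qualified variant;
        # otherwise restrict to the requested one.
        averages = [self.average] if self.average else ["macro", "micro", "weighted"]
        names = ["accuracy", "balanced_accuracy"]
        names += [f"f1_{avg}" for avg in averages]
        names += [f"recall_{avg}" for avg in averages]
        if self.metrics is not None:
            names = [n for n in names if n in self.metrics]
        return {name: get_scorer(name) for name in names}
```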
