ENH allow to pass str or scorer to make_scorer #18141

glemaitre · 2020-08-11T16:21:55Z

This PR proposes to add the option to pass a scoring function name or a scorer in order to build a scorer via make_scorer.
In some way, it would simplify the scorer API. For instance, the following example is raising an error because the class 9 is not present in the fold during cross-validation:

import sklearn.datasets
import sklearn.linear_model
import numpy as np

X, y = sklearn.datasets.load_digits(return_X_y=True)

n9 = 2

y9 = y[y == 9]
X9 = X[y == 9]

X = X[y != 9]
y = y[y != 9]

X = np.vstack([X, X9[0:n9]])
y = np.hstack([y, y9[0:n9]])

multi_class = "multinomial"
scoring = "neg_log_loss"
random_state = 1

model = sklearn.linear_model.LogisticRegressionCV(
    multi_class=multi_class, scoring=scoring, random_state=random_state
)
model.fit(X, y)

The way to solve it is to use a scorer. However, one needs to know what all the make_scorer parameters mean:

    multi_class = "multinomial"
    labels = np.unique(y)
    scoring = make_scorer(
        log_loss, greater_is_better=False, labels=labels, needs_proba=True
    )
    random_state = 1

    model = LogisticRegressionCV(
        multi_class=multi_class, scoring=scoring, random_state=random_state
    )
    model.fit(X, y)

I am thinking that the following would be simpler where some parameters can be inferred from the base scorer.

    multi_class = "multinomial"
    labels = np.unique(y)
    scoring = make_scorer("neg_log_loss", labels=labels)
    random_state = 1

    model = LogisticRegressionCV(
        multi_class=multi_class, scoring=scoring, random_state=random_state
    )
    model.fit(X, y)

thomasjpfan

With this update, is there any reason to keep get_scorer as public api?

glemaitre · 2020-08-12T06:08:39Z

With this update, is there any reason to keep get_scorer as public api?

This is true that it could only be private then.

jnothman · 2020-08-12T10:54:51Z

If this is only used to modify make_scorer's kwargs (and not its more fundamental params), then wouldn't get_scorer(name, **kwargs) be more natural?

glemaitre · 2020-08-12T11:41:57Z

If this is only used to modify make_scorer's kwargs (and not its more fundamental params), then wouldn't get_scorer(name, **kwargs) be more natural?

Indeed, it makes the parameters bypassing less odd I think.

My only concern is about the documentation and how we can do to make this part understandable.
My proposal would be to:

if you wish to get a predefined scorer, use get_scorer;
if you get an error because you need to pass extra parameter within a CV object, use get_scorer and pass the parameters;
if you want to go free solo designing your own scorer from a callable, use make_scorer.

For the second point, I think that we should raise a better error message. Usually, a user will use a CV object and get something that he needs to explicitly pass some parameters. I don't know if we can easily detect who is calling the scoring function (a CV object or the user directly) but we could at least mention that if you use a CV object, use get_scorer (or currently make_scorer).

thomasjpfan · 2020-08-13T20:46:32Z

I think make_scorer(..., **kwargs) is a little more natural when compared to get_scorer(..., **kwargs). When providing **kwargs, we are making a "new scorer" and not "getting a preexisting one".

jnothman · 2020-08-13T22:31:07Z

I think allowing make_scorer to take a string, and nominally to modify whether greater is better when you do so is super awkward. So if we really need another verb like customize_scorer, so be it (I'd rather not), but I suspect I'd vote against allowing a string in make_scorer as giving it too much mixed responsibility.

thomasjpfan · 2020-08-13T22:48:20Z

So if we really need another verb like customize_scorer, so be it (I'd rather not)

I agree to not have another function.

I think allowing make_scorer to take a string, and nominally to modify
whether greater is better when you do so is super awkward.

Thinking it over, passing make_scorer another scorer is a bit strange. (In this case the scorer is passed as a string.)

@glemaitre Should we close this and try to move forward with #17962?

glemaitre · 2020-08-14T08:15:20Z

@glemaitre Should we close this and try to move forward with #17962?

I am fine closing. But this is still not clear where to we go forward. Are we ok with:

Add get_params and set_params to the scorer API. If so, these methods should only update the scoring function parameters (i.e. the one in scorer._kwargs) or the scorer parameters (i.e. scorer._scoring_func, scorer._sign, and scorer._kwargs). Here, I would be in favour of the former.
Making the scorer mutable, we need to make get_scorer returning a copy.
(optional maybe) accept strings in get_scorer or we are fine with getting scorer followed by scorer.set_params()?

thomasjpfan · 2020-08-14T13:14:31Z

Adding get_params and set_params elevates the scorer object to be a little more than a "callable'. This would help resolve other types of issues as noted by @jnothman #17962 (comment) I am a concerned with how users will provide their own custom scorers. Would their scorers need to have the *_params interface?

As for this specific issue with "adjusting a str represented by a scorer", adding **kwargs to get_scorer would be okay as a user facing feature. It is still a little weird that nothing happens with the **kwargs if a callable is passed to get_scorer.

jnothman · 2020-08-15T11:11:06Z

Would we also get the list of params from the signature of the underlying function? We'll have to test if that works nicely with partials etc.

thomasjpfan · 2022-04-24T13:50:31Z

Now that we have #22866 and get_scorer returns a copy, I think we can update this PR to get_scorer(name, **kwargs).

Would we also get the list of params from the signature of the underlying
function? We'll have to test if that works nicely with partials etc.

If it is a callable, I see two options:

get_scorer(callable, **kwargs) map directly to partial(callable, **kwargs) and do no checking.
Do nothing with **kwargs. (I prefer this option since I prefer to deprecate callable support in get_scorer)

From an API point of view, I prefer to only support strings in get_scorer and error for keys that is not in _SCORERS.

glemaitre · 2024-05-18T09:16:15Z

So 2 years later, looking at all the different changes I have the following opinion:

_Scorer could be made public through a Scorer object and we could have 2 factory methods from_function and from_string to create 2 scorers. This would the proper OOP way.
If we feel that the above is too much OOP for scikit-learn, it only means that we need to have two functions to call the private factories and do not expose _Scorer.

So I assume that this should be part of the discussion of the potential future scoring API change: #28995

glemaitre added 2 commits August 11, 2020 18:16

ENH allow to pass str or scorer to make_scorer

11b77ec

update whats new

2a3de12

github-actions bot added the module:metrics label Aug 11, 2020

glemaitre added 2 commits August 11, 2020 18:25

STY

d0e4d2a

DOC improve example docstring

7ee4b23

thomasjpfan reviewed Aug 11, 2020

View reviewed changes

DOC improve user guide

cea5ad8

FIX set random_state

f7f8746

glemaitre mentioned this pull request Aug 13, 2020

EHN allow scorers to set addtional parameter of scoring function #17962

Closed

Base automatically changed from master to main January 22, 2021 10:53

glemaitre closed this May 18, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ENH allow to pass str or scorer to make_scorer #18141

ENH allow to pass str or scorer to make_scorer #18141

glemaitre commented Aug 11, 2020

thomasjpfan left a comment

glemaitre commented Aug 12, 2020

jnothman commented Aug 12, 2020

glemaitre commented Aug 12, 2020 •

edited

Loading

thomasjpfan commented Aug 13, 2020

jnothman commented Aug 13, 2020 via email

thomasjpfan commented Aug 13, 2020

glemaitre commented Aug 14, 2020

thomasjpfan commented Aug 14, 2020 •

edited

Loading

jnothman commented Aug 15, 2020 via email

thomasjpfan commented Apr 24, 2022

glemaitre commented May 18, 2024

ENH allow to pass str or scorer to make_scorer #18141

ENH allow to pass str or scorer to make_scorer #18141

Conversation

glemaitre commented Aug 11, 2020

thomasjpfan left a comment

Choose a reason for hiding this comment

glemaitre commented Aug 12, 2020

jnothman commented Aug 12, 2020

glemaitre commented Aug 12, 2020 • edited Loading

thomasjpfan commented Aug 13, 2020

jnothman commented Aug 13, 2020 via email

thomasjpfan commented Aug 13, 2020

glemaitre commented Aug 14, 2020

thomasjpfan commented Aug 14, 2020 • edited Loading

jnothman commented Aug 15, 2020 via email

thomasjpfan commented Apr 24, 2022

glemaitre commented May 18, 2024

glemaitre commented Aug 12, 2020 •

edited

Loading

thomasjpfan commented Aug 14, 2020 •

edited

Loading