EHN allow scorers to set addtional parameter of scoring function #17962

glemaitre · 2020-07-21T09:01:27Z

Add a copy parameter to get_scorer to control whether or not a copy of a custom metric is made.
In addition, internal scikit-learn metrics will be copied.

glemaitre · 2020-07-21T09:07:15Z

ping @adrinjalali @thomasjpfan

I think this is a compromise that still allows not making a copy for a custom scorer.

glemaitre · 2020-07-21T09:07:42Z

I will check the different calls to get_scorer internally and make sure that we trigger a copy.

thomasjpfan

Can we consider this a bug so we do not need to introduce the copy parameter?

get_scorer can remain a noop if it is a callable.

sklearn/metrics/_scorer.py

glemaitre · 2020-07-22T09:15:56Z

It would be fine with me.

thomasjpfan · 2020-07-27T12:12:37Z

I was thinking of doing a deepcopy only for the scorers:

scorer = deepcopy(SCORERS[scoring])

and a noop for callables.
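
The suggestion above could look roughly like the following. This is only a sketch under stated assumptions: ToyScorer and the SCORERS dictionary here are hypothetical stand-ins defined locally, not the scikit-learn objects, and the error message is illustrative.

```python
from copy import deepcopy


class ToyScorer:
    """Hypothetical stand-in for a scikit-learn scorer object."""

    def __init__(self, name, **kwargs):
        self.name = name
        self._kwargs = kwargs


# Hypothetical stand-in for the internal SCORERS registry.
SCORERS = {"roc_auc": ToyScorer("roc_auc")}


def get_scorer(scoring):
    """Return a deep copy for registered names and the input itself otherwise."""
    if isinstance(scoring, str):
        try:
            return deepcopy(SCORERS[scoring])
        except KeyError:
            raise ValueError(f"{scoring!r} is not a valid scoring value.") from None
    # Anything that is not a string (e.g. a custom callable) passes through: a no-op.
    return scoring


def custom_scorer(estimator, X, y):
    """Hypothetical user-defined scorer."""
    return 0.0


first = get_scorer("roc_auc")
first._kwargs["pos_label"] = "yes"  # mutate the private state of the copy only
second = get_scorer("roc_auc")      # a fresh copy, unaffected by the mutation
passthrough = get_scorer(custom_scorer)
```

With this shape, mutating a scorer obtained by name cannot leak into the registry, while user callables keep their identity.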

thomasjpfan

One minor comment

Otherwise LGTM!

sklearn/linear_model/_logistic.py

jnothman · 2020-08-01T23:12:54Z

Please update the title.
I'm +0.5 for this. I don't see why a user should be getting and modifying a scorer. Though I guess this reduces surprising bugs if they do.

glemaitre · 2020-08-03T06:59:45Z

> I'm +0.5 for this. I don't see why a user should be getting and modifying a scorer. Though I guess this reduces surprising bugs if they do.

I don't like to not have a full +1 from @jnothman :)

More seriously, in our internal use case, a user would give us roc_auc in GridSearchCV, which would pick the roc_auc_scorer. However, to be sure that everything goes fine, we would need to set pos_label (and find the right column in the predictions). So we would like to modify the scorer. Another use case (again internal): when developing the convenience functions that return the set of possible scorers, we will need to modify some of their parameters.

So, some potential alternatives:

let get_scorer return a copy;
make the deep copy at the level where we need to modify the scorer;
have a make_scorer that can take an existing scorer and return a new object with modified parameters.

jnothman · 2020-08-03T12:07:58Z

As far as I can tell, the use cases you're talking about involve modifying the private attributes of an object. The fact that you want to continue doing that signals to me that something's wrong with this design... While we often don't follow it in Scikit-learn for the sake of simplicity, good OOP design says that if we want to allow the modification of a scorer, we should facilitate that with a public API. We could have a method like set_kwargs, in which case get_scorer should copy so that the setting can happen in-place, or a method like copy_with_kwargs. This change makes no sense to me because it encourages a breach of the API.
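
To make the two public-API options named above concrete, here is a minimal sketch. The method names set_kwargs and copy_with_kwargs come from the comment; the ToyScorer class, the toy metric, and the method bodies are assumptions for illustration and not the scikit-learn implementation.

```python
from copy import deepcopy


class ToyScorer:
    """Hypothetical scorer; only the two configuration styles matter here."""

    def __init__(self, score_func, **kwargs):
        self._score_func = score_func
        self._kwargs = kwargs

    def set_kwargs(self, **kwargs):
        """Mutable style: update the configuration in place."""
        self._kwargs.update(kwargs)
        return self

    def copy_with_kwargs(self, **kwargs):
        """Immutable style: leave self alone and return a reconfigured copy."""
        new = deepcopy(self)
        new._kwargs.update(kwargs)
        return new


def toy_metric(y_true, y_pred, pos_label=1):
    """Toy metric: fraction of predictions equal to pos_label."""
    return sum(p == pos_label for p in y_pred) / len(y_pred)


shared = ToyScorer(toy_metric)
derived = shared.copy_with_kwargs(pos_label="yes")  # shared is untouched
returned = shared.set_kwargs(pos_label="no")        # shared itself changes
```

The in-place variant is only safe if callers never hold a shared instance, which is why it is paired with a copying get_scorer in the comment above.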

glemaitre · 2020-08-03T12:21:43Z

OK this seems reasonable. I will propose an alternative that will expose a public API.

glemaitre · 2020-08-04T17:48:00Z

sklearn/metrics/_scorer.py

@@ -165,6 +167,38 @@ def _factory_args(self):
        """Return non-default make_scorer arguments for repr."""
        return ""

+    def set_kwargs(self, **kwargs):


@jnothman is it this type of interface that you had in mind?

adrinjalali · 2020-08-04T19:52:08Z

> Please update the title.
> I'm +0.5 for this. I don't see why a user should be getting and modifying a scorer. Though I guess this reduces surprising bugs if they do.

With sample props, the scorer defines which parameter should be passed to it, and that's something which is changed by the user and it shouldn't be changed on the singleton object.

jnothman

A few comments here.

Yes, you're right @adrinjalali, that's a valid use-case for get_scorer copying. But it's not a current use-case.

I'm not yet sure if get_scorer(name, **kw) is motivated. When do we need it? Also, using ** blocks the option to extend get_scorer in other ways, so I'd be careful about this particular signature.

And, while I didn't think of it previously, a minimum sufficient change to the scorer objects would have been to rename the _kwargs attribute to kwargs.

glemaitre · 2020-08-04T20:12:30Z

> I'm not yet sure if get_scorer(name, **kw) is motivated. When do we need it? Also, using ** blocks the option to extend get_scorer in other ways, so I'd be careful about this particular signature.

I need to go back to where I need it. I think that I am getting confused :)

> And, while I didn't think of it previously, a minimum sufficient change to the scorer objects would have been to rename the _kwargs attribute to kwargs.

True.

glemaitre · 2020-08-04T22:16:35Z

OK let me put the steps that make me think that we need to have a copy (I am struggling myself to follow them):

A user will pass scoring='roc_auc' in the GridSearchCV.
This roc_auc will call the scorer roc_auc_scorer created as make_scorer(roc_auc_score, ...).
This scorer is a _ThresholdScorer which might call y_pred and because roc_auc_score takes only a single column, it will need to infer the positive label which will be y_pred[:, 1].
However, in the case of passing y_true as a string dtype, we will need to binarize it (which is currently using np.unique).

At this stage, there is a potential bug because the encoding of y_true does not necessarily correspond to the encoding induced by y_pred. The ambiguity could be removed by passing a pos_label parameter. That is what we try in #17704.

However, we are lucky because roc_auc_score is a symmetric metric. Even if pos_label is not provided, we can still compute the score. We just need to be sure to select the column of y_pred which corresponds to the positive label used to encode y_true.

So #17704 can resolve this issue as follows:
Detect that we have symmetric scores (maybe using some score properties metadata). In this case, _ThresholdScorer and _ProbaScorer can safely modify their attribute self._kwargs by imposing a pos_label, and ensuring the right encoding of y_true by passing pos_label to self._score_func.

However, by self-mutating, we are actually mutating SCORERS["roc_auc"] and thus sklearn.metrics.roc_auc_scorer.

We actually hit this issue when running the test suite: we import roc_auc_scorer and mutate it for one of the datasets, and it then fails when used in another test because pos_label is already set (and the debugging was not fun :)).
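
The leak described above can be reproduced in miniature. Everything below is a hypothetical stand-in (a toy metric, a toy scorer class, and a local REGISTRY dictionary), intended only to illustrate how mutating a shared singleton in one place breaks an unrelated use later.

```python
def toy_recall(y_true, y_pred, pos_label=None):
    """Toy metric: recall of pos_label (defaults to the largest true label)."""
    if pos_label is None:
        pos_label = max(y_true)
    hits = [p == pos_label for t, p in zip(y_true, y_pred) if t == pos_label]
    return sum(hits) / len(hits)


class ToyScorer:
    """Hypothetical scorer holding extra metric kwargs in a private dict."""

    def __init__(self, score_func, **kwargs):
        self._score_func = score_func
        self._kwargs = kwargs

    def __call__(self, y_true, y_pred):
        return self._score_func(y_true, y_pred, **self._kwargs)


# Module-level singleton, shared by everyone who looks it up by name.
REGISTRY = {"toy": ToyScorer(toy_recall)}

# First use: string labels, so the scorer's configuration gets patched.
scorer = REGISTRY["toy"]
scorer._kwargs["pos_label"] = "spam"

# Second, unrelated use: integer labels and a "fresh" lookup by name.
y_true, y_pred = [0, 1, 1, 0], [0, 1, 0, 0]
expected = ToyScorer(toy_recall)(y_true, y_pred)  # what an untouched scorer gives
leaked_kwargs = dict(REGISTRY["toy"]._kwargs)
try:
    REGISTRY["toy"](y_true, y_pred)
    failure = None
except ZeroDivisionError as exc:
    # No sample has the label "spam", so the toy metric divides by zero.
    failure = exc
```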

When I try to resolve these problems, I have the impression of facing a chicken-and-egg problem (with many chickens and many eggs :S):

Solving the issue with pos_label in roc_auc_score: this one is easy because we can just introduce the parameter and raise an error when pos_label should be provided.
Solving the previous issue will introduce a regression since we are currently supporting (sometimes wrongly) roc_auc without specifying pos_label in GridSearchCV. If we want to support it, we will need to: (i) detect if a score is symmetric and (ii) self-mutate a scorer.
The previous self-mutation will be an issue if we don't copy when we get the scorer with get_scorer.
I am not sure what would be the best way to add metadata to scores such that we know some of the properties (I was going to propose something like this: https://github.com/scikit-learn/scikit-learn/pull/17930/files#diff-e907207584273858caf088b432e38d04R772)

Sorry for the long post but I tried to summarize the best I could. I already annoyed @thomasjpfan with a call where we kind of concluded that the issues could be decoupled but it does not seem to be that easy :) It seems that we should solve the issue in the reverse order that I pointed out. Since solving these issues might introduce some new API changes, I would really appreciate any advice and thoughts regarding these problems :)

jnothman · 2020-08-05T00:11:56Z

Firstly: I remain unconvinced that there is a problem with users getting incorrect roc auc results when using the standard scorer: our convention is clearly to encode probabilities to match classes_, which should be sorted. I think that the need to allow for pos_label in roc_auc_score, where we do not explicitly require the input to come from a scikit-learn-compatible classifier, is reasonable and separate.

What you did not clarify is that in also testing the ability to use pos_label in a custom roc_auc scorer in #17704, you took the approach of getting the scorer named 'roc_auc' and modifying it, rather than the documented approach of using make_scorer with kwargs.

@adrinjalali, I had similarly assumed that the convention in implementing SLEP006 was to follow the assumption that scorers were immutable, and hence put the metadata request into make_scorer in https://github.com/scikit-learn/enhancement_proposals/blob/677293140f8b6961ca274d0dbc6bc99e34934b50/slep006/cases_opt4.py#L14, not as a method of the scorer.

The fact that both of you thought scorers should be mutable says something. It indeed seems easier for a user to get a scorer and modify it (either as a mutable object by setting state, which is more in line with scikit-learn estimators, or as an immutable object by creating a modified copy), than to construct it from scratch with make_scorer.

I suspect, then, that the goal of this PR should be: allow get_scorer to be used to reconfigure a scorer. This implies making the scorer mutable and exposing attributes or methods to mutate it, and copying in get_scorer; alternatively it implies making the scorer more explicitly immutable by having configuration methods that create a new object (like Python's string methods). In any case, this change needs to be substantiated by updating the documentation to demonstrate this as a friendly alternative to make_scorer.

thomasjpfan · 2020-08-05T18:04:02Z

Personally, I would prefer to continue the assumption that scorers were immutable (if possible).

If the goal is to resolve #17704, we do not need to modify get_scorer. In _score, we can create a copy of kwargs, add pos_label to it, and then pass it along to self._score_func. This way the scorer object remains unchanged.
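
A rough sketch of that idea follows, with a hypothetical ToyScorer and toy metric standing in for the real classes. The signature of _score and the choice to let a user-supplied pos_label take precedence are assumptions made for the illustration.

```python
class ToyScorer:
    """Hypothetical scorer that never mutates its own configuration."""

    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def _score(self, y_true, y_pred, pos_label):
        # Work on a per-call copy so self._kwargs stays as configured.
        kwargs = dict(self._kwargs)
        # Assumption: a pos_label configured by the user takes precedence.
        kwargs.setdefault("pos_label", pos_label)
        return self._sign * self._score_func(y_true, y_pred, **kwargs)


def toy_recall(y_true, y_pred, pos_label):
    """Toy metric: recall of pos_label."""
    hits = [p == pos_label for t, p in zip(y_true, y_pred) if t == pos_label]
    return sum(hits) / len(hits)


scorer = ToyScorer(toy_recall)
score = scorer._score(
    ["ham", "spam", "spam", "ham"],
    ["ham", "spam", "ham", "ham"],
    pos_label="spam",
)
```

Because the inferred pos_label lives only in the local dictionary, repeated calls on different datasets cannot contaminate each other.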

There may be another way to resolve #17704 while keeping get_scorer the same.

adrinjalali · 2020-08-12T15:59:29Z

I see scorers as objects which have a state, and the state can also be data dependent, especially with custom scorers which deal with column or sample meta-data.

I agree one solution is to make everybody use make_scorer to get a scorer, but I don't find it intuitive and I believe it'll result in silent or loud bugs/issues for users.

And I know this is about get_scorer, but I do remember people hacking the dictionary and adding custom scorers to the dict to make their code look nicer.

On the other hand, could you (@thomasjpfan , @jnothman ) explain why immutability of scorers is so important?

jnothman · 2020-08-12T21:06:46Z

I don't understand what you mean by scorers having state. I think you mean that they are configured individually but "having state" ordinarily means to me that during use of the object it changes the data stored in it. In this sense, scorers are stateless, and it would be problematic if they were not: if they were to have state modified after construction, it would imply inconsistent measures of performance across evaluations. This is clearly not what you mean by having a state, hence the notion that they store configuration but are stateless. "Hacking the dictionary" in any other framework would be called "registering an extension" or plugin. I have no problem providing this functionality more formally. For me the only reason to allow the user to mutate the configuration of a scorer is that this is consistent with BaseEstimator.set_params.

glemaitre · 2020-08-12T21:16:19Z

Would it make sense for BaseScorer to inherit from BaseEstimator? Since we don't have `fit` as an abstract method, we can take advantage of `set_params` and `get_params`. And the scorer would follow our estimator API.


jnothman · 2020-08-12T21:33:35Z

I think there would be benefit in allowing scorers to be fitted on a training set, to set labels or valid ranges. I think that's a slightly separate issue, since it assumes that the configuration of the metric can be estimated from data, rather than the user modifying the parameterisation of an existing scorer explicitly. There are nice things about that approach, and I'd be open to discussing it more explicitly.

jnothman · 2020-08-12T21:41:54Z

Although for consistent scorers in a nested cv setting you want them to be fitted on the entire dataset, not a training sample. I think what you're talking about here should be framed not as making them estimators, but making them parameterized like estimators and kernels.

glemaitre · 2020-08-13T06:29:42Z

Indeed, get_params and set_params would be enough.

glemaitre · 2020-08-13T06:54:57Z

So, we could have set_params and get_params to deal with the scoring_func parameters. I don't think these 2 methods should allow changing either the scoring_func or the sign. Changing those should require a call to make_scorer.
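
One way such restricted get_params and set_params could behave is sketched below. The ToyScorer class and toy metric are hypothetical, and validating names against the metric's signature with inspect.signature is an assumption of this sketch, not something proposed in the thread.

```python
import inspect


def toy_metric(y_true, y_pred, labels=None):
    """Toy metric: accuracy over samples whose true label is in labels."""
    pairs = [(t, p) for t, p in zip(y_true, y_pred) if labels is None or t in labels]
    return sum(t == p for t, p in pairs) / len(pairs)


class ToyScorer:
    """Hypothetical scorer exposing only the metric's keyword arguments."""

    def __init__(self, score_func, sign=1, **kwargs):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs

    def get_params(self):
        return dict(self._kwargs)

    def set_params(self, **params):
        # Skip the two leading data arguments (y_true, y_pred) of the metric.
        accepted = list(inspect.signature(self._score_func).parameters)[2:]
        unknown = sorted(set(params) - set(accepted))
        if unknown:
            raise ValueError(f"Invalid scorer parameters: {unknown}")
        self._kwargs.update(params)
        return self

    def __call__(self, y_true, y_pred):
        return self._sign * self._score_func(y_true, y_pred, **self._kwargs)


scorer = ToyScorer(toy_metric).set_params(labels=["a", "b"])
try:
    scorer.set_params(sign=-1)  # the sign is not a metric parameter
    rejected = False
except ValueError:
    rejected = True
value = scorer(["a", "b", "c"], ["a", "c", "c"])
```

Here the score function and the sign are fixed at construction, so changing them would still require building a new scorer.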

> I'm not yet sure if get_scorer(name, **kw) is motivated. When do we need it? Also, using ** blocks the option to extend get_scorer in other ways, so I'd be careful about this particular signature.

From what I was drafting in #18141, it seems that there is a use case in order to be more convenient:

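    import numpy as np
    from sklearn.linear_model import LogisticRegressionCV
    from sklearn.metrics import get_scorer

    # X, y: training data, assumed to be defined beforehand
    multi_class = "multinomial"
    labels = np.unique(y)
    # proposed signature: extra kwargs would be forwarded to the metric
    scoring = get_scorer("neg_log_loss", labels=labels)
    random_state = 1

    model = LogisticRegressionCV(
        multi_class=multi_class, scoring=scoring, random_state=random_state
    )
    model.fit(X, y)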

If we don't allow **kwargs, one needs an extra line to call set_params on the returned scorer.
Would it not still be possible to extend get_scorer using keyword-only arguments when introducing new arguments?

adrinjalali · 2020-08-13T08:22:00Z

> I think what you're talking about here should be framed not as making them
> estimators, but making them parameterized like estimators and kernels.

Parameterized is a much better way of conveying what I meant by "having a state".
Does that mean you're leaning towards accepting this PR or a variation of it?

thomasjpfan · 2021-04-13T16:08:54Z

Coming back to this, I agree with #17962 (comment). This allows for an easy transition from using a string for scoring to creating your own custom scorer.

If a user were to use make_scorer, I think it would be okay to have them call make_scorer again with the extra kwargs. This way we do not need the set_kwargs which keeps the scorers immutable.

adrinjalali · 2024-04-15T09:43:55Z

@glemaitre what do we need to do here? In the meantime, we've also deprecated SCORERS, users always get a copy, and set_score_request mutates self.

glemaitre · 2024-05-18T09:05:45Z

I'm not sure anymore if I agree with myself of 4 years ago. It seems that I expected get_scorer to do the job of make_scorer in some way but without having to pass the function but only using the string.

I think we can close this PR since the state of the scorers is better nowadays. We probably need more work on the scorer API side.

glemaitre · 2024-05-18T09:09:41Z

Basically, I think it was the idea that I had in #18141

glemaitre added 2 commits July 21, 2020 10:59

FIX/ENH add copy parameter to get_scorer

4de8eac

DOC update pr number

34e9bf0

github-actions bot added the module:metrics label Jul 21, 2020

glemaitre added 7 commits July 21, 2020 11:18

FIX make sure we do a copy of the scorer in LogisticRegressionCV

a46e123

FIX make sure check_scoring is also making a copy

c56aad3

DOC update whats new

4e179dc

FIX add accuracy as default

b7e703b

typo

1756e8b

TST fix matching string

b6b9b60

iter

5bf7691

thomasjpfan reviewed Jul 22, 2020

sklearn/metrics/_scorer.py (outdated)

glemaitre added 2 commits July 22, 2020 11:32

FIX address thomas comments

a1e5680

FIX remove copy in LRCV

765646f

glemaitre added 2 commits July 30, 2020 15:49

apply review from thomas

b1c7889

TST revert assert with customized scorer

0a752cb

thomasjpfan previously approved these changes Jul 31, 2020

sklearn/linear_model/_logistic.py

glemaitre added 2 commits July 31, 2020 17:01

apply thomas suggestions

6f1641c

Merge remote-tracking branch 'glemaitre/is/17942' into is/17942

75f549a

glemaitre changed the title ~~FIX/ENH add copy parameter to get_scorer~~ FIX make sure get_scorer returns a deepcopy of internal sklearn scorers Aug 3, 2020

glemaitre added 3 commits August 4, 2020 16:00

Merge remote-tracking branch 'origin/master' into is/17942

08d14eb

ENH/TST add set_kwargs to scorer

3c7b4a8

DOC add documentation for new kwargs

53a79ff

glemaitre added 2 commits August 4, 2020 19:35

DOC add some documentation in whats new

0392b77

DOC update whats new

e5b7848

glemaitre changed the title ~~FIX make sure get_scorer returns a deepcopy of internal sklearn scorers~~ EHN allow scorers to set addtional parameter of scoring function Aug 4, 2020

glemaitre commented Aug 4, 2020

jnothman reviewed Aug 4, 2020

thomasjpfan mentioned this pull request Aug 13, 2020

ENH allow to pass str or scorer to make_scorer #18141

Closed

Base automatically changed from master to main January 22, 2021 10:52

thomasjpfan mentioned this pull request Apr 13, 2021

Accept SLEP006 scikit-learn/enhancement_proposals#52

Closed

jnothman mentioned this pull request Jul 12, 2021

sample-props alternate implementation #20350

Closed

adrinjalali mentioned this pull request Mar 17, 2022

Should get_scorer return a deep copy of the scorer object #17942

Closed

glemaitre closed this May 18, 2024
