[MRG] Adds _MultimetricScorer for Optimized Scoring #14593
Conversation
Looks good, I think this could use a few more comments to describe the logic.
I'm not a huge fan of using None as _method_cacher for it to revert to the default method cacher of _BaseScorer. (Maybe with my suggestions I'd find it clearer, IDK.)
Maybe add a sanity check that makes sure that passing another X gives different results even when caching is involved.
sklearn/metrics/scorer.py
Outdated
scorers = {"score": check_scoring(estimator, scoring=scoring)}
return scorers, False
return _MultimetricScorer(**scorers), False
Why return a MultiMetricScorer when there is only one scorer?
_check_multimetric_scoring always returned a multimetric scorer. (On master it returned a dictionary, which was the data structure used to denote "multimetric scoring".)
I disagree, a dict with only one key (as here) denotes a single metric scorer. That's the reason is_multimetric is False here.
Since no caching happens with a single-metric scorer, I think we should not change this part and still return the dict, to avoid the confusion.
Or else, MultiMetricScorer should have a whole different name. It doesn't make sense to return a MultiMetricScorer instance while is_multimetric is False.
A user can pass a dictionary to scoring with one key and is_multimetric will be true.
I do plan on removing is_multimetric, and to have "anything that returns a dictionary" mean multimetric.
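For context, the callable-dict idea being debated can be sketched as follows. This is a simplified illustration only; the names and signatures are not the actual scikit-learn implementation, which also adds caching and error handling:

```python
# Simplified sketch of a callable dict of scorers (illustrative only).
class MultimetricScorer(dict):
    """Maps scorer names (str) to scorer callables. Calling the dict
    evaluates every scorer and returns a name -> score dictionary."""

    def __call__(self, estimator, X, y):
        return {name: scorer(estimator, X, y)
                for name, scorer in self.items()}

# Toy scorers with the (estimator, X, y) signature:
scorers = MultimetricScorer(
    always_one=lambda est, X, y: 1.0,
    count=lambda est, X, y: float(len(X)),
)
scores = scorers(None, [1, 2, 3], None)
assert scores == {"always_one": 1.0, "count": 3.0}
```

Even with a single key the object stays callable and still returns a dictionary, which is the crux of the naming debate above.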
'll2': 'neg_log_loss',
'ra1': 'roc_auc',
'ra2': 'roc_auc'
}, 1, 1, 1), (['roc_auc', 'accuracy'], 1, 0, 1)],
for readability maybe separate both cases with a line break
@@ -543,3 +544,41 @@ def test_scoring_is_not_metric():
Ridge(), r2_score)
assert_raises_regexp(ValueError, 'make_scorer', check_scoring,
KMeans(), cluster_module.adjusted_rand_score)

@pytest.mark.parametrize("scorers,predicts,predict_probas,decision_funcs",
expected_predict_count, expected_predict_proba_count, ...? Long names, I know :/
scorer, _ = _check_multimetric_scoring(LogisticRegression(), scorers)
scores = scorer(mock_est, X, y)

assert set(scorers) == set(scores)
Just because I was slightly confused at first:
- assert set(scorers) == set(scores)
+ assert set(scorers) == set(scores)  # compare dict keys
sklearn/metrics/scorer.py
Outdated
return True

if counter[_ThresholdScorer] > 0 and (counter[_PredictScorer] or
        counter[_ThresholdScorer]):
This is equivalent to counter[_ThresholdScorer] and counter[_PredictScorer] (the or isn't useful).
This should have been:
if counter[_ThresholdScorer] and (counter[_PredictScorer] or
        counter[_ProbaScorer]):
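The corrected condition can be exercised in isolation with a small stand-in. The class names below are hypothetical placeholders mirroring the private scorer types, and the function implements only the single condition quoted above, not the full _use_cache heuristic:

```python
from collections import Counter

# Hypothetical stand-ins for sklearn's private scorer type classes.
class PredictScorer: pass
class ProbaScorer: pass
class ThresholdScorer: pass

def use_cache(scorer_types):
    # Cache when a threshold scorer would share predictions with a
    # predict or proba scorer, per the corrected condition above.
    counter = Counter(scorer_types)
    return bool(counter[ThresholdScorer] and
                (counter[PredictScorer] or counter[ProbaScorer]))

assert use_cache([ThresholdScorer, ProbaScorer])
assert use_cache([ThresholdScorer, PredictScorer])
assert not use_cache([ThresholdScorer])
```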
sklearn/metrics/scorer.py
Outdated
return scores

def _use_cache(self):
    """Return True if using a cache is desired."""
Short description of "desired" please ;)
sklearn/metrics/scorer.py
Outdated
return self._score(estimator, X, y_true, sample_weight=sample_weight)

def _method_cacher(self, estimator, method, *args, **kwargs):
    """Call estimator directly."""
- """Call estimator directly."""
+ """Call estimator's method directly, without caching."""
sklearn/metrics/scorer.py
Outdated
@@ -44,7 +45,54 @@
from ..base import is_regressor

class _BaseScorer(metaclass=ABCMeta):
class _MultimetricScorer(dict):
    """Callable dictionary for multimetric scoring."""
Please briefly describe keys being strings and values being instances of _BaseScorer. I first thought (without looking) that this was also a _BaseScorer instance.
Would have been nice; the passthrough scorer and custom scorers may have weird interfaces, so _MultimetricScorer.__call__ needed to be as generic as possible.
sklearn/metrics/scorer.py
Outdated
"""
return self._score(estimator, X, y_true, sample_weight=sample_weight)

def _method_cacher(self, estimator, method, *args, **kwargs):
I'm confused that this is called _method_cacher. Makes me think that it overrides the MultimetricScorer's _method_cacher, but the logic is slightly different. Call this _passthrough_method_cacher?
fit_and_score_args = [None, None, None, two_params_scorer]

scorer = _MultimetricScorer(score=two_params_scorer)
fit_and_score_args = [None, None, None, scorer]
Shouldn't the original list [None, None, None, two_params_scorer] still be tested?
This is testing a private method _score. On master, a multimetric scoring was represented with a dictionary, which _score used to call the scorers independently. This PR moves this responsibility from _score to _MultimetricScorer. Now _score only needs to call _MultimetricScorer.__call__.
Mostly nitpicks about comments but LGTM, thanks @thomasjpfan
The whole scoring logic is becoming quite convoluted by now... Might be worth some re-thinking one day.
sklearn/metrics/scorer.py
Outdated

`_MultimetricScorer` will return a dictionary of scores corresponding to
the scorers in the dictionary. Note `_MultimetricScorer` can be created
with a dictionary with one key.
- with a dictionary with one key.
+ with a dictionary with one key (i.e. only one actual scorer).
sklearn/metrics/scorer.py
Outdated
return scores

def _use_cache(self, estimator):
    """Return True if using a cache it is beneficial.
- """Return True if using a cache it is beneficial.
+ """Return True if using a cache is beneficial.
sklearn/metrics/scorer.py
Outdated
- `decision_function` and `predict_proba` is called.

"""
if len(self) == 1:
- if len(self) == 1:
+ if len(self) == 1:  # Only one scorer
sklearn/metrics/scorer.py
Outdated
score : float
    Score function applied to prediction of estimator on X.
"""
return self._score(partial(_method_caller, None), estimator, X, y_true,
- return self._score(partial(_method_caller, None), estimator, X, y_true,
+ return self._score(partial(_method_caller, cache=None), estimator, X, y_true,
Since cache is a positional argument, partial needs to accept it as a positional argument.
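The positional-argument point can be demonstrated standalone. The method_caller function below is a hypothetical stand-in mirroring the discussed _method_caller(cache, estimator, method) signature:

```python
from functools import partial

# Hypothetical function mirroring _method_caller(cache, estimator, method).
def method_caller(cache, estimator, method):
    return (cache, method)

# Binding the first slot positionally works as intended:
assert partial(method_caller, None)("est", "predict") == (None, "predict")

# Binding it by keyword breaks later positional calls: "est" also fills the
# `cache` parameter, so Python raises "got multiple values for argument".
try:
    partial(method_caller, cache=None)("est", "predict")
except TypeError:
    pass
else:
    raise AssertionError("expected a TypeError")
```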
sklearn/metrics/scorer.py
Outdated
"""Evaluate predicted target values for X relative to y_true.

Parameters
----------
method_caller: callable
    Call estimator with method and args and kwargs.
- Call estimator with method and args and kwargs.
+ Call estimator's method with args and kwargs, potentially caching results.
Or anything else that indicates this is used for caching.
if is_multimetric:
    return _multimetric_score(estimator, X_test, y_test, scorer)
def _score(estimator, X_test, y_test, scorer):
    """Compute the score(s) of an estimator on a given test set."""
Let's keep the comment about what is returned. IIUC a dict is returned iff scorer is a dict?
scorer_dict, _ = _check_multimetric_scoring(LogisticRegression(), scorers)
scorer = _MultimetricScorer(**scorer_dict)
scores = scorer(mock_est, X, y)
I don't think this is possible but it'd be cool to assert that the cache only exists during __call__().
Since it is scoped in __call__ I do not think it is possible.
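A sketch of why the cache cannot outlive the call: it is a local dictionary created inside __call__, so it becomes unreachable (and collectable) as soon as the call returns. This is illustrative code under assumed names, not sklearn's implementation:

```python
# Illustrative sketch: the cache is local to __call__, so it exists only
# for the duration of one scoring call.
class CachingScorer(dict):
    def __call__(self, estimator, X):
        cache = {}  # created per call; unreachable after __call__ returns

        def cached_method(method):
            if method not in cache:
                cache[method] = getattr(estimator, method)(X)
            return cache[method]

        return {name: scorer(cached_method)
                for name, scorer in self.items()}

class Est:
    calls = 0
    def predict(self, X):
        Est.calls += 1
        return sum(X)

scorer = CachingScorer(a=lambda m: m("predict"),
                       b=lambda m: m("predict"))
result = scorer(Est(), [1, 2, 3])
assert result == {"a": 6, "b": 6}
assert Est.calls == 1  # predict ran once; both scorers shared the cache
```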
minor nitpicks but this looks great!
sklearn/metrics/scorer.py
Outdated
to `predict_proba`, `predict`, and `decision_function`.

`_MultimetricScorer` will return a dictionary of scores corresponding to
the scorers in the dictionary. Note `_MultimetricScorer` can be created
Note that?
sklearn/metrics/scorer.py
Outdated
- `_ThresholdScorer` and `_PredictScorer` are called and
  estimator is a regressor.
- `_ThresholdScorer` and `_ProbaScorer` are called and
  estimator does not have `decision_function` an attribute.
- estimator does not have `decision_function` an attribute.
+ estimator does not have a `decision_function` attribute.
scorer = _MultimetricScorer(**scorer_dict)
scores = scorer(mock_est, X, y)

assert set(scorers) == set(scores)  # compare dict keys
maybe add assert set(scorers) == set(scorer)?
I find this hard to read btw because we have scorer, scorers and scores which have very small Levenshtein distance, and scorer_dict, which is not very helpful, since the other three things are also dicts.
assert predict_proba_call_cnt == 1


def test_multimetric_scorer_calls_method_once_regressos_threshold():
- def test_multimetric_scorer_calls_method_once_regressos_threshold():
+ def test_multimetric_scorer_calls_method_once_regressor_threshold():
clf.fit(X, y)

# regression metric that needs "threshold" which calls predict
r2_threshold = make_scorer(r2_score, needs_threshold=True)
I feel like this would be nicer with an actual ranking metric, like auc?
score1 = scorer(clf, X1, y1)
score2 = scorer(clf, X2, y2)
assert set(score1) == set(score2)  # compare dict keys
assert score1 != score2
what does this test? object identity?
Bad test is bad. I redid this test to manually call scorers as suggested in your other comment.
scorer_dict, _ = _check_multimetric_scoring(clf, scorers)
scorer = _MultimetricScorer(**scorer_dict)

score1 = scorer(clf, X1, y1)
can we maybe manually call the scorers in the scorer_dict and see that the results are correct for each of them?
Updated test to do this.
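The updated sanity check can be sketched in toy form: the combined multimetric result must equal calling each scorer separately. The scorers below are illustrative stand-ins, not the actual test's scorers:

```python
# Toy version of the sanity check discussed above: the combined result
# must match calling each scorer separately (illustrative scorers only).
scorer_dict = {
    "mean": lambda est, X, y: sum(X) / len(X),
    "total": lambda est, X, y: float(sum(X)),
}

def multimetric(est, X, y):
    return {name: s(est, X, y) for name, s in scorer_dict.items()}

X, y = [1.0, 2.0, 3.0], None
combined = multimetric(None, X, y)
for name, s in scorer_dict.items():
    assert combined[name] == s(None, X, y)
```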
I find this design of a callable dict that generates a dict uncomfortable. That duplicate use of dicts makes the documentation confusing, apart from anything else.
Is it really justified to make _MultimetricScorer a dict? What functionality of a dict is used? I understand that this may reduce the amount of code here, but I suspect it makes it a little more obfuscated.
The dict feature was needed when PR was updated such that
Please add a whatsnew
Otherwise LGTM
def test_multimetric_scorer_sanity_check():
    # scoring dictionary returned is the same as calling each scroer seperately
- # scoring dictionary returned is the same as calling each scroer seperately
+ # scoring dictionary returned is the same as calling each scorer seperately
if not isinstance(score, numbers.Number):
    raise ValueError(error_msg % (score, type(score), name))
scores[name] = score
else:  # scaler
- else:  # scaler
+ else:  # scalar
error_msg = ("scoring must return a number, got %s (%s) "
             "instead. (scorer=%s)")
if isinstance(scores, dict):
This can return a number or a dict. Can we make all cases return a dict, and delete some code paths? We could just use:
if not isinstance(scores, dict):
    scores = {'score': scores}
Okay, I've looked into this and it might be better to consider this as a separate clean-up change.
This type of change would reduce quite a few code paths. (It would most likely make it nicer to support custom callables that return dictionaries.)
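The normalization idea can be sketched directly. The helper name below is hypothetical, just to make the suggestion concrete:

```python
# Hypothetical helper sketching the review suggestion: always work with a
# dict of scores so single- and multi-metric paths share one code path.
def normalize_scores(scores):
    if not isinstance(scores, dict):
        scores = {"score": scores}
    return scores

assert normalize_scores(0.97) == {"score": 0.97}
assert normalize_scores({"acc": 0.9, "auc": 0.8}) == {"acc": 0.9, "auc": 0.8}
```

Downstream code can then validate and aggregate a dictionary unconditionally, instead of branching on is_multimetric.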
doc/whats_new/v0.22.rst
Outdated
@@ -257,6 +257,11 @@ Changelog
- |Enhancement| Allow computing averaged metrics in the case of no true positives.
  :pr:`14595` by `Andreas Müller`_.

- |Enhancement| Improved performance of multimetric scoring in
Can use |Efficiency|?
Okay, I've looked into this and it might be better to consider this as a separate clean-up change.
oh nice, this is even cleaner :) still lgtm from my side.
@thomasjpfan please address minor typos so we can merge ;)
@thomasjpfan can you fix the merge conflicts again?
Thank you @thomasjpfan!! I look forward to some of the things this enables along the lines of #12385
Awesome!
Reference Issues/PRs
Fixes #10802
Alternative to #10979
What does this implement/fix? Explain your changes.
This PR creates a _MultimetricScorer that subclasses dict, which is used to reduce the number of calls to predict, predict_proba, and decision_function. The public interface of objects and functions using scoring is unchanged. The cache is only used when it is beneficial, as defined in _MultimetricScorer._use_cache. Users cannot create a _MultimetricScorer and pass it into scoring.
Any other comments?
I do have plans to support custom callables that return dictionaries from the user. This was not included in this PR to narrow its scope to _MultimetricScorer.