[MRG] Adds _MultimetricScorer for Optimized Scoring #14593


Merged
merged 40 commits into scikit-learn:master on Sep 10, 2019

Conversation

thomasjpfan
Member

Reference Issues/PRs

Fixes #10802
Alternative to #10979

What does this implement/fix? Explain your changes.

  1. This PR creates a _MultimetricScorer, a dict subclass used to reduce the number of calls to predict, predict_proba, and decision_function.

  2. The public interface of objects and functions using scoring are unchanged.

  3. The cache is only used when it is beneficial, as determined by _MultimetricScorer._use_cache.

  4. Users cannot create a _MultimetricScorer themselves and pass it in as scoring.

Any other comments?

I do have plans to support user-provided custom callables that return dictionaries. That was not included here, to keep the scope of this PR narrowed to _MultimetricScorer.
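For intuition, here is a minimal sketch of the caching idea described above; the names (_cached_call, MultimetricScorerSketch) are illustrative stand-ins, not the code in this PR.

from functools import partial


def _cached_call(cache, estimator, method, *args, **kwargs):
    """Call estimator.<method>, reusing the result when a cache dict is given."""
    if cache is not None and method in cache:
        return cache[method]
    result = getattr(estimator, method)(*args, **kwargs)
    if cache is not None:
        cache[method] = result
    return result


class MultimetricScorerSketch:
    """Evaluate several scorers while computing each prediction method at most once.

    Each scorer is expected to accept (method_caller, estimator, X, y) and to
    obtain predictions via method_caller(estimator, "predict", X) and friends.
    """

    def __init__(self, **scorers):
        self._scorers = scorers

    def __call__(self, estimator, X, y):
        cache = {}  # lives only for the duration of this call
        method_caller = partial(_cached_call, cache)
        return {name: scorer(method_caller, estimator, X, y)
                for name, scorer in self._scorers.items()}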

Member

@NicolasHug NicolasHug left a comment

Looks good, I think this could use a few more comments to describe the logic.

I'm not a huge fan of using None as _method_cacher for it to revert to the default method cacher of _BaseScorer. (Maybe with my suggestions I'd find it clearer, IDK)

Maybe add a sanity check that makes sure that passing another X gives different results even when caching is involved.
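A sketch of that sanity check could look like the following (the import path of the private helpers is an assumption based on where they lived at the time of this PR):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.scorer import _check_multimetric_scoring, _MultimetricScorer


def test_multimetric_scorer_caching_respects_input():
    X, y = make_classification(random_state=0)
    X1, y1, X2, y2 = X[:50], y[:50], X[50:], y[50:]

    clf = LogisticRegression(random_state=0).fit(X1, y1)

    scorer_dict, _ = _check_multimetric_scoring(
        clf, ["roc_auc", "neg_log_loss", "accuracy"])
    scorer = _MultimetricScorer(**scorer_dict)

    # Different data must yield different scores even though predictions are
    # cached within each call.
    scores1 = scorer(clf, X1, y1)
    scores2 = scorer(clf, X2, y2)
    assert set(scores1) == set(scores2)
    assert any(not np.isclose(scores1[name], scores2[name]) for name in scores1)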

scorers = {"score": check_scoring(estimator, scoring=scoring)}
return scorers, False
return _MultimetricScorer(**scorers), False
Member

Why return a MultiMetricScorer when there is only one scorer?

Member Author

_check_multimetric_scoring always returned a multimetric scorer. (On master it returned a dictionary, which was the data structure used to denote "multimetric scoring".)

Member

I disagree, a dict with only one key (as here) denotes a single metric scorer. That's the reason is_multimetric is False here.

Since no caching happens with a single-metric scorer, I think we should not change this part and still return the dict, to avoid the confusion.

Member

Or else, MultiMetricScorer should have a whole different name. It doesn't make sense to return a MultiMetricScorer instance while is_multimetric is False.

Member Author

@thomasjpfan thomasjpfan Aug 15, 2019

A user can pass a dictionary to scoring with one key and is_multimetric will be true.

Member Author

I do plan on removing is_multimetric and treating "anything that returns a dictionary" as multimetric.

'll2': 'neg_log_loss',
'ra1': 'roc_auc',
'ra2': 'roc_auc'
}, 1, 1, 1), (['roc_auc', 'accuracy'], 1, 0, 1)],
Member

for readability maybe separate both cases with a line break

@@ -543,3 +544,41 @@ def test_scoring_is_not_metric():
Ridge(), r2_score)
assert_raises_regexp(ValueError, 'make_scorer', check_scoring,
KMeans(), cluster_module.adjusted_rand_score)


@pytest.mark.parametrize("scorers,predicts,predict_probas,decision_funcs",
Member

expected_predict_count, expected_predict_proba_count, ... ?

Long names, I know :/

scorer, _ = _check_multimetric_scoring(LogisticRegression(), scorers)
scores = scorer(mock_est, X, y)

assert set(scorers) == set(scores)
Member

Just because I was slightly confused at first:

Suggested change
assert set(scorers) == set(scores)
assert set(scorers) == set(scores) # compare dict keys

return True

if counter[_ThresholdScorer] > 0 and (counter[_PredictScorer] or
counter[_ThresholdScorer]):
Member

This is equivalent to

counter[_ThresholdScorer] and (counter[_PredictScorer]

(the or isn't useful)

Member Author

This should have been:

        if counter[_ThresholdScorer] and (counter[_PredictScorer] or
                                          counter[_ProbaScorer]):
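Roughly, the heuristic being corrected here could be sketched as below (simplified; the PR's _use_cache also inspects the estimator, e.g. whether it is a regressor or lacks decision_function, and the import path of the private scorer classes is an assumption):

from collections import Counter

from sklearn.metrics.scorer import (_PredictScorer, _ProbaScorer,
                                    _ThresholdScorer)


def use_cache_sketch(scorers):
    """Return True when at least one prediction method would be computed twice."""
    if len(scorers) == 1:  # a single scorer cannot share anything
        return False

    counter = Counter(type(scorer) for scorer in scorers.values())
    if any(counter[cls] > 1
           for cls in (_PredictScorer, _ProbaScorer, _ThresholdScorer)):
        return True  # the same kind of scorer appears more than once
    if counter[_ThresholdScorer] and (counter[_PredictScorer]
                                      or counter[_ProbaScorer]):
        # A _ThresholdScorer may fall back to predict (regressors) or
        # predict_proba (no decision_function), so it can share with those.
        return True
    return False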

return scores

def _use_cache(self):
"""Return True if using a cache is desired."""
Member

Short description of "desired" please ;)

return self._score(estimator, X, y_true, sample_weight=sample_weight)

def _method_cacher(self, estimator, method, *args, **kwargs):
"""Call estimator directly."""
Member

Suggested change
"""Call estimator directly."""
"""Call estimator's method directly, without caching."""

@@ -44,7 +45,54 @@
from ..base import is_regressor


class _BaseScorer(metaclass=ABCMeta):
class _MultimetricScorer(dict):
"""Callable dictionary for multimetric scoring."""
Member

Please briefly describe keys being strings and values being instances of _BaseScorer

I first thought (without looking) that this was also a _BaseScorer instance

Member Author

That would have been nice, but the passthrough scorer and custom scorers may have weird interfaces, so _MultimetricScorer.__call__ needed to be as generic as possible.

"""
return self._score(estimator, X, y_true, sample_weight=sample_weight)

def _method_cacher(self, estimator, method, *args, **kwargs):
Member

I'm confused that this is called _method_cacher. It makes me think that it overrides the MultimetricScorer's _method_cacher, but the logic is slightly different.

Call this _passthrough_method_cacher?

fit_and_score_args = [None, None, None, two_params_scorer]

scorer = _MultimetricScorer(score=two_params_scorer)
fit_and_score_args = [None, None, None, scorer]
Member

Shouldn't the original list [None, None, None, two_params_scorer] still be tested?

Member Author

This is testing a private method _score. On master, a multimetric scoring was represented with a dictionary, which _score used to call the scorers independently. This PR moves this responsibility from _score to _MultimetricScorer. Now _score only needs to call _MultimetricScorer.__call__.
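In sketch form, the simplified responsibility of _score after this PR (not the exact diff, which also includes score validation):

def _score(estimator, X_test, y_test, scorer):
    """Compute the score(s) of an estimator on a given test set.

    The scorer itself decides the return type: a _MultimetricScorer returns a
    dict of name -> float, while a single scorer returns a float.
    """
    return scorer(estimator, X_test, y_test)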

Member

@NicolasHug NicolasHug left a comment

Mostly nitpicks about comments but LGTM, thanks @thomasjpfan

The whole scoring logic is becoming quite convoluted by now... Might be worth some re-thinking one day.


`_MultimetricScorer` will return a dictionary of scores corresponding to
the scorers in the dictionary. Note `_MultimetricScorer` can be created
with a dictionary with one key.
Member

Suggested change
with a dictionary with one key.
with a dictionary with one key (i.e. only one actual scorer).

return scores

def _use_cache(self, estimator):
"""Return True if using a cache it is beneficial.
Member

Suggested change
"""Return True if using a cache it is beneficial.
"""Return True if using a cache is beneficial.

- `decision_function` and `predict_proba` is called.

"""
if len(self) == 1:
Member

Suggested change
if len(self) == 1:
if len(self) == 1: # Only one scorer

score : float
Score function applied to prediction of estimator on X.
"""
return self._score(partial(_method_caller, None), estimator, X, y_true,
Member

Suggested change
return self._score(partial(_method_caller, None), estimator, X, y_true,
return self._score(partial(_method_caller, cache=None), estimator, X, y_true,

Member Author

Since cache is a positional argument, partial needs to accept it as a positional argument.
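A tiny illustration of why the keyword form does not work here (toy names, not the PR's code):

from functools import partial


def method_caller(cache, estimator, method, *args, **kwargs):
    """Toy stand-in: `cache` occupies the first positional slot."""
    return getattr(estimator, method)(*args, **kwargs)


class Toy:
    def predict(self, X):
        return [0 for _ in X]


ok = partial(method_caller, None)           # binds `cache` positionally
ok(Toy(), "predict", [1, 2, 3])             # works

bad = partial(method_caller, cache=None)    # binds `cache` by keyword
# bad(Toy(), "predict", [1, 2, 3]) raises:
# TypeError: method_caller() got multiple values for argument 'cache'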

"""Evaluate predicted target values for X relative to y_true.

Parameters
----------
method_caller: callable
Call estimator with method and args and kwargs.
Member

Suggested change
Call estimator with method and args and kwargs.
Call estimator's method with args and kwargs, potentially caching results.

Or anything else that indicates this is used for caching

if is_multimetric:
return _multimetric_score(estimator, X_test, y_test, scorer)
def _score(estimator, X_test, y_test, scorer):
"""Compute the score(s) of an estimator on a given test set."""
Member

Let's keep the comment about what is returned.

IIUC a dict is returned iff scorer is a dict?


scorer_dict, _ = _check_multimetric_scoring(LogisticRegression(), scorers)
scorer = _MultimetricScorer(**scorer_dict)
scores = scorer(mock_est, X, y)
Member

I don't think this is possible but it'd be cool to assert that the cache only exists during __call__().

Member Author

Since it is scoped in __call__ I do not think it is possible.

Member

@amueller amueller left a comment

minor nitpicks but this looks great!

to `predict_proba`, `predict`, and `decision_function`.

`_MultimetricScorer` will return a dictionary of scores corresponding to
the scorers in the dictionary. Note `_MultimetricScorer` can be created
Member

Note that?

- `_ThresholdScorer` and `_PredictScorer` are called and
estimator is a regressor.
- `_ThresholdScorer` and `_ProbaScorer` are called and
estimator does not have `decision_function` an attribute.
Member

Suggested change
estimator does not have `decision_function` an attribute.
estimator does not have a `decision_function` attribute.

scorer = _MultimetricScorer(**scorer_dict)
scores = scorer(mock_est, X, y)

assert set(scorers) == set(scores) # compare dict keys
Member

maybe add assert set(scorers) == set(scorer)?
I find this hard to read btw because we have scorer, scorers and scores, which have very small Levenshtein distance, and scorer_dict, which is not very helpful, since the other three things are also dicts.

assert predict_proba_call_cnt == 1


def test_multimetric_scorer_calls_method_once_regressos_threshold():
Member

Suggested change
def test_multimetric_scorer_calls_method_once_regressos_threshold():
def test_multimetric_scorer_calls_method_once_regressor_threshold():

clf.fit(X, y)

# regression metric that needs "threshold" which calls predict
r2_threshold = make_scorer(r2_score, needs_threshold=True)
Member

I feel like this would be nicer with an actual ranking metric, like auc?

score1 = scorer(clf, X1, y1)
score2 = scorer(clf, X2, y2)
assert set(score1) == set(score2) # compare dict keys
assert score1 != score2
Member

what does this test? object identity?

Member Author

Bad test is bad. I redid this test to manually call scorers as suggested in your other comment.

scorer_dict, _ = _check_multimetric_scoring(clf, scorers)
scorer = _MultimetricScorer(**scorer_dict)

score1 = scorer(clf, X1, y1)
Member

can we maybe manually call the scorers in the scorer_dict and see that the results are correct for each of them?

Member Author

Updated test to do this.
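For reference, a sketch of a test along those lines (again assuming the private helpers are importable from sklearn.metrics.scorer, as in this PR):

import pytest
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.scorer import _check_multimetric_scoring, _MultimetricScorer


def test_multimetric_matches_individual_scorers():
    X, y = make_classification(random_state=0)
    clf = LogisticRegression(random_state=0).fit(X, y)

    scorer_dict, _ = _check_multimetric_scoring(
        clf, ["accuracy", "roc_auc", "neg_log_loss"])
    multi_scores = _MultimetricScorer(**scorer_dict)(clf, X, y)

    # The cached, batched evaluation must agree with each scorer called alone.
    for name, single_scorer in scorer_dict.items():
        assert multi_scores[name] == pytest.approx(single_scorer(clf, X, y))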

Member

@jnothman jnothman left a comment

I find this design of a callable dict that generates a dict uncomfortable. That duplicate use of dicts makes the documentation confusing, apart from anything else.

Is it really justified to make _MultimetricScorer a dict? What functionality of a dict is used? I understand that this may reduce the amount of code here, but I suspect it makes it a little more obfuscated.

@thomasjpfan
Member Author

The dict feature was needed when _check_multimetric_scoring returned a _MultimetricScorer. Since this was removed, it is not needed anymore.

PR was updated such that _MultimetricScorer is not a dict.

Member

@jnothman jnothman left a comment

Please add a whatsnew

Member

@jnothman jnothman left a comment

Otherwise LGTM



def test_multimetric_scorer_sanity_check():
# scoring dictionary returned is the same as calling each scroer seperately
Member

Suggested change
# scoring dictionary returned is the same as calling each scroer seperately
# scoring dictionary returned is the same as calling each scorer seperately

if not isinstance(score, numbers.Number):
raise ValueError(error_msg % (score, type(score), name))
scores[name] = score
else: # scaler
Member

Suggested change
else: # scaler
else: # scalar


error_msg = ("scoring must return a number, got %s (%s) "
"instead. (scorer=%s)")
if isinstance(scores, dict):
Member

This can return a number or a dict. Can we make all cases return a dict, and delete some code paths? we could just use:

if not isinstance(scores, dict):
    scores = {'score': scores}

Member

Okay, I've looked into this and it might be better to consider this as a separate clean-up change.

Member Author

This type of change would reduce quite a few code paths. (It would most likely also make it nicer to support custom callables that return dictionaries.)

@@ -257,6 +257,11 @@ Changelog
- |Enhancement| Allow computing averaged metrics in the case of no true positives.
:pr:`14595` by `Andreas Müller`_.

- |Enhancement| Improved performance of multimetric scoring in
Member

Can use |Efficiency|?



@amueller
Member

amueller commented Aug 26, 2019

oh nice, this is even cleaner :) still lgtm from my side.

@amueller amueller added the High Priority label Aug 26, 2019
@amueller
Member

fixes #10823, closes #9326

Member

@NicolasHug NicolasHug left a comment

@thomasjpfan please address minor typos so we can merge ;)

@amueller
Member

@thomasjpfan can you fix the merge conflicts again?
@jnothman does this still look good? I'd love to merge this.

@jnothman jnothman merged commit fbb2c7c into scikit-learn:master Sep 10, 2019
@jnothman
Member

Thank you @thomasjpfan!!

I look forward to some of the things this enables along the lines of #12385

@amueller
Member

Awesome!

Labels
High Priority (high priority issues and pull requests)
Development

Successfully merging this pull request may close these issues.

Multi-metric scoring is incredibly slow because it repeats predictions for every metric
4 participants