WIP new, simpler scorer API #2123

Closed
wants to merge 2 commits into from
4 changes: 2 additions & 2 deletions doc/modules/classes.rst
@@ -671,9 +671,9 @@ Model Selection Interface
-------------------------
.. autosummary::
:toctree: generated/
:template: class_with_call.rst
:template: function.rst

metrics.Scorer
metrics.make_scorer

Classification metrics
----------------------
28 changes: 12 additions & 16 deletions doc/modules/model_evaluation.rst
@@ -943,16 +943,16 @@ Creating scoring objects from score functions
If you want to use a scoring function that takes additional parameters, such as
:func:`fbeta_score`, you need to generate an appropriate scoring object. The
simplest way to generate a callable object for scoring is by using
:class:`Scorer`.
:class:`Scorer` converts score functions as above into callables that can be
:func:`make_scorer`.
That function converts score functions as above into callables that can be
used for model evaluation.

One typical use case is to wrap an existing scoring function from the library
with a non-default value for its parameters, such as the ``beta`` parameter for the
:func:`fbeta_score` function::

>>> from sklearn.metrics import fbeta_score, Scorer
>>> ftwo_scorer = Scorer(fbeta_score, beta=2)
>>> from sklearn.metrics import fbeta_score, make_scorer
>>> ftwo_scorer = make_scorer(fbeta_score, beta=2)
>>> from sklearn.grid_search import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=ftwo_scorer)
@@ -964,10 +964,10 @@ from a simple python function::
... diff = np.abs(ground_truth - predictions).max()
... return np.log(1 + diff)
...
>>> my_custom_scorer = Scorer(my_custom_loss_func, greater_is_better=False)
>>> my_custom_scorer = make_scorer(my_custom_loss_func, greater_is_better=False)
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=my_custom_scorer)

:class:`Scorer` takes as parameters the function you want to use, whether it is
:func:`make_scorer` takes as parameters the function you want to use, whether it is
a score (``greater_is_better=True``) or a loss (``greater_is_better=False``),
whether the function you provided takes predictions as input
(``needs_threshold=False``) or needs confidence scores
@@ -978,22 +978,18 @@ the previous example.
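
A minimal sketch of the ``needs_threshold=True`` case (not taken from this
diff; ``my_threshold_metric`` is a hypothetical metric that consumes
confidence scores rather than predicted labels)::

>>> import numpy as np
>>> from sklearn.metrics import make_scorer
>>> def my_threshold_metric(ground_truth, scores):
...     # ``scores`` comes from decision_function (or predict_proba), so the
...     # wrapped function sees continuous confidences, not class labels
...     return np.mean(scores[ground_truth == 1]) - np.mean(scores[ground_truth == 0])
...
>>> confidence_scorer = make_scorer(my_threshold_metric, needs_threshold=True)

Because ``greater_is_better`` defaults to ``True``, larger values of this
hypothetical metric count as better during model selection.
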
Implementing your own scoring object
------------------------------------
You can generate even more flexible model scores by constructing your own
scoring object from scratch, without using the :class:`Scorer` helper class.
The requirements that a callable can be used for model selection are as
follows:
scoring object from scratch, without using the :func:`make_scorer` factory.
For a callable to be a scorer, it needs to meet the protocol specified by
the following two rules:

- It can be called with parameters ``(estimator, X, y)``, where ``estimator``
is the model that should be evaluated, ``X`` is validation data and ``y`` is
the ground truth target for ``X`` (in the supervised case) or ``None`` in the
unsupervised case.

- The call returns a number indicating the quality of estimator.

- The callable has a boolean attribute ``greater_is_better`` which indicates whether
high or low values correspond to a better estimator.

Objects that meet those conditions as said to implement the sklearn Scorer
protocol.
- It returns either a floating point number (the score), or a tuple, the
first element of which is a float. The additional values are used by the
``report`` method on ``GridSearchCV`` and ``RandomizedSearchCV``.
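
A minimal sketch of such a hand-rolled scorer (hypothetical, not part of this
diff), following the two rules above::

>>> import numpy as np
>>> from sklearn.grid_search import GridSearchCV
>>> from sklearn.svm import LinearSVC
>>> def my_scorer(estimator, X, y):
...     pred = estimator.predict(X)
...     accuracy = float(np.mean(pred == y))
...     # the first element drives model selection; the remaining values are
...     # only surfaced by the ``report`` method
...     return accuracy, 1.0 - accuracy
...
>>> grid = GridSearchCV(LinearSVC(), param_grid={'C': [1, 10]}, scoring=my_scorer)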


.. _dummy_estimators:
23 changes: 14 additions & 9 deletions examples/grid_search_text_feature_extraction.py
@@ -29,18 +29,16 @@
'vect__max_features': (None, 5000, 10000, 50000)}
done in 1737.030s

Best score: 0.940
Best score: 0.923

Member

Why did that change?

Member Author

I changed the scoring from accuracy (the default) to F1 score to demo and test the structured return values from f_scorer. F1 score ≤ accuracy, which is also why the best parameter set changed.

Best parameters set:
clf__alpha: 9.9999999999999995e-07
clf__n_iter: 50
clf__penalty: 'elasticnet'
tfidf__use_idf: True
vect__max_n: 2
vect__max_df: 0.75
vect__max_features: 50000
clf__alpha: 1e-06
clf__penalty: 'l2'
vect__max_df: 1.0
vect__ngram_range: (1, 2)

"""


# Author: Olivier Grisel <olivier.grisel@ensta.org>
# Peter Prettenhofer <peter.prettenhofer@gmail.com>
# Mathieu Blondel <mathieu@mblondel.org>
@@ -49,6 +47,7 @@
from __future__ import print_function

from pprint import pprint
import sys
from time import time
import logging

@@ -111,7 +110,8 @@

# find the best parameters for both the feature extraction and the
# classifier
grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1)
grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1,
scoring="f1")

print("Performing grid search...")
print("pipeline:", [name for name, _ in pipeline.steps])
@@ -127,3 +127,8 @@
best_parameters = grid_search.best_estimator_.get_params()
for param_name in sorted(parameters.keys()):
print("\t%s: %r" % (param_name, best_parameters[param_name]))

# Uncomment the following line to get a detailed (and long!) report
# about the cross-validation results, including precision and recall
# per fold for all settings.
#grid_search.report(sys.stdout)
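
A hedged sketch of the alternative usage (not part of this diff): per the
``report`` docstring added in ``grid_search.py``, calling it with no file
argument returns the per-fold report as a string instead of writing it to a
stream::

# assuming grid_search has already been fit, as in the example above
report_text = grid_search.report()
print(report_text[:500])  # peek at the first few per-fold entries
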
6 changes: 3 additions & 3 deletions sklearn/cross_validation.py
@@ -24,7 +24,7 @@
from .utils.fixes import unique
from .externals.joblib import Parallel, delayed
from .externals.six import string_types, with_metaclass
from .metrics import SCORERS, Scorer
from .metrics import make_scorer, SCORERS

__all__ = ['Bootstrap',
'KFold',
@@ -1136,7 +1136,7 @@ def cross_val_score(estimator, X, y=None, scoring=None, cv=None, n_jobs=1,
warnings.warn("Passing function as ``score_func`` is "
"deprecated and will be removed in 0.15. "
"Either use strings or score objects.", stacklevel=2)
scorer = Scorer(score_func)
scorer = make_scorer(score_func)
elif isinstance(scoring, string_types):
scorer = SCORERS[scoring]
else:
@@ -1299,7 +1299,7 @@ def permutation_test_score(estimator, X, y, scoring=None, cv=None,
warnings.warn("Passing function as ``score_func`` is "
"deprecated and will be removed in 0.15. "
"Either use strings or score objects.")
scorer = Scorer(score_func)
scorer = make_scorer(score_func)
elif isinstance(scoring, string_types):
scorer = SCORERS[scoring]
else:
70 changes: 53 additions & 17 deletions sklearn/grid_search.py
@@ -11,7 +11,7 @@
# License: BSD 3 clause

from abc import ABCMeta, abstractmethod
from collections import Mapping, namedtuple, Sized
from collections import Mapping, namedtuple, Sequence, Sized
from functools import partial, reduce
from itertools import product
import numbers
@@ -28,7 +28,7 @@
from .externals import six
from .utils import safe_mask, check_random_state
from .utils.validation import _num_samples, check_arrays
from .metrics import SCORERS, Scorer
from .metrics import make_scorer, SCORERS


__all__ = ['GridSearchCV', 'ParameterGrid', 'fit_grid_point',
@@ -316,8 +316,10 @@ def fit_grid_point(X, y, base_estimator, parameters, train, test, scorer,
else:
this_score = clf.score(X_test)

if not isinstance(this_score, numbers.Number):
raise ValueError("scoring must return a number, got %s (%s)"
if not isinstance(this_score, numbers.Number) \
and not (isinstance(this_score, Sequence)
and isinstance(this_score[0], numbers.Number)):
raise ValueError("scoring must return a number or tuple, got %s (%s)"
" instead." % (str(this_score), type(this_score)))

if verbose > 2:
@@ -364,10 +366,17 @@ class _CVScoreTuple (namedtuple('_CVScoreTuple',

def __repr__(self):
"""Simple custom repr to summarize the main info"""
std = np.std([sc if isinstance(sc, numbers.Number) else sc[0]
for sc in self.cv_validation_scores])

Member

Two remarks independent from this PR but that I think should be addressed now (i.e. before merge):

  • cv_validation_scores should have an underscore after
  • The repr should not be modified from the default. The str should (guidelines for repr is that it is the information required to recreate the object, see numpy arrays for instance)

Member Author

The current state of affairs in master is that repr is overloaded. Btw., the user is not supposed to recreate objects of this class.

Member

> The current state of affairs in master is that repr is overloaded.

I know, and I think that it is wrong.

Member

> cv_validation_scores should have an underscore after

I don't think so. It's not an attrib of an estimator, but an attrib of an object returned by an underscored attrib of an estimator.

Member

> I don't think so. It's not an attrib of an estimator, but an attrib of an object returned by an underscored attrib of an estimator.

Fair enough. But I still think that it would be good (not mandatory,
though).


return "mean: {0:.5f}, std: {1:.5f}, params: {2}".format(
self.mean_validation_score,
np.std(self.cv_validation_scores),
self.parameters)
self.mean_validation_score, std, self.parameters)

def __str__(self):
"""More extensive reporting than from repr."""
per_fold = ("\n fold {0}: {1}".format(i, sc)
for i, sc in enumerate(self.cv_validation_scores))
return repr(self) + "".join(per_fold)


class BaseSearchCV(six.with_metaclass(ABCMeta, BaseEstimator,
@@ -392,6 +401,33 @@ def __init__(self, estimator, scoring=None, loss_func=None,
self.pre_dispatch = pre_dispatch
self._check_estimator()

def report(self, file=None):

Member

I'm not convinced by the format of this. Do we really need a report function that's little different from pprint(search.cv_scores_)?

Member

Or, indeed which is identical to print(*search.cv_scores_, file=file, sep='\n')?

Member

I think it would be much more useful to output something like a CSV, but that requires interpreting the data more.

Member Author

It's a proof of concept. I wanted to make clear in some way that just print(cv_scores_) doesn't give all the information. If you know a better solution (e.g. document the pprint trick?), I'm up for suggestions.

Member

> I'm not convinced by the format of this. Do we really need a report function that's little different from pprint(search.cv_scores_)?

In the long run, we might want such features, but in the short run, I'd
rather avoid.

Member

> e.g. document the pprint trick?)

I think that teaching people to use pprint is a good idea.

Member

I don't think pprint is wonderful either. Afaik it only knows about the basic standard collections (list, tuple, dict) and reprs everything else, including namedtuples, defaultdicts, arrays, etc, on the basis that its output should be evalable (except that most repr implementations don't support that).

"""Generate a report of the scores achieved.

Reports on the scores achieved across the folds for the various
parameter settings tried. This also prints the additional information
reported by some scorers, such as "f1", which tracks precision and
recall as well.

Parameters
----------
file : file-like, optional
File to which the report is written. If None or not given, the
report is returned as a string.
"""
if not hasattr(self, "cv_scores_"):
raise AttributeError("no cv_scores_ found; run fit first")

return_string = (file is None)
if return_string:
file = six.StringIO()

for cvs in self.cv_scores_:
print(cvs, file=file)

if return_string:
return file.getvalue()

def score(self, X, y=None):
"""Returns the score on the given test data and labels, if the search
estimator has been refit. The ``score`` function of the best estimator
@@ -465,13 +501,13 @@ def _fit(self, X, y, parameter_iterable):
"deprecated and will be removed in 0.15. "
"Either use strings or score objects."
"The relevant new parameter is called ''scoring''. ")
scorer = Scorer(self.loss_func, greater_is_better=False)
scorer = make_scorer(self.loss_func, greater_is_better=False)
elif self.score_func is not None:
warnings.warn("Passing function as ``score_func`` is "
"deprecated and will be removed in 0.15. "
"Either use strings or score objects."
"The relevant new parameter is called ''scoring''.")
scorer = Scorer(self.score_func)
scorer = make_scorer(self.score_func)
elif isinstance(self.scoring, six.string_types):
scorer = SCORERS[self.scoring]
else:
@@ -507,7 +543,7 @@ def _fit(self, X, y, parameter_iterable):
for parameters in parameter_iterable
for train, test in cv)

# Out is a list of triplet: score, estimator, n_test_samples
# Out is a list of triples: score, parameters, n_test_samples
n_fits = len(out)
n_folds = len(cv)

@@ -519,7 +555,11 @@
all_scores = []
for this_score, parameters, this_n_test_samples in \
out[grid_start:grid_start + n_folds]:
all_scores.append(this_score)
full_info = this_score
if isinstance(this_score, Sequence):
# Structured score.
this_score = this_score[0]
all_scores.append(full_info)
if self.iid:
this_score *= this_n_test_samples
n_test_samples += this_n_test_samples
Expand All @@ -530,18 +570,14 @@ def _fit(self, X, y, parameter_iterable):
score /= float(n_folds)
scores.append((score, parameters))
# TODO: shall we also store the test_fold_sizes?
cv_scores.append(_CVScoreTuple(
parameters,
score,
np.array(all_scores)))
cv_scores.append(_CVScoreTuple(parameters, score, all_scores))

Member

Hmm... Sticking all the scores in one field has its advantages, but it's not clear how we fit training scores or times in here without changing the length of the namedtuple (breaking forwards compatibility), or without somehow modifying and restructuring the namedtuple returned by the scorer. I still think _CVScoreTuple has to go. But that may not be within the scope of this PR (but is one reason I think the multiple metrics thing isn't within the scope of this PR, either).

Member

+1 on both accounts.

# Store the computed scores
self.cv_scores_ = cv_scores

# Find the best parameters by comparing on the mean validation score:
# note that `sorted` is deterministic in the way it breaks ties
greater_is_better = getattr(self.scorer_, 'greater_is_better', True)
best = sorted(cv_scores, key=lambda x: x.mean_validation_score,
reverse=greater_is_better)[0]
reverse=True)[0]
self.best_params_ = best.parameters
self.best_score_ = best.mean_validation_score

4 changes: 2 additions & 2 deletions sklearn/metrics/__init__.py
@@ -30,7 +30,7 @@
from .metrics import zero_one
from .metrics import zero_one_score

from .scorer import Scorer, SCORERS
from .scorer import make_scorer, SCORERS

from . import cluster
from .cluster import (adjusted_rand_score,
Expand Down Expand Up @@ -85,5 +85,5 @@
'silhouette_samples',
'v_measure_score',
'zero_one_loss',
'Scorer',
'make_scorer',
'SCORERS']