diff --git a/doc/modules/classification_threshold.rst b/doc/modules/classification_threshold.rst
index 8b3e6e3a68438..9adf846e75cba 100644
--- a/doc/modules/classification_threshold.rst
+++ b/doc/modules/classification_threshold.rst
@@ -97,7 +97,7 @@ a meaningful metric for their use case.
   the label of the class of interest (i.e. `pos_label`). Thus, if this label is
   not the right one for your application, you need to define a scorer and pass
   the right `pos_label` (and additional parameters) using the
-  :func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring` to get
+  :func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring_callable` to get
   information to define your own scoring function. For instance, we show how to pass
   the information to the scorer that the label of interest is `0` when maximizing
   the :func:`~sklearn.metrics.f1_score`::
diff --git a/doc/modules/model_evaluation.rst b/doc/modules/model_evaluation.rst
index 6434c6f99c7c7..dacdb19a0111c 100644
--- a/doc/modules/model_evaluation.rst
+++ b/doc/modules/model_evaluation.rst
@@ -148,13 +148,16 @@ predictions:
 
 * **Estimator score method**: Estimators have a ``score`` method providing a
   default evaluation criterion for the problem they are designed to solve.
-  This is not discussed on this page, but in each estimator's documentation.
+  Most commonly this is :ref:`accuracy <accuracy_score>` for classifiers and the
+  :ref:`coefficient of determination <r2_score>` (:math:`R^2`) for regressors.
+  Details for each estimator can be found in its documentation.
 
-* **Scoring parameter**: Model-evaluation tools using
+* **Scoring parameter**: Model-evaluation tools that use
   :ref:`cross-validation <cross_validation>` (such as
-  :func:`model_selection.cross_val_score` and
-  :class:`model_selection.GridSearchCV`) rely on an internal *scoring* strategy.
-  This is discussed in the section :ref:`scoring_parameter`.
+  :class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+  :class:`linear_model.LogisticRegressionCV`) rely on an internal *scoring* strategy.
+  This strategy can be specified using the `scoring` parameter of that tool and is
+  discussed in the section :ref:`scoring_parameter`.
 
 * **Metric functions**: The :mod:`sklearn.metrics` module implements functions
   assessing prediction error for specific purposes. These metrics are detailed
@@ -175,24 +178,39 @@ value of those metrics for random predictions.
 The ``scoring`` parameter: defining model evaluation rules
 ==========================================================
 
-Model selection and evaluation using tools, such as
-:class:`model_selection.GridSearchCV` and
-:func:`model_selection.cross_val_score`, take a ``scoring`` parameter that
+Model selection and evaluation tools that internally use
+:ref:`cross-validation <cross_validation>` (such as
+:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
+:class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that
 controls what metric they apply to the estimators evaluated.
 
-Common cases: predefined values
--------------------------------
+The ``scoring`` parameter can be specified in several ways:
+
+* `None`: the estimator's default evaluation criterion (i.e., the metric used in the
+  estimator's `score` method) is used.
+* :ref:`String name <scoring_string_names>`: common metrics can be passed via a string
+  name.
+* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a custom
+  metric callable (e.g., function).
+
+Some tools also accept evaluating multiple metrics. See :ref:`multimetric_scoring`
+for details.
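+
+For instance, the short sketch below (toy data, for illustration only) passes each
+kind of value to :func:`model_selection.cross_val_score`::
+
+    >>> from sklearn.datasets import make_classification
+    >>> from sklearn.linear_model import LogisticRegression
+    >>> from sklearn.metrics import f1_score, make_scorer
+    >>> from sklearn.model_selection import cross_val_score
+    >>> X, y = make_classification(random_state=0)
+    >>> clf = LogisticRegression(random_state=0)
+    >>> # None: fall back on the estimator's own score method (here accuracy)
+    >>> default_scores = cross_val_score(clf, X, y, scoring=None)
+    >>> # string name: look up a predefined scorer by name
+    >>> f1_scores = cross_val_score(clf, X, y, scoring="f1")
+    >>> # callable: for instance a scorer built from a metric via make_scorer
+    >>> custom_scores = cross_val_score(clf, X, y, scoring=make_scorer(f1_score))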
+
+.. _scoring_string_names:
+
+String name scorers
+-------------------
 
 For the most common use cases, you can designate a scorer object with the
-``scoring`` parameter; the table below shows all possible values.
+``scoring`` parameter via a string name; the table below shows all possible values.
 All scorer objects follow the convention that **higher return values are better
-than lower return values**.  Thus metrics which measure the distance between
+than lower return values**. Thus metrics which measure the distance between
 the model and the data, like :func:`metrics.mean_squared_error`, are
-available as neg_mean_squared_error which return the negated value
+available as 'neg_mean_squared_error', which returns the negated value
 of the metric.
 
 ==================================== ============================================== ==================================
-Scoring                              Function                                       Comment
+Scoring string name                  Function                                       Comment
 ==================================== ============================================== ==================================
 **Classification**
 'accuracy'                           :func:`metrics.accuracy_score`
@@ -260,12 +278,23 @@ Usage examples:
 
 .. currentmodule:: sklearn.metrics
 
-.. _scoring:
+.. _scoring_callable:
+
+Callable scorers
+----------------
+
+For more complex use cases and more flexibility, you can pass a callable to
+the `scoring` parameter. This can be done by:
 
-Defining your scoring strategy from metric functions
------------------------------------------------------
+* :ref:`scoring_adapt_metric`
+* :ref:`scoring_custom` (most flexible)
 
-The following metrics functions are not implemented as named scorers,
+.. _scoring_adapt_metric:
+
+Adapting predefined metrics via `make_scorer`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following metric functions are not implemented as named scorers,
 sometimes because they require additional parameters, such as
 :func:`fbeta_score`. They cannot be passed to the ``scoring``
 parameters; instead their callable needs to be passed to
@@ -303,15 +332,22 @@ measuring a prediction error given ground truth and prediction:
   maximize, the higher the better.
 
 - functions ending with ``_error``, ``_loss``, or ``_deviance`` return a
-  value to minimize, the lower the better.  When converting
+  value to minimize, the lower the better. When converting
   into a scorer object using :func:`make_scorer`, set
   the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the
   parameter description below).
 
+.. _scoring_custom:
+
+Creating a custom scorer object
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can create your own custom scorer object using
+:func:`make_scorer` or, for the most flexibility, from scratch. See below for details.
 
-.. dropdown:: Custom scorer objects
+.. dropdown:: Custom scorer objects using `make_scorer`
 
-  The second use case is to build a completely custom scorer object
+  You can build a completely custom scorer object
   from a simple python function using :func:`make_scorer`, which can
   take several parameters:
@@ -319,21 +355,21 @@ measuring a prediction error given ground truth and prediction:
 
   * the python function you want to use (``my_custom_loss_func``
     in the example below)
 
   * whether the python function returns a score (``greater_is_better=True``,
-    the default) or a loss (``greater_is_better=False``).  If a loss, the output
+    the default) or a loss (``greater_is_better=False``). If a loss, the output
     of the python function is negated by the scorer object, conforming to
     the cross validation convention that scorers return higher values for better
     models.
 
  * for classification metrics only: whether the python function you provided
    requires continuous decision certainties. If the scoring function only accepts probability
-    estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter
-    `response_method`, thus in this case `response_method="predict_proba"`. Some scoring
-    function do not necessarily require probability estimates but rather non-thresholded
-    decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a
-    list such as `response_method=["decision_function", "predict_proba"]`. In this case,
-    the scorer will use the first available method, in the order given in the list,
+    estimates (e.g. :func:`metrics.log_loss`), then one needs to set the parameter
+    `response_method="predict_proba"`. Some scoring
+    functions do not necessarily require probability estimates but rather non-thresholded
+    decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one can provide a
+    list (e.g., `response_method=["decision_function", "predict_proba"]`),
+    and the scorer will use the first available method, in the order given in the list,
     to compute the scores.
 
-  * any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`.
+  * any additional parameters of the scoring function, such as ``beta`` or ``labels``.
 
   Here is an example of building custom scorers, and of using the
   ``greater_is_better`` parameter::
 
@@ -357,16 +393,10 @@ measuring a prediction error given ground truth and prediction:
     >>> score(clf, X, y)
     -0.69...
 
-.. _diy_scoring:
+.. dropdown:: Custom scorer objects from scratch
 
-Implementing your own scoring object
-------------------------------------
-
-You can generate even more flexible model scorers by constructing your own
-scoring object from scratch, without using the :func:`make_scorer` factory.
-
-
-.. dropdown:: How to build a scorer from scratch
+  You can generate even more flexible model scorers by constructing your own
+  scoring object from scratch, without using the :func:`make_scorer` factory.
 
   For a callable to be a scorer, it needs to meet the protocol specified by
   the following two rules:
@@ -389,24 +419,24 @@ scoring object from scratch, without using the :func:`make_scorer` factory.
   more details.
 
-  .. note:: **Using custom scorers in functions where n_jobs > 1**
+.. dropdown:: Using custom scorers in functions where n_jobs > 1
 
-    While defining the custom scoring function alongside the calling function
-    should work out of the box with the default joblib backend (loky),
-    importing it from another module will be a more robust approach and work
-    independently of the joblib backend.
+    While defining the custom scoring function alongside the calling function
+    should work out of the box with the default joblib backend (loky),
+    importing it from another module will be a more robust approach and work
+    independently of the joblib backend.
 
-    For example, to use ``n_jobs`` greater than 1 in the example below,
-    ``custom_scoring_function`` function is saved in a user-created module
-    (``custom_scorer_module.py``) and imported::
+    For example, to use ``n_jobs`` greater than 1 in the example below,
+    ``custom_scoring_function`` function is saved in a user-created module
+    (``custom_scorer_module.py``) and imported::
 
-        >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP
-        >>> cross_val_score(model,
-        ...                 X_train,
-        ...                 y_train,
-        ...                 scoring=make_scorer(custom_scoring_function, greater_is_better=False),
-        ...                 cv=5,
-        ...                 n_jobs=-1) # doctest: +SKIP
+        >>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP
+        >>> cross_val_score(model,
+        ...                 X_train,
+        ...                 y_train,
+        ...                 scoring=make_scorer(custom_scoring_function, greater_is_better=False),
+        ...                 cv=5,
+        ...                 n_jobs=-1) # doctest: +SKIP
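+
+    For illustration, ``custom_scorer_module.py`` could contain something as simple
+    as the toy loss below (any python function with the signature
+    ``(y_true, y_pred)`` would work here)::
+
+        import numpy as np
+
+        def custom_scoring_function(y_true, y_pred):
+            # toy loss for illustration: mean absolute error
+            return np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred)))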
 
 
 .. _multimetric_scoring:
 
@@ -3066,15 +3096,14 @@ display.
 .. _clustering_metrics:
 
 Clustering metrics
-======================
+==================
 
 .. currentmodule:: sklearn.metrics
 
 The :mod:`sklearn.metrics` module implements several loss, score, and utility
-functions. For more information see the :ref:`clustering_evaluation`
-section for instance clustering, and :ref:`biclustering_evaluation` for
-biclustering.
-
+functions to measure clustering performance. For more information see the
+:ref:`clustering_evaluation` section for instance clustering, and
+:ref:`biclustering_evaluation` for biclustering.
 
 .. _dummy_estimators:
 
diff --git a/sklearn/feature_selection/_sequential.py b/sklearn/feature_selection/_sequential.py
index ac5f13fd00e7d..bd1e27efef60b 100644
--- a/sklearn/feature_selection/_sequential.py
+++ b/sklearn/feature_selection/_sequential.py
@@ -78,7 +78,7 @@ class SequentialFeatureSelector(SelectorMixin, MetaEstimatorMixin, BaseEstimator
 
     scoring : str or callable, default=None
         A single str (see :ref:`scoring_parameter`) or a callable
-        (see :ref:`scoring`) to evaluate the predictions on the test set.
+        (see :ref:`scoring_callable`) to evaluate the predictions on the test set.
 
         NOTE that when using a custom scorer, it should return a single
        value.
diff --git a/sklearn/inspection/_permutation_importance.py b/sklearn/inspection/_permutation_importance.py
index fb3c646a271a6..74000aa9e8556 100644
--- a/sklearn/inspection/_permutation_importance.py
+++ b/sklearn/inspection/_permutation_importance.py
@@ -177,7 +177,7 @@ def permutation_importance(
         If `scoring` represents a single score, one can use:
 
         - a single string (see :ref:`scoring_parameter`);
-        - a callable (see :ref:`scoring`) that returns a single value.
+        - a callable (see :ref:`scoring_callable`) that returns a single value.
 
         If `scoring` represents multiple scores, one can use:
 
diff --git a/sklearn/metrics/_scorer.py b/sklearn/metrics/_scorer.py
index bc8c3a09a320c..fb173cd096a43 100644
--- a/sklearn/metrics/_scorer.py
+++ b/sklearn/metrics/_scorer.py
@@ -640,7 +640,7 @@ def make_scorer(
     The parameter `response_method` allows to specify which method of the
     estimator should be used to feed the scoring/loss function.
 
-    Read more in the :ref:`User Guide <scoring>`.
+    Read more in the :ref:`User Guide <scoring_callable>`.
 
     Parameters
     ----------
@@ -933,7 +933,7 @@ def check_scoring(estimator=None, scoring=None, *, allow_none=False, raise_exc=T
         Scorer to use. If `scoring` represents a single score, one can use:
 
         - a single string (see :ref:`scoring_parameter`);
-        - a callable (see :ref:`scoring`) that returns a single value.
+        - a callable (see :ref:`scoring_callable`) that returns a single value.
 
         If `scoring` represents multiple scores, one can use:
 
diff --git a/sklearn/model_selection/_plot.py b/sklearn/model_selection/_plot.py
index b16e0f4c1019a..8cae3dc97d2c5 100644
--- a/sklearn/model_selection/_plot.py
+++ b/sklearn/model_selection/_plot.py
@@ -369,7 +369,7 @@ def from_estimator(
         scoring : str or callable, default=None
             A string (see :ref:`scoring_parameter`) or
             a scorer callable object / function with signature
-            `scorer(estimator, X, y)` (see :ref:`scoring`).
+            `scorer(estimator, X, y)` (see :ref:`scoring_callable`).
 
exploit_incremental_learning : bool, default=False If the estimator supports incremental learning, this will be @@ -752,7 +752,7 @@ def from_estimator( scoring : str or callable, default=None A string (see :ref:`scoring_parameter`) or a scorer callable object / function with signature - `scorer(estimator, X, y)` (see :ref:`scoring`). + `scorer(estimator, X, y)` (see :ref:`scoring_callable`). n_jobs : int, default=None Number of jobs to run in parallel. Training the estimator and diff --git a/sklearn/model_selection/_search.py b/sklearn/model_selection/_search.py index d37ece5df7249..39161e51bacc5 100644 --- a/sklearn/model_selection/_search.py +++ b/sklearn/model_selection/_search.py @@ -1247,7 +1247,7 @@ class GridSearchCV(BaseSearchCV): If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: @@ -1623,7 +1623,7 @@ class RandomizedSearchCV(BaseSearchCV): If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: diff --git a/sklearn/model_selection/_search_successive_halving.py b/sklearn/model_selection/_search_successive_halving.py index 5ff5f1198121a..55073df14bfc1 100644 --- a/sklearn/model_selection/_search_successive_halving.py +++ b/sklearn/model_selection/_search_successive_halving.py @@ -480,7 +480,7 @@ class HalvingGridSearchCV(BaseSuccessiveHalving): scoring : str, callable, or None, default=None A single string (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If None, the estimator's score method is used. refit : bool, default=True @@ -821,7 +821,7 @@ class HalvingRandomSearchCV(BaseSuccessiveHalving): scoring : str, callable, or None, default=None A single string (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. + (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If None, the estimator's score method is used. refit : bool, default=True diff --git a/sklearn/model_selection/_validation.py b/sklearn/model_selection/_validation.py index dddc0cce795af..7d38182911fb8 100644 --- a/sklearn/model_selection/_validation.py +++ b/sklearn/model_selection/_validation.py @@ -170,12 +170,13 @@ def cross_validate( scoring : str, callable, list, tuple, or dict, default=None Strategy to evaluate the performance of the cross-validated model on the test set. If `None`, the - :ref:`default evaluation criterion ` of the estimator is used. + :ref:`default evaluation criterion ` of the estimator + is used. If `scoring` represents a single score, one can use: - a single string (see :ref:`scoring_parameter`); - - a callable (see :ref:`scoring`) that returns a single value. + - a callable (see :ref:`scoring_callable`) that returns a single value. If `scoring` represents multiple scores, one can use: @@ -1562,7 +1563,7 @@ def permutation_test_score( scoring : str or callable, default=None A single str (see :ref:`scoring_parameter`) or a callable - (see :ref:`scoring`) to evaluate the predictions on the test set. 
+ (see :ref:`scoring_callable`) to evaluate the predictions on the test set. If `None` the estimator's score method is used.