DOC Improve user guide on scoring parameter #30316

Merged · merged 10 commits · Nov 29, 2024
2 changes: 1 addition & 1 deletion doc/modules/classification_threshold.rst
@@ -97,7 +97,7 @@ a meaningful metric for their use case.
the label of the class of interest (i.e. `pos_label`). Thus, if this label is not
the right one for your application, you need to define a scorer and pass the right
`pos_label` (and additional parameters) using the
:func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring` to get
:func:`~sklearn.metrics.make_scorer`. Refer to :ref:`scoring_callable` to get
information to define your own scoring function. For instance, we show how to pass
the information to the scorer that the label of interest is `0` when maximizing the
:func:`~sklearn.metrics.f1_score`::
145 changes: 87 additions & 58 deletions doc/modules/model_evaluation.rst
@@ -148,13 +148,16 @@ predictions:

* **Estimator score method**: Estimators have a ``score`` method providing a
default evaluation criterion for the problem they are designed to solve.
This is not discussed on this page, but in each estimator's documentation.
Most commonly this is :ref:`accuracy <accuracy_score>` for classifiers and the
:ref:`coefficient of determination <r2_score>` (:math:`R^2`) for regressors.
Details for each estimator can be found in its documentation.

* **Scoring parameter**: Model-evaluation tools using
* **Scoring parameter**: Model-evaluation tools that use
:ref:`cross-validation <cross_validation>` (such as
:func:`model_selection.cross_val_score` and
:class:`model_selection.GridSearchCV`) rely on an internal *scoring* strategy.
This is discussed in the section :ref:`scoring_parameter`.
:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
:class:`linear_model.LogisticRegressionCV`) rely on an internal *scoring* strategy.
This can be specified using the `scoring` parameter of that tool and is discussed
in the section :ref:`scoring_parameter`.

* **Metric functions**: The :mod:`sklearn.metrics` module implements functions
assessing prediction error for specific purposes. These metrics are detailed
@@ -175,24 +178,39 @@ value of those metrics for random predictions.
The ``scoring`` parameter: defining model evaluation rules
==========================================================

Model selection and evaluation using tools, such as
:class:`model_selection.GridSearchCV` and
:func:`model_selection.cross_val_score`, take a ``scoring`` parameter that
Model selection and evaluation tools that internally use
:ref:`cross-validation <cross_validation>` (such as
:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
:class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that
Comment on lines +183 to +184 — @lucyleeow (Member Author), Nov 21, 2024:
Wanted to give example outside of the model_selection module.
controls what metric they apply to the estimators evaluated.

Common cases: predefined values
-------------------------------
They can be specified in several ways:

* `None`: the estimator's default evaluation criterion (i.e., the metric used in the
estimator's `score` method) is used.
* :ref:`String name <scoring_string_names>`: common metrics can be passed via a string
name.
* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a custom
metric callable (e.g., function).

Some tools also accept multiple metric evaluation; see :ref:`multimetric_scoring`
for details.
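
As a rough sketch (the dataset, estimator and metric choices below are arbitrary
illustrations, not recommendations), the three options could look like this with
:func:`model_selection.cross_val_score`::

    >>> from sklearn.datasets import load_iris
    >>> from sklearn.linear_model import LogisticRegression
    >>> from sklearn.metrics import f1_score, make_scorer
    >>> from sklearn.model_selection import cross_val_score
    >>> X, y = load_iris(return_X_y=True)
    >>> clf = LogisticRegression(max_iter=1000)
    >>> # None: use the estimator's default score method (accuracy for classifiers)
    >>> cross_val_score(clf, X, y, scoring=None)  # doctest: +SKIP
    >>> # string name: a predefined scorer
    >>> cross_val_score(clf, X, y, scoring="balanced_accuracy")  # doctest: +SKIP
    >>> # callable: e.g. a scorer built with make_scorer
    >>> cross_val_score(clf, X, y, scoring=make_scorer(f1_score, average="macro"))  # doctest: +SKIP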

.. _scoring_string_names:

String name scorers
-------------------

For the most common use cases, you can designate a scorer object with the
``scoring`` parameter; the table below shows all possible values.
``scoring`` parameter via a string name; the table below shows all possible values.
All scorer objects follow the convention that **higher return values are better
than lower return values**. Thus metrics which measure the distance between
than lower return values**. Thus metrics which measure the distance between
the model and the data, like :func:`metrics.mean_squared_error`, are
available as neg_mean_squared_error which return the negated value
available as 'neg_mean_squared_error' which return the negated value
of the metric.
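
As an illustration (any regressor and dataset would do; this is only a sketch),
such a negated scorer returns values that are higher, i.e. closer to zero, for
better models::

    >>> from sklearn.datasets import make_regression
    >>> from sklearn.linear_model import Ridge
    >>> from sklearn.model_selection import cross_val_score
    >>> X, y = make_regression(random_state=0)
    >>> cross_val_score(Ridge(), X, y, scoring="neg_mean_squared_error")  # doctest: +SKIP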

==================================== ============================================== ==================================
Scoring Function Comment
Scoring string name Function Comment
==================================== ============================================== ==================================
**Classification**
'accuracy' :func:`metrics.accuracy_score`
@@ -260,12 +278,23 @@ Usage examples:

.. currentmodule:: sklearn.metrics

.. _scoring:
.. _scoring_callable:

Callable scorers
----------------

For more complex use cases and more flexibility, you can pass a callable to
the `scoring` parameter. This can be done by:

Defining your scoring strategy from metric functions
-----------------------------------------------------
* :ref:`scoring_adapt_metric`
* :ref:`scoring_custom` (most flexible)

The following metrics functions are not implemented as named scorers,
.. _scoring_adapt_metric:

Adapting predefined metrics via `make_scorer`
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The following metric functions are not implemented as named scorers,
sometimes because they require additional parameters, such as
:func:`fbeta_score`. They cannot be passed to the ``scoring``
parameter; instead their callable needs to be passed to
@@ -303,37 +332,44 @@ measuring a prediction error given ground truth and prediction:
maximize, the higher the better.

- functions ending with ``_error``, ``_loss``, or ``_deviance`` return a
value to minimize, the lower the better. When converting
value to minimize, the lower the better. When converting
into a scorer object using :func:`make_scorer`, set
the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the
parameter description below).
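
For instance, a minimal sketch of such a conversion (using
:func:`mean_squared_error` purely as an example of an ``_error`` function)::

    >>> from sklearn.metrics import make_scorer, mean_squared_error
    >>> mse_scorer = make_scorer(mean_squared_error, greater_is_better=False)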

.. _scoring_custom:

Creating a custom scorer object
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can create your own custom scorer object using
:func:`make_scorer` or, for the most flexibility, from scratch. See below for details.

.. dropdown:: Custom scorer objects
.. dropdown:: Custom scorer objects using `make_scorer`

The second use case is to build a completely custom scorer object
You can build a completely custom scorer object
from a simple python function using :func:`make_scorer`, which can
take several parameters:

* the python function you want to use (``my_custom_loss_func``
in the example below)

* whether the python function returns a score (``greater_is_better=True``,
the default) or a loss (``greater_is_better=False``). If a loss, the output
the default) or a loss (``greater_is_better=False``). If a loss, the output
of the python function is negated by the scorer object, conforming to
the cross validation convention that scorers return higher values for better models.

* for classification metrics only: whether the python function you provided requires
continuous decision certainties. If the scoring function only accepts probability
estimates (e.g. :func:`metrics.log_loss`) then one needs to set the parameter
`response_method`, thus in this case `response_method="predict_proba"`. Some scoring
function do not necessarily require probability estimates but rather non-thresholded
decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one provides a
list such as `response_method=["decision_function", "predict_proba"]`. In this case,
the scorer will use the first available method, in the order given in the list,
estimates (e.g. :func:`metrics.log_loss`), then one needs to set the parameter
`response_method="predict_proba"`. Some scoring
functions do not necessarily require probability estimates but rather non-thresholded
decision values (e.g. :func:`metrics.roc_auc_score`). In this case, one can provide a
list (e.g., `response_method=["decision_function", "predict_proba"]`),
and the scorer will use the first available method, in the order given in the list,
to compute the scores.

* any additional parameters, such as ``beta`` or ``labels`` in :func:`f1_score`.
* any additional parameters of the scoring function, such as ``beta`` or ``labels``.

Here is an example of building custom scorers, and of using the
``greater_is_better`` parameter::
@@ -357,16 +393,10 @@ measuring a prediction error given ground truth and prediction:
>>> score(clf, X, y)
-0.69...
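
As a further sketch of the ``response_method`` options discussed above (the
scorer names are illustrative only)::

    >>> from sklearn.metrics import log_loss, make_scorer, roc_auc_score
    >>> # scoring function that needs probability estimates
    >>> log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
    ...                               response_method="predict_proba")
    >>> # scoring function that accepts non-thresholded decision values,
    >>> # falling back to probabilities when decision_function is unavailable
    >>> roc_auc_scorer = make_scorer(roc_auc_score,
    ...                              response_method=["decision_function", "predict_proba"])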

.. _diy_scoring:
.. dropdown:: Custom scorer objects from scratch

Implementing your own scoring object
------------------------------------

You can generate even more flexible model scorers by constructing your own
scoring object from scratch, without using the :func:`make_scorer` factory.


.. dropdown:: How to build a scorer from scratch
You can generate even more flexible model scorers by constructing your own
scoring object from scratch, without using the :func:`make_scorer` factory.

For a callable to be a scorer, it needs to meet the protocol specified by
the following two rules:
@@ -389,24 +419,24 @@ scoring object from scratch, without using the :func:`make_scorer` factory.
more details.
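
A minimal sketch of such a from-scratch scorer (the name and metric are
arbitrary; the only assumption is the ``scorer(estimator, X, y)`` protocol
returning a single float, where higher means better)::

    >>> import numpy as np
    >>> def median_absolute_error_scorer(estimator, X, y):
    ...     """Return the negated median absolute error (higher is better)."""
    ...     y_pred = estimator.predict(X)
    ...     return -float(np.median(np.abs(y - y_pred)))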


.. note:: **Using custom scorers in functions where n_jobs > 1**
.. dropdown:: Using custom scorers in functions where n_jobs > 1

While defining the custom scoring function alongside the calling function
should work out of the box with the default joblib backend (loky),
importing it from another module will be a more robust approach and work
independently of the joblib backend.
While defining the custom scoring function alongside the calling function
should work out of the box with the default joblib backend (loky),
importing it from another module will be a more robust approach and work
independently of the joblib backend.

For example, to use ``n_jobs`` greater than 1 in the example below,
``custom_scoring_function`` function is saved in a user-created module
(``custom_scorer_module.py``) and imported::
For example, to use ``n_jobs`` greater than 1 in the example below,
``custom_scoring_function`` function is saved in a user-created module
(``custom_scorer_module.py``) and imported::

>>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP
>>> cross_val_score(model,
... X_train,
... y_train,
... scoring=make_scorer(custom_scoring_function, greater_is_better=False),
... cv=5,
... n_jobs=-1) # doctest: +SKIP
>>> from custom_scorer_module import custom_scoring_function # doctest: +SKIP
>>> cross_val_score(model,
... X_train,
... y_train,
... scoring=make_scorer(custom_scoring_function, greater_is_better=False),
... cv=5,
... n_jobs=-1) # doctest: +SKIP
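
For illustration, ``custom_scorer_module.py`` could be as simple as the
following sketch (the metric itself is a placeholder)::

    # custom_scorer_module.py
    import numpy as np

    def custom_scoring_function(y_true, y_pred):
        """Mean absolute error: a loss, hence greater_is_better=False above."""
        return float(np.mean(np.abs(y_true - y_pred)))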

.. _multimetric_scoring:

@@ -3066,15 +3096,14 @@ display.
.. _clustering_metrics:

Clustering metrics
======================
==================

.. currentmodule:: sklearn.metrics

The :mod:`sklearn.metrics` module implements several loss, score, and utility
functions. For more information see the :ref:`clustering_evaluation`
section for instance clustering, and :ref:`biclustering_evaluation` for
biclustering.

functions to measure clustering performance. For more information see the
:ref:`clustering_evaluation` section for instance clustering, and
:ref:`biclustering_evaluation` for biclustering.

Contributor — suggested change:
    :ref:`clustering_evaluation` section for instance clustering, and
    :ref:`clustering_evaluation` section for clustering, and

@lucyleeow (Member Author), Nov 27, 2024:
I don't think this is a typo; I think "instance clustering" refers to clustering of a single row, to differentiate it from biclustering, though I understand it's a technical term and can read confusingly (I think we can keep as is?).

Contributor:
I see, I wasn't aware it was a technical term before.

.. _dummy_estimators:

2 changes: 1 addition & 1 deletion sklearn/feature_selection/_sequential.py
@@ -78,7 +78,7 @@ class SequentialFeatureSelector(SelectorMixin, MetaEstimatorMixin, BaseEstimator

scoring : str or callable, default=None
A single str (see :ref:`scoring_parameter`) or a callable
(see :ref:`scoring`) to evaluate the predictions on the test set.
(see :ref:`scoring_callable`) to evaluate the predictions on the test set.

NOTE that when using a custom scorer, it should return a single
value.
2 changes: 1 addition & 1 deletion sklearn/inspection/_permutation_importance.py
@@ -177,7 +177,7 @@ def permutation_importance(
If `scoring` represents a single score, one can use:

- a single string (see :ref:`scoring_parameter`);
- a callable (see :ref:`scoring`) that returns a single value.
- a callable (see :ref:`scoring_callable`) that returns a single value.

If `scoring` represents multiple scores, one can use:

4 changes: 2 additions & 2 deletions sklearn/metrics/_scorer.py
@@ -640,7 +640,7 @@ def make_scorer(
The parameter `response_method` allows to specify which method of the estimator
should be used to feed the scoring/loss function.

Read more in the :ref:`User Guide <scoring>`.
Read more in the :ref:`User Guide <scoring_callable>`.

Parameters
----------
@@ -933,7 +933,7 @@ def check_scoring(estimator=None, scoring=None, *, allow_none=False, raise_exc=T
Scorer to use. If `scoring` represents a single score, one can use:

- a single string (see :ref:`scoring_parameter`);
- a callable (see :ref:`scoring`) that returns a single value.
- a callable (see :ref:`scoring_callable`) that returns a single value.

If `scoring` represents multiple scores, one can use:

4 changes: 2 additions & 2 deletions sklearn/model_selection/_plot.py
@@ -369,7 +369,7 @@ def from_estimator(
scoring : str or callable, default=None
A string (see :ref:`scoring_parameter`) or
a scorer callable object / function with signature
`scorer(estimator, X, y)` (see :ref:`scoring`).
`scorer(estimator, X, y)` (see :ref:`scoring_callable`).

exploit_incremental_learning : bool, default=False
If the estimator supports incremental learning, this will be
@@ -752,7 +752,7 @@ def from_estimator(
scoring : str or callable, default=None
A string (see :ref:`scoring_parameter`) or
a scorer callable object / function with signature
`scorer(estimator, X, y)` (see :ref:`scoring`).
`scorer(estimator, X, y)` (see :ref:`scoring_callable`).

n_jobs : int, default=None
Number of jobs to run in parallel. Training the estimator and
4 changes: 2 additions & 2 deletions sklearn/model_selection/_search.py
@@ -1247,7 +1247,7 @@ class GridSearchCV(BaseSearchCV):
If `scoring` represents a single score, one can use:

- a single string (see :ref:`scoring_parameter`);
- a callable (see :ref:`scoring`) that returns a single value.
- a callable (see :ref:`scoring_callable`) that returns a single value.

If `scoring` represents multiple scores, one can use:

@@ -1623,7 +1623,7 @@ class RandomizedSearchCV(BaseSearchCV):
If `scoring` represents a single score, one can use:

- a single string (see :ref:`scoring_parameter`);
- a callable (see :ref:`scoring`) that returns a single value.
- a callable (see :ref:`scoring_callable`) that returns a single value.

If `scoring` represents multiple scores, one can use:

4 changes: 2 additions & 2 deletions sklearn/model_selection/_search_successive_halving.py
@@ -480,7 +480,7 @@ class HalvingGridSearchCV(BaseSuccessiveHalving):

scoring : str, callable, or None, default=None
A single string (see :ref:`scoring_parameter`) or a callable
(see :ref:`scoring`) to evaluate the predictions on the test set.
(see :ref:`scoring_callable`) to evaluate the predictions on the test set.
If None, the estimator's score method is used.

refit : bool, default=True
@@ -821,7 +821,7 @@ class HalvingRandomSearchCV(BaseSuccessiveHalving):

scoring : str, callable, or None, default=None
A single string (see :ref:`scoring_parameter`) or a callable
(see :ref:`scoring`) to evaluate the predictions on the test set.
(see :ref:`scoring_callable`) to evaluate the predictions on the test set.
If None, the estimator's score method is used.

refit : bool, default=True
7 changes: 4 additions & 3 deletions sklearn/model_selection/_validation.py
@@ -170,12 +170,13 @@ def cross_validate(
scoring : str, callable, list, tuple, or dict, default=None
Strategy to evaluate the performance of the cross-validated model on
the test set. If `None`, the
:ref:`default evaluation criterion <model_evaluation>` of the estimator is used.
:ref:`default evaluation criterion <scoring_api_overview>` of the estimator
is used.

If `scoring` represents a single score, one can use:
Member:
I cannot comment line 173, but now instead of cross-linking to <model_evaluation> it seems more pertinent to link to <scoring_api_overview>

Member Author:
Thank you! Done!


- a single string (see :ref:`scoring_parameter`);
- a callable (see :ref:`scoring`) that returns a single value.
- a callable (see :ref:`scoring_callable`) that returns a single value.

If `scoring` represents multiple scores, one can use:

@@ -1562,7 +1563,7 @@ def permutation_test_score(

scoring : str or callable, default=None
A single str (see :ref:`scoring_parameter`) or a callable
(see :ref:`scoring`) to evaluate the predictions on the test set.
(see :ref:`scoring_callable`) to evaluate the predictions on the test set.

If `None` the estimator's score method is used.
