DOC Improve user guide on scoring parameter #30316

Merged
merged 10 commits into scikit-learn:main from doc_score_api on Nov 29, 2024

Conversation

@lucyleeow (Member) commented Nov 21, 2024

Reference Issues/PRs

Follows on from #30303; while referencing the user guide, I thought it could be improved.

What does this implement/fix? Explain your changes.

  • Adds some more info/context to the intro section in model evaluation
  • Adds an intro section to the "The scoring parameter:" section; this allows users to quickly see all options and click through to the most relevant one
  • Adds an intro section 'Callable scorers', logically grouping all the 'callable scorer' sections.
  • Takes the "custom scorer with make_scorer" section out of the dropdown. I think it didn't make sense for this to be in a dropdown inside a section titled "Defining your scoring strategy from metric functions", and it is of a similar 'level' to the other sections detailing callable scorers

Any other comments?

Some changes are opinionated; happy to change anything.

Comment on lines +46 to +47
:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
:class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that
@lucyleeow (Member, Author) commented Nov 21, 2024:

Wanted to give an example outside of the model_selection module.

github-actions bot commented Nov 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: bb03495.

@lucyleeow (Member, Author) commented:

Maybe @ArturoAmorQ or @StefanieSenger, who looked at the related PR, would be interested in taking a look?

@StefanieSenger (Contributor) left a comment:

I like these changes a lot, @lucyleeow! They add valuable information and structure this part of the user guide better.
I found some typos; otherwise it's looking very good.

This is not discussed on this page, but in each estimator's documentation.
Most commonly this is mean :ref:`accuracy <accuracy_score>` for classifiers and the
:ref:`coefficient of determination <r2_score>` (:math:`R^2`) for regressors.
Details for each estimator can be found in it's documentation.
Contributor commented:

Suggested change
Details for each estimator can be found in it's documentation.
Details for each estimator can be found in its documentation.

Member commented:

I think "each estimator" is plural, right? So should it be "their" documentation?

@lucyleeow (Member, Author) replied:

I had to look it up but I think it is singular: https://editorsmanual.com/articles/each-singular-or-plural/ ?

estimators `score` method) is used.
* :ref:`String name <scoring_string_names>`: common metrics can be passed via a string
name.
* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a callable
Contributor commented:

Suggested change
* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a callable
* :ref:`Callable <scoring_callable>`: more complex metrics or custom metrics
can be passed via a callable

@lucyleeow (Member, Author) commented Nov 27, 2024:

Hmm what about:

"more complex metrics can be passed via a custom metric callable" ?

I don't think 'complex metric' and 'custom metric' have an 'or' relationship, if that makes sense?

Contributor replied:

Hmm what about:
"more complex metrics can be passed via a custom metric callable" ?
I don't think 'complex metric' and 'custom metric' have an 'or' relationship, if that makes sense?

I like your suggestion.
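
For reference, a minimal sketch (not part of the PR diff) of the three ways the ``scoring`` parameter discussed in this thread can be supplied; the data and metric choices below are illustrative placeholders only:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import fbeta_score, make_scorer
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(random_state=0)
    clf = LogisticRegression(max_iter=1000)

    # 1. Omit ``scoring``: the estimator's own ``score`` method is used
    #    (accuracy for classifiers).
    cross_val_score(clf, X, y)

    # 2. String name: a common metric referenced by its registered name.
    cross_val_score(clf, X, y, scoring="f1")

    # 3. Callable: a more complex or custom metric, e.g. wrapped with ``make_scorer``.
    cross_val_score(clf, X, y, scoring=make_scorer(fbeta_score, beta=2))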

biclustering.

functions to measure clustering performance. For more information see the
:ref:`clustering_evaluation` section for instance clustering, and
Contributor commented:

Suggested change
:ref:`clustering_evaluation` section for instance clustering, and
:ref:`clustering_evaluation` section for clustering, and

@lucyleeow (Member, Author) replied Nov 27, 2024:

I don't think this is a typo; I think "instance clustering" refers to clustering of single rows, to differentiate it from biclustering, though I understand it's a technical term and can read as confusing (I think we can keep it as is?).

Contributor replied:

I see, I wasn't aware it was a technical term before.

@ArturoAmorQ (Member) left a comment:

Thanks for the PR @lucyleeow! Here's a first batch of comments.

@@ -11,13 +11,16 @@ predictions:

* **Estimator score method**: Estimators have a ``score`` method providing a
default evaluation criterion for the problem they are designed to solve.
This is not discussed on this page, but in each estimator's documentation.
Most commonly this is mean :ref:`accuracy <accuracy_score>` for classifiers and the
Member commented:

I have the feeling that "mean" here is somewhat ambiguous, does it mean the macro average over classes?

Contributor replied:

I was stumbling over this as well.

What is meant here, I believe, is that ``accuracy_score()`` returns the average accuracy over all samples in the test set, via ``return float(_average(score, weights=sample_weight, normalize=normalize))``.

The formulation "mean accuracy" is used inconsistently across our docs and I think this inconsistency is what makes it most confusing. Also, most people would take the averaging here for granted.

What do you think, @lucyleeow?

@lucyleeow (Member, Author) replied:

Yeah good pick up, I have also noticed that we use "mean accuracy" and "accuracy" somewhat inconsistently in our docs.

My initial thought was that we use "mean" for multi-label cases, but I realise that is not the case.

From our docs, 'accuracy' is "either the fraction (default) or the count (normalize=False) of correct predictions", so I think by definition it's 'over all samples in the test set' (and if we are returning the count, there is no averaging here at all).

I would vote for removing 'mean' where we are referring to the accuracy score.

Contributor replied:

I would vote for removing 'mean' where we are referring to the accuracy score.

Yes, I think it's clearer. That could also be a new PR.
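
To make the distinction discussed above concrete, a quick sketch (not from the PR): ``accuracy_score`` returns either the fraction of correct predictions (default) or, with ``normalize=False``, their count:

    from sklearn.metrics import accuracy_score

    y_true = [0, 1, 1, 0]
    y_pred = [0, 1, 0, 0]

    accuracy_score(y_true, y_pred)                   # 0.75 -- fraction of correct predictions
    accuracy_score(y_true, y_pred, normalize=False)  # 3 -- count of correct predictions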

>>> my_custom_loss_func(y, clf.predict(X))
0.69...
>>> score(clf, X, y)
-0.69...

.. _diy_scoring:
Member commented:

By doing a grep it seems that diy_scoring is not referenced elsewhere in the documentation. Maybe the preamble paragraph here was meant to avoid breaking the cross-reference, but it can now be completely inside the dropdown, to simplify a bit.

@lucyleeow (Member, Author) replied Nov 27, 2024:

To clarify, are you suggesting that we move the "Custom scorer objects from scratch" section into a dropdown, and then we'd rename the above "Custom scorer objects using make_scorer" section to just be called "Custom scorer objects" ?

Edit: nevermind, I got it after reading your next comment.
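
For context, the ``0.69``/``-0.69`` values quoted above come from the user guide's ``make_scorer`` example; roughly, the setup looks like the sketch below, where ``greater_is_better=False`` makes the resulting scorer negate the loss:

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import make_scorer

    def my_custom_loss_func(y_true, y_pred):
        diff = np.abs(np.asarray(y_true) - np.asarray(y_pred)).max()
        return np.log1p(diff)

    # greater_is_better=False tells make_scorer to negate the loss so that
    # "greater is better" still holds for the resulting scorer object.
    score = make_scorer(my_custom_loss_func, greater_is_better=False)

    X, y = [[1], [1]], [0, 1]
    clf = DummyClassifier(strategy="most_frequent", random_state=0).fit(X, y)
    my_custom_loss_func(y, clf.predict(X))  # ~0.69, the raw loss
    score(clf, X, y)                        # ~-0.69, negated by the scorer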

@@ -171,59 +196,61 @@ measuring a prediction error given ground truth and prediction:
the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the
parameter description below).

.. _scoring_make_scorer:
Member commented:

I would rather not remove this dropdown, as this is already a really long piece of the documentation and it's somewhat easy to get lost here. What we could do to keep the reference working is to add a preamble paragraph and then hide the details in a dropdown, similarly to what was done for the "Implementing your own scoring object" section.

@lucyleeow (Member, Author) replied:

Okay, the outline of sections is now:

* :ref:`scoring_adapt_metric` (least flexible)
* :ref:`scoring_make_scorer`
  * Using `make_scorer` (more flexible)
  * From scratch (most flexible)

the last 2 sub-bullets are both in their own drop down.

Implementing your own scoring object
------------------------------------
Custom scorer objects from scratch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can generate even more flexible model scorers by constructing your own
scoring object from scratch, without using the :func:`make_scorer` factory.
Member commented:

I just noticed that the wording "without using the :func:make_scorer factory" is not factually correct, as the example shown in the "Using custom scorers in functions where n_jobs > 1" note does require make_scorer (line 270 in main).

@lucyleeow (Member, Author) replied:

Hmm yes, this is a good point. AFAICT the make_scorer is irrelevant here; we're really just showing that we should import the scorer.

Potentially this section has just been moved to live with the diy custom scorer section with the dropdown re-shuffling?

I am going to remove the make_scorer part as it's not relevant.

@lucyleeow (Member, Author) added:

Actually, this note should apply to both types of custom scorers; I am going to put it in its own dropdown at the end?

But happy to change, just let me know.
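
As a reference for what a "from scratch" scorer means in this thread, a minimal sketch (not code from the PR, just the usual scorer contract): any callable with the signature ``(estimator, X, y)`` returning a float, where greater means better, can be passed as ``scoring``:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def custom_scorer(estimator, X, y):
        # A scorer is just a callable (estimator, X, y) -> float, higher is better.
        y_pred = estimator.predict(X)
        return float(np.mean(y_pred == y))  # plain accuracy, computed by hand

    X, y = make_classification(random_state=0)
    cross_val_score(LogisticRegression(max_iter=1000), X, y, scoring=custom_scorer)
    # Per the n_jobs > 1 note discussed above: it is more robust to import such a
    # scorer from a module than to define it interactively.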

@lucyleeow (Member, Author) commented:

Thank you @StefanieSenger and @ArturoAmorQ! I think I have addressed your comments.

@StefanieSenger (Contributor) left a comment:

Thanks for your work, @lucyleeow!

I've read through it again in the local build: it looks and reads very nice.

@ArturoAmorQ (Member) left a comment:

Just a last nitpick but otherwise LGTM :)

@@ -175,7 +175,7 @@ def cross_validate(
If `scoring` represents a single score, one can use:
Member commented:

I cannot comment on line 173, but now, instead of cross-linking to <model_evaluation>, it seems more pertinent to link to <scoring_api_overview>.

@lucyleeow (Member, Author) replied:

Thank you! Done!
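
For the docstring context quoted above, a minimal sketch (not from the PR) of the single-score versus multi-score forms of ``scoring`` in ``cross_validate``:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_validate

    X, y = make_classification(random_state=0)
    clf = LogisticRegression(max_iter=1000)

    # Single score: one string (or one callable); results appear under "test_score".
    cross_validate(clf, X, y, scoring="accuracy")

    # Multiple scores: a list/tuple of names or a dict mapping names to scorers;
    # results appear under "test_<name>" keys.
    cross_validate(clf, X, y, scoring=["accuracy", "f1"])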

@ArturoAmorQ (Member) commented:

Thanks @lucyleeow, merging!

@ArturoAmorQ merged commit 8a8bfc2 into scikit-learn:main on Nov 29, 2024
30 checks passed
@lucyleeow deleted the doc_score_api branch on November 29, 2024 08:59
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 4, 2024
@jeremiedbb jeremiedbb mentioned this pull request Dec 4, 2024
virchan pushed a commit to virchan/scikit-learn that referenced this pull request Dec 9, 2024