DOC Improve user guide on scoring parameter #30316

Merged
merged 10 commits into scikit-learn:main from doc_score_api on Nov 29, 2024

Conversation

@lucyleeow (Member) commented Nov 21, 2024

Reference Issues/PRs

Follows on from #30303; while referencing the user guide, I thought it could be improved.

What does this implement/fix? Explain your changes.

  • Adds some more info/context to the intro section in model evaluation
  • Adds an intro section to the "The scoring parameter:" section; this allows users to quickly see all options and click through to the most relevant one
  • Adds an intro section 'Callable scorers', logically grouping all the 'callable scorer' sections.
  • Takes the "custom scorer with make_scorer" section out of the dropdown. I think it didn't make sense for this to be in a dropdown inside a section titled "Defining your scoring strategy from metric functions", and it is of a similar 'level' to the other sections detailing callable scorers

Any other comments?

Some changes are opinionated; happy to change anything.

Comment on lines +46 to +47
:class:`model_selection.GridSearchCV`, :func:`model_selection.validation_curve` and
:class:`linear_model.LogisticRegressionCV`) take a ``scoring`` parameter that
@lucyleeow (Member, Author) commented Nov 21, 2024:

Wanted to give an example outside of the model_selection module.

github-actions bot commented Nov 21, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: bb03495.

@lucyleeow (Member, Author) commented:

Maybe @ArturoAmorQ or @StefanieSenger, who looked at the related PR, would be interested in taking a look?

@StefanieSenger (Contributor) left a comment:

I like these changes a lot, @lucyleeow! They add valuable information and structure this part of the user guide better.
I found some typos; otherwise it's looking very good.

This is not discussed on this page, but in each estimator's documentation.
Most commonly this is mean :ref:`accuracy <accuracy_score>` for classifiers and the
:ref:`coefficient of determination <r2_score>` (:math:`R^2`) for regressors.
Details for each estimator can be found in it's documentation.
Contributor commented:

Suggested change
Details for each estimator can be found in it's documentation.
Details for each estimator can be found in its documentation.

Member commented:

I think "each estimator" is plural, right? So should it be "their" documentation?

@lucyleeow (Member, Author) replied:

I had to look it up but I think it is singular: https://editorsmanual.com/articles/each-singular-or-plural/ ?

estimators `score` method) is used.
* :ref:`String name <scoring_string_names>`: common metrics can be passed via a string
name.
* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a callable
Contributor commented:

Suggested change
* :ref:`Callable <scoring_callable>`: more complex metrics can be passed via a callable
* :ref:`Callable <scoring_callable>`: more complex metrics or custom metrics
can be passed via a callable

@lucyleeow (Member, Author) commented Nov 27, 2024:

Hmm what about:

"more complex metrics can be passed via a custom metric callable" ?

I don't think 'complex metric' and 'custom metric' have an 'or' relationship, if that makes sense?

Contributor replied:

Hmm what about:
"more complex metrics can be passed via a custom metric callable" ?
I don't think 'complex metric' and 'custom metric' have an 'or' relationship, if that makes sense?

I like your suggestion.
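
For reference, a minimal sketch (not part of the PR diff) of the three ways the ``scoring`` parameter discussed in this thread can be supplied; the data and metric choices below are illustrative placeholders only:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import fbeta_score, make_scorer
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(random_state=0)
    clf = LogisticRegression(max_iter=1000)

    # 1. Omit ``scoring``: the estimator's own ``score`` method is used
    #    (accuracy for classifiers).
    cross_val_score(clf, X, y)

    # 2. String name: a common metric referenced by its registered name.
    cross_val_score(clf, X, y, scoring="f1")

    # 3. Callable: a more complex or custom metric, e.g. wrapped with ``make_scorer``.
    cross_val_score(clf, X, y, scoring=make_scorer(fbeta_score, beta=2))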

biclustering.

functions to measure clustering performance. For more information see the
:ref:`clustering_evaluation` section for instance clustering, and
Contributor commented:

Suggested change
:ref:`clustering_evaluation` section for instance clustering, and
:ref:`clustering_evaluation` section for clustering, and

@lucyleeow (Member, Author) replied Nov 27, 2024:

I don't think this is a typo; I think "instance clustering" refers to clustering of single rows, to differentiate it from biclustering, though I understand it's a technical term and can read as confusing (I think we can keep it as is?).

Contributor replied:

I see, I wasn't aware it was a technical term before.

@ArturoAmorQ (Member) left a comment:

Thanks for the PR @lucyleeow! Here's a first batch of comments.

@@ -11,13 +11,16 @@ predictions:

* **Estimator score method**: Estimators have a ``score`` method providing a
default evaluation criterion for the problem they are designed to solve.
This is not discussed on this page, but in each estimator's documentation.
Most commonly this is mean :ref:`accuracy <accuracy_score>` for classifiers and the
Member commented:

I have the feeling that "mean" here is somewhat ambiguous, does it mean the macro average over classes?

Contributor replied:

I was stumbling over this as well.

What is meant here, I believe, is that ``accuracy_score()`` returns the average accuracy over all samples in the test set, via ``return float(_average(score, weights=sample_weight, normalize=normalize))``.

The formulation "mean accuracy" is used inconsistently across our docs and I think this inconsistency is what makes it most confusing. Also, most people would take the averaging here for granted.

What do you think, @lucyleeow?

@lucyleeow (Member, Author) replied:

Yeah good pick up, I have also noticed that we use "mean accuracy" and "accuracy" somewhat inconsistently in our docs.

My initial thought was that we use "mean" for multi-label cases, but I realise that is not the case.

From our docs, 'accuracy' is "either the fraction (default) or the count (normalize=False) of correct predictions", so I think by definition it's 'over all samples in the test set' (and if we are returning the count, there is no averaging here at all).

I would vote for removing 'mean' where we are referring to the accuracy score.

Contributor replied:

I would vote for removing 'mean' where we are referring to the accuracy score.

Yes, I think it's clearer. That could also be a new PR.
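
To make the distinction discussed above concrete, a quick sketch (not from the PR): ``accuracy_score`` returns either the fraction of correct predictions (default) or, with ``normalize=False``, their count:

    from sklearn.metrics import accuracy_score

    y_true = [0, 1, 1, 0]
    y_pred = [0, 1, 0, 0]

    accuracy_score(y_true, y_pred)                   # 0.75 -- fraction of correct predictions
    accuracy_score(y_true, y_pred, normalize=False)  # 3 -- count of correct predictions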

>>> my_custom_loss_func(y, clf.predict(X))
0.69...
>>> score(clf, X, y)
-0.69...

.. _diy_scoring:
Member commented:

By doing a grep it seems that diy_scoring is not referenced elsewhere in the documentation. Maybe the preamble paragraph here was meant to avoid breaking the cross-reference, but it can now be completely inside the dropdown, to simplify a bit.

@lucyleeow (Member, Author) replied Nov 27, 2024:

To clarify, are you suggesting that we move the "Custom scorer objects from scratch" section into a dropdown, and then we'd rename the above "Custom scorer objects using make_scorer" section to just be called "Custom scorer objects" ?

Edit: nevermind, I got it after reading your next comment.
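
For context, the ``0.69``/``-0.69`` values quoted above come from the user guide's ``make_scorer`` example; roughly, the setup looks like the sketch below, where ``greater_is_better=False`` makes the resulting scorer negate the loss:

    import numpy as np
    from sklearn.dummy import DummyClassifier
    from sklearn.metrics import make_scorer

    def my_custom_loss_func(y_true, y_pred):
        diff = np.abs(np.asarray(y_true) - np.asarray(y_pred)).max()
        return np.log1p(diff)

    # greater_is_better=False tells make_scorer to negate the loss so that
    # "greater is better" still holds for the resulting scorer object.
    score = make_scorer(my_custom_loss_func, greater_is_better=False)

    X, y = [[1], [1]], [0, 1]
    clf = DummyClassifier(strategy="most_frequent", random_state=0).fit(X, y)
    my_custom_loss_func(y, clf.predict(X))  # ~0.69, the raw loss
    score(clf, X, y)                        # ~-0.69, negated by the scorer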

@@ -171,59 +196,61 @@ measuring a prediction error given ground truth and prediction:
the ``greater_is_better`` parameter to ``False`` (``True`` by default; see the
parameter description below).

.. _scoring_make_scorer:
Member commented:

I would rather not remove this dropdown, as this is already a really long piece of the documentation and it's somewhat easy to get lost here. What we could do to keep the reference working is to add a preamble paragraph and then hide the details in a dropdown, similarly to what was done for the "Implementing your own scoring object" section.

@lucyleeow (Member, Author) replied:

Okay, the outline of sections is now:

* :ref:`scoring_adapt_metric` (least flexible)
* :ref:`scoring_make_scorer`
  * Using `make_scorer` (more flexible)
  * From scratch (most flexible)

the last 2 sub-bullets are both in their own drop down.

Implementing your own scoring object
------------------------------------
Custom scorer objects from scratch
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

You can generate even more flexible model scorers by constructing your own
scoring object from scratch, without using the :func:`make_scorer` factory.
Member commented:

I just noticed that the wording "without using the :func:make_scorer factory" is not factually correct, as the example shown in the "Using custom scorers in functions where n_jobs > 1" note does require make_scorer (line 270 in main).

@lucyleeow (Member, Author) replied:

Hmm yes, this is a good point. AFAICT the make_scorer is irrelevant here; we're really just showing that we should import the scorer.

Potentially this section has just been moved to live with the diy custom scorer section with the dropdown re-shuffling?

I am going to remove the make_scorer part as it's not relevant.

@lucyleeow (Member, Author) added:

Actually, this note should apply to both types of custom scorers; I am going to put it in its own dropdown at the end?

But happy to change, just let me know.
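
As a reference for what a "from scratch" scorer means in this thread, a minimal sketch (not code from the PR, just the usual scorer contract): any callable with the signature ``(estimator, X, y)`` returning a float, where greater means better, can be passed as ``scoring``:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    def custom_scorer(estimator, X, y):
        # A scorer is just a callable (estimator, X, y) -> float, higher is better.
        y_pred = estimator.predict(X)
        return float(np.mean(y_pred == y))  # plain accuracy, computed by hand

    X, y = make_classification(random_state=0)
    cross_val_score(LogisticRegression(max_iter=1000), X, y, scoring=custom_scorer)
    # Per the n_jobs > 1 note discussed above: it is more robust to import such a
    # scorer from a module than to define it interactively.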

@lucyleeow (Member, Author) commented:

Thank you @StefanieSenger and @ArturoAmorQ! I think I have addressed your comments.

@StefanieSenger (Contributor) left a comment:

Thanks for your work, @lucyleeow!

I've read through it again in the local build: it looks and reads very nice.

@ArturoAmorQ (Member) left a comment:

Just a last nitpick but otherwise LGTM :)

@@ -175,7 +175,7 @@ def cross_validate(
If `scoring` represents a single score, one can use:
Member commented:

I cannot comment on line 173, but now, instead of cross-linking to <model_evaluation>, it seems more pertinent to link to <scoring_api_overview>.

@lucyleeow (Member, Author) replied:

Thank you! Done!
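
For the docstring context quoted above, a minimal sketch (not from the PR) of the single-score versus multi-score forms of ``scoring`` in ``cross_validate``:

    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_validate

    X, y = make_classification(random_state=0)
    clf = LogisticRegression(max_iter=1000)

    # Single score: one string (or one callable); results appear under "test_score".
    cross_validate(clf, X, y, scoring="accuracy")

    # Multiple scores: a list/tuple of names or a dict mapping names to scorers;
    # results appear under "test_<name>" keys.
    cross_validate(clf, X, y, scoring=["accuracy", "f1"])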

@ArturoAmorQ (Member) commented:

Thanks @lucyleeow, merging!

@ArturoAmorQ merged commit 8a8bfc2 into scikit-learn:main on Nov 29, 2024
30 checks passed
@lucyleeow deleted the doc_score_api branch on November 29, 2024 08:59
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 4, 2024
@jeremiedbb jeremiedbb mentioned this pull request Dec 4, 2024
virchan pushed a commit to virchan/scikit-learn that referenced this pull request Dec 9, 2024