Skip to content

FIX handle aliases in displays when used as default and provided by user #30023

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Oct 17, 2024

Conversation

JosephBARBIERDARNAL
Copy link
Contributor

@JosephBARBIERDARNAL JosephBARBIERDARNAL commented Oct 7, 2024

Reference Issue

closes #30015

What does this implement/fix?

Bug summary

When passing additional keyword arguments to the random classifier's line via the chance_level_kw argument, some arguments raise an error even though they are valid matplotlib.pyplot.plot() arguments. The error occurs with the c and ls arguments.

The reason is that in scikit-learn/sklearn/metrics/_plot/roc_curve.py, the following code exists:

chance_level_line_kw = {
    "label": "Chance level (AUC = 0.5)",
    "color": "k",
    "linestyle": "--",
}

if chance_level_kw is not None:
    chance_level_line_kw.update(**chance_level_kw)

Matplotlib raises an error when both color and c, or linestyle and ls are specified (this happens with other arguments too, but these are not relevant here since scikit-learn does not set values for them).

Solution TLDR

If the user supplies ls or c, linestyle and color are overwritten with user inputs, and ls and c are removed from chance_level_kw.

I've added a test for these specific cases.

Any other comments?

At the moment, it only fixes the RocCurveDisplay class. I need to investigate what other classes have the same reproducible bug.

Copy link

github-actions bot commented Oct 7, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f00f5de. Link to the linter CI: here

@JosephBARBIERDARNAL
Copy link
Contributor Author

@glemaitre here the fix for #30015 . I'll wait for review before continuing.

@glemaitre
Copy link
Member

Thanks. We will need an entry in the changelog for sure. This should solve the CI. You can add it in doc/whats_new/v1.6.rst file.

@JosephBARBIERDARNAL
Copy link
Contributor Author

I added the fix for PredictionErrorDisplay.

Note: In metrics/_plot/tests/test_predict_error_display.py, there was a duplicate:

assert display.line_.get_color() == "black"
assert_allclose(display.scatter_.get_edgecolor(), [[1.0, 0.0, 0.0, 0.8]])

This is supposed to test the default parameters, but since display.plot(**extra_params) is used, the default parameters are not actually being tested.

@glemaitre
Copy link
Member

glemaitre commented Oct 7, 2024

I add a quick look at the different displays and I think we will have issue in the following:

  • CalibrationDisplay
  • PartialDependenceDisplay
  • ConfusionMatrixDisplay
  • PrecisionRecallDisplay
  • PredictionErrorDisplay

In general, it boils down to the same issue due to the aliases:

  • color or c
  • linewidth or lw
  • linestyle or ls

So I'm thinking that we can make a utility tool in sklearn/utils/_plotting.py module.
I think that this utility could be thought with the following signature:

def validate_style_kwargs(default_style_kwargs, user_style_kwargs):
    ...

The job will be to check that the user does not pass two aliases (e.g. lw + linewidth) such that we can safely delete an option afterwards. Once we are sure, we return a dictionary that use only one of the alias. Let's avoid to call del on the dict but instead create the dictionary.

Then in each function, we only have to call the utility whenever we want to pass it to the plotting function from matplotlib.

In terms of test, we therefore only need to check our small utility function.

@JosephBARBIERDARNAL
Copy link
Contributor Author

So I'm thinking that we can make a utility tool in sklearn/utils/_plotting.py module.
I think that this utility could be thought with the following signature:

I agree, I'll do that

@glemaitre glemaitre self-requested a review October 7, 2024 19:50
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good. Could you also check the remaining classes for the different kwargs.

[
None,
{"linewidth": 1, "color": "red", "linestyle": "-", "label": "DummyEstimator"},
{"linewidth": 1, "c": "red", "ls": "-", "label": "DummyEstimator"},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe lw here instead of linewidth

@JosephBARBIERDARNAL
Copy link
Contributor Author

It looks good. Could you also check the remaining classes for the different kwargs.

Yes

JosephBARBIERDARNAL and others added 6 commits October 8, 2024 09:51
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
@glemaitre glemaitre changed the title fix invalid inputs for chance_level_line_kw in display classes FIX handle aliases in displays when used as default and provided by user Oct 8, 2024
@JosephBARBIERDARNAL
Copy link
Contributor Author

JosephBARBIERDARNAL commented Oct 9, 2024

  • RocCurveDisplay
  • CalibrationDisplay
  • ConfusionMatrixDisplay
  • PrecisionRecallDisplay
  • PredictionErrorDisplay
  • PartialDependenceDisplay: I've made a few changes but I can't get a few tests (assert line.get_color() == expected_colors[0]) to pass. My changes probably handle kw priorities incorrectly. Some help would be appreciated.

@JosephBARBIERDARNAL
Copy link
Contributor Author

Also, I'm not sure in which section of the changelog I should add the changes? Most of them are in sklearn.metrics but not all:

- |Fix| the classes :class:`metrics.ConfusionMatrixDisplay`, :class:`metrics.ROCCurveDisplay`,
  :class:`calibration.CalibrationDisplay`, :class:`metrics.PrecisionRecallDisplay`,
  :class:`metrics.PredictionErrorDisplay` and :class:`inspection.PartialDependenceDisplay`
  now properly handle Matplotlib aliases for style parameters (e.g., `c` and `color`, `ls`
  and `linestyle`, etc).
  :pr:`30023` by :user:`Joseph Barbier <JosephBARBIERDARNAL>` and :user:`Guillaume Lemaitre <glemaitre>`.

@glemaitre
Copy link
Member

On the top of the changelog with have a section called "Changes impacting many modules". I think we can add the entry in this section because as you mentioned, we are touching classes from several modules.

@glemaitre
Copy link
Member

I'll do a review. I might push a couple of fixes just to make the CI happy first.

@glemaitre glemaitre self-requested a review October 9, 2024 15:09
@JosephBARBIERDARNAL
Copy link
Contributor Author

JosephBARBIERDARNAL commented Oct 9, 2024

I've just made the changes @glemaitre

What is missing for the CI to pass? I use the pre-commit hook on my side but it still seems to have linting problems.

@glemaitre glemaitre added this to the 1.6 milestone Oct 11, 2024
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once the last change is done, everything looks good to me.

Thanks @JosephBARBIERDARNAL.

We will need to get a second review to get this one in. Let me ping maybe @lucyleeow or @Charlie-XIAO

Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
@JosephBARBIERDARNAL
Copy link
Contributor Author

Thanks @JosephBARBIERDARNAL.

Perfect, thanks for the prompt reviews.

Copy link
Contributor

@Charlie-XIAO Charlie-XIAO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @JosephBARBIERDARNAL this PR looks mostly good to me! Just a few questions/comments:

Matplotlib style kwargs.
"""

invalid_to_valid_kw = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May I know where this mapping came from? I searched in the matplotlib source code where they used their define_aliases decorator, and there seems to be many more that we missed here (though maybe not all of them are related here).

https://github.com/search?q=repo%3Amatplotlib%2Fmatplotlib+define_aliases&type=code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I made it myself, more or less arbitrarily. But since matplotlib has a clearly defined list of aliases, maybe let's just put the entire list instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed. We could even import it because we expect to have matplotlib installed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did a quick check, and I'm not sure how to import it in practice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I could not find a non-private way of getting them. So I would define the list such that in account for the keys/values of the matplotlib artist that we are using.

FYI, our function is close to this function (https://github.com/matplotlib/matplotlib/blob/v3.9.2/lib/matplotlib/cbook.py#L1795-L1848) when it comes to matplotlib itself. No need to try using it thought.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I've added all the aliases, except for those used for 3D plotting

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @JosephBARBIERDARNAL

Co-authored-by: Yao Xiao <108576690+Charlie-XIAO@users.noreply.github.com>
@glemaitre glemaitre self-requested a review October 14, 2024 21:16
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Charlie-XIAO do you want to have another look?

Matplotlib style kwargs.
"""

invalid_to_valid_kw = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @JosephBARBIERDARNAL

Copy link
Contributor

@Charlie-XIAO Charlie-XIAO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @JosephBARBIERDARNAL! There's just a small typo:

Co-authored-by: Yao Xiao <108576690+Charlie-XIAO@users.noreply.github.com>
@glemaitre glemaitre self-requested a review October 17, 2024 19:23
@glemaitre
Copy link
Member

We just change the way that we add entry into the changelog. Let me make the change and push it.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the fragment in the changelog. Enabling the auto-merge then.

Thanks @JosephBARBIERDARNAL. Nice contribution.

@glemaitre glemaitre enabled auto-merge (squash) October 17, 2024 19:27
@JosephBARBIERDARNAL
Copy link
Contributor Author

Thanks! This is my very first merged PR, and it's been a really good experience.

@glemaitre glemaitre merged commit 8b06fa6 into scikit-learn:main Oct 17, 2024
30 checks passed
@JosephBARBIERDARNAL JosephBARBIERDARNAL mentioned this pull request Nov 3, 2024
17 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

chance_level_kw in RocCurveDisplay raises an error when using valid matplotlib args
3 participants