Skip to content

[MRG] ENH Improves on the plotting api by storing weak ref to display in axes #15702

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Resolves #15581 (better)

What does this implement/fix? Explain your changes.

After a discussion with @tacaswell, it was concluded that adding attribute to the axes would be solution to figuring out

This allows use cases in #15581 to work:

fig, ax = plt.subplots(figsize=(10, 6))
tree_disp = plot_partial_dependence(tree, X, ["LSTAT"], ax=ax)
mlp_disp = plot_partial_dependence(mlp, X, ["LSTAT"], ax=ax,
                                   line_kw={"c": "red"})

Any other comments?

Storing the display object in the axes so it can be referenced later is a small hack, but it improves the API quite a bit.

CC @tacaswell @amueller @NicolasHug @glemaitre @qinhanmin2014

@NicolasHug
Copy link
Member

Does this also allow to pass a ax that was drawn on before? Or will it be cleared like currently in master?

@thomasjpfan
Copy link
Member Author

For this specific case, if the object was drawn on by a PartialDependenceDisplay object, it will not error as shown above.

If it was drawn on by a normal plot function, it will error. For example:

fig, ax = pyplot.subplots()
ax.plot([0, 1, 2], [1, 2, 3])
plot_partial_dependence(..., ax=ax)

@NicolasHug
Copy link
Member

Is it reasonable to store the display object as an attribute of the ax?

The display objects stores all the data which can get pretty big. You would expect that the data can be GCed if the Display object isn't there anymore.

But with this PR, the data exists as long as ax exists.

@NicolasHug
Copy link
Member

If we can, it would be ideal to also support that case

fig, ax = pyplot.subplots()
ax.plot([0, 1, 2], [1, 2, 3])
plot_partial_dependence(..., ax=ax)

This is typically useful if you want to plot the data points and then the PDPs on top, as done in #15582 (comment)

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Nov 21, 2019

Yes I have to make sure the axes is a weak reference.

@thomasjpfan
Copy link
Member Author

As in the axes has a weak reference to the display object.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments.

Sorry if I'm missing something but I don't see where the weak ref is being created? (this should also be tested)

Comment on lines 6 to 7
for used_attr in used_attrs:
if hasattr(ax, used_attr) and getattr(ax, used_attr):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this equivalent to simply if any(getattr(...., None) for attr in used_attrs)?

"""Return true if the axes has been used"""
used_attrs = ['lines', 'patches', 'texts', 'tables', 'artists',
'tables', 'images']
msg = "The ax was already used in another plot function"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a matplotlib plot function or one of scikit-learn's? (same for docstring)

if hasattr(ax, "_sklearn_display_object"):
if not isinstance(ax._sklearn_display_object, self.__class__):
raise ValueError("The ax was already used by another "
"display object")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should say "by another display object which is not an instance of self.class.name"

@thomasjpfan
Copy link
Member Author

Sorry if I'm missing something but I don't see where the weak ref is being created? (this should also be tested)

I was just thinking about the weakref on the train. I am going to move this back to WIP for a little bit.

thank you for the early reviews.

@thomasjpfan thomasjpfan changed the title [MRG] ENH Improves on the plotting api when dealing with multiple plots [WIP] ENH Improves on the plotting api when dealing with multiple plots Nov 22, 2019
@thomasjpfan thomasjpfan changed the title [WIP] ENH Improves on the plotting api when dealing with multiple plots [MRG] ENH Improves on the plotting api when dealing with multiple plots Nov 22, 2019
@thomasjpfan
Copy link
Member Author

If we can, it would be ideal to also support that case

fig, ax = pyplot.subplots()
ax.plot([0, 1, 2], [1, 2, 3])
plot_partial_dependence(..., ax=ax)

This is typically useful if you want to plot the data points and then the PDPs on top, as done in #15582 (comment)

Supporting this can run into an unclear API. For example:

fig, ax = pyplot.subplots()
ax.plot([0, 1, 2], [1, 2, 3])

plot_partial_dependence(est, features=[0, 1, 2], ax=ax)

For this case, I do not think a user would want the first call to plot to be used in all 3 partial dependence plots.

The only case where this can make sense is when there is only one feature:

fig, ax = pyplot.subplots()
ax.plot([0, 1, 2], [1, 2, 3])

plot_partial_dependence(est, features=[0], ax=ax)

I am -0.2 on this. It will lead to another code path to handle len(features)==1 and isinstance(ax, plt.Axes).

@NicolasHug
Copy link
Member

Please ping when ready @thomasjpfan .

Also, I was looking at pandas internal: when you pass ax to df.plot() but they need to clear the figure, they raise a UserWarning. Maybe we should do that too to avoid surprises. To reproduce:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

d = {'a': np.arange(10),
     'b': np.arange(10) + 23 * 4}

fig, ax = plt.subplots()

df = pd.DataFrame(d)
df.plot(subplots=True, ax=ax)  # ask for subplots but also for a single ax => warning
plt.show()

@thomasjpfan
Copy link
Member Author

@NicolasHug This is ready for another review.

I slightly prefer the current behavior where we do not clear the axes and raise an error if the axes has been used. If we relax this restriction and automatically clear the axes with a warning, there will be code out there that depends on this behavior.

@NicolasHug
Copy link
Member

please refresh my memory: if I pass an array-like as ax, with stuff already drawn-on, do we raise an error?

In retrospect I'm not even sure we want to raise errors/warnings.

@thomasjpfan
Copy link
Member Author

If there as an array like of ax with stuff already drawn on, it will raise an error if the number of axes in the array like is not the expected number of axes. If it has the expected number of axes, then they will be drawn on directly.

#15760 provides a better solution to how plot_partial_dependence checks the axes.

@thomasjpfan thomasjpfan changed the title [MRG] ENH Improves on the plotting api when dealing with multiple plots [MRG] ENH Improves on the plotting api by storing weak ref to display in axes Dec 5, 2019
Base automatically changed from master to main January 22, 2021 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

New plotting API, basic usage
2 participants