DOC: attempt to explain the main different APIs #21877

Merged
merged 2 commits into from
Feb 3, 2022

Conversation

jklymak
Member

@jklymak jklymak commented Dec 7, 2021

PR Summary

This is an initial attempt to explain the different Matplotlib APIs. Note that the terms "Axes" and "pyplot" were suggested by @timhoffm, and I've suggested "Data" for the third, pandas/xarray-style interface.

  • There are quite a few places in the docs where we could remove similar half explanations and reference this page (or what it turns into).
  • The proposed terminology should be worked into the docs in quite a few places where "OO"/"pyplot" is discussed.

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@jklymak jklymak added the Documentation: website layout/behavior/styling changes label Dec 7, 2021
@jklymak
Member Author

jklymak commented Dec 7, 2021

ping @mwaskom for comment and suggestions as well, as you have previously commented on the need for such a page.

@mwaskom

mwaskom commented Dec 7, 2021

Did you perhaps forget to git add the new file?

@jklymak
Member Author

jklymak commented Dec 7, 2021

Ooops!

@mwaskom mwaskom left a comment

Thanks @jklymak for starting this — I think that alleviating this very common source of confusion will be incredibly helpful for matplotlib users.

I went through and made some comments, although I didn't get to the end before I needed to get to actual work. I am not a matplotlib dev and of course you don't need my approval, but as you did ask for my feedback, I would say that this could probably be made both more focused and more accessible. IMO the current version tries to cover too many concepts and isn't as clear as it could be about the key points that it wants to make.

A couple high-level structural suggestions:

  • As mentioned in the line comments, I would drop the discussion of the pandas API and the "imperative / declarative" distinction from this page.
  • The primary audience for this page is probably users of the implicit interface who need some education / persuading about the explicit interface. So it might make sense to start with the implicit interface, explain what's happening "behind the scenes" in an example plot, and then show how the Axes interface makes that behind-the-scenes work explicit.
  • Users would probably benefit from a structured "pros and cons" table
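To make the second suggestion concrete, here is a minimal sketch of the same plot written both ways. The data values and titles are placeholders, and the `Agg` backend is selected only so the snippet runs without a display; it is an illustration of the idea, not the wording proposed for the docs page.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI backend needed
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [0, 0.5, 1, 0.2]

# Implicit (pyplot) interface: no Figure or Axes is named. pyplot creates
# them on first use and keeps track of which ones are "current".
plt.plot(x, y)
plt.title("implicit")
implicit_ax = plt.gca()  # the Axes that pyplot was drawing on behind the scenes

# Explicit (Axes) interface: the same steps, with the objects held by name.
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_title("explicit")
```

Note the small naming difference that tends to trip people up: `plt.title(...)` on the implicit side corresponds to `ax.set_title(...)` on the explicit side.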

Comment on lines 22 to 25
- a number of downstream libraries offer a
`declarative <https://en.wikipedia.org/wiki/Declarative_programming>`_
interface, usually where a data object has a ``plot`` method implemented
that will plot the data.

I would suggest not including this third category here. It is complicated enough to explain (1) whether the target figure/axes is explicit or implicit and (2) the subtle differences in the configuration functions/methods across those two APIs. Introducing the imperative/declarative distinction and third-party libraries that users may or may not have encountered only serves to distract from that key message, IMO.

I'm also not sure I would even call the pandas/xarray interfaces "declarative". They are basically just syntactic sugar that converts

df.plot.scatter(x="column a", y="column b", ax=ax)

to

ax.scatter(x=df["column a"], y=df["column b"])

This is distinct from the declarative nature of an interface like ggplot or seaborn, where you could say

scatter(data=df, x="column a", y="column b", color="column c")

and the arguments to parameters like color would be data values, rather than rgb tuples.

Opinions about how to categorize these may differ (you could argue that the groupby/faceting interface in pandas plotting is more declarative), but it's a pretty fuzzy distinction IMO and that fuzziness only adds unnecessary complexity here.

Member

I think it's worthwhile to discuss this category

  • This should be clearly separated from the native Matplotlib interfaces:
    • Matplotlib
      • Axes
      • pyplot
    • Third party
  • I agree that we should not call it declarative. If we don't find a better name the section could be titled "How third party libraries provide plotting functions".
  • Not sure if the synthetic example is easy to transfer. I'd go with the probably most used third-party lib and make a pandas example.

Member Author

With respect to the synthetic example, we do not have pandas (or xarray) as a doc dependency, so we would need to agree to add that. The only issue there might be the temptation to add pandas examples everywhere, which I think would be a mistake.

using different interfaces, often leading to confusion. The three major
interfaces are:

- an explicit interface that uses methods on a Figure or Axes object to

While I think converging on a standardized nomenclature for the two interfaces and using them consistently is good and important, this page should introduce the various synonyms as well, because you want people to get here when they google "matplotlib object oriented interface", and you want people to be able to map whatever concepts they learn here onto the terms they see elsewhere.

Comment on lines 7 to 12
Python is a flexible language allowing different design choices when
creating an application programming interface (API). Matplotlib
has seen two of these over the years (or more, depending on how you
count). In addition, external libraries have their own interfaces to
Matplotlib. This means that code snippets across the internet are written
using different interfaces, often leading to confusion. The three major

I would try to get to the point faster.

Comment on lines 31 to 33
and fine-tuning end up needing to be done at this level. So even if you
prefer the higher-level interfaces, it is very useful to understand that this
interface exists, and how to access it.

You haven't told them what the "higher-level interface" is yet, which makes me want to jump ahead to find out what it is.

Also is the pyplot interface "higher-level"? I don't think so — it's just implicit.

Comment on lines 47 to 48
fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])

Why not use subplots here? Either plt.subplots or plt.figure().subplots? Using add_axes with a generic rectangle should be an antipattern.

Member

Agreed that subplots() is the canonical pattern and should be used here. add_axes is for special purposes.
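For reference, a short sketch of the two `subplots` spellings mentioned above, plus the grid form. The data is the same placeholder series used in the quoted MATLAB-style snippet, and the `Agg` backend is an assumption made only so this runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI backend needed
import matplotlib.pyplot as plt

# One call creates the Figure and a single Axes, with Matplotlib choosing
# the Axes position instead of a hand-written rectangle.
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [0, 0.5, 1, 0.2])

# Two-step spelling, for when the Figure is created separately.
fig2 = plt.figure()
ax2 = fig2.subplots()

# Asking for a grid returns a NumPy array of Axes.
fig3, axs = plt.subplots(2, 2)
axs[0, 1].plot([1, 2, 3, 4], [0, 0.5, 1, 0.2])
```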

Comment on lines 61 to 62
for many things (line widths, fonts, colors, etc.), but this interface tells
Matplotlib how to compose the visualization. Of course this paradigm is not

How does the implicit interface not "tell matplotlib how to compose the visualization"?

Comment on lines 63 to 65
pure, in that ``fig.add_axes()`` encapsulates creating many of the objects on
an Axes (spines, labels, ticks) and ``ax.plot()`` encapsulates creating a
line plot. But it is much more "imperative" than something like

I'm not sure what you mean by "pure" here; remember that the intended audience should be "scientists who are trying to plot their data in the right place" not "programming paradigm connoisseurs".

Member Author

Sure, decided to drop this paragraph as not particularly useful...

Comment on lines 72 to 74
The "pyplot" interface arose from Matlab, where one would just call
``plot([1, 2, 3, 4], [0, 0.5, 1, 0.2])`` and a figure with axes would be
created for you. The `~.matplotlib.pyplot` module shadows most of the

Does this reference to MATLAB and casual snippet of its syntax help introduce the key distinction? I think that making the connection to the MATLAB origins is fine, but it doesn't help introduce the concept, so I'd move it to a side comment after you make the main point.

Comment on lines 89 to 91
it can be at more of a disadvantage - imagine you want to loop through a number
of variables and plot them on separate Axes, it is often more straight-forward
to create the Axes beforehand. It can also become ambiguous when you have

If I am a new user who is confused about the distinction between these interfaces, it is going to be hard for me to just imagine what matplotlib is going to do in this situation or why the counterfactual explicit implementation would be more straight-forward.
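One way to avoid asking the reader to imagine it is to show the loop both ways. The sketch below uses made-up variable names and values (`variables`, `"a"`, `"b"`, `"c"` are hypothetical), and the `Agg` backend is chosen only so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI backend needed
import matplotlib.pyplot as plt

# Hypothetical data: three named variables sharing one x-axis.
x = [0, 1, 2, 3]
variables = {"a": [0, 1, 4, 9], "b": [3, 2, 1, 0], "c": [1, 1, 2, 3]}

# Explicit interface: create every Axes up front, then pair each one
# with the variable it should show.
fig, axs = plt.subplots(1, len(variables))
for ax, (name, values) in zip(axs, variables.items()):
    ax.plot(x, values)
    ax.set_title(name)

# Implicit interface: each plt.subplot call changes which Axes is "current",
# so where the data lands depends on the order of the calls.
implicit_fig = plt.figure()
for i, (name, values) in enumerate(variables.items(), start=1):
    plt.subplot(1, len(variables), i)
    plt.plot(x, values)
    plt.title(name)
```

Both loops produce the same three panels; the difference is that in the first one the target Axes is visible on every line that draws.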

Comment on lines 92 to 94
multiple Figures or Axes which one is being acted on. In general the
Matplotlib project recommends using the explicit interface, at least for code
you want to share or preserve.

I would make this recommendation in a separate section, rather than as an aside here.

@jklymak
Member Author

jklymak commented Dec 7, 2021

Thanks @mwaskom, very helpful comments. I agree; I wasn't sure which should go first: pyplot or "Axes".

I would still argue that the third "data" style is also confusing to folks and we end up having to deal with that confusion often enough that it is worth explaining here. And more to the point, give folks the map back to basic Matplotlib via "ax" so they can easily see how to customize as per all the other snippets.

If we decide to keep the third section, that somewhat skews the discussion to be Axes-first, though it is probably possible to do it both ways.

@timhoffm
Member

timhoffm commented Dec 7, 2021

@jklymak thanks for starting this. This is a long-time item on my todo list. I've collected some ideas and opinions on how to present this. Still figuring out how to best contribute them here. 😄

@mwaskom

mwaskom commented Dec 8, 2021

I would still argue that the third "data" style is also confusing to folks and we end up having to deal with that confusion often enough that it is worth explaining here.

Yeah, I could see it being in scope to answer this question: "you encourage the interface that calls methods on matplotlib objects, but how am I supposed to do that if using a third party tool?" But I wouldn't get into the (fuzzy) distinction between "imperative" / "declarative" styles or even really express it in terms of a "data interface that can plot itself." (After all, everything you would say about how pandas intersects with the explicit / implicit interface is also true of seaborn, but that uses functions that consume a dataframe, rather than adding methods to a data container.)

IMO, the way this issue ties into the message of the page is that multiple third party libraries have a general pattern where plotting functions/methods

  • accept an ax kwarg, acting like the explicit interface
  • otherwise grab the "current axes" and plot on it, acting like the implicit interface
  • return the ax object in either case, so you can further customize the plot through the explicit interface methods
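The three-point pattern above can be sketched with plain Matplotlib. `scatter_columns` is a hypothetical helper written for this illustration, not a function from pandas, seaborn, or any other library; a dict stands in for a dataframe, and the `Agg` backend is chosen only so the snippet runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run, no GUI backend needed
import matplotlib.pyplot as plt


def scatter_columns(data, x, y, ax=None):
    """Hypothetical third-party-style plotting function (illustration only)."""
    if ax is None:
        # No target given: fall back to the current Axes, like pyplot does.
        ax = plt.gca()
    ax.scatter(data[x], data[y])
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    # Return the Axes so callers can keep customizing it explicitly.
    return ax


data = {"column a": [1, 2, 3], "column b": [2, 4, 8]}
fig, (left, right) = plt.subplots(1, 2)

# Explicit use: the caller names the target Axes.
returned = scatter_columns(data, "column a", "column b", ax=left)
returned.set_title("explicit target")

# Implicit use: the function draws on whichever Axes is current.
plt.sca(right)  # make `right` the current Axes
current = scatter_columns(data, "column a", "column b")
current.set_title("current Axes")
```

Either way the caller ends up holding an Axes object, which is the route back to the customization methods of the explicit interface.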

@jklymak
Member Author

jklymak commented Dec 8, 2021

Thanks @mwaskom and @timhoffm. I think all your suggestions are easily actionable.

@timhoffm regarding keeping this as-is, it seems the only "permanent" decision here is whether it is in the right place in the doc hierarchy, and if the scope is approximately correct. I'd argue we should have something that is adequate for now, and then can continue to work on it afterward.

Wasn't there a way for us to have two code and figure blocks side-by-side at some point? Having the "Axes" and "pyplot" examples next to each other would be cool.

@jklymak
Member Author

jklymak commented Dec 16, 2021

@jklymak jklymak added this to the v3.6.0 milestone Jan 24, 2022
@jklymak
Member Author

jklymak commented Jan 24, 2022

Bypassed intersphinx for xarray

I'll push for this to be merged without too much more copyediting. Our standard for doc PRs is not "is it perfect", but rather "is it an improvement". Unless this is wrong or confusing, I think we should add it, and then others can edit to make clearer or more concise.

Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
@jklymak
Member Author

jklymak commented Feb 3, 2022

I'll self-merge based on the one review, but if anyone feels this does more harm than good, feel free to revert and we can re-discuss....

@jklymak jklymak merged commit eda1404 into matplotlib:main Feb 3, 2022
@jklymak jklymak deleted the doc-explain-APIs branch February 3, 2022 08:07
@QuLogic
Member

QuLogic commented Feb 3, 2022

Oh sorry, I meant to after you had fixed the review comments.

@jklymak
Member Author

jklymak commented Feb 3, 2022

Did I miss something?
