DOC: attempt to explain the main different APIs #21877
ping @mwaskom for comment and suggestions as well, as you have previously commented on the need for such a page.
Did you perhaps forget to git add the new file?
Ooops!
Thanks @jklymak for starting this — I think that alleviating this very common source of confusion will be incredibly helpful for matplotlib users.
I went through and made some comments, although I didn't get to the end before I needed to get to actual work. I am not a matplotlib dev and of course you don't need my approval, but as you did ask for my feedback, I would say that this could probably be made both more focused and more accessible. IMO the current version tries to cover too many concepts and isn't as clear as it could be about the key points that it wants to make.
A couple high-level structural suggestions:
- As mentioned in the line comments, I would drop the discussion of the pandas API and the "imperative / declarative" distinction from this page.
- The primary audience for this page is probably users of the implicit interface who need some education / persuading about the explicit interface. So it might make sense to start with the implicit interface, explain what's happening "behind the scenes" in an example plot, and then show how the Axes interface makes that behind-the-scenes work explicit.
- Users would probably benefit from a structured "pros and cons" table.
doc/users/explain/api_interfaces.rst
- a number of downstream libraries offer a
`declarative <https://en.wikipedia.org/wiki/Declarative_programming>`_
interface, usually where a data object has a ``plot`` method implemented
that will plot the data.
I would suggest not including this third category here. It is complicated enough to explain (1) whether the target figure/axes is explicit or implicit and (2) the subtle differences in the configuration functions/methods across those two APIs. Introducing the imperative/declarative distinction and third-party libraries that users may or may not have encountered only serves to distract from that key message, IMO.
I'm also not sure I would even call the pandas/xarray interfaces "declarative". They are basically just syntactic sugar that converts
df.plot.scatter(x="column a", y="column b", ax=ax)
to
ax.scatter(x=df["column a"], y=df["column b"])
This is distinct from the declarative nature of an interface like ggplot or seaborn, where you could say
scatter(data=df, x="column a", y="column b", color="column c")
and the arguments to parameters like color would be data values, rather than RGB tuples.
Opinions about how to categorize these may differ (you could argue that the groupby/faceting interface in pandas plotting is more declarative), but it's a pretty fuzzy distinction IMO and that fuzziness only adds unnecessary complexity here.
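A minimal sketch of the "sugar" described above, assuming a hypothetical container class `TinyFrame` (not a real pandas or xarray API) whose `scatter` method does nothing more than look up columns by name and forward them to the explicit `Axes.scatter` call:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


class TinyFrame:
    """Hypothetical column container standing in for a DataFrame."""

    def __init__(self, columns):
        self.columns = dict(columns)

    def __getitem__(self, name):
        return self.columns[name]

    def scatter(self, x, y, ax):
        # The "method on the data object" only resolves column names,
        # then makes the ordinary explicit Matplotlib call.
        return ax.scatter(x=self[x], y=self[y])


df = TinyFrame({"column a": [1, 2, 3], "column b": [4, 5, 6]})
fig, ax = plt.subplots()

# These two calls draw the same points on the same Axes.
sugar = df.scatter(x="column a", y="column b", ax=ax)
plain = ax.scatter(x=df["column a"], y=df["column b"])
```

Nothing here maps data values to visual properties the way ggplot or seaborn would; the container only saves typing.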
I think it's worthwhile to discuss this category:
- This should be clearly separated from native Matplotlib. Think:
  - Matplotlib
    - Axes
    - pyplot
  - Third party
    - Matplotlib
- I agree that we should not call it declarative. If we don't find a better name the section could be titled "How third party libraries provide plotting functions".
- Not sure if the synthetic example is easy to transfer. I'd go with probably the most-used third-party library and make a pandas example.
With respect to the synthetic example, we do not have pandas (or xarray) as a doc dependency, so we would need to agree to add that. The only issue there might be the temptation to add pandas examples everywhere, which I think would be a mistake.
doc/users/explain/api_interfaces.rst
using different interfaces, often leading to confusion. The three major
interfaces are:

- an explicit interface that uses methods on a Figure or Axes object to
While I think converging on a standardized nomenclature for the two interfaces and using them consistently is good and important, this page should introduce the various synonyms as well, because you want people to get here when they google "matplotlib object oriented interface", and you want people to be able to map whatever concepts they learn here onto the terms they see elsewhere.
doc/users/explain/api_interfaces.rst
Python is a flexible language allowing different design choices when
creating an application programming interface (API). Matplotlib
has seen two of these over the years (or more, depending on how you
count). In addition, external libraries have their own interfaces to
Matplotlib. This means that code snippets across the internet are written
using different interfaces, often leading to confusion. The three major
I would try to get to the point faster.
doc/users/explain/api_interfaces.rst
and fine-tuning end up needing to be done at this level. So even if you
prefer the higher-level interfaces, it is very useful to understand that this
interface exists, and how to access it.
You haven't told them what the "higher-level interface" is yet; this makes me want to jump ahead to find out what it is.
Also, is the pyplot interface "higher-level"? I don't think so; it's just implicit.
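To illustrate the point that pyplot is implicit rather than higher-level, a small sketch: pyplot configuration functions act on the "current" Axes, and each corresponds to an Axes method that (for these examples) carries a `set_` prefix. The data and labels are arbitrary placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Implicit: pyplot functions act on whichever Axes is "current".
plt.figure()
plt.plot([1, 2, 3], [1, 4, 9])
plt.xlabel("time")
plt.xlim(0, 4)
implicit_ax = plt.gca()  # the Axes pyplot was acting on all along

# Explicit: the same configuration as methods on a named Axes object.
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
ax.set_xlabel("time")
ax.set_xlim(0, 4)
```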
doc/users/explain/api_interfaces.rst
fig = plt.figure()
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
Why not use subplots here? Either plt.subplots or plt.figure().subplots? Using add_axes with a generic rectangle should be an antipattern.
Agreed that subplots() is the canonical pattern and should be used here. add_axes is for special purposes.
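For comparison, a sketch of the patterns discussed in this thread: plt.subplots and Figure.subplots as the usual way to create Axes, and add_axes with an explicit rectangle kept for a special case such as an inset. The inset rectangle below is an arbitrary illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Canonical: create the Figure and its Axes in one call.
fig1, ax1 = plt.subplots()
fig2, axs = plt.subplots(2, 2)  # axs is a 2x2 NumPy array of Axes

# Equivalent two-step form: create the Figure, then its subplots.
fig3 = plt.figure()
ax3 = fig3.subplots()

# Special purpose: add_axes takes [left, bottom, width, height] in
# figure coordinates, e.g. for a small inset placed over ax1.
inset = fig1.add_axes([0.6, 0.6, 0.25, 0.25])
```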
doc/users/explain/api_interfaces.rst
for many things (line widths, fonts, colors, etc.), but this interface tells
Matplotlib how to compose the visualization. Of course this paradigm is not
How does the implicit interface not "tell matplotlib how to compose the visualization"?
doc/users/explain/api_interfaces.rst
pure, in that ``fig.add_axes()`` encapsulates creating many of the objects on
an Axes (spines, labels, ticks) and ``ax.plot()`` encapsulates creating a
line plot. But it is much more "imperative" than something like
I'm not sure what you mean by "pure" here; remember that the intended audience should be "scientists who are trying to plot their data in the right place" not "programming paradigm connoisseurs".
Sure, decided to drop this paragraph as not particularly useful...
doc/users/explain/api_interfaces.rst
The "pyplot" interface arose from Matlab, where one would just call
``plot([1, 2, 3, 4], [0, 0.5, 1, 0.2])`` and a figure with axes would be
created for you. The `~.matplotlib.pyplot` module shadows most of the
Does this reference to MATLAB and casual snippet of its syntax help introduce the key distinction? I think that making the connection to the MATLAB origins is fine, but it doesn't help introduce the concept, so I'd move it to a side comment after you make the main point.
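A short sketch of what the pyplot snippet above does behind the scenes: a single plt.plot call creates a Figure and an Axes when none exist, while the explicit equivalent makes both objects visible up front.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

plt.close("all")  # start with no open figures

# Implicit (pyplot): no Figure or Axes is created by hand...
plt.plot([1, 2, 3, 4], [0, 0.5, 1, 0.2])
implicit_fig = plt.gcf()  # ...but pyplot made one of each for us.

# Explicit: create the objects first, then call methods on them.
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [0, 0.5, 1, 0.2])
```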
doc/users/explain/api_interfaces.rst
it can be at more of a disadvantage - imagine you want to loop through a number
of variables and plot them on separate Axes, it is often more straight-forward
to create the Axes beforehand. It can also become ambiguous when you have
If I am a new user who is confused about the distinction between these interfaces, it is going to be hard for me to just imagine what matplotlib is going to do in this situation or why the counterfactual explicit implementation would be more straight-forward.
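To make the looping case concrete, a sketch with three hypothetical variables plotted on separate Axes. With the explicit interface the Axes exist up front and the loop just pairs each one with its data; the implicit version has to make the right Axes "current" with plt.subplot on every iteration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

# Hypothetical data: one series per variable.
variables = {"a": [1, 3, 2], "b": [2, 2, 4], "c": [5, 1, 3]}

# Explicit: create all Axes first, then loop over (Axes, variable) pairs.
fig, axs = plt.subplots(1, len(variables))
for ax, (name, values) in zip(axs, variables.items()):
    ax.plot(values)
    ax.set_title(name)

# Implicit: each iteration must first select the target Axes.
plt.figure()
for i, (name, values) in enumerate(variables.items(), start=1):
    plt.subplot(1, len(variables), i)  # becomes the "current" Axes
    plt.plot(values)
    plt.title(name)
implicit_fig = plt.gcf()
```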
doc/users/explain/api_interfaces.rst
multiple Figures or Axes which one is being acted on. In general the
Matplotlib project recommends using the explicit interface, at least for code
you want to share or preserve.
I would make this recommendation in a separate section, rather than as an aside here.
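A sketch of the ambiguity mentioned in the snippet above: with more than one Figure open, a pyplot call goes to whichever Figure was created or activated most recently, whereas an Axes method call always names its target.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

fig_a, ax_a = plt.subplots()
fig_b, ax_b = plt.subplots()  # fig_b is now the "current" figure

# Implicit: this lands on ax_b, only because fig_b was created last.
plt.plot([1, 2, 3])

# Explicit: the target Axes is spelled out, whatever is current.
ax_a.plot([3, 2, 1])

# Re-activating fig_a changes where the next pyplot call goes.
plt.figure(fig_a.number)
plt.plot([0, 0, 0])
```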
Thanks @mwaskom, very helpful comments. I agree; I wasn't sure which should go first: pyplot or "Axes". I would still argue that the third "data" style is also confusing to folks and we end up having to deal with that confusion often enough that it is worth explaining here. And more to the point, give folks the map back to basic Matplotlib via "ax" so they can easily see how to customize as per all the other snippets. If we decide to keep the third section that somewhat skews the discussion to be Axes-first though it is probably possible to do both ways.
@jklymak thanks for starting this. This is a long-time item on my todo list. I've collected some ideas and opinions on how to present this. Still figuring out how best to contribute them here. 😄
Yeah, I could see it being in scope to answer this question: "you encourage the interface that calls methods on matplotlib objects, but how am I supposed to do that if using a third party tool?" But I wouldn't get into the (fuzzy) distinction between "imperative" / "declarative" styles or even really express it in terms of a "data interface that can plot itself." (After all, everything you would say about how pandas intersects with the explicit / implicit interface is also true of seaborn, but that uses functions that consume a dataframe, rather than adding methods to a data container.) IMO, the way this issue ties into the message of the page is that multiple third party libraries have a general pattern where plotting functions/methods
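As a sketch of how a third-party tool can fit with the explicit interface, assume a hypothetical function `lib_plot` (not any real library's API): it accepts an optional `ax` argument, falls back to pyplot's current Axes when none is given, and returns the Axes so the caller can keep customizing it with Axes methods.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt


def lib_plot(values, ax=None):
    """Hypothetical third-party plotting function."""
    if ax is None:
        ax = plt.gca()  # implicit fallback: pyplot's current Axes
    ax.plot(values)
    return ax  # hand the Axes back for explicit customization


# The caller owns the Axes and passes them in explicitly.
fig, axs = plt.subplots(1, 2)
lib_plot([1, 2, 3], ax=axs[0]).set_title("left")
lib_plot([3, 2, 1], ax=axs[1]).set_title("right")
```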
Thanks @mwaskom and @timhoffm I think all your suggestions are easily actionable. @timhoffm regarding keeping this as-is, it seems the only "permanent" decision here is whether it is in the right place in the doc hierarchy, and if the scope is approximately correct. I'd argue we should have something that is adequate for now, and then can continue to work on it afterward. Wasn't there a way for us to have two code and figure blocks side-by-side at some point? Having the "Axes" and "pyplot" examples next to each other would be cool. |
7b23e9e to 37d1197
I can't intersphinx xarray for some reason, but
d00e472 to 257494f
Bypassed intersphinx for xarray. I'll push for this to be merged without too much more copyediting. Our standard for doc PRs is not "is it perfect", but rather "is it an improvement". Unless this is wrong or confusing, I think we should add it, and then others can edit to make it clearer or more concise.
Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
I'll self-merge based on the one review, but if anyone feels this does more harm than good, feel free to revert and we can re-discuss...
Oh sorry, I meant to after you had fixed the review comments.
Did I miss something?
PR Summary
This is an initial attempt to explain the different Matplotlib APIs. Note that the terminology here ("Axes" and "pyplot") was suggested by @timhoffm, and I've suggested "Data" for the third, pandas/xarray interface.
PR Checklist
Tests and Styling
- (pytest passes)
- (flake8-docstrings and run flake8 --docstring-convention=all)
Documentation
- doc/users/next_whats_new/ (follow instructions in README.rst there)
- doc/api/next_api_changes/ (follow instructions in README.rst there)