Skip to content

improve docstring of Axes.plot #10190

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 9, 2018
Merged

Conversation

timhoffm
Copy link
Member

@timhoffm timhoffm commented Jan 7, 2018

PR Summary

As part of #10148: Docstring update for Axes.plot.

Note

This is quite a monster due to the different ways it can be used. I've moved the description of the format strings into the notes section at the bottom to declutter it a bit. It would even be better to move this description somewhere else in the docs and just reference it. There's other methods that would need the same information.

@timhoffm
Copy link
Member Author

timhoffm commented Jan 7, 2018

For easier review:

Old docs:

New docs:

optional format string. For example, each of the following is
legal::
Plot lines and/or a series of markers. This is also known as line plot
or XY (scatter) plot respectively.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the word 'scatter' here dangerous as we also have a scatter method. I worry about compounding the confusion.

Copy link
Member Author

@timhoffm timhoffm Jan 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the confusion is inherent in the naming of the methods. The general notion of scatter plot in its basic form is what we're doing here. That's why I added the sentence. I've also added the scatter method in the See Also section.

IMO doing both is as good as we can get to prevent confusion. If you find it beneficial, one could add an additional sentence: "Note that matplotlib also offers a scatter method which can additionally change marker properties per data." But that may be overdoing it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need the use of "scatter" in here either. "Plot y versus x as lines and/or markers" is how I'd phrase this.

`pandas.DataFame` or a structured numpy array.

.. note::
The signature ``plot(xlabel, ylabel, data=obj)`` without a format
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true. If it is ambiguous it will warn, but

In [27]: plt.plot('x', 'bob', data={'x': range(5), 'bob': range(5)})
Out[27]: [<matplotlib.lines.Line2D at 0x7f621f4fe898>]

works as expected.

Copy link
Member Author

@timhoffm timhoffm Jan 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, but I didn't want to advertise a signature that issues a warning and suggests to change the code:

plt.plot('x', 'y', data={'x': range(5), 'y': range(5)})
RuntimeWarning: Second argument 'y' is ambiguous: could be a color spec but is in data. Using as data.
Either rename the entry in data or use three arguments to plot.

There was an even more obscure explaination, which frankly I do not understand:

If used with labeled data, make sure that the color spec is not included as an element in data, as otherwise the last case plot("v","r", data={"v":..., "r":...) can be interpreted as the first case which would do plot(v, r) using the default line style and color.

I recon this has something to do with plot(ylabel, fmt, data=data) vs. plot(xlabel, ylabel, data=data), which is not distinguishable. Where as plot(y, fmt) vs. plot(x, y) is implicitly distinguished by the data type of the second argument.

To me that's a symptom of the limited sanity of the API. When a call works with a label 'bob' but not with a label 'y', there's too much magic going on in the background.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is only ambiguous when the keys in data are also valid 'fmt' strings (and in that case we break the ambiguity in favor of pulling from data).

It's not magic, it's attempts to 'do what I mean' which is worse ;)

@tacaswell tacaswell added this to the v2.2 milestone Jan 7, 2018
@tacaswell
Copy link
Member

I am not a fan of this re-origination.

I think there is value in having that block of call signatures at the top.

I also do not like the "Basic" vs "MATLAB" vs "Labeled data" distinctions. Between the first two, the only diffrenece is using explicit keywords or a shorthand string notation for specifying styles (which can be mixed, ax.plot(x, y, 'r-', linewidth=5) works as expected). The third is a layer of de-referencing, but does not fundamentally change anything about how the API works.

Sorry for being negative. You have been doing great work recently!

@timhoffm
Copy link
Member Author

timhoffm commented Jan 7, 2018

Thanks for your feedback. I'm happy to work out how to improve the docs together with you. I will give my motivations and then we can see how to processed.

Sorry if I'm having some harsh words on the API and the docs here and in the comments above. It's been given me really a hard time and that may shine through. Neverteless, I'm committed to make it as accessible for the user as possible. I'm not claiming that my proposal the definitive best version.

  1. Need for better docs: I've been struggling quite a bit with the method in the beginning and I've also seen it with colleagues.

  2. The signature is quite complex and there are many different ways of achieving the same thing (providing the data, specifying the formatting). Some of them can be used alongside each other, some not. The semantics of positional parameters depends on the types of other parameters. IMHO this is not the best API, but it's grown over time and it's like that now. We have to make as much sense of this as possible in the docs.

  3. The current version was a large unstructured block of text with various bits of information scattered throughout.

I consider the following changes essential:

  • add the Parameters, Returns, Additional Parameters sections to give obvious and well known reference points in the docs.
  • Get the extensive description of the format string out of the way (either to Notes or another page). This distracts too much in the bulk of the description.
  • Provide a complete but at the same time instructive set of signatures. The current signatures are insufficient. They are incomplete and more like examples.
  • Provide typical example calls for these signatures. For this complicated API with the many possibilities, it's important that the user get an idea of common use cases.

What might not be essential:

  • The "Basic" vs "MATLAB" vs "Labeled data" is an attempt to order the the ways of providing data and formatting. There might be other ways to organize this information.
  • I'm fine with collecting all signatures at the top (If they really serve as that and not only as an example). Splitting the signatures was a consequence of the former bullet point.

@tacaswell
Copy link
Member

👍 to adding the numpydoc headers. The fmt string docs should stay in the docstring. The short hand is super handy (ex 'b-o' vs color='b', linestyle='-', marker='o')

The semantics of positional parameters depends on the types of other parameters. IMHO this is not the best API, but it's grown over time and it's like that now.

This is the MATLAB inspired API. No matter how painful it is, it is something we can not break, there is way too much code that depends on it.

My understanding of positional arg parsing is that it pulls off arguments of *args, if there is only 1 it treats it as 'y'. If the object has .index that is used as the x, else range(len(y)) is used as the x, if there are two arrays then the first one is used as 'x' and the second one is used for 'y'. If the second arg is a string then it is treated the format string and the logic starts again. If the second argument was an array if the third arg is a string then it is the format string, if it is an array it starts the next cycle. If the third arg was a string the next arg starts the cycle.

I suspect a flow-chart will help...

The data replacement logic happens before any of this logic and only works with up to 3 args and has the 2 arg collision warning. When we added this feature the consensus was that out side of examples the actually collisions will be rare.

@timhoffm
Copy link
Member Author

timhoffm commented Jan 8, 2018

Ok, many of the issues do only arise if there is more than one dataset. What about first describing the one-dataset case and later the changes for multiple data?

The signature could be:

plot([x], y, [fmt], data=None, **kwargs)
plot([x], y, [fmt], [x2], y2, [fmt2], ..., **kwargs)

or a little less precise:

plot([x], y, [fmt], ..., data=None, **kwargs)

That's not an exact signature and needs some explanation in the text, but it's probably instructive enough, so that the users recognize the basic structure and can use it for simple examples without having to understand all the edge cases.
If this is going in the right direction for you, I'd give it a try rewriting the introductory part of the docstring.

I'd rather not put a flowchart. It might be the correct way to exactly describe the signature. However, it's quite intimidating having to read and understand an algorithm just to know which parameters I have to pass in which order to get a simple plot. Though not part of a strictly valid python signature, the [optional] brackets and ellipsis convey the mechanism and semantics more clearly.

@timhoffm
Copy link
Member Author

timhoffm commented Jan 8, 2018

Here we go again. I've used my above suggestion for signature and structuring. Additionally, I've tried to maintain a sequence of 1-2 sentences explanation followed by an example. The examples use the >>> notation to clearly distinguish them from the signature. IMHO it's quite an improvement.

Axes.plot()

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Please take or leave my comments below.

linewidth=2, markersize=12)

When conflicting with *fmt*, these properties are taking precedence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"...the kwarg properties take precedence"

wonderful ways, including full names (``'green'``), hex
strings (``'#008000'``), RGB or RGBA tuples (``(0,1,0,1)``) or
grayscale intensities as a string (``'0.8'``). Of these, the
string specifications can be used in place of a ``fmt`` group,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would just say "In addition, colors can be specified using a Matplotlib matplotlib.colors specification." If you want to add a few examples as well, thats fine, but a link to the colors module would help.

optional format string. For example, each of the following is
legal::
Plot lines and/or a series of markers. This is also known as line plot
or XY (scatter) plot respectively.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you need the use of "scatter" in here either. "Plot y versus x as lines and/or markers" is how I'd phrase this.

@tacaswell
Copy link
Member

@timhoffm can you please email me?

See Also
--------
scatter : XY scatter plot with markers of variing size and/or color (
sometimes also called bubble chart).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh, I hate the term bubble chart :(

@tacaswell tacaswell merged commit 2abeda1 into matplotlib:master Jan 9, 2018
@timhoffm timhoffm deleted the axes-doc-plot branch January 9, 2018 18:07
@QuLogic QuLogic modified the milestones: needs sorting, v2.2.0 Feb 12, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants