Skip to content

Draft version of MEP28: Simplification of boxplots #7282

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 19, 2016

Conversation

phobson
Copy link
Member

@phobson phobson commented Oct 16, 2016

This a MEP for a series of small changes that will

  1. Remove much of the the complexity (implementation, and API) from cbook.boxplot_stats, Axes.boxplot, and Axes.bxp
  2. Provide complete control over how the statistics are computed to outside libraries like seaborn with no action required on their part
  3. Remove some old parameter in a backwards incompatible fashion for a more consistent API.

importantly, at the very least, seaborn would require no changes to its
API to allow users to take advantage of these new options.

Simiplifications to the ``Axes.boxplot`` API and other functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

small typo : Simiplifications -> simplifications

Copy link
Member

@NelleV NelleV left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is a couple of very small typo fixes.
I'll do a second review once I have enough caffeine flowing…

:local:


This MEP template is a guideline of the sections that a MEP should
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you are meant to remove this :)

======================

Implementation of this MEP would eventually result in the backwards
incompatible deprecation and then removeal of the keyword parameters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo : removeal -> removal

``usermedians``, ``conf_intervals``, and ``sym``. Curosry searches on
GitHub indicated that ``usermedians``, ``conf_intervals`` are used by
few users, who all seem to have a very strong knowledge of matplotlib.
A robust deprecation cycle should provide sufficient time for these
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering we are have discussion about one release cycle versus two release cycle on the finance module right now, it might be worth being specific here.

arguments and can easily be set to ``lambda x: x`` as a no-op when omitted
by the user. The ``transform_in`` function will be applied to the data
as the ``boxplot_stats`` function loops through each subset of the data
pass to it. After the list of statistics dictionaries are computed the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: passed not pass

This MEP seeks to dramatically simplify the creation of boxplots for
novice and advanced users alike. Importantly, the changes proposed here
will also be available to downstream packages like seaborn, as seaborn
smartly allows users to pass arbitrary dictionaries of parameters through
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, a little unclear about scope delineation here. What does seaborn do above and beyond mpl? And how is this not creeping into seaborn's territory.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thought is that these few changes should be implemented in such a way that seaborn users have access to them without seaborn needing to change at all. Shooting for simplicity and flexibility for users and downstream libraries

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, so you want to maintain downstream use, but you plan to remove **kwargs. Is there verification that seaborn doesn't use those args? Granted, this seems like it'd be a pretty small PR on seaborn if it does change things with them..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seaborn uses very few parameters and just passes everything else from the user as **kws. Also instances of call matplotlib's boxplot API are below:

  1. https://github.com/mwaskom/seaborn/blob/dd8ab4985eadba43c94542cfad56a78dcfefef31/seaborn/categorical.py#L468
  2. https://github.com/mwaskom/seaborn/blob/dd8ab4985eadba43c94542cfad56a78dcfefef31/seaborn/categorical.py#L497

Next step would be to checkout yhat's python-ggplot

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some ways that's probably good in that it means you don't have to worry about downstream support with ggplot.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're absolutely right about that. i guess it just makes me feel insecure about our implementation.

2. v2.1.0 deprecate ``usermedians``, ``conf_intervals``, ``sym`` parameters
3. v2.2.0 make deprecations noisier (???)
3. v2.3.0 remove ``usermedians``, ``conf_intervals``, ``sym`` parameters
4. v2.3.0 deprecate ``notch`` in favor of ``shownotches`` to be consistent with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

show_notches?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ugh, the inconsistency irks me. Like half the things, like conf_intervals put an _ between separate words and half, like shownotches don't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I accept full responsibility for that

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any interest in deprecating args to move towards consistency?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which did you have in mind? that was my intention with moving from notch to shownotches

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

moving stuff like shownotches to show_notches since that seems to be more pythonic.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be open minded about that, my initial inclination is to leave out the underscores.

@@ -179,3 +187,17 @@ This MEP can be divided into a few loosely coupled components:

With this approach, #2 depends and #1, and #4 depends on #3.

There are two possible approaches to #2. The first and most direct would
be to minor the new ``transform_in`` and ``tranform_out`` parameters of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not clear what "to minor" means here.

be to minor the new ``transform_in`` and ``tranform_out`` parameters of
``cbook.boxplot_stats`` in ``Axes.boxplot`` and pass them directly.

The second approach would be to add ``statfxn`` and ``statfxn_args``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any chance of psuedocode like examples of the two approaches?

@phobson
Copy link
Member Author

phobson commented Oct 17, 2016

@story645 I've added some examples of what the final boxplot API might look like.

The second approach would be to add ``statfxn`` and ``statfxn_args``
parameters to ``Axes.boxplot``. Under this implementation, the default
value of ``statfxn`` would be ``cbook.boxplot_stats``, but users could
pass their own function. Then `transform_in`` and ``tranform_out`` would
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are missing a backtick ` here.


Using matplotlib's stats function, this would look similar to this:

.. python:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be very annoying, ( 😄 ) could you add an example of how you would do this with the current API?
The difference would be more striking.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NelleV great suggestion. done.

@NelleV NelleV changed the title WIP: first cut at boxplot MEP WIP: first cut at boxplot MEP28 Oct 17, 2016
@NelleV
Copy link
Member

NelleV commented Oct 17, 2016

My 2 cents is that we should merge this as "draft" quite quickly, and have discussion on the mailing list on the details.

@phobson
Copy link
Member Author

phobson commented Oct 17, 2016

@NelleV I'm going to clean this up tonight to make sure that the docs build (my rst is pretty poor). After that I'll remove the WIP and squash everything so far.

Abstract
========

Over the past few releases, the ``Axes.boxplot`` method as grown in
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as -> has

========

Over the past few releases, the ``Axes.boxplot`` method as grown in
complexity to support fully customizable artist styline and statistics
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

styline -> styling?
statistics -> statistical

the results to ``Axes.bxp``, and pre-processing style information for
each facet of the boxplot plots.

This MEP will outline a path forward to roll-back the added complexity
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No dash.

users to specify medians and confidence intervals for each box that
will be drawn in the plot. These were provided so that avdanced users
could provide statistics computed in a different fashion that the simple
method provided by matplotlob. However, handling this input requires
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

matplotlob -> matplotlib

could provide statistics computed in a different fashion that the simple
method provided by matplotlob. However, handling this input requires
complex logic to make sure that the forms of the data structure match what
needs to be drawn. At the moment, that logic contain 9 separate if/else
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

contain -> contains

With this approach, #2 depends and #1, and #4 depends on #3.

There are two possible approaches to #2. The first and most direct would
be to minor the new ``transform_in`` and ``tranform_out`` parameters of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor -> mirror?

The second approach would be to add ``statfxn`` and ``statfxn_args``
parameters to ``Axes.boxplot``. Under this implementation, the default
value of ``statfxn`` would be ``cbook.boxplot_stats``, but users could
pass their own function. Then `transform_in`` and ``tranform_out`` would
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing a backtick.
tranform_out -> transform_out

parameters to ``Axes.boxplot``. Under this implementation, the default
value of ``statfxn`` would be ``cbook.boxplot_stats``, but users could
pass their own function. Then `transform_in`` and ``tranform_out`` would
then be passes as elements of the ``statfxn_args`` parameter.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passes -> passed

if key != 'label':
s[key] = np.exp(value)

ax.bmp(stats, ...)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bmp -> bxp?

Doing less
----------

Another obvious alternative would be to omitted the added pre- and post-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

omitted -> omit

@NelleV
Copy link
Member

NelleV commented Oct 18, 2016

@phobson the doc build is still not happening. If you have everything installed, the quickest by be to test on your computer with the parallelization option.

@phobson
Copy link
Member Author

phobson commented Oct 18, 2016

@NelleV Correct -- I need to include the API clarifications in my long comment above and then I'll get the doc builds working.

Sorry this is dragging out. Relearning transportation and structural engineering + the day job are filling my plate.

@phobson phobson changed the title WIP: first cut at boxplot MEP28 Draft version of MEP28: Simplification of boxplots Oct 19, 2016
@phobson
Copy link
Member Author

phobson commented Oct 19, 2016

@QuLogic @NelleV @story645 I don't understand they appveyor failure, but the HTML doc build is working. In that sense, I think this is a ready for a "final" review to as a first draft for public discussion.

@NelleV NelleV merged commit accd403 into matplotlib:master Oct 19, 2016
@NelleV
Copy link
Member

NelleV commented Oct 19, 2016

Let's move the discussion to somewhere more public.

@phobson phobson deleted the MEP-bxp branch October 19, 2016 15:43
@phobson
Copy link
Member Author

phobson commented Oct 19, 2016

@NelleV thanks!

@QuLogic
Copy link
Member

QuLogic commented Oct 19, 2016

I guess it only fitting that @phobson sends the email to the mailing list.

@phobson
Copy link
Member Author

phobson commented Oct 19, 2016

@QuLogic I'll do that after #7309

@QuLogic QuLogic added this to the 2.1 (next point release) milestone Oct 20, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants