matplotlib · NelleV · Oct 19, 2016 · Oct 16, 2016 · story645 · Oct 16, 2016
diff --git a/doc/devel/MEP/MEP28.rst b/doc/devel/MEP/MEP28.rst
@@ -0,0 +1,326 @@
+=============================================
+ MEP 28: Remove Complexity from Axes.boxplot
+=============================================
+
+.. contents::
+   :local:
+
+Status
+======
+
+..
+.. MEPs go through a number of phases in their lifetime:
+
+ - **Discussion**
+..
+.. - **Progress**: Consensus was reached on the mailing list and
+..  implementation work has begun.
+..
+.. - **Completed**: The implementation has been merged into master.
+..
+.. - **Superseded**: This MEP has been abandoned in favor of another
+..   approach.
+
+Branches and Pull requests
+==========================
+
+Adding pre- & post-processing options to ``cbook.boxplot_stats``: https://github.com/phobson/matplotlib/tree/boxplot-stat-transforms
+Exposing ``cbook.boxplot_stats`` through ``Axes.boxplot`` kwargs: None
+Remove redundant statistical kwargs in ``Axes.boxplot``: None
+Remove redundant style options in ``Axes.boxplot``: None
+Remaining items that arise through discussion: None
+
+Abstract
+========
+
+Over the past few releases, the ``Axes.boxplot`` method has grown in
+complexity to support fully customizable artist styling and statistical
+computation. This lead to ``Axes.boxplot`` being split off into multiple
+parts. The statistics needed to draw a boxplot are computed in
+``cbook.boxplot_stats``, while the actual artists are drawn by ``Axes.bxp``.
+The original method, ``Axes.boxplot`` remains as the most public API that
+handles passing the user-supplied data to ``cbook.boxplot_stats``, feeding
+the results to ``Axes.bxp``, and pre-processing style information for
+each facet of the boxplot plots.
+
+This MEP will outline a path forward to rollback the added complexity
+and simplify the API while maintaining reasonable backwards
+compatibility.
+
+Detailed description
+====================
+
+Currently, the ``Axes.boxplot`` method accepts parameters that allow the
+users to specify medians and confidence intervals for each box that
+will be drawn in the plot. These were provided so that avdanced users
+could provide statistics computed in a different fashion that the simple
+method provided by matplotlib. However, handling this input requires
+complex logic to make sure that the forms of the data structure match what
+needs to be drawn. At the moment, that logic contains 9 separate if/else
+statements nested up to 5 levels deep with a for loop, and may raise up to 2 errors.
+These parameters were added prior to the creation of the ``Axes.bxp`` method,
+which draws boxplots from a list of dictionaries containing the relevant
+statistics. Matplotlib also provides a function that computes these
+statistics via ``cbook.boxplot_stats``. Note that advanced users can now
+either a) write their own function to compute the stats required by
+``Axes.bxp``, or b) modify the output returned by ``cbook.boxplots_stats``
+to fully customize the position of the artists of the plots. With this
+flexibility, the parameters to manually specify only the medians and their
+confidences intervals remain for backwards compatibility.
+
+Around the same time that the two roles of ``Axes.boxplot`` were split into
+``cbook.boxplot_stats`` for computation and ``Axes.bxp`` for drawing, both
+``Axes.boxplot`` and ``Axes.bxp`` were written to accept parameters that
+individually toggle the drawing of all components of the boxplots, and
+parameters that individually configure the style of those artists. However,
+to maintain backwards compatibility, the ``sym`` parameter (previously used
+to specify the symbol of the fliers) was retained. This parameter itself
+requires fairly complex logic to reconcile the ``sym`` parameters with the
+newer ``flierprops`` parameter at the default style specified by ``matplotlibrc``.
+
+This MEP seeks to dramatically simplify the creation of boxplots for
+novice and advanced users alike. Importantly, the changes proposed here
+will also be available to downstream packages like seaborn, as seaborn
+smartly allows users to pass arbitrary dictionaries of parameters through
+the seaborn API to the underlying matplotlib functions.
+
+This will be achieved in the following way:
+
+  1. ``cbook.boxplot_stats`` will be modified to allow pre- and post-
+     computation transformation functions to be passed in (e.g., ``np.log``
+     and ``np.exp`` for lognormally distributed data)
+  2. ``Axes.boxplot`` will be modified to also accept and naïvely pass them
+     to ``cbook.boxplots_stats`` (Alt: pass the stat function and a dict
+     of its optional parameters).
+  3. Outdated parameters from ``Axes.boxplot`` will be deprecated and
+     later removed.
+
+Implementation
+==============
+
+Passing transform functions to ``cbook.boxplots_stats``
+-------------------------------------------------------
+
+This MEP proposes that two parameters (e.g., ``transform_in`` and
+``transform_out`` be added to the cookbook function that computes the
+statistics for the boxplot function. These will be optional keyword-only
+arguments and can easily be set to ``lambda x: x`` as a no-op when omitted
+by the user. The ``transform_in`` function will be applied to the data
+as the ``boxplot_stats`` function loops through each subset of the data
+passed to it. After the list of statistics dictionaries are computed the
+``transform_out`` function is applied to each value in the dictionaries.
+
+These transformations can then be added to the call signature of
+``Axes.boxplot`` with little impact to that method's complexity. This is
+because they can be directly passed to ``cbook.boxplot_stats``.
+Alternatively, ``Axes.boxplot`` could be modified to accept an optional
+statistical function kwarg and a dictionary of parameters to be direcly
+passed to it.
+
+At this point in the implementation users and external libraries like
+seaborn would have complete control via the ``Axes.boxplot`` method. More
+importantly, at the very least, seaborn would require no changes to its
+API to allow users to take advantage of these new options.
+
+Simplifications to the ``Axes.boxplot`` API and other functions
+---------------------------------------------------------------
+
+Simplifying the boxplot method consists primarily of deprecating and then
+removing the redundant parameters. Optionally, a next step would include
+rectifying minor terminological inconsistencies between ``Axes.boxplot``
+and ``Axes.bxp``.
+
+The parameters to be deprecated and removed include:
+
+  1. ``usermedians`` - processed by 10 SLOC, 3 ``if`` blocks, a ``for`` loop
+  2. ``conf_intervals`` - handled by 15 SLOC, 6 ``if`` blocks, a ``for`` loop
+  3. ``sym`` - processed by 12 SLOC, 4 ``if`` blocks
+
+Removing the ``sym`` option allows all code in handling the remaining
+styling parameters to be moved to ``Axes.bxp``. This doesn't remove
+any complexity, but does reinforce the single responsibility principle
+among ``Axes.bxp``, ``cbook.boxplot_stats``, and ``Axes.boxplot``.
+
+Additionally, the ``notch`` parameter could be renamed ``shownotches``
+to be consistent with ``Axes.bxp``. This kind of cleanup could be taken
+a step further and the ``whis``, ``bootstrap``, ``autorange`` could
+be rolled into the kwargs passed to the new ``statfxn`` parameter.
+
+Backward compatibility
+======================
+
+Implementation of this MEP would eventually result in the backwards
+incompatible deprecation and then removal of the keyword parameters
+``usermedians``, ``conf_intervals``, and ``sym``. Cursory searches on
+GitHub indicated that ``usermedians``, ``conf_intervals`` are used by
+few users, who all seem to have a very strong knowledge of matplotlib.
+A robust deprecation cycle should provide sufficient time for these
+users to migrate to a new API.
+
+Deprecation of ``sym`` however, may have a much broader reach into
+the matplotlib userbase.
+
+Schedule
+--------
+An accelerated timeline could look like the following:
+
+#. v2.0.1 add transforms to ``cbook.boxplots_stats``, expose in ``Axes.boxplot``
+#. v2.1.0 deprecate ``usermedians``, ``conf_intervals``, ``sym`` parameters
+#. v2.2.0 make deprecations noisier
+#. v2.3.0 remove ``usermedians``, ``conf_intervals``, ``sym`` parameters
+#. v2.3.0 deprecate ``notch`` in favor of ``shownotches`` to be consistent with other parameters and ``Axes.bxp``
+#. v2.4.0 remove ``notch`` parameter, move all style and artist toggling logic to ``Axes.bxp``. ``Axes.boxplot`` is little more than a broker between ``Axes.bxp`` and ``cbook.boxplots_stats``
+
+
+Anticipated Impacts to Users
+----------------------------
+
+As described above deprecating ``usermedians`` and ``conf_intervals``
+will likely impact few users. Those who will be impacted are almost
+certainly advanced users who will be able to adapt to the change.
+
+Deprecating the ``sym`` option may import more users and effort should
+be taken to collect community feedback on this.
+
+Anticipated Impacts to Downstream Libraries
+-------------------------------------------
+
+The source code (GitHub master as of 2016-10-17) was inspected for
+seaborn and python-ggplot to see if these changes would impact their
+use. None of the parameters nominated for removal in this MEP are used by
+seaborn. The seaborn APIs that use matplotlib's boxplot function allow
+user's to pass arbitrary ``**kwargs`` through to matplotlib's API. Thus
+seaborn users with modern matplotlib installations will be able to take
+full advantage of any new features added as a result of this MEP.
+
+Python-ggplot has implemented its own function to draw boxplots. Therefore,
+no impact can come to it as a result of implementing this MEP.
+
+Alternatives
+============
+
+Variations on the theme
+-----------------------
+
+This MEP can be divided into a few loosely coupled components:
+
+#. Allowing pre- and post-computation tranformation function in ``cbook.boxplot_stats``
+#. Exposing that transformation in the ``Axes.boxplot`` API
+#. Removing redundant statistical options in ``Axes.boxplot``
+#. Shifting all styling parameter processing from ``Axes.boxplot`` to ``Axes.bxp``.
+
+
+With this approach, #2 depends and #1, and #4 depends on #3.
+
+There are two possible approaches to #2. The first and most direct would
+be to mirror the new ``transform_in`` and ``tranform_out`` parameters of
+``cbook.boxplot_stats`` in ``Axes.boxplot`` and pass them directly.
+
+The second approach would be to add ``statfxn`` and ``statfxn_args``
+parameters to ``Axes.boxplot``. Under this implementation, the default
+value of ``statfxn`` would be ``cbook.boxplot_stats``, but users could
+pass their own function. Then ``transform_in`` and ``tranform_out`` would
+then be passed as elements of the ``statfxn_args`` parameter.
+
+.. python:
+   def boxplot_stats(data, ..., transform_in=None, transform_out=None):
+       if transform_in is None:
+           transform_in = lambda x: x
+
+       if transform_out is None:
+           transform_out = lambda x: x
+
+       output = []
+       for _d in data:
+           d = transform_in(_d)
+           stat_dict = do_stats(d)
+           for key, value in stat_dict.item():
+               if key != 'label':
+                   stat_dict[key] = transform_out(value)
+           output.append(d)
+       return output
+
+
+    class Axes(...):
+        def boxplot_option1(data, ..., transform_in=None, transform_out=None):
+            stats = cbook.boxplot_stats(data, ...,
+                                        transform_in=transform_in,
+                                        transform_out=transform_out)
+            return self.bxp(stats, ...)
+
+        def boxplot_option2(data, ..., statfxn=None, **statopts):
+            if statfxn is None:
+                statfxn = boxplot_stats
+            stats = statfxn(data, **statopts)
+            return self.bxp(stats, ...)
+
+Both cases would allow users to do the following:
+
+.. python:
+   fig, ax1 = plt.subplots()
+   artists1 = ax1.boxplot_optionX(data, transform_in=np.log,
+                                  transform_out=np.exp)
+
+
+But Option Two lets a user write a completely custom stat function
+(e.g., ``my_box_stats``) with fancy BCA confidence intervals and the
+whiskers set differently depending on some attribute of the data.
+
+This is available under the current API:
+
+.. python:
+   fig, ax1 = plt.subplots()
+   my_stats = my_box_stats(data, bootstrap_method='BCA',
+                           whisker_method='dynamic')
+   ax1.bxp(my_stats)
+
+And would be more concise with Option Two
+
+.. python:
+   fig, ax = plt.subplots()
+   statopts = dict(transform_in=np.log, transform_out=np.exp)
+   ax.boxplot(data, ..., **statopts)
+
+Users could also pass their own function to compute the stats:
+
+.. python:
+   fig, ax1 = plt.subplots()
+   ax1.boxplot(data, statfxn=my_box_stats, bootstrap_method='BCA',
+               whisker_method='dynamic')
+
+From the examples above, Option Two seems to have only marginal benifit,
+but in the context of downstream libraries like seaborn, its advantage
+is more apparent as the following would be possible without any patches
+to seaborn:
+
+.. python:
+   import seaborn
+   tips = seaborn.load_data('tips')
+   g = seaborn.factorplot(x="day", y="total_bill", hue="sex", data=tips,
+                          kind='box', palette="PRGn", shownotches=True,
+                          statfxn=my_box_stats, bootstrap_method='BCA',
+                          whisker_method='dynamic')
+
+This type of flexibility was the intention behind splitting the overall
+boxplot API in the current three functions. In practice however, downstream
+libraries like seaborn support versions of matplotlib dating back well
+before the split. Thus, adding just a bit more flexibility to the
+``Axes.boxplot`` could expose all the functionality to users of the
+downstream libraries with modern matplotlib installation without intervention
+from the downstream library maintainers.
+
+Doing less
+----------
+
+Another obvious alternative would be to omit the added pre- and post-
+computation transform functionality in ``cbook.boxplot_stats`` and
+``Axes.boxplot``, and simply remove the redundant statistical and style
+parameters as described above.
+
+Doing nothing
+-------------
+
+As with many things in life, doing nothing is an option here. This means
+we simply advocate for users and downstream libraries to take advantage
+of the split between ``cbook.boxplot_stats`` and ``Axes.bxp`` and let
+them decide how to provide an interface to that.
diff --git a/doc/devel/MEP/index.rst b/doc/devel/MEP/index.rst
@@ -29,3 +29,4 @@ Matplotlib Enhancement Proposals
    MEP25
    MEP26
    MEP27
+   MEP28
-Original file line number
+Diff line change
@@ Expand Up / @@ -29,3 +29,4 @@ Matplotlib Enhancement Proposals @@
        MEP25
        MEP26
        MEP27
+       MEP28