..
.. MEPs go through a number of phases in their lifetime:

- - **Discussion**
+ **Discussion**
..
.. - **Progress**: Consensus was reached on the mailing list and
.. implementation work has begun.
@@ -95,6 +95,44 @@ This will be achieved in the following way:
3. Outdated parameters from ``Axes.boxplot`` will be deprecated and
   later removed.

+ Importance
+ ----------
+
+ Since the limits of the whiskers are computed arithmetically, there
+ is an implicit assumption of normality in box and whisker plots.
+ This primarily affects which data points are classified as outliers.
+
+ Allowing transformations of the data and of the results used to draw
+ boxplots will let users opt out of that assumption when the
+ data are known not to fit a normal distribution.
+
+ Below is an example of how ``Axes.boxplot`` classifies outliers of lognormal
+ data differently depending on whether these transforms are applied.
+
+ .. plot::
+     :include-source: true
+
+     import numpy as np
+     import matplotlib.pyplot as plt
+     from matplotlib import cbook
+     np.random.seed(0)
+
+     fig, ax = plt.subplots(figsize=(4, 6))
+     ax.set_yscale('log')
+     data = np.random.lognormal(-1.75, 2.75, size=37)
+
+     stats = cbook.boxplot_stats(data, labels=['arithmetic'])
+     logstats = cbook.boxplot_stats(np.log(data), labels=['log-transformed'])
+
+     # Back-transform the log-space statistics to the original scale.
+     for lsdict in logstats:
+         for key, value in lsdict.items():
+             if key != 'label':
+                 lsdict[key] = np.exp(value)
+
+     stats.extend(logstats)
+     ax.bxp(stats)
+     fig.show()
+
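A minimal sketch of the point made in the Importance section, using only the Python standard library instead of matplotlib. The helper names ``whisker_fences`` and ``outliers`` and the sample values are hypothetical. The ``transform_in`` and ``transform_out`` parameters borrow the names proposed in this MEP, and the 1.5 × IQR rule is assumed for the whisker limits.

```python
import math
import statistics


def whisker_fences(data, whis=1.5, transform_in=None, transform_out=None):
    """Return (low, high) outlier fences computed in the transformed space."""
    if transform_in is None:
        transform_in = lambda x: x
    if transform_out is None:
        transform_out = lambda x: x
    values = [transform_in(x) for x in data]
    # Quartiles with linear interpolation between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Map the fences back to the original scale.
    return transform_out(q1 - whis * iqr), transform_out(q3 + whis * iqr)


def outliers(data, low, high):
    """Return the points that fall outside the fences."""
    return [x for x in data if x < low or x > high]


# Hypothetical sample spanning several orders of magnitude.
data = [0.01, 0.1, 1.0, 10.0, 100.0, 1000.0, 10000.0]
arithmetic = whisker_fences(data)
logarithmic = whisker_fences(data, transform_in=math.log, transform_out=math.exp)
print(outliers(data, *arithmetic))
print(outliers(data, *logarithmic))
```

With these sample values the arithmetic fences flag the largest point as an outlier and produce a negative lower fence, while the fences computed in log space flag nothing.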
Implementation
==============

@@ -209,7 +247,6 @@ This MEP can be divided into a few loosely coupled components:
#. Removing redundant statistical options in ``Axes.boxplot``
#. Shifting all styling parameter processing from ``Axes.boxplot`` to ``Axes.bxp``.

-
With this approach, #2 depends on #1, and #4 depends on #3.

There are two possible approaches to #2. The first and most direct would
@@ -222,7 +259,8 @@ value of ``statfxn`` would be ``cbook.boxplot_stats``, but users could
pass their own function. ``transform_in`` and ``transform_out`` would
then be passed as elements of the ``statfxn_args`` parameter.

- .. python:
+ .. code:: python
+
    def boxplot_stats(data, ..., transform_in=None, transform_out=None):
        if transform_in is None:
            transform_in = lambda x: x
@@ -256,7 +294,8 @@ then be passed as elements of the ``statfxn_args`` parameter.

Both cases would allow users to do the following:

- .. python:
+ .. code:: python
+
    fig, ax1 = plt.subplots()
    artists1 = ax1.boxplot_optionX(data, transform_in=np.log,
                                   transform_out=np.exp)
@@ -268,22 +307,25 @@ whiskers set differently depending on some attribute of the data.

This is available under the current API:

- .. python:
+ .. code:: python
+
    fig, ax1 = plt.subplots()
    my_stats = my_box_stats(data, bootstrap_method='BCA',
                            whisker_method='dynamic')
    ax1.bxp(my_stats)

And would be more concise with Option Two:

- .. python:
+ .. code:: python
+
    fig, ax = plt.subplots()
    statopts = dict(transform_in=np.log, transform_out=np.exp)
    ax.boxplot(data, ..., **statopts)

Users could also pass their own function to compute the stats:

- .. python:
+ .. code:: python
+
    fig, ax1 = plt.subplots()
    ax1.boxplot(data, statfxn=my_box_stats, bootstrap_method='BCA',
                whisker_method='dynamic')
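A rough sketch of how the ``statfxn`` dispatch described above might behave, written without matplotlib so it runs on its own. Every name here (``default_stats``, ``range_stats``, ``compute_box_stats``, ``pad``) is hypothetical and stands in for whatever the final API would provide. Only the ``statfxn`` and ``statfxn_args`` names come from this MEP.

```python
import statistics


def default_stats(data, whis=1.5):
    """Hypothetical default: whiskers reach the extreme points inside 1.5 x IQR."""
    q1, med, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - whis * iqr, q3 + whis * iqr
    return {
        "q1": q1,
        "med": med,
        "q3": q3,
        "whislo": min(x for x in data if x >= low_fence),
        "whishi": max(x for x in data if x <= high_fence),
    }


def range_stats(data, pad=0.0):
    """Hypothetical user function: whiskers span the padded data range."""
    stats = default_stats(data)
    stats["whislo"] = min(data) - pad
    stats["whishi"] = max(data) + pad
    return stats


def compute_box_stats(data, statfxn=None, **statfxn_args):
    """Call statfxn (or the default) and forward any extra keyword arguments."""
    if statfxn is None:
        statfxn = default_stats
    return statfxn(data, **statfxn_args)


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
default = compute_box_stats(data)
custom = compute_box_stats(data, statfxn=range_stats, pad=1.0)
print(default["whislo"], default["whishi"])
print(custom["whislo"], custom["whishi"])
```

The design point is that the plotting call never inspects the extra keyword arguments itself. It forwards them untouched, so a user-supplied function can define its own options.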
@@ -293,7 +335,8 @@ but in the context of downstream libraries like seaborn, its advantage
is more apparent as the following would be possible without any patches
to seaborn:

- .. python:
+ .. code:: python
+
    import seaborn
    tips = seaborn.load_dataset('tips')
    g = seaborn.factorplot(x="day", y="total_bill", hue="sex", data=tips,