@@ -80,7 +80,7 @@ or sample.
 :func:`median`           Median (middle value) of data.
 :func:`median_low`       Low median of data.
 :func:`median_high`      High median of data.
-:func:`median_grouped`   Median, or 50th percentile, of grouped data.
+:func:`median_grouped`   Median (50th percentile) of grouped data.
 :func:`mode`             Single mode (most common value) of discrete or nominal data.
 :func:`multimode`        List of modes (most common values) of discrete or nominal data.
 :func:`quantiles`        Divide data into intervals with equal probability.
@@ -381,55 +381,56 @@ However, for reading convenience, most of the examples show sorted sequences.
    be an actual data point rather than interpolated.
 
 
-.. function:: median_grouped(data, interval=1)
+.. function:: median_grouped(data, interval=1.0)
 
-   Return the median of grouped continuous data, calculated as the 50th
-   percentile, using interpolation. If *data* is empty, :exc:`StatisticsError`
-   is raised. *data* can be a sequence or iterable.
+   Estimates the median for numeric data that has been `grouped or binned
+   <https://en.wikipedia.org/wiki/Data_binning>`_ around the midpoints
+   of consecutive, fixed-width intervals.
 
-   .. doctest::
+   The *data* can be any iterable of numeric data with each value being
+   exactly the midpoint of a bin. At least one value must be present.
 
-      >>> median_grouped([52, 52, 53, 54])
-      52.5
+   The *interval* is the width of each bin.
 
-   In the following example, the data are rounded, so that each value represents
-   the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5--1.5, 2
-   is the midpoint of 1.5--2.5, 3 is the midpoint of 2.5--3.5, etc. With the data
-   given, the middle value falls somewhere in the class 3.5--4.5, and
-   interpolation is used to estimate it:
+   For example, demographic information may have been summarized into
+   consecutive ten-year age groups with each group being represented
+   by the 5-year midpoints of the intervals:
 
    .. doctest::
 
-      >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5])
-      3.7
-
-   Optional argument *interval* represents the class interval, and defaults
-   to 1. Changing the class interval naturally will change the interpolation:
+      >>> from collections import Counter
+      >>> demographics = Counter({
+      ...    25: 172,   # 20 to 30 years old
+      ...    35: 484,   # 30 to 40 years old
+      ...    45: 387,   # 40 to 50 years old
+      ...    55:  22,   # 50 to 60 years old
+      ...    65:   6,   # 60 to 70 years old
+      ... })
+      ...
+
+   The 50th percentile (median) is the 536th person out of the 1071
+   member cohort. That person is in the 30 to 40 year old age group.
+
+   The regular :func:`median` function would assume that everyone in the
+   tricenarian age group was exactly 35 years old. A more tenable
+   assumption is that the 484 members of that age group are evenly
+   distributed between 30 and 40. For that, we use
+   :func:`median_grouped`:
 
    .. doctest::
 
-      >>> median_grouped([1, 3, 3, 5, 7], interval=1)
-      3.25
-      >>> median_grouped([1, 3, 3, 5, 7], interval=2)
-      3.5
-
-   This function does not check whether the data points are at least
-   *interval* apart.
-
-   .. impl-detail::
-
-      Under some circumstances, :func:`median_grouped` may coerce data points to
-      floats. This behaviour is likely to change in the future.
-
-   .. seealso::
+      >>> data = list(demographics.elements())
+      >>> median(data)
+      35
+      >>> round(median_grouped(data, interval=10), 1)
+      37.5
 
-      * "Statistics for the Behavioral Sciences", Frederick J Gravetter and
-        Larry B Wallnau (8th Edition).
+   The caller is responsible for making sure the data points are separated
+   by exact multiples of *interval*. This is essential for getting a
+   correct result. The function does not check this precondition.
 
-      * The `SSMEDIAN
-        <https://help.gnome.org/users/gnumeric/stable/gnumeric.html#gnumeric-function-SSMEDIAN>`_
-        function in the Gnome Gnumeric spreadsheet, including `this discussion
-        <https://mail.gnome.org/archives/gnumeric-list/2011-April/msg00018.html>`_.
+   Inputs may be any numeric type that can be coerced to a float during
+   the interpolation step.
 
 
 .. function:: mode(data)
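
Reviewer note: the new prose describes interpolation within the median bin but does not spell out the arithmetic. The sketch below is one way to reproduce the documented results, assuming the textbook grouped-median formula ``L + interval * (n/2 - cf) / f``, where ``L`` is the lower edge of the median bin, ``cf`` is the count of values below that bin, and ``f`` is the count inside it. The helper name ``grouped_median_sketch`` is hypothetical and is not claimed to be the library's implementation; it only happens to agree with the values shown in the diff.

```python
from bisect import bisect_left, bisect_right
from collections import Counter


def grouped_median_sketch(data, interval=1.0):
    """Hypothetical helper: textbook grouped-median interpolation."""
    data = sorted(data)
    n = len(data)
    if n == 0:
        raise ValueError("no median for empty data")

    # Midpoint of the bin that contains the middle observation.
    x = data[n // 2]

    # Positions of the first and one-past-last occurrence of that midpoint.
    i = bisect_left(data, x)
    j = bisect_right(data, x, lo=i)

    lower = x - interval / 2   # L: lower edge of the median bin
    cf = i                     # cumulative count below the median bin
    f = j - i                  # count inside the median bin

    # Assume values are spread evenly across the median bin.
    return lower + interval * (n / 2 - cf) / f


# Same age-group data as in the diff above.
demographics = Counter({25: 172, 35: 484, 45: 387, 55: 22, 65: 6})
data = list(demographics.elements())
estimate = grouped_median_sketch(data, interval=10)
print(round(estimate, 1))  # 37.5
```

With 1071 people, 172 fall below the 30 to 40 bin and 484 fall inside it, so the estimate is 30 + 10 * (535.5 - 172) / 484, which rounds to 37.5. The removed examples are reproduced the same way (``[1, 3, 3, 5, 7]`` with ``interval=2`` gives 3.5).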