Chapter 3, Part A Descriptive Statistics: Numerical Measures
Descriptive Statistics: Numerical Measures
- Measures of Location
- Measures of Variability

Numerical Measures
If the measures are computed for data from a sample, they are called sample statistics.
If the measures are computed for data from a population, they are called population parameters.
A sample statistic is referred to as the point estimator of the corresponding population parameter.

Measures of Location
- Mean
- Median
- Mode
- Weighted Mean
- Geometric Mean
- Percentiles
- Quartiles

Mean
Perhaps the most important measure of location is the mean.
The mean provides a measure of central location.
The mean of a data set is the average of all the data values.
The sample mean $\bar{x}$ is the point estimator of the population mean $\mu$.

Sample Mean $\bar{x}$
$$\bar{x} = \frac{\sum x_i}{n}$$
where:
$\sum x_i$ = sum of the values of the n observations
n = number of observations in the sample

Example: Monthly Starting Salary
$$\bar{x} = \frac{\sum x_i}{n} = \frac{47{,}280}{12} = 3{,}940$$

Median
The median of a data set is the value in the middle when the data items are arranged in ascending order.
Whenever a data set has extreme values, the median is the preferred measure of central location.
The median is the measure of location most often reported for annual income and property value data.
A few extremely large incomes or property values can inflate the mean.

For an odd number of observations (7 observations, in ascending order):
Median is the middle value; Median = 19

For an even number of observations (8 observations, in ascending order):
Median is the average of the middle two values.
Median = (19 + 26)/2 = 22.5

Example: Monthly Starting Salary
Averaging the 6th and 7th data values:
Median = (3,890 + 3,920)/2 = 3,905
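The mean and median calculations for the Monthly Starting Salary example can be sketched in Python as follows. The 12 individual salaries are not listed in this section, so the `salaries` list is an assumed data set, chosen only to agree with the quoted figures (sum of 47,280; 6th and 7th ordered values of 3,890 and 3,920). The function names are illustrative.

```python
# Assumed data: the text quotes only the sum and a few ordered values,
# so this list is an illustrative set consistent with those figures.
salaries = [3850, 3950, 4050, 3880, 3755, 3710,
            3890, 4130, 3940, 4325, 3920, 3880]


def sample_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd n: single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2   # even n: average the middle pair


mean_salary = sample_mean(salaries)    # 47,280 / 12
median_salary = median(salaries)       # average of the 6th and 7th ordered values
print(mean_salary, median_salary)
```

With n = 12 the median branch for an even number of observations is the one exercised, which mirrors the averaging of the 6th and 7th values in the example.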
Percentiles
A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
Admission test scores for colleges and universities are frequently reported in terms of percentiles.
The pth percentile of a data set is a value such that at least p percent of the items take on this value or less and at least (100 - p) percent of the items take on this value or more.

To compute the pth percentile:
- Arrange the data in ascending order.
- Compute $L_p$, the location of the pth percentile:
$$L_p = \frac{p}{100}(n + 1)$$

80th Percentile
Example: Monthly Starting Salary
$$L_p = \frac{p}{100}(n + 1) = \frac{80}{100}(12 + 1) = 10.4$$
(the 10th value plus 0.4 times the difference between the 11th and 10th values)
80th Percentile = 4,050 + 0.4(4,130 - 4,050) = 4,082

Weighted Mean
In some instances the mean is computed by giving each observation a weight that reflects its relative importance.
The choice of weights depends on the application.
The weights might be the number of credit hours earned for each grade, as in GPA.
In other weighted mean computations, quantities such as pounds, dollars, or volume are frequently used.
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i}$$
where:
$x_i$ = value of observation i
$w_i$ = weight for observation i
Numerator: sum of the weighted data values
Denominator: sum of the weights
If the data are from a population, $\mu$ replaces $\bar{x}$.
$$\bar{x} = \frac{\sum w_i x_i}{\sum w_i} = \frac{18{,}500}{6{,}250} = 2.96$$
Weighted mean = $2.96
FYI, equally weighted (simple) mean = $3.07
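The weighted mean formula can be illustrated with a short Python sketch. The individual observations behind the totals 18,500 and 6,250 are not given in this section, so the `costs` and `quantities` lists below are assumed values, chosen only so that the weighted sum, the sum of weights, and the simple mean agree with the figures quoted above.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Assumed data: five unit costs (x_i) and the quantity bought at each cost (w_i).
costs = [3.00, 3.40, 2.80, 2.90, 3.25]
quantities = [1200, 500, 2750, 1000, 800]

weighted = weighted_mean(costs, quantities)   # about 18,500 / 6,250
simple = sum(costs) / len(costs)              # equally weighted mean
print(round(weighted, 2), round(simple, 2))
```

The weighted result is lower than the simple mean here because the largest weight is attached to the lowest value.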
Quartiles
Quartiles are specific percentiles.
First Quartile = 25th Percentile
Second Quartile = 50th Percentile = Median
Third Quartile = 75th Percentile

Third Quartile (75th Percentile)
Example: Monthly Starting Salary
$$L_p = \frac{p}{100}(n + 1) = \frac{75}{100}(12 + 1) = 9.75$$
(the 9th value plus 0.75 times the difference between the 10th and 9th values)
Third quartile = 3,950 + 0.75(4,050 - 3,950) = 4,025

Interquartile Range
The interquartile range of a data set is the difference between the third quartile and the first quartile.
It is the range for the middle 50% of the data.
It overcomes the range's sensitivity to extreme data values.
Example: Monthly Starting Salary
3rd Quartile (Q3) = 4,000
1st Quartile (Q1) = 3,865
IQR = Q3 - Q1 = 4,000 - 3,865 = 135
Note: Q3 = 4,000 here is the midpoint of the 9th and 10th values, (3,950 + 4,050)/2, not the interpolated 4,025 obtained from the location $L_p$ = 9.75; the IQR depends on which quartile convention is used.
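The location rule $L_p = (p/100)(n + 1)$ used in the percentile and quartile examples can be sketched as a small Python function. This is a minimal illustration, not a reference implementation: the `salaries` list is an assumed data set (the section does not list the 12 values) whose 9th to 11th ordered values match the 3,950, 4,050, and 4,130 quoted in the text, and clamping locations that fall outside the data is a choice made here, not something the text specifies.

```python
def percentile(values, p):
    """pth percentile via the location L_p = (p/100)(n + 1), with interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    location = (p / 100) * (n + 1)     # 1-based position, possibly fractional
    lower = int(location)              # position of the value just below L_p
    fraction = location - lower        # how far to move toward the next value
    if lower < 1:                      # location before the first value: clamp
        return ordered[0]
    if lower >= n:                     # location at or past the last value: clamp
        return ordered[-1]
    low_value = ordered[lower - 1]
    high_value = ordered[lower]
    return low_value + fraction * (high_value - low_value)


# Assumed data, consistent with the ordered values quoted in the examples.
salaries = [3850, 3950, 4050, 3880, 3755, 3710,
            3890, 4130, 3940, 4325, 3920, 3880]

p80 = percentile(salaries, 80)   # location 10.4
q1 = percentile(salaries, 25)    # location 3.25
q3 = percentile(salaries, 75)    # location 9.75
iqr = q3 - q1
print(round(p80, 2), round(q1, 2), round(q3, 2), round(iqr, 2))
```

Because this sketch applies the same location rule to both quartiles, its Q1 and IQR for the assumed data will not equal the 3,865 and 135 quoted in the Interquartile Range example, which appear to come from a different quartile convention.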
Coefficient of Variation
The coefficient of variation indicates how large the standard deviation is in relation to the mean.
The coefficient of variation is computed as follows:
$$\left[\frac{s}{\bar{x}} \times 100\right]\% \quad \text{for a sample}$$
$$\left[\frac{\sigma}{\mu} \times 100\right]\% \quad \text{for a population}$$

Variance
$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = 27{,}440.91$$
Standard Deviation
$$s = \sqrt{s^2} = \sqrt{27{,}440.91} = 165.65$$
Coefficient of Variation
$$\left[\frac{s}{\bar{x}} \times 100\right]\% = \left[\frac{165.65}{3{,}940} \times 100\right]\% = 4.2\%$$

Skewness
2. Moderately Skewed Right
- Skewness is positive.
- Mean will usually be more than the median.
3. Highly Skewed Right
- Skewness is positive (often above 1.0).
- Mean will usually be more than the median.
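The sample variance, standard deviation, and coefficient of variation shown above can be reproduced with a short Python sketch. The `salaries` list is an assumed data set, since the individual values are not listed in this section; it was chosen so that its mean is 3,940 and its sample variance rounds to 27,440.91.

```python
import math

# Assumed data: illustrative values consistent with the quoted mean and variance.
salaries = [3850, 3950, 4050, 3880, 3755, 3710,
            3890, 4130, 3940, 4325, 3920, 3880]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


mean_salary = sum(salaries) / len(salaries)
s2 = sample_variance(salaries)          # sample variance
s = math.sqrt(s2)                       # sample standard deviation
cv = s / mean_salary * 100              # coefficient of variation, in percent
print(round(s2, 2), round(s, 2), round(cv, 1))
```

Dividing by n - 1 rather than n is what makes this the sample variance; a population variance would divide by the population size.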
Chapter 3, Part B
Descriptive Statistics: Numerical Measures
Chebyshev’s Theorem
At least $(1 - 1/z^2)$ of the items in any data set will be within z standard deviations of the mean, where z is any value greater than 1.
Chebyshev’s theorem requires z > 1, but z need not be an integer.
- At least 75% of the data values must be within z = 2 standard deviations of the mean.
- At least 89% of the data values must be within z = 3 standard deviations of the mean.
- At least 94% of the data values must be within z = 4 standard deviations of the mean.

Example: Marks of students
Suppose the marks of 100 students in a course had a mean of 70 and a standard deviation of 5. We want to know at least how many students had test scores between 60 and 80.
60 and 80 are 2 standard deviations below and above the mean, respectively:
- 60 = 70 - 2(5)
- 80 = 70 + 2(5)
- With z = 2, at least $1 - 1/2^2 = 0.75$, or 75%, of the students (at least 75 of the 100) scored between 60 and 80.

Detecting Outliers
An outlier is an unusually small or unusually large value in a data set.
A data value with a z-score less than -3 or greater than +3 might be considered an outlier.
It might be:
- an incorrectly recorded data value
- a data value that was incorrectly included in the data set
- a correctly recorded unusual data value that belongs in the data set
Example: Class Size data
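The Chebyshev bound for the marks example and the z-score rule for flagging possible outliers can be sketched in Python as follows. Two assumptions are made here: the z-score is taken as (x - mean) / s with s the sample standard deviation, which is the usual definition but is not spelled out in this section, and the `scores` list is hypothetical data invented for the illustration (it is not the Class Size data).

```python
import math


def chebyshev_bound(z):
    """Minimum fraction of data within z standard deviations of the mean."""
    if z <= 1:
        raise ValueError("Chebyshev's theorem requires z > 1")
    return 1 - 1 / z ** 2


def z_scores(values):
    """z-score of each value, using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return [(x - mean) / s for x in values]


def possible_outliers(values, threshold=3.0):
    """Values whose z-score is below -threshold or above +threshold."""
    return [x for x, z in zip(values, z_scores(values)) if abs(z) > threshold]


# Marks example: mean 70, standard deviation 5, 100 students, interval 60 to 80.
z = (80 - 70) / 5                                    # 2 standard deviations
min_students = math.ceil(chebyshev_bound(z) * 100)   # lower bound on the count

# Hypothetical data: twenty typical scores and one unusually large value.
scores = [48, 49, 50, 51, 52] * 4 + [100]
print(min_students, possible_outliers(scores))
```

A flagged value is only a candidate outlier; as the list above notes, it may still be a correctly recorded value that belongs in the data set.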
Sample Covariance
$$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1} = \frac{99}{9} = 11$$
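The sample covariance formula can be checked with a few lines of Python. The paired observations behind the sum of 99 are not shown in this section, so the `x` and `y` lists below are assumed data with n = 10 pairs, chosen only so that the sum of cross-deviations is 99 and the divisor is n - 1 = 9.

```python
def sample_covariance(xs, ys):
    """Sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Assumed data: ten illustrative (x, y) pairs consistent with 99 / 9.
x = [2, 5, 1, 3, 4, 1, 5, 3, 4, 2]
y = [50, 57, 41, 54, 54, 38, 63, 48, 59, 46]

s_xy = sample_covariance(x, y)
print(s_xy)
```

A positive result indicates that larger x values tend to occur with larger y values in these pairs.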
Data Dashboards:
Adding Numerical Measures to Improve Effectiveness