Part 2-Chapter 3 - Describing Data
Chapter 3
Describing Data Using Numerical Measures
Chapter Goals
Chapter Topics
• Measures of Center and Location
o Mean, median, mode, geometric mean, midrange
• Other measures of Location
o Weighted mean, percentiles, quartiles
• Measures of Variation
o Range, interquartile range, variance and standard deviation, coefficient of variation
Summary Measures
[Figure: Describing Data Numerically, an overview of the summary measures, including the mode, variance, and coefficient of variation]
Measures of Center and Location
Overview
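Center and Location
• Mean: sample $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$, population $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
• Weighted mean: sample $\bar{X}_w = \frac{\sum w_i x_i}{\sum w_i}$, population $\mu_w = \frac{\sum w_i x_i}{\sum w_i}$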
Mean (Arithmetic Average)
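• The mean is the arithmetic average of data values
o Sample mean (n = sample size):
$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
o Population mean (N = population size):
$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$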
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
• Example:
o 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
o 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
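The two means above can be checked with Python's standard `statistics` module. This is only a quick sketch for verifying the arithmetic; the variable names are illustrative.

```python
from statistics import mean

baseline = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # the 5 is replaced by an extreme value

print(mean(baseline))      # 3
print(mean(with_outlier))  # 4: one extreme value pulls the mean up
```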
Median
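• In an ordered array, the median is the middle value: half of the values lie below it and half above
• Not affected by extreme values
• Example: Median = 3 for both 1, 2, 3, 4, 5 and 1, 2, 3, 4, 10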
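Using the two data sets from the mean example, a short check with `statistics.median` shows that the extreme value leaves the median unchanged (a sketch; names are illustrative).

```python
from statistics import median

baseline = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same extreme value as in the mean example

# The median depends only on the middle of the ordered data,
# so changing the largest value leaves it unchanged.
print(median(baseline))      # 3
print(median(with_outlier))  # 3
```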
Mode
• The mode is the value that occurs most often; there may be no mode
• Example: Mode = 5 in the first data set; the second data set has no mode
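As a sketch of how the mode can be found, the hypothetical helper `modes` below counts occurrences with `collections.Counter`. It assumes the convention that a data set in which every value occurs exactly once has no mode, and it returns every tied value when there are several modes. The data sets are hypothetical, chosen only for illustration.

```python
from collections import Counter


def modes(data):
    """Return the most frequent value(s) in sorted order.

    Convention assumed here: if every value occurs exactly once,
    there is no mode and an empty list is returned.
    """
    if not data:
        raise ValueError("data must not be empty")
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []  # no value is more frequent than any other
    return sorted(value for value, count in counts.items() if count == top)


# Hypothetical data sets for illustration only.
print(modes([1, 3, 5, 5, 5, 7, 9]))  # [5]
print(modes([0, 1, 2, 3, 4, 5]))     # [] (no mode)
print(modes([2, 2, 4, 6, 6]))        # [2, 6] (two modes)
```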
Weighted Mean
• Used when values are grouped by frequency or relative importance
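A minimal sketch of the weighted mean $\bar{X}_w = \frac{\sum w_i x_i}{\sum w_i}$, assuming the weights are frequencies of grouped values. The function name `weighted_mean` and the grouped data are hypothetical; the second print shows that weighting by frequency gives the same answer as averaging the expanded, ungrouped data.

```python
from statistics import mean


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    if len(values) != len(weights) or not values:
        raise ValueError("values and weights must be non-empty and equal in length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Hypothetical grouped data: 2 occurs 5 times, 3 occurs 3 times, 4 occurs 2 times.
values = [2, 3, 4]
frequencies = [5, 3, 2]

print(weighted_mean(values, frequencies))  # 2.7
# Same result as the plain mean of the expanded (ungrouped) data:
print(mean([2] * 5 + [3] * 3 + [4] * 2))   # 2.7
```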
Review Example
• Five houses on a hill by the beach
• House Prices:
o $2,000,000
o $500,000
o $300,000
o $100,000
o $100,000
Summary Statistics
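The three measures of location for the five house prices can be computed, as a sketch, with Python's `statistics` module. The values follow directly from the listed prices.

```python
from statistics import mean, median, mode

# House prices from the review example (in dollars)
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print(mean(prices))    # 600000: pulled up by the $2,000,000 house
print(median(prices))  # 300000: middle value of the ranked prices
print(mode(prices))    # 100000: the only price that occurs twice
```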
Which measure of location is the “best”?
Shape of a Distribution
• Left-skewed: Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right-skewed: Mode < Median < Mean
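Percentiles and Quartiles
• The pth percentile in an ordered array of n values is the value in the ith position, where
$i = \frac{p}{100}(n + 1)$
• Example: for the 60th percentile of 19 ordered values, $i = \frac{60}{100}(19 + 1) = 12$, so use the value in the 12th position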
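As a sketch, the position rule $i = \frac{p}{100}(n + 1)$ can be written as a one-line function. The name `percentile_position` is hypothetical. The first call reproduces the position-12 example for 19 values (which corresponds to p = 60); the second gives the 2.5 position used for the first quartile of 9 values.

```python
def percentile_position(p, n):
    """Position i = (p / 100) * (n + 1) of the pth percentile in an ordered array of n values."""
    # Multiply before dividing so whole-number positions come out exact.
    return p * (n + 1) / 100


print(percentile_position(60, 19))  # 12.0 -> use the 12th value
print(percentile_position(25, 9))   # 2.5  -> halfway between the 2nd and 3rd values
```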
Quartiles
• Quartiles split the ranked data into 4 equal groups, each holding 25% of the values
• Q1, Q2, and Q3 are the boundaries between the groups; Q2 is the median
• Example: Find the first quartile (n = 9)
Q1 = 25th percentile, so find the $\frac{25}{100}(9 + 1) = 2.5$ position,
so use the value halfway between the 2nd and 3rd values,
so $Q_1 = 12.5$
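The quartile example generalizes to any percentile. The sketch below (hypothetical function `percentile`) applies the position rule and, for fractional positions, interpolates linearly between neighbouring values. Linear interpolation is an assumption: it reproduces the "halfway" value for position 2.5, but some texts round the position instead. The nine data values are hypothetical, not the slide's data.

```python
import math


def percentile(sorted_data, p):
    """Value of the pth percentile using the position rule i = (p / 100)(n + 1).

    Fractional positions use linear interpolation between neighbours (assumption).
    Positions outside 1..n are clamped to the smallest or largest value.
    """
    n = len(sorted_data)
    i = p * (n + 1) / 100
    if i <= 1:
        return sorted_data[0]
    if i >= n:
        return sorted_data[-1]
    lower = math.floor(i)           # 1-based position just below i
    frac = i - lower
    below = sorted_data[lower - 1]  # convert 1-based position to 0-based index
    above = sorted_data[lower]
    return below + frac * (above - below)


# Hypothetical ordered array of nine values (not the slide's data).
data = [3, 5, 8, 9, 12, 14, 15, 20, 22]
print(percentile(data, 25))  # 6.5  (Q1: halfway between 5 and 8)
print(percentile(data, 50))  # 12.0 (Q2, the median)
print(percentile(data, 75))  # 17.5 (Q3: halfway between 15 and 20)
```

For this data, `statistics.quantiles(data, n=4)` with its default "exclusive" method returns the same three quartiles, since it uses the same (n + 1) positions with linear interpolation.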
Box and Whisker Plot
• Example: [Figure: box-and-whisker plot marking the minimum, first quartile (Q1), median, third quartile (Q3), and maximum]
Shape of Box and Whisker Plots
• The box and central line are centered between the endpoints if the data are symmetric around the median
• A Box and Whisker plot can be shown in either vertical or horizontal format
Distribution Shape and Box and Whisker Plot
[Figure: three box-and-whisker plots, each marking Q1, Q2, and Q3, for three distribution shapes]
Box-and-Whisker Plot Example
• Ordered data: 0 2 2 2 3 3 4 5 5 10 27
Min   Q1   Q2   Q3   Max
0     2    3    5    27
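The five-number summary behind this plot can be checked with `statistics.quantiles`. Its default "exclusive" method places the pth percentile at position (p/100)(n + 1), with linear interpolation, which matches the percentile rule given earlier in this chapter; other quartile conventions can give slightly different values.

```python
import statistics

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]  # ordered data from the example

# Default method="exclusive": positions (p / 100)(n + 1), linear interpolation.
q1, q2, q3 = statistics.quantiles(data, n=4)
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)  # (0, 2.0, 3.0, 5.0, 27)
```

The gap from Q3 = 5 to the maximum of 27 is far larger than the gap from the minimum to Q1, which is what produces the long right whisker.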
Measures of Variation
Variation
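Range
• The range is the difference between the largest and smallest values:
Range = $X_{\text{largest}} - X_{\text{smallest}}$
• Example: for data whose smallest value is 1 and largest is 14, Range = 14 - 1 = 13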
Disadvantages of the Range
• Ignores the way in which data are distributed
o Two data sets spread differently over the values 7 to 12 have the same range: Range = 12 - 7 = 5 for both
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
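The outlier effect can be reproduced with the 25-value data set above. The function is named `data_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def data_range(values):
    """Range = largest value - smallest value."""
    return max(values) - min(values)


base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]            # the 25 values above
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]  # last value replaced by 120

print(len(base), data_range(base))                  # 25 4
print(len(with_outlier), data_range(with_outlier))  # 25 119
```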
Interquartile Range
• The interquartile range (IQR) is the spread of the middle 50% of the data: IQR = Q3 - Q1
• Example:
Minimum   Q1   Median (Q2)   Q3   Maximum
12        30   45            57   70
Interquartile range = Q3 - Q1 = 57 - 30 = 27
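Applying the IQR to the two 25-value data sets from "Disadvantages of the Range" shows why it resists outliers: the range jumps from 4 to 119, but the IQR does not move. This sketch takes quartiles from `statistics.quantiles` with its default "exclusive" method, i.e. the (n + 1) position rule; other quartile conventions may give slightly different values. The helper name `iqr` is hypothetical.

```python
import statistics


def iqr(values):
    """Interquartile range Q3 - Q1, using statistics.quantiles (default "exclusive" method)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(iqr(base))          # 1.5
print(iqr(with_outlier))  # 1.5: the outlier lies outside the middle 50%
```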
Variance
o Sample variance:
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}$
o Population variance:
$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
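The only difference between the two variance formulas is the divisor: n - 1 for a sample, N for a population. A minimal sketch with hypothetical helper names, applied to the house price data and cross-checked against `statistics.variance` and `statistics.pvariance`:

```python
import statistics


def sample_variance(xs):
    """S^2: squared deviations from the sample mean, divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """sigma^2: squared deviations from the population mean, divided by N."""
    n = len(xs)
    mu = sum(xs) / n
    return sum((x - mu) ** 2 for x in xs) / n


prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house price example
print(sample_variance(prices))      # 640000000000.0
print(population_variance(prices))  # 512000000000.0

# Cross-check against the standard library.
print(statistics.variance(prices), statistics.pvariance(prices))
```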
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
o Sample standard deviation:
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}}$
o Population standard deviation:
$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Calculation Example:
Sample Standard Deviation
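• Sample data ($x_i$): 10 12 14 15 17 18 18 24
n = 8   Mean = $\bar{X} = 16$
$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$
$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$
$= \sqrt{\frac{130}{7}} \approx 4.3095$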
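The hand calculation can be verified step by step: the squared deviations from the mean of 16 sum to 130, and dividing by n - 1 = 7 before taking the square root gives S. This sketch uses only `math` and `statistics` from the standard library.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]  # sample data from the example
n = len(data)
xbar = sum(data) / n                         # 16.0
sum_sq = sum((x - xbar) ** 2 for x in data)  # 130.0
s = math.sqrt(sum_sq / (n - 1))              # sqrt(130 / 7)

print(xbar, sum_sq, round(s, 4))         # 16.0 130.0 4.3095
print(round(statistics.stdev(data), 4))  # 4.3095
```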
Comparing Standard Deviations
[Figure: dot plots of three data sets on an 11 to 21 scale]
• Data A: Mean = 15.5, S = 3.338
• Data B: Mean = 15.5, S = 0.9258
• Data C: Mean = 15.5, S = 4.57
Coefficient of Variation
• Measures relative variation
• Expressed as a percentage (%)
• Shows variation relative to the mean
• Can be used to compare two or more data sets measured in different units
o Population: $CV = \frac{\sigma}{\mu} \cdot 100\%$
o Sample: $CV = \frac{S}{\bar{X}} \cdot 100\%$
Comparing Coefficient of Variation
• Stock A:
o Average price last year = $50
o Standard deviation = $5
o $CV_A = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$
• Stock B:
o Average price last year = $100
o Standard deviation = $5
o $CV_B = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$
• Both stocks have the same standard deviation, but stock B is less variable relative to its price
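A minimal sketch of the CV calculation for the two stocks. The function name is hypothetical, and the zero-mean guard reflects the fact that CV is undefined when the mean is zero.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (standard deviation / mean) * 100%, returned as a percentage."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(5, 50)   # Stock A: S = $5, average price = $50
cv_b = coefficient_of_variation(5, 100)  # Stock B: S = $5, average price = $100
print(cv_a, cv_b)  # 10.0 5.0
```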
The Empirical Rule
• If the data distribution is bell-shaped, then the interval:
• µ ± 1σ contains about 68% of the values in the population or the sample
• µ ± 2σ contains about 95% of the values in the population or the sample
• µ ± 3σ contains about 99.7% of the values in the population or the sample
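The 68%, 95%, and 99.7% figures are rounded probabilities for a normal distribution. Modelling "bell-shaped" as normal is an assumption made here. Under it, the fraction within k standard deviations of the mean equals erf(k / sqrt(2)), which Python's `math.erf` can evaluate.

```python
import math


def normal_fraction_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_fraction_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```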
Tchebysheff’s Theorem
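• Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean (for k ≥ 1)
o Examples:
At least $(1 - 1/1^2) = 0\%$ of the values lie within k = 1 standard deviation (µ ± 1σ)
At least $(1 - 1/2^2) = 75\%$ of the values lie within k = 2 standard deviations (µ ± 2σ)
At least $(1 - 1/3^2) \approx 89\%$ of the values lie within k = 3 standard deviations (µ ± 3σ)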
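Because the bound holds for any distribution, it can be checked even on data that are far from bell-shaped, such as the skewed house prices. In this sketch the helper names are hypothetical, and the five prices are treated as a population, so the mean and `statistics.pstdev` are used. The observed fractions always meet or exceed the bound.

```python
import statistics


def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean (k >= 1)."""
    if k < 1:
        raise ValueError("the bound is only informative for k >= 1")
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed fraction of values within k population standard deviations of the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)


prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house price example
for k in (1, 2, 3):
    print(k, round(chebyshev_bound(k), 3), fraction_within(prices, k))
# 1 0.0 0.8
# 2 0.75 1.0
# 3 0.889 1.0
```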
Standardized Data Values
• A standardized data value refers to the number of standard deviations
a value is from the mean
• Standardized data values are sometimes referred to as z-scores
Standardized Population Values
$z = \frac{x - \mu}{\sigma}$
• where:
o x = original data value
o µ = population mean
o σ = population standard deviation
o z = standard score
(number of standard deviations x is from µ)
Standardized Sample Values
$z = \frac{x - \bar{X}}{S}$
• where:
o x = original data value
o $\bar{X}$ = sample mean
o S = sample standard deviation
o z = standard score
(number of standard deviations x is from $\bar{X}$)
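As a sketch, standardizing two of the house prices with the sample mean ($600,000) and sample standard deviation ($800,000) shows how far each lies from the center. The helper `z_score` is a hypothetical name.

```python
import statistics


def z_score(x, center, spread):
    """Standardized value: number of standard deviations x lies from the center."""
    return (x - center) / spread


prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house price example
xbar = statistics.mean(prices)  # 600000
s = statistics.stdev(prices)    # 800000.0 (sample standard deviation)

print(z_score(2_000_000, xbar, s))  # 1.75
print(z_score(100_000, xbar, s))    # -0.625
```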
Using Microsoft Excel
• Descriptive Statistics are easy to obtain from Microsoft Excel
o Use the menu choice: Tools / Data Analysis / Descriptive Statistics
o Enter details in the dialog box
Using Excel
3. Click OK
Excel Output
• Microsoft Excel descriptive statistics output, using the house price data:
• House Prices:
o $2,000,000
o $500,000
o $300,000
o $100,000
o $100,000
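For readers without Excel, a rough Python counterpart to this output is sketched below. The row labels mirror the rows Excel's Descriptive Statistics tool reports. Kurtosis and skewness are left out here because Excel's small-sample formulas for them are not reproduced, and the standard error is computed as S / sqrt(n).

```python
import math
import statistics

prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]  # house price data

n = len(prices)
s = statistics.stdev(prices)
summary = {
    "Mean": statistics.mean(prices),
    "Standard Error": s / math.sqrt(n),  # S / sqrt(n)
    "Median": statistics.median(prices),
    "Mode": statistics.mode(prices),
    "Standard Deviation": s,
    "Sample Variance": statistics.variance(prices),
    "Range": max(prices) - min(prices),
    "Minimum": min(prices),
    "Maximum": max(prices),
    "Sum": sum(prices),
    "Count": n,
}
for label, value in summary.items():
    print(f"{label:<20}{value:>20,.2f}")
```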
Chapter Summary
• Described measures of center and location
o Mean, median, mode, geometric mean, midrange
• Discussed percentiles and quartiles
• Described measures of variation
o Range, interquartile range, variance, standard deviation, coefficient of variation
• Created Box and Whisker Plots
Chapter Summary
• Illustrated distribution shapes
o Symmetric, skewed
• Discussed Tchebysheff’s Theorem
• Calculated standardized data values
THANKS FOR WATCHING