Part 2-Chapter 3 - Describing Data - Edit

BUSINESS STATISTICS:
A DECISION-MAKING APPROACH

6th EDITION
Chapter 3
Describing Data Using Numerical
Measures
Chapter Goals
After completing this chapter, you should be able to:

• Compute and interpret the mean, median, and mode for a set of data
• Compute the range, variance, and standard deviation and know what these
values mean
• Construct and interpret a box and whiskers plot
• Compute and explain the coefficient of variation and z scores
• Use numerical measures along with graphs, charts, and tables to describe data
Chapter Topics
• Measures of Center and Location
o Mean, median, mode, geometric mean, midrange
• Other measures of Location
o Weighted mean, percentiles, quartiles
• Measures of Variation
o Range, interquartile range, variance and standard deviation, coefficient of
variation
Summary Measures
Describing Data Numerically
• Center and Location
o Mean, median, mode, weighted mean
• Other Measures of Location
o Percentiles, quartiles
• Variation
o Range, interquartile range, variance, standard deviation, coefficient of variation
Measures of Center and Location
Overview
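Center and Location
• Mean
o Sample: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
o Population: \( \mu = \frac{\sum_{i=1}^{N} x_i}{N} \)
• Median
• Mode
• Weighted Mean
o Sample: \( \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} \)
o Population: \( \mu_w = \frac{\sum w_i x_i}{\sum w_i} \)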
Mean (Arithmetic Average)
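• The mean is the arithmetic average of the data values
o Sample mean (n = sample size):
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
o Population mean (N = population size):
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]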
Mean (Arithmetic Average)
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
o Data 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
o Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4
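The effect of an extreme value on the mean can be checked with a short sketch. The helper name `mean` and the variable names are illustrative choices, not part of the slides; the two data sets are the ones from the slide.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the number of values."""
    return sum(values) / len(values)


without_extreme = [1, 2, 3, 4, 5]
with_extreme = [1, 2, 3, 4, 10]   # the last value is replaced by an extreme value

mean_without = mean(without_extreme)   # 15 / 5
mean_with = mean(with_extreme)         # 20 / 5

print(mean_without, mean_with)
```

Changing a single value from 5 to 10 moves the mean from 3 to 4, which is the sensitivity to outliers the slide describes.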
Median
• Not affected by extreme values
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
• In an ordered array, the median is the “middle” number

o If n or N is odd, the median is the middle number
o If n or N is even, the median is the average of the two middle numbers
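The odd/even rule can be written out as a minimal sketch. The function name `median` and the sample lists are illustrative assumptions; the logic follows the two cases stated above.

```python
def median(values):
    """Middle value of the ordered data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[middle]
    # Even count: average of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2


median_odd = median([1, 2, 3, 4, 10])   # n = 5, middle value
median_even = median([1, 2, 3, 4])      # n = 4, average of 2 and 3

print(median_odd, median_even)
```

Replacing 10 with any larger value leaves `median_odd` unchanged, which is why the median is described as not affected by extreme values.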
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes
[Figure: two dot plots; in the first (axis 0 to 14) Mode = 5; in the second there is no mode]
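A small sketch of finding the mode, covering the "no mode" and "several modes" cases. The function name `modes` and the example data are illustrative. One assumption is made explicit: when several distinct values all occur equally often, the sketch reports no mode.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs most often; an empty list means 'no mode'."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Assumption: if several distinct values all occur equally often, report no mode.
    if len(counts) > 1 and min(counts.values()) == highest:
        return []
    return [value for value, count in counts.items() if count == highest]


single_mode = modes([1, 3, 5, 5, 5, 9])
no_mode = modes([0, 1, 2, 3, 4, 5])
two_modes = modes(["red", "blue", "red", "blue", "green"])  # works for categorical data too

print(single_mode, no_mode, two_modes)
```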
Weighted Mean
• Used when values are grouped by frequency or relative importance
• Example: Sample of 26 Repair Projects
Days to Complete    Frequency
5                   4
6                   12
7                   8
8                   2

Weighted mean days to complete:
\[ \bar{x}_w = \frac{\sum w_i x_i}{\sum w_i} = \frac{(4)(5) + (12)(6) + (8)(7) + (2)(8)}{4 + 12 + 8 + 2} = \frac{164}{26} \approx 6.31 \text{ days} \]
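The repair-project calculation can be reproduced with a few lines. The variable names are illustrative; the days and frequencies are the ones from the example.

```python
days_to_complete = [5, 6, 7, 8]   # x_i
frequency = [4, 12, 8, 2]         # w_i (number of projects per value)

# Numerator: sum of w_i * x_i; denominator: sum of the weights.
weighted_sum = sum(w * x for w, x in zip(frequency, days_to_complete))
total_weight = sum(frequency)
weighted_mean = weighted_sum / total_weight

print(weighted_sum, total_weight, round(weighted_mean, 2))
```

Here the weights are frequencies, so the result equals the ordinary mean of all 26 individual project durations.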
Review Example
• Five houses on a hill by the beach
• House Prices:
o $2,000,000
o $ 500,000
o $ 300,000
o $ 100,000
o $ 100,000
Summary Statistics
• House Prices:
o $2,000,000
o $ 500,000
o $ 300,000
o $ 100,000
o $ 100,000
o Sum: $3,000,000
• Mean: $3,000,000 / 5 = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
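The three summary statistics for the house prices can be cross-checked with Python's standard `statistics` module. This is only a verification sketch; the variable names are illustrative.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # total / 5
median_price = statistics.median(house_prices)  # middle value of the ranked data
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

The single $2,000,000 house pulls the mean to twice the median, which illustrates why the median is often preferred when outliers are present.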
Which measure of location is the “best”?
• Mean is generally used, unless extreme values (outliers) exist

• Then median is often used, since the median is not sensitive to extreme
values.
o Example: Median home prices may be reported for a region – less
sensitive to outliers
Shape of a Distribution
• Describes how data is distributed

• Symmetric or skewed
o Left-skewed: Mean < Median < Mode (longer tail extends to the left)
o Symmetric: Mean = Median = Mode
o Right-skewed: Mode < Median < Mean (longer tail extends to the right)

Other Location Measures
• Percentiles: the pth percentile in a data array (where 0 ≤ p ≤ 100)
o p% of the values are less than or equal to this value
o (100 – p)% of the values are greater than or equal to this value
• Quartiles
o 1st quartile = 25th percentile
o 2nd quartile = 50th percentile = median
o 3rd quartile = 75th percentile
Percentiles
• The pth percentile in an ordered array of n values is the value in the ith position, where
\[ i = \frac{p}{100}(n + 1) \]
• Example: The 60th percentile in an ordered array of 19 values is the value in the 12th position:
\[ i = \frac{p}{100}(n + 1) = \frac{60}{100}(19 + 1) = 12 \]
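The position rule i = (p/100)(n + 1) is simple enough to sketch directly. The function name `percentile_position` is an illustrative choice.

```python
def percentile_position(p, n):
    """1-based position of the pth percentile in an ordered array of n values."""
    # Multiply before dividing so that whole-number results stay exact in floating point.
    return p * (n + 1) / 100


position_60th = percentile_position(60, 19)   # 60th percentile of 19 values
position_25th = percentile_position(25, 9)    # 25th percentile of 9 values

print(position_60th, position_25th)
```

A fractional position, such as the second result, means the percentile falls between two neighboring values.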
Quartiles
• Quartiles split the ranked data into 4 equal groups
[Figure: ranked data split by Q1, Q2, and Q3 into four groups of 25% each]
• Example: Find the first quartile
Sample data in ordered array (n = 9): 11 12 13 16 16 17 18 21 22
Q1 = 25th percentile, so find the (25/100)(9 + 1) = 2.5 position,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
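A minimal sketch of turning a percentile position into a value. The slide only covers the "halfway between" case; extending it to general linear interpolation between the two neighboring values, and clamping the position to the range 1..n, are assumptions of this sketch. The function name `percentile_value` is illustrative.

```python
def percentile_value(ordered, p):
    """Value at position i = (p/100)(n + 1) in an already ordered list (1-based)."""
    n = len(ordered)
    position = p * (n + 1) / 100
    position = min(max(position, 1), n)   # assumption: clamp to the valid range 1..n
    lower = int(position)                 # whole part of the position
    fraction = position - lower           # how far toward the next value
    if fraction == 0:
        return ordered[lower - 1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1 = percentile_value(sample, 25)   # position 2.5: halfway between 12 and 13
q2 = percentile_value(sample, 50)   # position 5: the median

print(q1, q2)
```

Other software may use a different position rule, so quartiles from a spreadsheet or library can differ slightly from this convention.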
Box and Whisker Plot
• A graphical display of data using the 5-number summary:

Minimum -- Q1 -- Median -- Q3 -- Maximum
• Example:
[Figure: box and whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile, and Maximum, with 25% of the data in each of the four segments]
Shape of Box and Whisker Plots
• The Box and central line are centered between the endpoints if data is
symmetric around the median
• A Box and Whisker plot can be shown in either vertical or horizontal format
Distribution Shape and Box and Whisker Plot
[Figure: box and whisker plots for left-skewed, symmetric, and right-skewed distributions, each marking Q1, Q2, and Q3]
Box-and-Whisker Plot Example
• Below is a Box-and-Whisker plot for the following data:
Data: 0 2 2 2 3 3 4 5 5 10 27
Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27
• This data is very right-skewed, as the plot depicts
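The five-number summary for this data set can be reproduced with the (p/100)(n + 1) position rule. This is a sketch under that assumption; the helper `value_at_percentile` is a hypothetical name and assumes the computed position lies between 1 and n, which holds for the quartiles here.

```python
def value_at_percentile(ordered, p):
    """Value at 1-based position (p/100)(n + 1), interpolating between neighbors."""
    position = p * (len(ordered) + 1) / 100
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


data = sorted([0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27])

five_number_summary = (
    data[0],                          # minimum
    value_at_percentile(data, 25),    # Q1
    value_at_percentile(data, 50),    # Q2 (median)
    value_at_percentile(data, 75),    # Q3
    data[-1],                         # maximum
)

# Whisker lengths give a quick numerical cue for skewness.
left_whisker = five_number_summary[1] - five_number_summary[0]
right_whisker = five_number_summary[4] - five_number_summary[3]

print(five_number_summary, left_whisker, right_whisker)
```

The right whisker is far longer than the left one, which matches the right-skewed shape of the plot.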
Measures of Variation
• Range
• Interquartile Range
• Variance
o Population variance
o Sample variance
• Standard Deviation
o Population standard deviation
o Sample standard deviation
• Coefficient of Variation
Variation
• Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center but different variation]
Range
• Simplest measure of variation

• Difference between the largest and the smallest observations:
Range = Xmaximum – Xminimum
• Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two data sets on a 7 to 12 axis with different distributions]
Range = 12 - 7 = 5 in both cases
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
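The outlier sensitivity of the range can be checked on the two lists above. The helper name `data_range` and the way the lists are built are illustrative choices.

```python
def data_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


# The 24 values the two lists share: eleven 1s, eight 2s, four 3s, and one 4.
shared_values = [1] * 11 + [2] * 8 + [3] * 4 + [4]

range_without_outlier = data_range(shared_values + [5])
range_with_outlier = data_range(shared_values + [120])

print(range_without_outlier, range_with_outlier)
```

Only one of the 25 values differs between the two lists, yet the range changes from 4 to 119.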
Interquartile Range
• Can eliminate some outlier problems by using the interquartile range

• Eliminates some high- and low-valued observations and calculates the range
from the remaining values.
• Interquartile range = 3rd quartile – 1st quartile
Interquartile Range
• Example:
Minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, Maximum = 70
(25% of the data falls in each of the four segments)
Interquartile range = Q3 – Q1 = 57 – 30 = 27
Variance
• Average of squared deviations of values from the mean
o Sample variance:
\[ S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
o Population variance:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
o Sample standard deviation:
\[ S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
o Population standard deviation:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
Calculation Example:
Sample Standard Deviation
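• Sample Data (Xi): 10 12 14 15 17 18 18 24
n = 8, Mean = \( \bar{x} \) = 16
\[ S = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}} \]
\[ = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} \]
\[ = \sqrt{\frac{130}{7}} \approx 4.31 \]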
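The calculation can be recomputed step by step as a check. This is a sketch with illustrative variable names; the last line compares the manual result with `statistics.stdev`, which uses the same n - 1 denominator.

```python
import math
import statistics

sample = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(sample)

x_bar = sum(sample) / n                                     # sample mean
sum_squared_deviations = sum((x - x_bar) ** 2 for x in sample)

sample_variance = sum_squared_deviations / (n - 1)          # divide by n - 1 for a sample
sample_std = math.sqrt(sample_variance)

# For comparison only: treating the same values as a whole population divides by N.
population_std = math.sqrt(sum_squared_deviations / n)

print(x_bar, sum_squared_deviations, round(sample_std, 4), round(population_std, 4))
print(math.isclose(sample_std, statistics.stdev(sample)))
```

The sample formula always gives a slightly larger value than the population formula on the same data, because it divides by n - 1 instead of N.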
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis, all with the same mean]
• Data A: Mean = 15.5, S = 3.338
• Data B: Mean = 15.5, S = 0.9258
• Data C: Mean = 15.5, S = 4.57
Coefficient of Variation
• Measures relative variation
• Always expressed as a percentage (%)
• Shows variation relative to mean
• Is used to compare two or more sets of data measured in different units
o Population: \( CV = \frac{\sigma}{\mu} \cdot 100\% \)
o Sample: \( CV = \frac{S}{\bar{x}} \cdot 100\% \)
Comparing Coefficient of Variation
• Stock A:
o Average price last year = $50
o Standard deviation = $5
o \( CV_A = \frac{S}{\bar{x}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\% \)
• Stock B:
o Average price last year = $100
o Standard deviation = $5
o \( CV_B = \frac{S}{\bar{x}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\% \)
• Both stocks have the same standard deviation, but stock B is less variable relative to its price
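The stock comparison reduces to one small function. The name `coefficient_of_variation` is an illustrative choice, and the sketch assumes a nonzero mean.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean (mean must be nonzero)."""
    # Multiply by 100 first so that these whole-number examples stay exact.
    return 100 * std_dev / mean


cv_stock_a = coefficient_of_variation(std_dev=5, mean=50)
cv_stock_b = coefficient_of_variation(std_dev=5, mean=100)

print(cv_stock_a, cv_stock_b)
```

Because the units cancel in the ratio, the same function can compare data sets measured in different units.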
The Empirical Rule
• If the data distribution is bell-shaped, then the interval:
• µ ± 1σ contains about 68% of the values in the population or the sample
[Figure: bell curve with the interval µ ± 1σ covering 68% of the values]
The Empirical Rule
• µ ± 2σ contains about 95% of the values in the population or the sample
• µ ± 3σ contains about 99.7% of the values in the population or the sample
[Figure: bell curves with µ ± 2σ covering 95% and µ ± 3σ covering 99.7% of the values]
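The 68%, 95%, and 99.7% figures are the shares of a normal distribution within 1, 2, and 3 standard deviations of the mean. They can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper name `share_within` is illustrative, and the check assumes an exactly normal distribution, whereas real bell-shaped data will only match approximately.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.1%}")
```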
Tchebysheff’s Theorem
• Regardless of how the data are distributed, at least (1 - 1/k²) of the values will
fall within k standard deviations of the mean
o Example
At least                within
(1 – 1/1²) = 0%         k = 1 (µ ± 1σ)
(1 – 1/2²) = 75%        k = 2 (µ ± 2σ)
(1 – 1/3²) ≈ 89%        k = 3 (µ ± 3σ)
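The bound 1 - 1/k² can be tabulated with a short sketch. The function name `tchebysheff_lower_bound` is an illustrative choice, and rejecting k below 1 is an assumption of the sketch, since the formula would give a negative bound there.

```python
def tchebysheff_lower_bound(k):
    """Minimum share of values within k standard deviations of the mean."""
    if k < 1:
        raise ValueError("k must be at least 1 for the bound to be meaningful")
    return 1 - 1 / k ** 2


bounds = {k: tchebysheff_lower_bound(k) for k in (1, 2, 3)}

for k, bound in bounds.items():
    print(f"k = {k}: at least {bound:.0%}")
```

These are guaranteed minimums for any distribution, so they are lower than the percentages the empirical rule gives for bell-shaped data.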
Standardized Data Values
• A standardized data value refers to the number of standard deviations
a value is from the mean
• Standardized data values are sometimes referred to as z-scores
Standardized Population Values
\[ z = \frac{x - \mu}{\sigma} \]
• where:
o x = original data value
o µ = population mean
o σ = population standard deviation
o z = standard score
(number of standard deviations x is from µ)
Standardized Sample Values
\[ z = \frac{x - \bar{x}}{S} \]
• where:
o x = original data value
o \( \bar{x} \) = sample mean
o S = sample standard deviation
o z = standard score
(number of standard deviations x is from \( \bar{x} \))
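A minimal sketch of standardizing sample values. It reuses the eight-value sample from the standard deviation example as illustrative input; the function name `z_scores` is an assumption, and `statistics.stdev` supplies the sample standard deviation with the n - 1 denominator.

```python
import statistics


def z_scores(sample):
    """Standardize each value: (x - sample mean) / sample standard deviation."""
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    return [(x - x_bar) / s for x in sample]


sample = [10, 12, 14, 15, 17, 18, 18, 24]
scores = z_scores(sample)

for x, z in zip(sample, scores):
    print(f"x = {x:2d}  z = {z:+.2f}")
```

A negative z-score marks a value below the mean and a positive one a value above it; the z-scores of a data set always sum to zero, up to rounding.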
Using Microsoft Excel
• Descriptive Statistics are easy to obtain from Microsoft Excel
o Use menu choice: tools / data analysis / descriptive statistics
o Enter details in dialog box
Using Excel
1. Use menu choice: Data / Data Analysis / Descriptive Statistics
Using Excel
1. Enter dialog box details
2. Check box for summary statistics
3. Click OK
Excel Output
• Microsoft Excel descriptive statistics
output, using the house price data:
• House Prices:
o $2,000,000
o $ 500,000
o $ 300,000
o $ 100,000
o $ 100,000
Chapter Summary
• Described measures of center and location
o Mean, median, mode, geometric mean, midrange
• Discussed percentiles and quartiles
• Described measure of variation
o Range, interquartile range, variance, standard deviation, coefficient of
variation
• Created Box and Whisker Plots
Chapter Summary
• Illustrated distribution shapes
o Symmetric, skewed
• Discussed Tchebysheff’s Theorem
• Calculated standardized data values
THANKS FOR WATCHING
