Numerical Descriptive Measures (Week 2)


9/17/2017

Chapter 1: Numerical Descriptive Measures (Week 2)

LEARNING OBJECTIVES

By the end of this chapter, students should be able to:
○ Describe the properties of central tendency, variation, and shape in numerical data
○ Calculate descriptive summary measures

Summary Definitions
○ The central tendency is the extent to which all the data values group around a typical or central value.
○ The variation is the amount of dispersion, or scattering, of values.
○ The shape is the pattern of the distribution of values from the lowest value to the highest value.

Data description

Measures of central tendency: the extent to which all the data values group around a typical or central value.
• Arithmetic Mean
• Median
• Mode
• Midrange

Measures of variation: the amount of dispersion, or scattering, of values.
• Range
• Variance
• Standard deviation
• Coefficient of variation

Shape: the pattern of the distribution of values from the lowest value to the highest value.

Measures of Central Tendency

o A statistic is a characteristic or measure obtained by using the data values from a sample.
o A parameter is a characteristic or measure obtained by using all the data values of a specific population.

Measures of Central Tendency

o A measure of central tendency is a single value situated at the centre of a data set.
• It is a summary value for the data.
• It is also called the average.
o Measures of central tendency:
• Mean
• Median
• Mode
• Midrange

Measures of Central Tendency: Mean

[Diagram: the mean, for ungrouped data and for grouped data]

Summation Notation

o The Greek letter "sigma" ($\Sigma$) is a shorthand symbol for summing a series of numbers:

$\sum_{i=1}^{n} X_i = X_1 + X_2 + \cdots + X_n$

Measures of Central Tendency: The Mean

o The arithmetic mean (often just called the "mean" or the "average") is the most common measure of central tendency.
o For a sample of size n:

$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$

where $\bar{X}$ (pronounced "x-bar") is the sample mean, n is the sample size, and $X_i$ is the ith observed value.

Measures of Central Tendency: The Mean (for ungrouped data)
o The mean is the sum of the values divided by the total number of values.
1. Population mean, $\mu = \frac{\sum X}{N}$, where N is the population size.
2. Sample mean, $\bar{X} = \frac{\sum X}{n}$, where n is the sample size.
o Useful in comparing two or more populations.

Numerical Descriptive Measures for a Population: The Mean µ
o The population mean is the sum of the values in the population divided by the population size, N:

$\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N}$

where μ ("mu") = population mean
N = population size
$X_i$ = ith value of the variable X

Measures of Central Tendency: The Mean (for ungrouped data)

Rounding Rule:
The mean should be rounded to one more decimal place than occurs in the raw data.
The mean, in most cases, is not an actual data value.

Measures of Central Tendency: The Mean (for ungrouped data) (continued)

○ The most common measure of central tendency
○ Mean = sum of values divided by the number of values
○ Affected by extreme values (outliers)

[Figure: two dot plots on a 0 to 10 scale]
Values 1, 2, 3, 4, 5: Mean = $\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$
Values 1, 2, 3, 4, 10: Mean = $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$

Measures of Central Tendency: The Mean (for ungrouped data) (continued)

Example 3-1 (Days off per year)
The data represent the number of days off per year for a sample of individuals selected from nine different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30

$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{20 + 26 + \cdots + 30}{9} = \frac{276}{9} \approx 30.7$

○ The mean number of days off is 30.7 days.
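The calculation in Example 3-1 can be checked with a few lines of Python. This is a minimal sketch; the names `days_off` and `sample_mean` are illustrative and do not come from the slides.

```python
# Sample from Example 3-1: days off per year in nine countries.
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]


def sample_mean(values):
    """Return the sum of the values divided by the number of values."""
    return sum(values) / len(values)


mean_days = sample_mean(days_off)

# Rounding rule: one more decimal place than the raw (integer) data.
rounded_mean = round(mean_days, 1)
print(rounded_mean)
```

The unrounded value is 276 / 9, which is not one of the nine data values, in line with the remark that the mean is usually not an actual data value.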

Measures of Central Tendency: The Mean (for grouped data) (continued)

○ Finding the mean for grouped data:

$\bar{X} = \frac{\sum f \cdot X_m}{n}$

where
f = frequency
$X_m$ = midpoint of each class
n = sample size

Measures of Central Tendency: The Mean (for grouped data) (continued)

Example 3-3 (Miles Run per Week)
Below is a frequency distribution of miles run per week. Find the mean.
[Table: frequency distribution of miles run per week]

Measures of Central Tendency: The Mean (for grouped data) (continued)

$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5$ miles
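The grouped-mean formula can be sketched in Python as follows. The frequency table is not reproduced in this copy of the slides, so the class boundaries and frequencies below are an assumption: they are hypothetical values chosen only so that n = 20 and the sum of f times the midpoint is 490, matching the totals quoted above.

```python
# ASSUMED frequency distribution (lower boundary, upper boundary, frequency).
# Hypothetical values, chosen to agree with n = 20 and sum(f * Xm) = 490.
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_mean(table):
    """Mean for grouped data: sum of f * midpoint divided by n."""
    n = sum(f for _, _, f in table)
    weighted_sum = sum(f * (lower + upper) / 2 for lower, upper, f in table)
    return weighted_sum / n


print(grouped_mean(classes))
```

Each class is represented by its midpoint, so the result is an approximation of the mean of the underlying raw data.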

Measures of Central Tendency: Median

[Diagram: the median, for ungrouped data and for grouped data]

Measures of Central Tendency: The Median (for ungrouped data)

Finding the median
○ Step 1 Arrange the data values in ascending order.
○ Step 2 Determine the number of values in the data set, n.
○ Step 3
a. If n is odd, select the middle data value as the median.
b. If n is even, find the mean of the two middle values. That is, add them and divide the sum by 2.
Copyright © 2015 The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
○Bluman chapter 3

Measures of Central Tendency: The Median (for ungrouped data)
○ In an ordered array, the median is the "middle" number (50% above, 50% below)

[Figure: two dot plots on a 0 to 10 scale; Median = 3 in both]

○ Not affected by extreme values
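The contrast between the mean and the median can be illustrated with the two five-value data sets used in the mean example (1, 2, 3, 4, 5 and 1, 2, 3, 4, 10). This sketch relies on Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data, largest value replaced by 10

# The mean moves when the extreme value is introduced...
print(mean(without_outlier), mean(with_outlier))

# ...while the median stays at the middle value.
print(median(without_outlier), median(with_outlier))
```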

Measures of Central Tendency: Locating the Median (for ungrouped data)
○ The location of the median when the values are in numerical order (smallest to largest):

Median position = $\frac{n + 1}{2}$ position in the ordered data

○ If the number of values is odd, the median is the middle number
○ If the number of values is even, the median is the average of the two middle numbers

Note that $\frac{n + 1}{2}$ is not the value of the median, only the position of the median in the ranked data.

Measures of Central Tendency: Locating the Median (for ungrouped data)
Example 3-4 (Hotel Rooms)
The number of rooms in the seven hotels in downtown Pittsburgh is 713, 300, 618, 595, 311, 401, and 292. Find the median.
Solution:
○ Sort in ascending order: 292, 300, 311, 401, 595, 618, 713
○ Select the middle value.
○ Median = 401

The median is 401 rooms.

Measures of Central Tendency: Locating the Median (for ungrouped data)
Example 3-6 (Tornadoes)

The number of tornadoes that have occurred in the United States over an 8-year period follows. Find the median.

684, 764, 656, 702, 856, 1133, 1132, 1303

Sort in ascending order and find the average of the two middle values.

656, 684, 702, 764, 856, 1132, 1133, 1303

$MD = \frac{764 + 856}{2} = \frac{1620}{2} = 810$

The median number of tornadoes is 810.
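The three-step procedure for the median can be sketched as a small function and applied to both examples, one with an odd and one with an even number of values. The function name `find_median` is illustrative.

```python
def find_median(values):
    """Median of ungrouped data, following the three steps on the slides."""
    ordered = sorted(values)          # Step 1: ascending order
    n = len(ordered)                  # Step 2: number of values
    middle = n // 2
    if n % 2 == 1:                    # Step 3a: odd n, take the middle value
        return ordered[middle]
    # Step 3b: even n, average the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


hotel_rooms = [713, 300, 618, 595, 311, 401, 292]            # Example 3-4
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]      # Example 3-6

print(find_median(hotel_rooms))
print(find_median(tornadoes))
```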

Measures of Central Tendency: Locating the Median (for grouped data)

Median = $L_m + \frac{\frac{n}{2} - F}{f_m} \cdot w$

where
$L_m$ = lower class boundary of the median class
$F$ = cumulative frequency of all class intervals before the median class
$f_m$ = frequency of the median class
$w$ = width of the median class
$n$ = sample size

Measures of Central Tendency: Locating the Median (for grouped data)

Example 3-3 (Miles Run)
Below is a frequency distribution of miles run per week. Find the median.
[Table: frequency distribution of miles run per week]

Solution
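A grouped-data median can be computed by locating the class in which the cumulative frequency first reaches n/2 and interpolating inside it. The sketch below is hedged: the frequency table is not reproduced in this copy, so the classes are hypothetical values assumed for illustration (chosen to total n = 20), and the function name is not from the slides.

```python
# ASSUMED frequency distribution (lower boundary, upper boundary, frequency).
# Hypothetical values; the original table is not available here.
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_median(table):
    """Lower boundary + ((n/2 - cumulative freq before) / class freq) * width."""
    n = sum(f for _, _, f in table)
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in table:
        if cumulative + f >= n / 2:   # this is the median class
            return lower + (n / 2 - cumulative) / f * (upper - lower)
        cumulative += f
    raise ValueError("empty frequency table")


print(grouped_median(classes))
```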

Measures of Central Tendency: Mode

[Diagram: the mode, for ungrouped data and for grouped data]

Measures of Central Tendency: The Mode (for ungrouped data)
○ Value that occurs most often
○ Not affected by extreme values
○ Used for either numerical or categorical (nominal) data
○ There may be no mode
○ There may be several modes

[Figure: two dot plots, one on a 0 to 14 scale with Mode = 9 and one on a 0 to 6 scale with no mode]

Measures of Central Tendency: Review Example

House Prices:
$2,000,000
$500,000
$300,000
$100,000
$100,000
Sum: $3,000,000

○ Mean: $3,000,000 / 5 = $600,000
○ Median: middle value of ranked data = $300,000
○ Mode: most frequent value = $100,000
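The three measures in the house price review example can be reproduced with Python's standard `statistics` module. This is a small sketch; the variable name `prices` is illustrative.

```python
from statistics import mean, median, mode

# House prices from the review example, in dollars.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

print(mean(prices))    # pulled upward by the $2,000,000 house
print(median(prices))  # middle value of the ranked data
print(mode(prices))    # most frequent value
```

The single expensive house lifts the mean well above the median, which is why the two measures tell different stories for this data set.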

Measures of Central Tendency: The Mode (for ungrouped data)
Example 3-9 (NFL Signing Bonuses)

Find the mode of the signing bonuses of eight NFL players for a specific year. The bonuses in millions of dollars are

18.0 14.0 34.5 10 11.3 10 12.4 10

You may find it easier to sort first:

10 10 10 11.3 12.4 14.0 18.0 34.5

Select the value that occurs the most: the mode is 10 million dollars.

Measures of Central Tendency: The Mode (for ungrouped data)
Exercise

Measures of Central Tendency: The Mode (for ungrouped data)
Example 3-10 (Coal Employees in PA)
Find the mode for the number of coal employees per county for 10 selected counties in southwestern Pennsylvania.
110 731 1031 84 20 118 1162 1977 103 752

No value occurs more than once.
Conclusion: There is no mode.

Measures of Central Tendency: The Mode (for ungrouped data)

Example 3-11 (Licensed Nuclear Reactors)
The data show the number of licensed nuclear reactors in the US for a recent 15-year period. Find the mode.
104 104 104 104 104 107 109 109 109 110 109 111 112 111 109

104 and 109 both occur the most. The data set is said to be bimodal.
The modes are 104 and 109.
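The three mode examples (one mode, no mode, two modes) can be handled by one small function. This is a sketch under the convention used on these slides, where a data set with no repeated value has no mode; the function name `find_modes` is illustrative.

```python
from collections import Counter


def find_modes(values):
    """Return all most-frequent values, or [] if no value repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # slide convention: no value occurs more than once
    return sorted(v for v, c in counts.items() if c == highest)


bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]                  # Example 3-9
coal = [110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]            # Example 3-10
reactors = [104, 104, 104, 104, 104, 107, 109, 109, 109, 110,
            109, 111, 112, 111, 109]                                  # Example 3-11

print(find_modes(bonuses))
print(find_modes(coal))
print(find_modes(reactors))
```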

Measures of Central Tendency: The Mode (for grouped data)

Example 3-12 (Miles Run per Week)
Find the modal class for the frequency distribution of miles that 20 runners ran in one week.
[Table: frequency distribution of miles run per week]

Measures of Central Tendency: The Mode (for grouped data)

Mode = $L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \cdot w$

where
$L$ = lower class boundary of the modal class
$\Delta_1$ = frequency of modal class − frequency of the class before the modal class
$\Delta_2$ = frequency of modal class − frequency of the class after the modal class
$w$ = width of the modal class

Measures of Central Tendency: The Mode (for grouped data)

Example 3-12 (Miles Run per Week)

Mode = $20.5 + \frac{2}{3} \cdot 5 \approx 23.83$
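The grouped-mode calculation can be sketched as below. The frequency table is not reproduced in this copy, so the classes are an assumption: hypothetical frequencies chosen to be consistent with the values that do survive in Example 3-12 (20 runners, lower boundary 20.5, class width 5, result 23.83). The function name is illustrative.

```python
# ASSUMED frequency distribution (lower boundary, upper boundary, frequency).
# Hypothetical values consistent with the surviving figures in Example 3-12.
classes = [
    (5.5, 10.5, 1),
    (10.5, 15.5, 2),
    (15.5, 20.5, 3),
    (20.5, 25.5, 5),
    (25.5, 30.5, 4),
    (30.5, 35.5, 3),
    (35.5, 40.5, 2),
]


def grouped_mode(table):
    """Lower boundary + d1 / (d1 + d2) * width, for the modal class."""
    freqs = [f for _, _, f in table]
    i = freqs.index(max(freqs))                  # first class with highest frequency
    lower, upper, f = table[i]
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = f - before                              # modal minus preceding frequency
    d2 = f - after                               # modal minus following frequency
    return lower + d1 / (d1 + d2) * (upper - lower)


print(round(grouped_mode(classes), 2))
```

The sketch assumes a single modal class; a table with tied highest frequencies would need a different treatment.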

Measures of Central Tendency: The Mode (for grouped data) – estimating from histogram

Example 3-12 (Miles Run per Week)
[Figure: histogram of miles run per week, used to estimate the mode]

Mode ≈ 23.83

Measures of Central Tendency: The Midrange

The midrange is the average of the lowest and highest values in a data set:

$MR = \frac{\text{lowest value} + \text{highest value}}{2}$

Measures of Central Tendency: The Midrange

Example 3-15 (Waterline Breaks)
In the last two winter seasons, the city of Brownville, Minnesota, reported these numbers of water-line breaks per month. Find the midrange.

2, 3, 6, 8, 4, 1

$MR = \frac{1 + 8}{2} = \frac{9}{2} = 4.5$

The midrange is 4.5.
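The midrange of Example 3-15 takes one line to compute. A minimal sketch, with illustrative names:

```python
def midrange(values):
    """Average of the lowest and highest values in the data set."""
    return (min(values) + max(values)) / 2


breaks = [2, 3, 6, 8, 4, 1]  # water-line breaks per month, Example 3-15
print(midrange(breaks))
```

Because it uses only the two extreme values, the midrange reacts strongly to a single very large or very small observation.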

Measures of Central Tendency: Which Measure to Choose?

o The mean is generally used, unless extreme values (outliers) exist.
o The median is often used when outliers are present, since it is not sensitive to extreme values. For example, median home prices may be reported for a region because the median is less sensitive to outliers.
o In some situations it makes sense to report both the mean and the median.

Measures of Central Tendency: Summary

Central Tendency
○ Arithmetic mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$
○ Median: middle value in the ordered array
○ Mode: most frequently observed value

Types of Distribution


Measures of Variation (Dispersion)

Variation
○ Range
○ Variance
○ Standard deviation
○ Coefficient of variation

Measures of variation give information on the spread, variability, or dispersion of the data values.

[Figure: two distributions with the same center but different variation]

Measures of Variation: The Range

o Simplest measure of variation
o Difference between the largest and the smallest values:

Range = $X_{\text{largest}} - X_{\text{smallest}}$

Example:
[Figure: dot plot on a 0 to 14 scale]

Range = 13 - 1 = 12

Measures of Variation:
Why The Range Can Be Misleading
Ignores the way in which data are distributed

[Figure: two dot plots on a 7 to 12 scale with differently distributed values]
Range = 12 - 7 = 5 in both cases

Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
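The sensitivity of the range to a single outlier can be reproduced directly from the two 25-value data sets above. This is a small sketch; the helper name `data_range` is illustrative.

```python
def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


# The first 24 values are shared by both data sets.
shared = [1] * 11 + [2] * 8 + [3] * 4 + [4]

typical = shared + [5]       # last value is 5
with_outlier = shared + [120]  # last value replaced by 120

print(data_range(typical))
print(data_range(with_outlier))
```

Only one of the 25 values changes, yet the range changes from 4 to 119.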

Measures of Variation:
Comparing each X to the Mean

Deviations from the Mean
For any value x, the deviation from the mean is the difference between the x value and the mean: $(x - \mu)$ for populations and $(x - \bar{X})$ for samples.

Measures of Variation:
The Average Distance to the Mean?
o Average deviation of values from the mean:

Average deviation = $\frac{\sum_{i=1}^{n} (X_i - \bar{X})}{n}$

where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X

One problem: this quantity is always equal to zero, because the positive and negative deviations from the mean cancel exactly.
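The reason the average deviation is always zero is that $\sum (X_i - \bar{X}) = \sum X_i - n\bar{X} = 0$. The short sketch below checks this numerically on the eight-value sample used in the variance example (10, 12, 14, 15, 17, 18, 18, 24); the variable names are illustrative.

```python
data = [10, 12, 14, 15, 17, 18, 18, 24]
mean = sum(data) / len(data)

# Signed deviations from the mean: negatives and positives cancel.
deviations = [x - mean for x in data]
print(deviations)
print(sum(deviations))

# Squaring removes the sign, so the squared deviations do not cancel.
print(sum(d ** 2 for d in deviations))
```

This is the motivation for squaring the deviations before averaging them, which leads to the variance.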

Measures of Variation: Variance (Average Squared Distance to the Mean)

o Average (approximately) of squared deviations of values from the mean
o Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

where $\bar{X}$ = mean
n = sample size
$X_i$ = ith value of the variable X

Numerical Descriptive Measures For A Population: The Variance $\sigma^2$

o Average of squared deviations of values from the mean
o Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$

where μ = population mean
N = population size
$X_i$ = ith value of the variable X

Measures of Variation: Sample Standard Deviation
o Most commonly used measure of variation
o Shows variation about the mean
o Is the square root of the variance
o Has the same units as the original data
o Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$

Numerical Descriptive Measures For A Population: The Standard Deviation σ
o Most commonly used measure of variation
o Shows variation about the mean
o Is the square root of the population variance
o Has the same units as the original data
o Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$

Measures of Variation: The Standard Deviation

Steps for Computing Standard Deviation
1. Compute the difference between each value and the mean.
2. Square each difference.
3. Add the squared differences.
4. Divide this total by n-1 to get the sample variance.
5. Take the square root of the sample variance to get the sample standard deviation.

Measures of Variation:
Sample Variance
Example
Sample Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16

$S^2 = \frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}$

$= \frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}$

$= \frac{130}{7} \approx 18.5714$

A measure of the "average" squared distance to the mean

Measures of Variation:
Sample Standard Deviation
Sample Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16

$S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}}$

$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$

$= \sqrt{\frac{130}{7}} \approx 4.3095$

A measure of the "average" scatter around the mean
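The five steps for computing the sample standard deviation can be followed line by line in Python and checked against this eight-value sample. This is a sketch with illustrative variable names; the cross-check uses `statistics.variance` and `statistics.stdev` from the standard library, which also divide by n - 1.

```python
import math
from statistics import stdev, variance

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n

differences = [x - mean for x in data]        # Step 1: value minus mean
squared = [d ** 2 for d in differences]       # Step 2: square each difference
total = sum(squared)                          # Step 3: add the squared differences
sample_variance = total / (n - 1)             # Step 4: divide by n - 1
sample_std = math.sqrt(sample_variance)       # Step 5: square root

print(total, round(sample_variance, 4), round(sample_std, 4))

# Cross-check with the standard library's sample formulas.
print(round(variance(data), 4), round(stdev(data), 4))
```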

Measures of Variation:
Comparing Standard Deviations

[Figure: three dot plots on an 11 to 21 scale]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.570

Measures of Variation:
Comparing Standard Deviations

[Figure: two distributions, one labelled "Smaller standard deviation" and the other "Larger standard deviation"]

Measures of Variation:
Summary Characteristics

o The more the data are spread out, the greater the
range, variance, and standard deviation.
o The more the data are concentrated, the smaller the
range, variance, and standard deviation.
o If the values are all the same (no variation), all these
measures will be zero.
o None of these measures are ever negative.

Sample statistics versus population parameters

Measure              Population Parameter   Sample Statistic
Mean                 $\mu$                  $\bar{X}$
Variance             $\sigma^2$             $s^2$
Standard Deviation   $\sigma$               $s$

Excel output

Microsoft Excel
descriptive statistics output,
using the house price data:

House Prices:

$2,000,000
500,000
300,000
100,000
100,000

Variance and Standard Deviation

Example 3-21 (Outdoor Paint)
Find the variance and standard deviation for the data set for Brand A and Brand B paint.

Brand A   Brand B
10        35
60        45
50        30
30        35
40        40
20        25

The average for both brands is the same, but the range for Brand A is much greater than the range for Brand B.
Which brand would you buy?

Variance and Standard Deviation

Solution

Brand A: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 60 - 10 = 50$

Brand B: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 45 - 25 = 20$
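The variance and standard deviation asked for in Example 3-21 can be computed as sketched below. One assumption is made explicit: the solution divides by N, so the population formulas are used here (`statistics.pvariance` and `statistics.pstdev`), not the sample formulas.

```python
from statistics import mean, pstdev, pvariance

brand_a = [10, 60, 50, 30, 40, 20]
brand_b = [35, 45, 30, 35, 40, 25]

for name, data in (("Brand A", brand_a), ("Brand B", brand_b)):
    # ASSUMPTION: population formulas (divide by N), following the slide's use of N.
    print(
        name,
        "mean =", mean(data),
        "range =", max(data) - min(data),
        "variance =", round(pvariance(data), 1),
        "std dev =", round(pstdev(data), 1),
    )
```

Both brands share the same mean, so the comparison between them rests entirely on the measures of variation.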

Variance and Standard Deviation

Example 3-23 (European Auto Sales)
Find the variance and standard deviation for the amount of European auto sales for a sample of 6 years. The data are in millions of dollars.

Solution

Measures of Variation:
The Coefficient of Variation

o Measures relative variation
o Always in percentage (%)
o Shows variation relative to mean
o Can be used to compare the variability of two or more sets of data measured in different units

$CV = \frac{S}{\bar{X}} \cdot 100\%$

Measures of Variation:
Comparing Coefficients of Variation
○ Stock A:
◦ Average price last year = $50
◦ Standard deviation = $5

$CV_A = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$

○ Stock B:
◦ Average price last year = $100
◦ Standard deviation = $5

$CV_B = \frac{S}{\bar{X}} \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$

Both stocks have the same standard deviation, but stock B is less variable relative to its price.


Measures of Variation:
Comparing Coefficients of Variation
Example 3-25 (Sales of Automobiles)
The mean of the number of sales of cars over a 3-month
period is 87, and the standard deviation is 5. The mean of
the commissions is $5225, and the standard deviation is
$773. Compare the variations of the two.
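The coefficient of variation is a one-line calculation, sketched here for the two stocks and for the sales and commissions in Example 3-25. The function name is illustrative, and the printed values for Example 3-25 are computed from the figures given in the problem, not quoted from a worked solution.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation relative to the mean, as a percentage."""
    return std_dev / mean * 100


# Stocks A and B: same standard deviation, different average price.
cv_stock_a = coefficient_of_variation(5, 50)
cv_stock_b = coefficient_of_variation(5, 100)

# Example 3-25: number of sales versus commissions (different units).
cv_sales = coefficient_of_variation(5, 87)
cv_commissions = coefficient_of_variation(773, 5225)

print(round(cv_stock_a, 1), round(cv_stock_b, 1))
print(round(cv_sales, 1), round(cv_commissions, 1))
```

Since the units cancel in the ratio, a count of cars and a dollar amount can be compared directly; with these inputs the commissions vary more relative to their mean than the sales do.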

Measures of Skewness

Pearson coefficient of skewness:

$PC = \frac{3(\bar{X} - \text{median})}{s}$

Skewness < 0: skewed to the left
Skewness = 0: symmetrical
Skewness > 0: skewed to the right
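As an illustration, the Pearson coefficient of skewness, 3 × (mean − median) / s, can be applied to the house price data from the review example. This is a sketch that assumes the sample standard deviation (`statistics.stdev`, dividing by n - 1); the function name is illustrative.

```python
from statistics import mean, median, stdev


def pearson_skewness(values):
    """3 * (mean - median) / sample standard deviation."""
    # ASSUMPTION: s is the sample standard deviation (n - 1 in the denominator).
    return 3 * (mean(values) - median(values)) / stdev(values)


prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]
skew = pearson_skewness(prices)
print(skew)
print("skewed to the right" if skew > 0 else
      "skewed to the left" if skew < 0 else "symmetrical")
```

The mean exceeds the median for this data set, so the coefficient is positive and the distribution is classified as skewed to the right.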


Summary (ungrouped and grouped data)

Measures of Central Tendency: Mean, Median, Mode, Midrange
Measures of Dispersion: Range, Variance, Standard deviation, Coefficient of variation, Pearson coefficient of skewness


Tutorial
73

Review Exercises
Pg. 118-121
Q 1-3, 12-14, 32-33
Pg. 173
Q3,5,6

Chapter Quiz
Pg. 176
Q 1-23
