
Measures of Variation

The document discusses various measures of variation used to describe the spread of a data set, including range, standard deviation, variance, and interquartile range. It provides the steps to calculate sample standard deviation, discusses interpreting standard deviation using the empirical rule and three standard deviations rule, and introduces other concepts like coefficient of variation and Chebyshev's theorem.

Uploaded by

Injamam Alam

Measures of variation

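⚪ Heights of the 5 starting players from two basketball teams:
⚪ A: 72, 73, 76, 76, 78
⚪ B: 67, 72, 76, 76, 84
⚪ Mean, median and mode: both teams have the same mean (75), median (76) and mode (76), so these measures alone cannot distinguish the two data sets.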
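As a quick check on the two teams above, the following minimal sketch computes the three measures of center with Python's standard `statistics` module. The helper name `center_summary` is hypothetical and used only for illustration.

```python
import statistics

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]

def center_summary(heights):
    """Return (mean, median, mode) for a list of heights."""
    return (
        statistics.mean(heights),
        statistics.median(heights),
        statistics.mode(heights),
    )

for name, heights in (("A", team_a), ("B", team_b)):
    print(name, center_summary(heights))
```

Both teams give the same three values, which is why a measure of spread is needed to tell them apart.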
Measures of Variation

⚪ Ex. 1 continued. To describe the difference in the two data sets, we use a descriptive measure that indicates the amount of spread, or dispersion, in a data set.

● Range: the difference between the maximum and minimum values of the data set.
Measures of Variation

⚪ Range of team A: 78 - 72 = 6
⚪ Range of team B: 84 - 67 = 17
⚪ Advantage of the range: it is easy to compute.
⚪ Disadvantage: only two values are considered.
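The range calculation above can be sketched in a few lines of Python. The function name `data_range` is an illustrative choice, not something from the slides.

```python
def data_range(values):
    """Range = maximum value minus minimum value."""
    return max(values) - min(values)

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]

print("Range of team A:", data_range(team_a))
print("Range of team B:", data_range(team_b))
```

Because only `max` and `min` enter the result, the three middle values of each team have no influence on it, which is the disadvantage noted above.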
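Unlike the range, the sample standard deviation takes into account all data values. The following procedure is used to find the sample standard deviation (illustrated with team A):
⚪ Step 1: Find the mean of the data: $\bar{x} = \frac{72 + 73 + 76 + 76 + 78}{5} = \frac{375}{5} = 75$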
Step 2: Find the deviation of each score from the mean.

Height   Deviation from mean
72       72 - 75 = -3
73       73 - 75 = -2
76       76 - 75 = 1
76       76 - 75 = 1
78       78 - 75 = 3
Sum      0

Note that the sum of the deviations is 0: $\sum (x - \bar{x}) = 0$.
The sum of the deviations from the mean will always be zero. This can be used as a check to determine if your calculations are correct.
Step 3: Square each deviation from the mean. Find the sum of the squared deviations.

Height   Deviation   Squared deviation
72       -3          9
73       -2          4
76        1          1
76        1          1
78        3          9
Sum                  24
Step 4: The sample variance is determined by dividing the sum of the squared deviations by (n-1) (the number of scores minus one).

⚪ Note that the sum of the squared deviations is 24.

⚪ Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{24}{5 - 1} = 6$
The four steps can be combined into one mathematical formula for the sample standard deviation. The sample standard deviation is the square root of the quotient of the sum of the squared deviations and (n-1).

⚪ Sample Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

⚪ For team A: $s = \sqrt{\frac{24}{4}} = \sqrt{6} \approx 2.45$
Four-step procedure to calculate the sample standard deviation:

⚪ 1. Find the mean of the data.

⚪ 2. Set up a table which lists the data in the left-hand column and the deviations from the mean in the next column.

⚪ 3. In the third column from the left, square each deviation and then find the sum of the squares of the deviations.

⚪ 4. Divide the sum of the squared deviations by (n-1) and then take the positive square root of the result.
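The four steps above translate almost line for line into code. The following is a minimal sketch in Python; the function name `sample_variance_and_std` is hypothetical, and the data are team A's heights from the earlier example.

```python
import math

def sample_variance_and_std(data):
    """Follow the four-step procedure for the sample standard deviation."""
    n = len(data)
    # Step 1: mean of the data
    mean = sum(data) / n
    # Step 2: deviation of each value from the mean (these sum to zero)
    deviations = [x - mean for x in data]
    # Step 3: square each deviation and add the squares
    sum_of_squares = sum(d ** 2 for d in deviations)
    # Step 4: divide by (n - 1), then take the positive square root
    variance = sum_of_squares / (n - 1)
    return variance, math.sqrt(variance)

team_a = [72, 73, 76, 76, 78]
variance, std_dev = sample_variance_and_std(team_a)
print(f"variance = {variance}, standard deviation = {std_dev:.3f}")
```

In practice `statistics.variance` and `statistics.stdev` from the standard library perform the same (n-1) calculation; the explicit version is shown only to mirror the steps.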
Problem for students:

⚪ By hand: Find the variance and standard deviation of the data: 5, 8, 9, 7, 6
⚪ Answer: The variance is 10/4 = 2.5, and the standard deviation is its square root, approximately 1.581.
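For reference, the hand calculation for this problem can be written out as follows, using the same four steps as before (a worked sketch, with $\bar{x}$, $s^2$ and $s$ denoting the sample mean, variance and standard deviation):

```latex
\begin{align*}
\bar{x} &= \frac{5 + 8 + 9 + 7 + 6}{5} = \frac{35}{5} = 7 \\
\sum (x - \bar{x})^2 &= (-2)^2 + 1^2 + 2^2 + 0^2 + (-1)^2 = 10 \\
s^2 &= \frac{10}{5 - 1} = 2.5 \\
s &= \sqrt{2.5} \approx 1.581
\end{align*}
```

Squaring the rounded value 1.581 gives slightly less than 2.5, so the variance is best taken from the third line rather than recomputed from the rounded standard deviation.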
Standard deviation of grouped data:
1. Find each class midpoint.
2. Find the deviation of each class midpoint from the mean.
3. Square each deviation and multiply it by the class frequency.
4. Find the sum of these values and divide the result by (n-1) (one less than the total number of observations). This gives the variance; its positive square root is the standard deviation.
Here is the frequency distribution of the number of rounds of golf played by a group of golfers. The class midpoints are in the second column. The mean is 29.35. The third column is the square of the difference between the class midpoint and the mean. The fifth column is the product of the frequency and the value in the third column. The sixth column, x*f, is used to compute the mean. The final result, the standard deviation, is given below the table.

Class     Midpoint x   (x - mean)^2   Frequency f   (x - mean)^2 * f   x * f
[0,7)     3.5          668.3948       0             0                  0
[7,14)    10.5         355.4482       2             710.8963556        21
[14,21)   17.5         140.5015       10            1405.015111        175
[21,28)   24.5         23.55484       21            494.6517333        514.5
[28,35)   31.5         4.608178       23            105.9880889        724.5
[35,42)   38.5         83.66151       14            1171.261156        539
[42,49)   45.5         260.7148       5             1303.574222        227.5
Total                                 75            5191.386667        2201.5

Mean = 2201.5 / 75 = 29.35333
Standard deviation = sqrt(5191.386667 / 74) = 8.37579094
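The grouped-data calculation in the table above can be reproduced with a short script. This is a minimal sketch: the midpoints and frequencies are copied from the table, while the function name `grouped_stats` is a hypothetical helper.

```python
import math

# (class midpoint, frequency) pairs taken from the golf table
golf_classes = [
    (3.5, 0), (10.5, 2), (17.5, 10), (24.5, 21),
    (31.5, 23), (38.5, 14), (45.5, 5),
]

def grouped_stats(classes):
    """Return (n, mean, variance, std dev) for grouped data."""
    n = sum(f for _, f in classes)
    # The mean is approximated from the class midpoints
    mean = sum(x * f for x, f in classes) / n
    # Each squared deviation is weighted by its class frequency
    weighted_squares = sum(f * (x - mean) ** 2 for x, f in classes)
    variance = weighted_squares / (n - 1)
    return n, mean, variance, math.sqrt(variance)

n, mean, variance, std_dev = grouped_stats(golf_classes)
print(f"n = {n}, mean = {mean:.5f}, std dev = {std_dev:.5f}")
```

Because every observation in a class is replaced by the class midpoint, the result is an approximation of the standard deviation of the raw data.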
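Variance For Grouped Data

⚪ $s^2 = \frac{\sum f\,(x - \bar{x})^2}{n - 1}$, where x is a class midpoint, f is the class frequency, and n is the total number of observations.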
Interpreting the standard deviation

⚪ 1. The more variation in a data set, the greater the standard deviation.

⚪ 2. The larger the standard deviation, the more “spread” in the shape of the histogram representing the data.

⚪ 3. Standard deviation is used for quality control in business and industry. If there is too much variation in the manufacturing of a certain product, the process is out of control and adjustments to the machinery must be made to ensure more uniformity in the production process.
Three standard deviations rule

⚪ “Almost all” the data will lie within 3 standard deviations of the mean.
⚪ Mathematically, nearly 100% of the data will fall in the interval determined by $(\bar{x} - 3s,\ \bar{x} + 3s)$.
Empirical Rule

⚪ If a data set is “mound-shaped” or “bell-shaped”, then:
⚪ 1. Approximately 68% of the data lies within one standard deviation of the mean.

⚪ 2. Approximately 95% of the data lies within two standard deviations of the mean.

⚪ 3. About 99.7% of the data lies within three standard deviations of the mean.
Figure (normal curve): the yellow region is 68% of the total area. This includes all data within one standard deviation of the mean.
Figure (normal curve): the yellow region plus the brown regions include 95% of the total area. This includes all data within two standard deviations of the mean.
Question
A company produces a lightweight valve that is
specified to weigh 1365 grams. Unfortunately,
because of imperfections in the manufacturing
process not all of the valves produced weigh
exactly 1365 grams. In fact, the weights of the
valves produced are normally distributed with a
mean weight of 1365 grams and a standard
deviation of 294 grams. Within what range of
weights would approximately 95% of the valve
weights fall? Approximately 16% of the weights
would be more than what value? Approximately
0.15% of the weights would be less than what
value?
Solution
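One way to work the valve question is to apply the empirical rule directly. The sketch below assumes the usual symmetric reading of the rule: 68% within one standard deviation leaves about 16% in each tail, and 99.7% within three standard deviations leaves about 0.15% in each tail. The variable names are illustrative.

```python
mean_weight = 1365   # grams, from the question
std_dev = 294        # grams, from the question

# About 95% of weights lie within 2 standard deviations of the mean
low_95 = mean_weight - 2 * std_dev
high_95 = mean_weight + 2 * std_dev

# About 16% of weights lie above mean + 1 standard deviation
upper_16_cutoff = mean_weight + std_dev

# About 0.15% of weights lie below mean - 3 standard deviations
lower_015_cutoff = mean_weight - 3 * std_dev

print(f"95% of weights: {low_95} to {high_95} grams")
print(f"16% of weights exceed {upper_16_cutoff} grams")
print(f"0.15% of weights are below {lower_015_cutoff} grams")
```

So approximately 95% of the valves weigh between 777 and 1953 grams, about 16% weigh more than 1659 grams, and about 0.15% weigh less than 483 grams.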
Chebyshev’s Theorem

⚪ The empirical rule applies only when data are known to be approximately normally distributed.
⚪ Chebyshev’s theorem applies to all distributions regardless of their shape and thus can be used whenever the data distribution shape is unknown or is non-normal.
⚪ The theorem states that, for any k > 1, at least 1 - 1/k^2 of the values lie within k standard deviations of the mean.

Question
In the computing industry the average age of
professional employees tends to be younger than in
many other business professions. Suppose the
average age of a professional employed by a
particular computer firm is 28 with a standard
deviation of 6 years. A histogram of professional
employee ages with this firm reveals that the data
are not normally distributed but rather are amassed
in the 20s and that few workers are over 40. Apply
Chebyshev’s theorem to determine within what
range of ages would at least 80% of the workers’
ages fall.
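A possible approach to the question above: Chebyshev's theorem guarantees that at least 1 - 1/k^2 of the values lie within k standard deviations of the mean, so setting 1 - 1/k^2 = 0.80 gives k^2 = 5 and k ≈ 2.236. The sketch below solves for k and builds the interval; the function name `chebyshev_interval` is hypothetical.

```python
import math

def chebyshev_interval(mean, std_dev, proportion):
    """Interval guaranteed to hold at least `proportion` of the data.

    Solves 1 - 1/k**2 = proportion for k, then returns
    (mean - k*std_dev, mean + k*std_dev). Requires 0 < proportion < 1.
    """
    k = math.sqrt(1 / (1 - proportion))
    return mean - k * std_dev, mean + k * std_dev

low_age, high_age = chebyshev_interval(mean=28, std_dev=6, proportion=0.80)
print(f"At least 80% of ages fall between {low_age:.2f} and {high_age:.2f}")
```

With a mean of 28 and a standard deviation of 6, the interval is 28 ± 2.236(6), roughly 14.6 to 41.4 years. The bound is conservative: it holds for any distribution, so the actual share of workers in that range may be well above 80%.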
Coefficient of Variation

⚪ The coefficient of variation is a statistic that is the ratio of the standard deviation to the mean, expressed as a percentage, and is denoted CV.

⚪ CV = (σ/μ)*(100)
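As an illustration, the CV can be computed for the two basketball teams from the opening example. This sketch makes one assumption beyond the slide: since only sample data are available, it uses the sample standard deviation and sample mean in place of σ and μ. The function name is hypothetical.

```python
import statistics

def coefficient_of_variation(data):
    """CV as a percentage: (standard deviation / mean) * 100.

    Assumption: uses the sample standard deviation (n - 1 divisor)
    and sample mean as stand-ins for sigma and mu.
    """
    return statistics.stdev(data) / statistics.mean(data) * 100

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]

print(f"CV of team A: {coefficient_of_variation(team_a):.2f}%")
print(f"CV of team B: {coefficient_of_variation(team_b):.2f}%")
```

Because the CV is unit-free, it is most useful for comparing variability between data sets measured on different scales or with very different means.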
Interquartile Range
⚪ The interquartile range is the range of values
between the first and third quartile.
⚪ Essentially, it is the range of the middle 50% of the
data and is determined by computing the value of
Q3 - Q1.
⚪ The interquartile range is especially useful in
situations where data users are more interested in
values toward the middle and less interested in
extremes.
⚪ In describing a real estate housing market, Realtors
might use the interquartile range as a measure of
housing prices when describing the middle half of
the market for buyers who are interested in houses
in the midrange.
⚪ Interquartile range = Q3 - Q1
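A minimal sketch of the Q3 - Q1 calculation, applied to the basketball heights from the opening example. Quartile conventions differ between textbooks and software; this example assumes the default "exclusive" method of `statistics.quantiles` (Python 3.8+), so other conventions may give slightly different quartiles. The function name is hypothetical.

```python
import statistics

def interquartile_range(data):
    """Return Q3 - Q1 using statistics.quantiles (default 'exclusive' method)."""
    q1, _median, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

team_a = [72, 73, 76, 76, 78]
team_b = [67, 72, 76, 76, 84]

print("IQR of team A:", interquartile_range(team_a))
print("IQR of team B:", interquartile_range(team_b))
```

Unlike the range, the IQR ignores the lowest and highest quarter of the data, so a single extreme value has little effect on it.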
