SEE5211 Chapter2 p2017
(SEE5211/SEE8212)
Chapter 2
Population characteristic
3 4 5 8 10
Example
Suppose we caught a sample of 6 fish from the lake, with lengths
3 4 5 6 8 10
With an even number of observations, the median is the average of the
two middle values: (5 + 6)/2 = 5.5
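The median calculation for the fish sample can be sketched in a few lines of Python. This is only an illustration: the function name `median_of` and the variable names are made up for this sketch and are not part of the course material.

```python
def median_of(values):
    """Middle value of the sorted data; for an even count,
    the average of the two middle values."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

lengths = [3, 4, 5, 6, 8, 10]      # the sample of 6 fish
sample_median = median_of(lengths)
print(sample_median)               # 5.5
```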
Measures of Central Tendency
Formula:
$\bar{x} = \frac{\sum x}{n}$
Example
Suppose we caught a sample of 6 fish from the lake. Find the mean
length of the fish.
3 4 5 6 8 10
$\bar{x} = \frac{3 + 4 + 5 + 6 + 8 + 10}{6} = \frac{36}{6} = 6$
Example
Now find how each observation deviates from the mean.
To balance the ruler on your finger, you would need to place your
finger at the mean of 6.
The mean is the balance point of a distribution.
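A small sketch of the balance-point idea, using the same six fish lengths: the deviations from the mean cancel out exactly. The variable names are illustrative only.

```python
lengths = [3, 4, 5, 6, 8, 10]

mean = sum(lengths) / len(lengths)            # 36 / 6 = 6.0
deviations = [x - mean for x in lengths]      # how far each fish is from the mean

print(mean)
print(deviations)
print(sum(deviations))   # negative and positive deviations cancel
```

The negative deviations (-3, -2, -1) and the positive ones (2, 4) both total 6 in size, which is why the ruler balances at the mean.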
Example
What happens to the median & mean if the length of 10 inches
was 15 inches?
3 4 5 6 8 15
What happens to the median & mean if the 15 inches was 20?
3 4 5 6 8 20
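One way to check the two questions above is to recompute both statistics for each version of the sample. This is a minimal sketch using Python's standard `statistics` module; the dictionary labels are invented for the illustration.

```python
from statistics import mean, median

samples = {
    "original (10)": [3, 4, 5, 6, 8, 10],
    "10 -> 15":      [3, 4, 5, 6, 8, 15],
    "15 -> 20":      [3, 4, 5, 6, 8, 20],
}

# (median, mean) for each version of the sample
summary = {label: (median(xs), mean(xs)) for label, xs in samples.items()}

for label, (med, avg) in summary.items():
    print(f"{label:14s} median = {med}  mean = {avg:.2f}")
```

The median stays at 5.5 in all three cases, while the mean is pulled upward each time the largest value grows.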
Median & Mean
Some statistics that are not affected by extreme values . . .
YES
YES
Example
Suppose we caught a sample of 20 fish with the following lengths.
Create a histogram for the lengths of fish.
Mean =6.5
Median =6.5
3 5 6 10 6 7 7 8 4 5
6 4 7 5 9 9 8 7 6 8
Example
Suppose we caught a sample of 20 fish with the following lengths.
Create a histogram for the lengths of fish.
Mean =6.8
Median =5.5
3 5 6 10 15 7 3 3 4 5
6 4 12 5 3 4 8 13 11 9
Example
Suppose we caught a sample of 20 fish with the following lengths.
Create a histogram for the lengths of fish
Mean =7.75
Median =8.5
3 5 6 10 10 7 10 8 9 5
6 4 9 10 9 9 10 7 10 8
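The three samples of 20 fish can be compared side by side. The sketch below recomputes the mean and median of each and reports which one is larger; as a rule of thumb, a mean above the median often goes with a longer right tail and a mean below the median with a longer left tail. The sample labels and the helper `compare` are made up for this illustration.

```python
from statistics import mean, median

samples = {
    "first":  [3, 5, 6, 10, 6, 7, 7, 8, 4, 5,
               6, 4, 7, 5, 9, 9, 8, 7, 6, 8],
    "second": [3, 5, 6, 10, 15, 7, 3, 3, 4, 5,
               6, 4, 12, 5, 3, 4, 8, 13, 11, 9],
    "third":  [3, 5, 6, 10, 10, 7, 10, 8, 9, 5,
               6, 4, 9, 10, 9, 9, 10, 7, 10, 8],
}

def compare(xs):
    """Return (mean, median, relation) for one sample."""
    avg, med = mean(xs), median(xs)
    if avg > med:
        relation = "mean > median"
    elif avg < med:
        relation = "mean < median"
    else:
        relation = "mean = median"
    return avg, med, relation

results = {name: compare(xs) for name, xs in samples.items()}
for name, (avg, med, relation) in results.items():
    print(f"{name:6s} mean = {avg:<5} median = {med:<4} {relation}")
```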
Distribution
12 14 19 20 22 24 25 26 26 50
Mean = 23.8
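With the smallest and largest values removed:
14 19 20 22 24 25 26 26
$\bar{x}_T = \frac{14 + 19 + 20 + 22 + 24 + 25 + 26 + 26}{8} = \frac{176}{8} = 22$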
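The calculation above drops the smallest and the largest observation before averaging. A minimal sketch of that idea follows; the function `trimmed_mean` and its parameter `k` (how many values to drop from each end) are names chosen for this illustration.

```python
def trimmed_mean(values, k=1):
    """Mean after dropping the k smallest and k largest observations."""
    xs = sorted(values)
    if 2 * k >= len(xs):
        raise ValueError("cannot trim away every observation")
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

data = [12, 14, 19, 20, 22, 24, 25, 26, 26, 50]

full_mean = sum(data) / len(data)     # uses all 10 values, including 50
trimmed = trimmed_mean(data, k=1)     # drops 12 and 50

print(full_mean, trimmed)
```

Removing one value from each end of a sample of 10 corresponds to trimming 10% from each side.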
What values are used to describe categorical data?
$\hat{p} = \frac{9}{15} = 0.6$
60% of the sample was satisfied with their cell phone service.
Why is the study of variability important?
[Figure: three data sets plotted on the same axis, 20 to 70]
Notice that these three data sets all have the same mean and median (at
45), but they have very different amounts of variability.
Measures of Variability
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
When calculating sample variance, we use degrees of freedom (n – 1) in
the denominator instead of n because this tends to produce better
estimates.
Example
Remember the sample of 6 fish that we caught from the lake . . .
Find the variance of the length of fish.
$\bar{x} = 6$

  x     x - \bar{x}     (x - \bar{x})^2
  3         -3                 9
  4         -2                 4
  5         -1                 1
  6          0                 0
  8          2                 4
 10          4                16
Sum          0                34

Finding the average of the deviations would always equal 0!
First square the deviations, then divide their sum by n – 1 = 5:
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{34}{5} = 6.8$
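The variance calculation for the six fish can be reproduced step by step. This sketch follows the sample variance formula with n – 1 in the denominator and cross-checks the result against `statistics.variance` from Python's standard library; the variable names are illustrative.

```python
from statistics import variance

lengths = [3, 4, 5, 6, 8, 10]
n = len(lengths)

x_bar = sum(lengths) / n                            # 6.0
squared_devs = [(x - x_bar) ** 2 for x in lengths]  # 9, 4, 1, 0, 4, 16
s2 = sum(squared_devs) / (n - 1)                    # 34 / 5

print(sum(squared_devs), s2)
print(variance(lengths))   # library result, also uses n - 1
```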
Measures of Variability
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Population standard deviation is denoted by σ (where n
is used in the denominator).
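To see the effect of the two denominators, the sketch below computes both versions for the six fish lengths. Treating the sample as if it were a whole population is an assumption made purely for comparison; `stdev` and `pstdev` are the standard-library functions for the n – 1 and n versions respectively.

```python
from math import sqrt
from statistics import pstdev, stdev

lengths = [3, 4, 5, 6, 8, 10]
n = len(lengths)
x_bar = sum(lengths) / n
sum_sq = sum((x - x_bar) ** 2 for x in lengths)   # 34

s = sqrt(sum_sq / (n - 1))   # sample standard deviation (divide by n - 1)
sigma = sqrt(sum_sq / n)     # population formula (divide by n)

print(round(s, 3), round(sigma, 3))
```

Dividing by n always gives the smaller of the two values, and the gap shrinks as n grows.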
Measures of Variability
Interquartile range (iqr) is the range of the middle half of the
data.
Lower quartile (Q1) is the median of the lower half of the data
Upper quartile (Q3) is the median of the upper half of the data
iqr = Q3 – Q1
What advantage does the interquartile range have over the standard
deviation?
iqr = 30 – 24 = 6
Boxplots
• ease of construction
• convenient handling of outliers
• construction is not subjective (like histograms)
• used with medium or large size data sets (n > 10)
• useful for comparative displays
Boxplots
When to use: univariate numerical data
How to describe: comment on the center, spread, and shape of the
distribution, and on any unusual features
Example
Remember the data on the percentage of the population with a bachelor’s or
higher degree in 2007 for each of the 50 states and the District of Columbia.
17 19 19 20 20 21 22 22 22 23
23 23 24 24 24 24 25 25 25 25
25 26 26 26 26 26 26 27 27 27
27 27 28 29 29 29 30 30 30 30
31 32 33 34 34 34 35 35 35 38
47
[Figure: plot of the data above; axis: Percentages, 10 to 50]
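The quartiles and interquartile range for this data set can be computed directly from the definitions (median of the lower half, median of the upper half). The sketch below assumes that, when n is odd, the overall median is left out of both halves; that convention and the function name `quartiles` are assumptions of this illustration, and other software may use slightly different rules.

```python
from statistics import median

percent = [
    17, 19, 19, 20, 20, 21, 22, 22, 22, 23,
    23, 23, 24, 24, 24, 24, 25, 25, 25, 25,
    25, 26, 26, 26, 26, 26, 26, 27, 27, 27,
    27, 27, 28, 29, 29, 29, 30, 30, 30, 30,
    31, 32, 33, 34, 34, 34, 35, 35, 35, 38,
    47,
]

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves.
    For odd n the middle observation belongs to neither half."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[-half:]
    return median(lower), median(upper)

q1, q3 = quartiles(percent)
iqr = q3 - q1
print(median(percent), q1, q3, iqr)
```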
Modified boxplots
To display outliers:
• Identify mild & extreme outliers
An observation is an outlier if it is more than 1.5(iqr) away
from the nearest quartile.
An outlier is extreme if it is more than 3(iqr) away from the
nearest quartile.
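The two outlier rules can be written as a small classifier. The sketch below uses Q1 = 24 and Q3 = 30 (so iqr = 6) from the education data; the function name `classify` and the label strings are made up for this illustration.

```python
def classify(x, q1, q3):
    """Label x by its distance from the nearest quartile."""
    iqr = q3 - q1
    if x < q1:
        distance = q1 - x
    elif x > q3:
        distance = x - q3
    else:
        distance = 0          # inside the box
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"

q1, q3 = 24, 30               # quartiles of the education data
for value in (17, 38, 47):
    print(value, classify(value, q1, q3))
```

With these quartiles, anything below 15 or above 39 is an outlier, and anything below 6 or above 48 is extreme, so the largest observation (47) is a mild outlier.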
[Figures: symmetrical boxplots; approximately symmetrical boxplot;
skewed boxplot]
The 2009-2010 salaries of NBA players were used to construct
the comparative boxplot of salary data for five teams.
Empirical Rule
Z-score
A z-score tells us how many standard
deviations the value is from the mean.
$z\text{-score} = \frac{\text{value} - \text{mean}}{\text{standard deviation}}$
$z = \frac{62 - 56}{3.5} = 1.714$
$z = \frac{69 - 65}{2.8} = 1.429$
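The two z-scores above can be checked with a one-line function. The name `z_score` and its parameter names are chosen for this sketch only.

```python
def z_score(value, mean, sd):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / sd

z1 = z_score(62, 56, 3.5)
z2 = z_score(69, 65, 2.8)

print(round(z1, 3), round(z2, 3))
```

Although 69 is the larger raw value, 62 has the larger z-score, so it sits further above its own mean in standard-deviation units.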
Percentiles
The rth percentile is a value in the data set where r percent of the
observations fall AT or BELOW that value.
Example
In addition to weight and length, head circumference is another
measure of health in newborn babies. The National Center for
Health Statistics reports the following summary values for head
circumference (in cm) at birth for boys.
Head circumference (cm)   32.2   33.2   34.5   35.8   37.0   38.2   38.6
Percentile                   5     10     25     50     75     90     95
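The summary table can be read as a lookup from percentile to head circumference. The sketch below stores it as a dictionary and pulls out the median and the middle half of the distribution; the variable names are invented for this illustration, and only the tabulated percentiles are available.

```python
# Percentile -> head circumference at birth for boys (cm), from the table above
head_circumference_cm = {
    5: 32.2, 10: 33.2, 25: 34.5, 50: 35.8, 75: 37.0, 90: 38.2, 95: 38.6,
}

median_cm = head_circumference_cm[50]    # 50% of boys are at or below this
lower_q = head_circumference_cm[25]      # 25th percentile
upper_q = head_circumference_cm[75]      # 75th percentile
iqr_cm = upper_q - lower_q               # width of the middle half

print(median_cm, lower_q, upper_q, iqr_cm)
```

For example, 25% of newborn boys have a head circumference at or below 34.5 cm, and about half fall between 34.5 cm and 37.0 cm.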