
Lesson 2. Descriptive statistics: measures of central tendency and measures of variability
Penina Olga, dr. in medicine
USMF “Nicolae Testemițanu”
Outline
• 1. Measures of central tendency
• 2. Measures of variability
• 3. Normal distribution and empirical rule
• 4. Five summary statistics and boxplot
Descriptive statistics:
• Measures of Central Tendency
• Measures of Variability

Descriptive statistics merely describe, organize, or summarize data.
1. Measures of central tendency
Measures of central tendency
• An entire distribution can be characterized by one typical value that represents all the observations; such values are called measures of central tendency
• Measures of central tendency represent the centre of a distribution of observations (they tell us where the data tend to be clustered). These measures include:
• the mode 𝑀𝑜
• the median 𝑀𝑑
• the mean X̄ (called “X-bar”)
Mode ( 𝑀𝑜 )
• The mode is the observed value that occurs with the greatest frequency
• It is found by simple inspection of the frequency distribution (it is easy to see
on a frequency polygon as the highest point on the curve)
• Unimodal distribution : the distribution has one mode
• If two scores both occur with the greatest frequency, the distribution is
bimodal
• If more than two scores occur with the greatest frequency, the distribution is
multimodal
• Some data sets have no mode because each value occurs only once (a uniform frequency distribution).
• The mode is insensitive to extreme scores (outliers) in a distribution
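A minimal Python sketch of these definitions, using the standard-library function `statistics.multimode` (Python 3.8+), which returns every value tied for the greatest frequency. The helper name `describe_modality` and the small data sets are illustrative assumptions, not part of the lecture.

```python
from statistics import multimode

def describe_modality(scores):
    """Return (modes, label) following the definitions above."""
    if len(set(scores)) == len(scores):
        # every value occurs only once, so no value is more frequent than the others
        return [], "no mode (each value occurs only once)"
    modes = multimode(scores)  # all values tied for the greatest frequency
    if len(modes) == 1:
        label = "unimodal"
    elif len(modes) == 2:
        label = "bimodal"
    else:
        label = "multimodal"
    return modes, label

print(describe_modality([2, 3, 3, 5, 7]))     # ([3], 'unimodal')
print(describe_modality([1, 2, 2, 4, 4, 6]))  # ([2, 4], 'bimodal')
print(describe_modality([2, 3, 3, 5, 700]))   # ([3], 'unimodal'): the outlier does not change the mode
```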
Median (𝑴𝒅 )
• The median is the number that divides the frequency distribution in
half when all the scores are listed in order (from lowest to highest)
• The median is the score in the distribution that marks the 50th
percentile (C50). That is, 50% of the scores in the distribution fall
above the median and 50% fall below it
• When a distribution has an odd number (5, 7, 9, etc.) of elements, the median is the middle one
• When a distribution has an even number (6, 8, 10, etc.) of elements, the median lies halfway between the two middle scores (i.e., it is the average, or mean, of the two middle scores)
Median (𝑴𝒅 ) (cont’n)
• For an odd number of scores, the median is the score at position (n + 1)/2, where n is the number of scores:
6, 9, 15, 17, 24: 𝑀𝑑 is 15 (position (5 + 1)/2 = 3)
• For an even number of scores, the median is the average of the scores at positions n/2 and (n/2) + 1:
6, 9, 15, 17, 24, 29: 𝑀𝑑 is 16 (the average of 15 and 17)
• The median is insensitive to extreme scores (outliers) in a distribution. In other words, the median is robust to outliers.
For the distribution 6, 9, 15, 17, 24, 500: 𝑀𝑑 is still 16
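The positional rule above can be sketched in Python as follows; `median_by_position` is a hypothetical helper written for illustration, and its results are compared with the standard-library `statistics.median`.

```python
from statistics import median

def median_by_position(scores):
    """Median via the positional rule: (n + 1)/2 for odd n, mean of n/2 and n/2 + 1 for even n."""
    ordered = sorted(scores)  # list the scores from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # odd n: the score at 1-based position (n + 1) / 2
        return ordered[(n + 1) // 2 - 1]
    # even n: the average of the scores at 1-based positions n/2 and n/2 + 1
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

for data in ([6, 9, 15, 17, 24], [6, 9, 15, 17, 24, 29], [6, 9, 15, 17, 24, 500]):
    print(median_by_position(data), median(data))
# 15 15
# 16.0 16.0
# 16.0 16.0
```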
Mean (X̄)
• The mean, or average, is the sum of all the individual scores divided by the number of scores (cases) in the distribution
• The mean depends on every individual score in a distribution and, as a result, it is very sensitive to extreme scores, unlike the mode and the median
• The mean is symbolized by μ in a population and by X̄ (“x-bar”) in a sample
For a sample: $\bar{X} = \frac{\sum X_i}{n}$
where
$\bar{X}$: the sample mean
$X_i$: an individual score in the distribution
$n$: the number of cases in the sample
$\sum$: “the sum of”
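A short Python sketch of the formula, assuming the six scores from the median example; it shows how one extreme score moves the mean a long way while the median stays at 16. The helper name `sample_mean` is illustrative.

```python
from statistics import median

def sample_mean(scores):
    """X-bar = (sum of X_i) / n."""
    if not scores:
        raise ValueError("mean of empty data")
    return sum(scores) / len(scores)

original = [6, 9, 15, 17, 24, 29]
with_outlier = [6, 9, 15, 17, 24, 500]  # the last score replaced by an extreme value

print(round(sample_mean(original), 2), median(original))          # 16.67 16.0
print(round(sample_mean(with_outlier), 2), median(with_outlier))  # 95.17 16.0
```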
Calculation of the Mean (X̄)
Subject ID   Shock index
1            0.61
2            0.56
3            0.52
4            0.33
5            0.45
6            0.74
7            0.73
8            0.92
9            0.42
10           0.63
11           0.55
12           0.50
13           0.75
14           0.82
15           1.30
16           1.29
17           0.85
18           0.44

$\bar{X} = \frac{\sum X_i}{n} = \frac{0.61 + 0.56 + \dots + 0.44}{18} = \frac{12.41}{18} = 0.689$
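The same calculation in Python, using the 18 shock-index values from the table; `statistics.fmean` (Python 3.8+) is used only as a cross-check.

```python
from statistics import fmean

# Shock index values for subjects 1-18, copied from the table above
shock_index = [0.61, 0.56, 0.52, 0.33, 0.45, 0.74, 0.73, 0.92, 0.42,
               0.63, 0.55, 0.50, 0.75, 0.82, 1.30, 1.29, 0.85, 0.44]

n = len(shock_index)      # 18 subjects
total = sum(shock_index)  # sum of X_i (12.41 up to floating-point error)
x_bar = total / n

print(n, round(total, 2), round(x_bar, 3))  # 18 12.41 0.689
print(round(fmean(shock_index), 3))         # 0.689
```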
Calculation of the Weighted mean (X̄)
Shock index (interval)   Mid-value (Xi)   Frequency (f)
0.30 up to 0.40          0.35             38
0.40 up to 0.50          0.45             104
0.50 up to 0.60          0.55             198
0.60 up to 0.70          0.65             199
0.70 up to 0.80          0.75             155
0.80 up to 0.90          0.85             102
0.90 up to 1.00          0.95             60
1.00 up to 1.10          1.05             37
1.10 up to 1.20          1.15             19
1.20 up to 1.30          1.25             19
$n = \sum f = 931$

$\bar{X} = \frac{\sum X_i \times f}{n} = \frac{0.35 \times 38 + 0.45 \times 104 + \dots + 1.25 \times 19}{931} = \frac{642.75}{931} = 0.690$

Mid-value: $X_i = \frac{\text{upper} + \text{lower}}{2}$

If the frequency distribution table has intervals, it is necessary to use the mid-value of each interval. It is calculated by adding the upper and lower boundaries of the interval and dividing the result by two.
Weighted mean (X̄)
• If the original scores of a distribution are not available, the weighted mean can be estimated from a frequency table:
$\bar{X} = \frac{\sum X_i \times f}{n}$
where
$\bar{X}$: the weighted mean of a sample
$X_i$: the value (or interval mid-value) representing a group of scores
$f$: the frequency of that group of scores
$n$: the number of cases in the sample ($n = \sum f$)
$\sum$: “the sum of”
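A minimal sketch of the weighted-mean calculation for the shock-index frequency table, with each interval replaced by its mid-value; `weighted_mean` is a hypothetical helper name.

```python
# (lower, upper, frequency) for each shock-index interval in the frequency table
intervals = [
    (0.30, 0.40, 38), (0.40, 0.50, 104), (0.50, 0.60, 198), (0.60, 0.70, 199),
    (0.70, 0.80, 155), (0.80, 0.90, 102), (0.90, 1.00, 60), (1.00, 1.10, 37),
    (1.10, 1.20, 19), (1.20, 1.30, 19),
]

def weighted_mean(groups):
    """X-bar = sum(X_i * f) / n, where X_i is the interval mid-value and n = sum(f)."""
    n = sum(f for _, _, f in groups)
    weighted_sum = sum((lower + upper) / 2 * f for lower, upper, f in groups)
    return weighted_sum, n, weighted_sum / n

weighted_sum, n, x_bar = weighted_mean(intervals)
print(f"{weighted_sum:.2f} {n} {x_bar:.3f}")  # 642.75 931 0.690
```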
The relative location of the mean, median and
mode in a unimodal distribution
Unimodal distribution: a distribution of scores with one mode
The relationship among the three measures of central tendency depends on the shape of the distribution:
• Symmetric distribution (normal)
• Skewed distribution (left-skewed and right-skewed)
Symmetric distribution
In a unimodal symmetrical distribution (like the normal distribution), all three measures of central tendency are identical:
𝑀𝑜 = 𝑀𝑑 = X̄
Skewed distributions
• The mode and median are insensitive (robust) to extreme scores (outliers) in a distribution, while the mean is very sensitive (not robust) to extreme scores
• As a result, the mean in a skewed distribution is pulled in the direction of the tail
Positively (right) skewed: 𝑀𝑜 < 𝑀𝑑 < X̄
Negatively (left) skewed: X̄ < 𝑀𝑑 < 𝑀𝑜
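To illustrate the ordering, the sketch below uses a small hypothetical data set (not from the lecture) with a long right tail, plus its mirror image with a long left tail; the standard-library `statistics` functions compute the three measures.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed scores: most values are small, with a long tail to the right
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9]
# Mirror image of the same scores: a long tail to the left
left_skewed = [10 - x for x in right_skewed]

for label, data in [("right-skewed", right_skewed), ("left-skewed", left_skewed)]:
    print(label, mode(data), median(data), round(mean(data), 2))
# right-skewed 2 3 3.44
# left-skewed 8 7 6.56
```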


Using measures of central tendency
Two factors are important for the correct practical application of
measures of central tendency:
1. the scale of measurement: nominal, ordinal or numerical (interval, ratio)
2. the shape of the distribution: symmetrical (normal) or skewed

• The mean is used for numerical data with symmetric (not skewed, normal) distributions
• The median is used for ordinal data, or for numerical data if the distribution is skewed
• The mode is used primarily for nominal data and for bimodal distributions

          Nominal   Ordinal   Interval   Ratio
Mode      +         +         +          +
Median              +         +          +
Mean                          +          +
Location of the mean
A distribution has many more scores above the mean than below the mean. What can be said about this distribution?
1. The distribution is positively skewed.
2. The distribution is negatively skewed.
3. The distribution is symmetric.
2. Measures of variability
Although distributions A and B have identical measures of central tendency, they differ in terms of their variability, that is, the extent to which their scores are clustered together or scattered about. The scores forming distribution A are clearly more scattered than those forming distribution B.
Four measures of variability
• Range
• Variance
• Standard deviation
• Coefficient of variation
Range
The range is the difference between the largest score (the maximum
value) and the smallest score (the minimum value) of a distribution

In the distribution 6, 9, 15, 17, 20 the range is (20 - 6) = 14

In the distribution 6, 9, 15, 17, 200 the range is (200 - 6) = 194

The range is sensitive to extreme scores (outliers) in a distribution
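A one-line Python sketch of the range, applied to the two distributions above; the function is named `value_range` (a hypothetical name) to avoid shadowing Python's built-in `range`.

```python
def value_range(scores):
    """Range = largest score (maximum) - smallest score (minimum)."""
    return max(scores) - min(scores)

print(value_range([6, 9, 15, 17, 20]))   # 14
print(value_range([6, 9, 15, 17, 200]))  # 194: a single extreme score inflates the range
```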


Variance (s²)
1. Calculate the deviation score for each observation $X_i$: deviation score $= X_i - \bar{X}$. For example, if $X_i = 12$ and $\bar{X} = 10$, the deviation score is 12 − 10 = 2.
2. Square each of these deviation scores (squared deviation): $(X_i - \bar{X})^2$. Squaring eliminates the minus signs; otherwise the deviations would cancel out, because $\sum (X_i - \bar{X}) = 0$.
3. Sum the squared deviations (sum of squares): $\sum (X_i - \bar{X})^2$.
4. Divide the sum of squares by the number of cases minus 1 (n − 1) in a sample.
The result is the variance: the average squared deviation from the mean (computed with n − 1 rather than n in a sample).
Example: 68, 69, 74, 76, 79, 87, 88, 90, 93; n = 9
$\text{Mean } (\bar{X}) = \frac{68 + 69 + 74 + 76 + 79 + 87 + 88 + 90 + 93}{9} = 80.4$
$\text{Variance } (s^2) = \frac{(68 - 80.4)^2 + (69 - 80.4)^2 + \dots + (93 - 80.4)^2}{9 - 1} = 87.3$
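The four steps can be written out in Python as a sketch; `sample_variance` is a hypothetical helper, and the standard-library `statistics.variance`, which also divides by n − 1, serves as a cross-check.

```python
from statistics import variance

def sample_variance(scores):
    """Sample variance following steps 1-4 above."""
    n = len(scores)
    if n < 2:
        raise ValueError("sample variance needs at least two scores")
    x_bar = sum(scores) / n
    deviations = [x - x_bar for x in scores]  # step 1: X_i - X-bar
    squared = [d ** 2 for d in deviations]    # step 2: (X_i - X-bar)^2
    sum_of_squares = sum(squared)             # step 3: sum of squares
    return sum_of_squares / (n - 1)           # step 4: divide by n - 1

scores = [68, 69, 74, 76, 79, 87, 88, 90, 93]
print(round(sum(scores) / len(scores), 1))  # 80.4
print(round(sample_variance(scores), 1))    # 87.3
print(round(variance(scores), 1))           # 87.3
```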
Variance (s²)
For ungrouped data: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$
For grouped data: $s^2 = \frac{\sum (X_i - \bar{X})^2 \times f}{n - 1}$
where
$\bar{X}$: the sample mean
$X_i$: an individual score in the distribution
$f$: the frequency of a score (grouped data)
$n$: the number of cases in the sample
$\sum$: “the sum of”
Variance is the sum of the squared deviations divided by the number of cases minus one.
Standard deviation (s)
$s = \sqrt{s^2}$
where $s^2$ is the variance
The standard deviation measures the typical deviation of the individual scores in the distribution from the mean of the distribution; unlike the variance, it is expressed in the same units as the scores.
For the nine scores in the variance example above ($\bar{X} = 80.4$, $s^2 = 87.3$):
$\text{Standard deviation } (s) = \sqrt{87.3} = 9.3$
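A minimal Python check of the standard deviation for the same nine scores: take the square root of the sample variance and compare it with `statistics.stdev`.

```python
from math import sqrt
from statistics import stdev, variance

scores = [68, 69, 74, 76, 79, 87, 88, 90, 93]
s2 = variance(scores)  # sample variance (n - 1 in the denominator)
s = sqrt(s2)           # standard deviation = square root of the variance

print(round(s2, 1), round(s, 1))       # 87.3 9.3
print(abs(s - stdev(scores)) < 1e-9)   # True
```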


Example (grouped data):
Xi     Frequency (f)   Xi × f   Xi − X̄   (Xi − X̄)²   (Xi − X̄)² × f
24     2               48       −5.9     34.7        69.5
26     4               104      −3.9     15.2        60.7
30     3               90       0.1      0.0         0.0
31     5               155      1.1      1.2         6.1
33     2               66       3.1      9.6         19.3
35     3               105      5.1      26.1        78.2
SUM    19              568                           233.79

$\bar{X} = \frac{\sum X_i \times f}{n} = \frac{568}{19} = 29.9$
$s^2 = \frac{\sum (X_i - \bar{X})^2 \times f}{n - 1} = \frac{233.79}{19 - 1} = 13.0$
$s = \sqrt{13} = 3.6$
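A sketch of the grouped-data formulas applied to the frequency table; `grouped_mean_variance` is a hypothetical helper. It works with the unrounded mean (568/19), so the intermediate deviations differ slightly from the rounded values in the table, but the totals agree.

```python
from math import sqrt

# (X_i, f) pairs from the frequency table
table = [(24, 2), (26, 4), (30, 3), (31, 5), (33, 2), (35, 3)]

def grouped_mean_variance(groups):
    """Mean and sample variance for grouped data: s^2 = sum((X_i - X-bar)^2 * f) / (n - 1)."""
    n = sum(f for _, f in groups)                      # n = sum of frequencies
    x_bar = sum(x * f for x, f in groups) / n
    ss = sum((x - x_bar) ** 2 * f for x, f in groups)  # frequency-weighted sum of squares
    return n, x_bar, ss, ss / (n - 1)

n, x_bar, ss, s2 = grouped_mean_variance(table)
print(n, round(x_bar, 1), round(ss, 2), round(s2, 1), round(sqrt(s2), 1))
# 19 29.9 233.79 13.0 3.6
```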
Coefficient of variation (CV)
$\text{Coefficient of variation} = \frac{\text{Standard deviation}}{\text{Mean}} \times 100$
$\text{Coefficient of variation} = \frac{s}{\bar{X}} \times 100$
Coefficient of variation:
< 10%: Low
10-35%: Medium
> 35%: High
Example. Compare the variability in systolic blood pressure and shock index for a sample of 200 patients.
Systolic blood pressure: X̄ = 138, s = 26, $CV = \frac{26}{138} \times 100 = 18.8\%$
Shock index: X̄ = 0.69, s = 0.2, $CV = \frac{0.2}{0.69} \times 100 = 29.0\%$
In this case, comparing the standard deviations makes no sense because shock index and systolic BP are measured on very different scales. The coefficient of variation (CV) measures relative variation, that is, variation relative to the size of the mean.
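A minimal sketch of the coefficient of variation and the low/medium/high classification, applied to the blood pressure and shock index example. The helper names are hypothetical, and counting exactly 10% and exactly 35% as medium is an assumption, since the thresholds above do not say which class the boundaries belong to.

```python
def coefficient_of_variation(sd, mean):
    """CV (%) = s / X-bar * 100."""
    return sd / mean * 100

def cv_category(cv):
    """Thresholds from the lecture: < 10% low, 10-35% medium, > 35% high (boundaries assumed medium)."""
    if cv < 10:
        return "low"
    if cv <= 35:
        return "medium"
    return "high"

for name, mean, sd in [("systolic BP", 138, 26), ("shock index", 0.69, 0.2)]:
    cv = coefficient_of_variation(sd, mean)
    print(f"{name}: CV = {cv:.1f}% ({cv_category(cv)})")
# systolic BP: CV = 18.8% (medium)
# shock index: CV = 29.0% (medium)
```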

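For the grouped data in the frequency table above:
$\bar{X} = 29.9 \qquad s^2 = \frac{233.79}{19 - 1} = 13.0 \qquad s = \sqrt{13} = 3.6$
$\text{Coefficient of variation} = \frac{s}{\bar{X}} \times 100 = \frac{3.6}{29.9} \times 100 = 12.0\%$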
3. Normal distribution. Empirical rule
Three main characteristics of the normal distribution
1. Symmetrical: half the scores lie above the mean and half below (bell-shaped form)
2. Unimodal: one mode
3. X̄ = 𝑀𝑑 = 𝑀𝑜 (mean = median = mode)
Empirical rule
Applies only to normal distributions
• Approximately 68% of the distribution falls within ±1 standard deviation of the mean.
• Approximately 95% of the distribution falls within ±2 standard deviations of the mean.
• Approximately 99.7% of the distribution falls within ±3 standard deviations of the mean.
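The percentages of the empirical rule can be checked with the standard-library `statistics.NormalDist` class (Python 3.8+) by taking differences of its cumulative distribution function; `proportion_within` is a hypothetical helper.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k, dist=standard_normal):
    """Proportion of a normal distribution lying within +/- k standard deviations of the mean."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    print(k, f"{proportion_within(k):.4f}")
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Because the result depends only on k, the same proportions hold for any normal distribution, whatever its mean and standard deviation.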
[Figure: empirical rule. 68%, 95% and 99.7% of the distribution fall within ±1, ±2 and ±3 standard deviations of the mean; horizontal axis marked X̄ − 3s, X̄ − 2s, X̄ − 1s, X̄, X̄ + 1s, X̄ + 2s, X̄ + 3s]
Example. The mean resting heart rate in a sample (for example, 200 students):
X̄ = 70 beats/min
s = 10 beats/min
• About 68% of students have a resting heart rate between 60 and 80 beats/min
• About 95% of students have a resting heart rate between 50 and 90 beats/min
• About 99.7% of students have a resting heart rate between 40 and 100 beats/min
[Figure: the normal distribution of resting heart rate in a hypothetical sample]
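The intervals in the heart-rate example follow from X̄ ± k·s for k = 1, 2, 3; a small sketch with a hypothetical helper `empirical_intervals`:

```python
def empirical_intervals(mean, sd):
    """Intervals X-bar +/- k*s for k = 1, 2, 3 with the empirical-rule percentages."""
    rule = {1: 68, 2: 95, 3: 99.7}
    return [(pct, mean - k * sd, mean + k * sd) for k, pct in rule.items()]

for pct, low, high in empirical_intervals(70, 10):  # resting heart rate example
    print(f"about {pct}% between {low} and {high} beats/min")
# about 68% between 60 and 80 beats/min
# about 95% between 50 and 90 beats/min
# about 99.7% between 40 and 100 beats/min
```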
4. Five summary statistics and boxplot
Five summary statistics
The five summary statistics are the minimum, Q1, Q2 (the median), Q3 and the maximum.
Range = 93 − 68 = 25
Q1: first quartile (marks the 25th percentile)
Q2: second quartile, or the median (marks the 50th percentile)
Q3: third quartile (marks the 75th percentile)
Interquartile range (IQR)
IQR = Q3 − Q1 = 88 − 74 = 14

IQR is not sensitive to extreme scores in a distribution


Box plot : five summary statistics
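A sketch of the five summary statistics, assuming (as the range 93 − 68 and the quartiles 74 and 88 suggest) that this slide uses the nine scores from the variance example. Quartile conventions differ between textbooks and software; `method="inclusive"` of the standard-library `statistics.quantiles` is chosen here because it reproduces Q1 = 74 and Q3 = 88, while the default `"exclusive"` method gives 71.5 and 89.0. The helper name `five_number_summary` is hypothetical.

```python
from statistics import quantiles

def five_number_summary(scores):
    """Minimum, Q1, median (Q2), Q3 and maximum (quartiles via the 'inclusive' method)."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return min(scores), q1, q2, q3, max(scores)

scores = [68, 69, 74, 76, 79, 87, 88, 90, 93]
minimum, q1, q2, q3, maximum = five_number_summary(scores)
print(minimum, q1, q2, q3, maximum)  # 68 74.0 79.0 88.0 93
print("range =", maximum - minimum)  # range = 25
print("IQR =", q3 - q1)              # IQR = 14.0
```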
Summary
• Descriptive statistics:
• Measures of Central Tendency (Mode, Median, Mean)
• Measures of Variability (Range, Variance, Standard Deviation, Coefficient of Variation)
• Location of Measures of Central Tendency in a unimodal distribution
• Normal distribution and empirical rule
• Five summary statistics and boxplot. Q1, Q2, Q3, IQR
