IL2-Describing Variation in Data
Distribution of age among diabetic patients in the polyclinic

Measures of central tendency
• Summarize the data set with a single value
• Mean, median and mode
• The mean is the average value of all the data in the set.
• The median is the value that has exactly half the data above it and half below it.
• The mode is the value that occurs most frequently in the set (rarely used).
Example: Systolic blood pressure
130, 145, 150, 160, 165
Mean: (130+145+150+160+165)/5 = 150
Median: 150
130, 145, 150, 160, 165, 170
Mean: (130+145+150+160+165+170)/6 = 153.3
Median: (150+160)/2 = 155

Advantages and Disadvantages
• Mean
  - Widely used, easy to understand, measures central location
  - Overly sensitive to extreme values
• Median
  - Insensitive to very large or very small values
  - Determined by the middle points and less sensitive to the actual numerical values of the other data points
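The systolic blood pressure example can be checked with a short Python sketch using the standard `statistics` module. The extra reading of 300 is a hypothetical value, not from the slides, added only to illustrate how an extreme value moves the mean much more than the median.

```python
from statistics import mean, median

# Systolic blood pressure readings from the slide example.
sbp_five = [130, 145, 150, 160, 165]
sbp_six = sbp_five + [170]

mean_five = mean(sbp_five)      # (130+145+150+160+165)/5
median_five = median(sbp_five)  # middle value of an odd-sized set
mean_six = mean(sbp_six)        # (130+...+170)/6
median_six = median(sbp_six)    # average of the two middle values

# Hypothetical extreme reading (assumption, for illustration only).
with_outlier = sbp_five + [300]
mean_outlier = mean(with_outlier)
median_outlier = median(with_outlier)

print("5 values: mean", mean_five, "median", median_five)
print("6 values: mean", round(mean_six, 1), "median", median_six)
print("with outlier: mean", mean_outlier, "median", median_outlier)
```

With the hypothetical outlier the mean rises by 25 units while the median rises by only 5, which is the sensitivity the slide describes.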
Mean = median

A normal distribution
• Bell-curve or bell-shaped histogram.

[Figure: histogram; axis label: Sex-partners]
Range
• Range = difference between the highest and lowest observed values
• Greatly influenced by the presence of just one unusually large or small value (outlier).
• Can be expressed as an interval such as 3-8, or as an interval width, as a range of 5.

Median and quartiles
• The median (Q2) divides the data into two equal sets.
• The lower quartile (Q1) is the value where 25% of the values are smaller than Q1 and 75% are larger.
• The upper quartile (Q3) is the value where 75% of the values are smaller than Q3 and 25% are larger.
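As a rough illustration of the range and the quartiles, the sketch below reuses the six systolic blood pressure values from the earlier example. Quartiles can be computed under several conventions; the `method="inclusive"` setting of `statistics.quantiles` is an assumption chosen here, and other software may return slightly different Q1 and Q3 values.

```python
from statistics import quantiles

# Illustrative data: the six systolic blood pressure values from the earlier example.
values = sorted([130, 145, 150, 160, 165, 170])

# Range expressed as an interval and as an interval width.
lowest, highest = min(values), max(values)
value_range = highest - lowest

# Quartiles: cut points that split the sorted data into four equal parts.
# "inclusive" treats the data as including its own minimum and maximum
# (an assumed convention; other methods exist).
q1, q2, q3 = quantiles(values, n=4, method="inclusive")

print(f"Range: {lowest}-{highest} (width {value_range})")
print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
```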
Graphic illustrations
• Box-plots
• Error-bars
[Figure: box plot with the upper quartile and lower quartile labelled]

Percentile rank
• Divide all values into 100 parts (percentiles)
• The percentile rank of a score is the proportion of values in a distribution that the score is greater than or equal to.
• E.g. if you received a score of 75 on a math test and this score was greater than or equal to the scores of 85% of the students taking the test, then your percentile rank would be 85 (85th percentile).
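The percentile rank definition can be sketched as a small Python function. The function name `percentile_rank` and the list of 20 test scores are hypothetical, made up so that a score of 75 lands on the 85th percentile as in the slide's example.

```python
def percentile_rank(scores, score):
    """Percentage of values in `scores` that `score` is greater than or equal to."""
    if not scores:
        raise ValueError("scores must not be empty")
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical math test scores for 20 students (invented for illustration).
class_scores = [40, 45, 50, 52, 55, 58, 60, 62, 65, 66,
                68, 70, 71, 72, 73, 74, 75, 80, 88, 95]

rank = percentile_rank(class_scores, 75)
print(f"A score of 75 has percentile rank {rank:.0f}")
```

Here 17 of the 20 scores are at or below 75, so the function returns 85.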
Degree of freedom
• The number of values that can be altered without affecting the mean, once it is known.
• E.g. 80, 85, 90, 105, X. If the mean is 95, then X = 115. Hence only 4 of the 5 values can be chosen freely; the fifth is fixed by the requirement that the mean is 95.

Standard Deviation
• Standard deviation (s) = square root of the variance (gives back the original scale)
• Properties of standard deviation
  - measures spread or dispersion around the mean of a data set.
  - never negative.
  - sensitive to outliers.
  - for data with approximately the same mean, the greater the spread, the greater the standard deviation.
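The degree-of-freedom example and the link between variance and standard deviation can be worked through in a short Python sketch. It assumes the sample variance with an n-1 denominator, which is what `statistics.variance` and `statistics.stdev` compute; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Degree-of-freedom example: four values are chosen freely,
# the fifth (X) is then fixed by the known mean.
known = [80, 85, 90, 105]
n = 5
target_mean = 95
x = n * target_mean - sum(known)

data = known + [x]

# Sample variance by hand: sum of squared deviations divided by n - 1.
m = mean(data)
manual_variance = sum((v - m) ** 2 for v in data) / (n - 1)

# Standard deviation is the square root of the variance,
# which brings the measure back to the original scale.
s = sqrt(manual_variance)

print("X =", x)
print("variance =", manual_variance, "(library:", variance(data), ")")
print("standard deviation =", round(s, 2), "(library:", round(stdev(data), 2), ")")
```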
• E.g. 47,000 babies born in a hospital
• 1,000 babies sampled, 1,000 weights obtained
• Mean = 3.25 kg, SD = 0.3 kg
About 95% of the 1,000 babies have weights within 3.25 +/- (2 x 0.3) kg.
That is, about 95% of the 1,000 babies weigh between 2.65 and 3.85 kg.
About 2.5% weigh less than 2.65 kg and about 2.5% weigh more than 3.85 kg.
[Figure: histogram; y-axis: Counts; x-axis: 2.0 to 4.0]
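The arithmetic behind the mean +/- 2 SD interval can be reproduced with the sketch below. The check with `statistics.NormalDist` assumes the birth weights are approximately normally distributed, an assumption the slide does not state explicitly; under that assumption the interval covers slightly more than 95% of values.

```python
from statistics import NormalDist

mean_kg = 3.25   # sample mean from the slide
sd_kg = 0.3      # sample standard deviation from the slide
n_babies = 1000  # sample size from the slide

# Mean +/- 2 standard deviations.
lower = round(mean_kg - 2 * sd_kg, 2)
upper = round(mean_kg + 2 * sd_kg, 2)

# Coverage of that interval under an assumed normal distribution.
weights = NormalDist(mu=mean_kg, sigma=sd_kg)
coverage = weights.cdf(upper) - weights.cdf(lower)

# Rough count in each tail using the slide's 2.5% figure.
per_tail = 0.025 * n_babies

print(f"Interval: {lower} to {upper} kg")
print(f"Coverage under normality: {coverage:.3f}")
print(f"About {per_tail:.0f} babies expected in each tail")
```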