
IL2-Describing Variation in Data

This document defines and describes various measurement variables and methods for presenting continuous data. It discusses variables that can be measured along a numerical continuum, such as height, weight, and blood pressure. It also covers topics like presenting continuous data through histograms and descriptive statistics, measures of central tendency like mean, median and mode, measures of spread such as range, quartiles and interquartile range, and the normal distribution. Graphic illustrations of concepts like box plots and distributions are also mentioned.

Uploaded by

Vanessa Hermione

Describing Variation in Data

A/P Koh Woon Puay
MBBS, PhD
Email: ephkwp@nus.edu.sg
EPH office, MD3, Level 3
Tel: 6516 4975

Measurement variables (continuous variables)

• Variables with an infinite number of values that are equally spaced
• Can be measured along a numerical continuum
• Eg: height, weight, temperature, blood pressure

Long ordinal data

• Ordinal data which are graded on a long scale, especially if numerically represented, may sometimes be treated as continuous data
• Eg: depression or anxiety on a scale of 1 to 10
• But not true continuous data because
  • They have a finite number of distinct values
  • There are gaps in the continuum
  • Spacing between categories is not numerically equivalent (this may limit the interpretation of the results in analysis)

Presenting continuous data

• Graphically: histogram
• Descriptive:
  • Summarize data with a single value (measure of central tendency)
  • Measure absolute spread (dispersion)

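As a rough sketch of how a histogram groups continuous data into class intervals, the snippet below bins a small set of ages into 10-year classes. The ages, the `BIN_WIDTH` value and the `bin_start` helper are all hypothetical, made up for illustration, and are not taken from the slides.

```python
from collections import Counter

# Hypothetical ages, invented for illustration only (not data from the slides).
ages = [34, 41, 45, 47, 52, 55, 56, 58, 61, 63, 64, 67, 68, 72, 79]

BIN_WIDTH = 10  # assumed class width in years


def bin_start(value, width=BIN_WIDTH):
    """Return the lower edge of the class interval that contains value."""
    return (value // width) * width


# Count how many observations fall into each 10-year class.
counts = Counter(bin_start(age) for age in ages)

# Print a text histogram, one row per class interval.
for start in sorted(counts):
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * counts[start]}")
```

A different class width would change the shape of the histogram, so the width is a choice made by the analyst rather than a property of the data.
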
[Figure: histogram, "Distribution of age among diabetic patients in the polyclinic"]

Measures of central tendency

• Summarizes the set with a single value
• Mean, median and mode
• The mean is the average value of all the data in the set.
• The median is the value that has exactly half the data above it and half below it.
• The mode is the value that occurs most frequently in the set (rarely used)
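
The three definitions above can be tried out with Python's standard `statistics` module. The values below are hypothetical and chosen only so that each measure is easy to check by hand.

```python
import statistics

# Hypothetical values, invented for illustration only.
values = [2, 3, 3, 5, 7]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle value of the sorted data
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```

Here the mean is 20/5 = 4, while the median and the mode are both 3.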

Example: Systolic blood pressure

130, 145, 150, 160, 165
Mean: (130+145+150+160+165)/5 = 150
Median: 150

130, 145, 150, 160, 165, 170
Mean: (130+145+150+160+165+170)/6 = 153.3
Median: (150+160)/2 = 155

Advantages and Disadvantages

• Mean
  • Widely used, easy to understand, measures central location
  • Overly sensitive to extreme values
• Median
  • Insensitive to very large or very small values
  • Determined by the middle points and less sensitive to the actual numerical values of the other data points

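The systolic blood pressure example can be reproduced as a small sketch. The two data sets are from the slide; the third list, `with_outlier`, is a hypothetical variant (170 replaced by an assumed extreme reading of 250) added only to show how the mean reacts to an extreme value while the median does not.

```python
import statistics

odd_set = [130, 145, 150, 160, 165]
even_set = [130, 145, 150, 160, 165, 170]

# Odd count: the median is the single middle value.
print(statistics.mean(odd_set), statistics.median(odd_set))

# Even count: the median is the average of the two middle values.
print(round(statistics.mean(even_set), 1), statistics.median(even_set))

# Hypothetical outlier (assumption, not from the slide): 170 becomes 250.
with_outlier = [130, 145, 150, 160, 165, 250]

mean_shift = statistics.mean(with_outlier) - statistics.mean(even_set)
median_shift = statistics.median(with_outlier) - statistics.median(even_set)
print(round(mean_shift, 1), median_shift)
```

In this sketch the mean moves by about 13 units while the median stays at 155.
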
[Figure: three distribution curves labelled "Mean = median", "Mean > median" and "Mean < median"]

A normal distribution

• Bell-curve or bell-shaped histogram.
• Most of the values accumulate around the middle. The mean, median and mode are all equal, and the scores at either end of the distribution occur less often.

Skewness

• Skewed to right: if the scores tend to cluster toward the lower end of the scale
• Skewed to left: most of the scores tend to occur toward the upper end of the scale while increasingly fewer scores occur toward the lower end.

[Figure: two histograms, x-axis "Sex-partners"]

Measures of spread of continuous or measurement data

• Range
• Quartiles
• Variance and standard deviation

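Comparing the mean with the median gives a quick, rough hint about the direction of skew. The sketch below is only a heuristic, not a formal skewness measure; the `skew_hint` function and all three data sets are hypothetical.

```python
import statistics


def skew_hint(values):
    """Rough hint of skew direction from comparing mean and median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "right"  # a long upper tail pulls the mean above the median
    if mean < median:
        return "left"   # a long lower tail pulls the mean below the median
    return "none"       # mean equals median, no skew indicated


# Hypothetical data sets, invented for illustration only.
clustered_low = [0, 1, 1, 1, 2, 2, 3, 10]
clustered_high = [1, 8, 9, 9, 9, 10, 10]
balanced = [1, 2, 3, 4, 5]

print(skew_hint(clustered_low), skew_hint(clustered_high), skew_hint(balanced))
```
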
Range

• Range = difference between highest and lowest observed values
• Greatly influenced by the presence of just one unusually large or small value (outlier).
• Can be expressed as an interval such as 3-8, or as an interval width, as a range of 5.

Median and quartiles

• The median (Q2) divides the data into two equal sets.
• The lower quartile (Q1) is the value where 25% of the values are smaller than Q1 and 75% are larger.
• The upper quartile (Q3) is the value where 75% of the values are smaller than Q3 and 25% are larger.

Example 1: Upper and lower quartiles

• Data:
  • 8, 49, 51, 17, 45, 43, 9, 41, 45, 43, 38
• Ordered data
  • 8, 9, 17, 38, 41, 43, 43, 45, 45, 49, 51
  • Lower quartile: 17
  • Median: 43
  • Upper quartile: 45

Interquartile Range

• Interquartile range = difference between upper quartile (Q3) and lower quartile (Q1)
• Interquartile range spans the middle 50% of a data set, so it is largely unaffected by outliers

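The quartile example can be checked with a short sketch. It assumes the default "exclusive" method of `statistics.quantiles`, which for these 11 values picks the 3rd, 6th and 9th ordered values; other quartile conventions can give slightly different answers on other data sets.

```python
import statistics

data = [8, 49, 51, 17, 45, 43, 9, 41, 45, 43, 38]

ordered = sorted(data)

# Range: difference between the highest and lowest observed values.
data_range = max(data) - min(data)

# Quartiles with the default "exclusive" method (an assumption of this sketch).
q1, q2, q3 = statistics.quantiles(data, n=4)

# Interquartile range: upper quartile minus lower quartile.
iqr = q3 - q1

print(ordered)
print(data_range, q1, q2, q3, iqr)
```

For this data set the range is 51 - 8 = 43 and the interquartile range is 45 - 17 = 28.
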
Graphic illustrations

• Box-plots
• Error-bars

[Figure: box plot with the upper quartile and lower quartile labelled]

Percentile rank

• Divide all values into 100 parts (percentiles)
• The proportion of values in a distribution that a specific score is greater than or equal to.
• Eg. if you received a score of 75 on a math test and this score was greater than or equal to the scores of 85% of the students taking the test, then your percentile rank would be 85 (85th percentile)
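
A minimal sketch of the percentile rank definition above. The `percentile_rank` function and the 20 test scores are hypothetical, constructed so that a score of 75 is greater than or equal to 17 of the 20 scores.

```python
def percentile_rank(score, scores):
    """Percentage of values in scores that the given score is >= to."""
    at_or_below = sum(1 for s in scores if s <= score)
    return 100 * at_or_below / len(scores)


# Hypothetical class of 20 test scores, invented for illustration only.
scores = [40, 45, 50, 52, 55, 58, 60, 62, 64, 65,
          67, 68, 70, 71, 72, 74, 75, 80, 88, 95]

rank = percentile_rank(75, scores)
print(rank)
```

With these scores, 17 out of 20 is 85%, which matches the 85th percentile in the example.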

Advantages and disadvantages

• Range
  • Very easy to compute
  • Very sensitive to extreme observations
  • Poor indication of distribution of points in between
• Quartiles
  • Less sensitive to outliers
  • Some of the observations are not used

Variance

• Variance combines all the values in a data set to produce a measure of spread.
• The variance (symbolized by s^2) is the sum of the squared deviations from the mean, divided by the number of observations minus 1 (degrees of freedom)

  s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}

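The variance definition can be written out step by step. This sketch reuses the five systolic blood pressure values from the earlier example; `sample_variance` is a hypothetical helper name, and the result is compared against `statistics.variance`, which also divides by n - 1.

```python
import statistics


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


sbp = [130, 145, 150, 160, 165]

# Deviations from the mean of 150 are -20, -5, 0, 10 and 15.
# Their squares sum to 750, and 750 / 4 = 187.5.
print(sample_variance(sbp))
print(statistics.variance(sbp))
```
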
Degree of freedom

• The number of values that can be altered without affecting the mean, once it is known.
• Eg. 80, 85, 90, 105, X
  If mean is 95, X = 115. Hence only 4 out of 5 values can be changed to get back mean = 95.
• With n observations and a known mean, the degrees of freedom are n - 1.

Standard Deviation

• Standard deviation (s) = square root of the variance (gives back the original scale)
• Properties of standard deviation
  • measures spread or dispersion around the mean of a data set.
  • never negative.
  • sensitive to outliers.
  • for data with approximately the same mean, the greater the spread, the greater the standard deviation.
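
The degree of freedom example can be worked through in a few lines: once the mean is fixed, the total is fixed, so the last value follows from the other four. The sketch also takes the standard deviation of the completed data set with `statistics.stdev`, which uses the n - 1 denominator. Variable names are illustrative only.

```python
import math
import statistics

known = [80, 85, 90, 105]
target_mean = 95
n = len(known) + 1  # five values in total, one of them unknown

# A fixed mean fixes the total (95 * 5 = 475), so X = 475 - 360.
x = target_mean * n - sum(known)

data = known + [x]

# Standard deviation: square root of the variance (n - 1 denominator).
sd = statistics.stdev(data)

print(x, statistics.mean(data), round(sd, 2))
```

Here X comes out as 115 and the standard deviation is the square root of 212.5, about 14.58.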

Normal distribution

• Many kinds of physiological data are approximated well by the normal distribution.
• Many statistical tests assume a normal distribution.
• Most of these tests work well even if the distribution is only approximately normal and in many cases as long as it does not deviate greatly from normality.

More about a normal distribution

• If the mean and standard deviation of a normal distribution are known, it is relatively easy to figure out the percentile rank.
• In a normal distribution, about 68% of the scores are within one standard deviation of the mean, about 95% of the scores are within two standard deviations, and about 99.7% of the scores are within three standard deviations

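The coverage within one, two and three standard deviations can be computed from the normal cumulative distribution function. This sketch uses `statistics.NormalDist` from the Python standard library (3.8 or later); `coverage` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def coverage(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {coverage(k):.4f}")
```

The computed proportions are roughly 0.683, 0.954 and 0.997.
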
• Eg. 47,000 babies born in a hospital
  • 1,000 babies sampled, 1,000 weights obtained
  • Mean = 3.25 kg, SD = 0.3 kg
  • About 95% of the 1,000 babies lie within 3.25 +/- (2 x 0.3) kg, that is, between 2.65 and 3.85 kg.
  • About 2.5% weigh less than 2.65 kg and about 2.5% weigh more than 3.85 kg.

[Figure: histogram of the sampled weights, x-axis from 2.0 to 4.0, y-axis "Counts"]
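
The birth weight example can be sketched with the same `statistics.NormalDist` class, under the assumption that the weights follow an exactly normal distribution with the stated mean and SD. The 3.55 kg baby at the end is a hypothetical case added to show a percentile rank calculation.

```python
from statistics import NormalDist

mean_kg = 3.25
sd_kg = 0.3

# Assumption: weights are exactly normal with this mean and SD.
weights = NormalDist(mu=mean_kg, sigma=sd_kg)

# Interval of mean +/- 2 SD.
lower = mean_kg - 2 * sd_kg
upper = mean_kg + 2 * sd_kg

within = weights.cdf(upper) - weights.cdf(lower)
below = weights.cdf(lower)
above = 1 - weights.cdf(upper)

# Hypothetical baby weighing 3.55 kg (one SD above the mean).
rank = 100 * weights.cdf(3.55)

print(round(lower, 2), round(upper, 2))
print(round(within, 3), round(below, 3), round(above, 3), round(rank))
```

Under this assumption each tail beyond 2 SD holds about 2.3% of babies, which the slide rounds to 2.5%; exactly 2.5% per tail corresponds to about 1.96 SD.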
