Chapter 3, Numerical Descriptive Measures: - Data Analysis Is
Measures
• Data analysis is objective
– Should report the summary measures that best
meet the assumptions about the data set
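• Arithmetic mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
• Geometric mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
• Geometric mean rate of return: $\bar{R}_G = [(1 + R_1) \times (1 + R_2) \times \cdots \times (1 + R_n)]^{1/n} - 1$
• Median position: $\dfrac{n + 1}{2}$ position in the ordered data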
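As a rough illustration of the measures of central tendency, the following Python sketch computes the arithmetic mean, geometric mean, geometric mean rate of return, and median position. The data values and period returns are hypothetical, chosen only to keep the arithmetic easy to follow.

```python
import math
import statistics

# Hypothetical sample data, for illustration only.
values = [2, 4, 4, 5, 10]
n = len(values)

# Arithmetic mean: (X1 + X2 + ... + Xn) / n
arithmetic_mean = sum(values) / n

# Geometric mean: (X1 * X2 * ... * Xn) ** (1/n)
geometric_mean = math.prod(values) ** (1 / n)

# Median: the value at position (n + 1) / 2 in the ordered data
# (1-based position; a whole number here because n is odd).
ordered = sorted(values)
median_position = (n + 1) / 2
median = statistics.median(values)

# Geometric mean rate of return over hypothetical period returns R1..Rn.
returns = [0.10, -0.05, 0.20]
growth = math.prod(1 + r for r in returns)
geometric_mean_return = growth ** (1 / len(returns)) - 1

print(arithmetic_mean, geometric_mean, median_position, median)
print(geometric_mean_return)
```

When n is even, the median position falls halfway between two ordered values, and the median is taken as the average of those two values.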
Same center,
different variation
Range and Interquartile Range
• Range
– Simplest measure of variation
– Difference between the largest and the smallest observations:
Range = Xlargest – Xsmallest
– Ignores the way in which data are distributed
– Sensitive to outliers
• Interquartile Range
– Eliminate some high- and low-valued observations and calculate
the range from the remaining values
– Interquartile range = 3rd quartile – 1st quartile
= Q3 – Q1
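A minimal Python sketch of both measures follows. It assumes the quartile convention used by `statistics.quantiles` (its default "exclusive" method); other textbooks and software use slightly different quartile rules, so results can differ on small samples. The data sets are hypothetical.

```python
import statistics

def value_range(values):
    """Range = largest observation - smallest observation."""
    return max(values) - min(values)

def interquartile_range(values):
    """IQR = Q3 - Q1, using the default quartile rule of statistics.quantiles."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data: the second set replaces the largest value with an outlier.
data = [1, 2, 3, 4, 5, 6, 7]
with_outlier = [1, 2, 3, 4, 5, 6, 70]

print(value_range(data), interquartile_range(data))
print(value_range(with_outlier), interquartile_range(with_outlier))
```

In this example the outlier changes the range from 6 to 69 while the interquartile range is unchanged, which illustrates why the range is described as sensitive to outliers.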
Variance
• Average (approximately) of squared
deviations of values from the mean
– Sample variance: $S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$
Where $\bar{X}$ = arithmetic mean
n = sample size
$X_i$ = ith value of the variable X
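The sample variance formula can be sketched step by step in Python. The helper name `sample_variance` and the data are hypothetical; the standard library's `statistics.variance` is used only as a cross-check.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

# Hypothetical sample: mean is 5, squared deviations sum to 36.
values = [2, 4, 4, 5, 10]
s2 = sample_variance(values)

print(s2)
# Cross-check against the standard library implementation.
print(math.isclose(s2, statistics.variance(values)))
```

Dividing by n - 1 rather than n is why the variance is only "approximately" the average of the squared deviations.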
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data
• It is a measure of the “average” spread around the mean
• Sample standard deviation: $S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
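As a short sketch, the sample standard deviation is the square root of the sample variance, which is what returns it to the units of the original data. The data values are hypothetical.

```python
import math
import statistics

# Hypothetical sample (say, measured in minutes).
values = [2, 4, 4, 5, 10]

n = len(values)
mean = sum(values) / n

# Sample variance is in squared units (minutes squared).
variance = sum((x - mean) ** 2 for x in values) / (n - 1)

# Taking the square root gives a spread in the original units (minutes).
std_dev = math.sqrt(variance)

print(std_dev)
# Cross-check against the standard library implementation.
print(math.isclose(std_dev, statistics.stdev(values)))
```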
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of data
measured in different units
$CV = \left( \dfrac{S}{\bar{X}} \right) \cdot 100\%$
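To illustrate comparing data measured in different units, this sketch computes the coefficient of variation for two hypothetical data sets (heights in centimeters and weights in kilograms). The function name and the numbers are assumptions made for the example.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (S / mean) * 100, expressed as a percentage."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Hypothetical samples in different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

cv_height = coefficient_of_variation(heights_cm)
cv_weight = coefficient_of_variation(weights_kg)

print(f"Height CV: {cv_height:.2f}%")
print(f"Weight CV: {cv_weight:.2f}%")
```

Both samples have the same standard deviation in their own units, but the weights vary more relative to their mean, so their coefficient of variation is larger.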
Shape of a Distribution
• Describes how data are distributed
• Measures of shape
– Symmetric or skewed
– Left-skewed: Mean < Median
– Symmetric: Mean = Median
– Right-skewed: Median < Mean
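The mean-versus-median comparison can be sketched as a simple check in Python. This is a rough heuristic only, not a formal test of skewness; the function name `describe_shape`, its `tolerance` parameter, and the data are hypothetical.

```python
import statistics

def describe_shape(values, tolerance=1e-9):
    """Classify shape by comparing the mean with the median."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if abs(mean - median) <= tolerance:
        return "symmetric"
    if mean < median:
        return "left-skewed"
    return "right-skewed"

# Hypothetical data sets.
print(describe_shape([1, 10, 11, 11, 12]))  # mean 9, median 11
print(describe_shape([1, 2, 3, 4, 5]))      # mean 3, median 3
print(describe_shape([1, 2, 2, 3, 12]))     # mean 4, median 2
```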
Using the Five-Number Summary to
Explore the Shape
• Box-and-Whisker Plot: A Graphical display of data using
5-number summary:
Minimum, Q1, Median, Q3, Maximum
[Figure: three box-and-whisker plots, each marking Q1, Q2, and Q3]
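A minimal sketch of the five-number summary in Python follows. It assumes the default quartile rule of `statistics.quantiles`; other quartile conventions may give slightly different values. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, q2, q3 = statistics.quantiles(values, n=4)
    return (min(values), q1, q2, q3, max(values))

# Hypothetical data with a long upper tail.
data = [1, 2, 3, 4, 5, 8, 20]
minimum, q1, median, q3, maximum = five_number_summary(data)

print(minimum, q1, median, q3, maximum)
# Compare the two halves of the box and the two whiskers.
print("upper box longer:", (q3 - median) > (median - q1))
print("upper whisker longer:", (maximum - q3) > (q1 - minimum))
```

Here both the upper half of the box and the upper whisker are longer than their lower counterparts, which is the pattern a box-and-whisker plot shows for right-skewed data.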
Relationship between Std. Dev. and
Shape: The Empirical Rule
• Population mean: $\mu = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{X_1 + X_2 + \cdots + X_N}{N}$
• Population variance: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
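The population formulas differ from the sample formulas only in the divisor (N instead of n - 1). A small sketch, treating a hypothetical data set as the entire population:

```python
import math
import statistics

# Hypothetical data, treated here as a complete population.
population = [2, 4, 4, 5, 10]
N = len(population)

# Population mean: sum of all values divided by N.
mu = sum(population) / N

# Population variance: squared deviations from mu, divided by N.
sigma_squared = sum((x - mu) ** 2 for x in population) / N

print(mu, sigma_squared)
# Cross-check against the standard library implementation.
print(math.isclose(sigma_squared, statistics.pvariance(population)))
```

For the same five values, dividing by N gives 7.2, whereas the sample formula with n - 1 would give 9.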
Covariance and Coefficient of
Correlation
• The sample covariance measures the strength of the
linear relationship between two variables (called
bivariate data)
• The sample covariance: $\operatorname{cov}(X, Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$
• Only concerned with the strength of the relationship
• No causal effect is implied
• Covariance between two random variables:
– cov(X,Y) > 0: X and Y tend to move in the same direction
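The sample covariance can be sketched directly from its definition. The function name and the paired data are hypothetical, and the calculation is written out by hand rather than relying on a library routine.

```python
import math

def sample_covariance(xs, ys):
    """Sum of (Xi - mean_x)(Yi - mean_y), divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(cross_products) / (n - 1)

# Hypothetical bivariate data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

cov_xy = sample_covariance(x, y)
print(cov_xy)
```

The result is positive for this data, consistent with X and Y tending to move in the same direction. Its magnitude depends on the units of X and Y, so it is hard to interpret on its own.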
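• Coefficient of Correlation: $r = \dfrac{\operatorname{cov}(X, Y)}{S_X S_Y} = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \, \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}$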
– Is unit free
– Ranges between –1 (perfect negative) and 1 (perfect
positive)
– The closer to –1, the stronger the negative linear
relationship
– The closer to 1, the stronger the positive linear
relationship
– The closer to 0, the weaker the linear
relationship
– At 0 there is no linear relationship (a nonlinear relationship may still exist)
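A brief sketch of the coefficient of correlation, computed from sums of deviations. The function name and both data sets are hypothetical. The second data set is included to show that r measures only linear association: Y is fully determined by X there, yet r comes out as 0.

```python
import math

def correlation(xs, ys):
    """r = sum of cross-deviations / sqrt(sum sq. dev. of X * sum sq. dev. of Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data with a positive linear tendency.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r_linear = correlation(x, y)

# Hypothetical data where Y = X squared: related, but not linearly.
u = [-2, -1, 0, 1, 2]
v = [value ** 2 for value in u]
r_curved = correlation(u, v)

print(round(r_linear, 4))
print(r_curved)
```

Because the (n - 1) divisors in the covariance and in the two standard deviations cancel, the function works with raw sums of deviations; this is also why r is unit free.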
Correlation vs. Regression
• A scatter plot (or scatter diagram) can be used
to show the relationship between two
variables
• Correlation analysis is used to measure the
strength of the association (linear
relationship) between two variables
– Correlation is only concerned with strength of the
relationship
– No causal effect is implied with correlation