Lecture 1 Correlation
Correlation
(syllabus chapter 2)
Bjorn Winkens
Methodology and Statistics
University of Maastricht
Bjorn.Winkens@stat.unimaas.nl
11 April 2008
Methodology and Statistics | University of Maastricht Bjorn Winkens 2008
Content
Covariance and correlation
Pearson correlation coefficient
Tests and confidence interval for correlations
Spearman correlation
Pitfalls
Association
Study goal = examine the association between two variables
Some questions arise:
What measure of association should we use?
Is there a positive or negative association?
Is there a linear association?
Is there a significant association?
Covariance (1)
= measure of how much two random variables vary together
difference with variance?
formula:
$\mathrm{cov}(X, Y) = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Covariance (2)
Example: X = height, Y = weight
Positive or negative covariance?
$\bar{x}$ = 181 cm, $\bar{y}$ = 76.5 kg
Cov(X,Y) = 35.0
Positive association
Strong or weak?
[Figure: scatter plot of Weight (kg) against Height (cm)]
X* = height in meters:
Cov(X*,Y) = 0.35
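The size of a covariance depends on the units of measurement, which is why it cannot tell us whether an association is strong or weak. A minimal Python sketch of this point follows. It uses the sample covariance with the n - 1 denominator. The five height/weight pairs are hypothetical values chosen for illustration, not the lecture's data.

```python
def sample_covariance(x, y):
    """Sample covariance with the n - 1 denominator."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

# Hypothetical data (not from the lecture): height in cm, weight in kg.
height_cm = [165.0, 172.0, 181.0, 188.0, 194.0]
weight_kg = [62.0, 80.0, 71.0, 90.0, 84.0]

cov_cm = sample_covariance(height_cm, weight_kg)

# Same people, with height expressed in meters instead of centimeters.
height_m = [h / 100 for h in height_cm]
cov_m = sample_covariance(height_m, weight_kg)

print(f"Cov(height in cm, weight) = {cov_cm:.2f}")
print(f"Cov(height in m,  weight) = {cov_m:.4f}")
print(f"ratio = {cov_cm / cov_m:.1f}")  # the covariance scales with the unit
```

Dividing height by 100 divides the covariance by 100, just as 35.0 becomes 0.35 in the example.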
Correlation (1)
= measure of linear association between two random variables
Notation:
population: ρ (rho)
sample: r
Can take any value from -1 to 1
Closer to -1: stronger negative association
Closer to +1: stronger positive association
Correlation (2)
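Pearson's correlation coefficient
$r = \frac{\mathrm{cov}(X, Y)}{s_X s_Y} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}}$
No dimension
Invariant under linear transformations with positive slope (e.g., a change of units)
Example (X, X* = height (cm; m), Y = weight (kg)):
Corr(X,Y) = r = 0.38
Corr(X*,Y) = r = 0.38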
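Unlike the covariance, r does not change when a variable is rescaled. A minimal Python sketch of this point follows. It computes Pearson's r directly from the sums of squares and cross-products. The data are hypothetical, not the lecture's. Multiplying a variable by a negative constant flips the sign of r but leaves its magnitude unchanged.

```python
import math

def pearson_r(x, y):
    """Pearson's r: cross-products divided by the root of the product of sums of squares."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data (not from the lecture): height in cm, weight in kg.
height_cm = [165.0, 172.0, 181.0, 188.0, 194.0]
weight_kg = [62.0, 80.0, 71.0, 90.0, 84.0]
height_m = [h / 100 for h in height_cm]

r_cm = pearson_r(height_cm, weight_kg)
r_m = pearson_r(height_m, weight_kg)                       # change of units
r_flipped = pearson_r([-h for h in height_cm], weight_kg)  # negative scaling

print(f"r (cm) = {r_cm:.3f}, r (m) = {r_m:.3f}, r (negated height) = {r_flipped:.3f}")
```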
[Figure: two panels, each with axis label FEV (l)]
Correlation r = 0.0
[Figure: plot with axis label Caffeine]
No association?
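One way r = 0 can arise despite a clear relationship is a non-linear pattern, since r only measures linear association. A minimal Python sketch with hypothetical data follows. Here y is an exact quadratic function of x. This is an illustration only and does not reproduce the caffeine data.

```python
import math

def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is fully determined by x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v * v for v in x]

r = pearson_r(x, y)
print(f"r = {r:.3f}")  # no linear association, although y depends on x
```

A scatter plot is therefore needed before reading r = 0 as "no association".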
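Test statistic for H0: ρ = 0:
$t = r \sqrt{\frac{n - 2}{1 - r^2}}$, which follows a t distribution with n - 2 degrees of freedom under H0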
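The t statistic for testing H0: ρ = 0 is t = r·sqrt((n - 2)/(1 - r²)) with n - 2 degrees of freedom. A minimal Python sketch using only the standard library follows. The values r = 0.25 and n = 100 are hypothetical, chosen so that the result lines up with a t98 distribution. The p-value is obtained by numerically integrating the t density with Simpson's rule, which is one possible approach and not necessarily what statistical software does.

```python
import math

def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)) for testing H0: rho = 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p_value(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_density(0.0, df) + t_density(upper, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    central_half = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * central_half

r, n = 0.25, 100  # hypothetical values, not taken from the lecture data
t = t_statistic(r, n)
p = two_sided_p_value(t, n - 2)
print(f"t = {t:.2f}, df = {n - 2}, two-sided p = {p:.3f}")
```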
Conclusion?
Conclusion?
[Figure: t distribution with 98 degrees of freedom (t98), with -2.56 and 2.56 marked]
Be aware!
Significance depends on sample size:
n      Significant (α = 0.05) if |r| ≥
10     0.63
20     0.44
50     0.28
100    0.20
200    0.14
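These thresholds can be recovered by solving t = r·sqrt((n - 2)/(1 - r²)) for r, which gives r = t/sqrt(n - 2 + t²). A minimal Python sketch follows. The two-sided 5% critical values of t are standard table values, hard-coded here for the relevant degrees of freedom. The function name `critical_r` is an illustrative choice.

```python
import math

# Two-sided 5% critical values of Student's t (standard table values),
# keyed by degrees of freedom df = n - 2.
T_CRIT_975 = {8: 2.306, 18: 2.101, 48: 2.011, 98: 1.984, 198: 1.972}

def critical_r(n):
    """Smallest |r| that is significant at alpha = 0.05 (two-sided) for sample size n."""
    df = n - 2
    t_crit = T_CRIT_975[df]
    # Invert t = r * sqrt(df / (1 - r^2)) to get r = t / sqrt(df + t^2).
    return t_crit / math.sqrt(df + t_crit ** 2)

for n in (10, 20, 50, 100, 200):
    print(f"n = {n:3d}: significant if |r| >= {critical_r(n):.2f}")
```

With large samples even a weak correlation becomes statistically significant, so significance alone says little about the strength of the association.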
[Figure: plot with axis label Birthweight]
$z = \frac{1}{2} \ln\left(\frac{1 + r}{1 - r}\right)$
ln = natural logarithm (base e ≈ 2.718)
Fisher's z-transformation
r = -0.8: z = -1.1
r = 0.05: z = 0.05
r = 0.8: z = 1.1
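Fisher's z-transformation, z = ½ ln((1 + r)/(1 - r)), can be checked against these example values. A minimal Python sketch follows. The function name `fisher_z` is an illustrative choice. The transformation is the same as the inverse hyperbolic tangent, `math.atanh`.

```python
import math

def fisher_z(r):
    """Fisher's z-transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

for r in (-0.8, 0.05, 0.8):
    print(f"r = {r:5.2f}  ->  z = {fisher_z(r):5.2f}")
```

For small r, z is almost equal to r. Near -1 or +1 the transformation stretches the scale.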
$z_0 = \frac{1}{2} \ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right)$
Test statistic: $(z - z_0)\sqrt{n - 3} \sim N(0, 1)$
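The test of H0: ρ = ρ0 compares (z - z0)·sqrt(n - 3) with the standard normal distribution. A minimal Python sketch using the standard library's `statistics.NormalDist` follows. The inputs r = 0.5, n = 50 and ρ0 = 0.3 are hypothetical, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's z-transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def z_test_correlation(r, n, rho0):
    """Test H0: rho = rho0 using (z - z0) * sqrt(n - 3) ~ N(0, 1)."""
    statistic = (fisher_z(r) - fisher_z(rho0)) * math.sqrt(n - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))  # two-sided
    return statistic, p_value

# Hypothetical example: sample correlation 0.5 from n = 50, tested against rho0 = 0.3.
statistic, p_value = z_test_correlation(r=0.5, n=50, rho0=0.3)
print(f"test statistic = {statistic:.2f}, two-sided p = {p_value:.3f}")
```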
$\rho_1 = \frac{e^{2 z_1} - 1}{e^{2 z_1} + 1}, \quad \rho_2 = \frac{e^{2 z_2} - 1}{e^{2 z_2} + 1}$
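The back-transformation (e^(2z) - 1)/(e^(2z) + 1) converts limits on the z scale into limits for ρ. A minimal Python sketch of a confidence interval follows. It assumes the usual construction z ± z_(1-α/2)/sqrt(n - 3) for the limits z1 and z2. That construction is an assumption here, since the text only gives the back-transformation. The inputs r = 0.5 and n = 50 are hypothetical.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's z-transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def inverse_fisher_z(z):
    """Back-transformation: rho = (e^(2z) - 1) / (e^(2z) + 1)."""
    e2z = math.exp(2 * z)
    return (e2z - 1) / (e2z + 1)

def correlation_confidence_interval(r, n, level=0.95):
    """Confidence interval for rho, built on the z scale and back-transformed."""
    z = fisher_z(r)
    # Assumed construction: normal quantile times the standard error 1 / sqrt(n - 3).
    half_width = NormalDist().inv_cdf(0.5 + level / 2) / math.sqrt(n - 3)
    z1, z2 = z - half_width, z + half_width
    return inverse_fisher_z(z1), inverse_fisher_z(z2)

# Hypothetical example: r = 0.5 observed in a sample of n = 50.
lower, upper = correlation_confidence_interval(0.5, 50)
print(f"95% CI for rho: ({lower:.3f}, {upper:.3f})")
```

The interval is symmetric around z but not around r, because the back-transformation is non-linear.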
Conclusion?
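Test statistic:
$\frac{z_1 - z_2}{\sqrt{\frac{1}{n_1 - 3} + \frac{1}{n_2 - 3}}}$
Fisher's transformation:
$z_i = \frac{1}{2} \ln\left(\frac{1 + r_i}{1 - r_i}\right)$, i = 1, 2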
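The comparison of two correlations can be sketched in Python as follows. The sketch assumes two independent samples and compares the test statistic with the standard normal distribution. The inputs (r1 = 0.6 with n1 = 60, r2 = 0.3 with n2 = 80) are hypothetical, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def fisher_z(r):
    """Fisher's z-transformation: z = 0.5 * ln((1 + r) / (1 - r))."""
    return 0.5 * math.log((1 + r) / (1 - r))

def compare_correlations(r1, n1, r2, n2):
    """Test H0: rho1 = rho2 for two independent samples."""
    z1, z2 = fisher_z(r1), fisher_z(r2)
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    statistic = (z1 - z2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(statistic)))  # two-sided
    return statistic, p_value

# Hypothetical example: two groups with different sample correlations.
statistic, p_value = compare_correlations(r1=0.6, n1=60, r2=0.3, n2=80)
print(f"test statistic = {statistic:.2f}, two-sided p = {p_value:.3f}")
```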
[Figure: two plots of Frequency against Outcome; Shapiro-Wilk: p = 0.961 and p = 0.039]
t-test for Spearman rank correlation:
t = 3.45, df = 24 - 2 = 22
p-value < 0.01
Conclusion? Remarks?
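The Spearman rank correlation can be computed as Pearson's r applied to the ranks, followed by the same t statistic with n - 2 degrees of freedom. A minimal Python sketch follows. Tied values receive the average of their ranks, which is a common convention assumed here. The data are hypothetical and the function names are illustrative.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_r(x, y):
    """Spearman rank correlation: Pearson's r of the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data (not from the lecture).
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]

r_s = spearman_r(x, y)
n = len(x)
t = r_s * math.sqrt((n - 2) / (1 - r_s ** 2))
print(f"r_s = {r_s:.3f}, t = {t:.2f}, df = {n - 2}")
```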
Pitfalls
Spurious correlations
No measurement of agreement
Change scores (Y - X) always related to baseline X (regression to the mean)
Dependent pairs of observations (x_i, y_i)
Note:
No mathematical problem
Interpretation is incorrect
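Two of these pitfalls can be illustrated with a minimal Python sketch. Both data sets are hypothetical. The first shows r = 1 for two measurement methods that never agree. The second simulates baseline and follow-up measurements with independent measurement error and no real change, an assumption made for this illustration, and shows that the change score still correlates negatively with the baseline.

```python
import math
import random

def pearson_r(x, y):
    """Pearson's r computed from sums of squares and cross-products."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Pitfall: correlation is not agreement.
# Hypothetical methods A and B: B is always 2 * A + 10, so they never agree.
method_a = [50.0, 60.0, 70.0, 80.0, 90.0]
method_b = [2 * a + 10 for a in method_a]
r_agreement = pearson_r(method_a, method_b)
mean_difference = sum(b - a for a, b in zip(method_a, method_b)) / len(method_a)
print(f"r = {r_agreement:.2f}, mean difference B - A = {mean_difference:.1f}")

# Pitfall: change scores are related to baseline.
# Simulated data: same true value at both times, independent measurement error.
random.seed(1)
true_value = [random.gauss(100, 10) for _ in range(1000)]
baseline = [t + random.gauss(0, 10) for t in true_value]
follow_up = [t + random.gauss(0, 10) for t in true_value]
change = [y - x for x, y in zip(baseline, follow_up)]
r_change = pearson_r(change, baseline)
print(f"corr(change, baseline) = {r_change:.2f}")  # negative, although nothing changed
```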
[Figure: two panels with axes GRADE and STUDY DURATION]
Prediction
Regression analysis
QUESTIONS?