Mhsagujar@neu Edu PH
Correlation Analysis
Is IQ related to educational attainment?
I think savings is related to weight.
• A correlation is a relationship or
association between two variables.
• A correlation coefficient is a
numerical measure of the linear
relationship between two variables.
• Some variables are related in such a
way that if you know one of them, the
other can be estimated. Think of the
relationship suggested by this boxed
question.
What do you think is the relationship
between the number of hours spent in
studying (variable 1) and the grades
received (variable 2)?
[Figure: scatter plot, r = 1 (perfect positive correlation)]
A direct or positive relationship between
two variables implies that an increase in
value of one of the variables corresponds
to an increase in value of the other variable.
[Figure: scatter plot, 0 < r < 1 (some positive correlation). Example: IQ and academic grades]
• In reality, you will seldom find variables
with perfect positive correlation.
More often, you will come across variables
with only some degree of positive
relationship.
• In a perfect positive correlation, all the
points can be contained in a straight line
that rises upward to the right. Now
what do you notice in the “some positive
correlation”? Can the points be contained in one
straight line? If not, describe the general
direction of the points.
What do you think is the relationship
between the number of absences in class
(variable 1) and the grades received
(variable 2)?
[Figure: scatter plot, r = -1 (perfect negative correlation)]
• Again, a perfect relationship of this
type is rare. In real life, you
usually get only some degree of
negative relationship.
An inverse or negative relationship between
two variables implies that an increase in
value of one of the variables corresponds
to a decrease in value of the other variable.
[Figure: scatter plot, -1 < r < 0 (some negative correlation)]
• Analyze the relationships of the
following variables.
a) shoe size and IQ
b) waistline and GPA
c) number of books in the library
and the number of points made in
a basketball game.
• Yes! There are many variables
which do not have correlation at
all. Thus, there exists a zero
correlation.
A zero relationship exists between two
variables if an increase in value of one of
the variables is not accompanied by either
an increase or a decrease in value of the
other variable.
[Figure: scatter plot, r = 0 (zero correlation). Example: intelligence and gender]
• To determine the degree of
relationship between two variables,
the “Pearson product-moment
correlation coefficient”, or simply the
Pearson “r” formula, will be used.
The formula and the extent or the
degree of relationship are given in
the boxes below.
The Pearson product-moment
correlation coefficient or simply
Pearson r
$$
r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
$$
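The formula can be turned into a short function. The following is a minimal Python sketch, not part of the original slides; the function name `pearson_r` and the two three-point data sets are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson r computed from the raw-score sums that appear in the formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero (ZeroDivisionError) if either variable is constant.
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Made-up data (not from the slides): points lying exactly on a straight line.
print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0, perfect positive correlation
print(pearson_r([1, 2, 3], [6, 4, 2]))  # -1.0, perfect negative correlation
```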
A correlation coefficient is the
magnitude or the degree of relationship
between two variables.
0.80 to 0.99    high correlation
0.60 to 0.79    moderately high correlation
0.40 to 0.59    moderate correlation
0.20 to 0.39    low correlation
0.01 to 0.19    negligible correlation
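The interpretation table can be written as a small lookup. This is a sketch under stated assumptions: the function name `interpret_r` is hypothetical, the table is applied to the size of r (the sign only gives the direction), r is rounded to two decimal places because the table is stated to two decimals, and the labels for exactly 0 and exactly 1 are taken from the "zero" and "perfect" correlation cases described earlier.

```python
def interpret_r(r):
    """Return the verbal label for r, following the interpretation table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be between -1 and 1")
    size = round(abs(r), 2)  # assumption: compare the rounded size of r
    if size == 1.0:
        return "perfect correlation"
    if size >= 0.80:
        return "high correlation"
    if size >= 0.60:
        return "moderately high correlation"
    if size >= 0.40:
        return "moderate correlation"
    if size >= 0.20:
        return "low correlation"
    if size >= 0.01:
        return "negligible correlation"
    return "zero correlation"

print(interpret_r(0.8851144396))  # high correlation
print(interpret_r(-0.1084))       # negligible correlation
```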
• For manual computation, you may
refer to the formula. However, it will
be easier if you have the required
calculator with LR/stat1/stat2/statxy
mode.
• Do the computation using the
example on the number of hours
spent in studying and the grades
received.
Hours spent in studying (x)    2   2   2   3   3   4   5   5   6   6
Grades received (y)           57  63  70  72  69  75  73  84  82  89
$$
r = \frac{10(2914) - (38)(734)}{\sqrt{\left[(10)(168) - (38)^2\right]\left[(10)(54718) - (734)^2\right]}}
$$
r = 0.8851144396
r = 0.89
2. An education researcher wishes to
determine the extent of the relationship
between the results of a reading
comprehension test and a vocabulary test
taken by 12 students. The results of the tests are
as follows:
X    3   7   2   9   8   4   1  10  16   5   3   8
Y   11   1  19   5  17   3  15   9  15   8  12   4

  X     Y    X²    Y²    XY
  3    11     9   121    33
  7     1    49     1     7
  2    19     4   361    38
  9     5    81    25    45
  8    17    64   289   136
  4     3    16     9    12
  1    15     1   225    15
 10     9   100    81    90
 16    15   256   225   240
  5     8    25    64    40
  3    12     9   144    36
  8     4    64    16    32
 76   119   678  1561   724   (totals)
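$$
r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}
$$
$$
r = \frac{12(724) - (76)(119)}{\sqrt{\left[(12)(678) - (76)^2\right]\left[(12)(1561) - (119)^2\right]}}
$$
r = −0.1084
r = −0.11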
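The computation table for this example (X, Y, X², Y², XY and the column totals) can be rebuilt programmatically. The sketch below is illustrative only; the names `rows` and `totals` are assumptions, and the data are the 12 score pairs listed above.

```python
import math

xs = [3, 7, 2, 9, 8, 4, 1, 10, 16, 5, 3, 8]      # X scores as listed
ys = [11, 1, 19, 5, 17, 3, 15, 9, 15, 8, 12, 4]  # Y scores as listed

# One row per student: X, Y, X^2, Y^2, XY.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]
totals = [sum(column) for column in zip(*rows)]

print(f"{'X':>4}{'Y':>5}{'X^2':>6}{'Y^2':>6}{'XY':>6}")
for row in rows:
    print("{:>4}{:>5}{:>6}{:>6}{:>6}".format(*row))
print("{:>4}{:>5}{:>6}{:>6}{:>6}".format(*totals))

# Substitute the column totals into the Pearson r formula.
n = len(xs)
sum_x, sum_y, sum_x2, sum_y2, sum_xy = totals
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 4))  # -0.1084
```

The totals come out as 76, 119, 678, 1561 and 724, which are the values substituted into the formula.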
Calculator entry for the study-hours example: key in X, then the
separator key [sep] ("(", ";" or ",", depending on the calculator model),
then Y, then M+. That is:
2 [sep] 57 M+
2 [sep] 63 M+
2 [sep] 70 M+
3 [sep] 72 M+
3 [sep] 69 M+
4 [sep] 75 M+
5 [sep] 73 M+
5 [sep] 84 M+
6 [sep] 82 M+
6 [sep] 89 M+
• To get the correlation coefficient,
press SHIFT or RCL then “r”;
0.8851144396 will be displayed. To two
decimal places, r = 0.89, which is
interpreted as high correlation.
• Another important and interesting
statistic that can be obtained from
the correlation coefficient (r) is the
coefficient of determination, r². This
tells us how much of the variation in Y (grades) is
due to or can be attributed to X
(number of hours spent in studying).
Thus, if you square r, that is,
(0.885114396)², you will get
0.783427495.
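• This value is interpreted as follows:
about 78.34% of the variation in the grades
received (Y) can be attributed to the number
of hours spent in studying (X).
[Figure: scatter plot of Y against X]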
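Squaring r is a one-line computation. The sketch below simply reproduces the arithmetic for the study-hours example and expresses r² as a percentage; the variable names are illustrative.

```python
r = 0.8851144396      # correlation from the study-hours example
r_squared = r ** 2    # coefficient of determination

print(round(r_squared, 4))  # 0.7834
print(f"{r_squared:.2%}")   # 78.34%
```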
Testing the significance of correlation
• After learning how to get and interpret the
value of “r”, your next task is to determine
whether the correlation, which exists
between the variables, is significant and
not just due to chance. This step is called
testing the significance of the correlation.
• There are several ways to test if “r” is
significant. One can use the t-test for the
correlation coefficient, with df = n - 2 and
the formula:
$$
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \quad\text{or}\quad t = r\sqrt{\frac{n-2}{1-r^2}}
$$
$$
t = \frac{0.8851144396\sqrt{10-2}}{\sqrt{1-(0.8851144396)^2}}
$$
$$
t = \frac{2.503481689}{0.465373429}
$$
t = 5.379511443
t = 5.3795
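The t computation can be sketched in Python as follows. The function name `t_for_correlation` is a hypothetical helper, not something from the slides; it implements the first form of the formula, with df = n - 2.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing a correlation coefficient, with df = n - 2."""
    if n <= 2:
        raise ValueError("n must be greater than 2")
    # Undefined (division by zero) when r is exactly 1 or -1.
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t = t_for_correlation(0.8851144396, 10)
print(round(t, 4))  # 5.3795
```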
Approaches in Hypothesis Testing
A. Critical value approach
5 – step solution
1. H0: ______________
Ha: ______________
2. α = _____; Cri-value = _____
3. Decision rule: Reject H0 if
|Comp-value| ≥ Cri-value
4. Decision:
5. Conclusion:
5 – step solution (Let ρ be the pop. correlation)
1. H0: ρ = 0; There is no correlation between the
no. of hours spent in studying and the grades
received. (ρ, rho, is the symbol for the population correlation.)
Ha: ρ ≠ 0; There is a correlation between the
number of hours spent in studying and the
grades received.
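The remaining steps of the critical value approach can be sketched for this example as follows. Two things here are assumptions rather than values given at this point in the slides: α = 0.05 with a two-tailed test, and the critical value 2.306, which is the standard t-table entry for df = 8 at that level.

```python
import math

# Values from the worked example.
r, n = 0.8851144396, 10
df = n - 2

# Assumptions: alpha = 0.05, two-tailed test.
# 2.306 is the t-table critical value for df = 8 at that level.
alpha = 0.05
critical_value = 2.306

computed_t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)

# Decision rule: reject H0 if the size of the computed value
# reaches the critical value.
reject_h0 = abs(computed_t) >= critical_value

print(round(computed_t, 4), critical_value, reject_h0)  # 5.3795 2.306 True
```

Under these assumptions the computed t exceeds the critical value, so H0 would be rejected.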
B. p-value approach
5 – step solution
1. H0: ______________
Ha: ______________
2. α = _____; p-value = _____
3. Decision rule: Reject H0 if
p-value ≤ α
4. Decision:
5. Conclusion:
5 – step solution (Let ρ be the pop. correlation)
1. H0: ρ = 0; There is no correlation between the
no. of hours spent in studying and the grades
received. (ρ, rho, is the symbol for the population correlation.)
Ha: ρ ≠ 0; There is a correlation between the
number of hours spent in studying and the
grades received.
2. α = 0.05; computed r = 0.89; p-value for slope =
0.000662 (found in the printout)
3. Decision rule: Reject H0 if the p-value for the slope
(0.000662) ≤ α (0.05).
4. Decision: Reject H0, because the p-value for the
slope (0.000662) < α (0.05).
5. Conclusion: There is a significant correlation
between the number of hours spent in
studying and the grades received. Hence, as
the number of hours spent in studying
increases, the grades received also increase.
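The p-value quoted from the printout can be approximated without statistical software. The sketch below is not the method behind the printout: it assumes a two-tailed test and integrates the Student t density numerically with Simpson's rule. The helper names `t_pdf` and `two_tailed_p` are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_value, df, steps=10_000):
    """Two-tailed p-value, 1 - P(-|t| < T < |t|), using Simpson's rule."""
    if steps % 2:
        steps += 1  # Simpson's rule needs an even number of intervals
    upper = abs(t_value)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = total * h / 3  # P(0 < T < |t|)
    return 1 - 2 * half_area

# Values from the worked example; alpha = 0.05 as in step 2.
r, n = 0.8851144396, 10
df = n - 2
t_value = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
p_value = two_tailed_p(t_value, df)
alpha = 0.05

print(f"{p_value:.6f}", p_value <= alpha)  # 0.000662 True
```

In simple linear regression the test of the slope and the test of the correlation give the same t and p-value, which is why this result agrees with the "p-value for slope" in the printout.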