Lecture 5 - Correlation
RATIO/INTERVAL DATA     NOMINAL DATA        ORDINAL DATA
Pearson correlation     a) Chi-square       a) Spearman correlation
                        b) Cramér's V       b) Kruskal-Wallis
Correlation
What is the relationship between air temperature and
rainfall?
What is the relationship between air temperature
and flooding incidences?
Data scale
Pearson’s r
Spearman’s rho
Kendall’s tau
Numerical summary of the data — Correlation
Overall pattern
Direction
Form
Strength
a straight line.
Numerical summary
• After investigating the data visually, a numerical
summary of the strength of the association
between the two variables is often desired.
• This can be achieved with the population
correlation coefficient ρ, which measures the strength
of the linear association between two variables, X
and Y.
• Since X and Y are two quantitative variables, ρ is
also known as the Pearson correlation coefficient or
Pearson’s product-moment correlation coefficient.
PEARSON'S CORRELATION
• The technique of determining this relationship is
called PEARSON'S CORRELATION ANALYSIS
• The strength of a relationship can be measured
by using an index called PEARSON'S
CORRELATION COEFFICIENT (r)
• The value of the correlation coefficient is
between -1 and +1
• A positive value indicates a direct
relationship and a negative value indicates an
inverse relationship
• Coefficients approaching the value of -1 or +1
indicate a strong relationship
Direct relationship (positive r)
[Figure: scatter plot of the price of rice (RM, axis 0 to 80) against the weight of rice (kg, axis 5 to 20)]
Inverse relationship (negative r)
[Figure: scatter plot of car price (RM, axis 0 to 80) against car age (years, axis 5 to 20)]
• Pearson's correlation is a commonly used
type of analysis
• It can ONLY be used for data on an Interval
Scale or a Ratio Scale, including percentages
• This method should NOT be used on
NOMINAL SCALE data
The strength of relationships
• The strength is judged from the magnitude of r
• How the strength of the relationship is labelled
depends on the researcher
• Example:
Correlation coefficient (r), +ve or -ve    Strength of correlation
0.91 – 1.00                                Very Strong
0.71 – 0.90                                Strong
0.51 – 0.70                                Moderate
0.31 – 0.50                                Weak
0.01 – 0.30                                Very Weak
0.00                                       No Relationship
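The example table can be turned into a small lookup. This is only a sketch: the function name `describe_strength` is made up for illustration, and treating each band's upper limit as inclusive (so that a value such as 0.905 falls in "Very Strong") is an assumption, since the table does not say how values between the bands are handled.

```python
def describe_strength(r: float) -> str:
    """Map a correlation coefficient to the labels of the example table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)  # the table applies to positive and negative r alike
    if size == 0.0:
        return "No Relationship"
    # Assumption: the upper limit of each band is inclusive.
    bands = [(0.30, "Very Weak"), (0.50, "Weak"),
             (0.70, "Moderate"), (0.90, "Strong")]
    for upper, label in bands:
        if size <= upper:
            return label
    return "Very Strong"


print(describe_strength(0.94))   # Very Strong
print(describe_strength(-0.66))  # Moderate
```

The sign of r gives the direction (direct or inverse); only its magnitude is used for the label.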
The formula for calculating r
\[
r = \frac{n \sum xy - (\sum x)(\sum y)}{\sqrt{\left[ n \sum x^2 - (\sum x)^2 \right] \left[ n \sum y^2 - (\sum y)^2 \right]}}
\]
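Example 1
\[
\begin{aligned}
r &= \frac{72{,}603 - 62{,}839}{\sqrt{[41{,}265 - 36{,}481]\,[130{,}833 - 108{,}241]}} \\
  &= \frac{9{,}764}{\sqrt{[4{,}784]\,[22{,}592]}} = \frac{9{,}764}{\sqrt{108{,}080{,}128}} \\
  &= \frac{9{,}764}{10{,}396.16} \\
  &= 0.94
\end{aligned}
\]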
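The arithmetic of Example 1 can be checked with a few lines of Python. The raw data behind the example are not given in these notes, so the sketch starts from the intermediate totals; the variable names are chosen here for illustration.

```python
import math

# Intermediate totals copied from Example 1 (raw data not shown in the notes).
n_sum_xy, sum_x_times_sum_y = 72_603, 62_839
n_sum_x2, sum_x_squared = 41_265, 36_481
n_sum_y2, sum_y_squared = 130_833, 108_241

numerator = n_sum_xy - sum_x_times_sum_y
# The denominator is the square root of the product of the two brackets.
denominator = math.sqrt((n_sum_x2 - sum_x_squared) * (n_sum_y2 - sum_y_squared))
r = numerator / denominator

print(numerator, round(denominator, 2), round(r, 2))  # 9764 10396.16 0.94
```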
• A geographer has studied the use of
organic fertilizers in vegetable
production.
• Is there a relationship between the rate
of fertilizer use (g/l) and vegetable
production (kg)?
Fertilizer (g/l) Yield (kg)
28 12
32 16
65 42
79 51
41 38
38 30
22 35
35 49
24 42
50 21
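One way to work this exercise is to code the computational formula for r directly. The following is a minimal sketch in plain Python; `pearson_r` and the list names are illustrative, not part of the lecture.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the raw-sum (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


fertilizer = [28, 32, 65, 79, 41, 38, 22, 35, 24, 50]  # g/l
yield_kg = [12, 16, 42, 51, 38, 30, 35, 49, 42, 21]    # kg

r = pearson_r(fertilizer, yield_kg)
print(round(r, 2))  # 0.42
```

On the example strength scale given earlier in the lecture, a positive r of about 0.42 would be read as a weak direct relationship.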
SPEARMAN'S CORRELATION
Ordinal data
Introduction
• The concept of SPEARMAN CORRELATION ANALYSIS
is the same as that of PEARSON CORRELATION ANALYSIS
• The strength of a relationship can be measured by
using an index called SPEARMAN'S CORRELATION
COEFFICIENT (ρ)
• Difference: it is for ORDINAL data only
• Usually the data has only 2 categories; if there are
more than 2 categories, use the Kruskal-Wallis test
• The value of the correlation coefficient is between -1
and +1
• A positive value indicates a direct relationship
and a negative value indicates an inverse relationship
• Coefficients approaching the value of -1 or +1 indicate
a strong relationship
Spearman's correlation coefficient (ρ)
\[
\rho = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}
\]
Step 2 - Calculate the rank difference for x and y
Mathematics (x)   Science (y)   Rank x   Rank y    d    d²
100               98            1        1         0    0
99                69            2        7        -5    25
97                83            3        3         0    0
95                70            4        6        -2    4
89                75            5        4         1    1
86                72            6        5         1    1
76                85            7        2         5    25
75                65            8        8         0    0
65                68            9        9         0    0
55                52            10       10        0    0
Total Σd² = 56
\[
\begin{aligned}
\rho &= 1 - \frac{6 \times 56}{10(10^2 - 1)} \\
     &= 1 - \frac{336}{990} \\
     &= 1 - 0.339 \\
     &= 0.661
\end{aligned}
\]
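The same calculation can be sketched in Python, starting from the rank columns as printed in the Step 2 table. The helper name `spearman_from_ranks` is illustrative, and the shortcut formula it uses assumes there are no tied ranks.

```python
def spearman_from_ranks(rank_x, rank_y):
    """Return (rho, sum of d^2) using rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Assumes both lists hold untied ranks of the same n observations.
    """
    if len(rank_x) != len(rank_y):
        raise ValueError("rank lists must have the same length")
    n = len(rank_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
    return rho, sum_d2


# Rank columns copied from the Step 2 table.
rank_x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
rank_y = [1, 7, 3, 6, 4, 5, 2, 8, 9, 10]

rho, sum_d2 = spearman_from_ranks(rank_x, rank_y)
print(sum_d2, round(rho, 3))  # 56 0.661
```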
\[
\begin{aligned}
\rho &= 1 - \frac{546}{4080} \\
     &= 1 - 0.1338 \\
     &= 0.87
\end{aligned}
\]
ATTENTION
• For RATIO/INTERVAL data, if the question
does not specify which method to use, you
can choose either the PEARSON or the
SPEARMAN method
• Each method has advantages and
disadvantages
Calculate the relationship between BMI and age
Student   BMI    Age
A1        16.5   37
A2        18.2   29
A3        19.4   65
F1        19.7   18
F2        19.9   52
G1        20.5   33
G2        21.1   26
H         22.3   20
B1        22.5   21
B2        24.0   29
C1        24.6   48
C2        25.7   45
D         28.9   32
E         29.5   54
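Since BMI and age are ratio data, either method may be chosen, so the sketch below computes both. Two points are assumptions rather than part of the lecture: the age 29 occurs twice, and the tied values are given the average of their ranks; and Spearman's ρ is obtained by applying Pearson's formula to the ranks instead of using the Σd² shortcut, which is exact only without ties. All function names are illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from the raw-sum (computational) formula."""
    n = len(xs)
    numerator = n * sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys)
    denominator = math.sqrt(
        (n * sum(x * x for x in xs) - sum(xs) ** 2)
        * (n * sum(y * y for y in ys) - sum(ys) ** 2)
    )
    return numerator / denominator


def average_ranks(values):
    """Rank from 1 (smallest value); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as Pearson's r of the (average) ranks."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


bmi = [16.5, 18.2, 19.4, 19.7, 19.9, 20.5, 21.1,
       22.3, 22.5, 24.0, 24.6, 25.7, 28.9, 29.5]
age = [37, 29, 65, 18, 52, 33, 26, 20, 21, 29, 48, 45, 32, 54]

r = pearson_r(bmi, age)
rho = spearman_rho(bmi, age)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Ranking from the smallest value rather than the largest does not change ρ, as long as both variables are ranked in the same direction.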