Chapter-9
CHAPTER 9
Simple Correlation
Examples of positive correlation:
- Income and expenditure
- Number of hours spent in studying and the score obtained
- Height and weight
- Distance covered and fuel consumed by car.
Examples of negative correlation:
- Demand and supply
- Income and the proportion of income spent on food.
II: simple linear regression and correlation
1. One variable being the cause of the other. The cause is called the
“subject” or “independent” variable, while the effect is called the
“dependent” variable.
2. Both variables being the result of a common cause. That is, the
correlation that exists between two variables is due to their being
related to some third force.
Example:
Let X1 = ESLCE result
Y1 = rate of surviving in the University
Y2 = rate of getting a scholarship.
Here Y1 and Y2 are correlated because both depend on X1.
3. Chance:
Examples:
Price of teff in Addis Ababa and grade of students in USA.
Weight of individuals in Ethiopia and income of individuals
in Kenya.
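$$ r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}} $$

and the short cut formula is

$$ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]}} $$

$$ r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2]\,[\sum Y^2 - n\bar{Y}^2]}} $$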
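A short derivation may help show why the definitional form and the short cut forms agree. It is a sketch that uses only the symbols above, with $\bar{X} = \sum X / n$ and $\bar{Y} = \sum Y / n$:

$$
\begin{aligned}
\sum (X_i - \bar{X})(Y_i - \bar{Y}) &= \sum X_i Y_i - \bar{X}\sum Y_i - \bar{Y}\sum X_i + n\bar{X}\bar{Y} \\
&= \sum XY - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y} \\
&= \sum XY - n\bar{X}\bar{Y}
\end{aligned}
$$

Setting Y = X in the same identity gives $\sum (X_i - \bar{X})^2 = \sum X^2 - n\bar{X}^2$, and likewise for Y. Multiplying the numerator and the denominator of the resulting ratio by n, and using $n\bar{X} = \sum X$ and $n\bar{Y} = \sum Y$, gives the form with $n\sum XY - (\sum X)(\sum Y)$.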
Remark:
Interpretation of r
Examples:
1. Calculate the simple correlation between the mid-semester and final exam
scores of 10 students (both out of 50).
Student   Mid-semester (X)   Final (Y)
8         31                 42
9         31                 31
10        33                 34
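Solution:

$$ n = 10,\quad \bar{X} = 31.2,\quad \bar{Y} = 32.9,\quad \bar{X}^2 = 973.4,\quad \bar{Y}^2 = 1082.4 $$

$$ \sum XY = 10331,\quad \sum X^2 = 9920,\quad \sum Y^2 = 11003 $$

$$ r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2]\,[\sum Y^2 - n\bar{Y}^2]}} = \frac{10331 - 10(31.2)(32.9)}{\sqrt{(9920 - 10(973.4))\,(11003 - 10(1082.4))}} = \frac{66.2}{182.5} = 0.363 $$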
This means that the mid-semester and final exam scores have a weak
positive correlation.
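The arithmetic in this solution can be checked with a few lines of Python. This is only a sketch that starts from the summary statistics quoted in the solution (the raw scores are not used), and the variable names are illustrative choices, not part of the original notes.

```python
import math

# Summary statistics quoted in the worked solution.
n = 10
mean_x, mean_y = 31.2, 32.9
sum_xy, sum_x2, sum_y2 = 10331, 9920, 11003

# Short cut formula: r = (sum XY - n*Xbar*Ybar) / sqrt(Sxx * Syy)
numerator = sum_xy - n * mean_x * mean_y
sxx = sum_x2 - n * mean_x ** 2
syy = sum_y2 - n * mean_y ** 2
r = numerator / math.sqrt(sxx * syy)

print(round(r, 3))  # 0.363
```

The denominator comes out near 182.2 here rather than 182.5 because the solution rounds the squared means to one decimal place; r still rounds to 0.363.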
X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360
The above formula and procedure are applicable only to quantitative data.
When we have qualitative data such as efficiency, honesty, or intelligence,
we calculate what is called Spearman’s rank correlation coefficient as
follows:
Steps
i. Rank the different items in X and Y.
ii. Find the difference of the ranks in each pair and denote it by Di.
iii. Use the following formula
$$ r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} $$

Where $r_s$ = coefficient of rank correlation
D = the difference between paired ranks
n = the number of pairs
Example:
Aster and Almaz were asked to rank 7 different types of lipstick. Determine
whether there is a correlation between the tastes of the two ladies.
Lipsticks A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7
Solution:
X (R1)   Y (R2)   D = R1 - R2   D²
2        1         1            1
1        3        -2            4
4        2         2            4
3        4        -1            1
5        5         0            0
7        6         1            1
6        7        -1            1
Total                           12
$$ r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(48)} = 0.786 $$
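The three steps can be sketched in Python using the rankings given by Aster and Almaz. The function name `spearman_rank_correlation` is a hypothetical helper introduced for illustration, and the sketch assumes there are no tied ranks, as in this example.

```python
def spearman_rank_correlation(ranks_x, ranks_y):
    """Spearman's r_s from two equally long lists of untied ranks."""
    if len(ranks_x) != len(ranks_y):
        raise ValueError("both rankings must have the same number of items")
    n = len(ranks_x)
    # Sum of squared differences between paired ranks (the D^2 column).
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


aster = [2, 1, 4, 3, 5, 7, 6]   # ranks for lipsticks A to G
almaz = [1, 3, 2, 4, 5, 6, 7]

r_s = spearman_rank_correlation(aster, almaz)
print(round(r_s, 3))  # 0.786
```

Identical rankings give 1 and exactly reversed rankings give -1, which is a quick sanity check on the formula.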
- Simple linear regression refers to the linear relationship between two
variables.
- We usually denote the dependent variable by Y and the independent
variable by X.
- A simple regression line is the line fitted to the points plotted in the
scatter diagram, which describes the average relationship between
the two variables. Therefore, to see the type of relationship, it is
advisable to prepare a scatter plot before fitting the model.
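$$ Y = \alpha + \beta X + \varepsilon $$

Where: Y = dependent variable
X = independent variable
$\alpha$ = regression constant
$\beta$ = regression slope
$\varepsilon$ = random disturbance term

$$ Y \sim N(\alpha + \beta X, \sigma^2) $$

$$ \varepsilon \sim N(0, \sigma^2) $$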
$$ \hat{Y} = a + bX $$

Where a is a constant which gives the value of Y when X = 0. It is called
the Y-intercept. b is a constant indicating the slope of the regression line,
and it gives a measure of the change in Y for a unit change in X. It is also
called the regression coefficient of Y on X.
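- Minimizing $SSE = \sum (Y_i - \hat{Y}_i)^2$ gives

$$ b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} $$

$$ a = \bar{Y} - b\bar{X} $$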
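The step from minimizing SSE to the formulas for a and b can be sketched as follows. The sketch assumes that SSE is the sum of squared residuals about the fitted line $\hat{Y} = a + bX$, which is the usual least squares criterion:

$$
\begin{aligned}
SSE &= \sum (Y_i - a - bX_i)^2 \\
\frac{\partial SSE}{\partial a} &= -2\sum (Y_i - a - bX_i) = 0 \;\Rightarrow\; a = \bar{Y} - b\bar{X} \\
\frac{\partial SSE}{\partial b} &= -2\sum X_i (Y_i - a - bX_i) = 0 \;\Rightarrow\; \sum XY = a\,n\bar{X} + b\sum X^2
\end{aligned}
$$

Substituting $a = \bar{Y} - b\bar{X}$ into the second equation gives $\sum XY - n\bar{X}\bar{Y} = b\,(\sum X^2 - n\bar{X}^2)$, which rearranges to the slope formula.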
Example 1: The following data show the scores of 12 students in the Accounting
and Statistics examinations.
Student   Accounting (X)   Statistics (Y)
1 74.00 81.00
2 93.00 86.00
3 55.00 67.00
4 41.00 35.00
5 23.00 30.00
6 92.00 100.00
7 64.00 55.00
8 40.00 52.00
9 71.00 76.00
10 33.00 24.00
11 30.00 48.00
12 71.00 87.00
Student   Accounting (X)   Statistics (Y)   X²   Y²   XY
1 74.00 81.00 5476.00 6561.00 5994.00
2 93.00 86.00 8649.00 7396.00 7998.00
3 55.00 67.00 3025.00 4489.00 3685.00
4 41.00 35.00 1681.00 1225.00 1435.00
5 23.00 30.00 529.00 900.00 690.00
6 92.00 100.00 8464.00 10000.00 9200.00
7 64.00 55.00 4096.00 3025.00 3520.00
8 40.00 52.00 1600.00 2704.00 2080.00
9 71.00 76.00 5041.00 5776.00 5396.00
10 33.00 24.00 1089.00 576.00 792.00
11 30.00 48.00 900.00 2304.00 1440.00
12 71.00 87.00 5041.00 7569.00 6177.00
Total 687.00 741.00 45591.00 52525.00 48407.00
Mean 57.25 61.75
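a) Substituting the totals and means from the table:

$$ r = \frac{48407 - 12(57.25)(61.75)}{\sqrt{[45591 - 12(57.25)^2]\,[52525 - 12(61.75)^2]}} = \frac{5984.75}{\sqrt{(6260.25)(6768.25)}} \approx 0.92 $$

The coefficient of correlation (r) has a value of 0.92. This indicates that the
two variables are positively correlated (Y increases as X increases).
b) Using OLS:

$$ b = \frac{5984.75}{6260.25} = 0.9560, \qquad a = \bar{Y} - b\bar{X} = 7.0194 $$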
$$ \hat{Y} = 7.0194 + 0.9560X $$

For X = 85:

$$ \hat{Y} = 7.0194 + 0.9560(85) = 88.28 $$
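The results of Example 1 can be reproduced from the raw scores. The following is a minimal sketch in plain Python; `fit_line` is a hypothetical helper name, and it uses the short cut formulas for b, a, and r rather than any statistics library.

```python
import math

# Scores of the 12 students in Example 1.
accounting_scores = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # X
statistics_scores = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Y


def fit_line(x, y):
    """Return (a, b, r) for the least squares line Y-hat = a + b*X."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * mean_x * mean_y
    sxx = sum(xi ** 2 for xi in x) - n * mean_x ** 2
    syy = sum(yi ** 2 for yi in y) - n * mean_y ** 2
    b = sxy / sxx               # regression slope
    a = mean_y - b * mean_x     # Y-intercept
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


a, b, r = fit_line(accounting_scores, statistics_scores)
predicted = a + b * 85  # predicted Statistics score for X = 85

print(round(a, 4), round(b, 4), round(r, 2), round(predicted, 2))
```

Rounded, this gives a of about 7.0194, b of about 0.9560, r of about 0.92, and a predicted value of about 88.28, matching the figures in the example.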
Example 2:

$$ \sum_{i=1}^{5} X_i^2 = 147{,}000{,}000, \qquad \sum_{i=1}^{5} Y_i^2 = 314 $$
- To know how far the regression equation has been able to explain the
variation in Y, we use a measure called the coefficient of determination ($r^2$),
i.e.

$$ r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2} $$
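As an illustration, the coefficient of determination can be computed directly from this ratio for the Accounting and Statistics scores of Example 1. This is a sketch with illustrative variable names; the line is fitted by least squares inside the snippet so that it runs on its own.

```python
# Scores of the 12 students in Example 1.
accounting_scores = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # X
statistics_scores = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Y

n = len(accounting_scores)
mean_x = sum(accounting_scores) / n
mean_y = sum(statistics_scores) / n

# Least squares slope and intercept of Y on X.
sxy = sum((x - mean_x) * (y - mean_y)
          for x, y in zip(accounting_scores, statistics_scores))
sxx = sum((x - mean_x) ** 2 for x in accounting_scores)
b = sxy / sxx
a = mean_y - b * mean_x

# Explained variation over total variation.
fitted = [a + b * x for x in accounting_scores]
explained = sum((f - mean_y) ** 2 for f in fitted)
total = sum((y - mean_y) ** 2 for y in statistics_scores)
r_squared = explained / total

print(round(r_squared, 3))  # 0.845
```

Under these data, roughly 84.5% of the variation in the Statistics scores is explained by the regression on the Accounting scores, which agrees with squaring r of about 0.92.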
$$ S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n - 1} $$
o Next we will see the relationship between the coefficients.
i. $$ r = \frac{S_{XY}}{S_X S_Y} \;\Rightarrow\; r^2 = \frac{S_{XY}^2}{S_X^2 S_Y^2} $$

ii. $$ r = \frac{b S_X}{S_Y} \;\Rightarrow\; b = \frac{r S_Y}{S_X} $$
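Relation ii follows from writing the slope in terms of the sample covariance and variance. As a sketch, assuming $S_X^2 = \sum (X_i - \bar{X})^2 / (n - 1)$ and similarly for $S_Y^2$ (these definitions are assumed here, not quoted from the notes):

$$ b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{S_{XY}}{S_X^2} \;\Rightarrow\; \frac{b S_X}{S_Y} = \frac{S_{XY}}{S_X S_Y} = r $$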
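$$ \hat{X} = a_1 + b_1 Y $$

$$ b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2} $$

$$ a_1 = \bar{X} - b_1\bar{Y}, \qquad r = \frac{b_1 S_Y}{S_X} $$

Writing $b_{YX} = b$ for the slope of the regression of Y on X and $b_{XY} = b_1$ for the slope of the regression of X on Y, then

$$ r = \frac{b_{YX} S_X}{S_Y} = \frac{b_{XY} S_Y}{S_X} \;\Rightarrow\; r^2 = b_{YX} \cdot b_{XY} $$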
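The identity between r² and the product of the two slopes can be checked numerically. The sketch below fits both regressions (Y on X and X on Y) to the Accounting and Statistics scores from Example 1; the variable names `b_yx`, `b_xy`, and `a1` are illustrative.

```python
import math

# Scores of the 12 students in Example 1.
accounting_scores = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # X
statistics_scores = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Y

n = len(accounting_scores)
mean_x = sum(accounting_scores) / n
mean_y = sum(statistics_scores) / n
sxy = sum(x * y for x, y in zip(accounting_scores, statistics_scores)) - n * mean_x * mean_y
sxx = sum(x ** 2 for x in accounting_scores) - n * mean_x ** 2
syy = sum(y ** 2 for y in statistics_scores) - n * mean_y ** 2

b_yx = sxy / sxx                 # slope of the regression of Y on X
b_xy = sxy / syy                 # slope of the regression of X on Y
a1 = mean_x - b_xy * mean_y      # intercept of the regression of X on Y
r = sxy / math.sqrt(sxx * syy)

print(round(b_yx, 4), round(b_xy, 4), round(b_yx * b_xy, 4), round(r ** 2, 4))
```

The last two printed values agree, which is the relation between the product of the two slopes and r² stated in the formulas.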
Example: The regression lines between height (X) in inches and weight (Y)
in lbs of male students are:

$$ 4Y - 15X + 530 = 0 $$ and

$$ 20X - 3Y - 975 = 0 $$

Determine which is the regression of Y on X and which is the regression of X on Y.
Solution
We will assume that one of the equations is the regression of Y on X and the
other is the regression of X on Y, and calculate r.
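$$ 4Y - 15X + 530 = 0 \;\Rightarrow\; Y = -\frac{530}{4} + \frac{15}{4}X \;\Rightarrow\; b_{YX} = \frac{15}{4} $$

$$ 20X - 3Y - 975 = 0 \;\Rightarrow\; X = \frac{975}{20} + \frac{3}{20}Y \;\Rightarrow\; b_{XY} = \frac{3}{20} $$

$$ r^2 = b_{YX} \cdot b_{XY} = \frac{15}{4} \cdot \frac{3}{20} = \frac{9}{16} \in (0, 1) $$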
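The same check can be run for both possible assignments of the two lines. The sketch below stores each line as coefficients of cx*X + cy*Y + c0 = 0; the signs used are an assumption, chosen to match the slopes 15/4 and 3/20 obtained in the solution, and the helper names are hypothetical.

```python
from fractions import Fraction

# Each line is stored as (cx, cy, c0), meaning cx*X + cy*Y + c0 = 0.
# Signs are assumed so that the slopes match the worked solution.
line_1 = (Fraction(-15), Fraction(4), Fraction(530))    # 4Y - 15X + 530 = 0
line_2 = (Fraction(20), Fraction(-3), Fraction(-975))   # 20X - 3Y - 975 = 0


def slope_y_on_x(line):
    """Slope when the line is solved for Y (treated as Y on X)."""
    cx, cy, _ = line
    return -cx / cy


def slope_x_on_y(line):
    """Slope when the line is solved for X (treated as X on Y)."""
    cx, cy, _ = line
    return -cy / cx


# Assignment used in the solution: line 1 is Y on X, line 2 is X on Y.
r2_assumed = slope_y_on_x(line_1) * slope_x_on_y(line_2)
# Opposite assignment: line 1 is X on Y, line 2 is Y on X.
r2_swapped = slope_x_on_y(line_1) * slope_y_on_x(line_2)

print(r2_assumed, r2_swapped)  # 9/16 16/9
```

Only the first assignment gives a value of r² that does not exceed 1, so under these assumed signs the first equation is consistent with being the regression of Y on X and the second with being the regression of X on Y.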