

CHAPTER 9

2. SIMPLE LINEAR REGRESSION AND CORRELATION

Linear regression and correlation deal with studying and measuring the linear relationship among two or more variables. When only two variables are involved, the analysis is referred to as simple correlation and simple linear regression analysis; when there are more than two variables, the terms multiple regression and partial correlation are used.

Regression Analysis: a statistical technique that can be used to develop a mathematical equation showing how variables are related.

Correlation Analysis: deals with the measurement of the closeness of the relationship described by the regression equation.

We say there is correlation when two series of items vary together, either directly or inversely.

Simple Correlation

Suppose we have two variables X = (X_1, X_2, \ldots, X_n) and Y = (Y_1, Y_2, \ldots, Y_n).

- When higher values of X are associated with higher values of Y and lower values of X are associated with lower values of Y, then the correlation is said to be positive or direct.

Examples:
- Income and expenditure
- Number of hours spent in studying and the score obtained
- Height and weight
- Distance covered and fuel consumed by a car.

- When higher values of X are associated with lower values of Y and lower values of X are associated with higher values of Y, then the correlation is said to be negative or inverse.

Examples:
- Demand and supply
- Income and the proportion of income spent on food.


The correlation between X and Y may be one of the following:

1. Perfect positive (r = 1)
2. Positive (r between 0 and 1)
3. No correlation (r = 0)
4. Negative (r between -1 and 0)
5. Perfect negative (r = -1)

The presence of correlation between two variables may be due to three reasons:

1. One variable being the cause of the other. The cause is called the “subject” or “independent” variable, while the effect is called the “dependent” variable.
2. Both variables being the result of a common cause. That is, the
correlation that exists between two variables is due to their being
related to some third force.

Example:
Let X1 = ESLCE result,
    Y1 = rate of surviving in the University,
    Y2 = rate of getting a scholarship.

Both X1 & Y1 and X1 & Y2 have high positive correlation. Likewise, Y1 & Y2 have positive correlation, but they are not directly related; they are related to each other via X1.

3. Chance:

The correlation that arises by chance is called spurious correlation.

Examples:
- Price of teff in Addis Ababa and grades of students in the USA.
- Weight of individuals in Ethiopia and income of individuals in Kenya.

Therefore, while interpreting a correlation coefficient, it is necessary to consider whether there is any likelihood of a genuine relationship existing between the variables under study.
The correlation coefficient between X and Y, denoted by r, is given by

r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}

and the shortcut formulas are

r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2][n\sum Y^2 - (\sum Y)^2]}}

r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2][\sum Y^2 - n\bar{Y}^2]}}

Remark:

r always lies between -1 and 1 inclusive, and it is symmetric in X and Y.

Interpretation of r

1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)
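The shortcut formula translates directly into code. Below is a minimal Python sketch; the function name pearson_r and the use of plain lists are illustrative choices, not part of the original notes.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient via the shortcut formula
    r = (n*sum(xy) - sum(x)*sum(y)) / sqrt((n*sum(x^2) - (sum x)^2)(n*sum(y^2) - (sum y)^2))."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    sum_y2 = sum(yi ** 2 for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator
```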

Examples:

1. Calculate the simple correlation coefficient between the mid-semester and final exam scores of 10 students (both out of 50).

Student   Mid Sem. Exam (X)   Final Exam (Y)
1         31                  31
2         23                  29
3         41                  34
4         32                  35
5         29                  25
6         33                  35
7         28                  33
8         31                  42
9         31                  31
10        33                  34
Solution:

n = 10, \bar{X} = 31.2, \bar{Y} = 32.9, \bar{X}^2 = 973.4, \bar{Y}^2 = 1082.4
\sum XY = 10331, \sum X^2 = 9920, \sum Y^2 = 11003

r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2][\sum Y^2 - n\bar{Y}^2]}}
  = \frac{10331 - 10(31.2)(32.9)}{\sqrt{(9920 - 10(973.4))(11003 - 10(1082.4))}}
  = \frac{66.2}{182.5} = 0.363

This means the mid-semester and final exam scores have a weak positive correlation.
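As a quick check of Example 1, the pearson_r sketch above (a hypothetical helper, not part of the original notes) reproduces the hand computation:

```python
mid = [31, 23, 41, 32, 29, 33, 28, 31, 31, 33]
final = [31, 29, 34, 35, 25, 35, 33, 42, 31, 34]

print(round(pearson_r(mid, final), 3))  # approximately 0.363
```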

2. The following data were collected from a certain household on the monthly income (X) and consumption (Y) for the past 10 months. Compute the simple correlation coefficient. (Exercise)

X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360

The above formula and procedure are applicable only to quantitative data. When we have qualitative data such as efficiency, honesty, intelligence, etc., we calculate what is called Spearman's rank correlation coefficient as follows:

Steps
i. Rank the items in X and in Y.
ii. Find the difference of the ranks in each pair and denote it by Di.
iii. Use the following formula:

r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}

where r_s = coefficient of rank correlation
      D = the difference between paired ranks
      n = the number of pairs
Example:
Aster and Almaz were asked to rank 7 different types of lipstick. See whether there is correlation between the tastes of the two ladies.
Lipsticks A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7
Solution:

Aster (R1)   Almaz (R2)   D = R1 - R2   D^2
2            1            1             1
1            3            -2            4
4            2            2             4
3            4            -1            1
5            5            0             0
7            6            1             1
6            7            -1            1
Total                                    12

r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(48)} = 0.786

Yes, there is positive correlation between the two rankings.
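A minimal Python sketch of the rank correlation formula, assuming no ties (as in this example); spearman_rs and the ranks helper are illustrative names, not from the original notes. The example inputs are already ranks, so they can be passed straight in:

```python
def ranks(values):
    """Rank observations from 1 (smallest) upward, assuming no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman_rs(x_ranks, y_ranks):
    """Spearman's rank correlation: r_s = 1 - 6*sum(D_i^2) / (n*(n^2 - 1))."""
    n = len(x_ranks)
    d_squared = sum((a - b) ** 2 for a, b in zip(x_ranks, y_ranks))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]
print(round(spearman_rs(aster, almaz), 3))  # approximately 0.786
```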


Simple Linear Regression

- Simple linear regression refers to the linear relationship between two variables.
- We usually denote the dependent variable by Y and the independent variable by X.
- A simple regression line is the line fitted to the points plotted in the scatter diagram; it describes the average relationship between the two variables. Therefore, to see the type of relationship, it is advisable to prepare a scatter plot before fitting the model.

- The linear model is:

Y = \alpha + \beta X + \varepsilon

  where: Y = dependent variable
         X = independent variable
         \alpha = regression constant
         \beta = regression slope
         \varepsilon = random disturbance term
         Y \sim N(\alpha + \beta X, \sigma^2)
         \varepsilon \sim N(0, \sigma^2)

- To estimate the parameters (\alpha and \beta) we have several methods:
  - The free-hand method
  - The semi-average method
  - The least squares method
  - The maximum likelihood method
  - The method of moments
  - The Bayesian estimation technique

- The above model is estimated by:

\hat{Y} = a + bX

  where a is a constant giving the value of Y when X = 0; it is called the Y-intercept. b is a constant indicating the slope of the regression line; it measures the change in Y for a unit change in X and is also called the regression coefficient of Y on X.

- a and b are found by minimizing SSE = \sum \varepsilon^2 = \sum (Y_i - \hat{Y}_i)^2

  where: Y_i = observed value
         \hat{Y}_i = estimated value = a + bX_i

  This method is known as OLS (ordinary least squares).


- Minimizing SSE = \sum \varepsilon^2 gives

b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}

a = \bar{Y} - b\bar{X}
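These two formulas translate directly into a short Python sketch; fit_line is an illustrative name, and plain lists stand in for whatever data structure is actually used.

```python
def fit_line(x, y):
    """Return (a, b) for the fitted line Y-hat = a + b*X, using
    b = (sum(xy) - n*xbar*ybar) / (sum(x^2) - n*xbar^2) and a = ybar - b*xbar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = (sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar) / \
        (sum(xi ** 2 for xi in x) - n * x_bar ** 2)
    a = y_bar - b * x_bar
    return a, b
```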

Example 1: The following data show the scores of 12 students in Accounting and Statistics examinations.

a) Calculate the simple correlation coefficient.
b) Fit the regression line of Statistics on Accounting using least squares estimates.
c) Predict the Statistics score if the Accounting score is 85.

Student   Accounting (X)   Statistics (Y)
1         74.00            81.00
2         93.00            86.00
3         55.00            67.00
4         41.00            35.00
5         23.00            30.00
6         92.00            100.00
7         64.00            55.00
8         40.00            52.00
9         71.00            76.00
10        33.00            24.00
11        30.00            48.00
12        71.00            87.00


[Scatter diagram of the raw data omitted]

Student   Accounting (X)   Statistics (Y)   X^2        Y^2        XY
1         74.00            81.00            5476.00    6561.00    5994.00
2         93.00            86.00            8649.00    7396.00    7998.00
3         55.00            67.00            3025.00    4489.00    3685.00
4         41.00            35.00            1681.00    1225.00    1435.00
5         23.00            30.00            529.00     900.00     690.00
6         92.00            100.00           8464.00    10000.00   9200.00
7         64.00            55.00            4096.00    3025.00    3520.00
8         40.00            52.00            1600.00    2704.00    2080.00
9         71.00            76.00            5041.00    5776.00    5396.00
10        33.00            24.00            1089.00    576.00     792.00
11        30.00            48.00            900.00     2304.00    1440.00
12        71.00            87.00            5041.00    7569.00    6177.00
Total     687.00           741.00           45591.00   52525.00   48407.00
Mean      57.25            61.75


a) The coefficient of correlation is r = 0.92. This indicates that the two variables are positively correlated (Y increases as X increases).

b) Using OLS:

\hat{Y} = 7.0194 + 0.9560X is the estimated regression line.


[Scatter diagram with the fitted regression line omitted]

c) Insert X = 85 into the estimated regression line:

\hat{Y} = 7.0194 + 0.9560X = 7.0194 + 0.9560(85) = 88.28

Example 2:

A car rental agency is interested in studying the relationship between the distance driven in kilometers (Y) and the maintenance cost of its cars (X, in birr). The following summarized information is given, based on a sample of size 5. (Exercise)

\sum_{i=1}^{5} X_i = 23{,}000, \quad \sum_{i=1}^{5} Y_i = 36, \quad \sum_{i=1}^{5} X_i^2 = 147{,}000{,}000, \quad \sum_{i=1}^{5} Y_i^2 = 314, \quad \sum_{i=1}^{5} X_i Y_i = 212{,}000

a) Find the least squares regression equation of Y on X.
b) Compute the correlation coefficient and interpret it.
c) Estimate the maintenance cost of a car which has been driven for 6 km.
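Since only summary totals are given, a small sketch (with hypothetical variable names, assuming the totals above) shows how they plug into the shortcut formulas; running it answers parts (a) and (b), which the notes leave as an exercise.

```python
import math

# Summary totals from the exercise (n = 5 cars).
n = 5
sum_x, sum_y = 23_000, 36
sum_x2, sum_y2 = 147_000_000, 314
sum_xy = 212_000

x_bar, y_bar = sum_x / n, sum_y / n

# Slope and intercept of the regression of Y on X (part a).
b = (sum_xy - n * x_bar * y_bar) / (sum_x2 - n * x_bar ** 2)
a = y_bar - b * x_bar

# Correlation coefficient from the same totals (part b).
r = (sum_xy - n * x_bar * y_bar) / math.sqrt(
    (sum_x2 - n * x_bar ** 2) * (sum_y2 - n * y_bar ** 2))

print(a, b, r)
```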

- To know how far the regression equation has been able to explain the variation in Y, we use a measure called the coefficient of determination (r^2), i.e.

r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}

  where r = the simple correlation coefficient.

- r^2 gives the proportion of the variation in Y explained by the regression of Y on X.
- 1 - r^2 gives the unexplained proportion and is called the coefficient of indetermination.

Example: For the above problem (Example 1), r = 0.9194, so r^2 = 0.8453. That is, 84.53% of the variation in Y is explained by the regression, and only 15.47% remains unexplained, to be accounted for by the random term.
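A short, self-contained check of this value on the Example 1 data, computing the explained-to-total variation ratio directly (variable names are illustrative):

```python
accounting = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
statistics = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

n = len(accounting)
x_bar = sum(accounting) / n
y_bar = sum(statistics) / n

# Refit the line, then compare explained variation with total variation.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(accounting, statistics)) / \
    sum((x - x_bar) ** 2 for x in accounting)
a = y_bar - b * x_bar
fitted = [a + b * x for x in accounting]

explained = sum((yh - y_bar) ** 2 for yh in fitted)
total = sum((y - y_bar) ** 2 for y in statistics)
print(round(explained / total, 4))  # approximately 0.8453, i.e. r squared
```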

o The covariance of X and Y measures the co-variability of X and Y together. It is denoted by S_{XY} and is given by

S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n - 1}

o Next we will see the relationship between the coefficients.

i. r = \frac{S_{XY}}{S_X S_Y} \;\Rightarrow\; r^2 = \frac{S_{XY}^2}{S_X^2 S_Y^2}

ii. r = \frac{b S_X}{S_Y} \;\Rightarrow\; b = \frac{r S_Y}{S_X}
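A brief numerical check, reusing the Example 1 scores, that the sample covariance and standard deviations reproduce r and b as stated above (variable names are illustrative):

```python
import math

x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # Accounting
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Statistics

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sample covariance and standard deviations (divisor n - 1).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
s_x = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - y_bar) ** 2 for yi in y) / (n - 1))

r = s_xy / (s_x * s_y)   # about 0.9194
b = r * s_y / s_x        # about 0.9560, the slope of the fitted line
print(round(r, 4), round(b, 4))
```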


o When we fit the regression of X on Y, we interchange X and Y in all formulas, i.e. we fit

\hat{X} = a_1 + b_1 Y

b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}

a_1 = \bar{X} - b_1\bar{Y}, \qquad r = \frac{b_1 S_Y}{S_X}

Here X is dependent and Y is independent.

Choice of Dependent and Independent Variable

- In correlation analysis there is no need to identify the dependent and independent variable, because r is symmetric. But in regression analysis,
  if b_{YX} is the regression coefficient of Y on X and
  b_{XY} is the regression coefficient of X on Y,

  then r = \frac{b_{YX} S_X}{S_Y} = \frac{b_{XY} S_Y}{S_X} \;\Rightarrow\; r^2 = b_{YX} \cdot b_{XY}

- Moreover, b_{YX} and b_{XY} are in general different, both numerically and conceptually.

- Let us consider three cases concerning these coefficients.

1. If the correlation is perfect positive, i.e. r = 1, then the b values are reciprocals of each other.
2. If S_X = S_Y, then irrespective of the value of r the b values are equal, i.e. r = b_{YX} = b_{XY} (but this is an unlikely case).


3. The most important case is when S_X \neq S_Y and r \neq 1; here the b values are neither equal nor reciprocals of each other, but rather the two regression lines differ, intersecting at the common point (\bar{X}, \bar{Y}).

- Thus, to determine whether a given regression equation is of X on Y or of Y on X, we use the formula r^2 = b_{YX} \cdot b_{XY}:
  - If r \in [-1, 1], then our assumption is correct.
  - If r \notin [-1, 1], then our assumption is wrong.

Example: The regression lines between height (X) in inches and weight (Y) in lbs of male students are:

4Y - 15X - 530 = 0 and
20X - 3Y - 975 = 0

Determine which is the regression of Y on X and which is the regression of X on Y.

Solution:
We will assume one of the equations is the regression of X on Y and the other is the regression of Y on X, and then calculate r.

Assume 4Y - 15X - 530 = 0 is the regression of X on Y and
       20X - 3Y - 975 = 0 is the regression of Y on X.

Then write these in standard form:

4Y - 15X - 530 = 0 \;\Rightarrow\; X = -\frac{530}{15} + \frac{4}{15}Y \;\Rightarrow\; b_{XY} = \frac{4}{15}

20X - 3Y - 975 = 0 \;\Rightarrow\; Y = -\frac{975}{3} + \frac{20}{3}X \;\Rightarrow\; b_{YX} = \frac{20}{3}

\Rightarrow r^2 = b_{XY} \cdot b_{YX} = \left(\frac{4}{15}\right)\left(\frac{20}{3}\right) = 1.78 > 1,

which is impossible (a contradiction). Hence our assumption is not correct. Thus

4Y - 15X - 530 = 0 is the regression of Y on X
20X - 3Y - 975 = 0 is the regression of X on Y
To verify:

 530 15 15
4Y  15 X  530  0  Y   X  bYX 
4 4 4
975 3 3
20 X  3Y  975  0  X   Y  bXY 
20 20 20

 15  3  9
 r 2  bYX * bXY       0,1
 4  20  16
