Topic04 - Simple Linear Regression


TOPIC04 – SIMPLE LINEAR REGRESSION

4.0 Linear Regression

 A functional relation between two variables is expressed by a mathematical formula.


 The variable X denotes the independent variable and the variable Y denotes the
dependent variable.
 A predictor variable X can be age, temperature, years of experience, etc., while a
response variable Y can be price, sales, quantity, etc.
 A simple linear regression model has only one predictor variable, and the regression
function is linear.
 The simple linear regression model is
$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$ OR $\hat{Y} = \beta_0 + \beta_1 X_1$
 A scatter plot is used to check whether there is a statistical relation between the
independent variable X and the dependent variable Y.
 The least squares method is used to estimate the intercept and slope, denoted by
$\hat{\beta}_0$ and $\hat{\beta}_1$ (also written $b_0$ and $b_1$) respectively.
 The formulas for $b_0$ and $b_1$ are:

$b_1 = \dfrac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ OR $b_1 = \dfrac{S_{xy}}{S_{xx}}$

$b_0 = \dfrac{1}{n}\left(\sum Y_i - b_1 \sum X_i\right)$ OR $b_0 = \bar{Y} - b_1 \bar{X}$
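
The formulas for $b_0$ and $b_1$ can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name `least_squares` and the four data points are assumptions chosen so that the points lie exactly on the line Y = 1 + 2X.

```python
from statistics import mean


def least_squares(x, y):
    """Return (b0, b1) for the fitted line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    # S_xy: sum of cross-products of deviations from the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # S_xx: sum of squared deviations of x from its mean
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]
b0, b1 = least_squares(x, y)
print(f"b0 = {b0}, b1 = {b1}")
```

Because the illustrative points fall exactly on a line, the estimates recover its intercept and slope.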


4.1 Least Square Estimators

 The deviations $X_i - \bar{X}$ and $Y_i - \bar{Y}$ measure how far each observation lies
from the sample means. The errors of prediction, $e_i = Y_i - \hat{Y}_i$, are the vertical
distances between the observed and predicted values.
 Many lines make the sum of the errors equal to 0, but only one line also makes the sum of
squared errors a minimum. The least squares line satisfies

$\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} e_i^2$ is minimum
 The sum of squared errors is $SSE = \sum_{i=1}^{n} e_i^2$ and the mean square error is $MSE = \dfrac{SSE}{n-2}$.
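
The two properties of the least squares line, and the SSE and MSE formulas, can be checked numerically. The sketch below uses a small made-up data set (an assumption for illustration only) and hypothetical variable names.

```python
from statistics import mean

# Hypothetical data, chosen only to illustrate the formulas
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
n = len(x)

# Least squares estimates of slope and intercept
x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = Y_i - (b0 + b1 * X_i)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

sse = sum(e ** 2 for e in residuals)  # sum of squared errors
mse = sse / (n - 2)                   # mean square error, n - 2 degrees of freedom

print(f"sum of residuals = {sum(residuals):.6f}")
print(f"SSE = {sse:.4f}, MSE = {mse:.4f}")
```

For these points the residuals sum to zero up to rounding, as the first condition requires.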

Example 1 :
A study recorded the age and weight of eight BWK year 2 students at UTHM.
The data are shown below.
Sample Age (years) Weight (kg)
1 20 50
2 21 51
3 22 55
4 21 60
5 22 72
6 20 48
7 20 48
8 21 49

a) Find the values of $S_{xx}$, $S_{yy}$ and $S_{xy}$.

b) Find the values of $b_0$ and $b_1$ and construct the regression model.

c) Predict the weight of a student who is 25 years old.
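
One way to check hand calculations for Example 1 is a short Python script. This is a sketch rather than the official solution; the variable names are assumptions, and the data are copied from the table in Example 1.

```python
from statistics import mean

# Example 1 data: age (years) and weight (kg) of the eight students
age = [20, 21, 22, 21, 22, 20, 20, 21]
weight = [50, 51, 55, 60, 72, 48, 48, 49]

x_bar, y_bar = mean(age), mean(weight)

# a) sums of squares and cross-products about the means
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in weight)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, weight))

# b) least squares estimates
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# c) predicted weight at age 25
predicted = b0 + b1 * 25

print(f"Sxx = {s_xx:.3f}, Syy = {s_yy:.3f}, Sxy = {s_xy:.3f}")
print(f"fitted line: weight = {b0:.4f} + {b1:.4f} * age")
print(f"predicted weight at age 25 = {predicted:.2f} kg")
```

Note that age 25 lies outside the observed ages (20 to 22), so the prediction is an extrapolation and should be read with caution.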

Example 2 :
The students in a class are having problems with their test scores. The teacher examines the
attendance of 15 students in the class. The data are shown in the table below.
Sample Number of attendance Test Score
1 14 95
2 14 88
3 13 89
4 12 78
5 14 89
6 14 92
7 13 95
8 13 87
9 12 75
10 14 94
11 13 94
12 12 73
13 13 75
14 11 69
15 10 67

a) Find the values of $S_{xx}$, $S_{yy}$ and $S_{xy}$.

b) Find the values of $b_0$ and $b_1$ and construct the regression model.

c) Predict the test score of a student who attended the class only 8 times.

4.2 Correlation Coefficient and Coefficient of Determination

 When the dependent variable Y and the independent variable X are both random, the linear
association between them is measured by the correlation coefficient r.
 The sign of r (negative or positive) matches the sign of the slope of the fitted
regression line, and the range of r is $-1 \le r \le 1$.
 The formula for the correlation coefficient is $r = \dfrac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}$

 The measure $R^2$ is called the coefficient of determination. It is the proportionate
reduction in the total variation of Y achieved by introducing the predictor variable X.
 The range of the coefficient of determination is $0 \le R^2 \le 1$.
 The formula for the coefficient of determination is $R^2 = \dfrac{SSR}{SST}$ or $R^2 = 1 - \dfrac{SSE}{SST}$.

 The sum of squares formulas are:

$SST = \sum (Y_i - \bar{Y})^2$ OR $SST = S_{yy}$

$SSR = b_1^2 \sum (X_i - \bar{X})^2$ OR $SSR = b_1^2 S_{xx}$

$SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ OR $SSE = SST - SSR$

An alternative way to find $R^2$ in simple linear regression is to square the correlation coefficient: $R^2 = r^2$.
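
The relations in this section can be verified numerically. The sketch below reuses the Example 1 data (age and weight) purely as an illustration; the variable names are assumptions and the script is not part of the original notes.

```python
from math import sqrt
from statistics import mean

# Example 1 data reused for illustration
age = [20, 21, 22, 21, 22, 20, 20, 21]
weight = [50, 51, 55, 60, 72, 48, 48, 49]

x_bar, y_bar = mean(age), mean(weight)
s_xx = sum((x - x_bar) ** 2 for x in age)
s_yy = sum((y - y_bar) ** 2 for y in weight)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, weight))

b1 = s_xy / s_xx

# Correlation coefficient
r = s_xy / sqrt(s_xx * s_yy)

# Sums of squares
sst = s_yy             # total variation of Y
ssr = b1 ** 2 * s_xx   # variation explained by the regression
sse = sst - ssr        # unexplained (residual) variation

# Coefficient of determination, computed two ways
r_squared = ssr / sst
r_squared_alt = 1 - sse / sst

print(f"r = {r:.4f}")
print(f"R^2 = {r_squared:.4f} (SSR/SST), {r_squared_alt:.4f} (1 - SSE/SST), {r ** 2:.4f} (r squared)")
```

All three routes to $R^2$ give the same value, which is a convenient check on hand calculations.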

Example 3 :
A study of monthly sales in the market area of Pagoh, Muar was conducted recently.
The data below show the money spent on advertising (RM) and the
monthly sales (RM).
Month Advertising (RM) Sales (RM)
Jan 40,937 502,729
Feb 42,376 507,553
Mar 43,355 516,885
Apr 44,126 528,347
May 45,060 537,298
Jun 49,546 544,066
Jul 56,105 553,664
Aug 59,322 563,201
Sep 59,877 568,657
Oct 60,481 569,384
Nov 62,356 573,764
Dec 63,246 582,746

a) Find the values of $S_{xx}$, $S_{yy}$ and $S_{xy}$.

b) Find the correlation coefficient r.

c) Find the values of $b_0$, $b_1$, $SST$, $SSR$ and $SSE$.

d) Construct the regression model and find the coefficient of determination $R^2$. Predict the
sales if the market spends RM65,000 on advertising.

4.3 Hypothesis Testing on Simple Linear Regression

 Hypothesis testing on simple linear regression is conducted to test the
appropriateness of the linear regression model.
 Analysis of Variance (ANOVA) is used to decide whether the null hypothesis
should be rejected.
 There are two ways to find the test statistic: the $t$-test or the $F$-test.
 The formula for the $t$-test is $t_{test} = \dfrac{b_1}{s(b_1)}$ where $s(b_1) = \sqrt{\dfrac{MSE}{S_{xx}}}$
 The formula for the $F$-test is $F_{test} = \dfrac{MSR}{MSE}$
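
As an illustration of the two test statistics, the sketch below computes $t_{test}$ and $F_{test}$ for the Example 2 data (attendance and test score). The variable names are assumptions, and $MSR$ is taken as $SSR/1$ because simple linear regression has one regression degree of freedom. Comparing the statistics with critical values from $t$ or $F$ tables is left out.

```python
from math import sqrt
from statistics import mean

# Example 2 data: number of attendances and test score for 15 students
attendance = [14, 14, 13, 12, 14, 14, 13, 13, 12, 14, 13, 12, 13, 11, 10]
score = [95, 88, 89, 78, 89, 92, 95, 87, 75, 94, 94, 73, 75, 69, 67]

n = len(attendance)
x_bar, y_bar = mean(attendance), mean(score)
s_xx = sum((x - x_bar) ** 2 for x in attendance)
s_yy = sum((y - y_bar) ** 2 for y in score)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(attendance, score))

b1 = s_xy / s_xx

sst = s_yy
ssr = b1 ** 2 * s_xx
sse = sst - ssr

msr = ssr / 1        # regression mean square (1 degree of freedom)
mse = sse / (n - 2)  # residual mean square (n - 2 degrees of freedom)

s_b1 = sqrt(mse / s_xx)  # standard error of the slope
t_test = b1 / s_b1
f_test = msr / mse

print(f"t_test = {t_test:.4f}")
print(f"F_test = {f_test:.4f}")
print(f"t_test squared = {t_test ** 2:.4f}")
```

In simple linear regression the two statistics are linked by $F_{test} = t_{test}^2$, so both tests lead to the same decision about the slope.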

Example 4 :

a) Referring to the answers from Example 3, create the ANOVA table.

Source of Variation    Sum of Squares    Degrees of Freedom    Mean Square    $F_{test}$
Regression
Residuals
Total

b) Find $t_{test}$. Then, test the appropriateness of the regression model for monthly sales
from the market area Pagoh.
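
The layout of the ANOVA table can be sketched in Python using the Example 3 data. This is an illustrative script under assumed variable names, not the official solution, and it stops short of the decision step, which needs a critical value from an $F$ table.

```python
from statistics import mean

# Example 3 data: advertising spend (RM) and monthly sales (RM), Jan to Dec
advertising = [40937, 42376, 43355, 44126, 45060, 49546,
               56105, 59322, 59877, 60481, 62356, 63246]
sales = [502729, 507553, 516885, 528347, 537298, 544066,
         553664, 563201, 568657, 569384, 573764, 582746]

n = len(advertising)
x_bar, y_bar = mean(advertising), mean(sales)
s_xx = sum((x - x_bar) ** 2 for x in advertising)
s_yy = sum((y - y_bar) ** 2 for y in sales)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(advertising, sales))

b1 = s_xy / s_xx
sst = s_yy
ssr = b1 ** 2 * s_xx
sse = sst - ssr

df_reg, df_res, df_total = 1, n - 2, n - 1
msr = ssr / df_reg
mse = sse / df_res
f_test = msr / mse

# Each row: source, sum of squares, degrees of freedom, mean square, F
rows = [
    ("Regression", ssr, df_reg, msr, f_test),
    ("Residuals", sse, df_res, mse, None),
    ("Total", sst, df_total, None, None),
]

print(f"{'Source':<12}{'SS':>20}{'df':>4}{'MS':>20}{'F':>10}")
for source, ss, df, ms, f in rows:
    ms_text = "" if ms is None else f"{ms:.2f}"
    f_text = "" if f is None else f"{f:.2f}"
    print(f"{source:<12}{ss:>20.2f}{df:>4}{ms_text:>20}{f_text:>10}")
```

The degrees of freedom add up (1 + 10 = 11), as do the sums of squares, which is a quick consistency check on a hand-built table.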

Example 5 :
The data below are from a company survey of employees' years of experience
and their salaries.
Years of Experience Salaries
1.5 37,731
1.1 39,343
2.2 39,891
2 43,525
1.3 46,205
3.2 54,445
4 55,749

The output from Excel shows the resulting ANOVA table.

a) Construct the regression model and find the value of coefficient of determination.

b) Test the appropriateness of the regression model about salaries in the company.
