Chapter 8
LEARNING OBJECTIVES:
INTRODUCTION
Definition
Independent variable, x – The variables used to predict or model y, denoted by the symbols
$x_1$, $x_2$, $x_3$, etc.
Scatter Diagram
A scatter diagram is simply a two-dimensional Cartesian plot of the paired values $(x_i, y_i)$,
where $i = 1, 2, 3, \ldots, n$. From the diagram, we can get an idea of the kind of relationship
between the two variables.
Example 8.1
The data below were obtained from a study of the age and systolic blood pressure of six randomly
selected patients. Draw a scatter plot to view the relationship between age and pressure.
Patient Age, x Pressure, y
A 43 128
B 48 120
C 56 135
D 61 143
E 67 141
F 70 152
Solution:
[Figure: scatter plot of Pressure (vertical axis, 110 to 160) against Age (horizontal axis, 40 to 70)]
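As a rough illustration, the scatter plot can be approximated in plain text. The sketch below uses only the Python standard library. The helper name `text_scatter`, the grid resolution and the axis ranges (taken from the tick marks of the figure) are assumptions made for this example. Each point is rounded to the nearest grid cell, so the positions are only approximate.

```python
# Example 8.1 data: (age, systolic pressure) for patients A to F.
points = [(43, 128), (48, 120), (56, 135), (61, 143), (67, 141), (70, 152)]


def text_scatter(points, x_range=(40, 70), y_range=(110, 160), y_step=5):
    """Return the rows of a crude text scatter plot (hypothetical helper)."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    n_cols = x_max - x_min + 1               # one column per unit of x
    n_rows = (y_max - y_min) // y_step + 1   # one row per y_step units of y
    grid = [[" "] * n_cols for _ in range(n_rows)]
    for x, y in points:
        col = x - x_min
        row = round((y_max - y) / y_step)    # row 0 is the top of the plot
        grid[row][col] = "o"
    # Label each row with the pressure value it represents.
    rows = [f"{y_max - r * y_step:>4} |" + "".join(cells)
            for r, cells in enumerate(grid)]
    rows.append("     +" + "-" * n_cols)
    rows.append(f"      {x_min}" + " " * (n_cols - 3) + f"{x_max}   (Age)")
    return rows


plot_rows = text_scatter(points)
print("\n".join(plot_rows))
```

The marks rise from the lower left to the upper right, which suggests a positive relationship between age and pressure.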
THE COEFFICIENT OF DETERMINATION
The coefficient of determination is a measure of the variation of the dependent variable that is
explained by the regression line and the independent variable.
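Formula
\[ r^2 = R^2 = \frac{S_{xy}^2}{S_{xx} S_{yy}} = \frac{\hat{\beta}_1 S_{xy}}{S_{yy}} = \frac{SSR}{S_{yy}} \]
where
\[ S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \]
\[ S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left( \sum_{i=1}^{n} y_i \right)^2}{n} \]
\[ S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left( \sum_{i=1}^{n} x_i \right) \left( \sum_{i=1}^{n} y_i \right)}{n} \]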
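The formulas above can be checked numerically. The following is a minimal sketch in Python, applied to the six patients of Example 8.1. The function name `corrected_sums` and the variable names are assumptions chosen for this illustration.

```python
import math

# Example 8.1 data: age (x) and systolic pressure (y) of six patients.
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]


def corrected_sums(x, y):
    """Return (S_xx, S_yy, S_xy) using the computational formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    s_xx = sum(xi ** 2 for xi in x) - sum_x ** 2 / n
    s_yy = sum(yi ** 2 for yi in y) - sum_y ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    return s_xx, s_yy, s_xy


s_xx, s_yy, s_xy = corrected_sums(x, y)

# Coefficient of determination, written in two of its equivalent forms.
r_squared = s_xy ** 2 / (s_xx * s_yy)
beta1_hat = s_xy / s_xx
r_squared_alt = beta1_hat * s_xy / s_yy

# Sample correlation coefficient; it carries the sign of S_xy.
r = s_xy / math.sqrt(s_xx * s_yy)

print(s_xx, s_yy, s_xy)  # 561.5 649.5 541.5
print(round(r_squared, 4), round(r, 4))
```

For these data roughly 80% of the variation in pressure is explained by the regression on age, and both forms of the formula give the same value up to rounding error.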
The table below shows the value of r and the corresponding relationship between the variables:
Value of r                  Relationship between Variables
$r = 1.00$                  Perfect positive linear relationship.
$r = -1.00$                 Perfect negative linear relationship.
$0.50 \le r < 1.00$         Strong positive linear relationship.
$-1.00 < r \le -0.50$       Strong negative linear relationship.
$0 < r < 0.50$              Weak positive linear relationship.
$-0.50 < r < 0$             Weak negative linear relationship.
$r = 0$                     No linear relationship.
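The table can be read as a simple decision rule. The sketch below is one possible encoding in Python. The function name `describe_correlation` is hypothetical, and the code assumes that the boundary values r = 0.50 and r = -0.50 belong to the "strong" categories.

```python
def describe_correlation(r):
    """Map a sample correlation coefficient to the table's description."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No linear relationship"
    direction = "positive" if r > 0 else "negative"
    if abs(r) == 1.0:
        strength = "Perfect"
    elif abs(r) >= 0.50:   # assumption: the boundary counts as strong
        strength = "Strong"
    else:
        strength = "Weak"
    return f"{strength} {direction} linear relationship"


# Illustrative values only, not taken from any data set in this chapter.
print(describe_correlation(0.90))   # Strong positive linear relationship
print(describe_correlation(-0.30))  # Weak negative linear relationship
```

In practice a computed r is rarely exactly 0 or exactly 1, so the two exact-equality cases matter mainly for textbook data.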
The simple linear regression model is
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
where
x = independent variable or predictor
y = dependent variable or response variable
$\beta_0$ = the y-intercept of the line
$\beta_1$ = the slope of the line
$\varepsilon$ = a statistical error, that is, a random variable that accounts for the failure of
the model to fit the data exactly.
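By using the least squares method, we can estimate the unknown parameters $\beta_0$ and $\beta_1$
in order to obtain the best-fitting line for a set of data. The least squares method is a
minimization procedure: it chooses the estimates that minimize the sum of the squared differences
between the observed and fitted values of y. The estimated, or fitted, regression line is given by
\[ \hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x \]
where
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}; \quad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i, \quad \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
\[ \hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \]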
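The two estimates translate directly into code. The sketch below is a minimal Python version. The names `fit_line` and `predict` are assumptions for this illustration, and the check data are made up so that they lie exactly on the line y = 1 + 2x, which the fit should therefore recover.

```python
def fit_line(x, y):
    """Return (beta0_hat, beta1_hat) for the fitted line y_hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum(xi ** 2 for xi in x) - sum(x) ** 2 / n
    s_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum(x) * sum(y) / n
    beta1_hat = s_xy / s_xx                  # slope estimate
    beta0_hat = y_bar - beta1_hat * x_bar    # intercept estimate
    return beta0_hat, beta1_hat


def predict(beta0_hat, beta1_hat, x_new):
    """Evaluate the fitted line at x_new."""
    return beta0_hat + beta1_hat * x_new


# Made-up check data lying exactly on y = 1 + 2x (not from the chapter).
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]
b0, b1 = fit_line(x, y)
print(b0, b1)              # 1.0 2.0
print(predict(b0, b1, 6))  # 13.0
```

With real data the points do not fall exactly on a line, so the fitted values differ from the observed values and the estimates depend on every observation.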
Example 8.2
The data obtained in a study of age and blood pressure are as follows:
Age, x Pressure, y
43 128
48 120
56 135
58 137
61 143
67 141
70 152
a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient.
c) Find $\hat{\beta}_1$ and $\hat{\beta}_0$.
Solution:
Example 8.3
A businesswoman conducted a study to determine the relationship between daily advertising costs
and sales. The data are as follows:
Advertising Costs (RM) Sales (RM)
40 385
25 395
30 475
40 490
50 560
25 480
a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient.
c) Find $\hat{\beta}_1$ and $\hat{\beta}_0$.
Solution:
x      y      x^2      xy      y^2
40     385
25     395
30     475
40     490
50     560
25     480
$\sum x = 210$   $\sum y = 2785$   $\sum x^2 = 7850$   $\sum xy = 99125$   $\sum y^2 = 1313975$
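Starting from the column totals, the remaining steps are arithmetic. The following Python sketch carries them out. The variable names are assumptions for this illustration, and the code is offered as a check on hand calculation rather than as the intended written solution.

```python
import math

# Column totals from the Example 8.3 table (n = 6 observations).
n = 6
sum_x, sum_y = 210, 2785
sum_x2, sum_xy, sum_y2 = 7850, 99125, 1313975

# a) Corrected sums of squares and cross-products.
s_xx = sum_x2 - sum_x ** 2 / n
s_yy = sum_y2 - sum_y ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n

# b) Sample correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# c) Least squares estimates of the slope and intercept.
beta1_hat = s_xy / s_xx
beta0_hat = sum_y / n - beta1_hat * sum_x / n

print(s_xx, s_xy)  # 500.0 1650.0
print(round(s_yy, 2), round(r, 3), round(beta1_hat, 2), round(beta0_hat, 2))
```

The correlation comes out just above 0.50, so it falls at the lower edge of the strong positive band, and the slope suggests that each additional RM 1 of advertising is associated with about RM 3.30 more in sales.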
EXERCISE
1) A study is done to investigate whether Statistics scores have some effect on students’ CPA
scores. The data below are the Statistics final examination scores of 10 randomly selected
students and their corresponding CPA scores.
Scores, x 87 69 75 56 63 90 71 74 80 78
CPA, y 3.41 3.15 3.28 2.46 2.89 3.73 3.11 3.23 3.50 3.34
2) A supervisor wants to determine the relationship between the ages of her employees and
the number of sick days they take each year. The data are as follows:
Age, x 18 21 25 36 48 53
Days, y 16 12 9 5 6 2
a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient.
c) Find $\hat{\beta}_1$ and $\hat{\beta}_0$.
3) A researcher wishes to study the relationship between the monthly e-commerce sales
and the online advertising cost. You have the survey results for 7 online stores for the
last year. The data were recorded as follows:
a) Find $S_{xx}$, $S_{yy}$ and $S_{xy}$.
b) Find and interpret the sample correlation coefficient.
c) Find $\hat{\beta}_1$ and $\hat{\beta}_0$.
4) A study was made on the amount of converted sugar in a certain process at various
temperatures. The data were coded and recorded as follows.
Temperature, x Converted sugar, y
1.0 8.1
1.1 7.8
1.2 8.5
1.3 9.8
1.4 9.5
1.5 8.9
1.6 8.6
1.7 10.2
1.8 9.3
1.9 9.2
2.0 10.2