Chapter 2
The parameter β₀ is the intercept of the regression plane. If the range of the data includes x₁ = x₂ = x₃ = 0, then β₀ is the mean of y when x₁ = x₂ = x₃ = 0. Otherwise β₀ has no physical interpretation. The parameter β₁ indicates the expected change in the response variable y per unit change in x₁ when x₂ and x₃ are held constant. Similarly, β₂ measures the expected change in y per unit change in x₂ when x₁ and x₃ are held constant.
In general, the parameter βⱼ measures the expected change in y per unit change in xⱼ when all of the remaining independent variables xᵢ (i ≠ j) are held constant. For this reason the parameters βⱼ, j = 1, 2, …, k, are often called partial regression coefficients.
Consider the following estimated regression with two predictor variables, relating performance at university (UGPA) to performance at high school (HSGPA) and entrance test score (ENTS):
UGPA = 1.29 + 0.453 HSGPA + 0.094 ENTS
Since no one who attends university has either a zero high school GPA or a zero on the university entrance test, the intercept in this equation is not meaningful. Holding ENTS fixed, an additional point on HSGPA is associated with an additional 0.453 of a point on the university GPA. That is, if we choose two students A and B with the same ENTS score, but the high school GPA of student A is one point higher than that of student B, then we would expect student A to have a university GPA 0.453 points higher than that of student B.
Multiple linear regression models are often used as empirical models or approximating functions. That is, the true functional relationship between y and x₁, x₂, …, xₖ is unknown, but over certain ranges of the independent variables the linear regression model is an adequate approximation to the true unknown function. The variable ε is the error term, containing factors other than x₁, x₂, …, xₖ that affect y. The independent variables can themselves be functions of other variables, such as higher-order terms, interactions between variables, and coded/dummy variables. For example:
CMOX = β₀ + β₁TAR + β₂TAR² + β₃FIL + ε
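As a minimal sketch (with made-up values, not taken from the text), the following Python snippet shows how a design matrix containing a squared term and a 0/1 dummy variable could be assembled:

```python
import numpy as np

# Hypothetical values (for illustration only)
tar = np.array([10.0, 12.5, 15.0, 8.0, 11.0])   # a quantitative regressor (e.g. TAR)
fil = np.array([0, 1, 0, 1, 1])                  # a 0/1 dummy variable (e.g. FIL)

# Columns of the design matrix: intercept, TAR, TAR^2 (higher-order term), FIL (dummy)
X = np.column_stack([np.ones_like(tar), tar, tar**2, fil])
print(X)
```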
The method of ordinary least squares chooses the estimates β̂ⱼ to minimize the sum of squared residuals. That is, given n observations on y and the xⱼ, the estimates β̂ⱼ are chosen simultaneously to make the least squares function
\[
S(\beta_0, \beta_1, \ldots, \beta_k) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2
\]
as small as possible. The function S must be minimized with respect to β₀, β₁, …, βₖ. The least squares estimators of β₀, β₁, …, βₖ must satisfy
\[
\left.\frac{\partial S}{\partial \beta_0}\right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) = 0
\]
and
\[
\left.\frac{\partial S}{\partial \beta_j}\right|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = -2 \sum_{i=1}^{n} x_{ij} \left( y_i - \hat\beta_0 - \sum_{j=1}^{k} \hat\beta_j x_{ij} \right) = 0, \qquad j = 1, 2, \ldots, k
\]
Simplifying these equations yields the least squares normal equations:
\[
\begin{aligned}
n\hat\beta_0 + \hat\beta_1 \sum_{i=1}^{n} x_{i1} + \hat\beta_2 \sum_{i=1}^{n} x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik} &= \sum_{i=1}^{n} y_i \\
\hat\beta_0 \sum_{i=1}^{n} x_{i1} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}^2 + \hat\beta_2 \sum_{i=1}^{n} x_{i1}x_{i2} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{i1}x_{ik} &= \sum_{i=1}^{n} x_{i1}y_i \\
\hat\beta_0 \sum_{i=1}^{n} x_{i2} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}x_{i2} + \hat\beta_2 \sum_{i=1}^{n} x_{i2}^2 + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{i2}x_{ik} &= \sum_{i=1}^{n} x_{i2}y_i \\
&\;\;\vdots \\
\hat\beta_0 \sum_{i=1}^{n} x_{ik} + \hat\beta_1 \sum_{i=1}^{n} x_{i1}x_{ik} + \hat\beta_2 \sum_{i=1}^{n} x_{i2}x_{ik} + \cdots + \hat\beta_k \sum_{i=1}^{n} x_{ik}^2 &= \sum_{i=1}^{n} x_{ik}y_i
\end{aligned}
\]
Note that there are p = k + 1 normal equations, one for each of the unknown regression coefficients. The solution to the normal equations gives the least squares estimators β̂₀, β̂₁, …, β̂ₖ. It is more convenient to deal with multiple regression models if they are expressed in matrix notation. This allows a very compact display of the model, data, and results. In matrix notation, the regression model for k independent variables can be written as:
y = Xβ + ε
where
\[
\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}
\]
In general, y is an (n × 1) vector of the observations, X is an (n × p) matrix of the independent variables, β is a (p × 1) vector of the regression coefficients, and ε is an (n × 1) vector of random errors.
The OLS estimation finds the vector of least squares estimators, β̂, that minimizes:
\[
S(\boldsymbol\beta) = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol\beta)'(\mathbf{y} - \mathbf{X}\boldsymbol\beta)
\]
Minimizing S(β) leads to the normal equations in matrix form, X′Xβ̂ = X′y. To solve the normal equations, multiply both sides by the inverse of X′X. Thus, the least squares estimator of β is given by:
\[
\hat{\boldsymbol\beta} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}
\]
provided that the inverse matrix (X′X)⁻¹ exists. The (X′X)⁻¹ matrix will always exist if the independent variables are linearly independent, that is, if no column of the X matrix is a linear combination of the other columns. Written out in full, the normal equations are:
\[
\begin{bmatrix}
n & \sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i2} & \cdots & \sum_{i=1}^{n} x_{ik} \\
\sum_{i=1}^{n} x_{i1} & \sum_{i=1}^{n} x_{i1}^2 & \sum_{i=1}^{n} x_{i1}x_{i2} & \cdots & \sum_{i=1}^{n} x_{i1}x_{ik} \\
\sum_{i=1}^{n} x_{i2} & \sum_{i=1}^{n} x_{i2}x_{i1} & \sum_{i=1}^{n} x_{i2}^2 & \cdots & \sum_{i=1}^{n} x_{i2}x_{ik} \\
\vdots & \vdots & \vdots & & \vdots \\
\sum_{i=1}^{n} x_{ik} & \sum_{i=1}^{n} x_{ik}x_{i1} & \sum_{i=1}^{n} x_{ik}x_{i2} & \cdots & \sum_{i=1}^{n} x_{ik}^2
\end{bmatrix}
\begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \hat\beta_2 \\ \vdots \\ \hat\beta_k \end{bmatrix}
=
\begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_{i1}y_i \\ \sum_{i=1}^{n} x_{i2}y_i \\ \vdots \\ \sum_{i=1}^{n} x_{ik}y_i \end{bmatrix}
\]
For even moderately sized n and k, solving the normal equations by hand is tedious. Fortunately, for practical purposes, modern statistical software can solve these equations, even for large n and k, in a fraction of a second.
The vector of fitted values ŷᵢ corresponding to the observed values yᵢ is given by:
\[
\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol\beta} = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} = \mathbf{H}\mathbf{y}
\]
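A small numerical sketch of these formulas, using synthetic data (not the textbook example; all variable names here are illustrative), is given below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: n observations, k regressors plus an intercept column
n, k = 8, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])   # n x p design matrix, p = k + 1
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.1, size=n)

# Least squares estimator beta_hat = (X'X)^(-1) X'y
# (np.linalg.solve is preferred to forming the inverse explicitly)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Hat matrix H = X (X'X)^(-1) X' and fitted values y_hat = H y
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y

print(beta_hat)                              # close to beta_true
print(np.allclose(y_hat, X @ beta_hat))      # both ways of computing fitted values agree
```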
Like a simple linear regression model, a multiple linear regression model is based on certain assumptions. The major assumptions for the multiple regression model are:
1) The probability distribution of the error has a mean of zero.
2) The errors are independent. In addition, these errors are normally distributed and have a constant standard deviation.
3) The independent variables are not linearly related.
4) There is no linear association between the error and each independent variable.
Consider the delivery time data with n = 25 observations, where x₁ is the delivery volume, x₂ is the delivery distance, and y is the delivery time. The X matrix (columns: 1, x₁, x₂) and the y vector are:

X =  1   7   560        y =  16.68
     1   3   220             11.50
     1   3   340             12.03
     1   4    80             14.88
     1   6   150             13.75
     1   7   330             18.11
     1   2   110              8.00
     1   7   210             17.83
     1  30  1460             79.24
     1   5   605             21.50
     1  16   688             40.33
     1  10   215             21.00
     1   4   255             13.50
     1   6   462             19.75
     1   9   448             24.00
     1  10   776             29.00
     1   6   200             15.35
     1   7   132             19.00
     1   3    36              9.50
     1  17   770             35.10
     1  10   140             17.90
     1  26   810             52.32
     1   9   450             18.75
     1   8   635             19.83
     1   4   150             10.75
The X′X matrix is
\[
\mathbf{X}'\mathbf{X} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 7 & 3 & \cdots & 4 \\ 560 & 220 & \cdots & 150 \end{bmatrix}
\begin{bmatrix} 1 & 7 & 560 \\ 1 & 3 & 220 \\ \vdots & \vdots & \vdots \\ 1 & 4 & 150 \end{bmatrix}
=
\begin{bmatrix} 25 & 219 & 10232 \\ 219 & 3055 & 133899 \\ 10232 & 133899 & 6725688 \end{bmatrix}
\]
and the X′y vector is
\[
\mathbf{X}'\mathbf{y} =
\begin{bmatrix} 1 & 1 & \cdots & 1 \\ 7 & 3 & \cdots & 4 \\ 560 & 220 & \cdots & 150 \end{bmatrix}
\begin{bmatrix} 16.68 \\ 11.50 \\ \vdots \\ 10.75 \end{bmatrix}
=
\begin{bmatrix} 559.60 \\ 7375.44 \\ 337072.00 \end{bmatrix}
\]
The least squares estimator of β is β̂ = (X′X)⁻¹X′y, or
\[
\hat{\boldsymbol\beta} =
\begin{bmatrix} 25 & 219 & 10232 \\ 219 & 3055 & 133899 \\ 10232 & 133899 & 6725688 \end{bmatrix}^{-1}
\begin{bmatrix} 559.60 \\ 7375.44 \\ 337072.00 \end{bmatrix}
=
\begin{bmatrix} 0.11321528 & -0.00444859 & -0.00008367 \\ -0.00444859 & 0.00274378 & -0.00004786 \\ -0.00008367 & -0.00004786 & 0.00000123 \end{bmatrix}
\begin{bmatrix} 559.60 \\ 7375.44 \\ 337072.00 \end{bmatrix}
=
\begin{bmatrix} 2.34123115 \\ 1.61590712 \\ 0.01438483 \end{bmatrix}
\]
The least squares fit (with the regression coefficients rounded to three decimal places) is
ŷ = 2.341 + 1.616x₁ + 0.014x₂
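As a quick check, the same estimates can be reproduced by solving the normal equations numerically; the sketch below simply reuses the X′X matrix and X′y vector shown above:

```python
import numpy as np

# X'X and X'y as computed above for the delivery time data
XtX = np.array([[25.0,     219.0,      10232.0],
                [219.0,    3055.0,     133899.0],
                [10232.0,  133899.0,   6725688.0]])
Xty = np.array([559.60, 7375.44, 337072.00])

beta_hat = np.linalg.solve(XtX, Xty)
print(beta_hat)   # approximately [2.3412, 1.6159, 0.0144]
```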
The covariance matrix of the least squares estimator is
\[
\operatorname{Cov}(\hat{\boldsymbol\beta}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}
\]
2.1.3 ESTIMATION OF σ²
As in simple linear regression, an estimator of σ² may be developed from the residual sum of squares
\[
\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e}
\]
\[
\begin{aligned}
\mathrm{SSE} &= (\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta})'(\mathbf{y} - \mathbf{X}\hat{\boldsymbol\beta}) \\
&= \mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \mathbf{y}'\mathbf{X}\hat{\boldsymbol\beta} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta} \\
&= \mathbf{y}'\mathbf{y} - 2\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} + \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{X}\hat{\boldsymbol\beta}
\end{aligned}
\]
Since X′Xβ̂ = X′y, this last equation becomes SSE = y′y − β̂′X′y.
It can be shown that the residual sum of squares has (n − p) degrees of freedom associated with it, since p parameters are estimated in the regression model. The estimate of σ² is the residual mean square:
\[
\hat\sigma^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n - p}
\]
For the delivery time data,
\[
\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} =
\begin{bmatrix} 2.34123115 & 1.61590712 & 0.01438483 \end{bmatrix}
\begin{bmatrix} 559.60 \\ 7375.44 \\ 337072.00 \end{bmatrix}
= 18{,}076.90304
\]
The residual sum of squares is
\[
\mathrm{SSE} = \mathbf{y}'\mathbf{y} - \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} = 18310.6290 - 18076.9030 = 233.7260
\]
so the residual mean square is σ̂² = MSE = 233.7260/(25 − 3) = 10.6239.
2.2.1 TEST FOR SIGNIFICANCE OF REGRESSION
The test for significance of regression is a test to determine whether there is a linear relationship between the response y and any of the independent variables x₁, x₂, …, xₖ. This procedure is often thought of as an overall or global test of model adequacy. The appropriate hypotheses are
H₀: β₁ = β₂ = ⋯ = βₖ = 0   versus   H₁: βⱼ ≠ 0 for at least one j
Rejection of this null hypothesis implies that at least one of the independent variables
x1, x2 , ..., xk contributes significantly to the model.
The test procedure is a generalization of the analysis of variance used in simple linear
regression. The total sum of squares SST is partitioned into a sum of squares due to
regression, SSR , and a residual sum of squares, SSE . Thus,
SST = SSR + SSE
It can be shown that if the null hypothesis is true, then SSR/σ² follows a χ² distribution with k degrees of freedom, the same number as the number of independent variables in the model. It can also be shown that SSE/σ² follows a χ² distribution with n − k − 1 degrees of freedom, and that SSE and SSR are independent. The F statistic is given by:
\[
F = \frac{\mathrm{SSR}/k}{\mathrm{SSE}/(n - k - 1)} = \frac{\mathrm{MSR}}{\mathrm{MSE}}
\]
Therefore, to test the hypothesis H₀: β₁ = β₂ = ⋯ = βₖ = 0, compute the test statistic F and reject H₀ if F > F₍α, k, n−k−1₎.
The error sum of squares may be written as
\[
\mathrm{SSE} = \left[\mathbf{y}'\mathbf{y} - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right] - \left[\hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}\right]
\]
that is, SSE = SST − SSR.
Therefore, the regression sum of squares is given as:
\[
\mathrm{SSR} = \hat{\boldsymbol\beta}'\mathbf{X}'\mathbf{y} - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n}
\]
The test procedure is usually summarized in an analysis of variance (ANOVA) table as below.

Table: ANOVA for Significance of Regression in Multiple Regression

Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square   F
Regression            SSR              k                    MSR           MSR/MSE
Residual              SSE              n − k − 1            MSE
Total                 SST              n − 1
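For the delivery time example, the ANOVA quantities can be assembled from the totals computed earlier in the chapter; a sketch (using scipy for the p-value) is:

```python
from scipy import stats

# Quantities taken from earlier in the chapter (delivery time data)
n, k   = 25, 2
yty    = 18310.6290      # y'y
bXty   = 18076.9030      # beta_hat' X'y
sum_y  = 559.60          # sum of the y_i

SST = yty - sum_y**2 / n
SSE = yty - bXty
SSR = SST - SSE

MSR = SSR / k
MSE = SSE / (n - k - 1)
F   = MSR / MSE
p_value = stats.f.sf(F, k, n - k - 1)   # upper-tail probability

print(SSR, SSE, MSE)   # roughly 5550.8, 233.7, 10.6
print(F, p_value)      # F is very large, so the p-value is extremely small
```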
Since the p-value is extremely small, it can be concluded that delivery time is linearly related to delivery volume and/or distance. However, this does not necessarily imply that the relationship found is an appropriate one for predicting delivery time as a function of volume and distance. Further tests of model adequacy are required.
2.2.3 CONFIDENCE INTERVAL ON REGRESSION COEFFICIENTS
Confidence intervals on individual regression coefficients and confidence intervals on the mean response at specific levels of the independent variables play the same important role in multiple regression that they do in simple linear regression.
To construct confidence intervals for the regression coefficients, we require the assumption that the errors εᵢ are normally and independently distributed with mean zero and variance σ². The observations yᵢ are then normally and independently distributed with mean β₀ + Σⱼ₌₁ᵏ βⱼxᵢⱼ and variance σ². Since the least squares estimator β̂ is a linear combination of the observations, it is also normally distributed, with mean vector β and covariance matrix σ²(X′X)⁻¹. Hence a 100(1 − α)% confidence interval for the coefficient βⱼ is
\[
\hat\beta_j - t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2 C_{jj}} \;\le\; \beta_j \;\le\; \hat\beta_j + t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2 C_{jj}}
\]
where Cⱼⱼ is the jth diagonal element of (X′X)⁻¹.
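A sketch of the 95% intervals for the delivery time coefficients, using β̂, the diagonal elements of (X′X)⁻¹, and the MSE reported earlier in this chapter, is given below:

```python
import numpy as np
from scipy import stats

# Values copied from earlier in the chapter (delivery time data)
beta_hat = np.array([2.34123115, 1.61590712, 0.01438483])
C_jj     = np.array([0.11321528, 0.00274378, 0.00000123])   # diagonal of (X'X)^(-1)
MSE, n, p = 10.6239, 25, 3

t_crit = stats.t.ppf(0.975, n - p)            # about 2.074
half_width = t_crit * np.sqrt(MSE * C_jj)

for b, hw in zip(beta_hat, half_width):
    print(f"{b - hw:10.5f}  to  {b + hw:10.5f}")
```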
Notice that for each case, the value zero falls outside the confidence interval, supporting the rejection of the null hypothesis when testing the individual coefficients.
The coefficient of determination R² measures the proportion of the variation in y explained by the regression:
\[
R^2 = \frac{\mathrm{SSR}}{\mathrm{SST}} = \frac{\sum_{i=1}^{n}(\hat y_i - \bar y)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2} = 1 - \frac{\mathrm{SSE}}{\mathrm{SST}} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat y_i)^2}{\sum_{i=1}^{n}(y_i - \bar y)^2}
\]
The error sum of squares, SSE = Σᵢ(yᵢ − ŷᵢ)², depends on the number of independent variables in the model. It is clear that as the number of x variables increases, Σᵢ(yᵢ − ŷᵢ)² is likely to decrease, and hence R² will increase. As such, in comparing two regression models with different numbers of independent variables, one should be wary of choosing the model with the highest R².
To compare two R² values, one must take into account the number of independent variables that are present in the model. This can be done by considering an alternative coefficient of determination, known as the "adjusted" R² or R̄², defined as follows:
\[
\bar R^2 = 1 - \frac{\mathrm{SSE}/(n - k - 1)}{\mathrm{SST}/(n - 1)} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}
\]
The term adjusted refers to the adjustment made for the degrees of freedom associated with the sums of squares SSE and SST. It should be clear from the equation above that for k ≥ 1,
i) R̄² ≤ R², which implies that as the number of independent variables increases, R̄² increases by less than the unadjusted R²;
ii) R̄² can be negative. If R̄² is negative in a practical application, its value is taken as zero;
iii) the plot of R̄² against k has a turning point.
A short computational sketch of R² and R̄² is given after this list.
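A minimal sketch, using the SSE and SST values obtainable from the delivery time computations earlier in the chapter:

```python
def r_squared(sse: float, sst: float, n: int, k: int) -> tuple[float, float]:
    """Return (R^2, adjusted R^2) from the sums of squares."""
    r2 = 1.0 - sse / sst
    r2_adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, r2_adj

# Delivery time data: SSE = 233.726 and SST = 18310.629 - 559.6**2/25 = 5784.5426
print(r_squared(233.726, 5784.5426, 25, 2))   # roughly (0.960, 0.956)
```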
However, if we take a second sample of the same number of households from the same population, the point estimate of E(y_food | x₁ = 5.5, x₂ = 3) is expected to be different from that of the first sample. All possible samples of the same size taken from the same population will give different point estimates. Therefore, a confidence interval for E(y_food | x₁ = 5.5, x₂ = 3) will be a more reliable estimate than the point estimate.
The second scenario is the same as above, but the aim is to predict the food expenditure for one particular household with a monthly income of RM5.5 thousand and 3 children. The point estimate is the same as before, but the interval is not: it is wider (see below) and is called a prediction interval. The prediction interval for predicting a single value of y at a given value of x is always wider than the confidence interval for estimating the mean value of y at that value of x.
\[
\hat y_{npr} - t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2\left(1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0\right)} \;\le\; y_{npr} \;\le\; \hat y_{npr} + t_{\alpha/2,\, n-p}\sqrt{\hat\sigma^2\left(1 + \mathbf{x}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{x}_0\right)}
\]
The extra width of this interval is due to the error in predicting a particular value, in contrast to the error of zero in predicting the mean response for all members of the population.
The variance of ŷ_mr is estimated by σ̂² x₀′(X′X)⁻¹x₀.
Therefore, a 95% confidence interval on the mean delivery time at this point is given as
\[
19.22 - 2.074\sqrt{10.6239\,(0.05346)} \;\le\; y_{mr} \;\le\; 19.22 + 2.074\sqrt{10.6239\,(0.05346)}
\]
\[
19.22 - 2.074\sqrt{0.56794} \;\le\; y_{mr} \;\le\; 19.22 + 2.074\sqrt{0.56794}
\]
\[
17.66 \;\le\; y_{mr} \;\le\; 20.78
\]
Meanwhile, a 95% prediction interval on the delivery time for a new observation at this point is given as
\[
19.22 - 2.074\sqrt{10.6239\,(1 + 0.05346)} \;\le\; y_{npr} \;\le\; 19.22 + 2.074\sqrt{10.6239\,(1 + 0.05346)}
\]
\[
12.28 \;\le\; y_{npr} \;\le\; 26.16
\]
It is very obvious that, in this case, the prediction interval for a new observation is much wider than the confidence interval for the mean response.
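Both intervals can be reproduced directly from the quantities in the worked example above (a sketch):

```python
import math

# From the worked example: fitted value, MSE, x0'(X'X)^(-1) x0, and t critical value
y_hat, MSE, lev, t_crit = 19.22, 10.6239, 0.05346, 2.074

ci_half = t_crit * math.sqrt(MSE * lev)         # mean response (confidence interval)
pi_half = t_crit * math.sqrt(MSE * (1 + lev))   # new observation (prediction interval)

print(y_hat - ci_half, y_hat + ci_half)   # about 17.66 to 20.78
print(y_hat - pi_half, y_hat + pi_half)   # about 12.28 to 26.16
```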
Figure 3.6 Two influential observations. Figure 3.7 A point remote in x-space.
3. Outliers or bad values can seriously disturb the least-squares fit. For example,
consider the data in Figure 3.8. Observation A seems to be an “outlier” or “bad
value” because it falls far from the line implied by the rest of the data. If this point
is really an outlier, then the estimate of the intercept may be incorrect and the
residual mean square may be an inflated estimate of σ 2 . On the other hand, the
data point may not be a bad value and may be a highly useful piece of evidence
concerning the process under investigation. Methods for detecting and dealing
with outliers are discussed more completely in Chapter 4.
Figure 3.8 An outlier
4. As mentioned in Chapter 1, just because a regression analysis has indicated a strong relationship between two variables, this does not imply that the variables are related in any causal sense. Causality implies correlation, but correlation alone cannot establish causality; regression can demonstrate association, but it cannot address the issue of necessity. Thus, our expectations of discovering cause-and-effect relationships from regression should be modest.
As an example of a “nonsense” relationship between two variables, consider the
data in Table 3.7. This table presents the number of certified mental defectives in
the United Kingdom per 10,000 of estimated population (y), the number of radio
receiver licenses issued ( x1 ), and the first name of the President of the United
States ( x2 ) for the years 1924-1937. We can show that the regression equation
relating y to x1 is
ŷ = 4.582 + 2.204x₁
The t-statistic for testing H₀: β₁ = 0 for this model is t₀ = 27.312 (hence a very small p-value), and the coefficient of determination is R² = 0.9842. That is, 98.42% of
the variability in the data is explained by the number of radio receiver licenses
issued. Clearly this is a nonsense relationship, as it is highly unlikely that the
number of mental defectives in the population is functionally related to the
number of radio receiver licenses issued. The reason for this strong statistical relationship is that y and x₁ are monotonically related (two sequences of numbers are monotonically related if, as one sequence increases, the other always either increases or always decreases). In this example, y is increasing because diagnostic procedures for mental disorders became more refined over the years represented in the study, and x₁ is increasing because of the emergence and low-cost availability of radio technology over those years.
Table 3.7

Year   Number of Certified Mental Defectives per 10,000 of Estimated Population in U.K. (y)   Number of Radio Receiver Licenses Issued (millions) in the U.K. (x₁)   First Name of President of the U.S. (x₂)
1924 8 1.350 Calvin
1925 8 1.960 Calvin
1926 9 2.270 Calvin
1927 10 2.483 Calvin
1928 11 2.730 Calvin
1929 11 3.091 Calvin
1930 12 3.647 Herbert
1931 16 4.620 Herbert
1932 18 5.497 Herbert
1933 19 6.260 Herbert
1934 20 7.012 Franklin
1935 21 7.618 Franklin
1936 22 8.131 Franklin
1937 23 8.593 Franklin
Source: Kendall and Yule [1950] and Tufte [1974].
Any two sequences of numbers that are monotonically related will exhibit similar
properties. To illustrate this further, suppose we regress y on the number of letters
in the first name of the U.S. President in the corresponding year. The model is
ŷ = −26.442 + 5.900x₂
with t₀ = 8.996 (hence a small p-value) and R² = 0.8709. Clearly this is a nonsense
relationship as well.
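Both fits can be reproduced from the data in Table 3.7; a sketch for the regression of y on x₁ is shown below (the x₂ fit is analogous, using the name lengths 6, 7, and 8 as the regressor):

```python
import numpy as np

# Table 3.7: radio licenses (x1, millions) and mental defectives per 10,000 (y), 1924-1937
x1 = np.array([1.350, 1.960, 2.270, 2.483, 2.730, 3.091, 3.647,
               4.620, 5.497, 6.260, 7.012, 7.618, 8.131, 8.593])
y  = np.array([8, 8, 9, 10, 11, 11, 12, 16, 18, 19, 20, 21, 22, 23], dtype=float)

X = np.column_stack([np.ones_like(x1), x1])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

resid = y - X @ beta_hat
r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print(beta_hat)   # roughly [4.58, 2.20], matching y_hat = 4.582 + 2.204 x1
print(r2)         # roughly 0.984
```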