Chap 6 MultipleLinearRegression Adjusted
Topics
Explanatory vs. predictive modeling with regression
Example: prices of Toyota Corollas
Fitting a predictive model
Assessing predictive accuracy
Explanatory Modeling
Goal: Explain relationship between predictors
(explanatory variables) and target
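\( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \)
where:
\( \beta_0, \beta_1, \beta_2, \ldots, \beta_p \) are the parameters, and
\( \varepsilon \) is a random variable called the error term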
Estimation Process
Multiple Regression Model
\( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \)
Multiple Regression Equation
\( E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p \)
Unknown parameters are \( \beta_0, \beta_1, \beta_2, \ldots, \beta_p \)
Sample Data: \( x_1, x_2, \ldots, x_p, y \)
Estimated Multiple Regression Equation
\( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p \)
Sample statistics are \( b_0, b_1, b_2, \ldots, b_p \)
\( b_0, b_1, b_2, \ldots, b_p \) provide estimates of \( \beta_0, \beta_1, \beta_2, \ldots, \beta_p \)
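The step from sample data to the sample statistics can be sketched in code. This is a minimal illustration that assumes the estimates are obtained by least squares (the slides do not name the method); the function name `fit_least_squares` and the sample values are hypothetical, with the data generated exactly from y = 1 + 2 x1 + 3 x2 so the recovered coefficients are easy to check.

```python
def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared residuals."""
    # Design matrix with a leading column of ones for the intercept b0.
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical sample generated exactly from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 3)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]

b = fit_least_squares(rows, y)
print([round(v, 6) for v in b])  # estimates of beta0, beta1, beta2
```

Because the hypothetical data contain no error term, the estimates coincide with the generating parameters up to rounding; with real data they would only approximate the unknown parameters.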
Multiple Regression Model
SALARY = 3.174 + 1.404(EXPER) + 0.251(SCORE)
\( b_1 = 1.404 \)
\( b_2 = 0.251 \)
Salary is expected to increase by $251 for each
additional point scored on the programmer aptitude
test (when the variable years of experience is held
constant).
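Evaluating the estimated equation makes the interpretation concrete. The sketch below assumes salary is measured in $1000s (as the later variable definitions state) and uses hypothetical inputs: 4 years of experience and aptitude scores of 78 and 79. The function name `predict_salary` is illustrative only.

```python
def predict_salary(exper, score):
    """Estimated annual salary in $1000s from the fitted equation."""
    return 3.174 + 1.404 * exper + 0.251 * score


# Same experience, scores one point apart (hypothetical inputs).
base = predict_salary(4, 78)
plus_one_point = predict_salary(4, 79)

print(round(base, 3))                   # 28.368
print(round(plus_one_point - base, 3))  # 0.251, i.e. $251
```

Holding experience fixed, the one-point change in score moves the prediction by exactly the coefficient on SCORE.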
Multiple Coefficient of Determination
\( \text{SST} = \text{SSR} + \text{SSE} \)
\( \sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2 \)
where:
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error
Multiple Coefficient of Determination
\( R^2 = \text{SSR}/\text{SST} \)
\( R^2 = 500.3285/599.7855 = .83418 \)
Adjusted Multiple Coefficient of Determination
\( R_a^2 = 1 - (1 - R^2)\,\dfrac{n - 1}{n - p - 1} \)
\( R_a^2 = 1 - (1 - .834179)\,\dfrac{20 - 1}{20 - 2 - 1} = .814671 \)
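The two calculations above can be reproduced in a few lines. This is a sketch using the sums of squares and sample size from the slides (SSR = 500.3285, SST = 599.7855, n = 20, p = 2); the function names are illustrative.

```python
def r_squared(ssr, sst):
    """Multiple coefficient of determination: share of SST explained."""
    return ssr / sst


def adjusted_r_squared(r2, n, p):
    """Adjust R^2 for the number of predictors p and the sample size n."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


r2 = r_squared(500.3285, 599.7855)
r2_adj = adjusted_r_squared(r2, n=20, p=2)

print(round(r2, 5))      # 0.83418
print(round(r2_adj, 5))  # 0.81467
```

The adjusted value is lower because the factor (n - 1)/(n - p - 1) exceeds 1 whenever the model has at least one predictor.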
Testing for Significance: F Test
Hypotheses
\( H_0\colon \beta_1 = \beta_2 = \cdots = \beta_p = 0 \)
\( H_a\colon \) One or more of the parameters is not equal to zero.
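The test statistic itself does not survive on this slide. As a hedged sketch, the code below assumes the usual form F = MSR/MSE with MSR = SSR/p and MSE = SSE/(n - p - 1), and reuses the sums of squares and the n = 20, p = 2 values shown earlier in the deck.

```python
# Values taken from the earlier R^2 and adjusted R^2 slides.
ssr, sst = 500.3285, 599.7855
n, p = 20, 2

sse = sst - ssr            # SST = SSR + SSE
msr = ssr / p              # mean square due to regression
mse = sse / (n - p - 1)    # mean square due to error
f_stat = msr / mse         # assumed form of the F statistic

print(round(f_stat, 2))
```

A large F relative to the critical value of the F distribution with p and n - p - 1 degrees of freedom would lead to rejecting the null hypothesis that all slope parameters are zero.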
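Hypotheses (t test for an individual parameter)
\( H_0\colon \beta_i = 0 \)
\( H_a\colon \beta_i \neq 0 \)
Test Statistic
\( t = \dfrac{b_i}{s_{b_i}} \)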
Qualitative Independent Variables
\( \hat{y} = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 \)
where:
\( \hat{y} \) = annual salary ($1000)
\( x_1 \) = years of experience
\( x_2 \) = score on programmer aptitude test
\( x_3 \) = 0 if individual does not have a graduate degree
\( x_3 \) = 1 if individual does have a graduate degree
\( x_3 \) is a dummy variable
              Coeffic.   Std. Err.   t Stat   P-value
Intercept     7.94485    7.3808      1.0764   0.2977
Experience    1.14758    0.2976      3.8561   0.0014
Test Score    0.19694    0.0899      2.1905   0.04364
Grad. Degr.   2.28042    1.98661     1.1479   0.26789
Not significant: Grad. Degr. (P-value 0.26789)
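The t Stat column is each coefficient divided by its standard error, which can be checked directly. The sketch below copies the values from the output; the 5% cutoff of 2.120 is an assumption that the sample is the same n = 20 used earlier, giving 20 - 3 - 1 = 16 degrees of freedom.

```python
# (coefficient, standard error) pairs copied from the regression output.
output = {
    "Intercept": (7.94485, 7.3808),
    "Experience": (1.14758, 0.2976),
    "Test Score": (0.19694, 0.0899),
    "Grad. Degr.": (2.28042, 1.98661),
}

# Assumption: n = 20 and p = 3, so 16 degrees of freedom; 2.120 is the
# two-tailed 5% critical value of t for 16 degrees of freedom.
T_CRITICAL = 2.120

t_stats = {name: coef / se for name, (coef, se) in output.items()}
significant = {name: abs(t) > T_CRITICAL for name, t in t_stats.items()}

for name, t in t_stats.items():
    print(f"{name:12s} t = {t:.4f}  significant at 5%: {significant[name]}")
```

Small differences from the printed t Stat values come from rounding in the displayed standard errors.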
More Complex Qualitative Variables
If a qualitative variable has k levels, k - 1 dummy variables are required to be included in the model, with each dummy variable being coded as 0 or 1.
The excluded dummy variable will serve as a reference for comparison.
For example, a variable indicating level of education could
be represented by d1 and d2 values as follows:
Highest Degree   d1   d2
Bachelor’s       0    0
Master’s         1    0
Ph.D.            0    1
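The k - 1 coding rule can be sketched as a small helper. The function name `dummy_code` is hypothetical, and treating the first listed level as the excluded reference is an assumption chosen to match the table, where Bachelor's is coded 0, 0.

```python
def dummy_code(value, levels):
    """Code one value of a k-level variable as k - 1 dummies (0 or 1).

    The first entry of `levels` is the excluded reference level.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels[1:]]


levels = ["Bachelor's", "Master's", "Ph.D."]  # k = 3 levels, so 2 dummies
for degree in levels:
    d1, d2 = dummy_code(degree, levels)
    print(f"{degree:12s} d1={d1} d2={d2}")
```

With this coding, the coefficient on each dummy is read as a difference from the reference level, holding the other predictors constant.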
Required Conditions for the Error Variable
Id  Model                                     Price  Age_08_04  Mfg_Month  Mfg_Year  KM     Fuel_Type_Diesel  Fuel_Type_Petrol
1   Corolla 2.0 D4D HATCHB TERRA 2/3-Doors    13500  23         10         2002      46986  1                 0
4   Corolla 2.0 D4D HATCHB TERRA 2/3-Doors    14950  26         7          2002      48000  1                 0
5   TA Corolla 2.0 D4D HATCHB SOL 2/3-Doors   13750  30         3          2002      38500  1                 0
6   TA Corolla 2.0 D4D HATCHB SOL 2/3-Doors   12950  32         1          2002      61000  1                 0
9   OTA Corolla 1800 T SPORT VVT I 2/3-Doors  21500  27         6          2002      19700  0                 1
10  TA Corolla 1.9 D HATCHB TERRA 2/3-Doors   12950  23         10         2002      71138  1                 0
12  8 16V VVTLI 3DR T SPORT BNS 2/3-Doors     19950  22         11         2002      43610  0                 1
17  olla 1.8 16V VVTLI 3DR T SPORT 2/3-Doors  22750  30         3          2002      34000  0                 1
Total sum of squared errors    RMS Error    Average Error
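The three accuracy measures named here can be computed from actual and predicted values. The sketch below assumes the common definitions (error = actual - predicted, RMS error = square root of the mean squared error, average error = mean of the signed errors); the actual prices are the first four from the Corolla table, while the predicted prices are hypothetical because the model's predictions are not shown.

```python
import math


def error_summary(actual, predicted):
    """Return (total sum of squared errors, RMS error, average error)."""
    errors = [a - p for a, p in zip(actual, predicted)]
    sse = sum(e * e for e in errors)
    rms = math.sqrt(sse / len(errors))
    average = sum(errors) / len(errors)
    return sse, rms, average


actual = [13500, 14950, 13750, 12950]     # prices from the data table
predicted = [13000, 15000, 14000, 13000]  # hypothetical predictions

sse, rms, average = error_summary(actual, predicted)
print(sse, round(rms, 2), average)
```

An average error near zero indicates little systematic over- or under-prediction, while the RMS error reflects the typical size of an error in the units of the target.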