330 Lecture7 2014
Prediction
5.08.2014
Office hours
Lecturers
Name            Office     Email (@auckland.ac.nz)   Day, time
Steffen Klaere  303.219    s.klaere                  Thu, 10:00-12:00
Alan Lee        303S.265   aj.lee                    Tue, 10:30-12:00
                                                     Thu, 10:30-12:00
[Figure: plot with axes X1 and X2]
Prediction: how to do it
- Vector of regression coefficients: $b = (b_0, b_1, \dots, b_k)^T$;
- Inner product: with $x = (1, x_1, \dots, x_k)^T$, $x^T b = b_0 + b_1 x_1 + \dots + b_k x_k$.
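The inner product above can be checked numerically. This is a minimal sketch, not the course's own code: it assumes R's built-in `trees` data as a stand-in for the data behind `cherry.lm`, and the new point (Girth 12, Height 75) is a hypothetical value chosen only for illustration.

```r
# Assumption: the built-in `trees` data stands in for the course's cherry data.
fit <- lm(Volume ~ Girth + Height, data = trees)

b <- coef(fit)          # (b0, b1, b2)
x <- c(1, 12, 75)       # hypothetical new point; the leading 1 multiplies b0

# Prediction as the inner product x^T b
pred_by_hand <- sum(x * b)

# The same prediction via predict()
new_point <- data.frame(Girth = 12, Height = 75)
pred_predict <- unname(predict(fit, new_point))

all.equal(pred_by_hand, pred_predict)
```

The leading 1 in `x` is what picks up the intercept; without it the inner product would omit $b_0$.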
Prediction error
or more precisely...
Prediction interval
$x^T b \pm s_P \, t_{n-k-1}(1 - \alpha/2)$
Variance of the predictor
Arrange covariate data (used for fitting the model) into a matrix X
such that
$$
X = \begin{pmatrix}
1 & x_{11} & \dots & x_{1k} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \dots & x_{nk}
\end{pmatrix}.
$$
Then the variance of the prediction for point $x$ is
$$
\mathrm{Var}(\text{predictor}) = \sigma^2 \, x^T (X^T X)^{-1} x.
$$
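As a rough check of the variance formula, the quadratic form can be computed directly from the design matrix and compared with the `se.fit` value that `predict()` reports. This sketch again assumes the built-in `trees` data as a stand-in for the course data, uses a hypothetical new point, and plugs in the residual standard error as the estimate of $\sigma$.

```r
# Assumption: the built-in `trees` data stands in for the data behind cherry.lm.
fit <- lm(Volume ~ Girth + Height, data = trees)

X  <- model.matrix(fit)        # n x (k+1) matrix with a leading column of 1s
x  <- c(1, 12, 75)             # hypothetical new point: (1, Girth, Height)
s2 <- summary(fit)$sigma^2     # estimate of sigma^2 (residual.scale squared)

# Var(predictor) = sigma^2 * x^T (X^T X)^{-1} x, with sigma^2 replaced by s2
var_pred <- s2 * drop(t(x) %*% solve(crossprod(X)) %*% x)

# Compare with the standard error reported by predict()
p <- predict(fit, data.frame(Girth = 12, Height = 75), se.fit = TRUE)
all.equal(sqrt(var_pred), unname(p$se.fit))
```

Here `crossprod(X)` is just $X^T X$; inverting it with `solve()` is fine for a small illustration like this one.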
Doing it in R
# Do the prediction
> predict(cherry.lm,new.df)
# Output
1 2
22.63846 29.04288
Doing it in R
> predict(cherry.lm,new.df,se.fit=T,interval="prediction")
$fit
fit lwr upr
1 22.63846 13.94717 31.32976
2 29.04288 19.97235 38.11340
$se.fit
1 2
1.712901 2.130571
$df
[1] 28
$residual.scale
[1] 3.881832
Hand calculation
predictor = 22.63846
$\mathrm{SE}(\text{predictor}) = \sqrt{\texttt{se.fit}^2 + \texttt{residual.scale}^2} = \sqrt{1.712901^2 + 3.881832^2} = 4.242953$
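The hand calculation can be carried through to the interval limits. The sketch below uses only the numbers printed in the `predict()` output for the first point, and assumes the default 95% level, so the quantile is `qt(0.975, df)`.

```r
# Values copied from the predict() output for point 1
fit_val        <- 22.63846
se_fit         <- 1.712901
residual_scale <- 3.881832
df             <- 28

# Standard error of the prediction
se_pred <- sqrt(se_fit^2 + residual_scale^2)

# 95% prediction interval: fit +/- t quantile * SE(predictor)
t_crit <- qt(0.975, df)
lwr <- fit_val - t_crit * se_pred
upr <- fit_val + t_crit * se_pred

round(c(se_pred = se_pred, lwr = lwr, upr = upr), 4)
```

Up to rounding of the printed inputs, the limits agree with the `lwr` and `upr` columns reported for point 1 with `interval="prediction"`.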
$b_0 + b_1 x_1 + \dots + b_k x_k$
Standard error of the estimate
- Note that this is less than the standard error of the prediction!
$$
\mathrm{SE}(\text{predictor}) = \sqrt{\mathrm{Var}(\text{predictor}) + \sigma^2},
\qquad
\mathrm{SE}(\text{estimate}) = \sqrt{\mathrm{Var}(\text{predictor})}.
$$
- Confidence interval
> predict(cherry.lm,new.df,se.fit=T,interval="confidence")
$fit
fit lwr upr
1 22.63846 19.12974 26.14718
2 29.04288 24.67860 33.40716
$se.fit
1 2
1.712901 2.130571
$df
[1] 28
$residual.scale
[1] 3.881832
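The confidence limits can be reproduced from the same output, this time using `se.fit` alone. The sketch uses only the printed values for both points and assumes the default 95% level.

```r
# Values copied from the predict() output above
fit_val <- c(22.63846, 29.04288)
se_fit  <- c(1.712901, 2.130571)
df      <- 28

# 95% confidence interval for the mean response: fit +/- t quantile * se.fit
t_crit <- qt(0.975, df)
ci <- cbind(fit = fit_val,
            lwr = fit_val - t_crit * se_fit,
            upr = fit_val + t_crit * se_fit)

round(ci, 4)
```

The only difference from the prediction interval is the standard error: the residual variance term is dropped, which is why these intervals are narrower.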
Example: Hydrocarbon data
[Figure: pairs plot of hc, t.temp, p.temp, t.vp and p.vp]
Pairs plot
[Figure: pairs plot of hc, t.temp, p.temp, t.vp and p.vp with pairwise correlations shown in the panels; values legible in the extraction: 0.81, 0.88, 0.81, 0.91, 0.93, 0.83, 0.98]
Preliminary conclusions
- No obvious outliers
Fitting the full model
Call:
lm(formula = hc ~ t.temp + p.temp + t.vp + p.vp,
data = vapour.df)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.16609    1.02198   0.163  0.87117
t.temp      -0.07764    0.04801  -1.617  0.10850
p.temp       0.18317    0.04063   4.508 1.53e-05 ***
t.vp        -4.45230    1.56614  -2.843  0.00526 **
p.vp        10.27271    1.60882   6.385 3.37e-09 ***
- Large $R^2$
- Move on to prediction