Lecture 4
Linear Regression
The Model
[Figure: house cost plotted against house size. Most lots sell for $25,000, and building a house costs about $75 per square foot.]
House cost = 25000 + 75(Size)
However, house costs vary even among houses of the same size. Since cost behaves unpredictably, we add a random component ε:
House cost = 25000 + 75(Size) + ε
[Figure: house cost plotted against house size, with observed costs scattered around the line. Most lots sell for $25,000.]
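The role of the random component can be illustrated with a small simulation. This is only a sketch: the normal distribution for ε matches the conditions stated later in the lecture, but the standard deviation of 5000 and the function names `expected_cost` and `simulate_cost` are assumptions chosen for illustration, not values from the lecture.

```python
import random

def expected_cost(size_sqft):
    """Deterministic part of the model: 25000 + 75 * Size."""
    return 25000 + 75 * size_sqft

def simulate_cost(size_sqft, sigma=5000.0, rng=random):
    """One simulated house cost: deterministic part plus a random error.

    sigma is a hypothetical standard deviation for the error term.
    """
    epsilon = rng.gauss(0.0, sigma)  # random component with mean zero
    return expected_cost(size_sqft) + epsilon

# Five houses of the same size (2000 sq ft) get five different costs.
rng = random.Random(0)
costs = [simulate_cost(2000, rng=rng) for _ in range(5)]
print(expected_cost(2000))  # 175000
print(costs)
```

All five simulated houses share the same expected cost; only the error term makes them differ.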
• The first order linear model
Y = β0 + β1X + ε
Y = dependent variable
X = independent variable
β0 = Y-intercept
β1 = slope of the line (Rise/Run)
ε = error variable
β0 and β1 are unknown population parameters and are therefore estimated from the data.
[Figure: a line plotted on X-Y axes, with intercept β0 and slope β1 = Rise/Run.]
Estimating the Coefficients
• Question: What should be considered a good line?
[Figure: scatter plot of Y against X.]
The Least Squares (Regression) Line
[Example data: independent variable X, dependent variable Y.]
• Solution
– Solving by hand: Calculate a number of statistics, where n = 100.
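As a sketch of the by-hand calculation, the standard least squares estimates are b1 = sXY / sX² (sample covariance over sample variance of X) and b0 = ȳ − b1·x̄. The lecture's data set (n = 100) is not reproduced here, so the five points below are hypothetical and the function name `least_squares_line` is an assumption.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    # Sample covariance of X and Y, and sample variance of X.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)
    s_x2 = sum((x - x_bar) ** 2 for x in xs) / (n - 1)
    b1 = s_xy / s_x2          # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1

# Hypothetical toy data, not the lecture's example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)
print(b0, b1)  # approximately 2.2 and 0.6
```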
Error Variable: Required Conditions
• The error ε is a critical part of the regression model.
• Four requirements involving the distribution of ε must be satisfied.
• The probability distribution of ε is normal.
• The mean of ε is zero: E(ε) = 0.
• The standard deviation of ε is σε for all values of X.
• The errors associated with different values of Y are all independent.
Assessing the Model
– A shortcut formula
Standard Error of Estimate
• The mean error is equal to zero.
• If σε is small the errors tend to be close to zero (close to the mean error).
Then, the model fits the data well.
• Therefore, we can use σε as a measure of the suitability of using a linear model.
• An estimator of σε is given by sε = √(SSE / (n − 2)), where SSE is the sum of squared errors.
• Example
• Calculate the standard error of estimate for the previous example, and describe what it tells you about the model fit.
• Solution
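A minimal sketch of the standard error of estimate, sε = √(SSE / (n − 2)), where SSE is the sum of squared differences between the observed Y values and the fitted line. The data are hypothetical toy values rather than the lecture's example, and the helper names `fit_line` and `standard_error_of_estimate` are assumptions.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Least squares intercept b0 and slope b1."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def standard_error_of_estimate(xs, ys):
    """s_eps = sqrt(SSE / (n - 2)) for the fitted least squares line."""
    b0, b1 = fit_line(xs, ys)
    # SSE: sum of squared residuals around the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (len(xs) - 2))

# Hypothetical toy data, not the lecture's example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s_eps = standard_error_of_estimate(xs, ys)
print(round(s_eps, 4))  # 0.8944
```

The value is judged relative to the scale of Y: here sε is roughly 0.9 against a mean Y of 4, so the fit is loose.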
[Figure: two scatter plots.]
Linear relationship. Different inputs (X) yield different outputs (Y). The slope is not equal to zero.
No linear relationship. Different inputs (X) yield the same output (Y). The slope is equal to zero.
• We can draw inference about β1 from b1 by testing
H0: β1 = 0
H1: β1 ≠ 0 (or < 0, or > 0)
• The test statistic is
t = (b1 − β1) / sb1
where sb1 is the standard error of b1.
• If the error variable is normally distributed, the statistic has a Student t distribution with d.f. = n − 2.
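The test statistic can be sketched as follows, assuming the usual standard error of the slope, sb1 = sε / √((n − 1)·sX²). The data are hypothetical, and `slope_t_statistic` is an illustrative name. The sketch stops at the statistic and its degrees of freedom; the result would then be compared with a Student t critical value from a table or a statistics package.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(xs, ys, beta1_null=0.0):
    """Return (t, degrees_of_freedom) for testing H0: beta1 = beta1_null."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)  # equals (n - 1) * s_x^2
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_eps = sqrt(sse / (n - 2))   # standard error of estimate
    s_b1 = s_eps / sqrt(sxx)      # standard error of the slope
    return (b1 - beta1_null) / s_b1, n - 2

# Hypothetical toy data, not the lecture's example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
t, df = slope_t_statistic(xs, ys)
print(round(t, 4), df)  # 2.1213 3
```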
[Figure: the regression model and the overall variability in Y. Two data points (X1, Y1) and (X2, Y2) of a certain sample are shown, together with the error.]
Total variation in Y = Variation explained by the regression line + Unexplained variation (error)
• R² measures the proportion of the variation in Y that is explained by the variation in X.
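The decomposition of variation and the resulting R² can be checked numerically. In this sketch SST is the total variation in Y, SSR the variation explained by the regression line, and SSE the unexplained variation, so that R² = SSR / SST = 1 − SSE / SST. The abbreviations, the function name `variation_decomposition`, and the toy data are assumptions for illustration.

```python
from statistics import mean

def variation_decomposition(xs, ys):
    """Return (sst, ssr, sse, r2) for the least squares line."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                # total variation
    ssr = sum((f - y_bar) ** 2 for f in fitted)            # explained
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))    # unexplained
    return sst, ssr, sse, 1 - sse / sst

# Hypothetical toy data, not the lecture's example.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
sst, ssr, sse, r2 = variation_decomposition(xs, ys)
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r2, 4))  # 6 3.6 2.4 0.6
```

For these points, 60% of the variation in Y is explained by the variation in X, and the explained and unexplained parts add up to the total.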