Lecture 4

Linear Regression
The Model
The model has a deterministic component and a probabilistic component.
[Figure: House cost vs. house size. Building a house costs about $75 per square foot, and most lots sell for $25,000, so House cost = 25000 + 75(Size).]
However, house costs vary even among houses of the same size! Since cost behaves unpredictably, we add a random component:
House cost = 25000 + 75(Size) + ε
[Figure: House cost vs. house size, with costs scattered around the line 25000 + 75(Size); most lots sell for $25,000.]
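A minimal Python sketch of the two components. The deterministic part, 25000 + 75(Size), comes from the example; the random component is assumed, purely for illustration, to be normal with mean 0 and a hypothetical standard deviation of $5,000 (neither the distribution nor the $5,000 figure is given in the lecture).

```python
import random

def house_cost(size, rng, sigma=5000.0):
    """House cost = 25000 + 75(Size) + epsilon.

    The deterministic part comes from the lecture example.
    ASSUMPTION: epsilon ~ Normal(0, sigma); sigma = 5000 is hypothetical.
    """
    deterministic = 25000 + 75 * size
    epsilon = rng.gauss(0.0, sigma)  # random (probabilistic) component
    return deterministic + epsilon

rng = random.Random(42)  # fixed seed so the run is reproducible

# Three houses of the same size (2000 sq ft) get different simulated costs.
costs = [house_cost(2000, rng) for _ in range(3)]
print(costs)

# With sigma = 0 the model collapses to the deterministic line.
print(house_cost(2000, rng, sigma=0.0))  # 175000.0
```

Setting `sigma` to zero removes the random component, which is exactly the deterministic model shown before ε was added.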
• The first order linear model
$Y = \beta_0 + \beta_1 X + \varepsilon$
Y = dependent variable
X = independent variable
β0 = Y-intercept
β1 = slope of the line (Rise/Run)
ε = error variable
β0 and β1 are unknown population parameters; therefore, they are estimated from the data.
[Figure: a straight line with Y-intercept β0 and slope β1 = Rise/Run.]
Estimating the Coefficients
• The estimates of β0 and β1, denoted b0 and b1, are determined by
– drawing a sample from the population of interest,
– calculating sample statistics,
– producing a straight line that cuts through the data.
[Figure: scatter plot of the sample data, Y against X.]
Question: What should be considered a good line?
The Least Squares (Regression) Line
A good line is one that minimizes the sum of squared differences between the points and the line.

[Figure: the points (1,2), (2,4), (3,1.5) and (4,3.2), with two candidate lines: the line y = x and the horizontal line y = 2.5.]
Let us compare two lines. The second line is horizontal.
First line: sum of squared differences $= (2-1)^2 + (4-2)^2 + (1.5-3)^2 + (3.2-4)^2 = 7.89$
Second line: sum of squared differences $= (2-2.5)^2 + (4-2.5)^2 + (1.5-2.5)^2 + (3.2-2.5)^2 = 3.99$
The smaller the sum of squared differences, the better the fit of the line to the data.
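A short Python check of the comparison above. The first line is taken to be y = x, which is what the differences (2 - 1), (4 - 2), (1.5 - 3), (3.2 - 4) imply; the second is the horizontal line y = 2.5. The helper name `sum_sq_diff` is illustrative only.

```python
# The four sample points from the figure
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_sq_diff(points, line):
    """Sum of squared vertical differences between the points and a line."""
    return sum((y - line(x)) ** 2 for x, y in points)

ssd_first = sum_sq_diff(points, lambda x: x)     # first line: y = x
ssd_second = sum_sq_diff(points, lambda x: 2.5)  # horizontal line: y = 2.5

print(round(ssd_first, 2))   # 7.89
print(round(ssd_second, 2))  # 3.99
```

The horizontal line has the smaller sum, so by this criterion it fits these four points better than y = x.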
The Estimated Coefficients
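To calculate the estimates of the line coefficients that minimize the sum of squared differences between the data points and the line, use the formulas:
$b_1 = \frac{s_{xy}}{s_x^2}, \qquad b_0 = \bar{y} - b_1 \bar{x}$
where $s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$ and $s_x^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$.
The regression equation that estimates the equation of the first order linear model is:
$\hat{y} = b_0 + b_1 x$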
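A minimal sketch of the formulas $b_1 = s_{xy}/s_x^2$ and $b_0 = \bar{y} - b_1\bar{x}$ in plain Python, applied to the four points (1,2), (2,4), (3,1.5), (4,3.2) from the least squares illustration. The function name `least_squares` is a hypothetical helper, not from the lecture.

```python
from statistics import mean

def least_squares(xs, ys):
    """Return (b0, b1) using b1 = s_xy / s_x^2 and b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    s_x2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    b1 = s_xy / s_x2
    b0 = ybar - b1 * xbar
    return b0, b1

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
b0, b1 = least_squares(xs, ys)
print(round(b0, 4), round(b1, 4))  # 2.4 0.11

# Sum of squared differences for the least squares line itself
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
print(round(sse, 3))  # 3.807
```

The least squares line $\hat{y} = 2.4 + 0.11x$ gives a sum of squared differences of 3.807, smaller than both candidate lines compared earlier, as the minimizing property requires.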
The Simple Linear Regression Line
• A car dealer wants to find the relationship between the odometer reading and the selling price of used cars.
• A random sample of 100 cars is selected, and the data are recorded.
• Find the regression line.
Odometer reading: independent variable X. Selling price: dependent variable Y.
• Solution
– Solving by hand: calculate a number of statistics, where n = 100.
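The dealer's data are not reproduced here, so the following sketch runs the same by-hand procedure on *simulated* data. Every number in the simulation (price in $1000s, odometer in 1000s of miles, true intercept 17, true slope -0.07, error standard deviation 0.3) is hypothetical and is not the lecture's data or result; the point is only that the formulas recover the line when n = 100.

```python
import random
from statistics import mean

def least_squares(xs, ys):
    """Return (b0, b1) using b1 = s_xy / s_x^2 and b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    s_x2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    b1 = s_xy / s_x2
    return ybar - b1 * xbar, b1

rng = random.Random(1)  # fixed seed for reproducibility
n = 100

# HYPOTHETICAL population model, invented for illustration only
TRUE_B0, TRUE_B1, SIGMA = 17.0, -0.07, 0.3
odometer = [rng.uniform(20, 50) for _ in range(n)]
price = [TRUE_B0 + TRUE_B1 * x + rng.gauss(0, SIGMA) for x in odometer]

b0, b1 = least_squares(odometer, price)
print(f"y-hat = {b0:.3f} + ({b1:.4f}) x")
```

With real data, the same function would be called on the recorded odometer readings and prices.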
Error Variable: Required Conditions
• The error ε is a critical part of the regression model.
• Four requirements involving the distribution of ε must be satisfied.
• The probability distribution of ε is normal.
• The mean of ε is zero: E(ε) = 0.
• The standard deviation of ε is a constant σε for all values of X.
• The errors associated with different values of Y are all independent.
Assessing the Model
• The least squares method will produce a regression line whether or not there is a linear relationship between X and Y.
• Consequently, it is important to assess how well the linear model fits
the data.
• Several methods are used to assess the model. All are based on the
sum of squares for errors, SSE.
Sum of Squares for Errors
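• This is the sum of squared differences between the points and the regression line.
• It can serve as a measure of how well the line fits the data. SSE is defined by
$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
– A shortcut formula:
$SSE = (n-1)\left(s_y^2 - \frac{s_{xy}^2}{s_x^2}\right)$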
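A quick numerical check that the definition of SSE and the shortcut formula agree, using the four illustration points (1,2), (2,4), (3,1.5), (4,3.2). `statistics.variance` computes the sample variance with divisor n - 1, matching $s_x^2$ and $s_y^2$.

```python
from statistics import mean, variance

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
n = len(xs)

xbar, ybar = mean(xs), mean(ys)
s_x2 = variance(xs)  # sample variance, divisor n - 1
s_y2 = variance(ys)
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
b1 = s_xy / s_x2
b0 = ybar - b1 * xbar

# Definition: SSE = sum of squared residuals
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
# Shortcut: SSE = (n - 1) * (s_y^2 - s_xy^2 / s_x^2)
sse_shortcut = (n - 1) * (s_y2 - s_xy ** 2 / s_x2)
print(round(sse_direct, 3), round(sse_shortcut, 3))  # 3.807 3.807
```

The shortcut avoids computing every fitted value $\hat{y}_i$, which is why it is convenient when solving by hand from summary statistics.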
Standard Error of Estimate
• The mean error is equal to zero.
• If σε is small, the errors tend to be close to zero (close to the mean error), and the model fits the data well.
• Therefore, we can use σε as a measure of the suitability of using a linear model.
• An estimator of σε is given by sε:
$s_\varepsilon = \sqrt{\frac{SSE}{n-2}}$
• Example
Calculate the standard error of estimate for the previous example, and describe what it tells you about the model fit.
• Solution
$s_\varepsilon = \sqrt{\frac{SSE}{n-2}}$, with SSE as calculated before.
It is hard to assess the model based on sε, even when it is compared with the mean value of Y.
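A minimal sketch of $s_\varepsilon = \sqrt{SSE/(n-2)}$. Since the used-car data are not reproduced here, it uses the four illustration points (1,2), (2,4), (3,1.5), (4,3.2); the function name is a hypothetical helper. It also prints $s_\varepsilon/\bar{y}$, the kind of comparison with the mean of Y that the solution mentions.

```python
import math
from statistics import mean

def standard_error_of_estimate(xs, ys):
    """Fit the least squares line, then return s_eps = sqrt(SSE / (n - 2))."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    # b1 = s_xy / s_x^2; the common (n - 1) divisor cancels
    b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
          / sum((x - xbar) ** 2 for x in xs))
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # n - 2 degrees of freedom: two parameters (b0, b1) were estimated
    return math.sqrt(sse / (n - 2))

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
s_eps = standard_error_of_estimate(xs, ys)
print(round(s_eps, 3), round(s_eps / mean(ys), 3))  # s_eps, and s_eps relative to y-bar
```

As the lecture notes, a raw value of $s_\varepsilon$ has no fixed benchmark; the ratio to $\bar{y}$ gives a rough sense of scale but not a formal test.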
Testing the Slope
• When no linear relationship exists between two variables, the regression
line should be horizontal.
[Figure, two scatter plots. No linear relationship: different inputs (X) yield the same output (Y); the slope is equal to zero. Linear relationship: different inputs (X) yield different outputs (Y); the slope is not equal to zero.]
• We can draw inference about β1 from b1 by testing
H0: β1 = 0
H1: β1 ≠ 0 (or < 0, or > 0)
• The test statistic is
$t = \frac{b_1 - \beta_1}{s_{b_1}}$
where $s_{b_1} = \frac{s_\varepsilon}{\sqrt{(n-1)s_x^2}}$ is the standard error of b1.
• If the error variable is normally distributed, the statistic has a Student t distribution with d.f. = n - 2.
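A sketch of the slope test on the four illustration points (1,2), (2,4), (3,1.5), (4,3.2), computing $t = (b_1 - \beta_1)/s_{b_1}$ under H0: β1 = 0 with d.f. = n - 2 = 2. The critical value $t_{0.025,2} = 4.303$ is the standard t-table entry for a two-tailed test at α = 0.05; if SciPy is available, `2 * scipy.stats.t.sf(abs(t), df)` would give the two-tailed p-value instead. `slope_t_test` is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def slope_t_test(xs, ys, beta1_h0=0.0):
    """Return (t, df) for H0: beta1 = beta1_h0 in simple linear regression."""
    n = len(xs)
    xbar, ybar = mean(xs), mean(ys)
    s_x2 = variance(xs)  # divisor n - 1
    s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    b1 = s_xy / s_x2
    b0 = ybar - b1 * xbar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_eps = math.sqrt(sse / (n - 2))          # standard error of estimate
    s_b1 = s_eps / math.sqrt((n - 1) * s_x2)  # standard error of b1
    return (b1 - beta1_h0) / s_b1, n - 2

t, df = slope_t_test([1, 2, 3, 4], [2, 4, 1.5, 3.2])
T_CRIT = 4.303  # t_{0.025, 2} from a t table (two-tailed, alpha = 0.05)
print(round(t, 3), df, abs(t) > T_CRIT)
```

Here |t| is far below 4.303, so H0: β1 = 0 is not rejected; four points give very little evidence of a linear relationship.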
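• To understand the significance of the coefficient of determination, R², note the following.
[Figure: two data points (x1, y1) and (x2, y2) of a certain sample, shown with the regression model; the overall variability in Y is split into the part explained by the regression model and the error.]
Total variation in Y = Variation explained by the regression line + Unexplained variation (error)
Variation in Y = SSR + SSE
$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2 + \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$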
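A numerical check of Variation in Y = SSR + SSE for the least squares line through the four illustration points (1,2), (2,4), (3,1.5), (4,3.2). The identity holds for the least squares line; it would not hold for an arbitrary line.

```python
from statistics import mean

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
xbar, ybar = mean(xs), mean(ys)

# Least squares slope and intercept (the common n - 1 divisor cancels)
b1 = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
      / sum((x - xbar) ** 2 for x in xs))
b0 = ybar - b1 * xbar
y_hat = [b0 + b1 * x for x in xs]

sst = sum((y - ybar) ** 2 for y in ys)                # total variation in Y
ssr = sum((yh - ybar) ** 2 for yh in y_hat)           # explained by the line
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained (error)
print(round(sst, 4), round(ssr + sse, 4))
```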
• R² measures the proportion of the variation in Y that is explained by the variation in X:
$R^2 = \frac{s_{xy}^2}{s_x^2 s_y^2} = 1 - \frac{SSE}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
• R² takes on any value between zero and one.

R² = 1: Perfect match between the line and the data points.
R² = 0: There is no linear relationship between X and Y.
Find the coefficient of determination; what does this statistic tell you about the model?
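A sketch computing R² both ways, $s_{xy}^2/(s_x^2 s_y^2)$ and $1 - SSE/\sum (y_i - \bar{y})^2$, on the four illustration points (1,2), (2,4), (3,1.5), (4,3.2), since the used-car data are not reproduced here. Note $\sum (y_i - \bar{y})^2 = (n-1)s_y^2$.

```python
from statistics import mean, variance

xs = [1, 2, 3, 4]
ys = [2, 4, 1.5, 3.2]
n = len(xs)
xbar, ybar = mean(xs), mean(ys)
s_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
s_x2, s_y2 = variance(xs), variance(ys)  # divisor n - 1

# Form 1: R^2 = s_xy^2 / (s_x^2 * s_y^2)
r2 = s_xy ** 2 / (s_x2 * s_y2)

# Form 2: R^2 = 1 - SSE / sum of (y - ybar)^2, with that sum = (n - 1) * s_y^2
b1 = s_xy / s_x2
b0 = ybar - b1 * xbar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
r2_alt = 1 - sse / ((n - 1) * s_y2)
print(round(r2, 4), round(r2_alt, 4))
```

For these points R² is about 0.016: the line explains under 2% of the variation in Y, consistent with the non-significant slope test.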
