
Part 4

Chapter 14
Linear Regression

PowerPoints organized by Dr. Michael R. Gustafson II, Duke University


All images copyright © The McGraw-Hill Companies, Inc. Permission required for reproduction or display.
Part Four Curve Fitting
• Regression: least-squares regression
• Interpolation: linear interpolation, curvilinear interpolation

Part Four Curve Fitting
• Chapter 14: Basic statistics and linear regression
• Chapter 15: General linear least-squares &
nonlinear regression
• Chapter 16: Fourier analysis
• Chapter 17: Polynomial interpolation
• Chapter 18: Splines & Piecewise interpolation
Chapter Objectives
• Familiarizing yourself with some basic descriptive
statistics and the normal distribution.
• Knowing how to compute the slope and intercept of
a best-fit straight line with linear regression.
• Knowing how to compute and understand the
meaning of the coefficient of determination and the
standard error of the estimate.
• Understanding how to use transformations to
linearize nonlinear equations so that they can be fit
with linear regression.
• Knowing how to implement linear regression with
MATLAB.
Statistics Review
Measure of Location
• Arithmetic mean: the sum of the individual
data points (yi) divided by the number of
points n:

  ȳ = ( Σ yi ) / n
• Median: the midpoint of a group of data.
• Mode: the value that occurs most frequently
in a group of data.
Statistics Review
Measures of Spread
• Standard deviation:

  sy = √( St / (n − 1) ) = √( Σ (yi − ȳ)² / (n − 1) )

  where St is the sum of the squares of the data residuals:

  St = Σ (yi − ȳ)²

  and n − 1 is referred to as the degrees of freedom.

• Variance:

  sy² = Σ (yi − ȳ)² / (n − 1) = ( Σ yi² − (Σ yi)² / n ) / (n − 1)

• Coefficient of variation:

  c.v. = ( sy / ȳ ) × 100%
Statistics Review
• Example
Table
6.495 6.565 6.625 6.435 6.635 6.395 6.655 6.655
6.665 6.595 6.515 6.715 6.625 6.485 6.775 6.605
6.755 6.505 6.615 6.555 6.575 6.715 6.555 6.685

• Solution ☞

  ȳ = 158.4 / 24 = 6.6

  median = (6.605 + 6.615) / 2 = 6.61

  sy² = Σ (yi − ȳ)² / (n − 1) = 0.217 / 23 = 0.009435
      = ( Σ yi² − (Σ yi)² / n ) / (n − 1) = (1045.657 − 158.4² / 24) / 23 = 0.009435

  sy = √(sy²) = 0.097133

  c.v. = ( sy / ȳ ) × 100% = (0.097133 / 6.6) × 100% = 1.47%
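The worked solution above can be checked numerically. Below is a sketch in Python/NumPy (the slides themselves use MATLAB; Python is used here purely for illustration):

```python
import numpy as np

# The 24 measurements from the table above
y = np.array([6.495, 6.565, 6.625, 6.435, 6.635, 6.395, 6.655, 6.655,
              6.665, 6.595, 6.515, 6.715, 6.625, 6.485, 6.775, 6.605,
              6.755, 6.505, 6.615, 6.555, 6.575, 6.715, 6.555, 6.685])
n = y.size

mean = y.sum() / n                        # arithmetic mean
median = np.median(y)                     # midpoint of the sorted data
var = np.sum((y - mean)**2) / (n - 1)     # sample variance (n - 1 degrees of freedom)
std = np.sqrt(var)                        # standard deviation
cv = std / mean * 100                     # coefficient of variation, %

print(round(mean, 3), round(median, 3), round(var, 6), round(std, 6), round(cv, 2))
```

Running this reproduces ȳ = 6.6, the median 6.61, sy² ≈ 0.009435, sy ≈ 0.097133, and c.v. ≈ 1.47%.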
Normal Distribution

  modal class interval: [6.6, 6.64]

  mode = (6.6 + 6.64) / 2 = 6.62

  For normally distributed data, about 68% of the values fall in [ȳ − sy, ȳ + sy],
  and about 95% fall in [ȳ − 2sy, ȳ + 2sy] = [6.405734, 6.794266].
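The 68% and 95% intervals follow directly from ȳ and sy; a quick sketch, with the values taken from the preceding example:

```python
mean, sy = 6.6, 0.097133   # sample mean and standard deviation from the example

# For roughly normal data: ~68% of points lie within one standard deviation
# of the mean, ~95% within two.
one_sigma = (mean - sy, mean + sy)
two_sigma = (mean - 2 * sy, mean + 2 * sy)

print(one_sigma)   # ≈ (6.502867, 6.697133)
print(two_sigma)   # ≈ (6.405734, 6.794266)
```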
Descriptive Statistics in MATLAB
• MATLAB has several built-in commands to
compute and display descriptive statistics.
Assuming some column vector s:
– mean(s), median(s), mode(s)
• Calculate the mean, median, and mode of s. mode is a part of
the statistics toolbox.
– min(s), max(s)
• Calculate the minimum and maximum value in s.
– var(s), std(s)
• Calculate the variance and standard deviation of s.
• Note - if a matrix is given, the statistics will be
returned for each column.
Histograms in MATLAB
• [n, x] = hist(s, x)
– Determine the number of elements in each bin of data in
s. x is a vector containing the center values of the bins.
• [n, x] = hist(s, m)
– Determine the number of elements in each bin of data in
s using m bins. x will contain the centers of the bins.
The default case is m=10
• hist(s, x) or hist(s, m) or hist(s)
– With no output arguments, hist will actually produce a
histogram.
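NumPy offers a close counterpart to MATLAB's hist; a sketch using the 24 measurements from the earlier example (the bin count of 5 is illustrative):

```python
import numpy as np

s = np.array([6.495, 6.565, 6.625, 6.435, 6.635, 6.395, 6.655, 6.655,
              6.665, 6.595, 6.515, 6.715, 6.625, 6.485, 6.775, 6.605,
              6.755, 6.505, 6.615, 6.555, 6.575, 6.715, 6.555, 6.685])

# Counterpart of [n, x] = hist(s, m): counts per bin plus the bin edges
counts, edges = np.histogram(s, bins=5)
centers = (edges[:-1] + edges[1:]) / 2   # bin centers, like MATLAB's x output

print(counts, np.round(centers, 3))
```

Every data point lands in exactly one bin, so the counts sum to the number of points.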
Histogram Example
Linear Least-Squares Regression

y = a0 + a1 x + e = f ( x) + e

e = y − (a0 + a1 x) = y − f ( x)
Linear Least-Squares Regression
• Linear least-squares regression is a method to
determine the “best” coefficients in a linear model
for a given data set.
• “Best” for least-squares regression means
minimizing the sum of the squares of the estimate
residuals. For a straight-line model, this gives:

  Sr = Σi=1..n ei² = Σi=1..n (yi − a0 − a1 xi)²

• This method will yield a unique line for a given set
of data.
Least-Squares Fit of a Straight Line
• Using the model:
y = a0 + a1 x

the slope and intercept producing the best fit


can be found using:
  a1 = ( n Σ xi yi − Σ xi Σ yi ) / ( n Σ xi² − (Σ xi)² )

  a0 = ȳ − a1 x̄
Example

Table (v in m/s, F in N):

  i    xi     yi    (xi)²    xi·yi
  1    10     25      100      250
  2    20     70      400     1400
  3    30    380      900    11400
  4    40    550     1600    22000
  5    50    610     2500    30500
  6    60   1220     3600    73200
  7    70    830     4900    58100
  8    80   1450     6400   116000
  Σ   360   5135    20400   312850

  a1 = ( n Σ xi yi − Σ xi Σ yi ) / ( n Σ xi² − (Σ xi)² )
     = ( 8(312850) − (360)(5135) ) / ( 8(20400) − 360² ) = 19.47024

  a0 = ȳ − a1 x̄ = 641.875 − 19.47024(45) = −234.2857

  Fest = −234.2857 + 19.47024 v
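The tabulated sums plug straight into the slope and intercept formulas; a Python sketch checking the result (the slides use MATLAB, but the arithmetic is identical):

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)          # x_i
F = np.array([25, 70, 380, 550, 610, 1220, 830, 1450], dtype=float)  # y_i
n = v.size

# Least-squares slope and intercept for y = a0 + a1*x
a1 = (n * np.sum(v * F) - v.sum() * F.sum()) / (n * np.sum(v**2) - v.sum()**2)
a0 = F.mean() - a1 * v.mean()

print(round(a1, 5), round(a0, 4))  # ≈ 19.47024, -234.2857
```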
Quantification of Error
• Recall for a straight line, the sum of the
squares of the estimate residuals is:

  Sr = Σi=1..n ei² = Σi=1..n (yi − a0 − a1 xi)²

  while the sum of the squares of the data residuals about the mean is:

  St = Σ (yi − ȳ)²

• Standard error of the estimate:

  sy/x = √( Sr / (n − 2) )        (compare with sy = √( St / (n − 1) ))
Standard Error of the Estimate
• Regression data showing (a) the spread of the data around the
mean of the dependent data and (b) the spread of the data
around the best-fit line.
• The reduction in spread represents the improvement due to
linear regression.
Coefficient of Determination
• The coefficient of determination r2 is the difference between
the sum of the squares of the data residuals and the sum of
the squares of the estimate residuals, normalized by the
sum of the squares of the data residuals:
  r² = ( St − Sr ) / St
• r2 represents the percentage of the original uncertainty
explained by the model.
• For a perfect fit, Sr=0 and r2=1.
• If r2=0, there is no improvement over simply picking the
mean.
• If r2<0, the model is worse than simply picking the mean!
Example

Fest = −234.2857 + 19.47024 v    (v in m/s, F in N)

  i    xi     yi   a0 + a1 xi   (yi − ȳ)²   (yi − a0 − a1 xi)²
  1    10     25      −39.58      380535             4171
  2    20     70      155.12      327041             7245
  3    30    380      349.82       68579              911
  4    40    550      544.52        8441               30
  5    50    610      739.23        1016            16699
  6    60   1220      933.93      334229            81837
  7    70    830     1128.63       35391            89180
  8    80   1450     1323.33      653066            16044
  Σ   360   5135                 1808297           216118

  St = Σ (yi − ȳ)² = 1808297
  Sr = Σ (yi − a0 − a1 xi)² = 216118

  sy = √( 1808297 / (8 − 1) ) = 508.26
  sy/x = √( 216118 / (8 − 2) ) = 189.79

  r² = ( 1808297 − 216118 ) / 1808297 = 0.8805

88.05% of the original uncertainty has been explained by the linear model.
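The error measures for this fit can likewise be verified numerically; a short Python sketch, with the coefficients taken from the earlier fit:

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
F = np.array([25, 70, 380, 550, 610, 1220, 830, 1450], dtype=float)
n = v.size
a0, a1 = -234.2857, 19.47024          # coefficients from the earlier example

St = np.sum((F - F.mean())**2)        # spread of the data around the mean
Sr = np.sum((F - (a0 + a1 * v))**2)   # spread around the regression line

sy = np.sqrt(St / (n - 1))            # standard deviation of the data
syx = np.sqrt(Sr / (n - 2))           # standard error of the estimate
r2 = (St - Sr) / St                   # coefficient of determination

print(round(St), round(Sr), round(sy, 2), round(syx, 2), round(r2, 4))
```

Since sy/x < sy, the regression line describes the data better than the mean alone, which is exactly what r² = 0.8805 quantifies.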
Nonlinear Relationships
• Linear regression is predicated on the fact
that the relationship between the dependent
and independent variables is linear - this is
not always the case.
• Three common examples are:

  exponential:               y = α1 e^(β1 x)
  power:                     y = α2 x^β2
  saturation-growth-rate:    y = α3 x / (β3 + x)
Linearization of Nonlinear
Relationships
• One option for finding the coefficients for a
nonlinear fit is to linearize it. For the three
common models, this may involve taking
logarithms or inversion:
  Model                     Nonlinear                Linearized
  exponential               y = α1 e^(β1 x)          ln y = ln α1 + β1 x
  power                     y = α2 x^β2              log y = log α2 + β2 log x
  saturation-growth-rate    y = α3 x / (β3 + x)      1/y = 1/α3 + (β3/α3)(1/x)
Transformation Examples

Table
v, m/s 10 20 30 40 50 60 70 80
F, N 25 70 380 550 610 1220 830 1450

• Solution ☞

  y = α2 x^β2   →   log y = log α2 + β2 log x

  X̄ = 12.606 / 8 = 1.5757        Ȳ = 20.515 / 8 = 2.5644

  a1 = ( n Σ Xi Yi − Σ Xi Σ Yi ) / ( n Σ Xi² − (Σ Xi)² )
     = ( 8(33.622) − 12.606(20.515) ) / ( 8(20.516) − 12.606² ) = 1.9842

  a0 = Ȳ − a1 X̄ = 2.5644 − 1.9842(1.5757) = −0.5620

  log y = −0.5620 + 1.9842 log x

  F = 10^(−0.562) v^1.9842 = 0.274 v^1.9842
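The same straight-line machinery, applied to the log-transformed data, recovers the power-law coefficients; a Python sketch:

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
F = np.array([25, 70, 380, 550, 610, 1220, 830, 1450], dtype=float)
n = v.size

# Linearize the power model y = a2 * x^b2 with base-10 logs
X, Y = np.log10(v), np.log10(F)

b2 = (n * np.sum(X * Y) - X.sum() * Y.sum()) / (n * np.sum(X**2) - X.sum()**2)
log_a2 = Y.mean() - b2 * X.mean()
a2 = 10**log_a2                       # back-transform the intercept

print(round(b2, 4), round(a2, 3))  # ≈ 1.9842, 0.274
```

Note that this fit minimizes the squared residuals of the *logarithms*, not of F itself, so it generally differs from a direct nonlinear least-squares fit.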
Transformation Examples

[Figure: data points and power-law regression line F = 0.274 v^1.9842; F (N) versus v (m/s).]
Transformation Examples

[Figure: side-by-side comparison of the straight-line fit F = 19.470 v − 234.286 and the power-law fit F = 0.274 v^1.9842; F (N) versus v (m/s).]
Linear Regression Program
MATLAB Functions
• MATLAB has a built-in function polyfit that fits a
least-squares nth-order polynomial to data:
– p = polyfit(x, y, n)
• x: independent data
• y: dependent data
• n: order of polynomial to fit
• p: coefficients of polynomial
f(x) = p1 x^n + p2 x^(n−1) + … + pn x + pn+1
• MATLAB’s polyval command can be used to
compute a value using the coefficients.
– y = polyval(p, x)
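NumPy's polyfit and polyval mirror the MATLAB functions described above; a sketch applying them to the example data from this chapter (the evaluation point v = 55 m/s is illustrative):

```python
import numpy as np

v = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=float)
F = np.array([25, 70, 380, 550, 610, 1220, 830, 1450], dtype=float)

# First-order (straight-line) least-squares fit:
# p[0] is the slope, p[1] the intercept
p = np.polyfit(v, F, 1)

# polyval evaluates the fitted polynomial at any point, e.g. v = 55 m/s
F55 = np.polyval(p, 55)

print(np.round(p, 4), round(F55, 2))
```

The coefficients match the hand calculation: slope ≈ 19.47024, intercept ≈ −234.2857.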
