Evans Analytics2e PPT 08
Trendlines and
Regression Analysis
Linear: y = a + bx
Logarithmic: y = ln(x)
Polynomial (second order): y = ax^2 + bx + c
Polynomial (third order): y = ax^3 + bx^2 + dx + e
Power: y = ax^b
Exponential: y = ab^x
R^2
Exponential: y = 50.49e^{0.021x}, R^2 = 0.664
Logarithmic: y = 13.02 ln(x) + 39.60, R^2 = 0.382
Polynomial 2: y = 0.13x^2 - 2.399x + 68.01, R^2 = 0.905
Polynomial 3: y = 0.005x^3 - 0.111x^2 + 0.648x + 59.497, R^2 = 0.928 *
Power: y = 45.96x^{0.0169}, R^2 = 0.397
Figure 8.11
Regression Analysis
Regression analysis is a tool for building
mathematical and statistical models that
characterize relationships between a dependent
(ratio) variable and one or more independent, or
explanatory, variables (ratio or categorical), all of
which are numerical.
Simple linear regression involves a single
independent variable.
Multiple regression involves two or more
independent variables.
Least-Squares Regression
Residuals
Excel functions:
=INTERCEPT(known_y's, known_x's)
=SLOPE(known_y's, known_x's)
Slope = b1 = 35.036
=SLOPE(C4:C45, B4:B45)
Intercept = b0 = 32,673
=INTERCEPT(C4:C45, B4:B45)
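To make the least-squares calculation concrete, here is a minimal Python sketch of the standard formulas for the slope b1 and intercept b0 of a simple linear regression, the same quantities that =SLOPE and =INTERCEPT return. The function name least_squares and the four data points are illustrative assumptions, not the worksheet data in C4:C45.

```python
def least_squares(xs, ys):
    """Return (b0, b1) minimizing the sum of squared residuals for y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Hypothetical data lying exactly on y = 1 + 2x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
b0, b1 = least_squares(xs, ys)
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
```

Because the sample points fall exactly on a line, the residuals are all zero and the fitted coefficients recover that line.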
Regression Statistics
Multiple R - |r|, where r is the sample correlation
coefficient. The value of r varies from -1 to +1 (r is
negative if the slope is negative).
R Square - coefficient of determination, R^2, which
varies from 0 (no fit) to 1 (perfect fit).
Adjusted R Square - adjusts R^2 for sample size
and number of X variables.
Standard Error - variability between observed
and predicted Y values. This is formally called the
standard error of the estimate, S_YX.
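The four statistics above can be sketched in Python for a simple linear regression (one X variable, so k = 1). This is an illustration under stated assumptions: the function name regression_statistics and the five data points are hypothetical, and the formulas used are the usual textbook ones, R^2 = 1 - SSE/SST, adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1), and S_YX = sqrt(SSE/(n - k - 1)).

```python
import math


def regression_statistics(xs, ys):
    """Summary statistics for a simple linear regression of ys on xs (k = 1)."""
    n, k = len(xs), 1
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    predicted = [b0 + b1 * x for x in xs]
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                 # total variation
    r_square = max(1 - sse / sst, 0.0)
    return {
        "multiple_r": math.sqrt(r_square),  # equals |r|
        "r_square": r_square,
        "adjusted_r_square": 1 - (1 - r_square) * (n - 1) / (n - k - 1),
        "standard_error": math.sqrt(sse / (n - k - 1)),     # S_YX
    }


# Hypothetical data
stats = regression_statistics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```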
Checking Assumptions
Linearity
examine scatter diagram (should appear linear)
examine residual plot (should appear random)
Normality of Errors
view a histogram of standard residuals
regression is robust to departures from normality
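As a rough sketch of the residual checks listed above, the Python below computes residuals and scales them into standard residuals. The scaling used here, dividing by the sample standard deviation of the residuals, is one common convention and is an assumption. The function name and the data are hypothetical.

```python
import statistics


def standardized_residuals(observed, predicted):
    """Residuals (observed - predicted) divided by their sample standard deviation."""
    residuals = [y - p for y, p in zip(observed, predicted)]
    spread = statistics.stdev(residuals)
    return [r / spread for r in residuals]


# Hypothetical observed values and the predictions from a fitted line
observed = [2, 4, 5, 4, 5]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
std_res = standardized_residuals(observed, predicted)

# Values far outside roughly -2 to +2 deserve a closer look as possible outliers
unusual = [round(r, 2) for r in std_res if abs(r) > 2]
print("standard residuals:", [round(r, 2) for r in std_res])
print("unusual points:", unusual)
```

Plotting these values against X (a residual plot) or binning them into a histogram gives the visual checks the slide describes.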
Key differences:
Multiple R and R Square are called the multiple
correlation coefficient and the coefficient of multiple
determination, respectively, in the context of multiple
regression.
ANOVA tests for significance of the entire model. That
is, it computes an F-statistic for testing the hypotheses:
H0: β1 = β2 = ... = βk = 0
H1: at least one βj is not 0
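The F-statistic in the ANOVA table is the ratio of the mean square due to regression to the mean square error. A minimal Python sketch, assuming the sums of squares are already known (the function name and the numbers are hypothetical):

```python
def f_statistic(ssr, sse, n, k):
    """F = MSR / MSE for a regression with k independent variables and n observations."""
    msr = ssr / k              # mean square regression, k degrees of freedom
    mse = sse / (n - k - 1)    # mean square error, n - k - 1 degrees of freedom
    return msr / mse


# Hypothetical sums of squares: SSR = 3.6, SSE = 2.4, n = 5 observations, k = 1 variable
f_value = f_statistic(ssr=3.6, sse=2.4, n=5, k=1)
print(f"F = {f_value:.2f}")
```

A large F (equivalently, a small Significance F in the Excel output) is evidence that the model as a whole is significant.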
Regression model
Banking Data
Alternate Criterion
Use the t-statistic.
If |t| < 1, then the standard error will decrease
and adjusted R^2 will increase if the variable is
removed. If |t| > 1, then the opposite will occur.
You can follow the same systematic approach,
except using t-values instead of p-values.
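One step of this t-statistic rule might be sketched as follows. The function name, the variable names X1 to X3, and the t-values are hypothetical. In practice the model would be refit after each removal and the t-values recomputed.

```python
def next_variable_to_remove(t_stats):
    """Return the variable with the smallest |t| if that |t| is below 1, else None."""
    name, t_value = min(t_stats.items(), key=lambda item: abs(item[1]))
    return name if abs(t_value) < 1 else None


# Hypothetical t-statistics from a fitted multiple regression
t_stats = {"X1": 4.20, "X2": -0.35, "X3": 0.80}
candidate = next_variable_to_remove(t_stats)
print("remove next:", candidate)
```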
Multicollinearity
Overfitting
Interactions
An interaction occurs when the effect of one
variable is dependent on another variable.
We can test for interactions by defining a new
variable as the product of the two variables,
X3 = X1 × X2, and testing whether this
variable is significant, leading to an
alternative model.
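Constructing the interaction variable is a one-line calculation per row. A minimal Python sketch, assuming the data are held as a list of dictionaries (the function name and the values are hypothetical):

```python
def add_interaction(rows, first, second, new_name):
    """Return copies of rows with new_name set to the product of two existing columns."""
    return [dict(row, **{new_name: row[first] * row[second]}) for row in rows]


# Hypothetical observations of X1 and X2
rows = [{"X1": 2.0, "X2": 3.0}, {"X1": 5.0, "X2": 0.0}]
with_interaction = add_interaction(rows, "X1", "X2", "X3")
print(with_interaction)
```

The regression is then rerun with X3 included, and the significance of X3 indicates whether the interaction belongs in the model.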
Add 3 columns to
the data, one for
each of the tool
type variables
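The added columns are 0/1 indicator (dummy) variables. The sketch below assumes four tool types, with type A as the baseline that gets no column, which is consistent with the regression equation naming only types B, C, and D. The function name and sample values are hypothetical.

```python
def dummy_code(tool_types, levels=("B", "C", "D")):
    """Encode each tool type as 0/1 columns, one per non-baseline level."""
    return [
        {f"type {level}": int(tool == level) for level in levels}
        for tool in tool_types
    ]


# Hypothetical tool types for three observations
encoded = dummy_code(["A", "B", "D"])
for row in encoded:
    print(row)
```

A type A observation has zeros in all three columns, so its predicted value comes from the intercept and RPM terms alone.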
Regression results
Surface finish = 24.49 + 0.098 RPM - 13.31 type B - 20.49 type C - 26.04 type D
Best-Subsets Procedures
If you click Choose Subset, XLMiner will create a new worksheet with the results for this model.