DISC 212 Session 13


Introduction to Management Science
Tuesday 24th October 2023
Recap
• Sensitivity Analysis
– Interpreting the allowable increase and allowable decrease columns of the objective function coefficients
– Interpreting the shadow prices of constraints
– Interpreting the reduced costs of objective function coefficients
Way Forward
• Two classes were skipped last week
– Online make-up sessions (over the weekend, insha’Allah)
• We will not cover any more network modeling problems
• Exam 2 will include the predictive models we cover today
– Also sensitivity analysis, basic network models, and the shortest path
Types of Mathematical Models
Regression
• Regression Analysis is used to estimate a function f( ) that describes the relationship between a continuous dependent variable and one or more independent variables.

Y = f(X1, X2, X3, …, Xn) + e

Note:
• f( ) describes the systematic variation in the relationship.
• e represents the unsystematic variation (or random error) in the relationship.
Advertising – Sales Example
• Consider the relationship between advertising (X1) and sales (Y) for a company.
• There probably is a relationship... as advertising increases, sales should increase.
• But how would we measure and quantify this relationship?

Obs   Advertising (in $1000s)   Actual Sales (in $1000s)
1     30                        184.4
2     40                        279.1
3     40                        244.0
4     50                        314.2
5     60                        382.2
6     70                        450.2
7     70                        423.6
8     70                        410.2
9     80                        500.4
10    90                        505.3
Scatter Plot

[Scatter plot: Actual Sales (in $1000s) on the vertical axis vs. Advertising (in $1000s) on the horizontal axis; the points rise roughly linearly]
Nature of Statistical Relationship

[Figure: a regression curve through the probability distributions of Y at different levels of X]
Simple Linear Regression Model
• The scatter plot shows a linear relation between advertising and sales.
• So the following regression model is suggested by the data:

Y_i = \beta_0 + \beta_1 X_{1i} + \varepsilon_i

This refers to the true relationship between the entire population of advertising and sales values.
• The estimated regression function (based on our sample) will be represented as:

\hat{Y}_i = b_0 + b_1 X_{1i}

where \hat{Y}_i is the estimated (or fitted) value of Y at a given level of X.
How do you determine the best fit?
• Numerical values must be assigned to b0 and b1.
• The method of “least squares” selects the values that minimize:

ESS = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \left( Y_i - (b_0 + b_1 X_{1i}) \right)^2

• If ESS = 0, our estimated function fits the data perfectly.
• We could solve this problem using (see the sketch below):
– Solver
– The built-in Regression tool
– The TREND() function
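
To make the least-squares idea concrete, here is a minimal Python sketch (not part of the original slides; it assumes numpy is available) that computes b0 and b1 for the advertising data above from the closed-form formulas. It reproduces the coefficients b0 ≈ 36.342 and b1 ≈ 5.550 used later in these slides.

    import numpy as np

    # Advertising (X) and actual sales (Y), both in $1000s, from the table above
    X = np.array([30, 40, 40, 50, 60, 70, 70, 70, 80, 90], dtype=float)
    Y = np.array([184.4, 279.1, 244.0, 314.2, 382.2,
                  450.2, 423.6, 410.2, 500.4, 505.3])

    # Closed-form least-squares estimates: b1 = Sxy / Sxx, b0 = Ybar - b1 * Xbar
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    print(b0, b1)  # roughly 36.342 and 5.550

    # The quantity being minimized: the error sum of squares (ESS)
    ESS = np.sum((Y - (b0 + b1 * X)) ** 2)
    print(ESS)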
TREND Function
TREND(Y-range, X-range, X-value for
prediction)
where:
Y-range is the spreadsheet range containing the dependent Y variable,
X-range is the spreadsheet range containing the independent X variable(s),
X-value for prediction is a cell (or cells) containing the values for the
independent X variable(s) for which we want an estimated value of Y.

Note: The TREND( ) function is dynamically updated whenever any inputs to the function change. However, it does not provide the statistical information provided by the regression tool. It is best to use these two different approaches to doing regression in conjunction with one another.
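For instance, if the sales figures were in cells C2:C11 and the advertising figures in B2:B11 (illustrative ranges, not taken from the slides), =TREND(C2:C11, B2:B11, 65) would return the estimated sales for $65,000 of advertising.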
R2 Statistic
• The R2 statistic indicates how well an estimated
regression function fits the data.
• 0 ≤ R2 ≤ 1
• It measures the proportion of the total variation
in Y around its mean that is accounted for by
the estimated regression equation.
• To understand this better, consider the following
graph...
Understanding R2

[Figure: for a sample point, the total deviation Yi − Ȳ splits into the residual Yi − Ŷi and the explained deviation Ŷi − Ȳ around the fitted line Ŷ = b0 + b1X]
Understanding R2

\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2

or,

TSS = ESS + RSS

R^2 = \frac{RSS}{TSS} = 1 - \frac{ESS}{TSS}
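
Continuing the earlier Python sketch (again an illustration, not part of the slides), the decomposition and R2 can be checked numerically for the advertising data; the slides do not report R2 for this example, but the computation comes out to roughly 0.97.

    import numpy as np

    X = np.array([30, 40, 40, 50, 60, 70, 70, 70, 80, 90], dtype=float)
    Y = np.array([184.4, 279.1, 244.0, 314.2, 382.2,
                  450.2, 423.6, 410.2, 500.4, 505.3])
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()
    Y_hat = b0 + b1 * X

    TSS = np.sum((Y - Y.mean()) ** 2)       # total sum of squares
    ESS = np.sum((Y - Y_hat) ** 2)          # error (residual) sum of squares
    RSS = np.sum((Y_hat - Y.mean()) ** 2)   # regression sum of squares

    print(abs(TSS - (ESS + RSS)) < 1e-6)    # TSS = ESS + RSS holds
    print(1 - ESS / TSS)                    # R2, about 0.97 for this data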
Predicting the Dependent Value
• Suppose we want to estimate the average level of sales expected if $65,000 is spent on advertising.

\hat{Y}_i = 36.342 + 5.550 X_{1i}

• Estimated Sales = 36.342 + 5.550 * 65 = 397.092
• So when $65,000 is spent on advertising, we expect the average sales level to be $397,092.
Standard Error
• The standard error measures the scatter in the actual data around the estimated regression line.

S_e = \sqrt{ \frac{ \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 }{ n - k - 1 } }

where k = the number of independent variables.
• For our example, Se = 20.421.
• This is helpful in making predictions...
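
A short, self-contained Python check (illustrative; numpy assumed) confirms the reported standard error:

    import numpy as np

    X = np.array([30, 40, 40, 50, 60, 70, 70, 70, 80, 90], dtype=float)
    Y = np.array([184.4, 279.1, 244.0, 314.2, 382.2,
                  450.2, 423.6, 410.2, 500.4, 505.3])
    b1 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
    b0 = Y.mean() - b1 * X.mean()

    n, k = len(Y), 1                        # one independent variable
    ESS = np.sum((Y - (b0 + b1 * X)) ** 2)
    Se = np.sqrt(ESS / (n - k - 1))
    print(Se)                               # roughly 20.421, as on the slide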
Approximate Prediction Interval
• An approximate 95% prediction interval for a new value of Y when X1 = X1h is given by

\hat{Y}_h \pm 2 S_e, \quad \text{where} \quad \hat{Y}_h = b_0 + b_1 X_{1h}

• Example: If $65,000 is spent on advertising:
95% lower prediction limit = 397.092 - 2*20.421 = 356.250
95% upper prediction limit = 397.092 + 2*20.421 = 437.934
• If we spend $65,000 on advertising, we are approximately 95% confident actual sales will be between $356,250 and $437,934.
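
In Python terms (a sketch using only the numbers reported on the slide):

    # Point estimate at X1 = 65 ($65,000), using the fitted coefficients
    y_hat = 36.342 + 5.550 * 65
    Se = 20.421
    print(y_hat - 2 * Se, y_hat + 2 * Se)  # roughly 356.250 and 437.934 ($1000s)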
Beware of Extrapolation …
• Predictions made using an estimated regression
function may have little or no validity for values
of the independent variables that are
substantially different from those represented
in the sample.
Multiple Regression Analysis
• Most regression problems involve more than one independent variable.
• If each independent variable varies in a linear manner with Y, the estimated regression function in this case is:

\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}

• The optimal values for the bi can again be found by minimizing the ESS.
• The resulting function fits a hyperplane to our sample data.
Regression Surface with 2 IVs

[Figure: a regression plane fitted through scattered sample points in (X1, X2, Y) space]
Real Estate Example
• A real estate appraiser wants to develop a model
to help predict the fair market values of residential
properties.
• Three independent variables will be used to estimate the selling price of a house:
– Total square footage (X1)
– Size of garage (X2)
– Number of bedrooms (X3)
Selecting the Model
• We want to identify the simplest model that adequately accounts for the systematic variation in the Y variable.
• Arbitrarily using all the independent variables may result in overfitting.
• A sample reflects characteristics:
– representative of the population
– specific to the sample
• We want to avoid fitting sample-specific characteristics -- that is, overfitting the data.
Models with One IV
• With simplicity in mind, suppose we fit three simple linear regression functions:

\hat{Y}_i = b_0 + b_1 X_{1i}
\hat{Y}_i = b_0 + b_2 X_{2i}
\hat{Y}_i = b_0 + b_3 X_{3i}

• Key regression results are:

Variables in the Model   R2      Adjusted R2   Se       Parameter Estimates
X1                       0.870   0.855         10.299   b0=9.503, b1=56.394
X2                       0.759   0.731         14.030   b0=78.290, b2=28.382
X3                       0.793   0.770         12.982   b0=16.250, b3=27.607

• The model using X1 accounts for 87% of the variation in Y, leaving 13% unaccounted for.
Multiple Regression in Excel

• When using more than one independent variable, all variables for the X-range must be in one contiguous block of cells (that is, in adjacent columns).
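
Outside Excel, the same fit can be obtained by ordinary least squares. Below is a minimal Python sketch; since the slides do not list the raw real estate data, the arrays here are hypothetical stand-ins.

    import numpy as np

    # Hypothetical stand-in rows: [square footage (1000s), garage size, bedrooms]
    X = np.array([[1.5, 1, 3],
                  [2.1, 2, 4],
                  [1.8, 2, 3],
                  [2.4, 2, 4],
                  [1.2, 1, 2]])
    y = np.array([95.0, 135.0, 110.0, 150.0, 80.0])  # selling price ($1000s)

    # Prepend a column of ones so the first coefficient acts as the intercept b0
    A = np.column_stack([np.ones(len(y)), X])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ESS = ||y - A b||^2
    print(b)                                   # [b0, b1, b2, b3]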
Models with 2 IVs
• Now suppose we fit the following models with two independent variables:

\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}
\hat{Y}_i = b_0 + b_1 X_{1i} + b_3 X_{3i}

• Key regression results are:

Variables in the Model   R2      Adjusted R2   Se       Parameter Estimates
X1                       0.870   0.855         10.299   b0=9.503, b1=56.394
X1 & X2                  0.939   0.924         7.471    b0=27.684, b1=38.576, b2=12.875
X1 & X3                  0.877   0.847         10.609   b0=8.311, b1=44.313, b3=6.743

• The model using X1 and X2 accounts for 93.9% of the variation in Y, leaving 6.1% unaccounted for.
Adjusted R2

• As additional independent variables are added to a model:
– The R2 statistic can only increase.
– The Adjusted-R2 statistic can increase or decrease.
• The R2 statistic can be artificially inflated by adding any independent variable to the model.
• We can compare adjusted-R2 values as a heuristic to tell if adding an additional independent variable really helps.
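
For reference, the standard formula behind this behavior (not shown on the original slides) is

R^2_{adj} = 1 - (1 - R^2) \frac{n - 1}{n - k - 1}

so an added variable raises the adjusted statistic only if the gain in R2 outweighs the lost degree of freedom.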
Multicollinearity
• It should not be surprising that adding X3 (# of bedrooms) to
the model with X1 (total square footage) did not significantly
improve the model.
• Both variables represent the same (or very similar)
things -- the size of the house.
• These variables are highly correlated (or collinear).
• Multicollinearity should be avoided.
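
As a quick diagnostic (an illustration reusing the hypothetical stand-in array from the earlier sketch), a correlation matrix exposes collinear predictors:

    import numpy as np

    # Hypothetical rows: [square footage (1000s), garage size, bedrooms]
    X = np.array([[1.5, 1, 3],
                  [2.1, 2, 4],
                  [1.8, 2, 3],
                  [2.4, 2, 4],
                  [1.2, 1, 2]])

    # np.corrcoef treats rows as variables, hence the transpose; an
    # off-diagonal entry near 1 flags highly collinear predictors
    print(np.corrcoef(X.T))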
Model with 3 IVs
• Now suppose we fit the following model with three independent variables:

\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}

• Key regression results are:

Variables in the Model   R2      Adjusted R2   Se       Parameter Estimates
X1                       0.870   0.855         10.299   b0=9.503, b1=56.394
X1 & X2                  0.939   0.924         7.471    b0=27.684, b1=38.576, b2=12.875
X1, X2 & X3              0.943   0.918         7.762    b0=26.440, b1=30.803, b2=12.567, b3=4.576

• The model using X1 and X2 appears to be best:
– Highest adjusted-R2
– Lowest Se (most precise prediction intervals)
Making Predictions
• Let’s estimate the average selling price of a house with 2,100 square feet and a 2-car garage:

\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}
\hat{Y}_i = 27.684 + 38.576 * 2.1 + 12.875 * 2 = 134.444

• The estimated average selling price is $134,444.
• A 95% prediction interval for the actual selling price is approximately:

\hat{Y}_h \pm 2 S_e

95% lower prediction limit = 134.444 - 2*7.471 = 119.502 (i.e., $119,502)
95% upper prediction limit = 134.444 + 2*7.471 = 149.386 (i.e., $149,386)
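
A quick Python check of this arithmetic, using the coefficients from the results table above:

    # X1 & X2 model coefficients from the slide
    b0, b1, b2 = 27.684, 38.576, 12.875
    Se = 7.471

    y_hat = b0 + b1 * 2.1 + b2 * 2         # 2,100 sq ft, 2-car garage
    print(y_hat)                           # about 134.444 ($1000s)
    print(y_hat - 2 * Se, y_hat + 2 * Se)  # about 119.502 and 149.386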
