ISLR-Python: Chapter 3 -- Lab: Linear Regression
• Load Datasets
• 3.6.2 Simple Linear Regression
• 3.6.3 Multiple Linear Regression
• 3.6.4 Interaction Terms
• 3.6.5 Non-linear Transformations of the Predictors
• 3.6.6 Qualitative Predictors
Perform imports and set-up
import numpy as np
import pandas as pd
import statsmodels.api as sm
from scipy import stats
from matplotlib import pyplot as plt
from sklearn.datasets import load_boston  # the Boston data set is part of sklearn
from statsmodels.sandbox.regression.predstd import wls_prediction_std
from mpl_toolkits.mplot3d import Axes3D
%matplotlib inline
plt.style.use('ggplot') # emulate pretty r-style plots
Load Datasets
# Load Boston housing data set
boston = load_boston()
# Transform the data into a dataframe for analysis
# combine the predictors and responses for a dataframe
predictors = boston.data
response = boston.target
boston_data = np.column_stack([predictors,response])
# now get the column names of the data frame
col_names = np.append(boston.feature_names, 'MEDV')
# create the data frame
boston_df = pd.DataFrame(boston_data, columns = col_names)
boston_df.head()
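Note that load_boston has been removed from recent scikit-learn releases. If it is unavailable, the same dataframe can be built from a local CSV copy of the Boston data; the path below is an assumption, mirroring the Carseats file used later.
# fallback if sklearn.datasets.load_boston is unavailable (removed in scikit-learn 1.2+)
# assumes a CSV copy with the same columns, including MEDV; the path is hypothetical
# boston_df = pd.read_csv('../data/Boston.csv', index_col=0)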
# Load the carseats data set
carseats_df = pd.read_csv('../data/Carseats.csv',index_col = 0)
3.6.2 Simple Linear Regression
We will regress MEDV (the median home value) onto LSTAT (the percentage of the population with lower socioeconomic status). We will do this using two methods -- scipy.stats.linregress and statsmodels -- and look at several diagnostics that describe the quality of the fit.
Regress MEDV onto LSTAT (SciPy)
# Create a figure to plot our data and OLS estimate.
fig, ax = plt.subplots(figsize=(8,6))
ax.scatter(boston_df.LSTAT.values, boston_df.MEDV.values,
           facecolors='none', edgecolors='b', label="data");
ax.set_ylabel('MEDV');
ax.set_xlabel('LSTAT');
# call scipy's linregress, which returns the fit coefficients and simple statistics
beta1, beta0, r_value, p_value, stderr = stats.linregress(boston_df.LSTAT.values,
                                                          boston_df.MEDV.values)
# add the estimate to the data plot
ax.plot(boston_df.LSTAT.values, beta0 + beta1*boston_df.LSTAT.values,
        color='r', label="OLS");
ax.legend(loc='best');
# print the regression estimates returned from scipy
print('beta= [',round(beta0,3),',', round(beta1,3),']')
print('R={0:.3f}, p_value={1:.3f}, stderr={2:.3f}'.format(r_value, p_value, stderr))
beta= [ 34.554 , -0.95 ]
R=-0.738, p_value=0.000, stderr=0.039
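linregress does not return confidence intervals directly, but a 95% interval for the slope can be reconstructed from the returned standard error; a quick sketch using the values above:
# reconstruct a 95% confidence interval for the slope from the linregress output
n = len(boston_df)
t_crit = stats.t.ppf(0.975, df=n - 2)   # two-sided 95% critical value
print('95% CI for slope: [{:.3f}, {:.3f}]'.format(beta1 - t_crit*stderr, beta1 + t_crit*stderr))
# this should roughly match the LSTAT interval in the statsmodels summary below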
Regress MEDV onto LSTAT (statsmodels)
# Another method is to use the package statsmodels.
# Create a design matrix
# set the independent variable as the LSTAT
X = boston_df.LSTAT
# We add a constant for the intercept term
X = sm.add_constant(X)
# set the dependent variable
Y = boston_df.MEDV
# create the model instance and fit
linear_model = sm.OLS(Y,X)
linear_results = linear_model.fit()
# data about the model is stored in summary
print(linear_results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.544
Model:                            OLS   Adj. R-squared:                  0.543
Method:                 Least Squares   F-statistic:                     601.6
Date:                Fri, 24 Jun 2016   Prob (F-statistic):           5.08e-88
Time:                        10:05:12   Log-Likelihood:                -1641.5
No. Observations:                 506   AIC:                             3287.
Df Residuals:                     504   BIC:                             3295.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         34.5538      0.563     61.415      0.000        33.448    35.659
LSTAT         -0.9500      0.039    -24.528      0.000        -1.026    -0.874
==============================================================================
Omnibus:                      137.043   Durbin-Watson:                   0.892
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              291.373
Skew:                           1.453   Prob(JB):                     5.36e-64
Kurtosis:                       5.319   Cond. No.                         29.7
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
# statsmodels includes functions for getting the prediction and confidence intervals.
# Get the prediction interval for each x
prstd, iv_l, iv_u = wls_prediction_std(linear_results)
# statsmodels also includes the confidence intervals for the fitted values (i.e. the mean fits)
# but they are buried in the results of the summary_table function.
from statsmodels.stats.outliers_influence import summary_table
# call the summary table at the (1-alpha)*100% confidence level
simpleTable, data, column_names = summary_table(linear_results, alpha=0.05)
# data contains the confidence intervals we want, but we need to make sure we get the right
# columns, so let's print the names
print('column_names: ', column_names)
# Get the confidence intervals for each x
predicted_mean_ci_low, predicted_mean_ci_high = data[:,4:6].T
column_names:  ['Obs', 'Dep Var\nPopulation', 'Predicted\nValue', 'Std Error\nMean Predict', 'Mean ci\n95% low', 'Mean ci\n95% upp', 'Predict ci\n95% low', 'Predict ci\n95% upp', 'Residual', 'Std Error\nResidual', 'Student\nResidual', "Cook's\nD"]
# Plot the data, the OLS estimate, and the prediction and confidence intervals
fig, ax = plt.subplots(figsize=(8,6))
# get numpy array values from dataframe
x = boston_df.LSTAT.values
y = boston_df.MEDV.values
# Plot the data
ax.scatter(x, y, facecolors='none', edgecolors='b', label="data")
# plot the model's fitted values
ax.plot(x, linear_results.fittedvalues, 'g', label="OLS")
# plot the high and low prediction intervals
ax.plot(x, iv_u, color='0.75',label="Prediction Interval")
ax.plot(x, iv_l, color='0.75')
# plot the high and low mean confidence intervals
ax.plot(x,predicted_mean_ci_low, 'r', label="Predicted Mean CI")
ax.plot(x,predicted_mean_ci_high,'r')
ax.legend(loc='best');
plt.xlabel('LSTAT');
plt.ylabel('MEDV');
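More recent statsmodels releases also expose both interval types directly through the results object's get_prediction method; a minimal sketch, assuming a version that includes it:
# predict MEDV at a few new LSTAT values, with mean and prediction intervals
new_X = sm.add_constant(pd.DataFrame({'LSTAT': [5, 10, 15]}))
pred = linear_results.get_prediction(new_X)
print(pred.summary_frame(alpha=0.05))   # mean_ci_* and obs_ci_* columns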
Diagnostic Plots for Linear Model
# Create plots of residuals
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(12,6))
# RESIDUALS
# The results contain the residuals
fitted_values = linear_results.fittedvalues.values
residuals = linear_results.resid.values
# Plot the residual for each fitted value
ax1.scatter(fitted_values, residuals, facecolors='none', edgecolors='b');
ax1.set_xlabel('fitted values');
ax1.set_ylabel('residuals');
# The residual plot indicates significant nonlinearity (a U-shaped pattern is clear)
# STUDENTIZED RESIDUALS
# To assess outliers we will look at the studentized residuals. These are in the data array
# returned from summary_table (column index 10)
studentized_residuals = data[:,10]
# Plot the studentized residuals
ax2.scatter(fitted_values, studentized_residuals, facecolors='none', edgecolors='b');
ax2.set_ylabel('Studentized Residuals');
ax2.set_xlabel('fitted values');
# |studentized residual| > 3 are generally considered outliers
# We can also examine the leverage statistics to identify points that may unduly alter the regression line
from statsmodels.stats.outliers_influence import OLSInfluence
leverage = OLSInfluence(linear_results).hat_matrix_diag
fig, ax = plt.subplots(figsize=(8,6))
ax.scatter(leverage, studentized_residuals,facecolors='none',
edgecolors='b');
ax.set_xlabel('Leverage');
ax.set_ylabel('Studentized Residuals');
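As in the R version of this lab, we can also pick out the single observation with the largest leverage; a small addition using the array computed above:
# position (0-based) of the observation with the largest leverage
print('max leverage at observation:', np.argmax(leverage))
print('leverage value: {:.4f}'.format(leverage.max()))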
3.6.3 Multiple Linear Regression
Here we will estimate MEDV using multiple linear regression. In the first example we will regress MEDV onto LSTAT and AGE.
# create our design matrix using LSTAT and AGE predictors
X = sm.add_constant(boston_df[['LSTAT','AGE']])
# set the dependent variable
Y = boston_df.MEDV
fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d',azim=-60, elev=5)
# create the model instance and estimate
model = sm.OLS(Y,X)
estimate = model.fit()
# data about the model is stored in summary
print('Model parameters:\n', estimate.params[:])
# Plot the data
ax.scatter(X.loc[:,'LSTAT'], X.loc[:,'AGE'], Y.values, facecolors=(0,0,0,0),
           edgecolor='k', depthshade=True);
ax.set_xlabel('LSTAT');
ax.set_ylabel('AGE')
ax.set_zlabel('MEDV')
# Plot the OLS estimate
# create a grid of points
xx1, xx2 = np.meshgrid(np.linspace(X.LSTAT.min(), X.LSTAT.max(), 100),
np.linspace(X.AGE.min(), X.AGE.max(), 100))
# plot the plane by evaluating the parameters over the grid
Z = estimate.params[0] + estimate.params[1] * xx1 + estimate.params[2] * xx2
# plot plane
surf = ax.plot_surface(xx1, xx2, Z, cmap=plt.cm.RdBu_r, alpha=0.75, linewidth=0)
Model parameters:
const 33.222761
LSTAT -1.032069
AGE 0.034544
dtype: float64
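The residual standard error and R-squared of this two-predictor fit can be pulled directly from the results object; a quick check (a small addition to the lab):
# residual standard error (square root of the residual mean square) and R-squared
print('RSE = {:.3f}'.format(np.sqrt(estimate.mse_resid)))
print('R^2 = {:.3f}'.format(estimate.rsquared))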
Now we will perform the regression using all 13 predictors in the Boston housing dataset.
# create our design matrix using all the predictors (the last column is MEDV)
X = sm.add_constant(boston_df.iloc[:,0:-1])
# create the model instance and estimate
model = sm.OLS(Y,X)
est = model.fit()
# data about the model is stored in summary
print(est.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.741
Model:                            OLS   Adj. R-squared:                  0.734
Method:                 Least Squares   F-statistic:                     108.1
Date:                Fri, 24 Jun 2016   Prob (F-statistic):          6.95e-135
Time:                        10:05:13   Log-Likelihood:                -1498.8
No. Observations:                 506   AIC:                             3026.
Df Residuals:                     492   BIC:                             3085.
Df Model:                          13
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
const         36.4911      5.104      7.149      0.000        26.462    46.520
CRIM          -0.1072      0.033     -3.276      0.001        -0.171    -0.043
ZN             0.0464      0.014      3.380      0.001         0.019     0.073
INDUS          0.0209      0.061      0.339      0.735        -0.100     0.142
CHAS           2.6886      0.862      3.120      0.002         0.996     4.381
NOX          -17.7958      3.821     -4.658      0.000       -25.302   -10.289
RM             3.8048      0.418      9.102      0.000         2.983     4.626
AGE            0.0008      0.013      0.057      0.955        -0.025     0.027
DIS           -1.4758      0.199     -7.398      0.000        -1.868    -1.084
RAD            0.3057      0.066      4.608      0.000         0.175     0.436
TAX           -0.0123      0.004     -3.278      0.001        -0.020    -0.005
PTRATIO       -0.9535      0.131     -7.287      0.000        -1.211    -0.696
B              0.0094      0.003      3.500      0.001         0.004     0.015
LSTAT         -0.5255      0.051    -10.366      0.000        -0.625    -0.426
==============================================================================
Omnibus:                      178.029   Durbin-Watson:                   1.078
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              782.015
Skew:                           1.521   Prob(JB):                    1.54e-170
Kurtosis:                       8.276   Cond. No.                     1.51e+04
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.51e+04. This might indicate that there are
strong multicollinearity or other numerical problems.
# Compute all the variance inflation factors
from statsmodels.stats.outliers_influence import variance_inflation_factor
VIFs = [(predictor, variance_inflation_factor(X.values, i))
        for i, predictor in enumerate(list(X))]  # list(X) returns the column names of the df
print('Variance Inflation Factors')
for tup in VIFs:
print('{:10}'.format(tup[0]), '{:.3f}'.format(tup[1]))
Variance Inflation Factors
const 585.425
CRIM 1.773
ZN 2.299
INDUS 3.991
CHAS 1.074
NOX 4.395
RM 1.934
AGE 3.101
DIS 3.957
RAD 7.481
TAX 9.008
PTRATIO 1.799
B 1.346
LSTAT 2.938
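The ISLR R lab also refits the full model without AGE, which has a very large p-value above. One way to do the same here, a sketch that simply drops the column from the design matrix built earlier:
# refit the full model excluding AGE
X_no_age = X.drop('AGE', axis=1)
est_no_age = sm.OLS(Y, X_no_age).fit()
print(est_no_age.summary())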
3.6.4 Interaction Terms
Statsmodels uses the patsy package to convert formulas to matrices for fitting. This
allows for easy implementation of arbitrary functions of the predictors.
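To see what a formula such as MEDV ~ LSTAT*AGE expands to, we can ask patsy for the design matrix directly; a small illustrative sketch:
# the * operator expands to both main effects plus the interaction term
from patsy import dmatrices
y_dm, X_dm = dmatrices('MEDV ~ LSTAT*AGE', data=boston_df, return_type='dataframe')
print(X_dm.columns.tolist())   # ['Intercept', 'LSTAT', 'AGE', 'LSTAT:AGE']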
# import the statsmodels formula api (which uses patsy)
import statsmodels.formula.api as smf
# Construct model and fit
model = smf.ols(formula='MEDV ~ LSTAT*AGE', data=boston_df)
estimate = model.fit()
print(estimate.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.556
Model:                            OLS   Adj. R-squared:                  0.553
Method:                 Least Squares   F-statistic:                     209.3
Date:                Fri, 24 Jun 2016   Prob (F-statistic):           4.86e-88
Time:                        10:05:13   Log-Likelihood:                -1635.0
No. Observations:                 506   AIC:                             3278.
Df Residuals:                     502   BIC:                             3295.
Df Model:                           3
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept     36.0885      1.470     24.553      0.000        33.201    38.976
LSTAT         -1.3921      0.167     -8.313      0.000        -1.721    -1.063
AGE           -0.0007      0.020     -0.036      0.971        -0.040     0.038
LSTAT:AGE      0.0042      0.002      2.244      0.025         0.001     0.008
==============================================================================
Omnibus:                      135.601   Durbin-Watson:                   0.965
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              296.955
Skew:                           1.417   Prob(JB):                     3.29e-65
Kurtosis:                       5.461   Cond. No.                     6.88e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 6.88e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
3.6.5 Non-linear Transformations of the Predictors
The statsmodels formula API also lets us include non-linear transformations of the predictors via the I() function.
model = smf.ols('MEDV ~ LSTAT + I(LSTAT**2)', data=boston_df)
quadratic_results = model.fit()
print(quadratic_results.summary())
                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.641
Model:                            OLS   Adj. R-squared:                  0.639
Method:                 Least Squares   F-statistic:                     448.5
Date:                Fri, 24 Jun 2016   Prob (F-statistic):          1.56e-112
Time:                        10:05:13   Log-Likelihood:                -1581.3
No. Observations:                 506   AIC:                             3169.
Df Residuals:                     503   BIC:                             3181.
Df Model:                           2
Covariance Type:            nonrobust
=================================================================================
                    coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------
Intercept        42.8620      0.872     49.149      0.000        41.149    44.575
LSTAT            -2.3328      0.124    -18.843      0.000        -2.576    -2.090
I(LSTAT ** 2)     0.0435      0.004     11.628      0.000         0.036     0.051
=================================================================================
Omnibus:                      107.006   Durbin-Watson:                   0.921
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              228.388
Skew:                           1.128   Prob(JB):                     2.55e-50
Kurtosis:                       5.397   Cond. No.                     1.13e+03
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.13e+03. This might indicate that there are
strong multicollinearity or other numerical problems.
The near-zero p-value for the quadratic term suggests an improved model. We will plot the fit and perform some diagnostics.
fig, ax = plt.subplots(figsize=(8,6))
# get numpy array values from dataframe
x = boston_df.LSTAT.values
y = boston_df.MEDV.values
# Plot the data
ax.scatter(x, y, facecolors='none', edgecolors='b', label="data");
# plot the models fitted values
ax.plot(x, quadratic_results.fittedvalues, 'g', marker='o', linestyle='none', label="OLS");
ax.legend(loc='best');
plt.xlabel('LSTAT');
plt.ylabel('MEDV');
Diagnostic tests of quadratic estimate
# import anova function
from statsmodels.stats.api import anova_lm
# perform the hypothesis test (see the regression section of https://en.wikipedia.org/wiki/F-test)
anova_table = anova_lm(linear_results, quadratic_results)
print(anova_table)
   df_resid           ssr  df_diff     ss_diff           F        Pr(>F)
0       504  19472.381418        0         NaN         NaN           NaN
1       503  15347.243158        1  4125.13826  135.199822  7.630116e-28
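The F-statistic in the table can be reproduced by hand from the two residual sums of squares; a quick check using only the values printed above:
# F = ((RSS_linear - RSS_quadratic) / (df_linear - df_quadratic)) / (RSS_quadratic / df_quadratic)
rss1, df1 = 19472.381418, 504   # linear model
rss2, df2 = 15347.243158, 503   # quadratic model
F = ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
print('F = {:.1f}'.format(F))   # ~135.2, matching anova_lm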
The F-statistic is 135 with a p-value of ~0, indicating a large difference in the unexplained variance of the two models. This is not too surprising given the plot above. Now consider the residuals.
# Create plots of residuals
fig, (ax1,ax2) = plt.subplots(1,2,figsize=(12,6))
# RESIDUALS OF LINEAR MODEL
# The results contain the residuals
linear_fit_values = linear_results.fittedvalues.values
residuals = linear_results.resid.values
# Plot the residual for each fitted value for the linear model
ax1.scatter(linear_fit_values, residuals, facecolors='none', edgecolors='b');
ax1.set_xlabel('fitted values');
ax1.set_ylabel('residuals');
ax1.set_title('Linear Model Residuals')
# RESIDUALS OF QUADRATIC MODEL
# The results contain the residuals
quadratic_fit_values = quadratic_results.fittedvalues.values
quadratic_residuals = quadratic_results.resid.values
ax2.scatter(quadratic_fit_values, quadratic_residuals, facecolors='none', edgecolors='b');
ax2.set_title('Quadratic Model Residuals');
We can also try higher order polynomial fits:
formula = 'MEDV ~ 1 + ' + ' + '.join('I(LSTAT**{})'.format(i) for i in range(1, 6))
print(formula)
model = smf.ols(formula, data=boston_df)
order_5_results = model.fit()
print(order_5_results.summary())
MEDV ~ 1 + I(LSTAT**1) + I(LSTAT**2) + I(LSTAT**3) + I(LSTAT**4) + I(LSTAT**5)
                            OLS Regression Results
==============================================================================
Dep. Variable:                   MEDV   R-squared:                       0.682
Model:                            OLS   Adj. R-squared:                  0.679
Method:                 Least Squares   F-statistic:                     214.2
Date:                Fri, 24 Jun 2016   Prob (F-statistic):          8.73e-122
Time:                        10:57:12   Log-Likelihood:                -1550.6
No. Observations:                 506   AIC:                             3113.
Df Residuals:                     500   BIC:                             3139.
Df Model:                           5
Covariance Type:            nonrobust
===================================================================================
                      coef    std err          t      P>|t|      [95.0% Conf. Int.]
-----------------------------------------------------------------------------------
Intercept          67.6997      3.604     18.783      0.000        60.618    74.781
I(LSTAT ** 1)     -11.9911      1.526     -7.859      0.000       -14.989    -8.994
I(LSTAT ** 2)       1.2728      0.223      5.703      0.000         0.834     1.711
I(LSTAT ** 3)      -0.0683      0.014     -4.747      0.000        -0.097    -0.040
I(LSTAT ** 4)       0.0017      0.000      4.143      0.000         0.001     0.003
I(LSTAT ** 5)   -1.632e-05   4.42e-06     -3.692      0.000      -2.5e-05 -7.63e-06
===================================================================================
Omnibus:                      144.085   Durbin-Watson:                   0.987
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              494.545
Skew:                           1.292   Prob(JB):                    4.08e-108
Kurtosis:                       7.096   Cond. No.                     1.37e+08
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.37e+08. This might indicate that there are
strong multicollinearity or other numerical problems.
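The very large condition number arises because raw powers of LSTAT are strongly correlated. One simple mitigation, not used in the lab itself, is to center and scale LSTAT before raising it to powers; the fitted values are unchanged, only the parameterization differs. A sketch:
# same fifth-order polynomial on a centered and scaled copy of LSTAT
boston_df['LSTAT_z'] = (boston_df.LSTAT - boston_df.LSTAT.mean()) / boston_df.LSTAT.std()
formula_z = 'MEDV ~ 1 + ' + ' + '.join('I(LSTAT_z**{})'.format(i) for i in range(1, 6))
order_5_z = smf.ols(formula_z, data=boston_df).fit()
print('R^2 = {:.3f}'.format(order_5_z.rsquared))   # identical fit, same R-squared as above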
3.6.6 Qualitative Predictors
# Examine the head of the carseats data set
carseats_df.head()
   Sales  CompPrice  Income  Advertising  Population  Price ShelveLoc  Age  Education Urban   US
1   9.50        138      73           11         276    120       Bad   42         17   Yes  Yes
2  11.22        111      48           16         260     83      Good   65         10   Yes  Yes
3  10.06        113      35           10         269     80    Medium   59         12   Yes  Yes
4   7.40        117     100            4         466     97    Medium   55         14   Yes  Yes
5   4.15        141      64            3         340    128       Bad   38         13   Yes   No
# Construct the formula with two interaction terms
formula = 'Sales ~' + '+'.join(list(carseats_df.iloc[:,1:].columns.tolist())
                               + ['Income:Advertising'] + ['Price:Age'])
print(formula)
print()
model = smf.ols(formula, data=carseats_df)
carseat_results = model.fit()
print(carseat_results.summary())
Sales ~CompPrice+Income+Advertising+Population+Price+ShelveLoc+Age+Education+Urban+US+Income:Advertising+Price:Age
                            OLS Regression Results
==============================================================================
Dep. Variable:                  Sales   R-squared:                       0.876
Model:                            OLS   Adj. R-squared:                  0.872
Method:                 Least Squares   F-statistic:                     210.0
Date:                Fri, 24 Jun 2016   Prob (F-statistic):          6.14e-166
Time:                        14:38:28   Log-Likelihood:                -564.67
No. Observations:                 400   AIC:                             1157.
Df Residuals:                     386   BIC:                             1213.
Df Model:                          13
Covariance Type:            nonrobust
=======================================================================================
                          coef    std err          t      P>|t|      [95.0% Conf. Int.]
---------------------------------------------------------------------------------------
Intercept               6.5756      1.009      6.519      0.000         4.592     8.559
ShelveLoc[T.Good]       4.8487      0.153     31.724      0.000         4.548     5.149
ShelveLoc[T.Medium]     1.9533      0.126     15.531      0.000         1.706     2.201
Urban[T.Yes]            0.1402      0.112      1.247      0.213        -0.081     0.361
US[T.Yes]              -0.1576      0.149     -1.058      0.291        -0.450     0.135
CompPrice               0.0929      0.004     22.567      0.000         0.085     0.101
Income                  0.0109      0.003      4.183      0.000         0.006     0.016
Advertising             0.0702      0.023      3.107      0.002         0.026     0.115
Population              0.0002      0.000      0.433      0.665        -0.001     0.001
Price                  -0.1008      0.007    -13.549      0.000        -0.115    -0.086
Age                    -0.0579      0.016     -3.633      0.000        -0.089    -0.027
Education              -0.0209      0.020     -1.063      0.288        -0.059     0.018
Income:Advertising      0.0008      0.000      2.698      0.007         0.000     0.001
Price:Age               0.0001      0.000      0.801      0.424        -0.000     0.000
=======================================================================================
Omnibus:                        1.281   Durbin-Watson:                   2.047
Prob(Omnibus):                  0.527   Jarque-Bera (JB):                1.147
Skew:                           0.129   Prob(JB):                        0.564
Kurtosis:                       3.050   Cond. No.                     1.31e+05
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 1.31e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
# The default coding for categorical variables in patsy is Treatment coding with the first level
# (reference=0) as the baseline; in this case the dummy variables are all 0 for the 'Bad' shelf location.
# More coding schemes can be found in the patsy documentation (e.g. patsy.contrasts.Treatment).
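As a rough analogue of R's contrasts(ShelveLoc), the coding matrix for the default Treatment scheme can be inspected directly; a small sketch assuming the three ShelveLoc levels:
from patsy.contrasts import Treatment
levels = ['Bad', 'Good', 'Medium']
contrast = Treatment(reference=0).code_without_intercept(levels)
print(contrast.matrix)   # rows: Bad, Good, Medium; columns: ShelveLoc[T.Good], ShelveLoc[T.Medium]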