Python_Codes_Regression - Jupyter Notebook


Simple Regression
In [1]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import warnings
        warnings.filterwarnings('ignore')
        import statsmodels.formula.api as smf
        import statsmodels.api as sm

In [2]: data = pd.DataFrame({'RDE': [2,3,5,4,11,5], 'AP': [20,25,34,30,40,31]})
        data.plot('RDE', 'AP', kind='scatter')
        plt.title("Annual Profit against R&D Expenditure")
        plt.xlabel("R&D Expenditure (Millions)")
        plt.ylabel("Annual Profit (Millions)")

Out[2]: Text(0, 0.5, 'Annual Profit (Millions)')


In [3]: df = pd.DataFrame({'RDE': [2,3,5,4,11,5], 'AP': [20,25,34,30,40,31]})
        df.plot('RDE', 'AP', kind='scatter')
        lm = smf.ols("AP ~ RDE", data=df).fit()
        xmin = df.RDE.min()
        xmax = df.RDE.max()
        X = np.linspace(xmin, xmax, 100)
        # params[0] is the intercept (w₀)
        # params[1] is the slope (w₁)
        Y = lm.params[0] + lm.params[1] * X
        plt.plot(X, Y, color="darkgreen")
        plt.xlabel("R&D Expenditure (Millions)")
        plt.ylabel("Annual Profit (Millions)")

Out[3]: Text(0, 0.5, 'Annual Profit (Millions)')
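For reference, the intercept and slope that smf.ols estimates here can be reproduced from the closed-form least-squares formulas w₁ = Sxy/Sxx and w₀ = ȳ − w₁x̄; a minimal NumPy sketch (variable names are illustrative):

In [ ]: # closed-form simple regression: w1 = Sxy / Sxx, w0 = ybar - w1 * xbar
        x, y = df['RDE'].values, df['AP'].values
        w1 = ((x - x.mean()) * (y - y.mean())).sum() / ((x - x.mean())**2).sum()
        w0 = y.mean() - w1 * x.mean()
        print(w0, w1)  # should match lm.params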

In [4]: df = pd.DataFrame({'RDE': [2,3,5,4,11,5,10,8], 'AP': [20,25,34,30,40,31,39,37]})
        # create and fit the linear model
        lm = smf.ols(formula='AP ~ RDE', data=df).fit()
        print(lm.params)

Intercept 20.157895
RDE 1.973684
dtype: float64
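As a quick cross-check, np.polyfit fits the same degree-1 polynomial; note that it returns the slope first, then the intercept:

In [ ]: # polyfit returns coefficients highest degree first: [slope, intercept]
        slope, intercept = np.polyfit(df['RDE'], df['AP'], 1)
        print(intercept, slope)  # ~20.1579 and ~1.9737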

In [5]: # use the fitted model for prediction
        lm.predict({'RDE': 10})
        # Expected Annual Profit (Millions) for R&D Expenditure of 10 (Millions)

Out[5]: 0 39.894737
dtype: float64
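The prediction is simply the fitted line evaluated at RDE = 10: 20.157895 + 1.973684 × 10 ≈ 39.89. A one-line check against the stored parameters:

In [ ]: # manual check of the prediction above
        print(lm.params['Intercept'] + lm.params['RDE'] * 10)  # 39.894737...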


In [6]: df_rd = pd.read_excel("R&D_Profit.xlsx")
        df_rd

Out[6]:
   R&D Expenditure (Millions)  Annual Profit (Millions)
0                           2                        20
1                           3                        25
2                           5                        34
3                           4                        30
4                          11                        40
5                           5                        31


In [7]: X = df_rd['R&D Expenditure (Millions)']
        y = df_rd['Annual Profit (Millions)']
        # Add a constant to the X variable for the intercept term
        X = sm.add_constant(X)
        # Fit the model
        model = sm.OLS(y, X).fit()
        # Print model summary
        print(model.summary())

                               OLS Regression Results
==============================================================================
Dep. Variable:     Annual Profit (Millions)   R-squared:                 0.826
Model:                                  OLS   Adj. R-squared:            0.783
Method:                       Least Squares   F-statistic:               19.05
Date:                      Wed, 23 Oct 2024   Prob (F-statistic):       0.0120
Time:                              18:18:33   Log-Likelihood:          -14.351
No. Observations:                         6   AIC:                       32.70
Df Residuals:                             4   BIC:                       32.29
Df Model:                                 1
Covariance Type:                  nonrobust
==============================================================================
                               coef    std err        t    P>|t|   [0.025   0.975]
----------------------------------------------------------------------------------
const                       20.0000      2.646    7.559    0.002   12.654   27.346
R&D Expenditure (Millions)   2.0000      0.458    4.364    0.012    0.728    3.272
==============================================================================
Omnibus:                          nan   Durbin-Watson:             1.500
Prob(Omnibus):                    nan   Jarque-Bera (JB):          0.327
Skew:                          -0.000   Prob(JB):                  0.849
Kurtosis:                       1.857   Cond. No.                   11.8
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.


C:\Users\91941\anaconda3\lib\site-packages\statsmodels\stats\stattools.py:74: ValueWarning: omni_normtest is not valid with less than 8 observations; 6 samples were given.
  warn("omni_normtest is not valid with less than 8 observations; %i "

In [8]: lm = smf.ols(formula='AP ~ RDE', data=df).fit()
        lm.summary()

Out[8]:
                            OLS Regression Results
==============================================================================
Dep. Variable:                     AP   R-squared:                       0.871
Model:                            OLS   Adj. R-squared:                  0.849
Method:                 Least Squares   F-statistic:                     40.42
Date:                Wed, 23 Oct 2024   Prob (F-statistic):           0.000710
Time:                        18:48:59   Log-Likelihood:                -18.166
No. Observations:                   8   AIC:                             40.33
Df Residuals:                       6   BIC:                             40.49
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                coef    std err        t    P>|t|    [0.025    0.975]
----------------------------------------------------------------------
Intercept    20.1579      2.094    9.626    0.000    15.034    25.282
RDE           1.9737      0.310    6.358    0.001     1.214     2.733
==============================================================================
Omnibus:                        0.039   Durbin-Watson:                   1.564
Prob(Omnibus):                  0.980   Jarque-Bera (JB):                0.151
Skew:                          -0.053   Prob(JB):                        0.927
Kurtosis:                       2.336   Cond. No.                         15.0
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
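A natural follow-up is a residual plot to check the fit visually; fittedvalues and resid are standard attributes of the fitted results. A minimal sketch:

In [ ]: # residuals vs fitted values; points should scatter around zero with no pattern
        plt.scatter(lm.fittedvalues, lm.resid)
        plt.axhline(0, color="gray", linestyle="--")
        plt.xlabel("Fitted values")
        plt.ylabel("Residuals")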

In [9]: data = pd.read_excel("Store_Data.xlsx")
        data.head()

Out[9]:
   Bars  Price  Promotion
0  4141     59        200
1  3842     59        200
2  3056     59        200
3  3519     59        200
4  4226     59        400


In [10]: data.describe()

Out[10]:
              Bars      Price   Promotion
count    34.000000  34.000000   34.000000
mean   3098.676471  77.823529  388.235294
std    1256.422018  16.286210  162.862102
min     675.000000  59.000000  200.000000
25%    2125.250000  59.000000  200.000000
50%    3430.500000  79.000000  400.000000
75%    3968.750000  99.000000  600.000000
max    5120.000000  99.000000  600.000000

In [11]: lm = smf.ols(formula='Bars ~ Price + Promotion', data=data).fit()
         print(lm.params)

Intercept 5837.520759
Price -53.217336
Promotion 3.613058
dtype: float64
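Each coefficient is the expected change in Bars for a one-unit change in that predictor, holding the other fixed. A small sketch illustrating this with two predictions that differ only in Price (the input values are illustrative):

In [ ]: # raising Price by 1 cent at fixed Promotion shifts the prediction
        # by the Price coefficient (about -53.2)
        p1 = lm.predict({'Price': 80, 'Promotion': 400})
        p2 = lm.predict({'Price': 79, 'Promotion': 400})
        print(p1[0] - p2[0])  # ~ -53.217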


In [12]: lm.summary()

Out[12]:
                            OLS Regression Results
==============================================================================
Dep. Variable:                   Bars   R-squared:                       0.758
Model:                            OLS   Adj. R-squared:                  0.742
Method:                 Least Squares   F-statistic:                     48.48
Date:                Wed, 23 Oct 2024   Prob (F-statistic):           2.86e-10
Time:                        18:52:31   Log-Likelihood:                -266.26
No. Observations:                  34   AIC:                             538.5
Df Residuals:                      31   BIC:                             543.1
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                coef    std err        t    P>|t|     [0.025     0.975]
------------------------------------------------------------------------
Intercept  5837.5208    628.150    9.293    0.000   4556.400   7118.642
Price       -53.2173      6.852   -7.766    0.000    -67.193    -39.242
Promotion     3.6131      0.685    5.273    0.000      2.216      5.011
==============================================================================
Omnibus:                        1.418   Durbin-Watson:                   2.282
Prob(Omnibus):                  0.492   Jarque-Bera (JB):                0.486
Skew:                          -0.034   Prob(JB):                        0.784
Kurtosis:                       3.582   Cond. No.                     2.45e+03
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 2.45e+03. This might indicate that there are strong multicollinearity or other numerical problems.
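The large condition number here mostly reflects the raw scales of Price and Promotion rather than severe multicollinearity. A sketch showing that standardizing the predictors shrinks it (data_z and lm_z are illustrative names; the fit is unchanged up to rescaled coefficients):

In [ ]: # z-score the predictors, then refit; the condition number drops sharply
        data_z = data.copy()
        data_z['Price'] = (data_z['Price'] - data_z['Price'].mean()) / data_z['Price'].std()
        data_z['Promotion'] = (data_z['Promotion'] - data_z['Promotion'].mean()) / data_z['Promotion'].std()
        lm_z = smf.ols('Bars ~ Price + Promotion', data=data_z).fit()
        print(lm_z.condition_number)  # far below 2.45e+03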

In [13]: # Predicted average/mean sales for a price of 79 cents and a promotional expenditure of 400
         lm.predict({'Price': 79, 'Promotion': 400})

Out[13]: 0 3078.574405
dtype: float64
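As before, the prediction is the fitted equation evaluated at the inputs: 5837.5208 − 53.2173 × 79 + 3.6131 × 400 ≈ 3078.57. A one-line check:

In [ ]: # manual check of the prediction above
        print(lm.params['Intercept'] + lm.params['Price'] * 79 + lm.params['Promotion'] * 400)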

