
NAME: LAVANYA KURUP

ROLL NO : 121A1047

C3

EXPERIMENT 1: TO IMPLEMENT LINEAR AND MULTIPLE LINEAR REGRESSION

AIM: In this experiment we will learn to implement linear and multiple linear regression.

THEORY:

What is Linear Regression?


Linear regression is a statistical technique used to model and analyze the
relationship between a dependent variable (outcome) and one independent
variable (predictor). It assumes a linear relationship between these variables,
meaning that changes in the independent variable result in proportional
changes in the dependent variable.

Key Concepts

1. Dependent and Independent Variables:

 Dependent Variable (Y): The outcome or response variable that you want to predict or explain.
 Independent Variable (X): The predictor or explanatory variable used to predict the dependent variable.

2. Linear Relationship:

 The relationship between X and Y is modeled as a straight line, described by the linear equation: Y = β0 + β1X + ϵ
 β0: The intercept of the line, representing the value of Y when X is zero.
 β1: The slope of the line, representing the change in Y for a one-unit change in X.
 ϵ: The error term, capturing the deviations of observed values from the predicted values.

Steps in Linear Regression

1. Formulate the Model:

 Decide the form of the linear relationship you want to model. For simple linear regression, the model is: Y = β0 + β1X + ϵ


2. Estimate Parameters:

 Ordinary Least Squares (OLS) is the most common method for estimating β0 and β1. It aims to minimize the sum of the squared differences between observed values and predicted values (errors).
 The estimated parameters (β̂0 and β̂1) are found by solving: minimize ∑(Yi − (β̂0 + β̂1Xi))², where the sum runs over i = 1 to n.
 Setting the derivatives of this sum to zero gives the closed-form estimates β̂1 = ∑(Xi − X̄)(Yi − Ȳ) / ∑(Xi − X̄)² and β̂0 = Ȳ − β̂1X̄, which is exactly what the "Using Formula" code later in this report computes.
3. Evaluate the Model:

 Residuals: The differences between observed and predicted values (Yi − Ŷi).
 Goodness of Fit:
     R-squared (R²): The proportion of variance in the dependent variable that is predictable from the independent variable. It ranges from 0 to 1, with higher values indicating a better fit.
     Adjusted R-squared: Adjusts R² for the number of predictors, providing a more accurate measure for models with multiple predictors.
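
To make the R² definition concrete, the short sketch below computes residuals and R² from first principles and checks the result against scikit-learn's r2_score; the observed and predicted values here are made up for illustration, not taken from the experiment's dataset.

import numpy as np
from sklearn.metrics import r2_score

# Made-up observed values and predictions from some fitted line (illustrative only)
y_true = np.array([3.0, 5.1, 6.9, 9.2, 11.0])
y_pred = np.array([3.2, 5.0, 7.1, 8.9, 11.2])

residuals = y_true - y_pred                       # Yi - Y_hat_i
ss_res = np.sum(residuals ** 2)                   # sum of squared residuals
ss_tot = np.sum((y_true - y_true.mean()) ** 2)    # total sum of squares
r2_manual = 1 - ss_res / ss_tot                   # R^2 = 1 - SS_res / SS_tot

print("Residuals:", residuals)
print("R-squared (manual):", round(r2_manual, 4))
print("R-squared (sklearn):", round(r2_score(y_true, y_pred), 4))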
4. Check Assumptions:
 Linearity: The relationship between X and Y should be linear.
 Independence: Observations should be independent of each other.
 Homoscedasticity: The variance of residuals should be constant across all levels of X.
 Normality: Residuals should be approximately normally distributed (mainly for hypothesis testing).
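
These assumptions are usually checked visually. The sketch below, using synthetic data rather than the experiment's salary dataset, draws a residuals-versus-fitted plot: a random scatter around zero with roughly constant spread supports the linearity and homoscedasticity assumptions.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Synthetic data for illustration only
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=(50, 1))
y = 2.0 + 1.5 * x.ravel() + rng.normal(0, 1, size=50)

model = LinearRegression().fit(x, y)
fitted = model.predict(x)
residuals = y - fitted

plt.scatter(fitted, residuals)
plt.axhline(0, color='red', linestyle='--')   # residuals should straddle zero
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.title('Residuals vs Fitted')
plt.show()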
5. Interpret the Results:

 Slope (β1): Indicates how much Y changes for a one-unit change in X.
 Intercept (β0): Indicates the value of Y when X is zero. In some contexts, the intercept may not have a meaningful interpretation if X = 0 is not within the range of observed data.

Applications

 Economics: Predicting GDP growth based on investment levels.


 Healthcare: Predicting patient outcomes based on treatment
variables.
 Marketing: Analyzing how advertising spending affects sales.

What is Multiple Linear Regression?


Multiple Linear Regression (MLR) is a statistical technique used to model
the relationship between one dependent variable and two or more
independent variables. It generalizes simple linear regression to account for
more than one predictor, helping you understand how multiple factors
simultaneously influence an outcome.

Model Equation

The equation for multiple linear regression is: Y = β0 + β1X1 + β2X2 + ⋯ + βnXn + ϵ

Where:

 Y is the dependent variable.
 X1, X2, …, Xn are the independent variables (predictors).
 β0 is the intercept of the regression plane.
 β1, β2, …, βn are the coefficients (slopes) of the independent variables, indicating the change in Y for a one-unit change in each corresponding Xi.
 ϵ is the error term (residual), representing the deviation of the observed values from the predicted values.
Steps in Multiple Linear Regression

1. Formulate the Model:

 Decide which independent variables to include in the model based on theory, prior research, or exploratory data analysis.
2. Estimate Parameters:

 Ordinary Least Squares (OLS): The most common method for estimating the coefficients β. OLS minimizes the sum of the squared differences between observed values and predicted values: minimize ∑(Yi − (β̂0 + β̂1Xi1 + β̂2Xi2 + ⋯ + β̂nXin))², where the sum runs over i = 1 to n.
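
Equivalently, stacking the predictors into a design matrix X (with a leading column of ones for the intercept) gives the closed-form OLS solution β̂ = (XᵀX)⁻¹Xᵀy. The sketch below solves this least-squares problem on made-up data with NumPy (np.linalg.lstsq, which is numerically safer than inverting XᵀX directly) and cross-checks against scikit-learn; it is an illustration, not part of the experiment code.

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 100
X = rng.uniform(0, 5, size=(n, 2))                       # two predictors
y = 4 + 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 1, n)  # known true coefficients

# Prepend a column of ones so beta_hat[0] plays the role of the intercept
X_design = np.column_stack([np.ones(n), X])

# Solve the OLS minimization: beta_hat = argmin ||X_design @ beta - y||^2
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("Least-squares estimates:", beta_hat)

# Cross-check with scikit-learn's OLS implementation
model = LinearRegression().fit(X, y)
print("sklearn intercept and coefs:", model.intercept_, model.coef_)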
3. Evaluate the Model:

 Residuals: Analyze the differences between observed values and the values predicted by the model.
 Goodness of Fit:
     R-squared (R²): Measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It indicates how well the model explains the variability of the response data.
     Adjusted R-squared: Adjusts R² for the number of predictors in the model. It’s useful for comparing models with different numbers of predictors.
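
To my knowledge, scikit-learn's metrics module provides r2_score but no adjusted variant, so adjusted R² is typically computed by hand from the formula Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). In the sketch below, the observed/predicted values, n, and p are assumptions made for the sake of the example.

import numpy as np
from sklearn.metrics import r2_score

# Placeholder observed and predicted values for illustration
y_true = np.array([10.0, 12.5, 14.0, 15.5, 18.0, 20.5])
y_pred = np.array([10.4, 12.0, 14.3, 15.9, 17.6, 20.8])

n = len(y_true)   # number of observations
p = 2             # number of predictors (assumed for this example)

r2 = r2_score(y_true, y_pred)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)

print("R-squared:", round(r2, 3))
print("Adjusted R-squared:", round(adj_r2, 3))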
4. Check Assumptions:
 Linearity: The relationship between the dependent variable and
each independent variable should be linear.
 Independence: Observations should be independent of each
other.
 Homoscedasticity: The residuals should have constant
variance at every level of the independent variables.
 Normality of Residuals: Residuals should be approximately
normally distributed, which is important for hypothesis testing
and constructing confidence intervals.

5. Interpret the Results:

 Coefficients (β): Each coefficient represents the change in the dependent variable for a one-unit change in the corresponding independent variable, holding all other predictors constant.
 Intercept (β0): The value of Y when all independent variables are zero. Its interpretation may not always be meaningful if zero is outside the range of the data.

CODE:
1. LINEAR REGRESSION

 Using Formula

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

loc = "/content/Salary_Data.csv"
df = pd.read_csv(loc)

X = df.iloc[:, 0]
y = df.iloc[:, 1]

mean_X = np.mean(X)
mean_y = np.mean(y)

n = len(X)

numer = 0
denom = 0
for i in range(n):
    numer += (X[i] - mean_X) * (y[i] - mean_y)   # sum of cross-deviations
    denom += (X[i] - mean_X) ** 2                # sum of squared deviations in X

b1 = numer / denom            # slope = cov(X, y) / var(X)
b0 = mean_y - (b1 * mean_X)   # intercept from the means

print("Intercept b0:", b0)


print("Slope b1:", b1)

plt.scatter(X, y, color='blue', label='Scatter Plot')


plt.plot(X, b0 + b1*X, color='red', label='Regression Line')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.title('Linear Regression Fit')
plt.legend()
plt.show()
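
With b0 and b1 in hand, a prediction for a new input is one line; the 5 years of experience below is a made-up value for illustration.

# Predict the salary for a hypothetical 5 years of experience
years = 5.0
predicted_salary = b0 + b1 * years
print(f"Predicted salary for {years} years of experience: {predicted_salary:.2f}")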

 Using SkLearn

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score, explained_variance_score

loc = "/content/Salary_Data.csv"
df = pd.read_csv(loc)

# Print the first few rows
df.head()

# Check for missing values
print(df.isnull().sum())

# Drop any rows with missing values
df.dropna(inplace=True)

x = df['YearsExperience']
y = df['Salary']

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Reshape to 2-D arrays, as scikit-learn expects shape (n_samples, n_features)
x_train = np.array(x_train).reshape(len(x_train), 1)
x_test = np.array(x_test).reshape(len(x_test), 1)
y_train = np.array(y_train).reshape(len(y_train), 1)
y_test = np.array(y_test).reshape(len(y_test), 1)

model = LinearRegression()
model.fit(x_train, y_train)

# Plot the fit on the training set
y_train_pred = model.predict(x_train)
plt.figure()
plt.scatter(x_train, y_train, color='blue', label='True Values')
plt.plot(x_train, y_train_pred, color='red', label='Prediction')
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.legend()

# Plot the fit on the test set
y_test_pred = model.predict(x_test)
plt.figure()
plt.scatter(x_test, y_test, color='green', label='True Values')
plt.plot(x_test, y_test_pred, color='black', label='Prediction')
plt.xlabel("Years of Experience")
plt.ylabel("Salary")
plt.legend()
plt.show()

print("Mean squared error =", round(mean_squared_error(y_test, y_test_pred), 2))
print("Explained variance score =", round(explained_variance_score(y_test, y_test_pred), 2))
print("R2 score =", round(r2_score(y_test, y_test_pred), 2))

2. MULTIPLE LINEAR REGRESSION

import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

np.random.seed(0)
X1 = 2 * np.random.rand(100, 1)
X2 = 3 * np.random.rand(100, 1)
X = np.hstack((X1, X2))
y = 4 + 3*X1 + 2*X2 + np.random.randn(100, 1)

# Fit the multiple linear regression on both predictors together
model = LinearRegression()
model.fit(X, y)
print("Intercept:", model.intercept_)
print("Coefficients:", model.coef_)

# Single-variable fits, used only to visualise each predictor's relationship with y
model_X1 = LinearRegression()
model_X1.fit(X1, y)

model_X2 = LinearRegression()
model_X2.fit(X2, y)

y_pred_X1 = model_X1.predict(X1)
y_pred_X2 = model_X2.predict(X2)

plt.figure(figsize=(10, 6))
plt.scatter(X1, y, c='b', label='Actual data (X1)')
plt.plot(X1, y_pred_X1, color='r', label='Regression line (X1)')
plt.scatter(X2, y, c='g', label='Actual data (X2)')
plt.plot(X2, y_pred_X2, color='y', label='Regression line (X2)')
plt.xlabel('X1 and X2')
plt.ylabel('Y')
plt.title('Multiple Linear Regression')
plt.legend()
plt.show()
OUTPUT:
1. LINEAR REGRESSION:
 USING FORMULA:

 USING SKLEARN MODEL:


2. MULTIPLE LINEAR REGRESSION:
CONCLUSION:

In this experiment, we learnt how to implement simple linear and multiple linear regression, both directly from the OLS formula and using scikit-learn's LinearRegression model.
