Commit a6c38db

Merge pull request animator#353 from Yogeshkarma/main
Added Regression in Machine Learning
2 parents 3b4692c + d51207d commit a6c38db

File tree

2 files changed: +172 −1 lines changed

2 files changed

+172
-1
lines changed
Lines changed: 171 additions & 0 deletions
@@ -0,0 +1,171 @@
# Regression

* Regression is a supervised machine learning technique used to predict continuous values.

> Supervised learning is a category of machine learning that uses labeled datasets to train algorithms to predict outcomes and recognize patterns.

* Regression is a statistical method used to model the relationship between a dependent variable (often denoted as 'y') and one or more independent variables (often denoted as 'x'). The goal of regression analysis is to understand how the dependent variable changes as the independent variables change.

# Types Of Regression

1. Linear Regression
2. Polynomial Regression
3. Stepwise Regression
4. Decision Tree Regression
5. Random Forest Regression
6. Ridge Regression
7. Lasso Regression
8. ElasticNet Regression
9. Bayesian Linear Regression
10. Support Vector Regression
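
Most of these are available directly in scikit-learn and share the same fit/predict interface. A minimal sketch (the class names below are real scikit-learn estimators; polynomial and stepwise regression are instead built from preprocessing and feature-selection tools):

```
from sklearn.linear_model import Ridge, Lasso, ElasticNet, BayesianRidge
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# Every estimator above is trained and used the same way:
# model = Ridge()
# model.fit(X, Y)
# model.predict(new_X)
```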

But we'll first start with Linear Regression.

# Linear Regression

* Linear regression is a fundamental statistical method used to model the relationship between a dependent variable (often denoted as Y) and one or more independent variables (often denoted as X). The relationship is assumed to be linear, meaning that changes in the independent variables are associated with changes in the dependent variable in a straight-line fashion.

The basic form of linear regression for a single independent variable is:

**Y = β₀ + β₁X + ϵ**

Where:

* Y is the dependent variable.
* X is the independent variable.
* β₀ is the intercept, representing the value of Y when X is zero.
* β₁ is the slope coefficient, representing the change in Y for a one-unit change in X.
* ϵ is the error term, representing the variability in Y that is not explained by the linear relationship with X.
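
As a quick worked example with made-up numbers: if β₀ = 2 and β₁ = 3, then for X = 4 the model predicts Y = 2 + 3·4 = 14, plus whatever the error term ϵ contributes.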

# Basic Code of Linear Regression

* This line imports the NumPy library, which is widely used for numerical operations in Python. We use np as an alias for numpy, making it easier to reference functions and objects from the library.

```
import numpy as np
```

* This line imports the LinearRegression class from the linear_model module of the scikit-learn library. scikit-learn is a powerful library for machine learning tasks in Python, and LinearRegression is the class it provides for linear regression.

```
from sklearn.linear_model import LinearRegression
```

* This line creates a NumPy array X containing the independent variable values. In this example, we have a simple one-dimensional array representing the independent variable. The reshape(-1, 1) call reshapes it into a column vector, which is necessary because scikit-learn expects the features as a 2D array.

```
X = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)
```

* This line creates a NumPy array Y containing the dependent variable values. These are the observed values of the dependent variable for the independent variable values in X.

```
Y = np.array([2, 4, 5, 8, 5])
```

* This line creates an instance of the LinearRegression class, which represents the linear regression model. We'll use this object to train the model and make predictions.

```
model = LinearRegression()
```

* This line fits the linear regression model to the data. The fit() method takes two arguments: the independent variable (X) and the dependent variable (Y). It estimates the coefficients of the linear regression equation that best fit the given data.

```
model.fit(X, Y)
```

* These lines print the intercept (β₀) and coefficient (β₁) of the fitted model. model.intercept_ gives the intercept value, and model.coef_ gives an array of coefficients, where model.coef_[0] corresponds to the coefficient of the first (and here only) independent variable.

```
print("Intercept:", model.intercept_)
print("Coefficient:", model.coef_[0])
```

* These lines demonstrate how to use the trained model to make predictions for new data.
* We create a new NumPy array new_data containing the values of the independent variable for which we want to predict the dependent variable.
* We then use the predict() method of the model to obtain predictions for these new data points, and finally print the predicted values.

```
new_data = np.array([[6], [7]])
predictions = model.predict(new_data)
print("Predictions:", predictions)
```
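
For this toy dataset, the least-squares fit works out to an intercept of 1.8 and a slope of 1.0, so the printed predictions should be approximately [7.8 8.8].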

# Assumptions of Linear Regression

# Linearity:

* To assess the linearity assumption, we can visually inspect a scatter plot of the observed values versus the predicted values.
* If the relationship between them appears linear, it suggests that the linearity assumption is reasonable.

```
import matplotlib.pyplot as plt

# Recompute predictions on the training data for the diagnostic plots
predictions = model.predict(X)

plt.scatter(predictions, Y)
plt.xlabel("Predicted Values")
plt.ylabel("Observed Values")
plt.title("Linearity Check: Observed vs Predicted")
plt.show()
```

# Homoscedasticity:

* Homoscedasticity refers to the constant variance of the residuals across all levels of the independent variable(s). We can visually inspect a plot of residuals versus predicted values to check for homoscedasticity.

```
residuals = Y - predictions

plt.scatter(predictions, residuals)
plt.xlabel("Predicted Values")
plt.ylabel("Residuals")
plt.title("Homoscedasticity Check: Residuals vs Predicted Values")
plt.axhline(y=0, color='red', linestyle='--')  # Add horizontal line at y=0
plt.show()
```
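
Beyond the visual check, a formal test such as the Breusch-Pagan test can be applied. A minimal sketch using statsmodels (an extra dependency not used elsewhere in this tutorial, and only illustrative on a five-point dataset):

```
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# The test regresses the squared residuals on the predictors (plus a constant);
# a small p-value suggests the constant-variance assumption is violated.
exog = sm.add_constant(X)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(residuals, exog)
print("Breusch-Pagan p-value:", lm_pvalue)
```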

# Normality of Residuals:

* To assess the normality of residuals, we can visually inspect a histogram or a Q-Q plot of the residuals.

```
import seaborn as sns

# Histogram of residuals with a kernel density overlay
sns.histplot(residuals, kde=True)
plt.xlabel("Residuals")
plt.ylabel("Frequency")
plt.title("Normality of Residuals: Histogram")
plt.show()

import scipy.stats as stats

# Q-Q plot: points close to the reference line indicate approximate normality
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Normal Q-Q Plot")
plt.show()
```
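
As a numerical complement to these visual checks, the Shapiro-Wilk test can be run on the residuals; a minimal sketch (reusing the stats import above):

```
# Shapiro-Wilk test: a small p-value suggests the residuals deviate from normality
stat, p_value = stats.shapiro(residuals)
print("Shapiro-Wilk statistic:", stat, "p-value:", p_value)
```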

# Metrics for Regression

# Mean Absolute Error (MAE)

* MAE measures the average magnitude of the errors in a set of predictions, without considering their direction. It is the average of the absolute differences between predicted and actual values.

```
from sklearn.metrics import mean_absolute_error

mae = mean_absolute_error(Y, predictions)
print(f"Mean Absolute Error (MAE): {mae}")
```

# Mean Squared Error (MSE)

* MSE measures the average of the squares of the errors. It gives more weight to larger errors, making it sensitive to outliers.

```
from sklearn.metrics import mean_squared_error

mse = mean_squared_error(Y, predictions)
print(f"Mean Squared Error (MSE): {mse}")
```

# Root Mean Squared Error (RMSE)

* RMSE is the square root of the MSE. It provides an error metric that is in the same units as the dependent variable, making it more interpretable.

```
rmse = np.sqrt(mse)
print(f"Root Mean Squared Error (RMSE): {rmse}")
```

# R-squared (Coefficient of Determination)

* R-squared measures the proportion of the variance in the dependent variable that is predictable from the independent variables. It ranges from 0 to 1, where 1 indicates a perfect fit.

```
from sklearn.metrics import r2_score

r2 = r2_score(Y, predictions)
print(f"R-squared (R^2): {r2}")
```
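
With predictions recomputed on the training data as in the linearity check above, these metrics should work out to roughly MAE = 1.04, MSE = 1.76, RMSE ≈ 1.33, and R² ≈ 0.53 for the toy dataset.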

> In this tutorial, the sample dataset is for learning purposes only.

contrib/machine-learning/index.md

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 # List of sections
 
-- [Section title](filename.md)
+- [Regression in Machine Learning](Regression.md)
