MACHINE LEARNING LAB – 12-1-2025
1) Measures of Central Tendency and Dispersion
CODE :
import statistics

# Example Data
data = [10, 12, 12, 14, 16, 18, 18, 20, 20, 22]

# Measures of Dispersion
variance = statistics.variance(data)
std_deviation = statistics.stdev(data)
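The listing above omits the central-tendency calculations and the print statements that produce the output below; a minimal completion is sketched here (the variable names are assumptions):

# Measures of Central Tendency
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

print(f"Mean: {mean}")
print(f"Median: {median}")
print(f"Mode: {mode}")
print(f"Variance: {variance}")
print(f"Standard Deviation: {std_deviation}")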
OUTPUT :
Mean: 16.2
Median: 16.0
Mode: 12
Variance: 16.666666666666668
Standard Deviation: 4.08248290463863
Explanation:
1. Mean: The average of the numbers.
2. Median: The middle number when the data is sorted. If the list has an even number of elements, the median is
the average of the two middle numbers.
3. Mode: The number that appears most frequently in the data.
4. Variance: Measures the spread of the numbers in the dataset. It's the average of the squared differences from
the mean.
5. Standard Deviation: The square root of the variance, representing the spread of the dataset in the same units
as the data.
2) Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy
CODE :
Statistics Library
The statistics module in Python provides functions for basic statistical operations.
Common Functions:
import statistics as stats

data = [1, 2, 2, 3, 4, 5, 5, 5]
mean_val = stats.mean(data)
median_val = stats.median(data)
mode_val = stats.mode(data)
print(f"Mean: {mean_val}, Median: {median_val}, Mode: {mode_val}")
OUTPUT :
Mean: 3.375, Median: 3.5, Mode: 5
Math Library
The math module provides mathematical functions such as trigonometric functions, logarithms, and
constants.
Common Functions:
import math

x = 16
y = 2
sqrt_val = math.sqrt(x)     # 4.0
power_val = math.pow(x, y)  # 256.0
log_val = math.log(x, 2)    # 4.0
print(f"Square Root: {sqrt_val}, Power: {power_val}, Log (base 2): {log_val}")
Numpy Library
numpy (Numerical Python) is a powerful library for numerical computing. It is particularly useful for
handling large datasets, multidimensional arrays, and performing operations on them.
Common Functions:
array(data): Converts a list or other data structure into a NumPy array.
mean(arr): Returns the mean of a NumPy array.
std(arr): Returns the standard deviation of a NumPy array.
var(arr): Returns the variance of a NumPy array.
sum(arr): Computes the sum of all elements in the array.
reshape(): Reshapes the array.
linspace(start, stop, num): Returns num evenly spaced values between start and stop.
import numpy as np
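The remainder of the NumPy listing is not shown; a minimal sketch consistent with the printed output below, assuming a small example array:

arr = np.array([1, 2, 3, 4, 5])
print(f"Mean: {np.mean(arr)}, Standard Deviation: {np.std(arr)}, Sum: {np.sum(arr)}")

# reshape() and linspace() from the function list above
matrix = arr.reshape(5, 1)          # 5x1 column vector
points = np.linspace(0, 1, num=5)   # 5 evenly spaced values between 0 and 1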
OUTPUT :
Mean: 3.0, Standard Deviation: 1.4142135623730951, Sum: 15
Scipy Library
scipy is a scientific computing library that builds on numpy and provides additional functionality for
optimization, integration, interpolation, and other advanced mathematical and statistical functions.
Common Functions:
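The function list and example code for this section are not included in the listing; a minimal sketch using scipy.stats.ttest_ind (an independent two-sample t-test), with illustrative sample values:

from scipy import stats

# Two small samples (illustrative; the original data is not shown)
sample1 = [1, 2, 3, 4, 5]
sample2 = [2, 3, 4, 5, 6]

# Independent two-sample t-test
t_stat, p_value = stats.ttest_ind(sample1, sample2)
print(f"T-statistic: {t_stat}, P-value: {p_value}")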
OUTPUT:
T-statistic: -1.0, P-value: 0.34659350708733416
3) Study of Python Libraries for ML applications such as Pandas and Matplotlib
Pandas Library
Pandas is one of the most popular Python libraries for data manipulation and analysis. It is built on top of
NumPy and provides efficient, easy-to-use data structures (like DataFrames) that allow you to handle large
datasets for data preprocessing, exploration, and cleaning in ML applications.
import pandas as pd

# Example: Loading and analyzing a dataset
data = pd.DataFrame({
    'Age': [22, 25, 27, 30, 22],
    'Salary': [50000, 60000, 65000, 70000, 55000]
})

# Summary statistics and correlation matrix (shown in the output below)
print(data.describe())
print("Correlation Matrix:")
print(data.corr())
OUTPUT :
Age Salary
count 5.000000 5.00000
mean 25.200000 60000.00000
std 3.420526 7905.69415
min 22.000000 50000.00000
25% 22.000000 55000.00000
50% 25.000000 60000.00000
75% 27.000000 65000.00000
max 30.000000 70000.00000
Correlation Matrix:
Age Salary
Age 1.000000 0.970725
Salary 0.970725 1.000000
Matplotlib Library
Matplotlib is a powerful plotting library used for data visualization. In machine learning, visualizing data
and model performance (e.g., through plots, graphs, and charts) is crucial for understanding patterns,
identifying issues, and presenting findings.
Basic Plotting: Line plots, bar plots, scatter plots, histograms, etc.
Customization: Extensive options for customizing labels, titles, colors, legends, and axes.
Multiple Plots: Support for creating subplots and combining multiple graphs.
Interactive Plots: Integration with interactive environments like Jupyter Notebooks.
Save Plots: Save plots as images in various formats (e.g., PNG, JPEG, SVG).
3D Plotting: Capabilities for creating 3D graphs.
Styling: Use of styles for consistent looks across plots (e.g., dark background, gridlines).
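The plotting code for this section is not reproduced in the listing; a minimal sketch of a basic line plot with simple customization (the data values are illustrative):

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Line plot with a title, axis labels, and a legend
plt.plot(x, y, color='green', marker='o', label='y = 2x')
plt.title('Simple Line Plot')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()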
OUTPUT: (the plot is displayed)
Use Case in ML:
Data Visualization: Understanding the distribution of data using histograms, scatter plots, and box plots.
Model Evaluation: Visualizing metrics like accuracy, loss curves, confusion matrices, ROC curves, etc.
Exploratory Data Analysis (EDA): Visualizing relationships between variables, checking for patterns, and
spotting outliers.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
import seaborn as sns

# Load the Iris dataset into a DataFrame
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['target'] = iris.target

# Visualize the feature distributions with a boxplot
plt.figure(figsize=(8, 6))
sns.boxplot(data=data.drop('target', axis=1))
plt.title("Boxplot of Features")
plt.show()

# Split the data and standardize the features
X_train, X_test, y_train, y_test = train_test_split(
    data.drop('target', axis=1), data['target'], test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
4) Implementation of Simple Linear Regression
In Simple Linear Regression, we model the relationship between two variables x and y by fitting a straight
line to the data. The equation of the line is:
y = m·x + b
Where:
m is the slope of the line and b is the intercept (the value of y when x = 0).
The goal is to find the values of m and b that minimize the difference between the predicted and actual
values (i.e., minimize the error). We use the least squares method to compute the slope m and the intercept b.
Formula:
Slope: m = (n·Σxy − Σx·Σy) / (n·Σx² − (Σx)²)
Intercept: b = (Σy − m·Σx) / n
Where:
n is the number of data points and the sums run over all (x, y) pairs.
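The program for this experiment is not included in the listing; a minimal sketch of the functions described in the Explanation below, using small illustrative data values:

CODE :
import numpy as np
import matplotlib.pyplot as plt

# Example data (illustrative values)
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

def simple_linear_regression(x, y):
    # Least-squares slope m and intercept b
    n = len(x)
    m = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
    b = (np.sum(y) - m * np.sum(x)) / n
    return m, b

def predict(x, m, b):
    # Predict y-values for the given x-values using the fitted line
    return m * x + b

m, b = simple_linear_regression(x, y)
y_pred = predict(x, m, b)
print(f"Slope (m): {m}, Intercept (b): {b}")

# Plot the data points and the fitted line
plt.scatter(x, y, color='blue', label='Data points')
plt.plot(x, y_pred, color='red', label='Fitted line')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()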
Explanation:
1. Data: We have an array x (independent variable) and an array y (dependent variable).
2. simple_linear_regression function: This function calculates the slope m and the intercept b using the
formulas above.
3. predict function: This function predicts the y-values for given x-values using the learned model.
4. Matplotlib Plot: We use matplotlib to visualize the data points and the fitted line.
5) Implementation of Multiple Linear Regression for House Price Prediction using sklearn
The model takes the form: Price = b0 + b1·x1 + b2·x2 + ... + bn·xn, where b0 is the intercept, x1...xn are the
features (square footage, number of bedrooms, and so on), and b1...bn are their coefficients.
Steps:
1. Prepare the data: This includes multiple features like square footage, number of bedrooms,
location, etc.
2. Train the model: Use sklearn.linear_model.LinearRegression to train the model.
3. Predict house prices: Use the trained model to predict prices based on new data.
4. Evaluate the model: We can use metrics like R² (coefficient of determination) and Mean Squared Error to evaluate the model (a sketch of these steps is shown below).
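Only the evaluation print statements of the original listing survive below; a minimal sketch of the preceding steps, assuming a small synthetic dataset with the feature names used in the Explanation (the numeric values are illustrative):

CODE :
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Small synthetic dataset (illustrative values)
data = pd.DataFrame({
    'Square_Feet':   [1500, 1800, 2400, 3000, 3500, 4000, 4100, 2300],
    'Num_Bedrooms':  [3, 4, 3, 5, 4, 5, 6, 3],
    'Num_Bathrooms': [2, 2, 3, 4, 3, 4, 4, 2],
    'Age_of_House':  [10, 15, 20, 5, 8, 3, 2, 25],
    'Price':         [300000, 350000, 400000, 500000, 550000, 620000, 650000, 330000]
})
X = data[['Square_Feet', 'Num_Bedrooms', 'Num_Bathrooms', 'Age_of_House']]
y = data['Price']

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train the model and predict house prices for the test data
model = LinearRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Metrics discussed in the Explanation below
print(f"Mean Squared Error: {mean_squared_error(y_test, y_pred)}")
print(f"R-squared: {r2_score(y_test, y_pred)}")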
# Model Evaluation
print(f"Intercept (b0): {model.intercept_}")
print(f"Coefficients (b1, b2, b3, b4): {model.coef_}")
Data Preparation:
We create a dataset with multiple features like Square_Feet, Num_Bedrooms, Num_Bathrooms,
Age_of_House, and the target variable Price (house price).
Train/Test Split:
We split the data into training and testing sets using train_test_split() from sklearn. This is
crucial for evaluating the model's performance on unseen data.
Prediction:
After fitting the model, we predict house prices for the test data (X_test).
Model Evaluation:
We print the intercept and coefficients of the model to understand how each feature affects the house
price.
The Mean Squared Error (MSE) is computed using mean_squared_error(), which gives us an idea
of how much the predicted prices deviate from the actual prices.
The R-squared value (coefficient of determination) is calculated using r2_score(). This tells us how
well the model explains the variance in the data (a value closer to 1 is better).
Visualization:
We plot the actual house prices (y_test) vs predicted house prices (y_pred) using a scatter plot to
visually evaluate the model's predictions.
Interpretation:
Intercept (b0): This is the predicted house price when all features are zero. It should be interpreted as the
baseline house price.
Coefficients: Each coefficient corresponds to the effect of each feature on the house price. For instance:
o For every 1 square foot increase in the house size, the price increases by $50.
o For every additional bedroom, the price increases by $35,000, and so on.
R-squared: A value of 0.91 means the model explains 91% of the variance in house prices, which is quite
good.
Mean Squared Error: This value tells us the average squared difference between predicted and actual prices.
A lower value indicates better performance.
Conclusion:
This Python program implements Multiple Linear Regression using scikit-learn to predict house prices
based on multiple features. You can extend this approach to larger datasets or more complex models for
real-world applications. The evaluation metrics (like R-squared and MSE) help assess the model's accuracy
and how well it fits the data.
6) Implementation of Decision Tree using sklearn and its parameter tuning
A Decision Tree is a supervised machine learning algorithm that is used for both classification and
regression tasks. It splits the data into subsets based on the most significant features and continues splitting
until the data in each subset are as homogeneous as possible.
Steps:
1. Create a Decision Tree model using DecisionTreeClassifier.
2. Train the model using a dataset (we use the Iris dataset here).
3. Parameter Tuning using GridSearchCV to optimize hyperparameters like max_depth,
min_samples_split, min_samples_leaf, etc.
4. Model Evaluation: Evaluate performance using accuracy and the classification report.
CODE :
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn import tree
import matplotlib.pyplot as plt
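The listing stops after the imports; a minimal sketch of the remaining steps, with an illustrative parameter grid (the specific grid values are assumptions):

# Load the Iris dataset and split it into training and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)

# Parameter grid for tuning (values are illustrative)
param_grid = {
    'max_depth': [2, 3, 4, 5, None],
    'min_samples_split': [2, 5, 10],
    'min_samples_leaf': [1, 2, 4]
}

# Exhaustive search over the grid with 5-fold cross-validation
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                           param_grid, cv=5, scoring='accuracy')
grid_search.fit(X_train, y_train)
print(f"Best parameters: {grid_search.best_params_}")
print(f"Best cross-validation score: {grid_search.best_score_:.2f}")

# Evaluate the best model on the test set
best_model = grid_search.best_estimator_
y_pred = best_model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the Decision Tree model: {accuracy * 100:.2f}%")
print("Classification Report:")
print(classification_report(y_test, y_pred))

# Visualize the fitted tree
plt.figure(figsize=(12, 8))
tree.plot_tree(best_model, feature_names=iris.feature_names,
               class_names=list(iris.target_names), filled=True)
plt.show()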
OUTPUT:
Accuracy of the Decision Tree model: 100.00%
Classification Report:
precision recall f1-score support
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
Explanation:
Data Preparation:
We load the Iris dataset from sklearn.datasets; its four features (sepal length, sepal width, petal
length, and petal width) form the feature matrix X and the species labels form the target y.
This data is split into training and testing sets using train_test_split.
Training the Model:
After training the model, we predict the class labels for the test data (X_test) and evaluate the model
using accuracy and the classification report.
We use GridSearchCV to perform an exhaustive search over the parameter grid and evaluate the model
with 5-fold cross-validation.
The best hyperparameters and the best score are printed, and the model with the best parameters is used
for prediction and evaluation.
Model Evaluation:
After tuning, we evaluate the performance of the best model (selected from the grid search) on the test
set.
We also plot the fitted tree with tree.plot_tree to visually inspect the splits the model has learned.
Visual Output:
The tree diagram shows which feature and threshold are used at each split and which class is predicted at each
leaf, which makes the tuned model easy to interpret.
Conclusion:
This implementation shows how to use a Decision Tree Classifier on the Iris dataset, and how to tune
the model's hyperparameters using GridSearchCV for optimal performance.
By tuning the hyperparameters, you can improve the model's ability to generalize and reduce overfitting.
The final evaluation metrics (accuracy and the classification report) help us assess the quality of the model.
7) Implementation of KNN using sklearn
K-Nearest Neighbors (KNN) makes predictions as follows:
For a given input, find the K closest data points (neighbors) in the training set based on a distance metric
(usually Euclidean distance).
Predict the output by majority vote among those K nearest neighbors (for classification) or by averaging
their values (for regression).
Steps to Implement:
1. Prepare the dataset: Define features (X) and the target variable (y).
2. Split the data: Use train_test_split to divide the dataset into training and test sets.
3. Train the KNN model: Use KNeighborsClassifier from scikit-learn.
4. Make predictions: Use the trained model to predict the class labels.
5. Evaluate the model: Measure the model's performance using accuracy and the classification report.
6. Tune the model: Try different values of K (number of neighbors) and find the best one.
CODE :
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
import matplotlib.pyplot as plt
# Load the Iris dataset and split it into training and test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)
# Train a KNN classifier with k=5
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the KNN model (k=5): {accuracy * 100:.2f}%")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Try several values of k and record the accuracy for each
k_values = range(1, 21)
accuracies = []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    accuracies.append(accuracy_score(y_test, y_pred))
# Best value of k
best_k = k_values[np.argmax(accuracies)]
print(f"\nBest value of k: {best_k} with accuracy: {max(accuracies) * 100:.2f}%")
OUTPUT :
Accuracy of the KNN model (k=5): 100.00%
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 19
1 1.00 1.00 1.00 13
2 1.00 1.00 1.00 13
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
Explanation:
Data Preparation:
1. The Iris dataset is loaded from sklearn.datasets; its four measurements (sepal length, sepal width,
petal length, and petal width) are used as the features.
2. We separate the features (X) from the target variable (y).
Train/Test Split:
1. The dataset is split into training and testing sets using train_test_split() to ensure the model
is evaluated on unseen data.
Prediction and Evaluation:
1. We use the trained model to predict the class labels for the test data (X_test).
2. We evaluate the model's performance using accuracy and the classification report, and repeat the fit for
several values of k to find the best one.
CODE :
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
# Load the Iris dataset and split it into training and test sets
iris = load_iris()
X = iris.data  # Features (sepal length, sepal width, petal length, petal width)
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Train a logistic regression model and predict on the test set
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)
# Evaluate accuracy
print(f"Accuracy: {accuracy_score(y_test, y_pred)}")
print("\nClassification Report:")
print(classification_report(y_test, y_pred))
# Refit on two features only, for an easy 2D decision-boundary plot
X_train_2d = X_train[:, :2]  # Take only the first two features (sepal length and sepal width)
log_reg_2d = LogisticRegression(max_iter=200)
log_reg_2d.fit(X_train_2d, y_train)
# Predict the class over a grid of points covering the feature space
xx, yy = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 200),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 200))
Z = log_reg_2d.predict(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)
plt.contourf(xx, yy, Z, alpha=0.3, cmap='viridis')
plt.scatter(X_train_2d[:, 0], X_train_2d[:, 1], c=y_train, edgecolor='k')
plt.xlabel('Sepal Length')
plt.ylabel('Sepal Width')
plt.show()
OUTPUT :
Classification Report:
precision recall f1-score support
accuracy 1.00 45
macro avg 1.00 1.00 1.00 45
weighted avg 1.00 1.00 1.00 45
Explanation:
Data Preparation:
1. We use the Iris dataset from sklearn.datasets. The Iris dataset contains 3 classes of flowers, and the
model is trained on all of them (scikit-learn's LogisticRegression handles multiclass targets directly).
2. The dataset consists of four features: sepal length, sepal width, petal length, and petal width. For the
decision-boundary plot we use only the first two features (sepal length and sepal width) for easier visualization.
Model Training:
1. We create a LogisticRegression model and fit it on the training set (X_train, y_train) using the
fit() method.
Prediction:
1. After training the model, we use it to predict labels for the test set (X_test) with the predict() method.
Evaluation:
1. We calculate the Accuracy using accuracy_score() and print the Confusion Matrix and
Classification Report using confusion_matrix() and classification_report(),
respectively.
1. Confusion Matrix: Shows the counts of true positives, true negatives, false positives, and
false negatives.
2. Classification Report: Provides precision, recall, F1-score, and support for each class.
Visualization:
1. We plot a heatmap of the confusion matrix using seaborn.heatmap() to visually evaluate the
classification performance.
2. We also visualize the decision boundary of the logistic regression model using just two features
(sepal length and sepal width) for simplicity. The decision boundaries separate the classes.
The accuracy will show how well the model performs on the test data (1.0 means perfect accuracy).
Key Concepts:
Logistic Function (Sigmoid): Logistic regression uses a sigmoid function to map predicted values
(log odds) to probabilities. The output lies between 0 and 1, and we classify samples based on a
threshold (commonly 0.5); the formula is written out after this list.
Confusion Matrix: The confusion matrix helps us understand how well the model is distinguishing
between classes. The diagonal elements represent the correct predictions, while off-diagonal
elements represent misclassifications.
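For reference, the sigmoid that logistic regression applies to the linear combination of the features can be written as (a standard formula, not reproduced from the original listing):
P(y = 1 | x) = 1 / (1 + e^−(b0 + b1·x1 + ... + bn·xn))
A sample is assigned to the positive class when this probability exceeds the chosen threshold.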
Conclusion:
This implementation of Logistic Regression demonstrates the process of training a logistic regression
model for classification tasks using scikit-learn. We evaluated the model using common
classification metrics (accuracy, confusion matrix, classification report), and visualized the decision
boundary for two features.
8) Implementation of Logistic Regression using sklearn
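The listing below begins at the evaluation step; a minimal sketch of the setup it assumes (imports, the binary filtering of the Iris data, the train/test split, and the model fit; the test_size, random_state, and max_iter values are assumptions):

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns  # used for the confusion-matrix heatmap described in the Explanation
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Load the Iris data and keep only classes 0 and 1 (binary classification)
iris = load_iris()
X = pd.DataFrame(iris.data, columns=iris.feature_names)
y = iris.target
X, y = X[y < 2], y[y < 2]

# Split the data, then train the logistic regression model
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
logreg = LogisticRegression(max_iter=200)
logreg.fit(X_train, y_train)
y_pred = logreg.predict(X_test)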
# Model Evaluation
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print(f"Confusion Matrix:\n{conf_matrix}")
# Classification Report
class_report = classification_report(y_test, y_pred)
print(f"Classification Report:\n{class_report}")
# Visualizing the decision boundary for two features (e.g., sepal length and sepal width)
plt.figure(figsize=(10, 6))
# Fit logistic regression model again for only two features for easy visualization
logreg.fit(X_train[['sepal length (cm)', 'sepal width (cm)']], y_train)
OUTPUT :
Accuracy: 1.0
Confusion Matrix:
[[15 0]
[ 0 15]]
Classification Report:
precision recall f1-score support
0 1.00 1.00 1.00 15
1 1.00 1.00 1.00 15
accuracy 1.00 30
macro avg 1.00 1.00 1.00 30
weighted avg 1.00 1.00 1.00 30
Explanation:
Data Preparation:
1. The Iris dataset is loaded from sklearn.datasets. This dataset contains 150 samples of 3
different species of Iris flowers. However, to convert it to a binary classification problem, we filter
the data to only include class 0 and class 1.
2. We separate the feature columns (sepal length, sepal width, petal length, and petal width) into the
feature matrix X and the target column (target with classes 0 and 1) into the variable y.
Model Training:
1. We create a LogisticRegression model and train it on the training data (X_train, y_train) using the
fit() method.
Prediction:
1. After training, we use the model to predict the class labels for the test set (X_test).
Evaluation:
1. Accuracy: We calculate the accuracy of the model by comparing the predicted labels (y_pred) with
the actual test labels (y_test).
2. Confusion Matrix: We compute and print the confusion matrix, which shows the counts of true
positives, true negatives, false positives, and false negatives.
3. Classification Report: We generate a classification report which includes precision, recall, F1-score,
and support for each class.
Visualization:
1. We visualize the Confusion Matrix using a heatmap with seaborn.
2. For better understanding, we visualize the decision boundary of the logistic regression model by
plotting it using only two features (sepal length and sepal width). This shows how the logistic
regression model separates the two classes.
Key Concepts:
Logistic Function (Sigmoid): Logistic regression uses the logistic function (or sigmoid function) to
convert the output of a linear model into probabilities. The output lies between 0 and 1, and we
classify the sample based on a threshold (commonly 0.5).
Confusion Matrix: The confusion matrix is a powerful tool for evaluating the performance of a
classification model. It shows the number of true positives (correctly predicted class 1), true
negatives (correctly predicted class 0), false positives (incorrectly predicted as class 1), and false
negatives (incorrectly predicted as class 0).
Conclusion:
This implementation of Logistic Regression demonstrates how to train and evaluate a binary classifier
using scikit-learn. We used the Iris dataset to show how to convert a multi-class classification problem
to a binary classification problem, and we visualized the results using confusion matrix heatmaps and
decision boundaries.
Logistic Regression is a simple and interpretable algorithm for binary classification problems, and
scikit-learn makes it easy to implement and evaluate.
9) Implementation of K-Means Clustering using sklearn
Python Implementation
In this example, we will use the Iris dataset and perform K-Means clustering to group the data into K
clusters. Although the dataset has three classes, we will let the algorithm find clusters without using any
class labels.
CODE :
# Importing necessary libraries
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
import matplotlib.pyplot as plt

# Load the Iris dataset into a DataFrame and standardize the features
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
scaler = StandardScaler()
X = scaler.fit_transform(df)

# Apply KMeans clustering to the data (we choose K=3 for three clusters)
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)

# Get the cluster labels (which cluster each data point belongs to)
labels = kmeans.labels_
df['cluster'] = labels

# Print out the first few rows with the cluster labels
print(df.head())

# Evaluate the clustering with inertia and the silhouette score
print(f"Inertia: {kmeans.inertia_}")
print(f"Silhouette Score: {silhouette_score(X, labels)}")

# Reduce the data and the centroids to 2D with PCA for visualization
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
centroids = pca.transform(kmeans.cluster_centers_)

# Plot the clusters
plt.figure(figsize=(8, 6))
plt.scatter(X_2d[:, 0], X_2d[:, 1], c=labels, cmap='viridis', s=50, alpha=0.6)
plt.scatter(centroids[:, 0], centroids[:, 1], c='red', marker='X', s=200,
            label='Centroids')
plt.title('K-Means Clustering (2D Visualization)')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.legend()
plt.show()
Explanation:
Dataset:
1. We use the Iris dataset from sklearn.datasets. The Iris dataset consists of 150 data points, each
with four features: sepal length, sepal width, petal length, and petal width.
2. The dataset has three classes of Iris flowers, but we will use K-Means clustering to group the data
into clusters based on the features without any labels.
K-Means Clustering:
1. We initialize the KMeans model with n_clusters=3 because we assume there are 3 clusters in the
dataset (based on the known structure of the Iris dataset).
2. We fit the model using kmeans.fit(X) where X is the feature matrix (sepal and petal dimensions).
Cluster Centroids:
1. After fitting the model, we can access the cluster centroids (the mean position of all the points in each
cluster) using kmeans.cluster_centers_.
Cluster Labels:
1. Each data point is assigned a cluster label using kmeans.labels_. This gives us the cluster
assignment for each data point in the dataset.
PCA for Dimensionality Reduction:
1. Since the Iris dataset has 4 features, we reduce the dimensionality to 2D using Principal Component
Analysis (PCA). This allows us to visualize the clusters in 2D space.
2. The data points are then plotted using plt.scatter(), and we use different colors to represent the
different clusters.
Visualization:
1. The data points are displayed in a scatter plot, where each point is colored based on its assigned
cluster.
2. The centroids (red "X" markers) are plotted to show the center of each cluster.
Evaluation:
1. We print the inertia value, which measures how well the model has fit the data. Inertia is the sum of
squared distances from each point to its assigned cluster center. A lower inertia value indicates better
clustering.
Key Concepts:
K-Means Algorithm:
o Initialization: Randomly select K data points as initial cluster centroids.
o Assignment Step: Assign each data point to the closest centroid.
o Update Step: Recalculate the centroids by averaging the points in each cluster.
o Repeat: Repeat the assignment and update steps until convergence (i.e., centroids don't change
significantly).
Choosing K (Number of Clusters):
o The number of clusters (K) is a hyperparameter. It can be chosen based on prior knowledge, domain
expertise, or methods like the Elbow Method or Silhouette Score.
o The Elbow Method involves plotting the inertia (sum of squared distances to centroids) for different
values of K and looking for an "elbow" where the inertia stops decreasing sharply; the corresponding K
is often a good choice (a short sketch of this follows below).
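A minimal sketch of the Elbow Method, reusing the scaled feature matrix X and the imports from the listing above (the range of K values is illustrative):

# Compute the inertia for a range of K values and plot the elbow curve
inertias = []
k_range = range(1, 11)
for k in k_range:
    km = KMeans(n_clusters=k, random_state=42)
    km.fit(X)
    inertias.append(km.inertia_)

plt.plot(k_range, inertias, marker='o')
plt.xlabel('Number of clusters (K)')
plt.ylabel('Inertia')
plt.title('Elbow Method for Choosing K')
plt.show()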
Conclusion:
This implementation of K-Means Clustering demonstrates how to cluster data into groups based on feature
similarities using the KMeans algorithm from scikit-learn. We visualized the clusters in 2D using PCA,
which helps in understanding the clustering structure. K-Means is a powerful and simple algorithm for
unsupervised learning and can be applied to various types of data for pattern discovery and data
segmentation.
10) Performance analysis of Classification Algorithms on a specific dataset (Mini Project)
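Only fragments of the pipeline are shown below (the split, scaling, per-model evaluation, and plotting). A minimal sketch of the missing setup, assuming a binary-labelled dataset such as sklearn's breast cancer data, since the fragment computes binary precision/recall and a single ROC curve (with the three-class Iris data these metrics would need average='macro' and a one-vs-rest ROC); the model list follows the Explanation at the end of this section:

# Step 1: Load a binary classification dataset and define the models to compare
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score, f1_score,
                             confusion_matrix, ConfusionMatrixDisplay, roc_curve, auc)

data = load_breast_cancer()
X, y = data.data, data.target

# Models to compare (probability=True lets SVC expose predict_proba for the ROC curve)
models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'KNN': KNeighborsClassifier(),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'SVM': SVC(probability=True, random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
}
results = {}  # filled by the per-model training/evaluation block shown below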
# Step 2: Split the data into training and testing sets (70% train, 30% test)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
random_state=42)
# Step 3: Standardize the features (important for some models like SVM, KNN)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Step 4: For each model in the models dictionary, train it and make predictions
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
# Evaluate metrics
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Store results
results[model_name] = {
'accuracy': accuracy,
'precision': precision,
'recall': recall,
'f1_score': f1,
'confusion_matrix': confusion_matrix(y_test, y_pred),
'model': model
}
# Confusion Matrix
cm = metrics['confusion_matrix']
disp = ConfusionMatrixDisplay(confusion_matrix=cm,
display_labels=data.target_names)
disp.plot(ax=ax, cmap='Blues')
ax.set_title(f"{model_name} - Confusion Matrix")
# ROC Curve
fpr, tpr, thresholds = roc_curve(y_test,
metrics['model'].predict_proba(X_test_scaled)[:, 1])
roc_auc = auc(fpr, tpr)
ax.plot(fpr, tpr, color='blue', lw=2, label=f'AUC = {roc_auc:.2f}')
ax.plot([0, 1], [0, 1], color='gray', lw=2, linestyle='--')
ax.set_xlabel('False Positive Rate')
ax.set_ylabel('True Positive Rate')
ax.set_title(f"{model_name} - ROC Curve")
ax.legend(loc='lower right')
plt.tight_layout()
plt.show()
1. The Iris dataset is loaded using load_iris() from sklearn.datasets. The features (sepal
length, sepal width, petal length, and petal width) are stored in X, and the target (species of Iris) is
stored in y.
1. We split the dataset into training and testing sets using train_test_split(). 70% of the data is
used for training, and 30% is used for testing.
Initializing Models:
1. Logistic Regression
2. K-Nearest Neighbors (KNN)
3. Decision Tree Classifier
4. Support Vector Machine (SVM)
5. Random Forest Classifier
1. For each model, we train it using the fit() method on the training set and predict the test set using
the predict() method.
2. We calculate performance metrics such as accuracy, precision, recall, and F1-score using functions
from sklearn.metrics. The metrics are stored in dictionaries for easy comparison.
3. We print a classification report and confusion matrix for each model to better understand the
model's performance.
Performance Comparison:
1. We create a DataFrame to store and display the performance metrics for each model.
2. A bar chart is plotted to visually compare the performance of the models.