
BABU BANARASI DAS UNIVERSITY

LUCKNOW, U.P.

Lab Manual
on
Data Science with Python
(BAI3552)
For
B.Tech 3rd Year (Sem. 5th)

Session (2024-25)

B.Tech CS33 and CS34
Babu Banarasi Das University

Subject: DS Lab (BAI3552)    Program: B.Tech. CSAI III Year (Sem-V)

INDEX
Sr. No.  Title
1.  Implementation of a program for reading different types of data sets (.txt, .csv) from the web and from disk, and writing them to a specific disk location.
2.  Implementation of a program for reading EXCEL and XML data sheets.
3.  Implementation of Basic Statistics functions and data visualization.
4.  Implementation of K-means Clustering and K-nearest Neighbor algorithm.
5.  Implementation of Association Rules.
6.  Implementation of Linear Regression and Logistic Regression.
7.  Implementation of Naive Bayesian Classifier.
8.  Implementation of Decision Trees.
9.  Implementation of Random Forest.
10. Implementation of Principal Component Analysis.
11. Implementation of Singular Value Decomposition.
Program No. : 1
Implementation of a program for reading different types of data sets (.txt, .csv) from the web
and from disk, and writing them to a specific disk location.

#Step 1: Import Necessary Libraries
import os
import requests
import pandas as pd

#Step 2: Define Functions to Handle Different File Types
def read_txt_from_disk(file_path):
    with open(file_path, 'r') as file:
        data = file.read()
    return data

def read_csv_from_disk(file_path):
    df = pd.read_csv(file_path)
    return df

def read_txt_from_web(url):
    response = requests.get(url)
    response.raise_for_status()  # Ensure we got a valid response
    return response.text

def read_csv_from_web(url):
    df = pd.read_csv(url)
    return df

#Step 3: Write Data to a Specific Disk Location
def write_txt_to_disk(data, output_path):
    with open(output_path, 'w') as file:
        file.write(data)

def write_csv_to_disk(df, output_path):
    df.to_csv(output_path, index=False)

#Step 4: Implement a Main Function to Combine All Operations
def process_file(file_type, source, destination):
    # Create the destination directory if it does not exist yet
    os.makedirs(os.path.dirname(destination) or '.', exist_ok=True)
    # Read the data based on file type and source type
    if file_type == 'txt':
        if source.startswith('http'):
            data = read_txt_from_web(source)
        else:
            data = read_txt_from_disk(source)
        # Write the data to the destination
        write_txt_to_disk(data, destination)
    elif file_type == 'csv':
        if source.startswith('http'):
            df = read_csv_from_web(source)
        else:
            df = read_csv_from_disk(source)
        # Write the DataFrame to the destination
        write_csv_to_disk(df, destination)
    else:
        print(f"Unsupported file type: {file_type}")

#Step 5: Example Usage
if __name__ == "__main__":
    # Example for reading a .txt file from the web and saving it locally
    txt_url = "https://example.com/sample.txt"
    process_file('txt', txt_url, "output/sample.txt")

    # Example for reading a .csv file from the disk and saving it to another location
    csv_file_path = "data/sample.csv"
    process_file('csv', csv_file_path, "output/sample_output.csv")
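The URL and paths above are placeholders. As an optional extension (a sketch, not part of the original program), a call to process_file can be wrapped with basic error handling so that a missing local file, a failed download, or a malformed CSV is reported instead of crashing the script; the exception types used here come from the standard library, requests, and pandas.

#Optional: Call process_file with basic error handling (a sketch; the URL and paths are placeholders)
import requests
import pandas as pd

try:
    process_file('csv', "https://example.com/sample.csv", "output/sample_output.csv")
except FileNotFoundError as err:
    print(f"Input file not found: {err}")            # a local source path does not exist
except requests.exceptions.RequestException as err:
    print(f"Download failed: {err}")                 # network error or bad HTTP status
except pd.errors.ParserError as err:
    print(f"Could not parse CSV: {err}")             # malformed CSV content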

Program No. : 2

Implementation of a program for reading EXCEL and XML data sheets.

#Step 1: Import Necessary Libraries
import os
import pandas as pd
import xml.etree.ElementTree as ET

#Step 2: Define Functions to Handle Different File Types
def read_excel_file(file_path):
    df = pd.read_excel(file_path)
    return df

#Function to Read an XML File
def read_xml_file(file_path):
    tree = ET.parse(file_path)
    root = tree.getroot()
    return root

#Step 3: Write Data to a Specific Disk Location
def write_excel_to_disk(df, output_path):
    df.to_excel(output_path, index=False)

#Function to Write XML Data to Disk
def write_xml_to_disk(root, output_path):
    tree = ET.ElementTree(root)
    tree.write(output_path, encoding="utf-8", xml_declaration=True)

#Step 4: Implement a Main Function to Combine All Operations
def process_file(file_type, source, destination):
    # Create the destination directory if it does not exist yet
    os.makedirs(os.path.dirname(destination) or '.', exist_ok=True)
    # Read the data based on file type
    if file_type == 'excel':
        df = read_excel_file(source)
        # Write the DataFrame to the destination
        write_excel_to_disk(df, destination)
    elif file_type == 'xml':
        root = read_xml_file(source)
        # Write the XML tree to the destination
        write_xml_to_disk(root, destination)
    else:
        print(f"Unsupported file type: {file_type}")

#Step 5: Example Usage
if __name__ == "__main__":
    # Example for reading an Excel file and saving it locally
    excel_file_path = "data/sample.xlsx"
    process_file('excel', excel_file_path, "output/sample_output.xlsx")

    # Example for reading an XML file and saving it locally
    xml_file_path = "data/sample.xml"
    process_file('xml', xml_file_path, "output/sample_output.xml")
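Note that pd.read_excel needs an Excel engine such as openpyxl to be installed. As a further sketch (not part of the original program), a parsed XML tree can be converted into a DataFrame when the file has a flat layout in which every child of the root is one record and its sub-elements are the fields; this layout and the helper name xml_to_dataframe are assumptions for illustration, not properties of the sample.xml above.

#Optional: Convert a flat XML tree into a DataFrame (a sketch; assumes one record per child element)
def xml_to_dataframe(root):
    records = []
    for record in root:                                      # each child of the root is one record
        row = {field.tag: field.text for field in record}    # tag -> text for every sub-element
        records.append(row)
    return pd.DataFrame(records)

# Usage (assumes read_xml_file from Step 2):
# df_xml = xml_to_dataframe(read_xml_file("data/sample.xml"))
# print(df_xml.head())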

Program No. : 3

Implementation of Basic Statistics functions and data visualization.


#Step 1: Import Necessary Libraries
import numpy as np
import pandas as pd
from scipy import stats
import matplotlib.pyplot as plt
import seaborn as sns

#Step 2: Implement Basic Statistics Functions

#1. Mean
def calculate_mean(data):
    return np.mean(data)

#2. Median
def calculate_median(data):
    return np.median(data)

#3. Mode
def calculate_mode(data):
    # stats.mode returns an array in older SciPy releases and a scalar in newer ones;
    # np.atleast_1d keeps the indexing valid in both cases
    mode_result = stats.mode(data)
    return np.atleast_1d(mode_result.mode)[0]

#4. Standard Deviation
def calculate_std_dev(data):
    return np.std(data)

#5. Variance
def calculate_variance(data):
    return np.var(data)

#6. Percentiles
def calculate_percentile(data, percentile):
    return np.percentile(data, percentile)

#Step 3: Perform Data Visualization

#1. Histogram
def plot_histogram(data, title='Histogram', xlabel='Values', ylabel='Frequency'):
    plt.figure(figsize=(10, 6))
    plt.hist(data, bins=30, color='skyblue', edgecolor='black')
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.show()

#2. Box Plot
def plot_boxplot(data, title='Boxplot'):
    plt.figure(figsize=(10, 6))
    sns.boxplot(data=data)
    plt.title(title)
    plt.show()

#3. Scatter Plot
def plot_scatter(x, y, title='Scatter Plot', xlabel='X', ylabel='Y'):
    plt.figure(figsize=(10, 6))
    plt.scatter(x, y, color='purple')
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.show()

#4. Line Plot
def plot_line(x, y, title='Line Plot', xlabel='X', ylabel='Y'):
    plt.figure(figsize=(10, 6))
    plt.plot(x, y, color='green')
    plt.title(title)
    plt.xlabel(xlabel)
    plt.ylabel(ylabel)
    plt.show()

#Step 4: Example Usage
if __name__ == "__main__":
    # Example data
    data = np.random.normal(loc=0, scale=1, size=1000)

    # Calculate basic statistics
    mean = calculate_mean(data)
    median = calculate_median(data)
    mode = calculate_mode(data)
    std_dev = calculate_std_dev(data)
    variance = calculate_variance(data)
    percentile_25 = calculate_percentile(data, 25)
    percentile_75 = calculate_percentile(data, 75)

    # Print calculated statistics
    print(f"Mean: {mean}")
    print(f"Median: {median}")
    print(f"Mode: {mode}")
    print(f"Standard Deviation: {std_dev}")
    print(f"Variance: {variance}")
    print(f"25th Percentile: {percentile_25}")
    print(f"75th Percentile: {percentile_75}")

    # Visualize data
    plot_histogram(data)
    plot_boxplot(data)

    # Example scatter plot with another dataset
    x = np.random.normal(loc=0, scale=1, size=100)
    y = x * 2 + np.random.normal(loc=0, scale=1, size=100)
    plot_scatter(x, y)

    # Example line plot
    plot_line(x, y)
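As a quick sanity check (a sketch, not part of the original program), the hand-written statistics can be compared against pandas' built-in describe(). Note that pandas reports the sample standard deviation (ddof=1), whereas np.std above uses the population form (ddof=0), so those two values will differ slightly.

#Optional: Cross-check the statistics functions against pandas (a sketch)
import numpy as np
import pandas as pd

data = np.random.normal(loc=0, scale=1, size=1000)
summary = pd.Series(data).describe()             # count, mean, std, min, quartiles, max
print(summary)
print("IQR:", summary['75%'] - summary['25%'])   # interquartile range from the reported quartiles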

Program No. : 4

Implementation of K-means Clustering and K-nearest Neighbor algorithm.

#Step 1: Import Necessary Libraries
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

#Step 2: K-means Clustering Implementation

#1. Generate Sample Data
def generate_data_for_clustering(n_samples=300, n_features=2, centers=4, cluster_std=1.0):
    X, y = make_blobs(n_samples=n_samples, n_features=n_features, centers=centers,
                      cluster_std=cluster_std, random_state=42)
    return X, y

#2. Implement K-means Clustering
def kmeans_clustering(X, n_clusters):
    kmeans = KMeans(n_clusters=n_clusters, random_state=42)
    kmeans.fit(X)
    return kmeans

#3. Visualize K-means Clustering
def plot_kmeans_clusters(X, kmeans):
    plt.figure(figsize=(10, 6))
    plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap='viridis', marker='o')
    plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s=300, c='red', marker='x')
    plt.title("K-means Clustering")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()

#Step 3: K-Nearest Neighbor (KNN) Implementation

#1. Generate Sample Data for KNN
def generate_data_for_knn(n_samples=300, n_features=2, centers=4, cluster_std=1.0):
    X, y = make_blobs(n_samples=n_samples, n_features=n_features, centers=centers,
                      cluster_std=cluster_std, random_state=42)
    return X, y

#2. Implement K-Nearest Neighbor Classifier
def knn_classifier(X_train, y_train, X_test, n_neighbors=5):
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X_train, y_train)
    y_pred = knn.predict(X_test)
    return y_pred, knn

#3. Evaluate KNN Classifier
def evaluate_knn(y_test, y_pred):
    accuracy = accuracy_score(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    cr = classification_report(y_test, y_pred)
    return accuracy, cm, cr

#Step 4: Visualize KNN Decision Boundary
def plot_knn_decision_boundary(X, y, knn, h=0.02):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

    Z = knn.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(figsize=(10, 6))
    plt.contourf(xx, yy, Z, alpha=0.4, cmap='viridis')
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', marker='o', cmap='viridis')
    plt.title("K-Nearest Neighbor Decision Boundary")
    plt.xlabel("Feature 1")
    plt.ylabel("Feature 2")
    plt.show()

#Step 5: Example Usage
if __name__ == "__main__":
    #1. K-means Clustering Example
    X, y = generate_data_for_clustering(n_samples=300, n_features=2, centers=4, cluster_std=1.0)
    kmeans = kmeans_clustering(X, n_clusters=4)
    plot_kmeans_clusters(X, kmeans)

    #2. K-Nearest Neighbor Example
    X, y = generate_data_for_knn(n_samples=300, n_features=2, centers=4, cluster_std=1.0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

    y_pred, knn = knn_classifier(X_train, y_train, X_test, n_neighbors=5)

    accuracy, cm, cr = evaluate_knn(y_test, y_pred)

    print(f"Accuracy: {accuracy}")
    print("Confusion Matrix:")
    print(cm)
    print("Classification Report:")
    print(cr)

    # Visualize the KNN decision boundary
    plot_knn_decision_boundary(X_train, y_train, knn)
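The number of clusters is fixed at 4 above because the blobs are generated with 4 centres. When k is not known in advance, a common way to choose it is the elbow method; the sketch below (not part of the original program, and assuming the generate_data_for_clustering function defined above) plots the KMeans inertia_ value against k and looks for the "elbow" where the curve flattens.

#Optional: Elbow method for choosing the number of clusters (a sketch)
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X, _ = generate_data_for_clustering()                  # reuse the blob generator from Step 2
inertias = []
k_values = range(1, 11)
for k in k_values:
    km = KMeans(n_clusters=k, random_state=42, n_init=10).fit(X)
    inertias.append(km.inertia_)                       # within-cluster sum of squared distances

plt.plot(list(k_values), inertias, marker='o')
plt.title("Elbow Method")
plt.xlabel("Number of clusters k")
plt.ylabel("Inertia")
plt.show()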

Program No. : 5

Implementation of Association Rules.

#Step 1: Install Necessary Libraries (run in a terminal, or prefix with ! in a notebook)
# pip install mlxtend

#Step 2: Import Necessary Libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

#Step 3: Prepare the Dataset
def load_dataset():
    data = {'Milk': [1, 0, 1, 1, 0],
            'Bread': [1, 1, 0, 1, 1],
            'Butter': [0, 1, 1, 1, 0],
            'Beer': [1, 0, 0, 1, 1],
            'Diapers': [0, 1, 1, 1, 0]}
    # apriori expects boolean (True/False) values, so cast the 0/1 flags
    df = pd.DataFrame(data).astype(bool)
    return df

#Step 4: Generate Frequent Itemsets
def generate_frequent_itemsets(df, min_support=0.6):
    frequent_itemsets = apriori(df, min_support=min_support, use_colnames=True)
    return frequent_itemsets

#Step 5: Generate Association Rules
def generate_association_rules(frequent_itemsets, metric="lift", min_threshold=1.0):
    rules = association_rules(frequent_itemsets, metric=metric, min_threshold=min_threshold)
    return rules

#Step 6: Example Usage
if __name__ == "__main__":
    # Load dataset
    df = load_dataset()

    # Generate frequent itemsets with a minimum support of 0.6
    frequent_itemsets = generate_frequent_itemsets(df, min_support=0.6)

    print("Frequent Itemsets:")
    print(frequent_itemsets)

    # Generate association rules with a minimum lift of 1.0
    rules = generate_association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)

    print("\nAssociation Rules:")
    print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
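Support, confidence and lift can also be used to post-filter the mined rules. The short sketch below (not part of the original program, assuming the rules DataFrame produced in Step 6; the 0.8 and 0.6 thresholds are illustrative choices) keeps only high-confidence, high-support rules and ranks them by lift.

#Optional: Filter and rank the mined rules (a sketch; assumes 'rules' from Step 6)
strong_rules = rules[(rules['confidence'] >= 0.8) & (rules['support'] >= 0.6)]
strong_rules = strong_rules.sort_values('lift', ascending=False)
print(strong_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])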

Program No. : 6

Implementation of Linear Regression and Logistic Regression.

#Step 1: Import Necessary Libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import (mean_squared_error, r2_score, accuracy_score,
                             confusion_matrix, classification_report)
import matplotlib.pyplot as plt
import seaborn as sns

#Step 2: Linear Regression Implementation

#1. Generate or Load Dataset
def generate_linear_data(n_samples=100, noise=10):
    np.random.seed(42)
    X = 2 * np.random.rand(n_samples, 1)
    y = 4 + 3 * X + np.random.randn(n_samples, 1) * noise
    return X, y

X, y = generate_linear_data()

#2. Split the Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

#3. Train the Linear Regression Model
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)

#4. Predict and Evaluate the Model
y_pred = linear_model.predict(X_test)

# Calculate Mean Squared Error and R2 Score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)

print(f"Mean Squared Error: {mse}")
print(f"R2 Score: {r2}")

#5. Visualize the Results
plt.figure(figsize=(10, 6))
plt.scatter(X_test, y_test, color='blue', label='Actual')
plt.plot(X_test, y_pred, color='red', linewidth=2, label='Predicted')
plt.title('Linear Regression')
plt.xlabel('X')
plt.ylabel('y')
plt.legend()
plt.show()

#Step 3: Logistic Regression Implementation

#1. Generate or Load Dataset
from sklearn.datasets import make_classification

def generate_logistic_data(n_samples=100, n_features=2, n_classes=2):
    # n_informative + n_redundant must not exceed n_features, hence n_redundant=0
    X, y = make_classification(n_samples=n_samples, n_features=n_features, n_classes=n_classes,
                               n_informative=2, n_redundant=0, random_state=42)
    return X, y

X, y = generate_logistic_data()

#2. Split the Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

#3. Train the Logistic Regression Model
logistic_model = LogisticRegression()
logistic_model.fit(X_train, y_train)

#4. Predict and Evaluate the Model
y_pred = logistic_model.predict(X_test)

# Calculate Accuracy and Confusion Matrix
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)
cr = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(cm)
print("Classification Report:")
print(cr)

#5. Visualize the Decision Boundary
def plot_decision_boundary(X, y, model):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(figsize=(10, 6))
    plt.contourf(xx, yy, Z, alpha=0.4, cmap='viridis')
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap='viridis')
    plt.title("Logistic Regression Decision Boundary")
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()

plot_decision_boundary(X_test, y_test, logistic_model)
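The fitted parameters of both models can be inspected directly through their coef_ and intercept_ attributes; a short sketch (not part of the original program, assuming the linear_model and logistic_model trained above):

#Optional: Inspect the learned parameters (a sketch)
print("Linear regression intercept:", linear_model.intercept_)
print("Linear regression slope:", linear_model.coef_)          # close to the true 4 and 3 only when the noise level is small
print("Logistic regression intercept:", logistic_model.intercept_)
print("Logistic regression weights:", logistic_model.coef_)    # one weight per feature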

Program No. : 7

Implementation of Naive Bayesian Classifier.

#Step 1: Import Necessary Libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

#Step 2: Generate or Load Dataset
from sklearn.datasets import make_classification

def generate_naive_bayes_data(n_samples=100, n_features=2, n_classes=2):
    # n_informative + n_redundant must not exceed n_features, hence n_redundant=0
    X, y = make_classification(n_samples=n_samples, n_features=n_features, n_classes=n_classes,
                               n_informative=2, n_redundant=0, random_state=42)
    return X, y

X, y = generate_naive_bayes_data()

#Step 3: Split the Data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

#Step 4: Train the Naive Bayesian Classifier
nb_classifier = GaussianNB()
nb_classifier.fit(X_train, y_train)

#Step 5: Predict and Evaluate the Model
y_pred = nb_classifier.predict(X_test)

# Calculate Accuracy
accuracy = accuracy_score(y_test, y_pred)

# Generate Confusion Matrix and Classification Report
cm = confusion_matrix(y_test, y_pred)
cr = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print("Confusion Matrix:")
print(cm)
print("Classification Report:")
print(cr)

#Step 6: Visualize the Decision Boundary
def plot_decision_boundary(X, y, model):
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, 0.01),
                         np.arange(y_min, y_max, 0.01))
    Z = model.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)

    plt.figure(figsize=(10, 6))
    plt.contourf(xx, yy, Z, alpha=0.4, cmap='viridis')
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap='viridis')
    plt.title("Naive Bayesian Classifier Decision Boundary")
    plt.xlabel('Feature 1')
    plt.ylabel('Feature 2')
    plt.show()

plot_decision_boundary(X_test, y_test, nb_classifier)
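Because Naive Bayes is a probabilistic classifier, GaussianNB also exposes posterior class probabilities through predict_proba, which is often more informative than the hard labels. A short sketch (not part of the original program, assuming nb_classifier and X_test from the steps above):

#Optional: Posterior class probabilities from the trained classifier (a sketch)
probabilities = nb_classifier.predict_proba(X_test[:5])    # one row per sample, one column per class
for sample_probs in probabilities:
    print(dict(zip(nb_classifier.classes_, sample_probs)))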

Program No. : 8

Implementation of Decision Trees.

#Step 1: Import Necessary Libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor, plot_tree
from sklearn.metrics import (accuracy_score, confusion_matrix, classification_report,
                             mean_squared_error, r2_score)
import matplotlib.pyplot as plt
import seaborn as sns

#Step 2: Generate or Load Dataset

#1. Classification Dataset
from sklearn.datasets import make_classification

def generate_classification_data(n_samples=100, n_features=4, n_classes=2):
    X, y = make_classification(n_samples=n_samples, n_features=n_features, n_classes=n_classes,
                               random_state=42)
    return X, y

X_class, y_class = generate_classification_data()

#2. Regression Dataset
from sklearn.datasets import make_regression

def generate_regression_data(n_samples=100, n_features=1, noise=0.1):
    X, y = make_regression(n_samples=n_samples, n_features=n_features, noise=noise,
                           random_state=42)
    return X, y

X_reg, y_reg = generate_regression_data()

#Step 3: Split the Data
X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(
    X_class, y_class, test_size=0.3, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=42)

#Step 4: Train the Decision Tree Models

#1. Decision Tree for Classification
dt_classifier = DecisionTreeClassifier(random_state=42)
dt_classifier.fit(X_train_class, y_train_class)

#2. Decision Tree for Regression
dt_regressor = DecisionTreeRegressor(random_state=42)
dt_regressor.fit(X_train_reg, y_train_reg)

#Step 5: Predict and Evaluate the Models

#1. Classification Model Evaluation
y_pred_class = dt_classifier.predict(X_test_class)

# Accuracy and Confusion Matrix
accuracy_class = accuracy_score(y_test_class, y_pred_class)
cm_class = confusion_matrix(y_test_class, y_pred_class)
cr_class = classification_report(y_test_class, y_pred_class)

print(f"Classification Accuracy: {accuracy_class}")
print("Confusion Matrix:")
print(cm_class)
print("Classification Report:")
print(cr_class)

#2. Regression Model Evaluation
y_pred_reg = dt_regressor.predict(X_test_reg)

# Mean Squared Error and R2 Score
mse_reg = mean_squared_error(y_test_reg, y_pred_reg)
r2_reg = r2_score(y_test_reg, y_pred_reg)

print(f"Regression Mean Squared Error: {mse_reg}")
print(f"Regression R2 Score: {r2_reg}")

#Step 6: Visualize the Decision Trees

#1. Plot the Decision Tree for Classification
plt.figure(figsize=(20, 10))
plot_tree(dt_classifier, filled=True, feature_names=[f'Feature {i}' for i in range(X_class.shape[1])],
          class_names=['Class 0', 'Class 1'])
plt.title("Decision Tree - Classification")
plt.show()

#2. Plot the Decision Tree for Regression
plt.figure(figsize=(20, 10))
plot_tree(dt_regressor, filled=True, feature_names=[f'Feature {i}' for i in range(X_reg.shape[1])])
plt.title("Decision Tree - Regression")
plt.show()

#Step 7: Visualize the Predictions (for Regression)
plt.figure(figsize=(10, 6))
plt.scatter(X_test_reg, y_test_reg, color='blue', label='Actual')
plt.scatter(X_test_reg, y_pred_reg, color='red', label='Predicted')
plt.title('Decision Tree Regression')
plt.xlabel('Feature')
plt.ylabel('Target')
plt.legend()
plt.show()
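An unconstrained decision tree usually fits the training set perfectly and may overfit. The short sketch below (not part of the original program, assuming the classification split from Step 3) limits the tree depth with max_depth and compares train and test accuracy for each setting.

#Optional: Limit tree depth to control overfitting (a sketch; uses the split from Step 3)
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

for depth in [1, 2, 3, 5, None]:                     # None lets the tree grow until the leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=42)
    tree.fit(X_train_class, y_train_class)
    train_acc = accuracy_score(y_train_class, tree.predict(X_train_class))
    test_acc = accuracy_score(y_test_class, tree.predict(X_test_class))
    print(f"max_depth={depth}: train accuracy={train_acc:.2f}, test accuracy={test_acc:.2f}")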

Program No. : 9

Implementation of Random Forest.

#Step 1: Import Necessary Libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.metrics import (accuracy_score, confusion_matrix, classification_report,
                             mean_squared_error, r2_score)
import matplotlib.pyplot as plt
import seaborn as sns

#Step 2: Generate or Load Dataset

#1. Classification Dataset
from sklearn.datasets import make_classification

def generate_classification_data(n_samples=100, n_features=4, n_classes=2):
    X, y = make_classification(n_samples=n_samples, n_features=n_features, n_classes=n_classes,
                               random_state=42)
    return X, y

X_class, y_class = generate_classification_data()

#2. Regression Dataset
from sklearn.datasets import make_regression

def generate_regression_data(n_samples=100, n_features=1, noise=0.1):
    X, y = make_regression(n_samples=n_samples, n_features=n_features, noise=noise,
                           random_state=42)
    return X, y

X_reg, y_reg = generate_regression_data()

#Step 3: Split the Data
X_train_class, X_test_class, y_train_class, y_test_class = train_test_split(
    X_class, y_class, test_size=0.3, random_state=42)
X_train_reg, X_test_reg, y_train_reg, y_test_reg = train_test_split(
    X_reg, y_reg, test_size=0.3, random_state=42)

#Step 4: Train the Random Forest Models

#1. Random Forest for Classification
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
rf_classifier.fit(X_train_class, y_train_class)

#2. Random Forest for Regression
rf_regressor = RandomForestRegressor(n_estimators=100, random_state=42)
rf_regressor.fit(X_train_reg, y_train_reg)

#Step 5: Predict and Evaluate the Models

#1. Classification Model Evaluation
y_pred_class = rf_classifier.predict(X_test_class)

# Accuracy and Confusion Matrix
accuracy_class = accuracy_score(y_test_class, y_pred_class)
cm_class = confusion_matrix(y_test_class, y_pred_class)
cr_class = classification_report(y_test_class, y_pred_class)

print(f"Classification Accuracy: {accuracy_class}")
print("Confusion Matrix:")
print(cm_class)
print("Classification Report:")
print(cr_class)

#2. Regression Model Evaluation
y_pred_reg = rf_regressor.predict(X_test_reg)

# Mean Squared Error and R2 Score
mse_reg = mean_squared_error(y_test_reg, y_pred_reg)
r2_reg = r2_score(y_test_reg, y_pred_reg)

print(f"Regression Mean Squared Error: {mse_reg}")
print(f"Regression R2 Score: {r2_reg}")

#Step 6: Feature Importance

#1. Feature Importance for Classification
feature_importance_class = rf_classifier.feature_importances_

plt.figure(figsize=(10, 6))
sns.barplot(x=feature_importance_class, y=[f'Feature {i}' for i in range(X_class.shape[1])])
plt.title("Feature Importance - Classification")
plt.show()

#2. Feature Importance for Regression
feature_importance_reg = rf_regressor.feature_importances_

plt.figure(figsize=(10, 6))
sns.barplot(x=feature_importance_reg, y=[f'Feature {i}' for i in range(X_reg.shape[1])])
plt.title("Feature Importance - Regression")
plt.show()
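Because every tree in the forest is trained on a bootstrap sample, the samples left out of each bootstrap (the out-of-bag samples) give a built-in validation estimate without a separate test split. The sketch below is optional and not part of the original program; it assumes X_class and y_class from Step 2 and uses scikit-learn's oob_score option.

#Optional: Out-of-bag (OOB) accuracy as a built-in validation estimate (a sketch)
from sklearn.ensemble import RandomForestClassifier

rf_oob = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf_oob.fit(X_class, y_class)                 # fit on the full dataset; OOB samples act as validation
print(f"OOB accuracy estimate: {rf_oob.oob_score_:.2f}")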

Program No. : 10

Implementation of Principal Component Analysis.

#Step 1: Import Necessary Libraries


import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

#Step 2: Load or Generate Dataset


from sklearn.datasets import load_iris

def load_iris_data():
    iris = load_iris()
    X = iris.data
    y = iris.target
    feature_names = iris.feature_names
    return X, y, feature_names

X, y, feature_names = load_iris_data()

#Step 3: Standardize the Data


scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

#Step 4: Perform PCA


pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)

#Step 5: Analyze the Explained Variance


explained_variance = pca.explained_variance_ratio_

print(f"Explained Variance Ratio: {explained_variance}")


print(f"Total Variance Explained by the first 2 components: {np.sum(explained_variance)}")

#Step 6: Visualize the PCA Results


plt.figure(figsize=(10, 6))
sns.scatterplot(x=X_pca[:, 0], y=X_pca[:, 1], hue=y, palette='viridis', s=100)
plt.title('PCA of Iris Dataset')
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()

#Step 7: (Optional) Cumulative Explained Variance Plot
pca_full = PCA().fit(X_scaled)
cumulative_variance = np.cumsum(pca_full.explained_variance_ratio_)

plt.figure(figsize=(8, 5))
plt.plot(range(1, len(cumulative_variance) + 1), cumulative_variance, marker='o', linestyle='--')
plt.title('Cumulative Explained Variance by Number of Principal Components')
plt.xlabel('Number of Principal Components')
plt.ylabel('Cumulative Explained Variance')
plt.grid(True)
plt.show()
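Each principal component is a linear combination of the original features, so inspecting the loadings shows which measurements drive each component. A short sketch (not part of the original program, assuming pca, feature_names and the pandas import from the steps above; the PC1/PC2 labels are just illustrative row names):

#Optional: Inspect the principal-component loadings (a sketch)
loadings = pd.DataFrame(pca.components_,
                        columns=feature_names,
                        index=['PC1', 'PC2'])        # one row per retained component
print(loadings)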

Program No. : 11

Implementation of Singular Value Decomposition.

#Step 1: Import Necessary Libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from numpy.linalg import svd

#Step 2: Generate or Load a Dataset


def create_sample_matrix():
    return np.array([
        [1, 2, 3],
        [4, 5, 6],
        [7, 8, 9],
        [10, 11, 12]
    ])

matrix = create_sample_matrix()
print("Original Matrix:")
print(matrix)

#Step 3: Apply Singular Value Decomposition


U, Sigma, VT = svd(matrix)

print("U Matrix:")
print(U)

print("Sigma Values:")
print(Sigma)

print("V^T Matrix:")
print(VT)

#Step 4: Reconstruct the Original Matrix


# Sigma is returned as a 1-D array of singular values; place it on the diagonal of a matrix
# with the same shape as the original so that U (4x4) @ Sigma_matrix (4x3) @ VT (3x3) is well defined
Sigma_matrix = np.zeros(matrix.shape)
np.fill_diagonal(Sigma_matrix, Sigma)
reconstructed_matrix = np.dot(U, np.dot(Sigma_matrix, VT))

print("Reconstructed Matrix:")
print(reconstructed_matrix)

#Step 5: Truncated SVD for Dimensionality Reduction


k = 2  # Number of singular values to keep
U_k = U[:, :k]
Sigma_k = np.diag(Sigma[:k])
VT_k = VT[:k, :]

reduced_matrix = np.dot(U_k, np.dot(Sigma_k, VT_k))

print(f"Reduced Matrix using {k} singular values:")
print(reduced_matrix)

#Step 6: Visualize the Singular Values


plt.figure(figsize=(8, 5))
plt.plot(Sigma, marker='o')
plt.title('Singular Values')
plt.xlabel('Index')
plt.ylabel('Singular Value')
plt.grid(True)
plt.show()
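The quality of the rank-k approximation can be quantified with the Frobenius norm of the difference between the original and reduced matrices. A short sketch (not part of the original program, assuming matrix and reduced_matrix from the steps above); since this particular sample matrix has rank 2, the k=2 reconstruction should match it up to floating-point error.

#Optional: Measure the reconstruction error of the truncated SVD (a sketch)
error = np.linalg.norm(matrix - reduced_matrix)      # Frobenius norm of the residual
relative_error = error / np.linalg.norm(matrix)
print(f"Frobenius reconstruction error: {error:.6f} (relative: {relative_error:.6%})")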

By: Ms. Shailja Chaurasiya (Asst. Prof., Deptt. of CSE)
