
Ml record_merged (1)


ROCHIS VALLEY, MANIKBANDAR

NIZAMABAD - 503 003

CERTIFICATE

NAME OF THE LABORATORY________________________________

ACADEMIC YEAR:20___-20___

Certified that this is the bonafide record of work done in the _______________
____________ laboratory by Mr/Ms _______________________________
__________________________ of ____ Year B.Tech, ___ SEM ____ Branch
with HALLTICKET NO: _____________ during Academic Year
20___-20___, who has performed _____ no. of experiments out of _____ no.
of experiments under my supervision.

LECTURER INCHARGE HEAD OF THE DEPARTMENT


WITH SEAL

DATE EXTERNAL EXAMINER


VIJAY RURAL ENGINEERING COLLEGE
MACHINE LEARNING LAB INDEX
Columns: S.NO | NAME OF THE EXPERIMENT | DATE OF EXPERIMENT PERFORMED | DATE OF SUBMISSION | PAGE NO. | LECTURER SIGN | REMARK

1. Write a Python program to compute Central Tendency Measures: Mean, Median, Mode; Measure of Dispersion: Variance, Standard Deviation
2. Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy
3. Study of Python Libraries for ML application such as Pandas and Matplotlib
4. Write a Python program to implement Simple Linear Regression
5. Implementation of Multiple Linear Regression for House Price Prediction using sklearn
6. Implementation of Decision tree using sklearn and its parameter tuning
7. Implementation of KNN using sklearn
8. Implementation of Logistic Regression using sklearn
9. Implementation of K-Means Clustering
10. Performance analysis of Classification Algorithms on a specific dataset (Mini Project)

MACHINE LEARNING
LAB MANUAL
III - II
1) Write a Python program to compute Central Tendency Measures: Mean, Median, Mode; Measure of Dispersion: Variance, Standard Deviation

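Program:

import statistics


def ctm(data):
    """Print central tendency and dispersion measures for a list of numbers."""
    if not data:
        print("no data found")
        return

    mean = statistics.mean(data)
    median = statistics.median(data)

    # Since Python 3.8, mode() returns the first of several equally common
    # values; older versions raised StatisticsError in that case.
    try:
        mode = statistics.mode(data)
    except statistics.StatisticsError:
        mode = "No unique mode found"

    # variance() and stdev() are the sample versions (they divide by n - 1)
    # and need at least two data points.
    variance = statistics.variance(data)
    sd = statistics.stdev(data)

    print(f"mean: {mean}")
    print(f"median: {median}")
    print(f"mode: {mode}")
    print(f"variance: {variance}")
    print(f"standard deviation: {sd}")


if __name__ == "__main__":
    data = [10, 20, 30, 40, 40, 50, 60, 70, 80, 90]
    ctm(data)

Output:

mean: 49
median: 45.0
mode: 40
variance: 676.6666666666666
standard deviation: 26.01281735350223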
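
The library results above can be cross-checked by applying the definitions directly. The following is a minimal sketch, assuming the sample (n - 1) definition used by statistics.variance() and the population (n) definition used by statistics.pvariance(); the variable names are illustrative and not part of the original program.

```python
import math
import statistics

data = [10, 20, 30, 40, 40, 50, 60, 70, 80, 90]
n = len(data)

# Mean: sum of the values divided by the number of values
mean = sum(data) / n

# Squared deviation of every value from the mean
squared_deviations = [(x - mean) ** 2 for x in data]

# Sample variance divides by n - 1, population variance divides by n
sample_variance = sum(squared_deviations) / (n - 1)
population_variance = sum(squared_deviations) / n

# Standard deviation is the square root of the variance
sample_sd = math.sqrt(sample_variance)

print("sample variance:", sample_variance, "library:", statistics.variance(data))
print("population variance:", population_variance, "library:", statistics.pvariance(data))
print("sample standard deviation:", sample_sd, "library:", statistics.stdev(data))
```

The sample and population values differ only in the divisor, so the gap between them shrinks as the number of data points grows.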
2) Study of Python Basic Libraries such as Statistics, Math, Numpy and Scipy

1. Statistics Library

The statistics module in Python provides functions for calculating mathematical statistics of
numeric data.

Key Features:

• Central Tendency Measures:
  o mean(data): Arithmetic mean.
  o median(data): Median value.
  o mode(data): Most common value.
• Spread Measures:
  o variance(data): Variance of the data.
  o stdev(data): Standard deviation.
• Additional Functions:
  o median_low(data): Low median of data.
  o median_high(data): High median of data.
  o harmonic_mean(data): Harmonic mean of data.

Example:

import statistics as stats

data = [1, 2, 2, 3, 4]
print("Mean:", stats.mean(data))
print("Median:", stats.median(data))
print("Mode:", stats.mode(data))
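
The additional functions listed under Key Features can be tried in the same way. This is a small sketch with made-up data: an even-length list is chosen so that the low and high medians differ, and the two speeds passed to harmonic_mean() are illustrative values.

```python
import statistics as stats

# Even number of values, so the two middle values (2 and 3) differ
data = [1, 2, 3, 4]

median = stats.median(data)            # average of the two middle values
low_median = stats.median_low(data)    # smaller of the two middle values
high_median = stats.median_high(data)  # larger of the two middle values

# Harmonic mean suits rates, e.g. the average speed over two equal
# distances driven at 40 and 60 (illustrative numbers)
average_speed = stats.harmonic_mean([40, 60])

print("Median:", median)
print("Low median:", low_median)
print("High median:", high_median)
print("Harmonic mean:", average_speed)
```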

2. Math Library

The math module provides access to mathematical functions defined by the C standard.

Key Features:

• Basic Math Operations:
  o sqrt(x): Square root.
  o pow(x, y): x raised to the power y.
  o factorial(x): Factorial of x.
• Trigonometric Functions:
  o sin(x), cos(x), tan(x): Trigonometric functions.
• Logarithmic Functions:
  o log(x, base): Logarithm of x to the specified base.
  o log10(x): Base-10 logarithm.
• Constants:
  o pi: Mathematical constant π.
  o e: Euler's number.

Example:
import math

print("Square root of 16:", math.sqrt(16))
print("Value of Pi:", math.pi)
print("Sine of 90 degrees:", math.sin(math.radians(90)))
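
The remaining functions and constants from the list can be exercised the same way. This is a short sketch; the input values are arbitrary and chosen only so the results are easy to verify by hand.

```python
import math

power = math.pow(2, 10)          # 2 raised to the power 10, returned as a float
factorial = math.factorial(5)    # 5 * 4 * 3 * 2 * 1
log_base_2 = math.log(8, 2)      # logarithm of 8 to base 2
log_base_10 = math.log10(1000)   # base-10 logarithm of 1000
natural_log = math.log(math.e)   # log() without a base is the natural logarithm
cosine = math.cos(math.pi)       # trigonometric functions take radians

print("2 to the power 10:", power)
print("5 factorial:", factorial)
print("log base 2 of 8:", log_base_2)
print("log base 10 of 1000:", log_base_10)
print("natural log of e:", natural_log)
print("cos(pi):", cosine)
```

Floating-point results such as math.log(8, 2) may differ from the exact value in the last decimal places, so comparisons are better made with math.isclose() than with ==.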

3. NumPy

NumPy is a powerful library for numerical computations.

Key Features:

• Arrays:
  o numpy.array(): Create arrays.
  o numpy.zeros(shape): Create an array of zeros.
  o numpy.ones(shape): Create an array of ones.
• Mathematical Operations:
  o Element-wise operations on arrays (+, -, *, /).
  o Linear algebra functions (dot, cross, linalg).
• Statistical Functions:
  o numpy.mean(): Mean of array elements.
  o numpy.std(): Standard deviation.
  o numpy.median(): Median value.
• Indexing and Slicing:
  o Access specific elements or subarrays.

Example:

import numpy as np

arr = np.array([1, 2, 3, 4])

print("Array:", arr)
print("Mean:", np.mean(arr))
print("Standard Deviation:", np.std(arr))
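
The array creation, element-wise, linear algebra and slicing features listed above can be sketched as follows. The arrays are illustrative. One detail worth noting: numpy.std() computes the population standard deviation by default (ddof=0), while statistics.stdev() computes the sample version, so the two give different numbers for the same data unless ddof=1 is passed.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
ones = np.ones(4)          # array of four ones
grid = np.zeros((2, 3))    # 2 x 3 array of zeros

# Element-wise arithmetic: every element is doubled, then 1 is added
elementwise = a * 2 + ones

# Dot product: 1*1 + 2*2 + 3*3 + 4*4
dot_product = np.dot(a, a)

# Slicing: elements at index 1 and 2
middle = a[1:3]

# Population (ddof=0, the default) versus sample (ddof=1) standard deviation
population_sd = np.std(a)
sample_sd = np.std(a, ddof=1)

print("Element-wise result:", elementwise)
print("Dot product:", dot_product)
print("Slice:", middle)
print("Zeros shape:", grid.shape)
print("Population SD:", population_sd, "Sample SD:", sample_sd)
```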

4. SciPy

SciPy builds on NumPy and provides additional functionality for scientific computing.

Key Features:

• Optimization:
  o scipy.optimize.minimize(): Minimize a scalar function.
• Integration:
  o scipy.integrate.quad(): Numerical integration.
• Linear Algebra:
  o scipy.linalg.solve(): Solve linear systems.
• Statistics:
  o scipy.stats: Statistical distributions and functions.
• Signal Processing:
  o scipy.signal: Signal processing utilities.
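
Example:

import numpy as np
from scipy import stats

data = [1, 2, 2, 3, 4]
# Newer SciPy releases return a scalar mode for 1-D input, older ones a
# length-1 array; np.atleast_1d() handles both cases.
print("Mode:", np.atleast_1d(stats.mode(data).mode)[0])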
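
The optimization, integration and linear algebra entries in the feature list can be illustrated with problems whose answers are known in advance. This is a minimal sketch; the function, integral and equations are made-up examples, not part of the original record.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Optimization: f(x) = (x - 3)^2 has its minimum at x = 3
result = optimize.minimize(lambda x: (x[0] - 3) ** 2, x0=[0.0])
minimum_x = result.x[0]

# Integration: the integral of x^2 from 0 to 1 is exactly 1/3
area, error_estimate = integrate.quad(lambda x: x ** 2, 0, 1)

# Linear algebra: solve 2x + y = 5 and x + 3y = 10 (solution x = 1, y = 3)
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = linalg.solve(A, b)

print("Minimum found at x =", minimum_x)
print("Integral of x^2 on [0, 1]:", area)
print("Solution of the linear system:", solution)
```

The optimizer and the integrator are numerical methods, so their results are approximations that agree with the exact answers only up to a small tolerance.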

Applications:

1. Statistics: Data analysis and summarization.
2. Math: Solving equations, trigonometry, and logarithmic calculations.
3. NumPy: High-performance array manipulation.
4. SciPy: Advanced computation in engineering, machine learning, and scientific domains.
3) Study of Python Libraries for ML application such as Pandas and Matplotlib

Python libraries like Pandas and Matplotlib are essential for Machine Learning (ML)
applications, as they help with data manipulation, analysis, and visualization. Here’s a detailed
overview:

1. Pandas

Pandas is a powerful library for data manipulation and analysis. It provides data structures like
Series and DataFrame, which are widely used in ML for preprocessing and exploration.

Key Features:

• Data Structures:
  o Series: One-dimensional labeled array.
  o DataFrame: Two-dimensional labeled data structure (like a table).
• Data Manipulation:
  o read_csv(), read_excel(): Load datasets from files.
  o to_csv(), to_excel(): Save datasets to files.
  o Filtering and indexing using .loc[] and .iloc[].
• Data Cleaning:
  o Handling missing values: dropna(), fillna().
  o Duplicates: drop_duplicates().
• Data Analysis:
  o Aggregation: groupby(), pivot_table().
  o Statistical methods: .mean(), .std(), .describe().
• Integration with ML Libraries:
  o Easily convert DataFrames to NumPy arrays or directly use them in ML libraries like Scikit-learn.

Example:

import pandas as pd

# Load data
data = pd.read_csv('sample.csv')

# Display first few rows
print(data.head())

# Basic statistics
print(data.describe())

# Handling missing values
data.fillna(0, inplace=True)
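
The cleaning, indexing and aggregation features listed above can also be tried without a CSV file. The sketch below builds a small DataFrame in memory; the student names, branches and marks are invented purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical records: one missing mark and one exact duplicate row
df = pd.DataFrame({
    "name": ["Asha", "Ravi", "Meena", "Kiran", "Asha"],
    "branch": ["CSE", "ECE", "CSE", "ECE", "CSE"],
    "marks": [82.0, np.nan, 91.0, 67.0, 82.0],
})

# Data cleaning: drop the duplicate row, then fill the missing mark
# with the mean of the marks that are present
deduped = df.drop_duplicates()
filled = deduped.fillna({"marks": deduped["marks"].mean()})

# Label-based filtering with .loc: rows with marks above 80, two columns
high_scorers = filled.loc[filled["marks"] > 80, ["name", "marks"]]

# Position-based selection with .iloc: the first two rows
first_two = filled.iloc[:2]

# Aggregation: mean marks per branch
branch_mean = filled.groupby("branch")["marks"].mean()

print(filled)
print(high_scorers)
print(first_two)
print(branch_mean)
```

Filling with the column mean is only one of several options; fillna(0) as in the example above, or dropna() to discard incomplete rows, may suit other datasets better.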
2. Matplotlib

Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations.
It is often used to visualize data trends and patterns in ML.

Key Features:

• Basic Plotting:
  o plot(): Line plots.
  o scatter(): Scatter plots.
  o bar(): Bar charts.
• Customizations:
  o Title, labels, and legends: title(), xlabel(), ylabel(), legend().
  o Colors, markers, and styles.
• Subplots:
  o Multiple plots in one figure: subplot().
• Integration:
  o Works seamlessly with Pandas: Direct plotting from DataFrames.

Example:

import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# Line plot
plt.plot(x, y, label='Line Plot')
plt.title('Basic Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
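
The line, scatter and bar plots from the feature list can be placed side by side in one figure. This sketch uses plt.subplots(), a convenience wrapper that creates the figure and all axes in one call. Two assumptions are made so that it runs without a display: the non-interactive "Agg" backend is selected, and the figure is saved to a hypothetical file name instead of being shown.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: no display available; remove to plot interactively
import matplotlib.pyplot as plt

# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]

# One row, three columns of plots in a single figure
fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].plot(x, y, marker="o", linestyle="--")   # line plot with markers
axes[0].set_title("Line plot")

axes[1].scatter(x, y, color="red")               # scatter plot
axes[1].set_title("Scatter plot")

axes[2].bar(x, y, color="green")                 # bar chart
axes[2].set_title("Bar chart")

# Shared axis labels for all three plots
for ax in axes:
    ax.set_xlabel("X-axis")
    ax.set_ylabel("Y-axis")

fig.tight_layout()
fig.savefig("subplots_demo.png")  # hypothetical output file name
```

In an interactive session, dropping the matplotlib.use("Agg") line and calling plt.show() instead of savefig() displays the same figure in a window.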

Combined Use of Pandas and Matplotlib in ML:

The combination of Pandas and Matplotlib is particularly useful in the exploratory data analysis
(EDA) phase of ML, where you examine your dataset to identify trends, correlations, and
anomalies.

Example: Data Analysis and Visualization

import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
data = pd.read_csv('sample.csv')

# Check missing values
print(data.isnull().sum())

# Visualize a specific column
data['Age'].hist(bins=10)
plt.title('Age Distribution')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

# Scatter plot
plt.scatter(data['Height'], data['Weight'], color='blue', alpha=0.5)
plt.title('Height vs. Weight')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.show()

Applications in ML:

1. Pandas:
  o Preprocessing: Data cleaning, normalization, and transformation.
  o Feature engineering: Creating new features from existing ones.
  o Handling time-series data.
2. Matplotlib:
  o Visualizing data distributions, trends, and outliers.
  o Understanding relationships between features.
  o Plotting model performance (e.g., training and validation loss curves).
4) Write a Python program to implement Simple Linear Regression

PROGRAM

Example: predicting the price of a pizza from its size

import statistics

# Collect the pizza sizes (x)
data_items_collected = int(input("enter the number of data points: "))
pizza_sizes = []
for i in range(data_items_collected):
    size_of_pizza = int(input("enter the pizza size: "))
    pizza_sizes.append(size_of_pizza)

print("list of pizza sizes collected are:")
for size in pizza_sizes:
    print(size)

# Collect the pizza prices (y)
prices = []
for i in range(data_items_collected):
    price_of_pizza = int(input("enter the pizza price: "))
    prices.append(price_of_pizza)

print("list of pizza prices collected are:")
for price in prices:
    print(price)

# Mean of sizes (x)
x_mean = statistics.mean(pizza_sizes)
print(f"mean of sizes: {x_mean}")

# Mean of prices (y)
y_mean = statistics.mean(prices)
print(f"mean of prices: {y_mean}")

# Deviation of each size from the mean size
deviation_sizes = []
for i in range(data_items_collected):
    dev_x = pizza_sizes[i] - x_mean
    deviation_sizes.append(dev_x)

print("deviation of x:")
for dev_s in deviation_sizes:
    print(dev_s)

# Deviation of each price from the mean price
deviation_prices = []
for i in range(data_items_collected):
    dev_y = prices[i] - y_mean
    deviation_prices.append(dev_y)

print("deviation of y:")
for dev_p in deviation_prices:
    print(dev_p)

# Product of the deviations
product_of_deviation = []
for i in range(data_items_collected):
    p_o_d = deviation_sizes[i] * deviation_prices[i]
    product_of_deviation.append(p_o_d)

print("product of deviations:")
for pod in product_of_deviation:
    print(pod)

# Sum of the products
sum_of_product_of_deviation = sum(product_of_deviation)
print(f"sum of the products: {sum_of_product_of_deviation}")

# Square of the deviation of sizes
square_of_deviation_of_sizes = []
for i in range(data_items_collected):
    sod = deviation_sizes[i] ** 2
    square_of_deviation_of_sizes.append(sod)

print("square of deviation of sizes:")
for sodos in square_of_deviation_of_sizes:
    print(sodos)

# Sum of the squared deviations of sizes
sum_of_square_of_deviation_of_sizes = sum(square_of_deviation_of_sizes)
print(f"sum of the squares: {sum_of_square_of_deviation_of_sizes}")

# Slope: m = (sum of products of deviations) / (sum of squared deviations of x)
slope_m = sum_of_product_of_deviation / sum_of_square_of_deviation_of_sizes
print(f"m: {slope_m}")

# Intercept: c = (mean of y) - m * (mean of x)
intercept_c = y_mean - slope_m * x_mean
print(f"c: {intercept_c}")

# Making a prediction with y = m * x + c
new_size = int(input("enter the size of pizza you created: "))
price_prediction = slope_m * new_size + intercept_c
print(f"predicted price of a size {new_size} pizza: {price_prediction}")

OUTPUT

enter the number of data points: 3
enter the pizza size: 8
enter the pizza size: 10
enter the pizza size: 12
list of pizza sizes collected are:
8
10
12
enter the pizza price: 10
enter the pizza price: 13
enter the pizza price: 16
list of pizza prices collected are:
10
13
16
mean of sizes: 10
mean of prices: 13
deviation of x:
-2
0
2
deviation of y:
-3
0
3
product of deviations:
6
0
6
sum of the products: 12
square of deviation of sizes:
4
0
4
sum of the squares: 8
m: 1.5
c: -2.0
enter the size of pizza you created: 20
predicted price of a size 20 pizza: 28.0
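
The same slope and intercept can be obtained in a few lines with NumPy, which gives a quick way to check the step-by-step program. This is a sketch using the three data points from the sample run; np.polyfit() with deg=1 fits a straight line by least squares and is used here only as an independent cross-check.

```python
import numpy as np

sizes = np.array([8, 10, 12])
prices = np.array([10, 13, 16])

# Deviations from the means, computed for all points at once
x_dev = sizes - sizes.mean()
y_dev = prices - prices.mean()

# Slope m = sum(x_dev * y_dev) / sum(x_dev^2); intercept c = mean(y) - m * mean(x)
m = (x_dev * y_dev).sum() / (x_dev ** 2).sum()
c = prices.mean() - m * sizes.mean()

# Independent check: least-squares fit of a degree-1 polynomial
fit_slope, fit_intercept = np.polyfit(sizes, prices, deg=1)

print("manual slope and intercept:", m, c)
print("polyfit slope and intercept:", fit_slope, fit_intercept)
print("predicted price for size 20:", m * 20 + c)
```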


5. Implementation of Multiple Linear Regression for House Price Prediction using sklearn

Program:

import pandas as pd
from sklearn.linear_model import LinearRegression

# Sample Data
data = {
    'square_feet': [1000, 1500, 2000, 2500, 3000],
    'bedrooms': [1, 2, 3, 4, 5],
    'bathrooms': [1, 2, 2, 2.5, 3],
    'price': [5000, 8000, 12000, 18000, 25000]
}
df = pd.DataFrame(data)
print(df)

# Features and Target Variable
feature_names = ['square_feet', 'bedrooms', 'bathrooms']
X = df[feature_names]
y = df['price']

# Train the Model
model = LinearRegression()
model.fit(X, y)

# User Input
sq_feet = float(input("Enter the area of the house in square feet: "))
bedrooms = float(input("Enter the number of bedrooms: "))
bathrooms = float(input("Enter the number of bathrooms: "))

# Predict the price; a DataFrame with the training column names avoids
# sklearn's feature-name warning
new_house = pd.DataFrame([[sq_feet, bedrooms, bathrooms]], columns=feature_names)
predicted_price = model.predict(new_house)[0]

# Print Prediction
print(f"Predicted price for {sq_feet} square feet, {bedrooms} bedrooms, and "
      f"{bathrooms} bathrooms is: ₹{predicted_price:,.2f}")

Output:
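
Beyond a single prediction, the fitted model can be inspected through its intercept, its coefficients and its fit to the training data. The sketch below reuses the sample data from the program; r2_score() and mean_absolute_error() are standard sklearn metrics that the original program does not use.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score

# Same sample data as in the program above
df = pd.DataFrame({
    'square_feet': [1000, 1500, 2000, 2500, 3000],
    'bedrooms': [1, 2, 3, 4, 5],
    'bathrooms': [1, 2, 2, 2.5, 3],
    'price': [5000, 8000, 12000, 18000, 25000],
})
X = df[['square_feet', 'bedrooms', 'bathrooms']]
y = df['price']

model = LinearRegression().fit(X, y)
predictions = model.predict(X)

# price = intercept + coef_1 * square_feet + coef_2 * bedrooms + coef_3 * bathrooms
print("Intercept:", model.intercept_)
for name, coefficient in zip(X.columns, model.coef_):
    print(f"Coefficient for {name}: {coefficient}")

# Both metrics are computed on the training data itself
r2 = r2_score(y, predictions)
mae = mean_absolute_error(y, predictions)
print("R^2 on training data:", r2)
print("Mean absolute error on training data:", mae)
```

With only five rows, these scores describe how well the line fits the training data, not how well it would predict new houses. In this sample, square_feet is an exact linear function of bedrooms (500 × bedrooms + 500), so the individual coefficients for those two features should not be interpreted separately.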
6. Implementation of Decision tree using sklearn and its parameter tuning
Program:

from sklearn.tree import DecisionTreeClassifier

# Course groups suggested by the predicted class
b_tech = ["cse", "aiml", "ece", "eee", "mech", "civil"]
bse = ["mpcs", "mecs", "ba", "ca", "bba"]

# Training data: [got a rank (1 = yes, 0 = no), score]
X = [
    [1, 450],
    [1, 800],
    [0, 0],
    [1, 250],
    [1, 600],
]

# Class labels: 0 = second half of b_tech, 1 = first half of b_tech,
# 2 = bse courses, 3 = not eligible
y = [0, 1, 2, 3, 1]

clf = DecisionTreeClassifier()
clf.fit(X, y)

emcet_rank = int(input("did you get a rank (yes(1)/no(0)): "))

if emcet_rank == 1 or emcet_rank == 0:
    score_1 = int(input("enter the score you got in emcet: "))
    prediction = clf.predict([[emcet_rank, score_1]])[0]

    if prediction == 0:
        print("You can apply for these courses:", b_tech[len(b_tech)//2:])
    elif prediction == 1:
        print("You can apply for these courses:", b_tech[:len(b_tech)//2])
    elif prediction == 2:
        print("You can apply for these courses:", bse)
    elif prediction == 3:
        print("You are not eligible for application")
    else:
        print("Invalid decision")
else:
    print("Invalid input")

Output:
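
The experiment title also mentions parameter tuning, which the program above does not perform. One common approach is a grid search with cross-validation. The sketch below is an assumption about how this could be done, not the original author's method: it uses the Iris dataset rather than the five-row table above, because cross-validation needs several samples per class, and the parameter values in the grid are illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Candidate values for three common decision tree parameters
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [2, 3, 4, None],
    "min_samples_split": [2, 5, 10],
}

# Try every combination with 5-fold cross-validation on the training set
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid,
    cv=5,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# The best combination is refitted on the whole training set automatically
test_accuracy = search.score(X_test, y_test)

print("Best parameters:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)
print("Accuracy on the test set:", test_accuracy)
```

Limiting max_depth and raising min_samples_split both make the tree simpler, which tends to reduce overfitting at the cost of a less exact fit to the training data.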
7. Implementation of KNN using sklearn

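Program:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

marks = [95, 75, 69, 51, 39, 21, 0]
grade = ["a1", "a", "b", "c", "d", "e", "f"]

testing_score = int(input("enter your score: "))

# sklearn expects a 2-D feature array: one row per sample
marks_array = np.array(marks).reshape(-1, 1)

k = 3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(marks_array, grade)

# Distances to, and training-set indices of, the k nearest marks (nearest first)
distances, indices = knn.kneighbors([[testing_score]])
print(f"Distances: {distances[0]}")
print(f"Indices of the nearest neighbors: {indices[0]}")

# Every grade occurs only once in the training data, so a vote among the
# 3 neighbours would always be tied; the grade of the single nearest
# neighbour is reported instead.
predicted_grade = [grade[i] for i in indices[0]]
print(f"The grade for {testing_score} marks is: {predicted_grade[0]}")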

Output:
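
When each class has several training samples, the majority vote among the k neighbours becomes meaningful and knn.predict() can be used directly. The sketch below uses invented marks with three samples per grade; the data is an assumption made only to demonstrate the vote.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical training data: three samples for each grade
marks = [[95], [91], [88], [75], [72], [70], [55], [52], [50]]
grades = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(marks, grades)

# predict() returns the majority grade among the 3 nearest neighbours
grade_for_73 = knn.predict([[73]])[0]
grade_for_90 = knn.predict([[90]])[0]

# predict_proba() returns the share of neighbours voting for each class,
# in the order given by knn.classes_
vote_shares = knn.predict_proba([[73]])[0]

print("Grade for 73 marks:", grade_for_73)
print("Grade for 90 marks:", grade_for_90)
print("Classes:", list(knn.classes_))
print("Vote shares for 73 marks:", list(vote_shares))
```

For a score of 73 the three nearest training marks are 72, 75 and 70, all labelled "b", so the vote is unanimous.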
8. Implementation of Logistic Regression using sklearn
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Study hours and the result (1 = pass, 0 = fail)
data = {
    'in_study_hours': [2, 4, 6, 8, 10, 12],
    'y_n': [0, 0, 1, 1, 1, 1]
}

df = pd.DataFrame(data)
print(df)

x = df[['in_study_hours']]
y = df['y_n']

model = LogisticRegression()
model.fit(x, y)

study_hours = float(input("enter the no of study hours: "))

# Use the same column name as in training to avoid sklearn's feature-name warning
new_x = pd.DataFrame([[study_hours]], columns=['in_study_hours'])
predict = model.predict(new_x)

if predict[0] == 1:
    print("pass")
else:
    print("fail")
Output:
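
Logistic regression predicts a class by first computing a probability with the sigmoid function, p = 1 / (1 + e^-(w·x + b)), and then applying a 0.5 threshold. The sketch below reuses the study-hours data from the program and checks that the probability computed by hand from the fitted weight and bias matches predict_proba(); the test value of 5 hours is an arbitrary choice.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Same data as in the program above
hours = np.array([[2], [4], [6], [8], [10], [12]])
passed = np.array([0, 0, 1, 1, 1, 1])

model = LogisticRegression().fit(hours, passed)

# Learned weight (w) and bias (b) of the linear part w * x + b
w = model.coef_[0][0]
b = model.intercept_[0]

new_hours = 5.0

# Sigmoid of the linear part gives the probability of class 1 (pass)
manual_probability = 1 / (1 + np.exp(-(w * new_hours + b)))

# The second column of predict_proba() is the probability of class 1
sklearn_probability = model.predict_proba([[new_hours]])[0][1]

print("weight:", w, "bias:", b)
print("probability of passing (manual):", manual_probability)
print("probability of passing (sklearn):", sklearn_probability)
print("predicted class:", int(manual_probability >= 0.5))
```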
9. Implementation of K-Means Clustering
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

data = {
    'sr_no': ["cr1", "cr2", "cr3", "cr4", "cr5", "cr6", "cr7"],
    'age': [20, 40, 30, 18, 28, 35, 45],
    'amount': [500, 1000, 800, 300, 1200, 1400, 1800]
}
df = pd.DataFrame(data)
print(df)

x = df[['age', 'amount']].values

# n_init and random_state are fixed so that repeated runs give the same clusters
nc = KMeans(n_clusters=3, n_init=10, random_state=42)
nc.fit(x)

# Assign a new point to the nearest cluster centre
test_data = np.array([[13, 750]])
predict = nc.predict(test_data)

print(f"the closest cluster is: {predict[0]}")

Output:
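
After fitting, the clustering itself can be inspected through the label of each training point, the cluster centres and the inertia (the sum of squared distances to the nearest centre). The sketch below also standardizes the features first, which is an assumption that goes beyond the original program: without scaling, 'amount' dominates the distance because its values are far larger than 'age'.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Same [age, amount] data as in the program above
x = np.array([
    [20, 500], [40, 1000], [30, 800], [18, 300],
    [28, 1200], [35, 1400], [45, 1800],
])

# Put both features on a comparable scale (mean 0, standard deviation 1)
scaler = StandardScaler()
x_scaled = scaler.fit_transform(x)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
kmeans.fit(x_scaled)

# Convert the centres back to the original units for readability
centres = scaler.inverse_transform(kmeans.cluster_centers_)

# A new point must be scaled with the same scaler before prediction
new_customer = scaler.transform([[13, 750]])
cluster = kmeans.predict(new_customer)[0]

print("Cluster label of each training point:", kmeans.labels_)
print("Cluster centres (age, amount):")
print(centres)
print("Inertia:", kmeans.inertia_)
print("Cluster of the new customer:", cluster)
```

Cluster numbers are arbitrary identifiers, so the same grouping can appear with different label numbers under a different random_state.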
10. Performance analysis of Classification Algorithms on a specific dataset (Mini Project)
Program:

# Import necessary libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

# Load Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Standardize the features (important for some models like SVM, KNN);
# the scaler is fitted on the training set only
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Initialize classifiers
models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier()
}

# Function to evaluate models
def evaluate_model(model, X_train, X_test, y_train, y_test):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    cm = confusion_matrix(y_test, y_pred)
    return accuracy, precision, recall, f1, cm

# Create a list to store results
results = []

# Loop through each model, evaluate and store results
for model_name, model in models.items():
    accuracy, precision, recall, f1, cm = evaluate_model(model, X_train, X_test, y_train, y_test)
    results.append([model_name, accuracy, precision, recall, f1, cm])

# Convert the results into a DataFrame for easy display
results_df = pd.DataFrame(results, columns=["Model", "Accuracy", "Precision", "Recall", "F1-Score", "Confusion Matrix"])

# Print the results
print(results_df)

# Plot the performance comparison
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']

for metric in metrics:
    plt.figure(figsize=(10, 6))
    sns.barplot(x="Model", y=metric, data=results_df)
    plt.title(f'Comparison of {metric} across Models')
    plt.show()

# Plot confusion matrix for the best model (based on accuracy)
best_model_index = results_df['Accuracy'].idxmax()
best_model_cm = results_df.iloc[best_model_index]['Confusion Matrix']

plt.figure(figsize=(6, 6))
sns.heatmap(best_model_cm, annot=True, fmt="d", cmap='Blues',
            xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.title(f'Confusion Matrix of {results_df.iloc[best_model_index]["Model"]}')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

Output
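
A single train/test split on a dataset as small as Iris can favour one model by chance. A complementary check, sketched below as an assumption rather than as part of the original mini project, is k-fold cross-validation with the scaler placed inside a pipeline so that it is refitted on each training fold. Only three of the five classifiers are shown to keep the example short.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
}

cv_means = {}
for name, classifier in models.items():
    # Scaling happens inside the pipeline, so each fold is scaled using
    # statistics from its own training part only
    pipeline = make_pipeline(StandardScaler(), classifier)
    scores = cross_val_score(pipeline, X, y, cv=5, scoring="accuracy")
    cv_means[name] = scores.mean()
    print(f"{name}: mean accuracy {scores.mean():.3f}, std {scores.std():.3f}")
```

The standard deviation across folds gives a rough idea of how much each accuracy figure depends on the particular split.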
