1) Write a Python program to compute Central Tendency Measures (Mean, Median, Mode) and Measures of Dispersion (Variance, Standard Deviation)
Program:
import statistics

def ctm(data):
    if not data:
        print("no data provided")
        return
    mean = statistics.mean(data)
    median = statistics.median(data)
    try:
        mode = statistics.mode(data)
    except statistics.StatisticsError:
        mode = None
    variance = statistics.variance(data)
    sd = statistics.stdev(data)
    print(f"mean: {mean}")
    print(f"median: {median}")
    print(f"mode: {mode}")
    print(f"variance: {variance}")
    print(f"standard deviation: {sd}")

if __name__ == "__main__":
    data = [10, 20, 30, 40, 40, 50, 60, 70, 80, 90]
    ctm(data)
Output:
mean: 49
median: 45.0
mode: 40
variance: 676.6666666666666
standard deviation: 26.01281735350223
2) Study of Python Basic Libraries such as Statistics, Math, NumPy and SciPy
1. Statistics Library
The statistics module in Python provides functions for calculating mathematical statistics of
numeric data.
Key Features:
o statistics.mean(): Arithmetic mean.
o statistics.median(): Middle value of sorted data.
o statistics.mode(): Most frequently occurring value.
o statistics.variance(), statistics.stdev(): Spread of the data.
Example:
import statistics as stats
data = [1, 2, 2, 3, 4]
print("Mean:", stats.mean(data))
print("Median:", stats.median(data))
print("Mode:", stats.mode(data))
2. Math Library
The math module provides access to mathematical functions defined by the C standard.
Key Features:
o math.sqrt(), math.pow(): Roots and powers.
o math.factorial(): Factorials.
o math.sin(), math.cos(), math.tan(): Trigonometric functions.
o math.pi, math.e: Mathematical constants.
Example:
import math
print("Square root of 16:", math.sqrt(16))
print("Value of pi:", math.pi)
3. NumPy
NumPy is the fundamental package for numerical computing in Python; it provides fast N-dimensional arrays and vectorized operations on them.
Key Features:
Arrays:
o numpy.array(): Create arrays.
o numpy.zeros(shape): Create an array of zeros.
o numpy.ones(shape): Create an array of ones.
Mathematical Operations:
o Element-wise operations on arrays (+, -, *, /).
o Linear algebra functions (dot, cross, linalg).
Statistical Functions:
o numpy.mean(): Mean of array elements.
o numpy.std(): Standard deviation.
o numpy.median(): Median value.
Indexing and Slicing:
o Access specific elements or subarrays.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4])
print("Mean:", np.mean(arr))
print("Standard deviation:", np.std(arr))
4. SciPy
SciPy builds on NumPy and provides additional functionality for scientific computing.
Key Features:
Optimization:
o scipy.optimize.minimize(): Minimize a scalar function.
Integration:
o scipy.integrate.quad(): Numerical integration.
Linear Algebra:
o scipy.linalg.solve(): Solve linear systems.
Statistics:
o scipy.stats: Statistical distributions and functions.
Signal Processing:
o scipy.signal: Signal processing utilities.
Example:
from scipy import stats
data = [1, 2, 2, 3, 4]
print("Mode:", stats.mode(data).mode)
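The integration and linear-algebra features listed above can be exercised with a short sketch; the integrand and the 2x2 system below are illustrative choices, not from the record:

```python
import numpy as np
from scipy import integrate, linalg

# Numerically integrate x^2 from 0 to 3 (exact answer: 9)
area, err = integrate.quad(lambda x: x**2, 0, 3)
print("Integral:", area)

# Solve the linear system 2x + y = 5, x + 3y = 10
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])
solution = linalg.solve(A, b)
print("Solution:", solution)
```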
3) Study of Python Libraries for ML applications such as Pandas and Matplotlib
Python libraries like Pandas and Matplotlib are essential for Machine Learning (ML)
applications, as they help with data manipulation, analysis, and visualization. Here’s a detailed
overview:
1. Pandas
Pandas is a powerful library for data manipulation and analysis. It provides data structures like
Series and DataFrame, which are widely used in ML for preprocessing and exploration.
Key Features:
Data Structures:
o Series: One-dimensional labeled array.
o DataFrame: Two-dimensional labeled data structure (like a table).
Data Manipulation:
o read_csv(), read_excel(): Load datasets from files.
o to_csv(), to_excel(): Save datasets to files.
o Filtering and indexing using .loc[] and .iloc[].
Data Cleaning:
o Handling missing values: dropna(), fillna().
o Duplicates: drop_duplicates().
Data Analysis:
o Aggregation: groupby(), pivot_table().
o Statistical methods: .mean(), .std(), .describe().
Integration with ML Libraries:
o Easily convert DataFrames to NumPy arrays or directly use them in ML libraries
like Scikit-learn.
Example:
import pandas as pd
# Load data
data = pd.read_csv('sample.csv')
# Basic statistics
print(data.describe())
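The cleaning and aggregation methods listed under Key Features can be sketched on a small in-memory DataFrame (the column names and values below are illustrative, since sample.csv is not included in the record):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'city': ['A', 'A', 'B', 'B', 'B'],
    'sales': [100, np.nan, 200, 250, 250]
})

# Handle missing values: fill NaN with the column mean (here, 200.0)
df['sales'] = df['sales'].fillna(df['sales'].mean())

# Drop duplicate rows
df = df.drop_duplicates()

# Aggregate: mean sales per city
print(df.groupby('city')['sales'].mean())
```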
2. Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations.
It is often used to visualize data trends and patterns in ML.
Key Features:
Basic Plotting:
o plot(): Line plots.
o scatter(): Scatter plots.
o bar(): Bar charts.
Customizations:
o Title, labels, and legends: title(), xlabel(), ylabel(), legend().
o Colors, markers, and styles.
Subplots:
o Multiple plots in one figure: subplot().
Integration:
o Works seamlessly with Pandas: Direct plotting from DataFrames.
Example:
import matplotlib.pyplot as plt
# Sample data
x = [1, 2, 3, 4, 5]
y = [2, 3, 5, 7, 11]
# Line plot
plt.plot(x, y, label='Line Plot')
plt.title('Basic Plot')
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.legend()
plt.show()
The combination of Pandas and Matplotlib is particularly useful in the exploratory data analysis
(EDA) phase of ML, where you examine your dataset to identify trends, correlations, and
anomalies.
import pandas as pd
import matplotlib.pyplot as plt
# Load dataset
data = pd.read_csv('sample.csv')
# Scatter plot
plt.scatter(data['Height'], data['Weight'], color='blue', alpha=0.5)
plt.title('Height vs. Weight')
plt.xlabel('Height')
plt.ylabel('Weight')
plt.show()
Applications in ML:
1. Pandas:
o Preprocessing: Data cleaning, normalization, and transformation.
o Feature engineering: Creating new features from existing ones.
o Handling time-series data.
2. Matplotlib:
o Visualizing data distributions, trends, and outliers.
o Understanding relationships between features.
o Plotting model performance (e.g., training and validation loss curves).
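The last point, plotting model performance, can be sketched with made-up loss values (the numbers below are illustrative, not from a real training run):

```python
import matplotlib.pyplot as plt

epochs = list(range(1, 11))
train_loss = [0.9, 0.7, 0.55, 0.45, 0.38, 0.33, 0.30, 0.28, 0.27, 0.26]
val_loss = [0.95, 0.75, 0.62, 0.55, 0.52, 0.50, 0.50, 0.51, 0.53, 0.55]

# Training loss keeps falling while validation loss bottoms out,
# then rises: a classic sign of overfitting after epoch ~6
plt.plot(epochs, train_loss, label='Training Loss')
plt.plot(epochs, val_loss, label='Validation Loss')
plt.title('Model Performance')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()
```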
4) Write a Python program to implement Simple Linear Regression
PROGRAM
Example: predicting pizza price from pizza size
import statistics

# collecting sizes
n = int(input("Enter the number of data points: "))
pizza_sizes = []
for i in range(n):
    size = int(input("Enter pizza size: "))
    pizza_sizes.append(size)
    print(size)

# collecting prices
prices = []
for i in range(n):
    price = int(input("Enter pizza price: "))
    prices.append(price)
    print(price)

# mean of sizes (x)
x_mean = statistics.mean(pizza_sizes)
print(f"mean of sizes: {x_mean}")

# mean of prices (y)
y_mean = statistics.mean(prices)
print(f"mean of prices: {y_mean}")

# deviation of sizes from the mean
deviation_sizes = []
for i in range(n):
    deviation_sizes.append(pizza_sizes[i] - x_mean)
print("deviation of x:")
print(deviation_sizes)

# deviation of prices from the mean
deviation_prices = []
for i in range(n):
    deviation_prices.append(prices[i] - y_mean)
print("deviation of y:")
print(deviation_prices)

# product of deviations
product_of_deviation = []
for i in range(n):
    product_of_deviation.append(deviation_sizes[i] * deviation_prices[i])
print("product of deviations:")
print(product_of_deviation)

# sum of the products of deviations
sum_of_product_of_deviation = sum(product_of_deviation)

# squares of the deviations of sizes
square_of_deviation_of_sizes = []
for i in range(n):
    square_of_deviation_of_sizes.append(deviation_sizes[i] ** 2)
print("square of deviation of sizes:")
print(square_of_deviation_of_sizes)

# sum of the squares of the deviations of sizes
sum_of_square_of_deviation_of_sizes = sum(square_of_deviation_of_sizes)

# slope m = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
slope_m = sum_of_product_of_deviation / sum_of_square_of_deviation_of_sizes
print(f"m: {slope_m}")

# intercept (denoted flag in this record): flag = (mean of y) - m * (mean of x)
flag = y_mean - slope_m * x_mean
print(f"flag value: {flag}")

# doing a prediction: y = m * x + flag
new_size = 14
print(f"predicted price for size {new_size}: {slope_m * new_size + flag}")
OUTPUT
8
10
12
10
13
16
mean of sizes: 10
mean of prices: 13
deviation of x:
[-2, 0, 2]
deviation of y:
[-3, 0, 3]
product of deviations:
[6, 0, 6]
square of deviation of sizes:
[4, 0, 4]
m: 1.5
flag value: -2.0
predicted price for size 14: 19.0
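The hand-computed slope and intercept can be cross-checked with NumPy's least-squares fit; the data points below are the sizes and prices implied by the output above (assumed values, since the record does not list them explicitly):

```python
import numpy as np

sizes = [8, 10, 12]    # assumed x values (pizza sizes)
prices = [10, 13, 16]  # assumed y values (pizza prices)

# polyfit with degree 1 returns [slope, intercept]
slope, intercept = np.polyfit(sizes, prices, 1)
print(f"m: {slope}")
print(f"c: {intercept}")
```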
5) Implementation of Multiple Linear Regression using sklearn
Program:
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
# Sample Data (illustrative values)
data = {
    'area': [1000, 1500, 2000, 2500, 3000],
    'bedrooms': [2, 3, 3, 4, 4],
    'bathrooms': [1, 2, 2, 3, 3],
    'prize': [8000, 12000, 16000, 20000, 24000]
}
df = pd.DataFrame(data)
print(df)
X = df[['area', 'bedrooms', 'bathrooms']]
y = df[['prize']]
model = LinearRegression()
model.fit(X, y)
# User Input
sq_feet = float(input("Enter the area of the plot: "))
bdrooms = int(input("Enter the number of bedrooms: "))
btrooms = int(input("Enter the number of bathrooms: "))
# Predict Rent
predict_rent = model.predict(pd.DataFrame([[sq_feet, bdrooms, btrooms]], columns=X.columns))
# Print Prediction
print(f"Predicted rent for {sq_feet} square feet, {bdrooms} bedrooms, and {btrooms} bathrooms is: ₹{predict_rent[0][0]:,.2f}")
Output:
6. Implementation of Decision tree using sklearn and its parameter tuning
Program
from sklearn.tree import DecisionTreeClassifier
b_tech = ["cse", "aiml", "ece", "eee", "mech", "civil"]
bse = ["mpcs", "mecs", "ba", "ca", "bba"]
# features: [wrote EAMCET (1/0), EAMCET rank]
X = [
    [1, 450],
    [1, 800],
    [0, 0],
    [1, 250],
    [1, 600],
]
y = [0, 1, 2, 3, 1]
clf = DecisionTreeClassifier()
clf.fit(X, y)
emcet = int(input("Did you write EAMCET? (1 = yes, 0 = no): "))
rank = int(input("Enter your EAMCET rank (0 if not written): "))
if emcet == 1 or emcet == 0:
    prediction = clf.predict([[emcet, rank]])[0]
    if prediction == 0:
        print("You can apply for these courses:", b_tech[len(b_tech)//2:])
    elif prediction == 1:
        print("You can apply for these courses:", b_tech[:len(b_tech)//2])
    elif prediction == 2:
        print("You can apply for these courses:", bse)
    elif prediction == 3:
        # assumption: top ranks qualify for all B.Tech courses
        print("You can apply for these courses:", b_tech)
    else:
        print("Invalid decision")
else:
    print("Invalid input")
Output:
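The experiment title mentions parameter tuning, which the program above does not demonstrate. A minimal sketch using GridSearchCV on the built-in iris dataset (the parameter-grid values below are illustrative choices):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Candidate hyperparameter values to try (illustrative choices)
param_grid = {
    'max_depth': [2, 3, 4, 5],
    'criterion': ['gini', 'entropy'],
    'min_samples_split': [2, 5, 10],
}

# 5-fold cross-validated grid search over all parameter combinations
grid = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
grid.fit(X, y)
print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)
```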
7. Implementation of KNN using sklearn
Program:
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
# sample marks and their grades (illustrative values)
marks = [35, 45, 55, 65, 75, 85, 95]
grade = ["C", "C", "B", "B", "A", "A", "A"]
marks_array = np.array(marks).reshape(-1, 1)
k = 3
knn = KNeighborsClassifier(n_neighbors=k)
knn.fit(marks_array, grade)
new_mark = np.array([[60]])
distances, indices = knn.kneighbors(new_mark)
print(f"Distances: {distances[0]}")
Output:
8. Implementation of Logistic Regression using sklearn
from sklearn.linear_model import LogisticRegression
import numpy as np
import pandas as pd
data = {
'in_study_hours' : [2,4,6,8,10,12],
'y_n':[0,0,1,1,1,1]
}
df = pd.DataFrame(data)
print(df)
x = df[['in_study_hours']]
y = df['y_n']
model = LogisticRegression()
model.fit(x,y)
hours = float(input("Enter study hours: "))
predict = model.predict(pd.DataFrame({'in_study_hours': [hours]}))[0]
if predict == 1:
print("pass")
else:
print("fail")
Output:
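Logistic regression also exposes class probabilities via predict_proba, which can make the pass/fail decision more informative; a sketch reusing the same study-hours data:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    'in_study_hours': [2, 4, 6, 8, 10, 12],
    'y_n': [0, 0, 1, 1, 1, 1]
})
model = LogisticRegression()
model.fit(df[['in_study_hours']], df['y_n'])

# predict_proba returns [P(fail), P(pass)] for each input row
proba = model.predict_proba(pd.DataFrame({'in_study_hours': [5]}))
print("P(fail), P(pass):", proba[0])
```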
9. Implementation of K-Means Clustering
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
# sample customer data (illustrative values)
data = {
    'age': [12, 14, 30, 33, 55, 58, 16, 31],
    'amount': [500, 600, 2000, 2200, 4000, 4300, 700, 2100]
}
df = pd.DataFrame(data)
print(df)
x = df[['age', 'amount']].values
nc = KMeans(n_clusters=3, n_init=10, random_state=0)
nc.fit(x)
test_data = np.array([[13, 750]])
predict = nc.predict(test_data)
print(f"Predicted cluster: {predict[0]}")
Output:
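The number of clusters (3 above) is usually chosen with the elbow method: fit KMeans for several values of k and look for the bend in the inertia curve. A sketch with illustrative data:

```python
import numpy as np
from sklearn.cluster import KMeans

# illustrative two-feature data (age, amount)
x = np.array([[12, 500], [14, 600], [30, 2000], [33, 2200],
              [55, 4000], [58, 4300], [16, 700], [31, 2100]])

# Fit KMeans for k = 1..5 and record the inertia (within-cluster SSE)
inertias = []
for k in range(1, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(x)
    inertias.append(km.inertia_)

# Inertia shrinks as k grows; the "elbow" marks a reasonable k
print(inertias)
```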
10. Performance analysis of Classification Algorithms on a specific dataset (Mini Project)
Program:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Standardize the features (important for some models like SVM, KNN)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# Initialize classifiers
models = {
    "Logistic Regression": LogisticRegression(),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier()
}
# Train each model and record its accuracy and confusion matrix
results = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    cm = confusion_matrix(y_test, y_pred)
    results.append({"Model": name, "Accuracy": accuracy_score(y_test, y_pred), "Confusion Matrix": cm})
results_df = pd.DataFrame(results)
print(results_df[["Model", "Accuracy"]])
# Plot the confusion matrix of the best-performing model
best_model_index = results_df['Accuracy'].idxmax()
best_model_cm = results_df.iloc[best_model_index]['Confusion Matrix']
plt.figure(figsize=(6,6))
sns.heatmap(best_model_cm, annot=True, fmt="d", cmap='Blues', xticklabels=iris.target_names, yticklabels=iris.target_names)
plt.title(f'Confusion Matrix of {results_df.iloc[best_model_index]["Model"]}')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
Output