Machine Learning Report
Submitted To:
Md. Rashadur Rahman
Lecturer,
Department of CSE, CUET
Hasan Murad
Lecturer,
Department of CSE, CUET
Abstract
Machine learning algorithms have become indispensable in our modern world, where data is
king. They can be used to extract valuable insights and make informed decisions about a
wide range of topics. This comprehensive report takes a deep dive into the intricacies of five
distinct machine learning algorithms: Apriori, Multivariable Linear Regression, Decision
Trees, K-Means Clustering, and Artificial Neural Networks (ANN). We break down how
each of these methods works, where they are useful, what they are good at, and where they
fall short. The algorithms are described in detail, with examples of how they can be used in
practice. The strengths and weaknesses of each algorithm are discussed, so that readers can
make informed decisions about which algorithm to use for a particular task.
1 Introduction
Machine learning is a rapidly evolving field that lies at the intersection of computer science
and artificial intelligence (AI). It provides computers with the ability to learn from data patterns
and experiences, continuously improving their performance without requiring explicit
programming. In essence, it enables machines to make predictions, recognize patterns, and
autonomously solve complex problems. As this field continues to progress, grasping its
principles and applications becomes increasingly crucial in our data-centric world, where it
plays a transformative role [1].
In our lab we have implemented five fundamental machine learning algorithms: Apriori,
Multivariable Linear Regression, Decision Trees, K-Means Clustering, and Artificial Neural
Networks (ANN). Each algorithm possesses unique characteristics and is used to solve specific
types of problems.
Apriori is well suited to spotting patterns in transaction data, while multivariable linear
regression helps us understand how multiple variables relate to an outcome. K-means clustering
groups similar data points together, and decision trees are handy for classification and
prediction. Artificial neural networks (ANNs) are more advanced models used for harder tasks
such as recognizing images and understanding language. Each algorithm has its own traits,
making them valuable tools across a variety of fields [1].
Machine learning is a powerful tool for examining data and making intelligent choices. When
we grasp the various sorts of machine learning methods and their pros and cons, we can select
the most suitable one for a particular task. Since machine learning is a rapidly changing area,
keeping up to date with the latest advancements is crucial.
2.1 Apriori Algorithm
The Apriori method is a well-known algorithm for association rule mining. It begins by
identifying all the frequent itemsets, i.e., collections of items that appear together in the
dataset with at least a specified minimum frequency. Once the frequent itemsets have been
identified, the algorithm can generate association rules from them.
The Apriori algorithm first identifies all frequent 1-itemsets, i.e., the individual items that
appear in the dataset with at least the minimum frequency. It then identifies all frequent
2-itemsets, i.e., pairs of items that occur together in transactions with at least the minimum
frequency. This process is repeated for larger k-itemsets, where k can grow up to the total
number of items in the dataset, until no new frequent itemsets are found [2].
The Apriori method is a simple and effective way to find frequent itemsets. It is
comparatively easy to implement and can be used to mine large datasets, although it can
become computationally expensive on very large ones.
2.1.1 Dataset
The dataset comprises 20 columns and 7,500 rows. Each row denotes a transaction or record,
and each column holds an item or product that may appear in that transaction.
2.1.2 Implementation
The code uses Python's apyori library, which provides a built-in implementation of the Apriori
algorithm. It can be imported with the command "from apyori import apriori".
• The dataset's cells are iterated over in a nested loop where i stands for the row
index and j for the column index.
• The item (if present) is appended to the transaction list for each cell in the
dataset.
➢ Apriori Algorithm:
• To do association rule mining using the Apriori algorithm, the apyori library is
loaded.
• To create association rules, the apriori function is used:
• Transactions: the transactions parameter is set to the transaction list that holds the
individual item entries for each record.
• The min_support option specifies the minimum support an itemset must attain to be
deemed frequent. It is set to 0.003, meaning that an itemset must occur in at least
0.3% of the transactions.
• The min_confidence option specifies the minimum confidence level for association
rules. It is set to 0.2, meaning that a rule must have a confidence of at least 20% to
be taken into account.
• The min_lift option specifies the minimum lift threshold for association rules. It is
set to 3, meaning that a rule must have a lift of at least 3 to be reported. A condensed
version of this call is shown below.
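For reference, a minimal sketch of the call as it appears in the appendix (the transactions variable is assumed to already hold one list of items per row of the dataset):

from apyori import apriori

rules = apriori(transactions=transactions,
                min_support=0.003,    # itemset must appear in at least 0.3% of transactions
                min_confidence=0.2,   # rule confidence must be at least 20%
                min_lift=3,           # rule lift must be at least 3
                min_length=2, max_length=2)
results = list(rules)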
2.1.3 Performance Evaluation
The list of association rules produced by the Apriori algorithm is stored in the results variable.
It shows the association rules that were found, together with their support and confidence
values.
Figure 02: Association rules with their confidence and support values
2.2 Multivariable Linear Regression
A multivariable linear regression model predicts the target as a weighted sum of the input
features:

$$y = w_0 + w_1 x_1 + w_2 x_2 + \cdots + w_n x_n$$

where $y$ is the predicted target value, $w_0$ is the bias (intercept) term, $w_1, \dots, w_n$ are the
coefficients, and $x_1, \dots, x_n$ are the input features.
The coefficient values that minimize the difference between the predicted and the actual
target values are found through the iterative optimization process known as gradient descent.
This error is typically expressed as the mean squared error. Starting from initial values, the
algorithm updates the coefficients iteratively to move them closer to the optimal values [3].
Each coefficient $w_i$ is updated according to the following rule:

$$w_i = w_i - \alpha \cdot \frac{1}{m} \sum_{j=1}^{m} \left(\hat{y}_j - y_j\right) x_{ij}$$

where $\alpha$ is the learning rate, $m$ is the number of training examples, $\hat{y}_j$ is the prediction
for example $j$, $y_j$ is its actual target value, and $x_{ij}$ is the value of feature $i$ in example $j$.
2.2.1 Dataset
For multivariable linear regression, the "Property listing data in Bangladesh" dataset is used.
It contains about ten features and a single target column named 'price'.
2.2.2 Implementation
In this dataset, the 'title', 'adress', 'purpose', 'flooPlan', 'url', and 'lastUpdated' features are
irrelevant to the predicted target value, so these columns are dropped first. Next, rows whose
'type' is building or duplex are removed, and the 'type' column itself is then dropped. Some
preprocessing is applied to the 'beds', 'bath', 'area', and 'price' columns to convert them to
numeric values. After that, z-normalization is used to normalize the data, and the data is split
into 20% test data, 20% validation data, and the remaining 60% training data, as sketched below.
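A minimal sketch of the normalization and split, following the code in the appendix (df is assumed to be the already-cleaned DataFrame with 'price' as its last column):

import numpy as np
import pandas as pd

# z-normalization: subtract the column mean and divide by the standard deviation
df_norm = (df - df.mean()) / df.std()

# shuffle, then split into 60% train / 20% validation / 20% test
shuffled = df_norm.sample(frac=1, random_state=10)
values = shuffled.values
x, y = values[:, :-1], values[:, -1]

total = len(x)
train_size, val_size = int(0.6 * total), int(0.2 * total)
x_train, y_train = x[:train_size], y[:train_size]
x_val, y_val = x[train_size:train_size + val_size], y[train_size:train_size + val_size]
x_test, y_test = x[train_size + val_size:], y[train_size + val_size:]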
Hyperparameters:
• Learning Rate: The learning rate is set to 0.01. It controls the step size of each gradient
descent update.
• Number of Epochs: A maximum of 1000 iterations (iter_number) is used. Gradient
descent continues until convergence or until this many iterations have been performed.
Cost Function:
To measure the model's error, we constructed a cost function that computes the mean squared
error (MSE) between the predicted values and the actual target values.
Gradient Descent:
The gradient descent technique is the foundation of our implementation. The weight vector is
updated repeatedly to reduce the cost function: in each iteration we compute the gradient of
the cost function with respect to the weights and update the weights accordingly.
Model Training:
We used the gradient_descent function to train the model, feeding it the training data along
with the initial weights, learning rate, and other hyperparameters. The model is trained for
1000 epochs. A condensed sketch of the cost function and training loop is given below.
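A condensed sketch of the cost function and gradient descent loop, mirroring the full listing in the appendix (X_train is assumed to already include the bias column of ones):

import numpy as np

def cost_function(x, y, weight):
    # mean squared error with the conventional 1/(2m) factor
    m = len(y)
    h = x.dot(weight)
    return (1 / (2 * m)) * np.sum((h - y) ** 2)

def gradient_descent(x, y, weight, alpha, iter_number):
    m = len(y)
    for _ in range(iter_number):
        error = x.dot(weight) - y                  # y_hat - y
        weight -= (alpha / m) * x.T.dot(error)     # update all coefficients at once
    return weight

weight = np.zeros(X_train.shape[1])
weight = gradient_descent(X_train, y_train, weight, alpha=0.01, iter_number=1000)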
Figure 04: Training loss and validation loss per epoch
2.2.3 Performance Evaluation
A popular statistic for assessing how well a regression model fits the data is the Mean Squared
Error (MSE). It quantifies the average squared difference between the actual target values and
the predicted values; lower MSE values indicate better model performance. Our model achieves
a test loss of about 11%.

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$

where $y_i$ is the actual target value, $\hat{y}_i$ is the predicted value, and $n$ is the number of test
examples.
Another important statistic for regression models is R-squared ($R^2$). It measures the
proportion of the variance in the dependent variable that is explained by the model's
independent variables. $R^2$ values range from 0 to 1, with higher values indicating a better fit;
a value of 1 means the model predicts the target variable perfectly. Our model's score is about
76.8%.

$$R^2 = 1 - \frac{SSR}{SST}$$

where

• $SSR = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ is the sum of squared residuals, and
• $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total sum of squares, with $\bar{y}$ the mean of the actual target values.
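Both metrics can be computed in a few lines of NumPy; the sketch below is consistent with the model_score function in the appendix:

import numpy as np

def evaluate(x, y, weight):
    y_pred = x.dot(weight)
    mse = np.mean((y - y_pred) ** 2)        # mean squared error
    sst = np.sum((y - np.mean(y)) ** 2)     # total sum of squares
    ssr = np.sum((y - y_pred) ** 2)         # sum of squared residuals
    return mse, 1 - ssr / sst               # (MSE, R-squared)

mse, r2 = evaluate(X_test, y_test, weight)
print("Test MSE:", mse, "R^2:", r2)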
2.3 K-Means Clustering
Clustering is an unsupervised machine learning technique that groups unlabeled data points
into clusters. The aim is to find groups of data points that are similar to one another and
distinct from the data points in other clusters.
➢ k_means () function
• The cluster centers are initialized by randomly choosing k data points from the input
data x without replacement; these random data points serve as the initial cluster centers.
• A loop is run for a maximum of 100 iterations.
• In each iteration, every data point in x is assigned to the closest cluster center based on
Euclidean distance, using the "calc_cluster" function.
• The "update_centers" function then computes new cluster centers from the data points
assigned to each cluster.
• If the new centers are the same as the previous centers, the loop breaks; otherwise, the
centers are updated with "new_centers" and the next iteration begins.
➢ update_centers () function:
Based on the data points assigned to each cluster, this function computes new cluster centers
as the mean of those points.
➢ calc_cluster () function:
This function calculates the cluster assignment for each data point based on the Euclidean
distance between the data points and the cluster centers. Both helper functions are sketched
below.
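A minimal NumPy sketch of the two helper functions, as they appear in the appendix:

import numpy as np

def calc_cluster(x, centers):
    # distance from every point to every center, then pick the nearest center
    dist = np.sqrt(np.sum((x[:, np.newaxis] - centers) ** 2, axis=2))
    return np.argmin(dist, axis=1)

def update_centers(x, cluster, k):
    # new center = mean of the points currently assigned to that cluster
    new_centers = np.zeros((k, x.shape[1]))
    for i in range(k):
        new_centers[i] = x[cluster == i].mean(axis=0)
    return new_centers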
Cluster Result:
We have chosen 2 clusters for this dataset. The 3D plot of the clusters with their respective
data points is shown below:
2.3.3 Performance Evaluation
The k-means model above gives a good separation of the 15 three-dimensional data points
into two clusters. If the dataset is divided into more than two clusters, the points appear
scattered in the plot. Nine points fall in cluster one, and the remaining six points fall in
cluster two.
2.4 Decision Tree
A decision tree is a supervised machine learning tool used for both classification and
regression tasks. It works by breaking down data into smaller groups, assigning each to a
specific class or value. This tree-like structure consists of nodes representing feature tests
and branches showing the test results. The final classifications or values are found in the
leaf nodes. To make predictions, the tree is traversed from the root to a leaf, following
branches that match the data's characteristics, leading to a prediction located at the leaf node
[4].
2.4.1 Dataset
For the implementation of the decision tree algorithm, we use the 'Play Tennis' dataset. There
are three categorical features and one numerical feature in the dataset. The categorical features
are outlook, temperature, and humidity; the numerical feature is wind speed in mph. By
analyzing this data, we have to predict whether a person can play tennis or not.
The numerical wind-speed values are first converted into categorical values through a simple
preprocessing step, as sketched below.
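One possible way to do this binning is shown below; the 25 mph cut-off and the 'Wind' column name are assumptions for illustration, while the 'Normal' and 'Gale' labels come from the appendix code:

# Hypothetical threshold: wind speeds of 25 mph or more are labelled 'Gale'
def categorize_wind(speed_mph):
    return 'Gale' if speed_mph >= 25 else 'Normal'

df['Wind'] = df['Wind'].apply(categorize_wind)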
2.4.2 Implementation
❖ Entropy Calculation (calc_entropy):
• The entropy of the dataset is calculated first, before building the decision tree. Entropy
is an indicator of impurity or disorder in a dataset and serves as the splitting criterion
in this code. A more homogeneous dataset has a lower entropy.
❖ Information Gain Calculation (infoGain):
• Information gain quantifies the decrease in entropy attained by dataset segmentation
according to a certain property (feature). It measures the amount of information about
the target variable (in this case, "Play") that is acquired by employing a certain
characteristic for splitting.
• The algorithm determines the information gain by comparing the entropy of the
dataset before and after the split for each feature. The feature that yields the highest
information gain is chosen as the splitting node (both calculations are sketched below).
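A short sketch of both calculations: calc_entropy is written here as a plausible reconstruction (its definition does not appear in the appendix listing), while the information-gain logic mirrors the infoGain function that does:

import numpy as np
import pandas as pd

def calc_entropy(data):
    # entropy of the 'Play' label distribution: -sum(p * log2(p))
    probs = data['Play'].value_counts(normalize=True)
    return -np.sum(probs * np.log2(probs))

def info_gain(data, feature):
    # entropy before the split minus the weighted entropy after splitting on `feature`
    total_entropy = calc_entropy(data)
    n = len(data)
    weighted_entropy = 0.0
    for value in data[feature].unique():
        subset = data[data[feature] == value]
        weighted_entropy += (len(subset) / n) * calc_entropy(subset)
    return total_entropy - weighted_entropy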
❖ Attribute Selection (best_tree):
• Using an iterative process, the best_tree function chooses the feature that maximizes
information gain from among all those that are accessible. For the current node of
the decision tree, this chosen characteristic serves as the splitting attribute.
❖ Tree Construction (tree):
• The decision tree is built using the recursive tree function.
• Base case: It returns the value of the target variable ('Play') as a leaf node if all data
points in the current node have the same value for it.
• Recursive case: If the target variable takes more than one value in the current node,
the 'best_tree' function is used to choose the best splitting attribute.
• The function then builds a subtree for each distinct value of the chosen attribute. The
dataset is separated into subsets based on this attribute, the procedure is repeated for
each subset, and the resulting subtrees are attached to the current node. A compact
version of this recursion is shown below.
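A compact version of the recursion, mirroring the tree function in the appendix (it relies on the best_tree helper, and features is assumed to be a pandas Index of attribute names):

def tree(data, features):
    # Base case: every remaining row has the same 'Play' label -> leaf node
    if len(data['Play'].unique()) == 1:
        return data['Play'].iloc[0]

    # Recursive case: split on the attribute with the highest information gain
    best_node = best_tree(data, features)
    subtree = {best_node: {}}
    for value in data[best_node].unique():
        subset = data[data[best_node] == value].drop(columns=[best_node])
        subtree[best_node][value] = tree(subset, features.drop(best_node))
    return subtree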
2.4.3 Performance Evaluation
The decision tree model achieves almost 100% accuracy on the test data for the Play Tennis
dataset. From the resulting tree we can see that the 'Outlook' feature has the strongest
influence: if the outlook is overcast, the tree directly predicts 'yes' for playing tennis.
2.5 Artificial Neural Network (ANN)
• Forward Propagation (forward_propagation):
This function performs the neural network's forward pass on the input data X, given the
initialized weights and biases. Using the sigmoid activation for the hidden layer and the
softmax activation for the output layer, it computes the hidden layer activations (a) and the
final class probabilities (y_cap), as sketched below.
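A minimal sketch of this pass, consistent with the appendix code (it relies on the sigmoid and softmax helpers defined there):

import numpy as np

def forward_propagation(X, w_hid, b_hid, w_out, b_out):
    z_hid = np.dot(w_hid, X.T) + b_hid   # hidden layer pre-activation
    a = sigmoid(z_hid)                   # hidden layer activations
    z_out = np.dot(w_out, a) + b_out     # output layer pre-activation
    y_cap = softmax(z_out)               # predicted class probabilities
    return y_cap, a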
• Backpropagation (back_propagation):
Using backpropagation, this function calculates the gradients needed for the weight and bias
updates. It computes the gradients of the loss with respect to the weights and biases of both
the hidden and output layers. The update equations are:

$$w_i = w_i - \alpha \cdot \frac{\partial L}{\partial w_i}, \qquad b_i = b_i - \alpha \cdot \frac{\partial L}{\partial b_i}$$
• Loss Function (loss_function):
The cross-entropy loss between the predicted and actual class probabilities is calculated using
the loss function. It gauges how well the model is performing; the corresponding formula is
given below.
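Written out, the loss computed by loss_function corresponds to:

$$L = -\frac{1}{m} \sum_{j} \sum_{c} y_{jc} \, \log\left(\hat{y}_{jc} + \epsilon\right)$$

where $y_{jc}$ is the one-hot true label and $\hat{y}_{jc}$ the predicted probability of class $c$ for example $j$, $\epsilon$ is a small constant added for numerical stability, and $m$ is the normalization constant used in the appendix code.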
• SoftMax and SoftMax derivative activation functions:
The softmax function, which is frequently employed for multiclass classification
applications, is calculated by softmax(z). It transforms a vector z of scores into
probability distributions across several classes.
The softmax function's derivative is calculated by softmax_derivative(z); it is used during
backpropagation for multiclass classification.
• Loop for training (train):
The train function applies gradient descent to train the neural network. The gradients are
produced by backpropagation and used to iteratively update the weights and biases. In each
epoch the loss is computed and recorded in a loss history. The learning rate (lr) and the
number of training epochs (epochs) control the training process; one epoch of the loop is
sketched below.
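A condensed sketch of one epoch of that loop, using the forward_propagation, back_propagation, and loss_function helpers from the appendix (the weights and biases are assumed to have been initialized with initialize_parameters):

for epoch in range(epochs):
    # forward pass: hidden activations and output class probabilities
    y_cap, a = forward_propagation(X_train, w_hid, b_hid, w_out, b_out)
    loss_history.append(loss_function(y_cap.T, y_train))

    # backward pass: gradients of the loss w.r.t. all weights and biases
    dw_hid, db_hid, dw_out, db_out = back_propagation(X_train, y_train, y_cap, w_out, a)

    # gradient descent update
    w_hid -= lr * dw_hid
    b_hid -= lr * db_hid
    w_out -= lr * dw_out
    b_out -= lr * db_out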
2.5.3 Performance Evaluation
Figure 09: Loss per thousand epochs
The model's overall accuracy on the test dataset is about 83.33%.
3 Discussion
Machine learning algorithms are effective tools for prediction and data analysis. The problem
determines the algorithm to use. These algorithms have been applied in a variety of industries
to offer information and support decision-making.
We have implemented five major machine learning algorithms in the lab. Each algorithm has
its own strengths and weaknesses and specific fields of application.
➢ Apriori: Apriori excels at finding common item sets in transactional data, ideal for
market basket analysis and recommendations. However, it can struggle with large and high-
dimensional datasets, potentially missing complex patterns. It finds applications in retail
for better product recommendations, healthcare for treatment analysis, and web analytics
for understanding user click behavior [6].
➢ Multivariable Linear Regression: Multivariable Linear Regression offers simplicity and
the ability to model interactions between multiple variables, making it ideal for forecasting
outcomes influenced by many factors. However, it assumes linear relationships, which
might not always be accurate and can be sensitive to outliers and multicollinearity. It's
widely used in fields like economics, medicine, and marketing for tasks like GDP
prediction, patient outcome forecasting, and sales projection.
➢ K-Means Clustering: K-Means Clustering is great for handling large datasets efficiently
by grouping similar data points. Yet, it requires knowing the number of clusters in advance
and can be sensitive to initial cluster choices, struggling with irregular clusters. It finds
practical use in customer segmentation for market analysis, image compression for efficient
storage, and anomaly detection to detect fraud by identifying unusual data patterns [7].
➢ Decision Tree: Decision Trees are adaptable, able to do both classification and regression
tasks, and able to handle different types of data. Their drawbacks include a propensity to
overfit complicated trees, sensitivity to small data changes, and a potential inability to grasp
links that go beyond hierarchical splits. They are useful in recommendation systems, credit
risk analysis, and medical diagnostics.
➢ ANN: Artificial Neural Networks (ANNs) excel at processing complex, high-
dimensional data, making them ideal for tasks like image and text analysis. However, they
require substantial data and computational resources, and their intricate structure can be
hard to understand. Nevertheless, ANNs find extensive use in image recognition, natural
language processing, autonomous vehicles, and various fields that demand complex data
analysis and decision-making.
Every machine learning algorithm comes with its unique advantages and limitations, rendering
them appropriate for particular tasks and fields. The selection of the most suitable algorithm
hinges on the characteristics of the data and the specific problem being addressed.
4 Conclusion
Machine learning algorithms are powerful tools that can be used to analyze and predict data.
The best algorithm for a particular problem depends on the specific circumstances, so it is
important to carefully evaluate the strengths and weaknesses of each algorithm before applying
any algorithm to any specific task. We have implemented five important machine learning
algorithms throughout the lab. This implementation gives clear, in-depth knowledge of each of
the algorithms. We have learned how mathematical calculations and statistical formulas are
applied to build a machine learning algorithm. This hands-on experience of implementing
machine learning from scratch gives us a proper understanding of machine learning models
and their uses.
5 References
[1] I. H. Sarker, "Machine Learning: Algorithms, Real-World Applications and Research
Directions," SN Computer Science, vol. 2, p. 160, 2021.
[2] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques, Morgan Kaufmann,
2011.
[3] M. Badole, "Mastering Multiple Linear Regression: A Comprehensive Guide," Analytics
Vidhya, 1 May 2021. [Online]. Available:
https://www.analyticsvidhya.com/blog/2021/05/multiple-linear-regression-using-python-
and-scikit-learn/.
[4] IBM, "Decision Trees," IBM, [Online]. Available: https://www.ibm.com/topics/decision-trees.
[5] G. Singh, "Introduction to Artificial Neural Networks," Analytics Vidhya, 6 September
2021. [Online]. Available: https://www.analyticsvidhya.com/blog/2021/09/introduction-
to-artificial-neural-networks/.
[6] P. S, "Underrated Apriori Algorithm Based Unsupervised Machine Learning," Analytics
Vidhya, 29 January 2022. [Online]. Available:
https://www.analyticsvidhya.com/blog/2022/01/underrated-apriori-algorithm-based-unsupervised-machine-learning/.
[7] M. Banoula, "K-means Clustering Algorithm: Applications, Types, & How Does It
Work?," Simplilearn, 23 April 2023. [Online]. Available:
https://www.simplilearn.com/tutorials/machine-learning-tutorial/k-means-clustering-algorithm.
6 Appendices
➢ Apriori Algorithm (Source Code)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Load the transaction dataset
data = pd.read_csv("store_data.csv")
data.shape
data.head()

# Build the list of transactions: one list of items per row, skipping empty cells
transactions = []
for i in range(0, data.shape[0]):
    items = []
    for j in range(0, data.shape[1]):
        if pd.notna(data.values[i, j]):
            items.append(str(data.values[i, j]))
    transactions.append(items)

# pip install apyori  (run once in the shell or notebook before importing)
from apyori import apriori

rules = apriori(transactions=transactions, min_support=0.003,
                min_confidence=0.2, min_lift=3,
                min_length=2, max_length=2)

results = list(rules)
results
➢ Multivariable Linear Regression (Source Code)
df.drop(["type"], axis=1, inplace=True)
df

# Working with "beds", "bath" and "area" columns
df['beds'] = df['beds'].str.replace(' Bed', '').astype(int)
df['bath'] = df['bath'].str.replace(' Bath', '').astype(int)
df['area'] = df['area'].str.replace(' sqft', '').str.replace(',', '').astype(int)
df.head(3)
df.dtypes

# Working with "price" column
price_value = {'Thousand': 1000, 'Lakh': 100000}
def convert_price(value):
    number, unit = value.split()
    return float(number) * price_value[unit]

df['price'] = df['price'].apply(convert_price)
df.head()
df.dtypes
df.isnull().sum()

"""### Normalizing the data"""
# Using "z-normalization"
data = pd.DataFrame(df)
mean = data.mean()
std = data.std()
df_norm = (data - mean) / std
df_norm.head()
df_norm.loc[7000]

shuffled_df = df_norm.sample(frac=1, random_state=10)
shuffled_df.head()
df_final = shuffled_df
df_final.head()
values = df_final.values
values
x = values[:, :-1]
y = values[:, -1]
x, y
len(x)

# 60% training / 20% validation / 20% test split
total_size = len(x)
train_size = int(0.6 * total_size)
val_size = int(0.2 * total_size)
test_size = total_size - train_size - val_size
total_size, train_size, test_size, val_size

x_train = x[:train_size]
y_train = y[:train_size]
x_val = x[train_size:train_size + val_size]
y_val = y[train_size:train_size + val_size]
x_test = x[train_size + val_size:]
y_test = y[train_size + val_size:]

len(x_train), len(x_val), len(x_test)
x_train
len(y_train), len(y_val), len(y_test)
y_train

# bias term column
X_train = np.column_stack((np.ones(len(x_train)), x_train))
X_val = np.column_stack((np.ones(len(x_val)), x_val))
X_test = np.column_stack((np.ones(len(x_test)), x_test))

learning_rate = 0.01
iter_number = 1000
weight = np.zeros(X_train.shape[1])

def cost_function(x, y, weight):
    m = len(y)
    h = x.dot(weight)
    cost = (1 / (2 * m)) * np.sum((h - y) ** 2)
    return cost

def gradient_descent(x_train, y_train, weight, alpha, iter_number, x_val, y_val):
    m = len(y_train)
    for i in range(iter_number):
        h = x_train.dot(weight)
        error = h - y_train
        gradient = (alpha / m) * x_train.T.dot(error)
        weight -= gradient
        train_loss = cost_function(x_train, y_train, weight)
        val_loss = cost_function(x_val, y_val, weight)
        print("Epoch", i + 1, "Training Loss:", train_loss, "Validation Loss:", val_loss)
    return weight

weight = gradient_descent(X_train, y_train, weight, learning_rate, iter_number, X_val, y_val)
print("Final Weight:", weight)

def model_evaluation(X, y, weight):
    cost = cost_function(X, y, weight)
    return cost

# Evaluating model on test dataset
print("Test Loss:", model_evaluation(X_test, y_test, weight))

# Calculating model score
def model_score(x, y, weight):
    y_pred = x.dot(weight)
    sst = np.sum((y - np.mean(y)) ** 2)
    ssr = np.sum((y - y_pred) ** 2)
    score = 1 - (ssr / sst)  # R-Squared
    return score

score = model_score(X_test, y_test, weight)
print("Model score:", score)
➢ K-Means Clustering (Source Code)
np_data = scaler.fit_transform(np_data)
df = pd.DataFrame(np_data, columns=['X', 'Y', 'Z'])
df.head()
df.values

def calc_cluster(x, centers):
    # Euclidean distance from every point to every center
    dist = np.sqrt(np.sum((x[:, np.newaxis] - centers) ** 2, axis=2))
    return np.argmin(dist, axis=1)

def update_centers(x, cluster, k):
    new_centers = np.zeros((k, x.shape[1]))
    for i in range(k):
        cluster_points = x[cluster == i]
        new_centers[i] = cluster_points.mean(axis=0)
    return new_centers

def k_means(x, k):
    # start from k randomly chosen data points (without replacement)
    centers = x[np.random.choice(x.shape[0], k, replace=False)]
    for i in range(100):
        cluster = calc_cluster(x, centers)
        new_centers = update_centers(x, cluster, k)
        if np.all(centers == new_centers):
            break
        centers = new_centers
    return cluster, centers

x = df.values
k = int(input("Enter Number of Clusters:"))
clusters, centers = k_means(x, k)
print(clusters, centers, end='\n')

# Plotting in 3D
fig = plt.figure(figsize=(15, 12))
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x[:, 0], x[:, 1], x[:, 2], c=clusters)
ax.scatter(centers[:, 0], centers[:, 1], centers[:, 2], marker='X', color='red')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Z')
plt.title('K-means Clustering')
plt.show()
➢ Decision Tree (Source Code)
def infoGain(data, play_class):
    # information gain of splitting `data` on the column `play_class`
    # (uses the calc_entropy helper defined earlier in the listing)
    entropy_val = calc_entropy(data)
    n = len(data)
    unique_val = data[play_class].unique()
    weighted_entropy = 0

    for value in unique_val:
        subset = data[data[play_class] == value]
        p = len(subset) / n
        weighted_entropy += p * calc_entropy(subset)
    info_gain = entropy_val - weighted_entropy
    return info_gain

def best_tree(data, features):
    max_gain = 0
    best_node = None
    for feature in features:
        gain = infoGain(data, feature)
        if gain > max_gain:
            max_gain = gain
            best_node = feature
    return best_node

def tree(data, features):
    if len(data['Play'].unique()) == 1:
        return data['Play'].iloc[0]
    best_node = best_tree(data, features)
    ttree = {best_node: {}}
    for value in data[best_node].unique():
        subset = data[data[best_node] == value].drop(columns=[best_node])
        subtree = tree(subset, features.drop(best_node))
        ttree[best_node][value] = subtree
    return ttree

DT = tree(train, x)
DT

def predict(instance, tree):
    if isinstance(tree, str):
        return tree
    x = next(iter(tree))
    val = instance[x]
    sub_tree = tree[x][val]
    return predict(instance, sub_tree)

pred_data = {'Outlook': 'Rain', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Gale'}
play = predict(pred_data, DT)
print("Predicted class:", play)

pred_data = {'Outlook': 'Rain', 'Temperature': 'Hot', 'Humidity': 'High', 'Wind': 'Normal'}
play = predict(pred_data, DT)
print("Predicted class:", play)

def accuracy(test_set, decision_tree):
    correct_predictions = 0
    total_instances = len(test_set)
    for i, instance in test_set.iterrows():
        predicted_class = predict(instance, decision_tree)
        if predicted_class == instance['Play']:
            correct_predictions += 1
    ac = correct_predictions / total_instances
    return ac

ac = accuracy(test, DT)
print("Accuracy:", ac)
➢ Artificial Neural Network (Source Code)
def normalize(X):
    # z-normalization of the feature matrix
    mean = np.mean(X, axis=0)
    std = np.std(X, axis=0)
    X_normalized = (X - mean) / std
    return X_normalized

X_train = normalize(X_train)
X_test = normalize(X_test)

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return sigmoid(x) * (1 - sigmoid(x))

def softmax(z):  # for multiclass classification
    exp_z = np.exp(z)
    return exp_z / np.sum(exp_z, axis=0, keepdims=True)

def softmax_derivative(z):
    s = softmax(z)
    dZ = s * (1 - s)
    return dZ

def initialize_parameters(in_dim, hid_dim, out_dim):
    np.random.seed(1)
    w_hid = np.random.randn(hid_dim, in_dim) * 0.01
    b_hid = np.zeros((hid_dim, 1))
    w_out = np.random.randn(out_dim, hid_dim) * 0.01
    b_out = np.zeros((out_dim, 1))
    return w_hid, b_hid, w_out, b_out

def loss_function(A_output, y):
    epsilon = 1e-10
    m = y.shape[1]
    loss = -1 / m * np.sum(y * np.log(A_output + epsilon))
    return loss

def forward_propagation(X, w_hid, b_hid, w_out, b_out):
    z_hid = np.dot(w_hid, X.T) + b_hid
    a = sigmoid(z_hid)
    z_out = np.dot(w_out, a) + b_out
    y_cap = softmax(z_out)
    return y_cap, a

def back_propagation(X, y, y_cap, w_out, a):
    k = X.shape[0]
    dz_out = y_cap - y.T
    dw_out = np.dot(dz_out, a.T) / k
    db_out = np.sum(dz_out, axis=1, keepdims=True) / k
    dz_hid = np.dot(w_out.T, dz_out) * sigmoid_derivative(a)
    dw_hid = np.dot(dz_hid, X) / k
    db_hid = np.sum(dz_hid, axis=1, keepdims=True) / k
    return dw_hid, db_hid, dw_out, db_out

loss_history = []

def train(X, y, hid_layer, lr, epochs):
    input_size = X.shape[1]
    output_size = y.shape[1]

    w_hid, b_hid, w_out, b_out = initialize_parameters(input_size, hid_layer, output_size)

    for epoch in range(epochs):
        y_cap, a = forward_propagation(X, w_hid, b_hid, w_out, b_out)
        loss = loss_function(y_cap.T, y)
        loss_history.append(loss)

        dw_hid, db_hid, dw_out, db_out = back_propagation(X, y, y_cap, w_out, a)

        w_hid -= lr * dw_hid
        b_hid -= lr * db_hid
        w_out -= lr * dw_out
        b_out -= lr * db_out

        if epoch % 1000 == 0:
            print(f"Epoch {epoch}, Loss: {loss}")

    return w_hid, b_hid, w_out, b_out

lr = 0.1
epochs = 5000
hid_layer = 32

weights_hidden, biases_hidden, weights_output, biases_output = train(X_train, y_train, hid_layer, lr, epochs)

def predict(X, weights_hidden, biases_hidden, weights_output, biases_output):
    A_output, _ = forward_propagation(X, weights_hidden, biases_hidden, weights_output, biases_output)
    predictions = np.argmax(A_output, axis=0)
    return predictions

predictions = predict(X_test, weights_hidden, biases_hidden, weights_output, biases_output)

def calculate_accuracy(y_true, y_pred):
    accuracy = np.mean(y_true == y_pred) * 100
    return accuracy

test_accuracy = calculate_accuracy(np.argmax(y_test, axis=1), predictions)
print("Test Accuracy:", test_accuracy)

import matplotlib.pyplot as plt
plt.plot(range(epochs), loss_history)
plt.xlabel('No. of Epochs')
plt.ylabel('Loss')
plt.title('Training Loss Curve')
plt.show()