GK Deeplearning


Introduction to Deep Learning

Deep learning is a branch of machine learning based on artificial neural networks (ANNs), also known as deep neural networks (DNNs). It is capable of learning complex patterns and relationships within data without everything being explicitly programmed. It has become increasingly popular in recent years due to advances in processing power and the availability of large datasets. These neural networks are inspired by the structure and function of the biological neurons in the human brain, and they are designed to learn from large amounts of data.
Introduction to the topic: Recognizing handwritten digits using deep learning.
In this article, we are going to demonstrate how to implement a neural network
from scratch by building a digit recognizer using the MNIST dataset. The MNIST
dataset is a large dataset of 28x28 handwritten digit images. Below is a
sample taken from the MNIST dataset.

1. MNIST Dataset:
o The MNIST dataset consists of 28x28-pixel grayscale images of
handwritten digits (0 to 9).
o It’s widely used for evaluating machine learning models in the
context of digit classification.
o The dataset contains 60,000 training images and 10,000 test
images.
o Each image represents a single digit, and the goal is to predict the
correct digit class.
2. Neural Networks (NN):
o A neural network is a computational model inspired by the human
brain’s interconnected neurons.
o It consists of layers of interconnected nodes (neurons) that process
and transform input data.
o Each neuron computes a weighted sum of its inputs, applies an
activation function, and produces an output.
o Neural networks can learn from data by adjusting their weights
during training.
3. Deep Learning:
o Deep learning refers to neural networks with multiple hidden
layers (deep architectures).
o These deep architectures allow for more complex representations
and better feature extraction.
o Common deep learning architectures include Convolutional
Neural Networks (CNNs) for image processing, Recurrent Neural
Networks (RNNs) for sequential data, and Transformers for
natural language processing.
4. Training a Neural Network:
o To train a neural network:
1. Define the architecture (number of layers, neurons per layer,
activation functions).
2. Prepare labeled training data (input features and
corresponding target labels).
3. Initialize weights randomly.
4. Use backpropagation and optimization algorithms (e.g.,
stochastic gradient descent) to adjust weights.
5. Iterate until the model converges or reaches a stopping
criterion.
o Training involves minimizing a loss function (difference between
predicted and actual outputs).
5. Common Activation Functions (see the single-neuron numpy sketch after this list):
o ReLU (Rectified Linear Unit): f(x) = \max(0, x)
o Sigmoid: f(x) = \frac{1}{1 + e^{-x}}
o Tanh (Hyperbolic Tangent): f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}}
6. Deep Learning Libraries:
o Python libraries like TensorFlow, Keras, and PyTorch provide
tools for building and training neural networks.
o Keras (now part of TensorFlow) offers a high-level API for
creating neural networks.
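As a tiny illustration of points 2 and 5 above (a toy example with made-up numbers, not part of the MNIST model), a single neuron with three inputs can be computed in numpy as follows:

import numpy as np

x = np.array([0.5, -1.0, 2.0])      # inputs
w = np.array([0.8, 0.2, -0.5])      # weights
b = 0.1                             # bias

z = np.dot(w, x) + b                # weighted sum: 0.4 - 0.2 - 1.0 + 0.1 = -0.7
a_relu = np.maximum(0, z)           # ReLU output: 0.0
a_sigmoid = 1 / (1 + np.exp(-z))    # sigmoid output: about 0.33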
After the model is developed and trained, it will be able to predict handwritten digits from 0 to 9 in an image we provide, printing the predicted digit together with its prediction confidence (an example is shown at the end of this article).
Model Architecture

As each input in the MNIST dataset is a 28x28 (784-pixel) image, we take 784 inputs in the input layer. Since there are 10 digits (0-9), this is a multiclass classification task, so we have 10 outputs in the output layer. The overall architecture is [784, 128, 64, 32, 10]. The ReLU activation function is used in the hidden layers, and the softmax activation function is used in the output layer as it is a classification task.
Libraries
Let us implement all the required components of the neural network. The libraries used are numpy to perform the mathematical operations, matplotlib to plot the graph of the costs, Pillow to load the image from a user-provided path, and h5py to read the dataset files used later on.
import numpy as np
import h5py
from PIL import Image
import matplotlib.pyplot as plt
Initialize Weights and Biases
Let us define a function to initialize all the parameters, i.e. the weights and biases. The weights are initialized using He initialization (i.e. W[l] = np.random.randn(shape) * np.sqrt(2/n[l-1]), where l is the layer index and n[l-1] is the number of nodes in the previous layer), and the biases are initialized to 0.
def initialize_parameters_deep(layer_dims):
    np.random.seed(1)
    parameters = {}
    L = len(layer_dims)
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * np.sqrt(2/layer_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
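As a quick sanity check (illustrative only, using the [784, 128, 64, 32, 10] architecture described in the Model Architecture section), the shapes of the initialized parameters can be inspected like this:

parameters = initialize_parameters_deep([784, 128, 64, 32, 10])
print(parameters['W1'].shape, parameters['b1'].shape)   # (128, 784) (128, 1)
print(parameters['W4'].shape, parameters['b4'].shape)   # (10, 32) (10, 1)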
Activation Functions
The activation functions we will be using are relu and softmax. Let us define them along with their derivatives.
Relu
def relu(x):
    x = np.maximum(0, x)
    return x
Relu_Prime
def relu_prime(x):
    # work on a copy so the cached Z values are not modified in place
    x = np.array(x, copy=True)
    x[x <= 0] = 0
    x[x > 0] = 1
    return x
Softmax
def softmax(x):
    # subtracting the column-wise max is mathematically equivalent and avoids overflow
    x = x - np.max(x, axis=0, keepdims=True)
    ans = np.exp(x) / np.sum(np.exp(x), axis=0)
    return ans
Softmax_Prime
def softmax_prime(x):
    ans = softmax(x) * (1 - softmax(x))
    return ans
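A small sanity check of these functions (a toy example, not from the article): relu zeroes out negative entries, and every softmax column sums to 1.

z = np.array([[1.0, -2.0],
              [0.5, 3.0],
              [-1.0, 0.0]])
print(relu(z))                       # negatives replaced by 0
print(np.sum(softmax(z), axis=0))    # [1. 1.] - each column sums to 1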
Forward Propagation
Let us define the helper functions required for the forward propagation task.
First, a linear_forward function which calculates Z = W·A + b for a layer of the neural network.
def linear_forward(A, W, b):
    Z = W.dot(A) + b
    cache = (A, W, b)
    return Z, cache
Here A is the activations from the previous layer, W is a weight numpy array of shape (size of current layer, size of previous layer), b is a bias numpy array of shape (size of current layer, 1), and cache is the tuple (A, W, b) which will be used during backpropagation.
Now we will implement deep_layer which will calculate the activations
according to the defined activation function as:
def deep_layer(A, W, b, activation):
    Z, linear_cache = linear_forward(A, W, b)
    if activation == 'softmax':
        A = softmax(Z)
        activation_cache = Z
    elif activation == 'relu':
        A = relu(Z)
        activation_cache = Z
    cache = (linear_cache, activation_cache)
    return A, cache
Here cache contains linear_cache (A, W, b) from linear_forward and activation_cache (Z), which are going to be used during backpropagation.
Now let us use the above functions to implement the full forward propagation:
def forward_pass(input_X, parameters):
    caches = []
    depth = int(len(parameters)/2)  # number of layers in the neural network
    A = input_X
    for l in range(1, depth):
        A_prev = A
        A, cache = deep_layer(A_prev, parameters['W'+str(l)], parameters['b'+str(l)], 'relu')
        caches.append(cache)
    A_last, cache = deep_layer(A, parameters['W'+str(depth)], parameters['b'+str(depth)], 'softmax')
    caches.append(cache)
    return A_last, caches
Here A_last is the output of the neural network after passing through all layers, and caches is the list of the cache (A, W, b, Z) of each layer of the neural network. Thus, the forward propagation part is complete.
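As an illustrative shape check (reusing the parameters dictionary from the earlier sanity-check sketch), a batch of 5 random inputs produces one 10-way probability column per example:

X_demo = np.random.rand(784, 5)
A_last_demo, caches_demo = forward_pass(X_demo, parameters)
print(A_last_demo.shape)              # (10, 5)
print(np.sum(A_last_demo, axis=0))    # each column sums to 1 because of softmax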
Cost Computation
Since this is a multiclass classification task, the loss function we use is
categorical cross-entropy
L(y, \hat{y}) = -\sum_{i=1}^{n} y_i \log \hat{y}_i
So, the cost is calculated as
J = \frac{1}{m} \sum_{i=1}^{m} L(y^{(i)}, \hat{y}^{(i)})
Now, let's write a function to compute the cost:
def compute_cost(AL, Y):
    Y = np.reshape(Y, (Y.shape[0], Y.shape[1]))
    m = Y.shape[1]
    # element-wise categorical cross-entropy terms; the caller sums them up
    cost = (-1/m) * np.multiply(Y, np.log(AL + 1e-8))  # small epsilon avoids log(0)
    cost = np.squeeze(cost)
    return cost
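A tiny worked example (made-up numbers): for a single example whose true class is predicted with probability 0.7, the cross-entropy loss is -log(0.7) ≈ 0.357.

AL_toy = np.array([[0.1], [0.2], [0.7]])    # predicted probabilities for a 3-class toy problem
Y_toy = np.array([[0.0], [0.0], [1.0]])     # one-hot true label (class 2)
print(np.sum(compute_cost(AL_toy, Y_toy)))  # about 0.357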
Backpropagation
In backpropagation we have to calculate the gradients so as to update the weight and bias parameters across the network, using the loss produced by the loss function at the output layer. After calculating dZ^{[l]} = \partial L / \partial Z^{[l]}, we have to calculate dW^{[l]}, db^{[l]} and dA^{[l-1]} as

dW^{[l]} = \frac{1}{m} dZ^{[l]} A^{[l-1]T}
db^{[l]} = \frac{1}{m} \sum_{i=1}^{m} dZ^{[l](i)}
dA^{[l-1]} = W^{[l]T} dZ^{[l]}

Let us write helper functions to calculate these gradients as below:


def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = (1/m) * np.dot(dZ, A_prev.T)
    db = (1/m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

def backward_activation(dA, cache, activation):
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = dA * relu_prime(activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "softmax":
        dZ = dA  # for softmax, dA already holds dZ = A_last - Y (see note below)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    return dA_prev, dW, db
In the case of softmax at the output layer, to avoid division by zero during the gradient calculation in the backward pass, we compute dZ = A_last - Y beforehand and send it to the backward_activation function as dA_last_Z, whereas the dZ for relu is calculated inside backward_activation itself.
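This shortcut follows from combining the derivatives of softmax and categorical cross-entropy (a standard result, not specific to this article): with \hat{y}_i = \frac{e^{z_i}}{\sum_j e^{z_j}} and L(y, \hat{y}) = -\sum_i y_i \log \hat{y}_i, differentiating with respect to the pre-activation gives
\frac{\partial L}{\partial z_i} = \hat{y}_i - y_i
which, stacked over all classes and examples, is exactly dZ = A_last - Y.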
Now, let us implement the backward pass using the above helper functions.
def backward_pass(A_last, Y, caches):
    grads = {}
    L = len(caches)
    m = A_last.shape[1]
    Y = Y.reshape(A_last.shape)
    dA_last_Z = A_last - Y
    current_cache = caches[L-1]
    grads["dA"+str(L-1)], grads["dW"+str(L)], grads["db"+str(L)] = backward_activation(dA_last_Z, current_cache, activation='softmax')
    for l in reversed(range(L-1)):
        current_cache = caches[l]
        dA_prev_temp, dW_temp, db_temp = backward_activation(grads['dA'+str(l+1)], current_cache, activation="relu")
        grads["dA"+str(l)] = dA_prev_temp
        grads["dW"+str(l+1)] = dW_temp
        grads["db"+str(l+1)] = db_temp
    return grads
Now, after all the gradients have been calculated, we have to update all the weights and biases across the network as below:
def update_parameters(parameters, grads, learning_rate):
    depth = len(parameters) // 2
    for l in range(depth):
        parameters["W"+str(l+1)] = parameters["W"+str(l+1)] - learning_rate*grads['dW'+str(l+1)]
        parameters["b"+str(l+1)] = parameters["b"+str(l+1)] - learning_rate*grads['db'+str(l+1)]
    return parameters
Compiling the Model, Loading the Dataset and Preprocessing
We will implement mini-batch gradient descent for our model because the dataset is very large: the parameters are updated after each mini-batch is processed, instead of waiting until the end of an epoch as in batch gradient descent.
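The function below calls a one_hot helper that is not shown in the article; a minimal sketch, assuming the labels are integers 0-9 arranged in a row vector of shape (1, m), could be:

def one_hot(Y, num_classes):
    # build a (num_classes, m) matrix with a 1 in the row of each example's label
    one_hot_Y = np.zeros((num_classes, Y.shape[1]))
    one_hot_Y[Y.flatten().astype(int), np.arange(Y.shape[1])] = 1
    return one_hot_Y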
def mini_batch_gradient_descent(X, Y, layer_dims, mini_batch_size, epochs, learning_rate):
    np.random.seed(1)
    m = X.shape[1]
    mini_batches = []
    correct = 0
    Y = np.reshape(Y, (1, Y.shape[0]))
    Y = one_hot(Y, 10)

    # shuffle the data
    permutation = list(np.random.permutation(X.shape[1]))
    X_shuffled = X[:, permutation]
    Y_shuffled = Y[:, permutation]

    # split into mini-batches
    num_of_complete_batches = m // mini_batch_size
    for i in range(num_of_complete_batches):
        mini_batch_X = X_shuffled[:, i*mini_batch_size:(i+1)*mini_batch_size]
        mini_batch_Y = Y_shuffled[:, i*mini_batch_size:(i+1)*mini_batch_size]
        mini_batches.append((mini_batch_X, mini_batch_Y))

    # if there is an incomplete batch, keep the remaining examples as the last mini-batch
    if m % mini_batch_size != 0:
        mini_batch_X = X_shuffled[:, num_of_complete_batches*mini_batch_size:]
        mini_batch_Y = Y_shuffled[:, num_of_complete_batches*mini_batch_size:]
        mini_batches.append((mini_batch_X, mini_batch_Y))

    # initialize parameters and load the integer labels used for the accuracy check
    costs = []
    accuracies = []
    parameters = initialize_parameters_deep(layer_dims)
    f = h5py.File("/content/train.hdf5", 'r')
    train_y = f['label'][...]
    f.close()
    f = h5py.File("/content/test.hdf5", 'r')
    test_y = f['label'][...]
    f.close()
    # train_x / test_x are assumed to have been loaded earlier from the same hdf5 files
    if np.array_equal(X, train_x):
        y = train_y
    elif np.array_equal(X, test_x):
        y = test_y

    for Iteration in range(epochs):
        # accuracy on the full dataset at the start of the epoch
        last, _ = forward_pass(X, parameters)
        correct = np.sum(np.argmax(last, 0) == y)
        accuracy = (correct/y.shape[0])*100
        for mini_batch in mini_batches:
            x_batch, y_batch = mini_batch
            A_Last, caches = forward_pass(x_batch, parameters)
            cost = np.sum(compute_cost(A_Last, y_batch))/A_Last.shape[0]
            grads = backward_pass(A_Last, y_batch, caches)
            parameters = update_parameters(parameters, grads, learning_rate)
        if Iteration % 1 == 0:  # log every epoch
            print(f'Epochs {Iteration} : Cost ={cost} Accuracy ={accuracy}')
            costs.append(cost)
            accuracies.append(accuracy)
    return parameters, costs, A_Last, x_batch, y_batch, accuracies
The data is now in the shape (784, m), where each column denotes one training example in the dataset.


Running the model:
We have parameters of interest such as X, Y, layer_dims, mini_batch_size,
epochs, and learning_rate.
layer_dims = [784, 128, 64, 32, 10]
mini_batch_size = 64
epochs = 100

For the learning rate parameter, we tried 5 values: 0.001, 0.01, 0.005, 0.0025 and 0.0075 to see which value gives the best results for the model. Below are the graphs representing the cost and accuracy of the model for these 5 learning rate values.
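For reference, a typical invocation of the training function with these hyperparameters might look like the sketch below (assuming train_x and train_y hold the training images and integer labels in the (784, m) layout described above, and picking 0.0075 as one example learning rate):

parameters2, costs, A_Last, x_batch, y_batch, accuracies = mini_batch_gradient_descent(
    train_x, train_y, layer_dims, mini_batch_size, epochs, learning_rate=0.0075)

plt.plot(costs)           # cost recorded once per epoch
plt.xlabel('Epoch')
plt.ylabel('Cost')
plt.show()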
Try with our own image
test_img = Image.open('/content/Screenshot 2024-05-15 033514.png')  # path of the image
test_img_converted = test_img.convert('L')               # convert to grayscale
test_img_resized = test_img_converted.resize((28, 28))   # resize to MNIST dimensions
test_img_resized = invert_image(test_img_resized)        # white background -> black background
test_img_array = np.asarray(test_img_resized)/255.0      # scale pixel values to [0, 1]

prediction, _ = forward_pass(np.reshape(test_img_array, (784, 1)), parameters2)
predicted_num = np.argmax(prediction)    # the predicted digit
prediction = np.max(prediction)          # confidence of the prediction
print("Prediction: ", prediction)
print("Input Image: ", predicted_num)
test_img.resize((128, 128))              # view an enlarged copy of the input image

Prediction: 0.999999811240444
Input Image: 3

We can try our own image by changing the image path in the code above. The program prints the predicted digit (3 in this case) along with the confidence of the prediction.
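The snippet above also relies on an invert_image helper that is not defined in the article; a minimal sketch using Pillow's ImageOps (assuming a grayscale image with a white background) could be:

from PIL import ImageOps

def invert_image(img):
    return ImageOps.invert(img)   # white background -> black, to match the MNIST style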

Conclusion
By using a neural network model with the MNIST dataset, we have built a deep learning model that can recognize handwritten digits from 0 to 9 with high accuracy. The model is trained on the training dataset to learn how to classify handwritten digits and is then evaluated on the test dataset, where it also achieves high accuracy. This indicates that the model has effectively learned how to recognize handwritten digits. Such a model can be used to recognize handwritten digits in real-world applications such as barcode recognition, handwritten text recognition in documents, or systems that identify users by handwritten text.
Final exam exercise development directions.
We will continue to develop this model by utilizing available libraries so that it can work in real time through the computer camera.
Introduce team members and assign tasks to each person:
1. Vũ Hoàng Nam (21012)
2. Phạm Văn Nghị (21012557)
3. Phạm Khôi Nguyên (21012)