GK Deeplearning
1. MNIST Dataset:
o The MNIST dataset consists of 28x28-pixel grayscale images of
handwritten digits (0 to 9).
o It’s widely used for evaluating machine learning models in the
context of digit classification.
o The dataset contains 60,000 training images and 10,000 test
images.
o Each image represents a single digit, and the goal is to predict the
correct digit class.
2. Neural Networks (NN):
o A neural network is a computational model inspired by the human
brain’s interconnected neurons.
o It consists of layers of interconnected nodes (neurons) that process
and transform input data.
o Each neuron computes a weighted sum of its inputs, applies an
activation function, and produces an output.
o Neural networks can learn from data by adjusting their weights
during training.
3. Deep Learning:
o Deep learning refers to neural networks with multiple hidden
layers (deep architectures).
o These deep architectures allow for more complex representations
and better feature extraction.
o Common deep learning architectures include Convolutional
Neural Networks (CNNs) for image processing, Recurrent Neural
Networks (RNNs) for sequential data, and Transformers for
natural language processing.
4. Training a Neural Network:
o To train a neural network:
1. Define the architecture (number of layers, neurons per layer,
activation functions).
2. Prepare labeled training data (input features and
corresponding target labels).
3. Initialize weights randomly.
4. Use backpropagation and optimization algorithms (e.g.,
stochastic gradient descent) to adjust weights.
5. Iterate until the model converges or reaches a stopping
criterion.
o Training involves minimizing a loss function (difference between
predicted and actual outputs).
5. Common Activation Functions:
o ReLU (Rectified Linear Unit): (f(x) = \max(0, x))
o Sigmoid: (f(x) = \frac{1}{1 + e^{-x}})
o Tanh (Hyperbolic Tangent): (f(x) = \frac{e^x - e^{-x}}{e^x + e^{-x}})
6. Deep Learning Libraries:
o Python libraries like TensorFlow, Keras, and PyTorch provide
tools for building and training neural networks.
o Keras (now part of TensorFlow) offers a high-level API for
creating neural networks.
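To illustrate points 1 and 6, the whole digit-classification task can be set up in a few lines with the Keras high-level API. The following is only an illustrative sketch (the layer sizes, optimizer and number of epochs are our own example choices); the rest of this report instead builds and trains the network from scratch with NumPy.

# Illustrative Keras sketch only; the rest of this report implements the network in NumPy.
import tensorflow as tf

# Load MNIST: 60,000 training and 10,000 test images of shape 28x28.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0          # scale pixel values to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),          # 28x28 image -> 784 features
    tf.keras.layers.Dense(128, activation='relu'),          # hidden layer (size is an example choice)
    tf.keras.layers.Dense(10, activation='softmax'),        # one output per digit class
])
model.compile(optimizer='sgd', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5, validation_data=(x_test, y_test))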
After the model is developed and trained, it will be able to predict handwritten
digits from 0 to 9 in images that we provide. The output will be as shown
below:
Model Architecture
import numpy as np

def initialize_parameters_deep(layer_dims):
    # He initialization: weights scaled by sqrt(2 / size of previous layer), biases set to zero.
    parameters = {}
    L = len(layer_dims)
    for l in range(1, L):
        parameters['W' + str(l)] = np.random.randn(layer_dims[l], layer_dims[l-1]) * np.sqrt(2/layer_dims[l-1])
        parameters['b' + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
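For example, a network with a 784-unit input layer (28×28 pixels), two hidden layers and a 10-unit output layer could be initialized as follows (the hidden-layer sizes 128 and 64 are illustrative assumptions):

layer_dims = [784, 128, 64, 10]        # input, two hidden layers, output (sizes assumed for illustration)
parameters = initialize_parameters_deep(layer_dims)
print(parameters['W1'].shape)          # (128, 784)
print(parameters['b1'].shape)          # (128, 1)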
Activation Functions
The activation functions we will be using are softmax and relu.
Let us define relu and softmax along with their derivatives.
Relu
def relu(x):
    x = np.maximum(0, x)
    return x
Relu_Prime
def relu_prime(x):
    # Derivative of ReLU: 1 where x > 0, 0 elsewhere (note: modifies x in place).
    x[x <= 0] = 0
    x[x > 0] = 1
    return x
Softmax
def softmax(x):
    ans = np.exp(x) / np.sum(np.exp(x), axis=0)
    return ans
Softmax_Prime
def softmax_prime(x):
    ans = softmax(x) * (1 - softmax(x))
    return ans
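As a quick sanity check (this snippet is our own addition), relu should zero out the negative entries and softmax should return one probability distribution per column:

x = np.array([[1.0, -2.0],
              [0.5,  3.0],
              [-1.0, 0.0]])
print(relu(x))                          # negative entries become 0
print(np.sum(softmax(x), axis=0))       # each column sums to 1.0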
Forward Propagation
Let us define the helper functions required for the forward propagation task.
First, we define a linear_forward function which computes Z = W*A + b
for a layer of the neural network.
def linear_forward(A, W, b):
    Z = W.dot(A) + b
    cache = (A, W, b)
    return Z, cache
Here A is the matrix of activations from the previous layer, W is a weight numpy array of shape
(size of current layer, size of previous layer), b is a bias numpy array of shape (size
of current layer, 1), and cache is the tuple (A, W, b), which will be used during
backpropagation.
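A short shape check (added here for illustration) confirms the dimensions described above:

A_prev = np.random.randn(784, 5)       # activations from the previous layer for 5 example images
W = np.random.randn(128, 784)          # (size of current layer, size of previous layer)
b = np.zeros((128, 1))                 # (size of current layer, 1)
Z, cache = linear_forward(A_prev, W, b)
print(Z.shape)                         # (128, 5)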
Now we will implement deep_layer which will calculate the activations
according to the defined activation function as:
def deep_layer(A, W, b, activation):
    Z, linear_cache = linear_forward(A, W, b)
    if activation == 'softmax':
        A = softmax(Z)
        activation_cache = Z
    elif activation == 'relu':
        A = relu(Z)
        activation_cache = Z
    cache = (linear_cache, activation_cache)
    return A, cache
Here cache contains linear_cache (A, W, b) from linear_forward and
activation_cache (Z), which are going to be used during backpropagation.
Now let us use the above functions and implement forward propagation as:
def forward_pass(input_X, parameters):
    caches = []
    depth = int(len(parameters)/2)   # number of layers in the neural network
    A = input_X
    for l in range(1, depth):
        A_prev = A
        A, cache = deep_layer(A_prev, parameters['W'+str(l)], parameters['b'+str(l)], 'relu')
        caches.append(cache)
    A_last, cache = deep_layer(A, parameters['W'+str(depth)], parameters['b'+str(depth)], 'softmax')
    caches.append(cache)
    return A_last, caches
Here A_last is the output of the neural network after the input has passed through all layers,
and caches is the list of the cache (A, W, b, Z) of each layer of the neural network.
Thus, the forward propagation part is complete.
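To verify the implementation, we can pass a batch of random inputs through the network and confirm that every output column is a valid probability distribution (this check is our own addition and assumes the illustrative layer_dims used earlier):

X_check = np.random.rand(784, 32)                  # 32 random "images" flattened to 784 values
A_last, caches = forward_pass(X_check, parameters)
print(A_last.shape)                                # (10, 32): one probability per digit class
print(np.allclose(np.sum(A_last, axis=0), 1.0))    # True: softmax columns sum to 1
print(len(caches))                                 # one cache per layer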
Cost Computation
Since this is a multiclass classification task, the loss function we use is
categorical cross-entropy:
(L(y, \hat{y}) = -\sum_{i=1}^{n} y_i \log \hat{y}_i)
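The cost-function code itself is not shown at this point, so the following is a minimal sketch of how the categorical cross-entropy above could be computed over a mini-batch of m examples, where Y is the one-hot label matrix of shape (10, m) and A_last is the softmax output; the function name and the small epsilon added for numerical stability are our own choices:

def compute_cost(A_last, Y):
    # Categorical cross-entropy averaged over the m examples in the mini-batch.
    m = Y.shape[1]
    eps = 1e-8                                     # avoid log(0)
    cost = -np.sum(Y * np.log(A_last + eps)) / m
    return cost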
The last mini-batch, built from the examples left over after all complete mini-batches, is created as follows:

mini_batch_Y = Y_shuffled[:, num_of_complete_batches*mini_batch_size :
                          num_of_complete_batches*mini_batch_size + (m - mini_batch_size*num_of_complete_batches)]
mini_batch = (mini_batch_X, mini_batch_Y)
mini_batches.append(mini_batch)

Before the training loop starts, the parameters are initialized and the lists that will store the cost and accuracy of every epoch are created:

# parameters_initialize
costs = []
accuracies = []
parameters = initialize_parameters_deep(layer_dims)

To measure accuracy, the labels are read from the HDF5 files, and the correct label array is chosen depending on whether X is the training set or the test set:

import h5py

f = h5py.File("/content/train.hdf5", 'r')
train_y = f['label'][...]
f.close()
f = h5py.File("/content/test.hdf5", 'r')
test_y = f['label'][...]
f.close()
if np.array_equal(X, train_x):
    y = train_y
elif np.array_equal(X, test_x):
    y = test_y

At the end of every epoch the cost and accuracy are printed and recorded, and the training function finally returns the learned parameters together with the recorded history:

if Iteration % 1 == 0:   # print every epoch
    print(f'Epochs {Iteration} : Cost ={cost} Accuracy ={accuracy}')
    costs.append(cost)
    accuracies.append(accuracy)
return parameters, costs, A_Last, x_batch, y_batch, accuracies
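Backpropagation and the parameter-update step are not shown above. A hedged sketch of how the pieces could fit together in the full training routine is given below; make_mini_batches, backward_pass and update_parameters are hypothetical helper names standing in for code that is not reproduced in this report:

# Sketch only: make_mini_batches, backward_pass and update_parameters are assumed helpers.
def train_sketch(X, Y, layer_dims, learning_rate, num_epochs, mini_batch_size):
    parameters = initialize_parameters_deep(layer_dims)
    costs = []
    for epoch in range(num_epochs):
        for mini_batch_X, mini_batch_Y in make_mini_batches(X, Y, mini_batch_size):
            A_last, caches = forward_pass(mini_batch_X, parameters)
            cost = compute_cost(A_last, mini_batch_Y)
            grads = backward_pass(A_last, mini_batch_Y, caches)               # backpropagation (not shown)
            parameters = update_parameters(parameters, grads, learning_rate)  # gradient descent step
        costs.append(cost)
    return parameters, costs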
The data is now in the required shape.
For the learning rate, we tried five values: 0.001, 0.01, 0.005, 0.0025 and
0.0075, to see which value gives the best results for the model.
Below are the graphs showing the cost and accuracy of the model for the five
learning rate values.
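A sketch of this learning-rate sweep, assuming the illustrative train_sketch routine above, could look like this (train_y_onehot, the number of epochs and the mini-batch size are assumptions for illustration):

learning_rates = [0.001, 0.01, 0.005, 0.0025, 0.0075]
results = {}
for lr in learning_rates:
    # train_sketch is the illustrative training routine sketched earlier, not the exact code used.
    parameters_lr, costs_lr = train_sketch(train_x, train_y_onehot, layer_dims,
                                           learning_rate=lr, num_epochs=10, mini_batch_size=64)
    results[lr] = costs_lr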
Try with our own image
from PIL import Image

test_img = Image.open('/content/Screenshot 2024-05-15 033514.png')   # path of the image
test_img_converted = test_img.convert('L')                # convert to grayscale
test_img_resized = test_img_converted.resize((28, 28))
test_img_resized = invert_image(test_img_resized)         # converts an image with a white background to a black background
test_img_array = np.asarray(test_img_resized) / 255.0
prediction, _ = forward_pass(np.reshape(test_img_array, (784, 1)), parameters2)
predicted_num = np.argmax(prediction)
prediction = np.max(prediction)
print("Prediction: ", prediction)
print("Input Image: ", predicted_num)
test_img.resize((128, 128))
Prediction: 0.999999811240444
Input Image: 3
We can try our own image by changing the path of the image in the code above.
The output prints the predicted digit (3) together with the model's confidence in that prediction.
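The invert_image helper used above is not defined in this excerpt; a minimal stand-in using Pillow's ImageOps (the actual implementation may differ) would be:

from PIL import ImageOps

def invert_image(img):
    # Invert pixel values so that a digit drawn on a white background
    # becomes white-on-black, matching the MNIST convention.
    return ImageOps.invert(img)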
Conclusion
By using a neural network model with the MNIST dataset, we have built a deep
learning model that can recognize handwritten digits from 0 to 9 with high
accuracy. The model is trained on the training dataset to learn how to classify
handwritten digits. After training, the model is evaluated on the test dataset,
and the results show that it achieves high accuracy, which indicates that it has
effectively learned how to recognize handwritten digits. This model can be used
to recognize handwritten digits in real-world applications such as barcode
recognition, handwritten text recognition in documents, or systems that
identify users by their handwriting.
Directions for further development of the final exam exercise.
We will continue to develop this model by utilizing available libraries so that
it can recognize digits in real time through the computer camera.
Introduce team members and assign tasks to each person
No.  Personal information      Tasks
1    Vũ Hoàng Nam (21012)