Project Report: CS 574 - Computer Vision Using Machine Learning
Classification of digits in MNIST Dataset
Group 26
160101035 Inderpreet Singh Chera
160101039 Kapil Goyal
160101043 Mohit Singh
160101057 Sahib Khan
160101069 Shubham Kumar Koul
PROBLEM STATEMENT
Use the following methods to classify a given image sample from the MNIST dataset
into one of the 10 possible classes (0 to 9), and write a report giving a
detailed summary of the results after tweaking different parameters.
1. Logistic Regression
2. Multi-Layer Perceptron
3. Deep Neural Network
4. Deep Convolutional Neural Network
Logistic Regression
Logistic regression is a statistical model that, in its basic form, uses a
logistic function to model a binary dependent variable in terms of one or more
nominal, ordinal, interval or ratio-level independent variables.
We implemented our model using both the multinomial and the one-vs-rest
formulations and observed that the multinomial formulation gives better
accuracy (a minimal sketch of this comparison follows the configuration list
below).
Default configuration used:
● Solver - lbfgs
● Iterations - 100
● Regularization Strength - 1
● multi-class - multinomial
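A minimal sketch of this comparison with scikit-learn, assuming the images are
flattened into 784-dimensional vectors and scaled to [0, 1]; the multi_class
argument switches between the multinomial and one-vs-rest formulations, and C
is the inverse of the regularization strength:
from keras.datasets import mnist
from sklearn.linear_model import LogisticRegression

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
X_train = X_train.reshape(-1, 784) / 255.0   # flatten 28 x 28 images
X_test = X_test.reshape(-1, 784) / 255.0

for scheme in ['multinomial', 'ovr']:
    clf = LogisticRegression(solver='lbfgs', multi_class=scheme,
                             C=1.0, max_iter=100)
    clf.fit(X_train, Y_train)
    print(scheme, clf.score(X_test, Y_test))   # test accuracy per scheme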
Accuracy vs. Iterations:
Multi-Layer Perceptron
A multilayer perceptron (MLP) is a class of feedforward artificial neural
network. An MLP consists of at least three layers of nodes: an input layer, a
hidden layer and an output layer. Except for the input nodes, each node is a
neuron that uses a nonlinear activation function. An MLP is trained with a
supervised learning technique called backpropagation.
We implement the MLP model using the following configuration and tweak some of
these parameters to analyze their effect on the accuracy of the model (a
minimal sketch of this base model follows the list).
Configuration Used:
● Input Layer: 784 Neurons
● Hidden Layers: 1 layer of 512 Neurons
● Output Layer: 10 Neurons
● Optimizer: SGD
● Activation Function:
○ Hidden Layer: sigmoid function
○ Output Layer: softmax
● Batch Size: 128
● Epochs: 20
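A minimal Keras sketch of this base configuration, assuming standard MNIST
preprocessing (flattened pixels, one-hot labels) and using the test set for
validation only for illustration; the helper functions used in the Codes
section below are not shown here:
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense

(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 784).astype('float32') / 255
x_test = x_test.reshape(-1, 784).astype('float32') / 255
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# 784 -> 512 (sigmoid) -> 10 (softmax), trained with SGD for 20 epochs
model = Sequential()
model.add(Dense(512, activation='sigmoid', input_shape=(784,)))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='sgd',
              metrics=['accuracy'])
model.fit(x_train, y_train, batch_size=128, epochs=20,
          validation_data=(x_test, y_test))   # validation on the test set, for illustration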
Similarly, the training and validation losses keep decreasing, which indicates
that there is no overfitting in this case.
Different Number of Hidden Layers
We change the number of hidden layers from 1 to 3 and observe its effect on
the accuracy of the model, keeping all other parameters the same as in the
base configuration.
We observe that a smaller batch size makes convergence faster compared to a
bigger batch size, because a smaller batch size gives better generalization of
the data. We also observed that a larger batch size decreases computation time
compared to a smaller one.
The Vanishing Gradient Problem occurs when we try to train a neural network
using gradient-based optimization techniques. During back-propagation, i.e.,
moving backward through the network and calculating the gradients of the loss
(error) with respect to the weights, the gradients tend to get smaller and
smaller as we move towards the earlier layers. This means that the neurons in
the earlier layers learn very slowly compared to the neurons in the later
layers of the hierarchy.
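A small numeric illustration of why this happens with sigmoid activations: the
derivative of the sigmoid is at most 0.25, so each additional sigmoid layer
can shrink the backpropagated gradient by roughly that factor (the bound below
ignores the weights and is only meant to show the trend):
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5, 5, 1001)
print(np.max(sigmoid(x) * (1 - sigmoid(x))))   # maximum derivative ~ 0.25

# illustrative upper bound on the gradient scale after n sigmoid layers
for n in [1, 2, 5, 10]:
    print(n, 0.25 ** n)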
Deep Neural Network
Configuration Used (a minimal sketch of this model follows the list):
● Input Layer: 784 Neurons
● Hidden Layers: 2 layers of 100 Neurons
● Output Layer: 10 Neurons
● Optimizer: RMSprop
● Activation Function:
○ 1st Hidden Layer: tanh function
○ 2nd Hidden Layer: tanh function
○ Output Layer: softmax
● Batch Size: 128
● Epochs: 10
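A minimal sketch of this deeper model, with data preparation as in the MLP
sketch above; the fit call is commented out and mirrors the batch size and
epochs listed:
from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import RMSprop

# 784 -> 100 (tanh) -> 100 (tanh) -> 10 (softmax), trained with RMSprop
model = Sequential()
model.add(Dense(100, activation='tanh', input_shape=(784,)))
model.add(Dense(100, activation='tanh'))
model.add(Dense(10, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(),
              metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=128, epochs=10,
#           validation_data=(x_test, y_test))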
Dropout
First, we addressed the overfitting problem by using dropout. Dropout randomly
shuts off some nodes and stops the gradients flowing through them, so forward
and backward propagation happen without those nodes. The remaining nodes then
have to pick up the slack and be more active during training. We used a
dropout factor equal to 0.2, which means each neuron is randomly discarded
with probability 0.2 (roughly 1 out of every 5 neurons).
From the above graphs we can observe that after using dropout the overfitting
problem is solved (validation accuracy increases and validation loss
decreases).
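A minimal sketch of how dropout is added after each hidden layer in this
model; a rate of 0.2 means each activation is zeroed with probability 0.2 at
training time:
from keras.models import Sequential
from keras.layers import Dense, Dropout

dropout_factor = 0.2
model = Sequential()
model.add(Dense(100, activation='tanh', input_shape=(784,)))
model.add(Dropout(dropout_factor))   # drop ~20% of activations during training
model.add(Dense(100, activation='tanh'))
model.add(Dropout(dropout_factor))
model.add(Dense(10, activation='softmax'))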
L2 Regularization
L2 regularization adds the “squared magnitude” of the coefficients as a
penalty term to the loss function. We used lambda = 0.001 as the
regularization factor.
From the above graphs we observe that it reduces overfitting to some extent,
but it is not as effective as dropout, because if we train for more epochs the
model may overfit again.
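A minimal sketch of attaching the L2 penalty to each hidden layer's weights
via kernel_regularizer, with lambda = 0.001 as above:
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2

l2_reg = 0.001   # regularization factor (lambda)
model = Sequential()
model.add(Dense(100, activation='tanh', kernel_regularizer=l2(l2_reg),
                input_shape=(784,)))
model.add(Dense(100, activation='tanh', kernel_regularizer=l2(l2_reg)))
model.add(Dense(10, activation='softmax'))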
Different Number of Hidden Layers
We change the number of hidden layers from 2 to 5 and observe its effect on
the accuracy of the model, keeping all other parameters the same as in the
base configuration.
We observed that accuracy increases slowly with the sigmoid function, while
relu and tanh increase faster than sigmoid. Tanh gives the best accuracy,
while sigmoid gives the lowest accuracy among the three.
We observe that with smaller batch sizes convergence is reached faster than
with bigger batch sizes, because smaller batch sizes give better
generalization of the data than bigger ones. We also observed that it takes
less time to train with a larger batch size than with a smaller one.
Deep Convolutional Neural Network
Configuration Used (a minimal sketch of this model follows the list):
● Input Layer: 784 Neurons
● Hidden Layers: 1 2D convolutional layer with 32 kernels of size 3 x 3
● Output Layer: 10 Neurons
● Optimizer: Adadelta
● Activation Function:
○ Hidden Layer: relu function
○ Output Layer: softmax
● Batch Size: 128
● Epochs: 10
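A minimal Keras sketch of this base CNN configuration, assuming the input is
reshaped to 28 x 28 x 1 as in the Codes section; the pooling and optional
dense layer used in the full helper are omitted here:
import keras
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

# one convolutional layer: 32 kernels of size 3 x 3 with relu activation
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
# model.fit(x_train, y_train, batch_size=128, epochs=10,
#           validation_data=(x_test, y_test))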
We can observe here that the validation accuracy drops after the 6th epoch and
again after the 8th epoch while the training accuracy keeps increasing, which
indicates overfitting.
Similarly, the validation loss increases after the 6th and 8th epochs while
the training loss keeps decreasing, which again points to overfitting.
Overfitting Improvements
To handle the case of overfitting we have used two techniques (a combined
sketch follows the list):
1. Dropout
2. L2 regularisation
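A minimal sketch of applying both techniques to the convolutional model,
assuming the same dropout factor (0.2) and regularization factor (0.001) as in
the earlier sections; the exact placement in our helper may differ:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.regularizers import l2

# conv layer with an L2 penalty on its kernel, plus dropout before the output
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', kernel_regularizer=l2(0.001),
                 input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.2))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))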
We can see from the above that accuracy first increases up to 20 epochs and
then starts decreasing, because with more epochs our model starts to overfit.
We observed that with a larger number of kernels we can capture more features,
since the number of possible feature combinations grows, and hence our model
converges faster and more accurately. For more complex datasets, it is
generally observed that using more kernels performs better.
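A sketch of the kernel-count sweep; the counts below are illustrative, and the
fit call is commented out:
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

for n_kernels in [8, 16, 32, 64]:   # illustrative kernel counts
    model = Sequential()
    model.add(Conv2D(n_kernels, (3, 3), activation='relu',
                     input_shape=(28, 28, 1)))
    model.add(Flatten())
    model.add(Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adadelta',
                  metrics=['accuracy'])
    # history = model.fit(x_train, y_train, batch_size=128, epochs=10,
    #                     validation_data=(x_test, y_test))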
Conclusion
● We observe that accuracy increases as the number of epochs increases, but
it also takes more time to train the model.
● As our dataset is image based, the CNN outperforms the other models because
it makes use of the spatial patterns found in the data.
● To avoid overfitting, a dataset with a large number of training examples
should be chosen.
Codes
Logistic Regression
import keras
from keras.datasets import mnist
from sklearn.linear_model import LogisticRegression
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
# Flatten the 28 x 28 images into 784-dimensional vectors and scale to [0, 1]
X_train = X_train.reshape(X_train.shape[0], 784) / 255.0
X_test = X_test.reshape(X_test.shape[0], 784) / 255.0
# Logistic Regression
# Default configuration: lbfgs solver, multinomial, C = 1, 100 iterations
logisticRegr = LogisticRegression(solver='lbfgs', multi_class='multinomial',
                                  C=1.0, max_iter=100)
logisticRegr.fit(X_train, Y_train)
score = logisticRegr.score(X_test, Y_test)   # test accuracy
print(score)
yl = logisticRegr.predict(X_test)            # predicted labels
print(yl)
Multi-Layer Perceptron
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2
from keras.optimizers import RMSprop
import matplotlib.pyplot as plt
from google.colab import files
# Excerpt from the train_model helper: build a stack of Dense layers with
# optional dropout and L2 regularization. The output layer, fit and evaluate
# calls are presumably part of the same helper but are not reproduced here.
model = Sequential()
model.add(Dense(hidden_layer_width, kernel_regularizer=l2(l2_reg),
                activation=activation_fn, input_shape=(784,)))
if dropout:
    model.add(Dropout(dropout_factor))
for i in range(1, layers):
    model.add(Dense(hidden_layer_width, kernel_regularizer=l2(l2_reg),
                    activation=activation_fn))
    if dropout:
        model.add(Dropout(dropout_factor))
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

# Excerpt from the print_plot helper: plot training/validation loss curves.
if print_loss:
    for history in histories:
        if print_train:
            plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
    plt.title(title)
    plt.ylabel("Loss")
    plt.xlabel("Epochs")
    plt.legend(legend, loc='best')
    plt.show()
# Base Configuration
number_of_hidden_layers = 1
hidden_layer_width = 512
optimizer = 'sgd'
activation_fn = "sigmoid"
batch_size = 128
epoch = 20
print_plot([history], "",
"Epochs", "Accuracy",
["Train Accuracy",
"Validation Accuracy (Test Accuracy: {})".format(score[1])],
True, True)
# Variation in number of hidden layers
history = []
label = []
for i in range(1, 4):
    [h, s] = train_model(layers=i,
                         hidden_layer_width=hidden_layer_width,
                         optimizer=optimizer,
                         activation_fn=activation_fn,
                         batch_size=batch_size,
                         epochs=epoch)
    history.append(h)
    label.append("No of Hidden Layers {} (Test Accuracy: {})".format(i, s[1]))

# Variation in activation function
for i in ['sigmoid', 'tanh', 'relu']:
    [h, s] = train_model(layers=number_of_hidden_layers,
                         hidden_layer_width=hidden_layer_width,
                         optimizer=optimizer,
                         activation_fn=i,
                         batch_size=batch_size,
                         epochs=epoch)
    history.append(h)
    label.append("Activation Function {} (Test Accuracy: {})".format(i, s[1]))

# Variation in epochs
history = []
label = []
for i in range(20, 81, 20):
    [h, s] = train_model(layers=number_of_hidden_layers,
                         hidden_layer_width=hidden_layer_width,
                         optimizer=optimizer,
                         activation_fn=activation_fn,
                         batch_size=batch_size,
                         epochs=i)
    history.append(h)
    label.append("Epochs {} (Test Accuracy: {})".format(i, s[1]))

# Variation in batch size
i = 128
while i < 1025:
    [h, s] = train_model(layers=number_of_hidden_layers,
                         hidden_layer_width=hidden_layer_width,
                         optimizer=optimizer,
                         activation_fn=activation_fn,
                         batch_size=i,
                         epochs=epoch)
    history.append(h)
    label.append("Batch Size {} (Test Accuracy: {})".format(i, s[1]))
    i *= 2
Deep Neural Network
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras.regularizers import l2
from keras.optimizers import RMSprop, SGD, Adadelta, Adam
import matplotlib.pyplot as plt
from google.colab import files
# Excerpt from the train_model helper: build a stack of Dense layers with
# optional dropout and L2 regularization. The output layer, fit and evaluate
# calls are presumably part of the same helper but are not reproduced here.
model = Sequential()
model.add(Dense(hidden_layer_width, kernel_regularizer=l2(l2_reg),
                activation=activation_fn, input_shape=(784,)))
if dropout:
    model.add(Dropout(dropout_factor))
for i in range(1, layers):
    model.add(Dense(hidden_layer_width, kernel_regularizer=l2(l2_reg),
                    activation=activation_fn))
    if dropout:
        model.add(Dropout(dropout_factor))
model.summary()
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])

# Excerpt from the print_plot helper: plot training/validation loss curves.
if print_loss:
    for history in histories:
        if print_train:
            plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
    plt.title(title)
    plt.ylabel("Loss")
    plt.xlabel("Epochs")
    plt.legend(legend, loc='best')
    plt.show()
# Base Configuration
number_of_hidden_layers = 2
hidden_layer_width = 100
optimizer = RMSprop()
activation_fn = "tanh"
batch_size = 128
epoch = 10
print_plot([history], "",
"Epochs", "Accuracy",
["Train Accuracy",
"Validation Accuracy (Test Accuracy: {})".format(score[1])],
True, True)
print_plot([history], "",
"Epochs", "Accuracy",
["Train Accuracy",
"Validation Accuracy (Test Accuracy: {})".format(score[1])],
True, True)
# Layers
history = []
label = []
for i in range(2, 6):
    [h, s] = train_model(layers=i,
                         hidden_layer_width=hidden_layer_width,
                         optimizer=optimizer,
                         activation_fn=activation_fn,
                         batch_size=batch_size,
                         epochs=epoch)
    history.append(h)

# Activation Function
history = []
label = []
for i in ['relu', 'sigmoid', 'tanh']:
    [h, s] = train_model(layers=number_of_hidden_layers,
                         hidden_layer_width=hidden_layer_width,
                         optimizer=optimizer,
                         activation_fn=i,
                         batch_size=batch_size,
                         epochs=epoch)
    history.append(h)
    label.append("Activation Function " + i + " (Test Accuracy " + str(s[1]) + ")")

# Variation in Epochs
history = []
label = []
for i in range(10, 41, 10):
    [h, s] = train_model(layers=number_of_hidden_layers,
                         hidden_layer_width=hidden_layer_width,
                         optimizer=optimizer,
                         activation_fn=activation_fn,
                         batch_size=batch_size,
                         epochs=i)
    history.append(h)
    label.append("Epochs " + str(i) + " (Test Accuracy " + str(s[1]) + ")")
Deep Convolutional Neural Network
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.regularizers import l2
from keras import backend as K
import matplotlib.pyplot as plt
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, 28, 28)
    x_test = x_test.reshape(x_test.shape[0], 1, 28, 28)
    input_shape = (1, 28, 28)
else:
    x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
    x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)
    input_shape = (28, 28, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
# Excerpt from the train_model helper for the CNN; the initial
# model = Sequential() and first Conv2D layer (no_of_kernels kernels of size
# kernel_size) are presumably defined just before this excerpt but are not
# reproduced in the listing.
if layers >= 2:
    model.add(Conv2D(64, kernel_size, activation=activation_fn,
                     kernel_regularizer=l2(l2_reg)))
model.add(MaxPooling2D(pool_size=(2, 2)))
if dropout:
    model.add(Dropout(dropout_factor))
model.add(Flatten())
if dense_layer:
    model.add(Dense(hidden_layer_width, activation=activation_fn))
    if dropout:
        model.add(Dropout(dropout_factor))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

# Excerpt from the print_plot helper: plot training/validation loss curves.
if print_loss:
    for history in histories:
        if print_train:
            plt.plot(history.history['loss'])
        plt.plot(history.history['val_loss'])
    plt.title(title)
    plt.ylabel("Loss")
    plt.xlabel("Epochs")
    plt.legend(legend, loc='best')
    plt.show()
# Base Configuration
number_of_hidden_layers = 1
hidden_layer_width = 128
activation_fn = 'relu'
epochs = 10
batch_size = 128
no_of_kernels = 32
kernel_size = (3, 3)
print_plot([history], "",
"Epochs", "Accuracy",
["Train Accuracy",
"Validation Accuracy (Test Accuracy: {})".format(score[1])],
True, True)
print_plot([history], "",
"Epochs", "Accuracy",
["Train Accuracy",
"Validation Accuracy (Test Accuracy: {})".format(score[1])],
True, True)
print_plot([history], "",
"Epochs", "Accuracy",
["Train Accuracy",
"Validation Accuracy (Test Accuracy: {})".format(score[1])],
True, True)
#variation in layers
history=[]
label=[]
[h,s]=train_model(layers=1,
hidden_layer_width=hidden_layer_width,
activation_fn=activation_fn,
batch_size=batch_size,
epochs=epochs,
kernel_size=kernel_size,
no_of_kernels=no_of_kernels,)
history.append(h)
label.append("No of Hidden Layers {} (Test Accuracy: {})".format(1,s[1]))
[h,s]=train_model(layers=2,
hidden_layer_width=hidden_layer_width,
activation_fn=activation_fn,
batch_size=batch_size,
epochs=epochs,
kernel_size=kernel_size,
no_of_kernels=no_of_kernels,)
history.append(h)
label.append("No of Hidden Layers {} (Test Accuracy: {})".format(2,s[1]))
[h,s]=train_model(layers=3,
hidden_layer_width=hidden_layer_width,
activation_fn=activation_fn,
batch_size=batch_size,
epochs=epochs,
kernel_size=kernel_size,
no_of_kernels=no_of_kernels, dense_layer=True)
history.append(h)
label.append("No of Hidden Layers {} (Test Accuracy: {})".format(3,s[1]))
print_plot(history,"Variation in No of Hidden
Layers","Epochs","Accuracy",label)
for i in ['sigmoid', 'tanh', 'relu']:
    [h, s] = train_model(layers=number_of_hidden_layers,
                         hidden_layer_width=hidden_layer_width,
                         activation_fn=i,
                         batch_size=batch_size,
                         epochs=epochs,
                         kernel_size=kernel_size,
                         no_of_kernels=no_of_kernels)
    history.append(h)
    label.append("Activation Function {} (Test Accuracy: {})".format(i, s[1]))
print_plot(history, "Variation in Activation Function", "Epochs", "Accuracy", label)
# Variation in epochs
history = []
label = []
for i in range(10, 41, 10):
    [h, s] = train_model(layers=number_of_hidden_layers,
                         hidden_layer_width=hidden_layer_width,
                         activation_fn=activation_fn,
                         batch_size=batch_size,
                         epochs=i,
                         kernel_size=kernel_size,
                         no_of_kernels=no_of_kernels)
    history.append(h)
    label.append("Epochs {} (Test Accuracy: {})".format(i, s[1]))
print_plot(history,"Variation in Epochs","Epochs","Accuracy",label)
# Variation in batch size
i = 128
while i < 1025:
    [h, s] = train_model(layers=number_of_hidden_layers,
                         hidden_layer_width=hidden_layer_width,
                         activation_fn=activation_fn,
                         batch_size=i,
                         epochs=epochs,
                         kernel_size=kernel_size,
                         no_of_kernels=no_of_kernels)
    history.append(h)
    label.append("Batch Size {} (Test Accuracy: {})".format(i, s[1]))
    i *= 2
print_plot(history,"Variation in No. of Kernels","Epochs","Accuracy",label)