TMA01 Question 2 (55 marks)

Name: Parth Shah


PI: E395923X
In this question, you will create some neural network models that use a dataset of natural
scenes. The dataset is currently hosted on Kaggle and has been divided into separate
datasets for this TMA. You will use all six classes of the dataset.
In this question, you are not being asked to select a "best" model. Therefore, for the
purposes of the question, you'll be using the test dataset to explore how models behave
differently on seen and unseen data. Don't do this on models you want to deploy, or in
subsequent TMAs!

Completing the TMA


• The tasks in this notebook can be addressed using the techniques discussed in the
  Foundation and Block 1 of the module materials, and the associated notebooks.
• You should be able to complete this question when you have completed the
  practical activities in Block 1.
• You should look at the notebooks for Block 1 while working through this
  question. You will find many useful examples in those notebooks which will help
  you in this assignment.
Record all your activity and observations in this notebook. Insert additional notebook cells
as required. Remember to run each cell in sequence and to rerun cells if you make any
changes in earlier cells.
Include Markdown cells (like this one) liberally in your solutions, to describe what you are
doing. This will help your tutor give full credit for all you have done, and is invaluable in
reminding you what you were doing when you return to the TMA after a few days away.
Before you submit your notebook make sure you run all cells in order and check that you
get the results you expect. (It is not unknown to receive notebooks which don't work when
the cells are run in order.)
See the VLE for details of how to submit your completed notebook. You should submit only
this notebook file for this question.

Marks are based on process, not results


In this notebook, you will be asked to create, train, and evaluate several neural networks.
Training neural networks is inherently a stochastic process, based on the random
allocation of initial weights and the shuffled order of training examples. Therefore, your
results will differ from results generated by other students, and those generated by the
module team and presented in the tutor's marking guide.
The marks in this question are awarded solely on your ability to carry out the steps of
training and evaluation, not on any particular results you may achieve. There are no
thresholds for accuracy (or any other metric) you must achieve. You will gain credit
for carrying out the tasks specified in this question, including honest evaluations of how the
models perform.

Setup
This imports the required libraries.
import tensorflow as tf

import os
import json

import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from IPython.display import HTML, display

BATCH_SIZE = 64

IMAGE_SIZE = (150, 150, 3)


IMAGE_RESCALE = (IMAGE_SIZE[0], IMAGE_SIZE[1])

# Where to find the data


base_dir = '/datasets/intel-multiclass/'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')
test_dir = os.path.join(base_dir, 'test')

label_list = !ls {train_dir}


label_list

['buildings', 'forest', 'glacier', 'mountain', 'sea', 'street']

# Human-sensible labels for the classification


class_names = {i: l for i, l in enumerate(sorted(label_list))}
class_numbers = {l: i for i, l in enumerate(sorted(label_list))}
num_classes = len(label_list)
class_names, class_numbers, num_classes

({0: 'buildings',
1: 'forest',
2: 'glacier',
3: 'mountain',
4: 'sea',
5: 'street'},
{'buildings': 0,
'forest': 1,
'glacier': 2,
'mountain': 3,
'sea': 4,
'street': 5},
6)

def lookup_class_label(label_text):
    return class_numbers[label_text.numpy().decode('utf-8')]

def load_image(image_path):
    # read the image from disk, decode it, resize it, and scale the
    # pixel intensities to the range [0, 1]
    image = tf.io.read_file(image_path)
    image = tf.io.decode_jpeg(image, channels=3)
    image = tf.image.resize(image, IMAGE_RESCALE)
    image /= 255.0
    # grab the label and encode it
    label_text = tf.strings.split(image_path, os.path.sep)[-2]
    label = tf.py_function(lookup_class_label, inp=[label_text],
                           Tout=tf.int32)
    encoded_label = tf.one_hot(label, num_classes)
    encoded_label = tf.ensure_shape(encoded_label, [num_classes])

    # return the image and the one-hot encoded label
    return (image, encoded_label)

train_dataset_files = tf.data.Dataset.list_files(
    os.path.join(train_dir, '*', '*.jpg'),
    shuffle=True)

train_data = train_dataset_files.map(load_image,
                                     num_parallel_calls=tf.data.AUTOTUNE)
train_data = train_data.cache()
train_data = train_data.shuffle(20000)
train_data = train_data.batch(BATCH_SIZE)
train_data = train_data.prefetch(tf.data.AUTOTUNE)

validation_dataset_files = tf.data.Dataset.list_files(
    os.path.join(validation_dir, '*', '*.jpg'),
    shuffle=True)

validation_data = validation_dataset_files.map(load_image,
                                               num_parallel_calls=tf.data.AUTOTUNE)
validation_data = validation_data.cache()
validation_data = validation_data.batch(BATCH_SIZE)
validation_data = validation_data.prefetch(tf.data.AUTOTUNE)

test_dataset_files = tf.data.Dataset.list_files(
    os.path.join(test_dir, '*', '*.jpg'),
    shuffle=True)

test_data = test_dataset_files.map(load_image,
                                   num_parallel_calls=tf.data.AUTOTUNE)
test_data = test_data.cache()
test_data = test_data.batch(BATCH_SIZE)
test_data = test_data.prefetch(tf.data.AUTOTUNE)

len(train_data), len(validation_data), len(test_data)

(182, 18, 67)

def pretty_cm(cm):
    result_table = '<h3>Confusion matrix</h3>\n'
    result_table += '<table border=1>\n'
    result_table += '<tr><td>&nbsp;</td><td>&nbsp;</td><th colspan=10>Predicted labels</th></tr>\n'
    result_table += '<tr><td>&nbsp;</td><td>&nbsp;</td>'

    for _, cn in sorted(class_names.items()):
        result_table += f'<td><strong>{cn}</strong></td>'
    result_table += '</tr>\n'

    result_table += '<tr>\n'
    result_table += '<th rowspan=11>Actual labels</th>\n'

    for ai, an in class_names.items():
        result_table += '<tr>\n'
        result_table += f' <td><strong>{an}</strong></td>\n'
        for pi, pn in sorted(class_names.items()):
            result_table += f' <td>{cm[ai, pi]}</td>\n'
        result_table += '</tr>\n'
    result_table += "</table>"
    # print(result_table)
    display(HTML(result_table))

Examining the data


sample_imgs, sample_labels = train_data.as_numpy_iterator().next()

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i+1)
    plt.imshow(sample_imgs[i])  # , cmap=plt.cm.binary
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.title(class_names[np.argmax(sample_labels[i])])
plt.show()
Validation and test labels
Use these for generating confusion matrices.
validation_labels = np.array(list(validation_data.unbatch().map(
    lambda x, y: y).as_numpy_iterator()))
validation_labels = np.argmax(validation_labels, axis=1)
validation_labels.shape

(1134,)

test_labels = np.array(list(test_data.unbatch().map(
    lambda x, y: y).as_numpy_iterator()))
test_labels = np.argmax(test_labels, axis=1)
test_labels.shape

(4257,)
Part a (15 marks)
Create a two-layer model, of 2048 neurons feeding into six output neurons. Note that the
initial Flatten layer should have an input_shape=IMAGE_SIZE (that is, (150, 150, 3)).
The 2048-neuron layer should use sigmoid activation. The 6-neuron layer should use
softmax activation.
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 flatten (Flatten)           (None, 67500)             0

 dense (Dense)               (None, 2048)              138242048

 dense_1 (Dense)             (None, 6)                 6150

=================================================================
Total params: 138,254,342
Trainable params: 138,254,342
Non-trainable params: 0
_________________________________________________________________

Store the model in a variable called model_a.


Use the SGD optimiser, with its default parameters, and train this model for 40 epochs.
Train using the train_data and validation_data defined above, as per the module
notebooks. As this is a multi-class classification, you must use the
categorical_crossentropy loss function for training.

(You may want to pass in the parameter verbose=0 to model_a.fit() to reduce the
output.)
Plot how the accuracy and loss change during training. Comment on your observations of
training.
Evaluate the model on the test dataset and generate a confusion matrix. Comment on this
evaluation and confusion matrix.
Hints
• When training for many epochs, you may want to pass in the parameter verbose=0
  to model_a.fit() to reduce the output verbiage. You may want to save the model
  and the training history so you can reload them in a later session.
model_a = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=IMAGE_SIZE),
    tf.keras.layers.Dense(2048, activation='sigmoid'),
    tf.keras.layers.Dense(6, activation='softmax')
])

opt = tf.keras.optimizers.SGD()
model_a.compile(
    optimizer=opt,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

history = model_a.fit(
    train_data,
    validation_data=validation_data,
    epochs=40,
    verbose=0
)
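
Following the hint above, here is a minimal optional sketch of saving the trained model and its training history so they can be reloaded in a later session. The file names are illustrative assumptions, not files supplied with the TMA, and the history values are cast to plain floats so they can be written as JSON.

model_a.save('model_a.h5')  # assumed file name; HDF5 format needs h5py installed
with open('model_a_history.json', 'w') as f:
    json.dump({k: [float(v) for v in vs] for k, vs in history.history.items()}, f)

# To reload in a later session (same assumed file names):
# model_a = tf.keras.models.load_model('model_a.h5')
# with open('model_a_history.json') as f:
#     history_dict = json.load(f)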

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'ro', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'ro', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
The training accuracy continues to increase at a steady rate, but the validation accuracy
flattens out between ~55% and ~60%. Similarly, the training loss continues to decrease at a
steady rate, but after 20 epochs the validation loss levels out between ~1.1 and ~1.2.
input_test = np.array(list(test_data.unbatch().map(
    lambda x, y: x).as_numpy_iterator()))
expected_output_test = np.array(list(test_data.unbatch().map(
    lambda x, y: y).as_numpy_iterator()))
expected_output_test = np.argmax(expected_output_test, axis=1)

model_a_predictions = model_a.predict(input_test)

134/134 [==============================] - 1s 5ms/step

cm_a = tf.math.confusion_matrix(
    expected_output_test,
    # extract the class with the highest probability
    np.argmax(model_a_predictions, axis=1),
)
pretty_cm(cm_a)

(Confusion matrix for model_a displayed as an HTML table.)

Values on the diagonal are high, which suggests that the model does a good job of
classifying.
The confusion between "mountain" and "street" has a value of 5, which is one of the lowest
values in the confusion matrix. This makes sense, as it is very easy to distinguish between a
mountain and a street. A picture of a street would have a definitive structure, i.e. buildings
on the left and right side, the sky in the middle, and perhaps bright street lights, whereas
rocks of a mountain would be dark and detailed, leading to a sharper image. This suggests
that the pixel values of both types of pictures may be in very different sections of the RGB
space, which means that a clear separation can be drawn by the ML model. This explains
the low mis-classification score.
On the other hand, the confusion between "glacier" and "sea" has a value of 253, one of the
highest values for misclassification in the confusion matrix. This also seems to align well
with intuition. It is harder to differentiate between a picture of a glacier and a sea, than it is
between a picture of a mountain and a street, even for humans. Glaciers have a blue hue
and pictures of a glacier are taken from a lower angle, as you typically point your camera
upwards when taking a picture of a tall structure, meaning that more of the blue sky is
included in the picture. Glaciers often have bodies of flowing water too, adding to the
confusion. Therefore, an image of a glacier is more likely to be concentrated in the blue
region of the RGB space, similar to that of an image of a sea. This means that it is harder to
draw a definitive line between the two classes, leading to higher mis-classification.
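
As an optional aside (not part of the marked answer), per-class recall can be computed directly from cm_a to quantify confusions like those discussed above; the exact values will differ between runs.

cm_a_np = cm_a.numpy()
# fraction of each actual class that was predicted correctly
recall_a = cm_a_np.diagonal() / cm_a_np.sum(axis=1)
for i, name in sorted(class_names.items()):
    print(f'{name}: {recall_a[i]:.2f}')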

Part b (10 marks)


Using the model created in part (a) as a base, create a new model but with an additional
Dense layer, of 128 neurons, between the two existing Dense layers.
The first two Dense layers should use the relu activation function.
Store the model in a variable called model_b.
Use the SGD optimizer for training, with a learning rate of 0.005. Train the model for 60
epochs.
Comment on the progress of training.
Evaluate the model and generate a confusion matrix, based on predictions of the test data.
model_b = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=IMAGE_SIZE),
    tf.keras.layers.Dense(2048, activation='sigmoid'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(6, activation='softmax')
])

opt = tf.keras.optimizers.SGD(learning_rate=0.005)

model_b.compile(
    optimizer=opt,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

history = model_b.fit(
    train_data,
    validation_data=validation_data,
    epochs=60,
    verbose=0
)

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'ro', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'ro', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
Due to the lower learning rate compared to 2(a), the model is able to home in on the
minimum more closely. The number of epochs has also increased from 40 to 60, and,
combined with the fact that the number of trainable parameters has increased with the
addition of a new layer (allowing finer weight and bias control), the model achieves much
better training accuracy and loss values than in 2(a). As the learning rate is low, the
training accuracy and loss show very little volatility. They improve at a stable rate, and if
the model were trained further, even better training metrics would be achieved.
However, the validation metrics seem to flatten out after 30 epochs. The loss converges
between ~1.1 and ~1.2, and the accuracy converges between ~55% and ~60%. These
findings are very similar to those of 2(a), despite training for 50% more epochs with a
lower learning rate and a more complex network that permits finer adjustments to fit
the data points.
The poor performance of the model on unseen data may be attributed to the nature of
the problem: there is a lot of variability between images of natural scenes, and hence the
model is not able to generalise well.
input_test = np.array(list(test_data.unbatch().map(
    lambda x, y: x).as_numpy_iterator()))
expected_output_test = np.array(list(test_data.unbatch().map(
    lambda x, y: y).as_numpy_iterator()))
expected_output_test = np.argmax(expected_output_test, axis=1)

model_b_predictions = model_b.predict(input_test)

134/134 [==============================] - 1s 5ms/step

cm_b = tf.math.confusion_matrix(
    expected_output_test,
    # extract the class with the highest probability
    np.argmax(model_b_predictions, axis=1),
)
pretty_cm(cm_b)

(Confusion matrix for model_b displayed as an HTML table.)

0.7774762550881954

Again, the values on the diagonal are high, suggesting that the model mostly does a good
job. However, there are certain pairs of classes where misclassification is very high, such as
"buildings" and "sea", "buildings" and "street", and "buildings" and "mountain". The
"forest" class has the best classification rate, with 573 out of 737 (~77%) forest images
predicted correctly. This value is much better than the overall test accuracy.

Part c (5 marks)
Evaluate the two models you have created on all three of the training, validation, and test
datasets. Compare the two confusion matrices you created above. Compare and comment
on the performance of the two models.
train_metric_a = model_a.evaluate(train_data, return_dict=True)
validation_metric_a = model_a.evaluate(validation_data, return_dict=True)
test_metric_a = model_a.evaluate(test_data, return_dict=True)

print("--------------------------------------------------------------------")

train_metric_b = model_b.evaluate(train_data, return_dict=True)
validation_metric_b = model_b.evaluate(validation_data, return_dict=True)
test_metric_b = model_b.evaluate(test_data, return_dict=True)

182/182 [==============================] - 2s 9ms/step - loss: 0.7941 - accuracy: 0.7093
18/18 [==============================] - 0s 9ms/step - loss: 1.1655 - accuracy: 0.5432
67/67 [==============================] - 1s 9ms/step - loss: 1.1440 - accuracy: 0.5699
--------------------------------------------------------------------
182/182 [==============================] - 2s 9ms/step - loss: 0.5866 - accuracy: 0.7901
18/18 [==============================] - 0s 9ms/step - loss: 1.0909 - accuracy: 0.6199
67/67 [==============================] - 1s 9ms/step - loss: 1.0964 - accuracy: 0.6119

Model A performs significantly worse than Model B. On the training dataset, Model A has
~9% lower accuracy than Model B, while on the validation and test datasets the difference
is closer to ~5%. This can be attributed to Model B's additional layer, which introduces
more trainable parameters into the network and therefore allows it to classify data points
better. Model B's better performance may also be due to its use of the ReLU activation
function in the added layer, whereas Model A uses only sigmoid. Sigmoid is more prone
than ReLU to the vanishing gradient problem, where the gradient becomes so small that
SGD converges to a suboptimal solution.
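
To illustrate the vanishing-gradient point, here is a small optional sketch (not part of the original answer): the sigmoid's gradient shrinks towards zero as its input grows in magnitude, whereas ReLU's gradient stays at 1 for positive inputs.

x = tf.constant([0.0, 5.0, 10.0])
with tf.GradientTape() as tape:
    tape.watch(x)
    y = tf.math.sigmoid(x)
# element-wise gradients of sigmoid; they shrink towards 0 as the input grows
print(tape.gradient(y, x).numpy())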
pretty_cm(cm_a - cm_b)

(Difference of the two confusion matrices, cm_a - cm_b, displayed as an HTML table.)

Subtracting Model B's confusion matrix from Model A's reveals that Model B better
classifies forests, glaciers, mountains, and streets (shown by the negative values on the
diagonal for those classes), and Model A better classifies buildings and seas.

Part d (15 marks)


You will now investigate the effect of changing the topology and other hyperparameters of
the model.
Train two other models, based on the model in part (b). Store the models in distinct
variable names.
• One model should have a different structure but use the same hyperparameters for
training.
• The other should have the same structure as in part (b) but use different
hyperparameters for training.
Describe the changes you make in each case.
Use your judgment for how to change the models for this part.
For each model, comment on how well the training went, evaluate the model on the test
data, and compare to the models found in parts (a) and (b). State whether the changes had
a significant effect on the model's performance.
model_c = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=IMAGE_SIZE),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(2048, activation='relu'),
    tf.keras.layers.Dense(6, activation='softmax')
])

opt = tf.keras.optimizers.SGD(learning_rate=0.005)

model_c.compile(
    optimizer=opt,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

• Reduced the number of neurons in the first hidden layer from 2048 to 512.
• Changed the activation function of the first hidden layer to ReLU.
• Increased the number of neurons in the second hidden layer from 128 to 2048.
model_d = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=IMAGE_SIZE),
    tf.keras.layers.Dense(2048, activation='sigmoid'),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(6, activation='softmax')
])

opt = tf.keras.optimizers.SGD(learning_rate=0.05)

model_d.compile(
    optimizer=opt,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

• Increased the learning rate from 0.005 to 0.05.
• Increased the number of epochs from 60 to 100.
history_c = model_c.fit(
    train_data,
    validation_data=validation_data,
    epochs=60,
    verbose=0
)

history_d = model_d.fit(
    train_data,
    validation_data=validation_data,
    epochs=100
)

Epoch 1/100
182/182 [==============================] - 4s 22ms/step - loss: 1.6335
- accuracy: 0.3600 - val_loss: 1.3252 - val_accuracy: 0.4727
Epoch 2/100
182/182 [==============================] - 4s 21ms/step - loss: 1.3603
- accuracy: 0.4592 - val_loss: 1.2728 - val_accuracy: 0.4780
Epoch 3/100
182/182 [==============================] - 4s 21ms/step - loss: 1.2926
- accuracy: 0.4888 - val_loss: 1.2897 - val_accuracy: 0.4771
Epoch 4/100
182/182 [==============================] - 4s 21ms/step - loss: 1.2447
- accuracy: 0.5116 - val_loss: 1.2432 - val_accuracy: 0.5088
Epoch 5/100
182/182 [==============================] - 4s 21ms/step - loss: 1.2125
- accuracy: 0.5282 - val_loss: 1.2347 - val_accuracy: 0.5379
Epoch 6/100
182/182 [==============================] - 4s 21ms/step - loss: 1.1720
- accuracy: 0.5466 - val_loss: 1.1698 - val_accuracy: 0.5591
Epoch 7/100
182/182 [==============================] - 4s 21ms/step - loss: 1.1531
- accuracy: 0.5581 - val_loss: 1.1479 - val_accuracy: 0.5370
Epoch 8/100
182/182 [==============================] - 4s 21ms/step - loss: 1.1167
- accuracy: 0.5741 - val_loss: 1.1411 - val_accuracy: 0.5564
Epoch 9/100
182/182 [==============================] - 4s 21ms/step - loss: 1.1070
- accuracy: 0.5765 - val_loss: 1.1412 - val_accuracy: 0.5529
Epoch 10/100
182/182 [==============================] - 4s 21ms/step - loss: 1.0904
- accuracy: 0.5814 - val_loss: 1.1089 - val_accuracy: 0.5750
Epoch 11/100
182/182 [==============================] - 4s 21ms/step - loss: 1.0654
- accuracy: 0.5888 - val_loss: 1.1244 - val_accuracy: 0.5697
Epoch 12/100
182/182 [==============================] - 4s 21ms/step - loss: 1.0350
- accuracy: 0.6041 - val_loss: 1.1194 - val_accuracy: 0.5697
Epoch 13/100
182/182 [==============================] - 4s 22ms/step - loss: 1.0172
- accuracy: 0.6116 - val_loss: 1.1596 - val_accuracy: 0.5661
Epoch 14/100
182/182 [==============================] - 4s 21ms/step - loss: 1.0011
- accuracy: 0.6169 - val_loss: 1.0806 - val_accuracy: 0.5847
Epoch 15/100
182/182 [==============================] - 4s 21ms/step - loss: 0.9837
- accuracy: 0.6208 - val_loss: 1.1948 - val_accuracy: 0.5529
Epoch 16/100
182/182 [==============================] - 4s 21ms/step - loss: 0.9815
- accuracy: 0.6225 - val_loss: 1.1036 - val_accuracy: 0.5838
Epoch 17/100
182/182 [==============================] - 4s 21ms/step - loss: 0.9505
- accuracy: 0.6385 - val_loss: 1.1724 - val_accuracy: 0.5335
Epoch 18/100
182/182 [==============================] - 4s 21ms/step - loss: 0.9320
- accuracy: 0.6440 - val_loss: 1.1902 - val_accuracy: 0.5582
Epoch 19/100
182/182 [==============================] - 4s 21ms/step - loss: 0.9268
- accuracy: 0.6474 - val_loss: 1.0941 - val_accuracy: 0.5864
Epoch 20/100
182/182 [==============================] - 4s 21ms/step - loss: 0.8930
- accuracy: 0.6614 - val_loss: 1.1162 - val_accuracy: 0.5582
Epoch 21/100
182/182 [==============================] - 4s 21ms/step - loss: 0.8735
- accuracy: 0.6674 - val_loss: 1.0817 - val_accuracy: 0.5970
Epoch 22/100
182/182 [==============================] - 4s 21ms/step - loss: 0.8786
- accuracy: 0.6720 - val_loss: 1.1273 - val_accuracy: 0.5591
Epoch 23/100
182/182 [==============================] - 4s 21ms/step - loss: 0.8449
- accuracy: 0.6748 - val_loss: 1.0737 - val_accuracy: 0.5961
Epoch 24/100
182/182 [==============================] - 4s 21ms/step - loss: 0.8232
- accuracy: 0.6856 - val_loss: 1.1459 - val_accuracy: 0.5741
Epoch 25/100
182/182 [==============================] - 4s 21ms/step - loss: 0.8243
- accuracy: 0.6881 - val_loss: 1.0914 - val_accuracy: 0.5882
Epoch 26/100
182/182 [==============================] - 4s 21ms/step - loss: 0.8115
- accuracy: 0.6908 - val_loss: 1.0560 - val_accuracy: 0.6014
Epoch 27/100
182/182 [==============================] - 4s 21ms/step - loss: 0.7800
- accuracy: 0.7057 - val_loss: 1.0695 - val_accuracy: 0.6111
Epoch 28/100
182/182 [==============================] - 4s 21ms/step - loss: 0.7836
- accuracy: 0.7051 - val_loss: 1.1282 - val_accuracy: 0.5750
Epoch 29/100
182/182 [==============================] - 4s 21ms/step - loss: 0.7525
- accuracy: 0.7130 - val_loss: 1.1220 - val_accuracy: 0.5873
Epoch 30/100
182/182 [==============================] - 4s 21ms/step - loss: 0.7502
- accuracy: 0.7233 - val_loss: 1.0420 - val_accuracy: 0.6208
Epoch 31/100
182/182 [==============================] - 4s 21ms/step - loss: 0.7257
- accuracy: 0.7242 - val_loss: 1.1284 - val_accuracy: 0.5767
Epoch 32/100
182/182 [==============================] - 4s 21ms/step - loss: 0.7101
- accuracy: 0.7310 - val_loss: 1.0988 - val_accuracy: 0.5979
Epoch 33/100
182/182 [==============================] - 4s 21ms/step - loss: 0.7007
- accuracy: 0.7383 - val_loss: 1.0678 - val_accuracy: 0.6190
Epoch 34/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6724
- accuracy: 0.7453 - val_loss: 1.2091 - val_accuracy: 0.5908
Epoch 35/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6502
- accuracy: 0.7531 - val_loss: 1.0968 - val_accuracy: 0.6129
Epoch 36/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6563
- accuracy: 0.7559 - val_loss: 1.1310 - val_accuracy: 0.5952
Epoch 37/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6251
- accuracy: 0.7669 - val_loss: 1.1051 - val_accuracy: 0.6058
Epoch 38/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6307
- accuracy: 0.7684 - val_loss: 1.1777 - val_accuracy: 0.5944
Epoch 39/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6015
- accuracy: 0.7741 - val_loss: 1.1868 - val_accuracy: 0.6085
Epoch 40/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5912
- accuracy: 0.7824 - val_loss: 1.1754 - val_accuracy: 0.5855
Epoch 41/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5584
- accuracy: 0.7928 - val_loss: 1.1265 - val_accuracy: 0.6146
Epoch 42/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5551
- accuracy: 0.7962 - val_loss: 1.1984 - val_accuracy: 0.5944
Epoch 43/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5288
- accuracy: 0.8086 - val_loss: 1.7243 - val_accuracy: 0.4938
Epoch 44/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5284
- accuracy: 0.8048 - val_loss: 1.7496 - val_accuracy: 0.4621
Epoch 45/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5072
- accuracy: 0.8139 - val_loss: 1.1717 - val_accuracy: 0.6138
Epoch 46/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5180
- accuracy: 0.8075 - val_loss: 1.2915 - val_accuracy: 0.5547
Epoch 47/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4708
- accuracy: 0.8259 - val_loss: 1.1599 - val_accuracy: 0.6208
Epoch 48/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5305
- accuracy: 0.8019 - val_loss: 1.2184 - val_accuracy: 0.5829
Epoch 49/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4204
- accuracy: 0.8490 - val_loss: 1.2576 - val_accuracy: 0.6164
Epoch 50/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4421
- accuracy: 0.8402 - val_loss: 1.2287 - val_accuracy: 0.6243
Epoch 51/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4274
- accuracy: 0.8431 - val_loss: 1.2189 - val_accuracy: 0.6190
Epoch 52/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5648
- accuracy: 0.8036 - val_loss: 1.2398 - val_accuracy: 0.6146
Epoch 53/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4184
- accuracy: 0.8469 - val_loss: 1.2334 - val_accuracy: 0.6085
Epoch 54/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5389
- accuracy: 0.8067 - val_loss: 1.2504 - val_accuracy: 0.5794
Epoch 55/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5052
- accuracy: 0.8185 - val_loss: 1.2343 - val_accuracy: 0.6041
Epoch 56/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4847
- accuracy: 0.8293 - val_loss: 1.2706 - val_accuracy: 0.5864
Epoch 57/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4040
- accuracy: 0.8548 - val_loss: 1.3521 - val_accuracy: 0.5935
Epoch 58/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4236
- accuracy: 0.8491 - val_loss: 1.2492 - val_accuracy: 0.6032
Epoch 59/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4126
- accuracy: 0.8522 - val_loss: 1.3871 - val_accuracy: 0.5891
Epoch 60/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3742
- accuracy: 0.8674 - val_loss: 1.6285 - val_accuracy: 0.5123
Epoch 61/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3408
- accuracy: 0.8769 - val_loss: 1.6896 - val_accuracy: 0.5767
Epoch 62/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3307
- accuracy: 0.8796 - val_loss: 1.2892 - val_accuracy: 0.6049
Epoch 63/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3869
- accuracy: 0.8712 - val_loss: 1.2416 - val_accuracy: 0.6243
Epoch 64/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2860
- accuracy: 0.8980 - val_loss: 1.3422 - val_accuracy: 0.6199
Epoch 65/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3346
- accuracy: 0.8770 - val_loss: 1.5340 - val_accuracy: 0.5794
Epoch 66/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6544
- accuracy: 0.7757 - val_loss: 1.1911 - val_accuracy: 0.5670
Epoch 67/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5327
- accuracy: 0.8043 - val_loss: 1.3770 - val_accuracy: 0.5899
Epoch 68/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3869
- accuracy: 0.8658 - val_loss: 1.7137 - val_accuracy: 0.5018
Epoch 69/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3421
- accuracy: 0.8763 - val_loss: 1.3768 - val_accuracy: 0.5961
Epoch 70/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2747
- accuracy: 0.9040 - val_loss: 1.4370 - val_accuracy: 0.6076
Epoch 71/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2779
- accuracy: 0.9017 - val_loss: 1.6268 - val_accuracy: 0.5944
Epoch 72/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2756
- accuracy: 0.9006 - val_loss: 1.6559 - val_accuracy: 0.5511
Epoch 73/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3364
- accuracy: 0.8859 - val_loss: 1.5114 - val_accuracy: 0.6085
Epoch 74/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4220
- accuracy: 0.8530 - val_loss: 1.2488 - val_accuracy: 0.6032
Epoch 75/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2879
- accuracy: 0.8994 - val_loss: 1.3877 - val_accuracy: 0.6155
Epoch 76/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6074
- accuracy: 0.7932 - val_loss: 1.2990 - val_accuracy: 0.6049
Epoch 77/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2475
- accuracy: 0.9115 - val_loss: 1.3998 - val_accuracy: 0.6217
Epoch 78/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2344
- accuracy: 0.9175 - val_loss: 1.4864 - val_accuracy: 0.5838
Epoch 79/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2232
- accuracy: 0.9184 - val_loss: 1.5302 - val_accuracy: 0.5864
Epoch 80/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4999
- accuracy: 0.8276 - val_loss: 1.3673 - val_accuracy: 0.6093
Epoch 81/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3939
- accuracy: 0.8752 - val_loss: 1.3134 - val_accuracy: 0.5635
Epoch 82/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4110
- accuracy: 0.8531 - val_loss: 1.4103 - val_accuracy: 0.6005
Epoch 83/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3073
- accuracy: 0.8951 - val_loss: 1.4412 - val_accuracy: 0.6164
Epoch 84/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2702
- accuracy: 0.9070 - val_loss: 1.3318 - val_accuracy: 0.6332
Epoch 85/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2140
- accuracy: 0.9254 - val_loss: 1.3811 - val_accuracy: 0.6093
Epoch 86/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2184
- accuracy: 0.9242 - val_loss: 1.5797 - val_accuracy: 0.6049
Epoch 87/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2580
- accuracy: 0.9112 - val_loss: 1.6112 - val_accuracy: 0.5573
Epoch 88/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2025
- accuracy: 0.9303 - val_loss: 1.5805 - val_accuracy: 0.5996
Epoch 89/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3299
- accuracy: 0.8891 - val_loss: 1.5787 - val_accuracy: 0.6296
Epoch 90/100
182/182 [==============================] - 4s 21ms/step - loss: 0.1675
- accuracy: 0.9442 - val_loss: 1.4891 - val_accuracy: 0.6093
Epoch 91/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4774
- accuracy: 0.8410 - val_loss: 1.3164 - val_accuracy: 0.5132
Epoch 92/100
182/182 [==============================] - 4s 21ms/step - loss: 0.5685
- accuracy: 0.7981 - val_loss: 1.5222 - val_accuracy: 0.5714
Epoch 93/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2238
- accuracy: 0.9205 - val_loss: 1.6442 - val_accuracy: 0.5988
Epoch 94/100
182/182 [==============================] - 4s 21ms/step - loss: 0.6244
- accuracy: 0.7930 - val_loss: 1.5795 - val_accuracy: 0.5467
Epoch 95/100
182/182 [==============================] - 4s 21ms/step - loss: 0.4518
- accuracy: 0.8329 - val_loss: 1.3896 - val_accuracy: 0.5750
Epoch 96/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3457
- accuracy: 0.8705 - val_loss: 1.4078 - val_accuracy: 0.5952
Epoch 97/100
182/182 [==============================] - 4s 22ms/step - loss: 0.2039
- accuracy: 0.9290 - val_loss: 2.6695 - val_accuracy: 0.4982
Epoch 98/100
182/182 [==============================] - 4s 21ms/step - loss: 0.7958
- accuracy: 0.7153 - val_loss: 1.4076 - val_accuracy: 0.6164
Epoch 99/100
182/182 [==============================] - 4s 21ms/step - loss: 0.2727
- accuracy: 0.9010 - val_loss: 1.4954 - val_accuracy: 0.6085
Epoch 100/100
182/182 [==============================] - 4s 21ms/step - loss: 0.3225
- accuracy: 0.8907 - val_loss: 1.3947 - val_accuracy: 0.6199

acc = history_c.history['accuracy']
val_acc = history_c.history['val_accuracy']
loss = history_c.history['loss']
val_loss = history_c.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'ro', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'ro', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
The training accuracy and loss improved at a stable rate: the training loss fell to around
0.3 while the training accuracy reached over 90%. These results are better than those
achieved in 2(a) and 2(b). The validation accuracy and loss are volatile, just as in 2(a) and
2(b): the accuracy oscillates between ~55% and ~60%, and the loss between ~1.2 and ~1.3.
It is worth noting that the validation loss begins to increase gradually from the 20th epoch.
This is a sign of over-fitting: the model has over-tuned itself to the intricacies of the
training data, and therefore it starts to perform poorly on unseen, real-world data.
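
One way the over-fitting noted above could be limited (a hedged sketch, not used in this answer) is early stopping: training halts once the validation loss stops improving, and the best weights seen so far are restored. The patience value here is an illustrative assumption, not a tuned setting.

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss',          # watch the validation loss
    patience=5,                  # stop after 5 epochs with no improvement
    restore_best_weights=True)   # keep the weights from the best epoch
# e.g. model_c.fit(train_data, validation_data=validation_data,
#                  epochs=60, callbacks=[early_stop], verbose=0)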
acc = history_d.history['accuracy']
val_acc = history_d.history['val_accuracy']
loss = history_d.history['loss']
val_loss = history_d.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'ro', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()

plt.figure()

plt.plot(epochs, loss, 'ro', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()

plt.show()
You tend to use the word "volatile"; this may be true, but there is a general trend in the validation data, which is mostly stagnant after 10 epochs.

As the learning rate was increased, the training and validation metrics are much more
volatile than in the previous model. However, this model reaches excellent training loss and
accuracy values during the later epochs, which was not seen in the earlier models.
This model also suffers from over-fitting, as the validation loss gradually increases from the
20th epoch. As with all the other models, though, the validation accuracy does not move much
above 60%.
It may be worth using a dynamic learning rate, specifically one which decreases in later
epochs. This would help prevent the final epochs giving worse metrics than earlier epochs,
as can be seen in the graphs above.
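
A minimal sketch of one such decreasing learning rate, using Keras's built-in exponential-decay schedule; the decay parameters are illustrative assumptions, not tuned values.

lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.05,
    decay_steps=len(train_data) * 10,  # halve roughly every 10 epochs
    decay_rate=0.5)
opt_decay = tf.keras.optimizers.SGD(learning_rate=lr_schedule)
# the optimiser then reduces the step size automatically as training progresses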
train_metric_c = model_c.evaluate(train_data, return_dict=True)
validation_metric_c = model_c.evaluate(validation_data, return_dict=True)
test_metric_c = model_c.evaluate(test_data, return_dict=True)

print("--------------------------------------------------------------------")

train_metric_d = model_d.evaluate(train_data, return_dict=True)
validation_metric_d = model_d.evaluate(validation_data, return_dict=True)
test_metric_d = model_d.evaluate(test_data, return_dict=True)

182/182 [==============================] - 1s 7ms/step - loss: 0.3123 - accuracy: 0.8932
18/18 [==============================] - 0s 7ms/step - loss: 1.3791 - accuracy: 0.5811
67/67 [==============================] - 0s 7ms/step - loss: 1.2967 - accuracy: 0.5990
--------------------------------------------------------------------
182/182 [==============================] - 2s 9ms/step - loss: 0.1737 - accuracy: 0.9476
18/18 [==============================] - 0s 9ms/step - loss: 1.3947 - accuracy: 0.6199
67/67 [==============================] - 1s 9ms/step - loss: 1.3453 - accuracy: 0.6321

Both models performed much better on the training data than the model in 2(b). The
validation and test metrics were also slightly better.
It’s quite clear that the lower learning rate improves matters, but that training should not go on
for very long.

Part e (10 marks)


Compare the performance of these models with the ones you built in Block 1, for
handwritten digit classification.
• Compare the accuracy of the trained models on unseen data. (2 marks)
• Which dataset is the more challenging to classify? Give two reasons why you think
this is. (3 marks)
• What do your results here suggest for how to improve the models you have created?
(2 marks)
• Consider the task of classifying images in a photo collection. Would any of the
models created in this TMA be useful for that task? Justify your answer. (3 marks)

Compare the accuracy of the trained models on unseen data.


The MNIST model is much more accurate on unseen data than the models created in this
TMA. The best accuracy the models in this TMA achieved was approximately 60%, whereas
the MNIST model achieved 91% accuracy on test data; in fact, its test results outperformed
its training results. The models created in this TMA achieved good training accuracy, but
their test accuracy was generally 20-30 percentage points behind.

Which dataset is the more challenging to classify? Give two reasons why you think this is.
The dataset in this TMA is more challenging to classify because of (1) the nature of the
problem, and (2) the large amount of overlap between classes such as "glacier" and "sea".
The MNIST dataset consists of 28 x 28 images with only greyscale pixel values. Not only
are the images much smaller than the 3-channel 150 x 150 colour images in this TMA
(keeping the network's input small enough to allow more complex networks), the problem
of classifying handwritten digits is also much simpler than classifying natural scenes, due to
the limited scope of the former. There are only a limited number of ways a digit can be
handwritten, whereas natural scenes such as buildings and streets overlap considerably and
vary massively across different parts of the world. It is easier for a model to learn a small,
consistent set of features than to cope with the huge variability of natural scenes.
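
For scale, a small worked comparison of the input sizes quoted above:

print(28 * 28)        # 784 input values per greyscale MNIST image
print(150 * 150 * 3)  # 67500 input values per image in this dataset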
What do your results here suggest for how to improve the models you have created?
The models created in this TMA suggest that over-fitting is a major issue. This could be
combated by reducing the number of epochs to a level that stops the model over-tuning
itself to the training data while still avoiding under-fitting. Further, the models do not
achieve accuracy scores of more than about 60% on unseen data. Since the images that could
be fed to the model can vary significantly, the model may benefit from a larger number of
training samples. This would help it learn more diverse patterns and perform better on
unseen data.
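
Where more real samples are not available, one hedged option (beyond what the question asks for, and assuming a TensorFlow version that provides these preprocessing layers, roughly 2.6 or later) is to augment the existing images to simulate a more varied training set; the specific layer choices below are illustrative assumptions.

augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip('horizontal'),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
])

# apply random transformations to each training batch, leaving labels unchanged
augmented_train_data = train_data.map(
    lambda x, y: (augmentation(x, training=True), y),
    num_parallel_calls=tf.data.AUTOTUNE)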

Consider the task of classifying images in a photo collection. Would any of the models created in
this TMA be useful for that task? Justify your answer.
This depends on the variance within the photo collection. For example, if the photo
collection was of different natural scenes within the same locality, the models could be
useful for that task, as it can be assumed that each image within a class is very similar to
others in the same class. However, if the photo collection consists of natural scenes from
vastly different parts of the world, the models may not perform well. Consider the example
of a typical American street with wide roads and large sidewalks, compared with narrow
British streets and alleyways. The models may not be able to learn that both are indeed
streets, due to the large variation between the two images, and therefore may not perform
very well.

The accuracies are low, and several categories are frequently confused with each other. That
would mean that many images (around 40%) would be misclassified. In addition, this is still a
restricted set of classes to distinguish, and most people are unlikely to have many photos of
these topics. Therefore, the models would be of little practical use.
