TensorFlow Neural Network Lab
In this lab, you'll use all the tools you learned from Introduction to TensorFlow to label images of English
letters! The data you are using, notMNIST, consists of images of a letter from A to J in different fonts.
The above images are a few examples of the data you'll be training on. After training the network, you will
compare your prediction model against test data. Your goal, by the end of this lab, is to make predictions
against that test set with at least an 80% accuracy. Let's jump in!
To start this lab, you first need to import all the necessary modules. Run the code below. If it runs
successfully, it will print "All modules imported".
In [ ]: import hashlib
import os
import pickle
from urllib.request import urlretrieve
import numpy as np
from PIL import Image
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelBinarizer
from sklearn.utils import resample
from tqdm import tqdm
from zipfile import ZipFile
print('All modules imported.')
The notMNIST dataset is too large for many computers to handle. It contains 500,000 images for just
training. You'll be using a subset of this data, 15,000 images for each label (A-J).
In [ ]: def download(url, file):
    """
    Download file from <url>
    :param url: URL to file
    :param file: Local file path
    """
    if not os.path.isfile(file):
        print('Downloading ' + file + '...')
        urlretrieve(url, file)
        print('Download Finished')

# Download the training and test dataset.
download('https://s3.amazonaws.com/udacity-sdc/notMNIST_train.zip', 'notMNIST_train.zip')
download('https://s3.amazonaws.com/udacity-sdc/notMNIST_test.zip', 'notMNIST_test.zip')

# Make sure the files aren't corrupted
assert hashlib.md5(open('notMNIST_train.zip', 'rb').read()).hexdigest() == 'c8673b3f28f489e9cdf3a3d74e2ac8fa', \
    'notMNIST_train.zip file is corrupted. Remove the file and try again.'
assert hashlib.md5(open('notMNIST_test.zip', 'rb').read()).hexdigest() == '5d3c7e653e63471c88df796156a9dfa9', \
    'notMNIST_test.zip file is corrupted. Remove the file and try again.'

# Wait until you see that all files have been downloaded.
print('All files downloaded.')
In [ ]: def uncompress_features_labels(file):
    """
    Uncompress features and labels from a zip file
    :param file: The zip file to extract the data from
    """
    features = []
    labels = []

    with ZipFile(file) as zipf:
        # Progress Bar
        filenames_pbar = tqdm(zipf.namelist(), unit='files')

        # Get features and labels from all files
        for filename in filenames_pbar:
            # Check if the file is a directory
            if not filename.endswith('/'):
                with zipf.open(filename) as image_file:
                    image = Image.open(image_file)
                    image.load()
                    # Load image data as 1 dimensional array
                    # We're using float32 to save on memory space
                    feature = np.array(image, dtype=np.float32).flatten()

                # Get the letter from the filename. This is the letter of the image.
                label = os.path.split(filename)[1][0]

                features.append(feature)
                labels.append(label)
    return np.array(features), np.array(labels)

# Get the features and labels from the zip files
train_features, train_labels = uncompress_features_labels('notMNIST_train.zip')
test_features, test_labels = uncompress_features_labels('notMNIST_test.zip')

# Limit the amount of data to work with a docker container
docker_size_limit = 150000
train_features, train_labels = resample(train_features, train_labels, n_samples=docker_size_limit)

# Set flags for feature engineering. This will prevent you from skipping an important step.
is_features_normal = False
is_labels_encod = False

# Wait until you see that all features and labels have been uncompressed.
print('All features and labels uncompressed.')
Problem 1
The first problem involves normalizing the features for your training and test data.
Implement Min-Max scaling in the normalize_grayscale() function to a range of a=0.1 and b=0.9.
After scaling, the values of the pixels in the input data should range from 0.1 to 0.9.
Since the raw notMNIST image data is in grayscale, the current values range from a min of 0 to a max of
255.
Min-Max Scaling: $X' = a + \frac{(X - X_{min})(b - a)}{X_{max} - X_{min}}$
If you're having trouble solving problem 1, you can view the solution here.
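For orientation, here is a minimal sketch of Min-Max scaling under the stated assumptions (grayscale pixels in [0, 255], target range [0.1, 0.9]). It is one possible approach, not necessarily the linked solution; the function name is illustrative only.

import numpy as np

def normalize_grayscale_sketch(image_data):
    """One possible Min-Max scaling sketch (assumes pixel values in [0, 255])."""
    a, b = 0.1, 0.9            # target range
    x_min, x_max = 0.0, 255.0  # grayscale value range
    return a + (image_data - x_min) * (b - a) / (x_max - x_min)

print(normalize_grayscale_sketch(np.array([0, 128, 255])))  # ~[0.1, 0.5, 0.9]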
In [ ]: # Problem 1 - Implement Min-Max scaling for grayscale image data
def normalize_grayscale(image_data):
    """
    Normalize the image data with Min-Max scaling to a range of [0.1, 0.9]
    :param image_data: The image data to be normalized
    :return: Normalized image data
    """
    # TODO: Implement Min-Max scaling for grayscale image data


### DON'T MODIFY ANYTHING BELOW ###
# Test Cases
np.testing.assert_array_almost_equal(
    normalize_grayscale(np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 255])),
    [0.1, 0.103137254902, 0.106274509804, 0.109411764706, 0.112549019608, 0.11568627451,
     0.118823529412, 0.121960784314, 0.125098039216, 0.128235294118, 0.13137254902, 0.9],
    decimal=3)
np.testing.assert_array_almost_equal(
    normalize_grayscale(np.array([0, 1, 10, 20, 30, 40, 233, 244, 254, 255])),
    [0.1, 0.103137254902, 0.13137254902, 0.162745098039, 0.194117647059, 0.225490196078,
     0.830980392157, 0.865490196078, 0.896862745098, 0.9])

if not is_features_normal:
    train_features = normalize_grayscale(train_features)
    test_features = normalize_grayscale(test_features)
    is_features_normal = True

print('Tests Passed!')
In [ ]: if not is_labels_encod:
    # Turn labels into numbers and apply One-Hot Encoding
    encoder = LabelBinarizer()
    encoder.fit(train_labels)
    train_labels = encoder.transform(train_labels)
    test_labels = encoder.transform(test_labels)

    # Change to float32, so it can be multiplied against the features in TensorFlow, which are float32
    train_labels = train_labels.astype(np.float32)
    test_labels = test_labels.astype(np.float32)
    is_labels_encod = True

print('Labels One-Hot Encoded')
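If you're curious what LabelBinarizer produces, here is a tiny standalone illustration with a hypothetical three-letter alphabet (the lab itself uses A-J):

from sklearn.preprocessing import LabelBinarizer

demo_encoder = LabelBinarizer().fit(['A', 'B', 'C'])  # hypothetical alphabet
print(demo_encoder.transform(['B', 'A']))  # [[0 1 0]
                                           #  [1 0 0]]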
In [ ]: assert is_features_normal, 'You skipped the step to normalize the features'
assert is_labels_encod, 'You skipped the step to One-Hot Encode the labels'

# Get randomized datasets for training and validation
train_features, valid_features, train_labels, valid_labels = train_test_split(
    train_features,
    train_labels,
    test_size=0.05,
    random_state=832289)

print('Training features and labels randomized and split.')
In [ ]: # Save the data for easy access
pickle_file = 'notMNIST.pickle'
if not os.path.isfile(pickle_file):
    print('Saving data to pickle file...')
    try:
        with open(pickle_file, 'wb') as pfile:
            pickle.dump(
                {
                    'train_dataset': train_features,
                    'train_labels': train_labels,
                    'valid_dataset': valid_features,
                    'valid_labels': valid_labels,
                    'test_dataset': test_features,
                    'test_labels': test_labels,
                },
                pfile, pickle.HIGHEST_PROTOCOL)
    except Exception as e:
        print('Unable to save data to', pickle_file, ':', e)
        raise

print('Data cached in pickle file.')
Checkpoint
All your progress is now saved to the pickle file. If you need to leave and come back to this lab, you no longer have to start from the beginning. Just run the code block below and it will load all the data and modules required to proceed.
In [ ]: %matplotlib inline

# Load the modules
import pickle
import math
import numpy as np
import tensorflow as tf
from tqdm import tqdm
import matplotlib.pyplot as plt

# Reload the data
pickle_file = 'notMNIST.pickle'
with open(pickle_file, 'rb') as f:
    pickle_data = pickle.load(f)
    train_features = pickle_data['train_dataset']
    train_labels = pickle_data['train_labels']
    valid_features = pickle_data['valid_dataset']
    valid_labels = pickle_data['valid_labels']
    test_features = pickle_data['test_dataset']
    test_labels = pickle_data['test_labels']
    del pickle_data  # Free up memory

print('Data and modules loaded.')
Problem 2
Now it's time to build a simple neural network using TensorFlow. Here, your network will be just an input layer
and an output layer.
For the input, each image has been flattened into a vector of 784 features (28 × 28 pixels). Since we're trying to predict which letter the image shows, there are 10 output units, one for each label (A-J). Of course, feel free to add hidden layers if you want, but this notebook is built to guide you through a single layer network.

For the neural network to train on your data, you need the following float32 tensors:

features
    Placeholder tensor for feature data (train_features/valid_features/test_features)
labels
    Placeholder tensor for label data (train_labels/valid_labels/test_labels)
weights
    Variable Tensor with random numbers from a truncated normal distribution.
    See `tf.truncated_normal()` documentation for help.
biases
    Variable Tensor with all zeros.
    See `tf.zeros()` documentation for help.
If you're having trouble solving problem 2, review the "TensorFlow Linear Function" section of the class. If that doesn't help, the solution for this problem is available here.
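For orientation, one way to define these four tensors with the TensorFlow 1.x API used throughout this notebook is sketched below. Treat it as a sketch under those assumptions, not necessarily the linked solution.

# One possible sketch (TensorFlow 1.x API):
features = tf.placeholder(tf.float32, [None, 784])      # one row per image, 784 pixels
labels = tf.placeholder(tf.float32, [None, 10])         # one-hot labels, 10 classes
weights = tf.Variable(tf.truncated_normal((784, 10)))   # random initial weights
biases = tf.Variable(tf.zeros(10))                      # biases start at zero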
In [ ]: # All the pixels in the image (28 * 28 = 784)
features_count = 784
# All the labels
labels_count = 10

# TODO: Set the features and labels tensors
# features =
# labels =

# TODO: Set the weights and biases tensors
# weights =
# biases =


### DON'T MODIFY ANYTHING BELOW ###

# Test Cases
from tensorflow.python.ops.variables import Variable

assert features._op.name.startswith('Placeholder'), 'features must be a placeholder'
assert labels._op.name.startswith('Placeholder'), 'labels must be a placeholder'
assert isinstance(weights, Variable), 'weights must be a TensorFlow variable'
assert isinstance(biases, Variable), 'biases must be a TensorFlow variable'

assert features._shape == None or (\
    features._shape.dims[0].value is None and\
    features._shape.dims[1].value in [None, 784]), 'The shape of features is incorrect'
assert labels._shape == None or (\
    labels._shape.dims[0].value is None and\
    labels._shape.dims[1].value in [None, 10]), 'The shape of labels is incorrect'
assert weights._variable._shape == (784, 10), 'The shape of weights is incorrect'
assert biases._variable._shape == (10), 'The shape of biases is incorrect'

assert features._dtype == tf.float32, 'features must be type float32'
assert labels._dtype == tf.float32, 'labels must be type float32'

# Feed dicts for training, validation, and test session
train_feed_dict = {features: train_features, labels: train_labels}
valid_feed_dict = {features: valid_features, labels: valid_labels}
test_feed_dict = {features: test_features, labels: test_labels}

# Linear Function WX + b
logits = tf.matmul(features, weights) + biases

prediction = tf.nn.softmax(logits)
# Cross entropy
cross_entropy = -tf.reduce_sum(labels * tf.log(prediction), reduction_indices=1)

# Training loss
loss = tf.reduce_mean(cross_entropy)

# Create an operation that initializes all variables
init = tf.global_variables_initializer()

# Test Cases
with tf.Session() as session:
    session.run(init)
    session.run(loss, feed_dict=train_feed_dict)
    session.run(loss, feed_dict=valid_feed_dict)
    session.run(loss, feed_dict=test_feed_dict)
    biases_data = session.run(biases)

assert not np.count_nonzero(biases_data), 'biases must be zeros'

print('Tests Passed!')
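As a sanity check on what the cross entropy above computes, here is a tiny NumPy-only example with hypothetical numbers (one one-hot label against one softmax prediction):

import numpy as np

label = np.array([0., 1., 0.])            # one-hot: the true class is index 1
prediction = np.array([0.1, 0.7, 0.2])    # hypothetical softmax output
cross_entropy = -np.sum(label * np.log(prediction))
print(cross_entropy)  # -log(0.7) ≈ 0.357; a more confident correct prediction gives a smaller value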
In [ ]: # Determine if the predictions are correct
is_correct_prediction = tf.equal(tf.argmax(prediction, 1), tf.argmax(labels, 1))
# Calculate the accuracy of the predictions
accuracy = tf.reduce_mean(tf.cast(is_correct_prediction, tf.float32))

print('Accuracy function created.')
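The accuracy op mirrors the NumPy computation below (hypothetical predictions and labels, two classes for brevity):

import numpy as np

preds = np.array([[0.8, 0.2], [0.3, 0.7], [0.6, 0.4]])  # hypothetical softmax outputs
labels_demo = np.array([[1, 0], [0, 1], [0, 1]])        # one-hot ground truth
correct = np.argmax(preds, 1) == np.argmax(labels_demo, 1)
print(correct.mean())  # 2 of 3 predictions correct -> 0.666...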
Problem 3
Below are 2 parameter configurations for training the neural network. In each configuration, one of the parameters has multiple options. For each configuration, choose the option that gives the best accuracy.

Parameter configurations:

Configuration 1
    Epochs: 1
    Learning Rate: 0.8, 0.5, 0.1, 0.05, 0.01

Configuration 2
    Epochs: 1, 2, 3, 4, 5
    Learning Rate: 0.2

The code will print out a Loss and Accuracy graph, so you can see how well the neural network performed.
If you're having trouble solving problem 3, you can view the solution here.
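A reminder of what the optimizer below is doing may help you reason about these options: plain gradient descent updates each weight by w ← w − learning_rate · ∂loss/∂w, so a rate that is too large overshoots the minimum while one that is too small barely moves. The toy 1-D example below (a hypothetical quadratic loss, not the notebook's network) shows the effect:

# Toy loss L(w) = (w - 3)^2, gradient 2*(w - 3), minimum at w = 3.
for lr in (1.2, 0.8, 0.01):
    w = 0.0
    for _ in range(25):
        w -= lr * 2 * (w - 3)  # one gradient descent step
    print('lr={}: w after 25 steps = {:.3f}'.format(lr, w))
# lr=1.2 overshoots and diverges, lr=0.8 converges, lr=0.01 is still far from 3.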
In [ ]: # Change if you have memory restrictions
batch_size = 128

# TODO: Find the best parameters for each configuration
# epochs =
# learning_rate =


### DON'T MODIFY ANYTHING BELOW ###

# Gradient Descent
optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)

# The accuracy measured against the validation set
validation_accuracy = 0.0

# Measurements used for graphing loss and accuracy
log_batch_step = 50
batches = []
loss_batch = []
train_acc_batch = []
valid_acc_batch = []

with tf.Session() as session:
    session.run(init)
    batch_count = int(math.ceil(len(train_features)/batch_size))

    for epoch_i in range(epochs):
        # Progress bar
        batches_pbar = tqdm(range(batch_count), desc='Epoch {:>2}/{}'.format(epoch_i+1, epochs), unit='batches')

        # The training cycle
        for batch_i in batches_pbar:
            # Get a batch of training features and labels
            batch_start = batch_i*batch_size
            batch_features = train_features[batch_start:batch_start + batch_size]
            batch_labels = train_labels[batch_start:batch_start + batch_size]

            # Run optimizer and get loss
            _, l = session.run(
                [optimizer, loss],
                feed_dict={features: batch_features, labels: batch_labels})

            # Log every 50 batches
            if not batch_i % log_batch_step:
                # Calculate Training and Validation accuracy
                training_accuracy = session.run(accuracy, feed_dict=train_feed_dict)
                validation_accuracy = session.run(accuracy, feed_dict=valid_feed_dict)

                # Log batches
                previous_batch = batches[-1] if batches else 0
                batches.append(log_batch_step + previous_batch)
                loss_batch.append(l)
                train_acc_batch.append(training_accuracy)
                valid_acc_batch.append(validation_accuracy)

        # Check accuracy against Validation data
        validation_accuracy = session.run(accuracy, feed_dict=valid_feed_dict)

loss_plot = plt.subplot(211)
loss_plot.set_title('Loss')
loss_plot.plot(batches, loss_batch, 'g')
loss_plot.set_xlim([batches[0], batches[-1]])
acc_plot = plt.subplot(212)
acc_plot.set_title('Accuracy')
acc_plot.plot(batches, train_acc_batch, 'r', label='Training Accuracy')
acc_plot.plot(batches, valid_acc_batch, 'x', label='Validation Accuracy')
acc_plot.set_ylim([0, 1.0])
acc_plot.set_xlim([batches[0], batches[-1]])
acc_plot.legend(loc=4)
plt.tight_layout()
plt.show()

print('Validation accuracy at {}'.format(validation_accuracy))
Test
You're going to test your model against your hold-out dataset/testing data. This will give you a good indicator of how well the model will do in the real world. You should have a test accuracy of at least 80%.
In [ ]: ### DON'T MODIFY ANYTHING BELOW ###
# The accuracy measured against the test set
test_accuracy = 0.0

with tf.Session() as session:
    session.run(init)
    batch_count = int(math.ceil(len(train_features)/batch_size))

    for epoch_i in range(epochs):
        # Progress bar
        batches_pbar = tqdm(range(batch_count), desc='Epoch {:>2}/{}'.format(epoch_i+1, epochs), unit='batches')

        # The training cycle
        for batch_i in batches_pbar:
            # Get a batch of training features and labels
            batch_start = batch_i*batch_size
            batch_features = train_features[batch_start:batch_start + batch_size]
            batch_labels = train_labels[batch_start:batch_start + batch_size]

            # Run optimizer
            _ = session.run(optimizer, feed_dict={features: batch_features, labels: batch_labels})

        # Check accuracy against Test data
        test_accuracy = session.run(accuracy, feed_dict=test_feed_dict)

assert test_accuracy >= 0.80, 'Test accuracy at {}, should be equal to or greater than 0.80'.format(test_accuracy)
print('Nice Job! Test Accuracy is {}'.format(test_accuracy))
Multiple layers
Good job! You built a one-layer TensorFlow network! However, you might want to build more than one layer.
This is deep learning after all! In the next section, you will start to satisfy your need for more layers.