Deep Learning With Python Sample
Thank you for your interest in Deep Learning with Python, Second Edition.
This is just a sample of the full text. You can purchase the complete book online from:
https://machinelearningmastery.com/deep-learning-with-python/
Disclaimer
The information contained within this eBook is strictly for educational purposes. If you wish to
apply ideas contained in this eBook, you are taking full responsibility for your actions.
The author has made every effort to ensure the accuracy of the information within this book was
correct at time of publication. The author does not assume and hereby disclaims any liability to any
party for any loss, damage, or disruption caused by errors or omissions, whether such errors or
omissions result from accident, negligence, or any other cause.
No part of this eBook may be reproduced or transmitted in any form or by any means, electronic or
mechanical, recording or by any information storage and retrieval system, without written
permission from the author.
Credits
Authors: Jason Brownlee, Adrian Tam, Zhe Ming Chng
Lead Editor: Adrian Tam
Technical Reviewers: Darci Heikkinen, Jerry Yiu, Amy Lam
Copyright
Deep Learning with Python, Second Edition
© 2016–2022 MachineLearningMastery.com. All Rights Reserved.
Edition: v2.00
Preface
Deep learning is a fascinating field. Artificial neural networks have been around for a long
time, but something special has happened in recent years. The mixture of new, faster hardware,
new techniques, and highly optimized open source libraries allows very large networks to be
created with frightening ease.
This new wave of much larger and much deeper neural networks is also impressively
skillful on a range of problems. I have watched over recent years as they tackled and
handily became state-of-the-art across a range of difficult problem domains, not least object
recognition, speech recognition, sentiment classification, translation, and more.
When a technique comes along that does so well on such a broad set of problems, you have
to pay attention. The problem is: where do you start with deep learning? I created this book
because I thought that there was no gentle way for Python machine learning practitioners to
quickly get started developing deep learning models.
In developing the lessons in this book, I chose the best-of-breed Python deep learning
library called Keras, which abstracts away all of the complexity, ruthlessly leaving you an API
containing only what you need to know to efficiently develop and evaluate neural network
models.
This is the guide that I wish I had when I started applying deep learning to machine learning
problems. I hope that you find it useful on your own projects and have as much fun applying
deep learning as I did in creating this book for you.
Jason Brownlee
Melbourne, Australia
2016
Preface to Second Edition
Deep learning is evolving fast, and so are the deep learning libraries. If we compare the tools
we had in 2016 with those of 2022, there is a fascinating change in a mere six years.
When the first edition of this book was written, Keras and TensorFlow were separate
libraries, and TensorFlow at the time was unintuitive and difficult to use. Nowadays, the two
are combined into one, and the other backends for Keras, such as Theano, have ceased
development. The eager execution syntax of TensorFlow also brings it into a new age.
Therefore, we need to bring this book up to date with new code that runs in the modern
version of Keras.
As with the first edition, we try to cover a broad range of topics in deep learning without
going too deep. We cover enough to help you see why deep learning models are impressive.
This book is intended for practitioners, and therefore each chapter is a gentle introduction to
an idea in deep learning without distracting you with too much theory and mathematics. We
also enriched the first edition with more examples and tools, thanks to the new features
included in the libraries over the past few years.
As in the first edition, we hope this book helps you start applying deep learning quickly.
It is a book of examples that you can copy, and by doing so, you should get a jump-start in
using the TensorFlow and Keras libraries on practical problems. This can be a stepping stone
to something bigger, and your journey starts here!
Jason Brownlee
Melbourne, Australia
Zhe Ming Chng
Singapore
Adrian Tam
New York, U.S.A.
2022
Introduction
Welcome to Deep Learning with Python, Second Edition. This book is your guide to deep
learning in Python. You will discover the Keras library in TensorFlow for deep learning and
how to use it to develop and evaluate deep learning models. In this book you will discover the
techniques, recipes and skills in deep learning that you can then bring to your own machine
learning projects.
Deep learning does have a lot of fascinating math under the covers, but you do not need
to know it to be able to pick it up as a tool and wield it on important projects and deliver real
value. From the applied perspective, deep learning is quite a shallow field and a motivated
developer can quickly pick it up and start making very real and impactful contributions. This
is our goal for you and this book is your ticket to that outcome.
Book Organization
There are three kinds of chapters in this book.
⊲ Lessons, where you learn about specific features of neural network models and how
to use specific aspects of the Keras API.
⊲ Projects, where you will pull together multiple lessons into an end-to-end project and
deliver a result, providing a template for your own projects.
⊲ Recipes, where you can copy and paste the standalone code into your own project,
including all of the code presented in this book.
Part 1: Background
In this part you will learn about the TensorFlow and Keras libraries that lay the foundation
for your deep learning journey. This part of the book includes the following lessons:
⊲ Overview of Some Deep Learning Libraries
⊲ Introduction to TensorFlow
⊲ Using Autograd in TensorFlow to Solve a Regression Problem
⊲ Introduction to Keras
The lessons will introduce you to the important foundational libraries that you need to install
and use on your workstation. You will also learn about the relationship between TensorFlow
and Keras. At the end of this part you will be ready to start developing models in Keras on
your workstation.
In this part you will learn about the finer points of the Keras library and API for neural
networks that you need to know in order to deliver world-class results. This part of
the book includes the following lessons:
⊲ Use Keras Deep Learning Models with scikit-learn
⊲ How to Grid Search Hyperparameters for Deep Learning Models
⊲ Save and Load Your Keras Model with Serialization
⊲ Keep the Best Models During Training with Checkpointing
⊲ Understand Model Behavior During Training by Plotting History
⊲ Using Activation Functions in Neural Networks
⊲ Loss Functions in TensorFlow
⊲ Reduce Overfitting with Dropout Regularization
⊲ Lift Performance with Learning Rate Schedules
⊲ Introduction to the tf.data API
At the end of this part you will know how to confidently wield Keras on your machine learning
projects with a focus on the finer points of investigating model performance, persisting models
for later use and gaining lifts in performance over baseline models.
Conclusions
The book concludes with some resources that you can use to learn more information about a
specific topic or find help if you need it, as you start to develop and evaluate your own deep
learning models.
Recipes
Building up a catalog of code recipes is an important part of your deep learning journey. Each
time you learn about a new technique or new problem type, you should write up a short code
recipe that demonstrates it. This will give you a starting point to use on your next deep
learning or machine learning project.
As part of this book you will receive a catalog of deep learning recipes. This includes
recipes for all of the lessons presented in this book, as well as complete code for all of the
projects. You are strongly encouraged to add to and build upon this catalog of recipes as you
expand your use and knowledge of deep learning in Python.
Python
You do not need to be a Python expert, but you do need a working Python environment,
whether on your own workstation or in the cloud. You will be guided as to how to install the
deep learning libraries TensorFlow and Keras in Part I of the book. If you have trouble, you
can follow the step-by-step tutorial in Appendix B.
Machine Learning
You do not need to be a machine learning expert, but it would be helpful if you knew how
to navigate a small machine learning problem using scikit-learn. Basic concepts like cross-
validation and one-hot encoding that are used in the lessons and projects are described, but
only briefly.
There are resources to go into these topics in more detail at the end of the book, but some
knowledge of these areas might make things easier for you.
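For instance, the following minimal scikit-learn sketch (an illustration, not one of the book's recipes) shows both of the concepts mentioned above:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import OneHotEncoder

# one-hot encoding turns categories into binary indicator columns
labels = np.array([['red'], ['green'], ['blue'], ['green']])
print(OneHotEncoder().fit_transform(labels).toarray())

# k-fold cross-validation estimates model skill on held-out data
X = np.random.rand(100, 5)
y = np.random.randint(0, 2, size=100)
print(cross_val_score(LogisticRegression(), X, y, cv=5))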
Deep Learning
You do not need to know the math and theory of deep learning algorithms, but it would be
helpful to have some basic ideas of the field. You will get a crash course in neural network
terminology and models, but we will not go into much detail. Again, there will be resources
for more information at the end of the book, but it might be helpful if you can start with
some ideas about neural networks.
Note: All tutorials can be completed on standard workstation hardware with a CPU.
A GPU is not required. Some tutorials later in the book can be sped up significantly by
running on a GPU, and a suggestion to consider using GPU hardware is provided at the
beginning of those sections. You can access GPU hardware easily and cheaply in the cloud,
and a step-by-step procedure for doing so is given in Appendix C.
To get the very most from this book, we recommend taking each lesson and project and
building upon them. Attempt to improve the results, apply the method to a similar but
different problem, and so on. Write up what you tried or learned and share it on your blog
or social media, or send us an email at jason@MachineLearningMastery.com. This book is
really what you make of it, and by putting in a little extra, you can quickly become a true
force in applied deep learning.
Summary
It is a special time right now. The tools for applied deep learning have never been so good.
The pace of change with neural networks and deep learning feels like it has never been so fast,
spurred by the amazing results that the methods are showing in such a broad range of fields.
This is the start of your journey into deep learning. Take your time, have fun, and we are
excited to see where you take this amazing technology.
Next
Let’s dive in. Next up is Part I where you will take a whirlwind tour of the foundation libraries
for deep learning in Python, namely the numerical library TensorFlow and the library you
will be using throughout this book called Keras.
3 Using Autograd in TensorFlow to Solve a Regression Problem
We usually use TensorFlow to build a neural network. However, TensorFlow is not limited
to this. Behind the scenes, TensorFlow is a tensor library with automatic differentiation
capability. Hence you can easily use it to solve a numerical optimization problem with
gradient descent, which is the algorithm used to train a neural network. In this chapter, you will
learn how TensorFlow’s automatic differentiation engine, autograd, works. After finishing
this chapter, you will know:
⊲ What is autograd in TensorFlow
⊲ How to make use of autograd and an optimizer to solve an optimization problem
Let’s get started.
Overview
This chapter is in three parts; they are:
⊲ Autograd in TensorFlow
⊲ Using Autograd for Polynomial Regression
⊲ Using Autograd to Solve a Math Puzzle
3.1 Autograd in TensorFlow
TensorFlow 2.x runs in eager execution mode by default, and its tensors behave much like
NumPy arrays. For example, you can create a constant vector as follows:

import tensorflow as tf
x = tf.constant([1, 2, 3])
print(x)
print(x.shape)
print(x.dtype)
This creates an integer vector (in the form of a Tensor object). This vector can work like a
NumPy vector in most cases. For example, you can do x+x or 2*x, and the result is just what
you would expect. TensorFlow comes with many functions for array manipulation that match
NumPy, such as tf.transpose or tf.concat.
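As a quick illustrative sketch (not one of the book's listings), the following shows a few of these NumPy-like operations in action:

import tensorflow as tf

x = tf.constant([1, 2, 3])
print(x + x)                      # tf.Tensor([2 4 6], shape=(3,), dtype=int32)
print(2 * x)                      # tf.Tensor([2 4 6], shape=(3,), dtype=int32)
print(tf.concat([x, x], axis=0))  # [1 2 3 1 2 3]

m = tf.constant([[1, 2], [3, 4]])
print(tf.transpose(m))            # [[1 3], [2 4]]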
Creating variables in TensorFlow is just the same, for example:
import tensorflow as tf
x = tf.Variable([1, 2, 3])
print(x)
print(x.shape)
print(x.dtype)
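A minimal sketch (not one of the book's listings) of updating a variable in place with its assign() method:

import tensorflow as tf

x = tf.Variable([1, 2, 3])
x.assign([4, 5, 6])          # allowed: variables are mutable
print(x)

y = tf.constant([1, 2, 3])
# y.assign([4, 5, 6])        # would fail: constant tensors are immutable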
The operations (such as x+x and 2*x) that you can apply to Tensor objects can also be applied
to variables. The difference between variables and constants is that the former allows the
value to change while the latter is immutable. This distinction is important when you run a
gradient tape as follows:
import tensorflow as tf

x = tf.Variable(3.6)

with tf.GradientTape() as tape:
    y = x * x

dy = tape.gradient(y, x)
print(dy)
This prints:

tf.Tensor(7.2, shape=(), dtype=float32)
What it does is the following: it defines a variable x (with value 3.6) and then creates a
gradient tape. While the gradient tape is working, it computes y=x*x, or y = x². The gradient
tape monitors how the variables are manipulated. Afterward, you ask the gradient tape
to find the derivative dy/dx. You know y = x² means dy/dx = 2x, hence the output gives
you a value of 3.6 × 2 = 7.2.
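Note that the tape tracks Variable objects automatically; a constant is tracked only if you ask for it explicitly with tape.watch(). A minimal sketch (not one of the book's listings):

import tensorflow as tf

x = tf.constant(3.6)
with tf.GradientTape() as tape:
    tape.watch(x)    # constants must be watched explicitly
    y = x * x
print(tape.gradient(y, x))   # tf.Tensor(7.2, shape=(), dtype=float32)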
3.2 Using Autograd for Polynomial Regression
To illustrate, consider the quadratic polynomial f(x) = x² + 2x + 3, which you can define with NumPy:

import numpy as np

# define the polynomial f(x) = x^2 + 2x + 3
polynomial = np.poly1d([1, 2, 3])
print(polynomial)

This prints:

   2
1 x + 2 x + 3

You can evaluate the polynomial at a point, for example:

print(polynomial(1.5))
Now you can generate a number of samples from this function using NumPy:

N = 20   # number of samples

# generate random samples roughly between -10 and +10
# (the exact sampling scheme is an assumption; any spread of x values works)
X = np.random.randn(N, 1) * 5
Y = polynomial(X)
In the above, both X and Y are NumPy arrays of the shape (20,1), and they are related as
y = f(x) for the polynomial f(x).
Now, assume you do not know what the polynomial is, except that it is quadratic, and you
want to recover the coefficients. Since a quadratic polynomial is of the form Ax² + Bx + C,
you have three unknowns to find. You can find them using a gradient descent algorithm
you implement yourself or an existing gradient descent optimizer. The following demonstrates
how it works:
import tensorflow as tf

# prepare input as an array of shape (N, 3) with columns x^2, x, 1
XX = np.hstack([X*X, X, np.ones_like(X)])

w = tf.Variable(tf.random.normal((3, 1)))   # the 3 coefficients A, B, C
x = tf.constant(XX, dtype=tf.float32)       # input samples
y = tf.constant(Y, dtype=tf.float32)        # output samples
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)  # optimizer choice assumed
print(w)

for _ in range(1000):
    with tf.GradientTape() as tape:
        y_pred = x @ w
        mse = tf.reduce_sum(tf.square(y - y_pred))
    grad = tape.gradient(mse, w)
    optimizer.apply_gradients([(grad, w)])

print(w)
The print statement before the for loop gives three random numbers, while the one after the
for loop gives you coefficients very close to those in the polynomial.
What the above code does is the following: First, it creates a variable vector w of 3 values,
namely the coefficients A, B, and C. Then you create an array of shape (N, 3), in which N is the
number of samples in the array X. This array has 3 columns, which are the values of x², x,
and 1, respectively. Such an array is built from the vector X using the np.hstack() function.
Similarly, you build the TensorFlow constant y from the NumPy array Y. Afterward, you use
a for-loop to run gradient descent for 1,000 iterations. In each iteration, you compute x × w
in matrix form to find Ax² + Bx + C and assign it to the variable y_pred. Then, you compare y
and y_pred and find the mean square error. Next, you derive the gradient, i.e., the rate of change
of the mean square error with respect to the coefficients w, and based on this gradient, you
use gradient descent to update w.
In essence, the above code finds the coefficients w that minimize the mean square
error. Putting everything together, the following is the complete code:
import numpy as np
import tensorflow as tf

polynomial = np.poly1d([1, 2, 3])
N = 20   # number of samples

# generate random samples roughly between -10 and +10 (assumed sampling scheme)
X = np.random.randn(N, 1) * 5
Y = polynomial(X)

# prepare input as an array of shape (N, 3) with columns x^2, x, 1
XX = np.hstack([X*X, X, np.ones_like(X)])

w = tf.Variable(tf.random.normal((3, 1)))   # the 3 coefficients
x = tf.constant(XX, dtype=tf.float32)       # input samples
y = tf.constant(Y, dtype=tf.float32)        # output samples
optimizer = tf.keras.optimizers.Nadam(learning_rate=0.01)  # optimizer choice assumed
print(w)

# Run optimizer
for _ in range(1000):
    with tf.GradientTape() as tape:
        y_pred = x @ w
        mse = tf.reduce_sum(tf.square(y - y_pred))
    grad = tape.gradient(mse, w)
    optimizer.apply_gradients([(grad, w)])

print(w)
3.3 Using Autograd to Solve a Math Puzzle
You can also use autograd to solve a math puzzle. For example, find the values of A, B, C,
and D such that:

A + B = 8
C − D = 6
A + C = 13
B + D = 8
import random
import tensorflow as tf

A = tf.Variable(random.random())
B = tf.Variable(random.random())
C = tf.Variable(random.random())
D = tf.Variable(random.random())

optimizer = tf.keras.optimizers.Nadam(learning_rate=0.1)  # optimizer choice assumed
for _ in range(1000):
    with tf.GradientTape() as tape:
        y1 = A + B - 8
        y2 = C - D - 6
        y3 = A + C - 13
        y4 = B + D - 8
        sqerr = y1*y1 + y2*y2 + y3*y3 + y4*y4
    gradA, gradB, gradC, gradD = tape.gradient(sqerr, [A, B, C, D])
    optimizer.apply_gradients([(gradA, A), (gradB, B), (gradC, C), (gradD, D)])

print(A)
print(B)
print(C)
print(D)
There can be multiple solutions to this problem. One solution the above code finds is
A = 3.5, B = 4.5, C = 9.5, and D = 3.5. You can verify that this solution fits the problem.
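As a quick sanity check (a hedged aside, not from the book's listings), you can substitute the solution back into the four equations:

# verify the recovered solution against the four equations
A, B, C, D = 3.5, 4.5, 9.5, 3.5
print(A + B, C - D, A + C, B + D)   # expect: 8.0 6.0 13.0 8.0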
The above code defines the four unknowns as variables with random initial values. Then
you compute the result of the four equations and compare it to the expected answer. You
then sum up the squared error and ask TensorFlow to minimize it. The minimum possible
squared error is zero, attained when our solution exactly fits the problem.
Note the way the gradient tape is asked to produce the gradients: you ask for the gradient
of sqerr with respect to A, B, C, and D in a single call to tape.gradient(), so four gradients
are found at once, and you then apply each gradient to its respective variable in each iteration.
Looking for the gradients in four separate calls to tape.gradient() would not work, because
by default the gradients from a tape can only be recalled once.
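If you do need to query a tape more than once, TensorFlow lets you create it with persistent=True. A minimal sketch (not one of the book's listings):

import tensorflow as tf

x = tf.Variable(3.6)
with tf.GradientTape(persistent=True) as tape:
    y = x * x
    z = x * x * x
# a persistent tape can be queried multiple times
print(tape.gradient(y, x))   # 2x = 7.2
print(tape.gradient(z, x))   # 3x^2 = 38.88
del tape                     # release the resources held by the tape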
3.4 Further Reading
Articles
Introduction to gradients and automatic differentiation. TensorFlow.
https://www.tensorflow.org/guide/autodiff
Advanced automatic differentiation. TensorFlow.
https://www.tensorflow.org/guide/advanced_autodiff
3.5 Summary
In this chapter, we demonstrated how TensorFlow’s automatic differentiation works. This is
the building block for carrying out deep learning training. Specifically, you learned:
⊲ What is automatic differentiation in TensorFlow
⊲ How you can use gradient tape to carry out automatic differentiation
⊲ How you can use automatic differentiation to solve an optimization problem
In the next chapter, you will learn about the higher-level library for deep learning, Keras.
7 Develop Your First Neural Network with Keras
Keras is a powerful and easy-to-use, free, open source Python library for developing and
evaluating deep learning models. It is part of the TensorFlow library and allows you to
define and train neural network models in just a few lines of code. In this chapter, you will
discover how to create your first deep learning neural network model in Python using Keras.
After completing this chapter, you will know:
⊲ How to load a CSV dataset ready for use with Keras.
⊲ How to define and compile a Multilayer Perceptron model in Keras.
⊲ How to evaluate a Keras model on a validation dataset.
Let’s get started.
Overview
There is not a lot of code required, but we will go over it slowly so that you will know how to
create your own models in the future. The steps you will learn in this chapter are as follows:
⊲ Load Data
⊲ Define Keras Model
⊲ Compile Keras Model
⊲ Fit Keras Model
⊲ Evaluate Keras Model
⊲ Tie It All Together
⊲ Make Predictions
7.1 Load Data
In this chapter, you will use the Pima Indians onset of diabetes dataset, a standard machine
learning dataset that describes patient medical record data and whether each patient had an
onset of diabetes within five years. Download the dataset and place it in your local working
directory, the same location as your Python file. Save it with the filename
pima-indians-diabetes.csv. Take a look inside the file; you should see rows of data like the
following:
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
0,137,40,35,168,43.1,2.288,33,1
...
You can now load the file as a matrix of numbers using the NumPy function loadtxt(). There
are eight input variables and one output variable (the last column). You will be learning a
model to map rows of input variables (X) to an output variable (y), which is often summarized
as y = f(X). The variables can be summarized as follows:
Input Variables (X):
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-hour serum insulin (μU/ml)
6. Body mass index (weight in kg/(height in m)²)
7. Diabetes pedigree function
8. Age (years)
Output Variable (y):
1. Class variable (0 or 1)
...
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
# split into input (X) and output (y) variables
X = dataset[:,0:8]
y = dataset[:,8]
...
7.2 Define Keras Model
Models in Keras are defined as a sequence of layers. Here, you will use a fully-connected
network with two hidden layers using the ReLU activation function and an output layer using
the sigmoid activation function. Using a sigmoid on the output layer ensures your network
output is between 0 and 1 and is easy to map to either a probability of class 1 or snap to a
hard classification of either class with a default threshold of 0.5. You can piece it all together
by adding each layer:
⊲ The model expects rows of data with 8 variables (the input_shape=(8,) argument).
⊲ The first hidden layer has 12 nodes and uses the relu activation function.
⊲ The second hidden layer has 8 nodes and uses the relu activation function.
⊲ The output layer has one node and uses the sigmoid activation function.
...
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
...
You are free to change the design and see whether you get a better or worse result than in the
subsequent parts of this chapter. The figure below provides a depiction of the network structure:
Note: The most confusing thing here is that the shape of the input to the model
is defined as an argument on the first hidden layer. This means that the line
of code that adds the first Dense layer is doing two things: defining the input (or
visible) layer and the first hidden layer.
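If you prefer to make the input layer explicit, an equivalent way to write this model (a sketch, not the book's listing) uses a separate Input layer:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Input

# equivalent model with an explicit Input layer instead of input_shape
model = Sequential()
model.add(Input(shape=(8,)))
model.add(Dense(12, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))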
7.3 Compile Keras Model
Now that the model is defined, you can compile it. Compiling uses the efficient numerical
backend under the covers; you must specify the loss function used to evaluate the weights
(here, binary cross-entropy), the optimizer used to search through the weights (here, adam),
and any metrics you want to report (here, classification accuracy):
...
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
...
7.4 Fit Keras Model
You can train or fit your model on your loaded data by calling the fit() function on the
model. Training runs for a fixed number of epochs (passes through the dataset), updating
the weights after each batch of samples:

...
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
...
7.5 Evaluate Keras Model
You can evaluate your model on your training dataset using the evaluate() function, which
returns the loss and any configured metrics, here the accuracy:

...
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
7.6 Tie It All Together
You have just seen how to create your model step by step. Putting the pieces together, the
complete example is listed below:

from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
# define the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
# compile the keras model
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset
model.fit(X, y, epochs=150, batch_size=10)
# evaluate the keras model
_, accuracy = model.evaluate(X, y)
print('Accuracy: %.2f' % (accuracy*100))
Listing 7.7: Complete working example of your first neural network in Keras
You can copy all the code into your Python file and save it as “keras_first_network.py” in the
same directory as your data file “pima-indians-diabetes.csv”. You can then run the Python
file as a script from your command line (command prompt) as follows:
python keras_first_network.py
Running this example, you should see a message for each of the 150 epochs, printing the loss
and accuracy, followed by the final evaluation of the trained model on the training dataset. It
takes about 10 seconds to execute on my workstation running on the CPU. Ideally, you would
like the loss to go to zero and the accuracy to go to 1.0 (e.g., 100%). This is not possible for
any but the most trivial machine learning problems. Instead, you will always have some error
in your model. The goal is to choose a model configuration and training configuration that
achieve the lowest loss and highest accuracy possible for a given dataset.
Note: Your results may vary given the stochastic nature of the algorithm or
evaluation procedure, or differences in numerical precision. Consider running the
example a few times and comparing the average outcome.
...
77/77 [==============================] - 0s 334us/step - loss: 0.4753 - accuracy: 0.7630
Epoch 147/150
77/77 [==============================] - 0s 336us/step - loss: 0.4794 - accuracy: 0.7565
Epoch 148/150
77/77 [==============================] - 0s 330us/step - loss: 0.4786 - accuracy: 0.7630
Epoch 149/150
77/77 [==============================] - 0s 327us/step - loss: 0.4777 - accuracy: 0.7669
Epoch 150/150
77/77 [==============================] - 0s 337us/step - loss: 0.4806 - accuracy: 0.7721
24/24 [==============================] - 0s 368us/step - loss: 0.4675 - accuracy: 0.7786
Accuracy: 77.86
Neural networks are stochastic algorithms, meaning that the same algorithm on the same data
can train a different model with different skill each time the code is run. This is a feature,
not a bug. The variance in the performance of the model means that to get a reasonable
approximation of how well your model is performing, you may need to fit it many times and
calculate the average of the accuracy scores. For example, below are the accuracy scores from
re-running the example five times:
Accuracy: 75.00
Accuracy: 77.73
Accuracy: 77.60
Accuracy: 78.12
Accuracy: 76.17
You can see that all accuracy scores are around 77%, and the average is 76.924%.
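If you want to script this estimate, the following is a minimal sketch (not one of the book's listings) that re-fits the same model several times and averages the accuracy:

import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

def make_model():
    # same architecture as Listing 7.7
    model = Sequential()
    model.add(Dense(12, input_shape=(8,), activation='relu'))
    model.add(Dense(8, activation='relu'))
    model.add(Dense(1, activation='sigmoid'))
    model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

# X and y are loaded from pima-indians-diabetes.csv as before
scores = []
for _ in range(5):
    model = make_model()
    model.fit(X, y, epochs=150, batch_size=10, verbose=0)
    _, accuracy = model.evaluate(X, y, verbose=0)
    scores.append(accuracy * 100)
print('Mean accuracy: %.3f' % np.mean(scores))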
7.7 Make Predictions
You can adapt the example to make predictions on the training dataset. Making predictions
is as easy as calling predict() on the model. The sigmoid on the output layer yields a
probability between 0 and 1 for each row, which you can round to a crisp binary prediction:

...
# make probability predictions with the model
predictions = model.predict(X)
# round predictions
rounded = [round(x[0]) for x in predictions]
Alternatively, you can convert the probability into 0 or 1 to predict crisp classes directly; for
example:
...
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)
The complete example below makes predictions for each example in the dataset, then prints
the input data, predicted class, and expected class for the first five rows.
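A version of this listing is sketched here, under the assumption that it reuses the same model as Listing 7.7, fit silently and followed by class predictions:

from numpy import loadtxt
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# load the dataset
dataset = loadtxt('pima-indians-diabetes.csv', delimiter=',')
X = dataset[:,0:8]
y = dataset[:,8]
# define and compile the keras model
model = Sequential()
model.add(Dense(12, input_shape=(8,), activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# fit the keras model on the dataset without progress bars
model.fit(X, y, epochs=150, batch_size=10, verbose=0)
# make class predictions with the model
predictions = (model.predict(X) > 0.5).astype(int)
# summarize the first 5 cases
for i in range(5):
    print('%s => %d (expected %d)' % (X[i].tolist(), predictions[i][0], y[i]))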
Listing 7.11: Complete working example of fitting a model in Keras and using it to
make predictions
Running the example does not show the progress bar as before, as the verbose argument has
been set to 0. After the model is fit, predictions are made for all examples in the dataset, and
the input rows and predicted class values for the first five examples are printed and compared
to the expected class values. You can see that most rows are correctly predicted. In fact,
you would expect about 76.9% of the rows to be correctly predicted, based on your estimated
performance of the model in the previous section.
[6.0, 148.0, 72.0, 35.0, 0.0, 33.6, 0.627, 50.0] => 0 (expected 1)
[1.0, 85.0, 66.0, 29.0, 0.0, 26.6, 0.351, 31.0] => 0 (expected 0)
[8.0, 183.0, 64.0, 0.0, 0.0, 23.3, 0.672, 32.0] => 1 (expected 1)
[1.0, 89.0, 66.0, 23.0, 94.0, 28.1, 0.167, 21.0] => 0 (expected 0)
[0.0, 137.0, 40.0, 35.0, 168.0, 43.1, 2.288, 33.0] => 1 (expected 1)
Note: Your results may vary given the stochastic nature of the algorithm or
evaluation procedure, or differences in numerical precision. Consider running the
example a few times and comparing the average outcome.
7.8 Further Reading
Books
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.
https://www.amazon.com/dp/0262035618
(Online version at http://www.deeplearningbook.org).
APIs
Keras official homepage.
https://keras.io
Keras API Reference.
https://keras.io/api/
7.9 Summary
In this chapter, you discovered how to create your first neural network model using the powerful
Keras Python library for deep learning. Specifically, you learned the six key steps in using
Keras to create a neural network or deep learning model step-by-step, including:
⊲ How to load data
⊲ How to define a neural network in Keras
⊲ How to compile a Keras model using the efficient numerical backend
⊲ How to train a model on data
⊲ How to evaluate a model on data
⊲ How to make predictions with the model
Now that you have created your first Keras model, you will repeat this workflow in all of your
deep learning projects. In the next chapter, we will look closer into the evaluation step.
This is Just a Sample
Thank you for your interest in Deep Learning with Python, Second Edition.
This is just a sample of the full text. You can purchase the complete book online from:
https://machinelearningmastery.com/deep-learning-with-python/