Deep Learning in Practice
Mehdi Ghayoumi
Cornell University
First edition published 2022
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742
and by CRC Press
2 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN
CRC Press is an imprint of Taylor & Francis Group, LLC
© 2022 Mehdi Ghayoumi
Reasonable efforts have been made to publish reliable data and information, but the
author and publisher cannot assume responsibility for the validity of all materials or the
consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright
holders if permission to publish in this form has not been obtained. If any copyright
material has not been acknowledged, please write and let us know so we may rectify in
any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted,
reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other
means, now known or hereafter invented, including photocopying, microfilming, and
recording, or in any information storage or retrieval system, without written permission
from the publishers.
For permission to photocopy or use material electronically from this work, access www.
copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood
Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC,
please contact mpkbookspermissions@tandf.co.uk.
Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging-in-Publication Data
Names: Ghayoumi, Mehdi, author.
Title: Deep learning in practice / Mehdi Ghayoumi.
Description: First edition. | Boca Raton : CRC Press, 2022. | Includes
bibliographical references and index.
Identifiers: LCCN 2021030250 | ISBN 9780367458621 (hardback) | ISBN
9780367456580 (paperback) | ISBN 9781003025818 (ebook)
Subjects: LCSH: Deep learning (Machine learning)
Classification: LCC Q325.73 .G43 2022 | DDC 006.3/1--dc23
LC record available at https://lccn.loc.gov/2021030250
ISBN: 978-0-367-45862-1 (hbk)
ISBN: 978-0-367-45658-0 (pbk)
ISBN: 978-1-003-02581-8 (ebk)
DOI: 10.1201/9781003025818
Typeset in Minion
by SPi Technologies India Pvt Ltd (Straive)
Front cover image: One of the applications of Deep Learning is to create new things
(objects, faces, image, dreams, etc.) that do not exist. The cover image has been designed
in that spirit.
Contents
Preface, xv
Acknowledgments, xvii
Author, xix
Chapter 1 ◾ Introduction 1
1.1 WHAT IS LEARNING? 1
1.2 WHAT IS MACHINE LEARNING? 1
1.3 WHAT IS DEEP LEARNING? 2
1.4 ABOUT THIS BOOK! 2
1.4.1 Introduction 2
1.4.2 Python /NumPy 3
1.4.3 TensorFlow and Keras Fundamentals 3
1.4.4 Artificial Neural Networks (ANNs)
Fundamentals and Architectures 3
1.4.5 Deep Neural Networks (DNNs) Fundamentals
and Architectures 3
1.4.6 Deep Neural Networks for Images and Audio
Data Analysis 3
1.4.7 Deep Neural Networks for Virtual Assistant Robots 3
1.4.8 Finding the Best Model? 4
4.5.5 Adagrad 64
4.5.6 Adam 64
4.6 LINEAR AND NONLINEAR FUNCTIONS 65
4.6.1 Linear Functions 65
4.6.2 Nonlinear Functions 67
4.7 ANNS ARCHITECTURES 68
4.7.1 Feed Forward Neural Networks (FFNNs) 68
4.7.1.1 FFN Example in TensorFlow 70
4.7.2 Backpropagation 72
4.7.3 Single-Layer Perceptron 72
4.7.4 Multi-Layer Perceptron (MLP) 73
4.7.4.1 MLP Example in TensorFlow 73
BIOGRAPHY, 189
INDEX, 195
CHAPTER 1
Introduction
1.1 WHAT IS LEARNING?
There are several definitions of learning; perhaps we can say, "Learning is the acquisition of knowledge or skills through study or experience." Humans need to improve their social and personal situations and conditions to make their lives better. For this purpose, people face decision-making situations at every moment. Gaining new knowledge and learning new skills help everyone make better decisions and be more successful in their personal and professional lives. There is a great deal of research on the human learning process. One of its most important parts is discovering how learning can be simulated in machines and how it can be made more accurate and effective.
1.4.1 Introduction
This chapter reviews the book's chapters and gives a brief description of each chapter's contents.
1.4.2 Python/NumPy
Python is one of the most popular programming languages, with libraries for many applications, and it is easy to learn. NumPy adds features (especially for arrays and matrices) that make computation faster and easier. This book gives a quick review of these tools to help the reader understand the projects better.
CHAPTER 2
Python/NumPy Fundamentals
2.1 PYTHON
2.1.1 Variables
A variable is a place in memory used to store data. It has a name you use to access or change it.
The syntax is:
variable = expression
There are four main data types (Table 2.1).
TABLE 2.1 Data Types in Python
Type           Example 1   Example 2   Example 3
Integers       30          0           -10
Real numbers   3.14        0.0         -3.14
Strings        "Hello"     ""          "77"
Boolean        True        False
Python recognizes the type of data when you assign it to a variable. There are conversion functions such as int, float, str, and bool to change the type of data. For example, str(30) ➔ "30": at first, 30 is an integer, and after the conversion the result "30" is a string. Naming variables follows these rules:
1. It can contain only numbers, letters, and underscores.
2. It cannot start with a number.
3. Names are cAsE-sEnSiTiVe.
4. camelCase is a common naming convention.
5. Names never contain spaces.
6. Names with multiple words often use underscores.
If you don’t follow these rules, you will get some errors. In your module
and projects, the variable name should have some meaning related to its
values (it is better to choose descriptive names).
2.1.2 Keywords
You cannot use keywords as variable names. You do not have to memorize these words; you can check the list of keywords whenever the interpreter gives you an error.
There are also comparison operators (==, !=, <, >, <=, >=) whose outputs are Boolean values. Python programs are built from statements such as:
1. import statements,
2. assignment statements,
3. if statements,
4. for and while statements.
There are methods like the input() function (which allows providing a prompt string) to get user data and make the program more interactive.
Example:
m = input ("Please enter your favorite number")
Here, after this line, the interpreter waits for the input value and puts
the value in m.
input() returns a string (even if the user enters a number). The solution is to convert the string to an integer or float using int or float casting.
Example:
n=int(m)
print(n)
The type of m is a string (even if the user enters a number), and the type of n is an integer. You cannot cast a non-numeric string to int or float. It is valid to assign a variable several times (its current value is the last assigned value). Different variables can refer to the same place in memory.
2.1.5 Sequence
A sequence is an "ordered set of things," or "a group of items stored together in a collection."
Examples:
• a "range" of numbers,
• string variables,
• lists,
• tuples.
2.1.6 For Loop
This construct repeats a block of code a specific number of times. It is a determinate loop in which the programmer defines the number of iterations at the beginning of the code. Its syntax is as follows:
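(A minimal sketch; the loop variable and range below are illustrative.)
for variable in sequence:
    statements

Example:
for i in range(3):
    print(i)     # prints 0, 1, 2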
2.1.7 While Loop
This is an indeterminate loop whose condition is checked on each iteration. This is the syntax (the block of the loop repeats as long as the condition is true):
while condition:
statements
…
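A small worked example (the counter and limit are arbitrary):
Example:
count = 0
while count < 3:
    print(count)         # prints 0, 1, 2
    count = count + 1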
2.1.8 String
A string is a data type made of smaller pieces: characters. It is a sequence of characters (a string of length zero is the empty string). Mathematical operations do not work on strings even if they look like numbers, except for + and *, which perform concatenation and repetition. Characters can be accessed by their index: from left to right, indexing starts at 0, and from right to left, it starts at -1. A string is an object in Python and, like other objects, has methods. For example, upper and lower are methods that convert a string to uppercase and lowercase characters. These are among the most commonly used string methods.
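A minimal sketch of indexing and these methods (the sample string is arbitrary):
Example:
s = "Deep"
print(s[0])         # D (indexing from the left starts at 0)
print(s[-1])        # p (indexing from the right starts at -1)
print(s.upper())    # DEEP
print(s.lower())    # deep
print(s + "!" * 2)  # Deep!! (+ concatenates, * repeats)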
2.1.9 List
A list is an indexed collection of data (elements). It is like a string, but its elements can be any type of data in Python. To create a list, you use square brackets []. You can access list elements by index, starting at 0, and you can also use negative indexing, as with strings. You can use in and not in to check whether an element exists in a list. + concatenates two lists and * repeats a list, and slicing works the same way as for strings. Using del, you can remove elements from a list. You can use the comparison operators is and is not to check whether two names refer to the same list object. There are methods that make processing lists easier; Table 2.7 shows the methods you can use when working with lists.
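A minimal sketch of these list operations (the element values are arbitrary):
Example:
mylist = [10, "text", 3.14]
print(mylist[0], mylist[-1])   # 10 3.14
print("text" in mylist)        # True
print(mylist + [7])            # [10, 'text', 3.14, 7]
del mylist[1]
print(mylist)                  # [10, 3.14]
mylist.append(20)              # one of the list methods from Table 2.7
print(mylist)                  # [10, 3.14, 20]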
2.1.10 Dictionary
A dictionary is an unordered, changeable collection indexed by keys. It is written with curly brackets, and its elements are key–value pairs.
Example:
mydict = {
"python": "best",
"Programming": "1",
"language": 1999
}
print(mydict)
Output:
{'language': 1999, 'Programming': '1', 'python': 'best'}
You can access dictionary values by indexing with the key or by using get().
Example:
print(mydict["python"])
Output:
best
By using the key name, you can assign a new value to the key:
Example:
mydict["python"] =1999
print(mydict["python"])
Output:
1999
You can use the for loop to access the keys and values of a dictionary.
Example:
for x in mydict:
    print(x)
Output:
Programming
python
language
2.1.11 Tuple
A tuple is a Python collection that is ordered and unchangeable, and it is written with round brackets ().
Example:
mytuple = ("test", 7, "tuple")
print(mytuple)
Output:
('test', 7, 'tuple')
Example:
mytuple = ("test", 7, "tuple")
print(mytuple[-1])
Output:
tuple
Slicing ranges can use positive or negative indices; the start is included and the end is not.
Example:
mytuple = ("test", 7, "tuple")
print(mytuple[1:3])
Output:
(7, 'tuple')
Example:
mytuple = ("test", 7, "tuple")
new_mytuple = list(mytuple)
new_mytuple[0] = "change"
mytuple = tuple(new_mytuple)
print(mytuple)
Output:
('change', 7, 'tuple')
Example:
mytuple = ("test", 7, "tuple")
for x in mytuple:
    print(x)
Output:
test
7
tuple
You cannot remove a single item from a tuple, but you can delete the whole tuple.
Example:
mytuple = ("test", 7, "tuple")
del mytuple
print(mytuple)
Output:
Traceback (most recent call last):
  File "./program.py", line 3, in <module>
NameError: name 'mytuple' is not defined
Example:
tuple1 = ("test")
tuple2 = ("7")
tuple3= ("tuple")
tuple4= tuple1 + tuple2+tuple3
print(tuple4)
Output:
test7tuple
Note that ("test") without a trailing comma is actually a string, not a tuple, so + here concatenates strings; a one-element tuple needs a trailing comma, for example ("test",).
Example:
mytuple = ("test", 7, "tuple")
if "test" in mytuple:
print ("Yes, the item is in the tuple")
Output:
Yes, the item is in the tuple
2.1.12 Sets
A set is a collection of data that is unordered and unindexed, and it is written with curly brackets {}.
Example:
myset = {"test", 7, "set"}
print(myset)
Output:
{'test', 'set', 7}
You cannot access the elements of a set by index because a set is unordered and unindexed. However, you can use a for loop to iterate over the set values.
Example:
myset = {"test", 7, "set"}
for x in myset:
    print(x)
Output:
set
test
7
Example:
myset = {"test", 7, "set"}
myset.add("learning")
print(myset)
Output:
{'learning', 'test', 'set', 7}
To add more than one item, you can use update(). You can remove an item from a set using remove() or discard(); remove() raises an error if the item does not exist, but discard() does not.
Example:
myset = {"test", 7, "set"}
myset.remove("test")
print(myset)
Output:
{'set', 7}
pop() removes an item; however, because a set is unordered, you do not know which item will be removed!
Example:
myset = {"test", 7, "set"}
myset.pop()
print(myset)
Output:
{'test', 7}
Example:
myset = {"test", 7, "set"}
myset.clear()
print(myset)
Output:
set()
2.1.13 Function
A function is a block of code that performs a specific task. Functions are used to avoid repeating code and to reduce development cost: you create a function once and call it whenever you need it. Its syntax is as follows:
def name(parameters):
    statements
Some functions are built into Python, and you can use them directly; you have already used some of them (like print!). There are also libraries in Python whose methods and modules you can use after importing them. Some functions return a value, and a function can also call other functions. You can call and execute the other functions from a main function, which does not need any parameters. Before executing the program, the Python interpreter defines some special variables (for example, __name__, which is automatically set to the string value "__main__" when the file is run directly).
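A minimal sketch of a function definition and the __main__ check (the function names and values are illustrative):
Example:
def add(a, b):
    return a + b

def main():
    print(add(2, 3))   # 5

if __name__ == "__main__":
    main()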
2.1.14 File
A file is a way to store data. There are two main tasks, reading and writing, and two types of files, text and binary. In Python, you should open the file before you use it and close it afterward:
open(myfilename, 'r')
filevariable.close()
Assume there is a text file (test.txt). To read the file, you first open it and then read it; you can iterate over the file data using loop commands. To write, you open a file, write to it, and then close it.
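A minimal sketch of writing and then reading a text file, assuming a file named test.txt:
Example:
myfile = open("test.txt", "w")
myfile.write("deep learning\n")
myfile.close()

myfile = open("test.txt", "r")
for line in myfile:
    print(line)
myfile.close()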
2.1.15 Object (class)
Python supports Object-Oriented Programming (OOP). Each object has two main parts: state and methods. To create a class, use the class keyword; to create an object, use the class name you already defined and assign values to its parameters. In real-world applications, every class has an __init__ function to initialize the class parameters. Methods are functions that belong to the class. self refers to the current instance of the object, and through self you can access the variables that belong to the class (you can use another name instead of self, for example myobject). In addition, you can modify the properties of an object.
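A minimal sketch of a class with __init__, state, and a method (the class name and attributes are illustrative):
Example:
class Apple:
    def __init__(self, color, price):
        self.color = color      # state
        self.price = price      # state

    def describe(self):         # method
        print(self.color, self.price)

myobject = Apple("red", 2.5)
myobject.describe()             # red 2.5
myobject.price = 3.0            # modify a property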
2.2 NUMPY
NumPy is a Python library that makes working with arrays easier. Most of the data computations in deep learning algorithms involve arrays, and NumPy helps make them faster (Python lists can do array-like computation, but the NumPy module is up to 50 times faster). It uses the ndarray object and its modules for array processing.
After installing NumPy, it can be imported by:
import numpy as np
2.2.1 Create Array
To use a NumPy array, you first create it.
import numpy as np
a= np.array([1, 2, 3, 4])
2.2.2 ndarray
To create an ndarray, you can pass a sequence such as a tuple or a list into np.array(), and it will be converted to an ndarray. One of the key points here is the number of dimensions.
Look at these examples:
Example:
import numpy as np
# 0 Dimension
arr0 = np.array(7)
# 1 Dimension
arr1 = np.array([5, 6, 7])
# 2 Dimensions
arr2 = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
# 3 Dimensions
arr3 = np.array([[[10, 20, 30], [4, 5, 6]],
                 [[1, 2, 3], [4, 5, 6]]])
Example:
print(arr1.ndim)
Output:
>>>1
2.2.3 Access Elements
You can access array elements using an index. The first index is 0, the second is 1, and so on. To access elements of a 2D or N-D array, you separate the index for each dimension with a comma.
Example:
import numpy as np
myarr = np.array([[[10, 20, 30, 40], [5, 6, 7, 8]],
                  [[9, 10, 11, 12], [2, 3, 4, 5]]])
print(myarr[0, 0, 2])
Output:
>>>30
Here, the first index (0) selects the first 2D block, the second index (0) selects the first row inside that block, and the last index (2) selects the third element of that row, which is 30.
2.2.4 Array Slicing
Here is the general syntax for slicing:
[start: end: step]
There are default values: start defaults to 0, end defaults to the length of the dimension, and step defaults to 1.
Example:
import numpy as np
myarr = np.array([10, 20, 30, 40, 50, 60, 70])
print(myarr[1:4:1])
Output:
>>> [20 30 40]
Example:
import numpy as np
myarr = np.array([[10, 20, 30, 40, 50], [60, 70, 80,
90, 10], [110, 120, 130, 140, 150]])
print(myarr[2, 0:4])
Output:
>>> [110 120 130 140]
The first index (2) selects the third row, and the slice 0:4 takes its first four elements.
Example:
import numpy as np
myarr = np.array([[10, 20, 30, 40, 50], [60, 70, 80,
90, 100], [110, 120, 130, 140, 150]])
print(myarr[0:3, 1:3])
Output:
>>> [[20 30] [70 80] [120 130]]
2.2.5 Data Type
NumPy supports data types such as integers (int32, int64), floating-point numbers (float32, float64), booleans, complex numbers, and strings.
Example (copy vs. view):
import numpy as np
myarr = np.array([10, 20, 30, 40, 50, 60, 70])
x = myarr.copy()
myarr[0] = 7
y = myarr.view()
print(x)
print(y)
print(x.base)
print(y.base)
Output:
[10 20 30 40 50 60 70]
[7 20 30 40 50 60 70]
None
[7 20 30 40 50 60 70]
Example (shape and reshape):
import numpy as np
myarr = np.array([11, 12, 13, 14, 15, 16, 17, 18,
90, 100, 110, 120, 130, 140])
print(myarr.shape)
newarr1 = myarr.reshape(2, 7, 1)
newarr2 = myarr.reshape(1, 7, 2)
print(newarr1)
print(newarr2)
Output:
(14,)
[[[11] [12] [13] [14] [15] [16] [17]] [[18] [90]
[100] [110] [120] [130] [140]]]
[[[11 12] [13 14] [15 16] [17 18] [90 100][110 120] [130
140]]]
2.2.8 Array Iterating
Each dimension can be iterated with its own for loop.
Example:
import numpy as np
myarr = np.array([[[10, 20, 30, 40], [50, 60, 70,
80]], [[90, 100, 110, 120], [130, 140, 150, 160]]])
for x in myarr:
    for y in x:
        for z in y:
            print(z)
Output:
10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160
nditer() does the same thing.
Example:
import numpy as np
myarr = np.array([[[10, 20, 30, 40], [50, 60, 70, 80]],
                  [[90, 100, 110, 120], [130, 140, 150, 160]]])
for x in np.nditer(myarr):
    print(x)
Output:
10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160
2.2.9 Joining Array
To join arrays, there are methods such as concatenate(), stack(), hstack() (stacking along rows), vstack() (stacking along columns), and dstack() (stacking along the depth/height).
Example:
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arrn1 = np.concatenate((arr1, arr2))
arrn2 = np.stack((arr1, arr2), axis=1)
arrn3 = np.hstack((arr1, arr2))
arrn4 = np.vstack((arr1, arr2))
arrn5 = np.dstack((arr1, arr2))
print(arrn1)
print(arrn2)
print(arrn3)
print(arrn4)
print(arrn5)
Output:
[1 2 3 4 5 6]
[[1 4] [2 5] [3 6]]
[1 2 3 4 5 6]
[[1 2 3] [4 5 6]]
[[[1 4] [2 5] [3 6]]]
2.2.10 Splitting Array
Here array_split splits the array into n parts.
Example:
import numpy as np
myarr = np.array([[10, 20], [30, 40], [50, 60], [70,
80], [90, 100], [110, 120],[130, 140]])
newarr = np.array_split(myarr, 2)
print(newarr)
It splits the array into two parts (n = 2).
Output:
[array([[10, 20],[30, 40],[50, 60],[70, 80]]),
array([[90, 100],[110, 120],[130, 140]])]
2.2.11 Searching Arrays
where() is used to find the indices of elements in the array.
Example:
import numpy as np
myarr = np.array([10, 20, 30, 40, 50, 60, 70, 20,
40, 50, 70])
x = np.where(myarr == 70)
print(x)
Output:
(array([6, 10], dtype=int64),)
2.2.12 Sorting Arrays
sort() returns the elements of the array in sorted order.
Example:
import numpy as np
arr1 = np.array([3, 2, 0, 1])
arr2 = np.array([[3, 2, 4], [5, 0, 1]])
arr3 = np.array(['banana', 'cherry', 'apple'])
print(np.sort(arr1))
print(np.sort(arr2))
print(np.sort(arr3))
Output:
[0 1 2 3]
[[2 3 4] [0 1 5]]
['apple' 'banana' 'cherry']
2.2.13 Filter Array
The array elements are filtered using boolean indexing.
Example:
import numpy as np
arr = np.array([1, 2, 3, 4])
x = [True, False, False, True]
newarr = arr[x]
print(newarr)
Output:
[1 4]
2.2.14 Random Numbers
There is a random module with methods such as rand(), randint(), and choice(). rand() creates random floats between 0 and 1, randint() creates random integers in a given range, and choice() picks elements from an array at random.
Example:
from numpy import random
x1= random.rand()
x2= random.rand(2)
x3= random.rand(1, 2)
x4= random.randint(2)
x5= random.rand(1, 2)
x6=random.randint(10, size=(5))
x7 = random.choice([1,2,3,4,5,6,7,9], size=(1,2))
print(x1)
print(x2)
print(x3)
print(x4)
print(x5)
print(x6)
print(x7)
Output:
0.9991215682671595
[0.29059623 0.67536595]
[[0.69502737 0.99873108]]
0
[[0.27185965 0.13683672]]
[8 5 9 0 4]
[[1 3]]
2.2.15 Array Vectorization
Vectorization converts data operations to vector-based operations. For example, look at these two ways of adding x and y: both give the same result, but using NumPy makes the computation faster and easier.
Example:
import numpy as np
x = [1, 2, 3, 4]
y = [5, 6, 7, 8]
z = []
t = np.add(x, y)
for i, j in zip(x, y):
    z.append(i + j)
print(t)
print(z)
Output:
[6 8 10 12]
[6, 8, 10, 12]
Example (creating arrays of zeros and ones):
np.zeros((2, 2))
np.ones((2, 2))
Example (stacking arrays horizontally and vertically):
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
print('Horizontally:', np.hstack((a, b)))
print('Vertically:', np.vstack((a, b)))
Example (normally distributed random numbers):
normal_array = np.random.normal(1, 0.7, 5)
print(normal_array)
Output:
[1.35834557 0.17359591 1.35380164 0.65828157 0.10857762]
2.2.19 Mathematical Functions
NumPy supports most common statistical methods, such as mean(), median(), std(), and var().
2.2.21 Determinant
You can use np.linalg.det to find the determinant of a matrix.
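A minimal sketch (the matrix values are arbitrary):
Example:
import numpy as np
m = np.array([[1, 2], [3, 4]])
print(np.linalg.det(m))   # -2.0000000000000004 (i.e., 1*4 - 2*3 = -2)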
CHAPTER 3
TensorFlow and Keras Fundamentals
Building and using a model involves three main steps:
1. data preprocessing,
2. building a model, and
3. training and testing the model.
There are two types of tensors, (1) constant and (2) variable, and a tensor has features such as shape and size. The size is the total number of items in the tensor, and it is the product of the shape vector; for example, in Figure 3.1: 3D: 4*5*4 = 80, 2D: 4*4 = 16, and 1D: 4.
3.2 TENSORS
A tensor is the basic data structure of TensorFlow: an n-dimensional array used to represent data. A tensor has three main properties: shape, rank (number of dimensions), and data type. A TensorFlow computation is a graph of nodes and edges, where the edges transfer the (scaled) values from the nodes in the current level to the nodes in the next level. Look at this example:
K = (3x + y)/(y + 7)
m = 3x + y
n = y + 7
K = m/n
The values of x and y flow into the graph: one node computes 3x and adds y, and the current value of that node is assigned to m. In another node, y is added to 7, and the value is assigned to n. The new node values are transferred to the next node at the next level with a new operation (dividing m by n to get K). The first step in TensorFlow is defining a tensor.
Let's learn more through some examples. For a 0D tensor (a scalar):
Example:
r0_tensor = tf.constant(7)
print(r0_tensor)
Output:
tf.Tensor(7, shape=(), dtype=int32)
Example:
r1_tensor = tf.constant([1.0, 2.0, 3.0])
print(r1_tensor)
Output:
tf.Tensor([1. 2. 3.], shape=(3,), dtype=float32)
Example:
rank_2_tensor = tf.constant([[1, 2],[3, 4],[5, 6]],
dtype=tf.float16)
print(rank_2_tensor)
Output:
tf.Tensor([[1. 2.][3. 4.] [5. 6.]], shape=(3, 2),
dtype=float16)
Example:
r3_tensor = tf.constant([[[1, 2, 3, 4, 5],[6, 7,
8, 9, 10]],[[11, 12, 13, 14, 15],[16, 17, 18, 19,
20]],[[21, 22, 23, 24, 25],[26, 27, 28, 29, 30]],])
print(r3_tensor)
Output:
tf.Tensor([[[ 1 2 3 4 5] [6 7 8 9 10]] [[11
12 13 14 15][16 17 18 19 20]] [[21 22 23 24 25]
[26 27 28 29 30]]], shape=(3, 2, 5), dtype=int32)
Let’s find the shape, size, and dimension of the matrix using TensorFlow:
Example:
r3D_tensor = tf.zeros([4, 5, 4])
print("Type of every element:", r3D_tensor.dtype)
print("Number of dimensions:", r3D_tensor.ndim)
print("Shape of tensor:", r3D_tensor.shape)
print("Elements along axis 0 of tensor:", r3D_
tensor.shape[0])
print("Elements along the last axis of tensor:",
r3D_tensor.shape[-1])
print("Total number of elements (4*4*5): ",
tf.size(r3D _ tensor).numpy())
Output:
Type of every element: <dtype: 'float32'>
Number of dimensions: 3
Shape of tensor: (4, 5, 4)
Elements along axis 0 of tensor: 4
Elements along the last axis of tensor: 4
Total number of elements (4*4*5): 80
3.3 TENSORFLOW
There are three main steps when you are using TensorFlow:
1. variable definition,
2. computation definition, and
3. operation execution.
Example:
X1 = tf.constant([1, 3, 5])
X2 = tf.constant([2, 4, 6])
Example:
multiply= tf.multiply(X1, X2)
Example:
print(multiply)
Output:
tf.Tensor([2 12 30], shape=(3,), dtype=int32)
Example:
a = tf.constant([[2, 5],[1, 4]])
Output:
tf.Tensor([[3 6][2 5]], shape=(2, 2), dtype=int32)
tf.Tensor([[2 5][1 4]], shape=(2, 2), dtype=int32)
tf.Tensor ([[7 7][5 5]], shape=(2, 2), dtype=int32)
Example:
a = tf.constant([[12, 10],[2.,10.]])
x = tf.constant([[1.,0.],[0.,1.]])
b = tf.Variable(2.)
y = tf.matmul(a, x) + b
print(y.numpy())
Output:
[[14. 12.] [4. 12.]]
import tensorflow as tf
from tensorflow import keras as ks
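The dataset-loading and print lines are not shown in this excerpt; a minimal sketch that would produce the output below, assuming the Fashion-MNIST dataset from Keras:
(training_images, training_labels), (test_images, test_labels) = ks.datasets.fashion_mnist.load_data()
print('Training Images Dataset Shape:', training_images.shape)
print('No. of Training Images Dataset Labels:', len(training_labels))
print('Test Images Dataset Shape:', test_images.shape)
print('No. of Test Images Dataset Labels:', len(test_labels))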
Output:
Training Images Dataset Shape: (60000, 28, 28)
No. of Training Images Dataset Labels: 60000
Test Images Dataset Shape: (10000, 28, 28)
No. of Test Images Dataset Labels: 10000
Step 1: Training
Train the model using the training data and the defined model.
optimizer = 'adam'
loss_function = 'sparse_categorical_crossentropy'
metric = ['accuracy']
nn_model.compile(optimizer=optimizer,
loss=loss_function,metrics=metric)
nn_model.fit(training_images, training_labels, epochs=20)
Output:
…
1875/1875 [===========] - 2s 967us/step - loss: 0.1831 - accuracy: 0.9317
Epoch 19/20
1875/1875 [===========] - 2s 1ms/step - loss: 0.1779 - accuracy: 0.9339
Epoch 20/20
1875/1875 [===========] - 2s 978us/step - loss: 0.1765 - accuracy: 0.9345
…
Step 2: Evaluation
Evaluate the model using a part of the original data as evaluation data, which gives us some estimates about the model before testing and using it on real-world problems:
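The evaluation call itself is not shown in this excerpt; a minimal sketch consistent with the output below (the variable names follow the earlier training code and are an assumption):
training_loss, training_accuracy = nn_model.evaluate(training_images, training_labels)
print('Training Data Accuracy', round(training_accuracy, 2))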
Output:
1875/1875 [============] - 1s 639us/step - loss: 0.1694 - accuracy: 0.9378
Training Data Accuracy 0.94
Step 3: Testing
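Likewise, a minimal sketch for the testing step (assumption):
test_loss, test_accuracy = nn_model.evaluate(test_images, test_labels)
print('Test Data Accuracy', round(test_accuracy, 2))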
Output:
313/313 [==============] - 0s 1ms/step - loss: 0.3833 - accuracy: 0.8824
Test Data Accuracy 0.88
3.5.1 Dataset
Load the data in the first step. We use the same dataset as in the previous example (Fashion-MNIST).
3.5.2 Input Layer
Define the parameters of the input layer, such as the activation function and the data shape (you should design these parts before you start coding), then build the model and add the input and hidden layers.
input_data_shape = (28, 28)
hidden_activation_function = 'relu'
output_activation_function = 'softmax'
dnn_model = ks.models.Sequential()
dnn_model.add(ks.layers.Flatten(input_shape=input_data_shape, name='Input_layer'))
# first hidden layer
dnn_model.add(ks.layers.Dense(256, activation=hidden_activation_function, name='Hidden_layer_1'))
# second hidden layer
dnn_model.add(ks.layers.Dense(192, activation=hidden_activation_function, name='Hidden_layer_2'))
# third hidden layer
dnn_model.add(ks.layers.Dense(128, activation=hidden_activation_function, name='Hidden_layer_3'))
3.5.4 Dense Layer
Define the output layer parameters, add the output layer, and then print a summary of the model.
dnn_model.add(ks.layers.Dense(10, activation=output_activation_function, name='Output_layer'))
dnn_model.summary()
Output:
Layer (type) Output Shape Param #
====================================================
Input_layer (Flatten) (None, 784) 0
____________________________________________________
Hidden_layer_1 (Dense) (None, 256) 200960
____________________________________________________
Hidden_layer_2 (Dense) (None, 192) 49344
____________________________________________________
Hidden_layer_3 (Dense) (None, 128) 24704
____________________________________________________
Output_layer (Dense) (None, 10) 1290
====================================================
Total params: 276,298
Trainable params: 276,298
Non-trainable params: 0
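The compile, training, and evaluation calls for this model are not shown in this excerpt; a minimal sketch consistent with the outputs below, reusing the settings from the earlier example (assumption):
dnn_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
dnn_model.fit(training_images, training_labels, epochs=20)
training_loss, training_accuracy = dnn_model.evaluate(training_images, training_labels)
print('Training Data Accuracy', round(training_accuracy, 2))
test_loss, test_accuracy = dnn_model.evaluate(test_images, test_labels)
print('Test Data Accuracy', round(test_accuracy, 2))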
Output:
1875/1875 [==========] - 2s 1ms/step - loss: 0.1607 - accuracy: 0.9387
Training Data Accuracy 0.94
Output:
313/313 [==============================] - 0s 2ms/step - loss: 0.3824 - accuracy: 0.8921
Test Data Accuracy 0.89
• Linux/Mac
source env/bin/activate
• Windows
.\env\Scripts\activate
3.6.3 Python Libraries
You may need these Python dependencies in your project; you can install them using pip or pip3 (see the example after the list below). Here we review some of these requirements, including installing Keras and using it and its libraries.
• Numpy
• Pandas
• Scikit-learn
• Matplotlib
• Scipy
• Seaborn
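For example, the packages above can be installed with pip (or pip3, depending on your installation); the package names below are the usual PyPI names and are an assumption about your setup:
pip install numpy pandas scikit-learn matplotlib scipy seaborn
pip install tensorflow keras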
After finishing all parts of your project, you can quit the environment:
deactivate
Please check your Python version and its requirements to decide between pip and pip3. Creating, activating, and deactivating virtual environments helps you organize your code and modules for different projects.
3.6.4 Available Modules
These are available modules in Keras that you should learn to use and set
when you plan to create a model with Keras:
• NumPy
import numpy as np
• Utilities
These are the most important libraries, used in almost all of our projects when we are working with Keras.
• Compile:
It is used to configure the learning model
Example:
compile(
optimizer,
loss = None,
metrics = None,
loss_weights = None,
sample_weight_mode = None,
weighted_metrics = None,
target_tensors = None
)
model.compile(loss = 'mean_squared_error', optimizer
= 'sgd', metrics = [metrics.categorical_accuracy])
• Fit:
Example:
model.fit(X, y, epochs = …, batch_size = …)
where X and y are the training data and labels, epochs is the number of training passes over the data, and batch_size is the number of samples per gradient update.
• Evaluate:
Use it to evaluate the model on evaluation or test data. It is a method of the model, and its main inputs are the input data and their labels.
Example:
score = model.evaluate(x_test, y_test, verbose = 0)
• Predict:
Example:
predict(
x,
batch_size = None,
verbose = 0,
steps = None,
callbacks = None,
max_queue_size = 10,
workers = 1,
use_multiprocessing = False
)
• Model: There are two main ways to build a model: the Sequential model for simple stacks of layers and the functional API for implementing more complex models.
• Layer: There are some layers like convolution, pooling, and recurrent
layers.
3.7.1 MNIST Example
There are eight main steps, which we explain with a code example on the MNIST dataset.
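The first steps of the example (imports, loading the data, and the channels-first branch of the reshape) are not included in this excerpt; a minimal sketch of those missing lines, assuming the standard Keras MNIST example layout:
import numpy as np
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from keras import backend as K

(x_train, y_train), (x_test, y_test) = mnist.load_data()
img_rows, img_cols = 28, 28

if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)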
else:
x_train = x_train.reshape(x_train.shape[0], img_
rows, img_cols, 1)
x_test = x_test.reshape(x_test.shape[0], img_rows,
img_cols, 1)
input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)
model = Sequential()
model.add(Conv2D(32, kernel_size = (3, 3),
activation = 'relu', input_shape = input_shape))
model.add(Conv2D(64, (3, 3), activation = 'relu'))
model.add(MaxPooling2D(pool_size = (2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation = 'relu'))
model.add(Dropout(0.5))
model.add(Dense(10, activation = 'softmax'))
model.compile(loss = keras.losses.categorical_crossentropy,
optimizer = keras.optimizers.Adadelta(),
metrics = ['accuracy'])
model.fit(
x_train, y_train,
batch_size = 128,
epochs = 12,
verbose = 1,
validation_data = (x_test, y_test)
)
pred = model.predict(x_test)
pred = np.argmax(pred, axis = 1)[:5]
label = np.argmax(y_test,axis = 1)[:5]
print(pred)
print(label)
CHAPTER 4
Artificial Neural
Networks (ANNs)
Fundamentals and
Architectures
4.1 TERMINOLOGY
Here, we review some terminologies that we use in Chapters 4-8.
4.1.1 Inputs
The learning algorithms (here, NNs) use data for training the models. These data are the input data, and we usually show one data sample as a vector x. A set of N such vectors gives us X, the matrix of inputs. Each sample vector x has m features, which are the data values of that vector. For example, suppose we have N = 1000 apples, and each sample is a vector x = [x1 = price, x2 = color, x3 = shape]; m, the number of features in each sample vector, is 3.
4.1.2 Weights
The weights are values that the learning algorithms update to train the model. Usually, wij is the weight of the connection between two nodes (i and j). The set of weights gives us a matrix that we denote by W.
4.1.3 Outputs
In the last step, the learning algorithms give us a set of data as outputs. We usually show these values as a vector y of output values yj (j = 1…k). For example, if the outputs are 0 (apple) and 1 (non-apple), the dimension of y is k = 2, and it is a binary classification problem.
4.1.4 Targets
These are values whose dimension is the same as the outputs (k), and we need them in supervised learning algorithms. They give us the correct values that the model should produce as outputs (target values).
4.1.5 Activation Function
An activation function is a mathematical function applied to each layer's weighted input to produce its output; when the value exceeds a threshold, the signal passes on to the next layer. We discuss this concept and its types in more detail later.
4.1.6 Error
The error is the difference between the output value and the target. We denote it by E, and the goal of learning is to make it as small as possible (or smaller than a chosen limit).
4.1.8 Overfitting
When we train the model, we should care about generalization. If we train the model too long or make it too complicated, there is a possibility that the model fits the training data very closely but does not generalize, so it cannot fit new data correctly. We cannot detect this with the training data because we already used them to train the model; we need separate test data (or validation data) for this purpose.
4.1.9 Underfitting
Underfitting is another concept that is the opposite of overfitting, and it
happens when a learning model cannot simulate the behavior and rela-
tionships between a dataset’s values and the target variable (Figure 4.1).
4.1.10 Confusion Matrix
There are several methods for measuring the accuracy of an algorithm. The confusion matrix is a method we can use for classification problems. To build a confusion matrix, you create a square matrix with the class values along both the horizontal and vertical directions. We can take the top of the table as the predicted values and the left-hand side as the targets. The value at position (i, j) then tells us the number of times the output was predicted as class j when the target was class i. Let me explain with an example (if we have two classes, we have a 2×2 matrix, Table 4.1, as follows).
The value t(1,1) = 4 means that four times the output was predicted as class C1 when the target was C1. For t(2,1) = 2, the output was predicted two times as class C1 when the target was C2. The value t(1,2) = 1 means that one time the output was predicted as class C2 when the target was C1, and for t(2,2) = 7, the output was predicted seven times as class C2 when the target was C2. If you are looking for a single number, you can sum the values on the leading diagonal and divide by the sum of all values. For example, here the value is:
(4 + 7)/(4 + 2 + 1 + 7) = 11/14 ≈ 0.785
0.785 × 100 = 78.5%
4.1.11 Accuracy Metrics
There are four definitions as accuracy metrics that you should know:
True positive (TP): an observation that is correctly classified as the positive class (for example, class 1). For instance, if the classification is apple vs. not apple and the object is an apple and is classified as an apple, it is a TP.
False positive (FP): an observation that is incorrectly classified as the positive class. For instance, if the object is not an apple and is classified as an apple, it is an FP.
True negative (TN): an observation that is correctly classified as the negative class. For instance, if the object is not an apple and is classified as not an apple, it is a TN.
False negative (FN): an observation that is incorrectly classified as the negative class. For instance, if the object is an apple but is classified as not an apple, it is an FN (Table 4.2).
There are five measures defined from these four values:
Accuracy = (TP + TN)/(TP + FP + TN + FN)
Sensitivity = TP/(TP + FN)
Specificity = TN/(TN + FP)
Precision = TP/(TP + FP)
Recall = TP/(TP + FN)
There is also another measurement based on those four values, called MCC (Matthews Correlation Coefficient), calculated as follows:
MCC = (TP × TN − FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))
Categorical values can be encoded as unique numbers, for example:
Red → 0
Green → 1
Blue → 2
These values are unique: if the data have k categories (here k = 3), there are k unique numbers, and each category is assigned one unique value. Look at Table 4.3.
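Table 4.3 is not reproduced here; a minimal sketch of this kind of integer encoding and its one-hot form, using keras.utils.to_categorical (the color data are illustrative):
Example:
import numpy as np
from tensorflow.keras.utils import to_categorical

colors = ['Red', 'Green', 'Blue', 'Green']
mapping = {'Red': 0, 'Green': 1, 'Blue': 2}
encoded = np.array([mapping[c] for c in colors])
print(encoded)                  # [0 1 2 1]
print(to_categorical(encoded))  # one-hot rows, one column per unique value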
4.2.1 Biological Neuron
The basic elements of human and animal nervous systems are neuron cells. They contain a nucleus (in the soma), many branches called dendrites, and one extended part called the axon. Neuron cells receive signals (short electrical impulses) from other neurons, and a neuron fires (transmits) its own signal when the received signal strength exceeds a threshold. There is still much research in this field trying to uncover the details of these biological networks' operations and functions (Figure 4.2).
4.2.2 Artificial Neuron
The artificial neuron is designed based on some basis of biological neuron
cell structure and processes. Table 4.4 shows some of these bases and their
similar components in ANNs. For example, neuron cells transfer signals
through dendrites and axons, and in the ANNs, sample data are trans-
ferred through inputs and outputs. Also, soma decides when the signals
can transfer to the axons, and the activation functions do a similar task in
ANNs. The structure of neuron cells, such as shape, size, and structure,
can define cells’ capability and importance in networks for transferring
the signals. It is simulated as weights in the ANNs, which define the
importance of each neuron in the network.
Figure 4.3 shows the structure of an artificial neuron, whose main elements are the inputs (data samples and weights), which play the role of the impulse signals; the activation function (the soma); and the output (the signal transferred to the next neuron, as through dendrites).
FIGURE 4.3 An artificial neuron: inputs x1, x2, x3 with weights w1, w2, w3, the weighted sum Σxiwi, the activation function f(Σxiwi), and the output y.
4.3 ACTIVATION FUNCTIONS
ANNs simulate both linear and nonlinear behavioral models. They deploy activation functions to simulate nonlinear behaviors (most real-world problems are nonlinear). There are several activation functions, but the most popular ones are the sigmoid (sig), hyperbolic tangent (tanh), Rectified Linear Unit (ReLU), leaky ReLU, and softmax. The question is, which one is best? Methods such as cross-validation, along with ongoing research, help us find the best one.
4.3.1 Sigmoid (sig)
A probability always has a value between 0 and 1, and many models produce probability values as outputs. For these problems, the sigmoid, sig(z) = 1/(1 + e^(-z)), can be the right choice. The sigmoid is also differentiable (its slope can be calculated at every point). Figure 4.4 shows a sigmoid function.
The ReLU function maps negative values to zero and keeps positive values:
R(z) = max(0, z)
4.3.4 Leaky ReLU
The leaky ReLU is a version of ReLU that maps negative values to small negative values (scaled by a small factor a, i.e., f(y) = ay for y < 0) and keeps the positive values as they are. Its range is (-infinity, +infinity). Figure 4.7 shows its shape.
4.3.5 Softmax
The softmax is usually used in the last (output) layer. It produces multiple outputs and can cover multi-class problems. Softmax normalizes each class's output to the range between 0 and 1 (each value is divided by the sum), as Figure 4.8 shows:
S(yi) = e^(yi) / Σ(j=1..n) e^(yj)
For example, softmax maps the scores (2, 5, 1) to (0.04, 0.93, 0.01), (3, 4, 2) to (0.24, 0.66, 0.08), and (4, 2, 1) to (0.84, 0.11, 0.04).
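A minimal NumPy sketch that reproduces the first row of the example above:
Example:
import numpy as np

def softmax(y):
    e = np.exp(y)
    return e / e.sum()

print(np.round(softmax(np.array([2.0, 5.0, 1.0])), 2))   # [0.05 0.94 0.02], approximately the first row above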
4.4 LOSS FUNCTION
A loss function (cost function) shows how far our predicted model is from the desired model. It is computed from the loss values (the differences between the predicted outputs and the desired (target) values). There are several types of loss functions, in three main categories; the most popular ones are the cross-entropy loss and the mean squared error (MSE, or L2) loss.
4.4.1 Cross-Entropy Loss
It operates on probability values and measures the difference between the predicted and actual distributions. Figure 4.9 shows its graph.
MSE = (1/n) Σ(i=1..n) (yi − yi^p)²
where yi is the target value and yi^p is the predicted value. Figure 4.10 shows the L2 graph representing an MSE function, where the MSE loss reaches its minimum value at prediction = 100.
4.5 OPTIMIZATION FUNCTIONS
The three most popular optimization algorithms are Stochastic Gradient Descent (SGD), Adagrad, and Adam. Overall, SGD is much faster than the others, but there is a tradeoff between speed and quality of results (accuracy). As data grow, the model becomes more complicated, and choosing an optimization function becomes challenging. We should know two concepts before looking at some well-known optimization methods: the learning rate and convexity.
4.5.1 Learning Rate
Optimization algorithms update the network's weights and biases, and they fall into two categories: constant learning rate and adaptive learning rate. The learning rate is the step size that optimization algorithms use when the weights are updated during the training stage. Its value is small, generally between 0.0 and 1.0, and choosing an improper value can hurt training convergence. Figure 4.11 shows some of these situations.
4.5.2 Convex
A function is convex if, when we draw a line segment between any two points on its graph, the segment lies on or above the graph between those two points (Figure 4.12).
FIGURE 4.12 The top curve is non-convex, and the bottom one is convex.
4.5.3 Gradient Descent
Gradient descent is one of the most popular optimization methods; it moves toward the optimal value along the direction of the negative gradient. It is the most general and popular solution when the objective function is convex. One of its issues is the cost of the update at each step: because the gradient over all of the data is calculated, GD's computational cost becomes high, especially when the data size is very large. It also takes more time to converge when the data include more noise or biased samples.
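A minimal sketch of the gradient-descent update rule on the simple convex function f(w) = (w - 3)², with an arbitrary learning rate and starting point:
Example:
learning_rate = 0.1
w = 0.0
for step in range(50):
    gradient = 2 * (w - 3)           # derivative of (w - 3)**2
    w = w - learning_rate * gradient
print(w)                             # close to the minimum at w = 3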
4.5.5 Adagrad
Manually tuning the learning rate is not needed here because Adagrad uses a different learning rate for each parameter at each step, based on the past gradients. The learning rate is adapted using the sum of squares of all previous gradient values; with a larger learning rate, learning is faster. The method is well suited to sparse-gradient problems. On the other hand, over time the accumulated squared gradients keep growing, which can push the effective learning rate close to zero so that the parameters stop updating correctly. This method is not well suited to non-convex problems.
4.5.6 Adam
Adam stands for Adaptive Moment Estimation and calculates a different learning rate for each parameter. It is a combination of momentum and adaptive-learning-rate methods. Adam is well suited to non-convex problems with large amounts of data and high-dimensional feature spaces. Most deep learning methods need large datasets, so Adam can be one of the best fits for deep learning projects. One of its issues is that it may fail to converge in some cases. Figure 4.13 compares the behavior of these methods (GD, SGD, Adagrad, and Adam).
4.6.1 Linear Function
Here, at first, we show the simplest version of a neural network for modeling linear functions:
f(x) = w·x + b
where x is the input sample, w is the weight, and b is the bias. Figure 4.14 shows a graph of the connection between these parts, with vertices and edges simulating neuron cells (the input x, the weight w, and the output w×x + b).
FIGURE 4.14 Vertices are neurons (data samples), and edges are weights and biases.
The learning algorithm finds the best match for (w, b), that is, the best approximation of the model that maps the data x to the target f(x). X is the matrix of input vectors, where each vector holds the feature values of one data object. For example, if the object is an apple, then the features are color, taste, and price (three values), and if we have seven different instances of apples, then the size of X is 7×3 = 21 values: 7 rows, each representing an apple, and each row has three columns for the three selected features (Figure 4.14).
The value of w is initialized randomly, and the purpose of training the network is to find the best w, which gives us the best results (in regression or classification problems).
If we choose a linear activation function, it does not work for backpropagation because its derivative is constant, so it cannot help us update the weights on the way back to the inputs and find the best weights. It also does not work for nonlinear problems, because the output is linear in any architecture (even with several neurons and layers); therefore, it cannot simulate the nonlinear behavior of a problem. Figure 4.15 shows a linear function, whose derivative is constant and cannot simulate nonlinear models.
FIGURE 4.15 A linear function and its (constant) derivative.
4.6.2 Nonlinear Functions
Real-world problems often have nonlinearity in their behavior. Deploying nonlinear functions (activation functions, AF) helps to simulate this type of model. We discussed some of them in the previous section. Here we use one of the most popular ones in a neural network, the sigmoid function:
f(x) = sig(w·x + b)
The neuron takes the input x and the bias b (fed from a constant input of 1), computes w×x + b, and applies sig(w×x + b) to produce the output.
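A minimal NumPy sketch of this neuron, f(x) = sig(w·x + b), with arbitrary values for w, x, and b:
Example:
import numpy as np

def sig(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.0, 2.0])    # input sample
w = np.array([0.3, 0.8, -0.5])    # weights
b = 0.1                           # bias
print(sig(np.dot(w, x) + b))      # neuron output, between 0 and 1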
4.7 ANNS ARCHITECTURES
4.7.1 Feed Forward Neural Networks (FFNNs)
An FFNN is an artificial neural network without any cycles in the connections between nodes. Information flows in one direction only (from the input to the hidden layer and then to the output layer). The FFNN is the simplest form of NN. Figure 4.18 shows the architecture of an FFNN.
FIGURE 4.18 An FFNN with an input layer, a hidden layer, and an output layer.
An FFNN is a simple network with input, hidden, and output layers. The combination of neurons in each layer propagates forward to the next layer: the weighted combination goes through the activation function, and the outputs become the inputs of the next layer; this continues for all layers, up to the last layer, which gives the final outputs. For example, if the network has one input, one hidden, and one output layer as shown in Figure 4.19, and X = (x1, x2) is the sample input vector, Wh = (wh11, wh12, wh21, wh22) are the weights of the hidden layer, Wo = (wo11, wo12, wo13, wo21, wo22, wo23, wo31, wo32, wo33) are the weights of the output layer, and a is the activation function, then the prediction is:
prediction = a(a(X*Wh)*Wo)
The algorithm takes the data X, combines them with the weights W and bias b, and applies the activation function through all layers to get the output. Two well-known types of feedforward networks are the single-layer and multi-layer perceptrons. The pseudocode of the algorithm is:
Forward(x):
    y0 = x
    for l = 1 to L do:
        cl = wlT yl-1 + bl
        yl = a(cl)
    end
    return yL
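A minimal NumPy sketch of this forward pass for a network with one hidden layer, using the sigmoid as the activation a (the weights and input values are arbitrary):
Example:
import numpy as np

def a(z):                             # activation function
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    y = x
    for W, b in zip(weights, biases):
        y = a(np.dot(W.T, y) + b)     # cl = WlT yl-1 + bl, yl = a(cl)
    return y

X = np.array([1.0, 0.5])                            # X = (x1, x2)
Wh = np.array([[0.2, 0.4], [0.1, 0.3]])             # hidden-layer weights
Wo = np.array([[0.5, 0.6, 0.7], [0.8, 0.9, 1.0]])   # output-layer weights
print(forward(X, [Wh, Wo], [0.0, 0.0]))             # prediction = a(a(X*Wh)*Wo)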
4.7.2 Backpropagation
Backpropagation is a supervised learning algorithm, so it needs to be trained with labeled sample data. The main steps of the backpropagation algorithm are: (1) run a forward pass to compute the outputs, (2) compute the error between the outputs and the targets, (3) propagate the error backward through the layers to compute the gradients, and (4) update the weights using the gradients.
4.7.3 Single-Layer Perceptron
The perceptron was invented by Frank Rosenblatt at Cornell in the late 1950s. It is the simplest feedforward neural network and does not contain any hidden layer; a single-layer perceptron can only learn linear functions. Figure 4.21 shows its architecture: an input layer connected directly to an output layer, with the weights updated from the difference between the output and the desired values.
CHAPTER 5
Deep Neural Networks (DNNs) Fundamentals and Architectures
Here, we present some definitions and concepts that you should know before starting to work with any deep learning algorithm.
5.1.4 Vanishing Gradient
One of the challenges in using deep learning is the vanishing gradient problem. The gradient drives the optimization, and as it is propagated back through many layers, its values can become very small and close to zero, so the weights of the earlier layers barely change. This is the problem: the network and its weights cannot be updated, and training gets stuck. That is why it is called the vanishing gradient: the gradient no longer does any work in this situation! There are several solutions to this problem, such as using the Rectified Linear Unit (ReLU), which removes the negative values.
5.1.5 Channel
The input data in deep learning can have different channels. For example, if the data are images in RGB (the standard for color images), there are three channels (Red, Green, and Blue).
5.1.6 Embedding
An embedding represents the input data (such as an image, text, or audio) as a vector. For example, we embed images into a common vector space, and in sentiment analysis, the embedding process converts words to vectors.
5.1.7 Fine-Tuning
Fine-tuning is a method of initializing a network's parameters from one task and then updating those parameters on a new task. For example, a pre-trained word embedding in an NLP system (like word2vec) can have its parameters updated for a specific task such as content analysis or sentiment analysis.
5.1.8 Data Augmentation
Data augmentation uses some methods (depending on the data type) to generate new data from the original data, which addresses one of the challenges in DL: limited data. For example, flipping and rotation are two methods for creating new image data.
5.1.9 Generalization
An NN's performance depends largely on its ability to generalize, which allows the model to operate on new data. Like ANNs, a DL network should generalize well to increase the model's flexibility. There are several methods, such as regularization, to increase generalization.
5.1.10 Regularization
Regularization is a method that, by making some minor changes in the learning process, makes the model generalize better and become more flexible. Some regularization methods are L1, L2, and dropout, which we discuss in the next sections.
5.1.11 L1 and L2
These are the most popular regularization methods. In general, we add a regularization term to the cost function and update it accordingly.
5.1.12 Dropout
Dropout works by masking (dropping) some nodes in the network in the
training step (we will explain in more detail this concept later in this
chapter).
5.1.13 End-to-End Learning
Deep learning supports end-to-end learning, which helps solve problems more efficiently. For example, recognizing a face using traditional AI methods or artificial neural networks involves steps such as cropping, translation, detection, and recognition. With deep learning, these separate steps are not needed, because the layers of the network extract the features from the data automatically (Chapter 7 provides an implementation example of face recognition with DL).
• self-driving cars,
• natural language processing,
• content analysis,
• visual recognition,
• fashion,
• chatbots,
• virtual agents,
• text generation,
• handwritten generation and machine translation,
• advertising, and
• fraud detection.
There are many research efforts and projects deploying deep learning, from medical image analysis to drug discovery and genome analysis. Also, based on the nature of the data and the type of problem, many real-world problems can be solved by deep learning, for example, performing real-time human behavior analysis, translating languages, and forecasting natural catastrophes. The autonomous vehicle is one of the hot topics that many large companies, such as Google, Apple, Tesla, and Uber, invest in. The financial industry is another area that has used deep learning for fraud detection in money transactions; it has saved billions for financial and insurance institutions. Using information such as customer transactions and credit scores, systems classify and detect fraud (Figure 5.1).
FIGURE 5.1 There are several applications for deep learning techniques.
These days we use several virtual assistant services, such as Alexa, Siri, and Google Assistant, in our daily lives. These products give you smart services, such as finding desired songs or restaurants, and interact with you intelligently. They understand human language and provide smart and autonomous responses. They also offer services such as reading or writing your email and organizing your documents (Figure 5.2).
We review the implementation steps of a virtual assistant robot in Chapter 7 (Figure 5.3).
Figure 5.4 shows an example of a DNN. It has one input layer, one output layer, and four hidden layers. Here, we present an example of a DNN in TensorFlow for the MNIST dataset. The input image size is 28×28, the hidden activation function is ReLU, and the output activation function is softmax.
#MNIST Example
#import the libraries
import tensorflow as tf
from tensorflow import keras as ks
#input data and the network parameters and hyperparameters
input_data_shape = (28, 28)
hidden_activation_function = 'relu'
output_activation_function = 'softmax'
# define the model
dnn_model = ks.models.Sequential()
# input layer
dnn_model.add(ks.layers.Flatten(input_shape = input_
data_shape, name = 'Input_layer'))
# first hidden layer
dnn_model.add(ks.layers.Dense(256, activation =
hidden_activation_function, name = 'Hidden_layer_1'))
# second hidden layer
dnn_model.add(ks.layers.Dense(192, activation =
hidden_activation_function, name = 'Hidden_layer_2'))
# third hidden layer
dnn_model.add(ks.layers.Dense(128, activation =
hidden_activation_function, name = 'Hidden_layer_3'))
• convolution filters,
• pooling or subsampling,
• activation (transition) function, and
• fully connected layer.
FIGURE 5.5 CNN looks for the features of the data for training.
An RNN processes the data with a feedforward pass and returns the output values to the hidden and input layers through backpropagation (storing past data and using them for forecasting). RNNs have several applications, such as predicting an event based on previous events and data (like stock market data). An RNN shares the same parameters across all its layers (time steps). It does not work very well for long-term processing and cannot have very deep layers. Figure 5.6 shows an RNN architecture and how the output layer has a connection back to the hidden layer.
FIGURE 5.7 LSTM is a type of RNN that uses backpropagation and has memory.
In a GAN, a generator module creates new (fake) data, and a discriminator module tries to classify the original data from the new (fake) data; these two modules compete with each other. The two models use different loss functions (so the computational cost is high), and if one of them fails, the whole network fails. The generator and discriminator weights are updated through the iterations. Figure 5.8 shows the general structure of GAN networks.
FIGURE 5.10 ResNet network architecture: stacked ReLU layers plus a skip connection that adds the input X, so the layers learn the residual F(X) − X rather than the full mapping F(X).
5.4.1.2 Pooling Layers
CNNs also have pooling layers (e.g., max pooling, min pooling, and average pooling) to summarize subregions and decrease computation time (Figure 5.13). The behavior of a pooling layer depends on the size of the pooling window and the type of pooling. It operates on the convolution outputs to reduce their size, so there are fewer operations in the next step.
5.4.1.3 Dropout
Dropout helps networks generalize so they can be used in different applications. Sometimes removing some neurons, while keeping the learned patterns, helps the network's performance. Nodes can be removed by switching them off; one popular way is to build a random matrix of zeros and ones and multiply it by the nodes' values to turn them off or on (Figure 5.14). Random dropout does not change the weights; it only masks the network to drop some nodes, and it is used only during the training step.
5.4.1.4 Batch Normalization
Normalization techniques are used to make machine learning algorithms
generalize better to new sample data. For example, if the data have a normal
distribution, centering the data by subtracting the mean and
then dividing the result by the standard deviation is a normalization. Batch
normalization is a method that helps to train deeper networks. In batch
normalization, the features are normalized as they are fed to the batch
normalization layer. In general, it has a significant effect on improving
training and convergence speed. However, it is based on the batch
dimension, which leads to a strong dependence on the batch size setting.
The deep network data are in a four-dimensional (4D) tensor (N, C, H, W)
order, where N is the batch axis, C is the channel axis, and H and W are the
spatial height and width axes (Figure 5.15).
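A minimal sketch of placing a batch normalization layer in a Keras model follows; the layer sizes here are arbitrary choices for illustration:

from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Dense(64, input_shape=(20,)),
    layers.BatchNormalization(),   # normalizes the features over the batch
    layers.Activation('relu'),
    layers.Dense(10, activation='softmax'),
])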
5.4.2 Design a CNN
Here we implement a LeNet-5 architecture. Let us first explain LeNet-5: it has a
sequence of convolution and pooling layers, two fully connected layers,
and one classifier layer. Most other CNN architectures follow almost the same
implementation steps. Let us go through the coding steps:
Step 1: In the first step, you should determine the libraries you need
for the implementation and import them. These libraries help you make
your code more robust and accurate and reduce the computation
cost.
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data
Step 2: Set up the number of samples for the training and testing
steps, the image size, and the number of classes.

batch_size = 128
test_size = 256
img_size = 28
num_classes = 10
# define placeholders: X for data and Y for labels
X = tf.placeholder("float", [None, img_size, img_size, 1])
Y = tf.placeholder("float", [None, num_classes])
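The remaining steps of this listing continue on later pages. As a minimal TF1-style sketch (the filter variable w1 and its size are assumptions of ours), the first convolution and pooling operations on these placeholders could be defined as:

# 3x3 filters, 1 input channel, 32 filters (sizes are assumptions)
w1 = tf.Variable(tf.random_normal([3, 3, 1, 32], stddev=0.01))
conv1 = tf.nn.relu(tf.nn.conv2d(X, w1, strides=[1, 1, 1, 1], padding='SAME'))
pool1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')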
By doing this, you can mitigate the vanishing gradient problem. LSTM differs
from plain RNNs in three main aspects of control:
• the input data,
• remembering sample data, and
• the outputs.
5.5.3.1 Import Libraries
The first step is to import some libraries like NumPy, TensorFlow, and
Keras.
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
batch_size = 64
input_dim = 28
units = 64
output_size = 10
def build_model(allow_cudnn_kernel=True):
    if allow_cudnn_kernel:
        lstm_layer = keras.layers.LSTM(units, input_shape=(None, input_dim))
    else:
        lstm_layer = keras.layers.RNN(keras.layers.LSTMCell(units),
                                      input_shape=(None, input_dim))
    model = keras.models.Sequential([
        lstm_layer,
        keras.layers.BatchNormalization(),
        keras.layers.Dense(output_size),
    ])
    return model

model = build_model(allow_cudnn_kernel=True)
model.compile(loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, validation_data=(x_test, y_test),
          batch_size=batch_size, epochs=100)
Outputs:
Epoch 1/10
938/938 [==============================] - 12s 13ms/
step - loss: 0.3543 - accuracy: 0.8882 - val_loss:
0.1381 - val_accuracy: 0.9555
…
Output:
The predicted result is: [3], the target result is: 3
5.6.1 What is a GAN?
Generative Adversarial Network (GAN) is one of the most recent network
architectures in deep learning; its concept and method give some new
advantages to deep learning algorithms. There are several advantages to
using GANs. The two modules have opposing goals:
1. Generative Goal:
maximize the similarity to the real data so that the discriminator misclassifies the
fake data as real.
2. Discriminative Goal:
maximize the difference between real and fake data (training loss
close to 0.5).
1. loading dataset,
2. data preprocessing,
3. defining the discriminator model,
4. defining the generator model,
5. combining the generator and discriminator model,
6. training the model, and
7. predicting (generating data).
5.6.2.1 Loading Dataset
The dataset we use here is the fashion MNIST.
from keras.datasets.fashion_mnist import load_data

(real_train_images, real_train_labels), (real_test_images, real_test_labels) = load_data()
print('Train', real_train_images.shape, real_train_labels.shape)
print('Test', real_test_images.shape, real_test_labels.shape)
5.6.2.2 Data Preprocessing
In this step, you should do some processing on the data, like normalizing the
data, selecting the real data, generating random noise, and selecting the
fake data.
a) normalize the data from [0, 255] to [-1, 1]
def prepare_real_samples(samples):
    prepared_samples = samples.astype('float32')
    prepared_samples = (prepared_samples - 127.5) / 127.5
    return prepared_samples
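The remaining preprocessing steps are not printed on these pages; a minimal sketch for generating random noise (latent codes) and random class labels, where the latent dimension LATENT_DIM is our assumption, might look like this:

import numpy as np

LATENT_DIM = 100  # assumed latent vector size

def generate_noise_samples(n_samples, latent_dim=LATENT_DIM, n_classes=10):
    # random latent codes for the generator
    noise = np.random.randn(n_samples, latent_dim).astype('float32')
    # random class labels (the discriminator below is conditioned on labels)
    labels = np.random.randint(0, n_classes, n_samples)
    return noise, labels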
discriminator = create_discriminator(INPUT_SHAPE, N_CLASSES)
discriminator.summary()
Layer (type) Output Shape Param # Connected to
============================================================
input_1
(InputLayer) [(None, 1)] 0
____________________________________________________________
embedding
(Embedding) (None, 1, 50) 500 input_1[0][0]
____________________________________________________________
dense (Dense) (None, 1, 784) 9984 embedding[0][0]
____________________________________________________________
input_2
(InputLayer) [(None, 28, 28, 1)] 0
____________________________________________________________
reshape
(Reshape) (None, 28, 28, 1) 0 dense[0][0]
____________________________________________________________
concatenate
(Concatenate) (None, 28, 28, 2) 0 input_2[0][0]
____________________________________________________________
conv2d
(Conv2D) (None, 14, 14, 128) 2432 concatenate[0][0]
____________________________________________________________
leaky_re_lu
(LeakyReLU) (None, 14, 14, 128) 0 conv2d[0][0]
____________________________________________________________
conv2d_1
(Conv2D) (None, 7, 7, 128) 147584 leaky_re_lu[0][0]
____________________________________________________________
leaky_re_lu_1
(LeakyReLU) (None, 7, 7, 128) 0 conv2d_1[0][0]
____________________________________________________________
flatten
(Flatten) (None, 6272) 0 leaky_re_lu_1[0][0]
____________________________________________________________
dropout
(Dropout) (None, 6272) 0 flatten[0][0]
____________________________________________________________
dense_1
(Dense) (None, 1) 6273 dropout[0][0]
============================================================
Total params: 196,773
Trainable params: 196,773
Non-trainable params: 0
leaky_re_lu_3
(LeakyReLU) (None, 14, 14, 128) 0 conv2d_transpose[0][0]
_____________________________________________________________________________
conv2d_transpose_1
(Conv2DTrans) (None, 28, 28, 128) 262272 leaky_re_lu_3[0][0]
_____________________________________________________________________________
leaky_re_lu_4
(LeakyReLU) (None, 28, 28, 128) 0 conv2d_transpose_1[0][0]
_____________________________________________________________________________
conv2d_2 (Conv2D) (None, 28, 28, 1) 6273 leaky_re_lu_4[0][0]
_____________________________________________________________________________
functional_1
(Functional) (None, 1) 196773 conv2d_2[0][0]
=============================================================================
Total params: 1,366,109
Trainable params: 1,169,336
Non-trainable params: 196,773
fake_images = (fake_images + 1) / 2
print(f'After {(i+1)*BATCHES*BATCH_SIZE} samples.')
print('Image labels are D(image) and Class(image)')
print('Generated Images and Discriminator Output:')
for i in range(10):
    ax = plt.subplot(2, 5, i+1)
    ax.axis('off')
    ax.set_title(f'{disc_fake_classifications[i][0]:.2f}, {fake_classes[i]}')
    plt.imshow(fake_images[i].reshape(INPUT_SHAPE[0], INPUT_SHAPE[1]))
plt.tight_layout()
plt.show()

print('Real Images and Discriminator Output:')
real_images, real_batch_labels, real_classes = generate_real_batch(
    prepared_real_images, real_train_labels, 10)
disc_real_classifications = discriminator.predict([real_images, real_classes])
real_images = (real_images + 1) / 2
for i in range(10):
    ax = plt.subplot(2, 5, i+1)
    ax.axis('off')
    ax.set_title(f"{disc_real_classifications[i][0]:.2f} {real_classes[i]}")
    plt.imshow(real_images[i])
plt.tight_layout()
plt.show()
EPOCHS = 50
BATCHES = 50
BATCH_SIZE = 128
IMAGE_NOISE_START = .1
LABEL_SMOOTHING_MAGNITUDE = .1
# more training for 50 epochs
discriminator = create_discriminator(INPUT_SHAPE, N_CLASSES,
                                     label_smoothing=LABEL_SMOOTHING_MAGNITUDE)
28 × 28 × 3 = 2,352
Here, the feature vector dimension is 2,352 for a small image, which
shows how quickly the computation cost grows when we work with large
databases. However, some people prefer to use grayscale images in their
work because a grayscale image has just one channel; in the previous
example, this would reduce the dimension to 784. If we have 10,000 images
in our dataset, then there are 10,000 × 2,352 = 23,520,000 values to process.
Figure 6.2 shows a CNN that extracts different features of the images from
different patches in different layers of the N-dimensional space. Each
image patch becomes a 3D matrix of values after the filters are convolved
over the image in each layer. If you look at a single pixel in an
image, you cannot recognize what the image is about, but a set of
pixels (a block or patch) may give you some information about the picture.
Each image also has several patterns that create its context. Figure 6.3
shows an example of a CNN looking for these patterns in images
(one pixel, or even one patch, may not show any pattern). The patterns
come from changes in color density and other image data values,
and the CNN filters detect different patterns in the images. They extract
these patterns as feature vectors for training the network model. Different
filters in different layers provide different feature maps that present different
data features. More features make the trained network more accurate
when its model is tested on real-world problems. CNNs are trained for image analysis applications in two broad settings:
1. supervised learning
• object detection,
• image classification, and
• image segmentation.
2. unsupervised learning
• image generation
There are different algorithms for these applications. Table 6.1 shows some
of these methods and algorithms.
6.2.1 Filter Parameters
Some filter parameters like size, type, and stride can be changed and affect
the learning process and the network’s performance. Here we review these
parameters in more detail to learn how they can change the network
performance.
By changing the number and type of the filters, you can find the best architecture for
your problem. Please keep in mind that there are some verified and well-known
CNN architectures that you can use for your problem. For
learning purposes, we suggest you pick a problem, create your own
basic network, and try to find its bottlenecks to improve its performance.
Then compare your results with one of the popular trained CNN algorithms
or pre-trained networks.
6.2.1.2 Filters Size
The size of the filters can be small or large; the most popular size is 3×3.
By choosing a larger size, finding larger patterns is possible (to keep
the output size in each layer, you can pad the image with some zero values
around it (Figure 6.5)). For example, using a 5×5 filter with the same stride
makes the computation cost higher, because it must multiply more
pixels, and it does not make the process faster when the stride is the same.
Also, we should keep symmetry, and even sizes like 2 and 4 do not preserve
it (the previous layer's pixels cannot be divided symmetrically around the
output pixel). On the other hand, in the last layer some distortion across
the layer can happen. So, among these filter sizes (2, 3, 4, and 5), the best
value is 3.
FIGURE 6.5 Padding of one around a 32×32×3 image with zero values.
popular value is one). A larger stride makes the output smaller
(we discuss its size more in the next section), and there is a possibility of
losing some patterns and information in the data.
Another concept is padding, which is adding blank pixels around the image
frame. Padding increases the image size and prevents losing information
at the image border when the kernel moves over the image data. For
example, Figure 6.5 shows padding of one around a 32×32×3 image, where
the padded values are zero (zero padding).
If you choose a large stride, it may reduce the computation cost,
but it also affects the filter's ability to extract features in the
convolution process. The most popular size for both padding and stride is one.
You can try different values to see how those values change the model
performance.
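As a small illustration of how stride and padding change the output size, the following sketch (the input and filter counts are arbitrary choices) compares 'valid' and 'same' padding on a 32×32 input:

import tensorflow as tf

x = tf.random.normal([1, 32, 32, 3])  # one 32x32 RGB image
conv_valid = tf.keras.layers.Conv2D(8, 3, strides=1, padding='valid')
conv_same = tf.keras.layers.Conv2D(8, 3, strides=2, padding='same')
print(conv_valid(x).shape)  # (1, 30, 30, 8): no padding shrinks the feature map
print(conv_same(x).shape)   # (1, 16, 16, 8): stride 2 halves it, 'same' pads the border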
6.2.2 Number of Parameters
Here we review the number of parameters in each layer and explain how
many parameters each layer has. We check the input, convolutional,
pooling, and fully connected layers' parameters.
6.2.2.1 Input Layer
There are no parameters to learn in this layer (it simply holds the input
data). However, there are some considerations about the input data and their
size. Sometimes, before passing the data to the network and after preprocessing,
you can add a dimension to the data (training and testing) to make
a tensor for easier computation during model training and testing.
Figure 6.6 shows the main steps in converting an image to a vector. The
feature vectors are the values that the learning models use for training.
6.2.2.2 Convolutional Layer
There are different convolutional layers in CNN architectures. If I is
the number of input feature maps (the previous layer's filters), O is the
number of output feature maps (the current layer's filters), and n×m is the filter
size, then the total number of parameters is:
((n × m × I) + 1) × O
Figure 6.7 shows four different feature maps from four filters. You can
see how these four filters extract different feature maps and what the dif-
ferences are.
6.2.2.3 Pooling Layer
There are no parameters to learn in this layer, but it has a computation
cost, which can affect the learning performance. Therefore, choosing the
pooling layer size wisely and placing it correctly in the architecture
can improve performance.
6.2.2.4 Fully Connected Layer
The number of parameters in a fully connected layer is:
((C × P) + (1 × C))
where the current layer's neuron number is C, the output (O), and the previous
layer's neuron number is P, the input (I). The output size of each convolutional
layer (with stride one and no padding) is N − (F − 1), where N is the input size and
F is the filter size. For example, if the input is a 28×28 image and
the filter size is 3×3, then the output of the layer is:
(28 − (3 − 1)) = 26
For the first convolutional layer, if:
stride = 1,
kernel_size = 3×3,
number of filters in the first layer = 32,
then the number of parameters (with one input channel) is:
C1: (3 × 3 × 1 + 1) × 32 = 320
In the next layer, we already have 32 learned filters from the previous
layer, and then the number of trainable parameters is:
C2: (3 × 3 × 32 + 1) × 32 = 9,248
C3: (3 × 3 × 32 + 1) × 64 = 18,496
And so on. The total number of parameters is the sum of all these values. For
example, in these three layers, there are 320 + 9,248 + 18,496 = 28,064 parameters.
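These counts can be checked directly in Keras; a minimal sketch with the same three layers on a 28×28 grayscale input follows:

from tensorflow.keras import Sequential, layers

model = Sequential([
    layers.Conv2D(32, (3, 3), input_shape=(28, 28, 1)),  # (3*3*1  + 1) * 32 = 320
    layers.Conv2D(32, (3, 3)),                           # (3*3*32 + 1) * 32 = 9,248
    layers.Conv2D(64, (3, 3)),                           # (3*3*32 + 1) * 64 = 18,496
])
model.summary()  # Total params: 28,064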
6.2.3 ImageNet Challenge
Since 2010, a contest, the ImageNet Large Scale Visual Recognition Challenge
(ILSVRC), has been held to benchmark different algorithms. The error rate of
the first winning entries was around 25%, and the first major breakthrough
belonged to AlexNet, which reduced the error rate by roughly 10 percentage
points (more recent algorithms show even better results). The competition is
built on a visual database of more than 14 million images for visual recognition
research, which people use to compare their results.
6.2.4 CNN Architecture
This section presents some of the most popular CNN architectures. There
are several different architectures with different performances. The most
significant differences between these architectures are in:
• number of layers,
• elements in each layer, and
• the connection between the layers.
Specifying these three items defines your model and includes identifying the
network parameters and hyperparameters.
6.2.4.1 LeNet-5 (1998)
This network, presented in 1998, is one of the simplest deep neural networks
and has seven layers:
• two convolutional,
• three fully connected layers, and
• two pooling layers.
Figure 6.8 shows its architecture in more detail. Let us discuss its filters
and parameters. With stride one, the output size of each convolutional layer is:
((n + 2p − f)/s + 1) × ((n + 2p − f)/s + 1) × Nc
where:
n: the input size,
f: the filter size,
Nc: the number of channels (the number of filters used to convolve the inputs),
p: the padding (no padding in LeNet-5), and
s: the stride.
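For example, for LeNet-5's first convolutional layer, assuming the standard 32×32 input, 5×5 filters, six filters, no padding, and stride one, the output size is ((32 + 0 − 5)/1 + 1) × ((32 + 0 − 5)/1 + 1) × 6 = 28 × 28 × 6.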
6.2.4.2 AlexNet (2012)
AlexNet is one of the first deep learning methods presented in the ImageNet
challenge, and it won it in 2012. You can see its DL architecture in Figure 6.9. Its
architecture is like LeNet but with more layers (deeper).
The authors introduced dropout to reduce overfitting and achieved better
results than other methods. They also used ReLU to improve performance
and max-pooling layers to make their method faster. Its architecture has eight
layers (five convolutional and three fully connected).
6.2.4.3 GoogleNet/Inception-v1 (2014)
GoogleNet is a CNN with 22 layers in total. A pretrained version is available
that you can download and use. Figure 6.10 shows its architecture.
It won ILSVRC 2014 with an error rate of less than 7%. Its architecture
uses techniques such as batch normalization to reduce the number of
parameters to 4-5 million. In addition, it has inception modules (extended in the
later versions), consisting of parallel convolutional filters in each layer.
As you can see in Figure 6.10, the network is very deep, which makes the
accuracy higher, but its computation cost is also high.
6.2.4.4 VGGNet-16 (2014)
VGGNet-16 is a CNN that was presented in 2014 and has 16 layers in its
architecture. It achieved 92.7% accuracy on the ImageNet test images. It
is an improvement of AlexNet that replaces large kernel filters with stacks
of smaller 3×3 filters. It is very slow to train, and its weights are very large.
It has been used for many image classification problems and has shown
very interesting results. You can implement it with current DL platforms
like TensorFlow, Keras, or PyTorch. Figure 6.11 shows its architecture.
It has 138M parameters that take about 500 MB of space (perhaps the simplest
way to improve performance is to increase the number of layers).
As you can see in Figure 6.11, there are three fully connected layers and
13 convolutional layers.
6.2.4.5 Inception-v3 (2015)
It is the next generation of Inception-v1 (after Inception-v2, with some tweaks
in the loss function and the use of other techniques like batch normalization).
It has 24M parameters and is a CNN used for image analysis and classification.
6.2.4.6 ResNet (2015)
Just increasing the network depth cannot increase the accuracy beyond a
threshold. This network, presented by Microsoft, solves this problem using
shortcut (residual) connections to build a deeper network. ResNet won
ILSVRC 2015 with less than a 3.6% error rate using 152 layers. It is a very
deep CNN. Figure 6.13 shows three architectures and compares VGG-19
with a 34-layer plain network and a 34-layer residual network to illustrate
this complexity. It has been used in several computer vision applications.
6.2.4.7 Inception-v4 (2016)
Inception-v4, presented by Google in 2016, is an improvement of Inception-v3
and has several changes in its architecture.
6.3.1 Import Libraries
In any program, the first step is to import the libraries whose methods we use in
our program. These libraries help us do computation and plotting more easily.
For example, NumPy, pandas, TensorFlow, and Keras are the most popular
libraries used in DL modeling (here, a CNN) with Python.
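A minimal sketch of these imports follows (the exact list in the original listing may differ):

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras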
In the model implementation, we may check all these values and see the
model’s performance.
6.4.1 Import Libraries
Importing libraries, as the first step, helps us make the computation
easier. In this example, we import the TensorFlow and Keras libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, MaxPool2D, Flatten
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.metrics import classification_report, confusion_matrix
from PIL import Image
import seaborn as sns
One good coding practice is to write some tests for each part of the code.
You can also do some data visualization to see the data samples (to check
features like the quality and distribution of the data) and use data
augmentation methods to generate data variations, which may help train
the model better and increase the accuracy.
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(4, 4), strides=(1, 1),
                 padding='valid', input_shape=(32, 32, 3), activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Conv2D(filters=32, kernel_size=(4, 4), strides=(1, 1),
                 padding='valid', activation="relu"))
model.add(MaxPool2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(256, activation="relu"))
model.add(Dense(10, activation="softmax"))
model.compile(loss="categorical_crossentropy", optimizer="adam",
              metrics=["accuracy"])
model.summary()
Model: "sequential"
______________________________________________________
Layer (type) Output Shape Param #
======================================================
conv2d (Conv2D) (None, 29, 29, 32) 1568
______________________________________________________
DNNs for Images Analysis ◾ 129
max_pooling2d
(MaxPooling2D) (None, 14, 14, 32) 0
______________________________________________________
conv2d_1 (Conv2D) (None, 11, 11, 32) 16416
______________________________________________________
max_pooling2d_1
(MaxPooling2 (None, 5, 5, 32) 0
______________________________________________________
flatten (Flatten) (None, 800) 0
______________________________________________________
dense (Dense) (None, 256) 205056
______________________________________________________
dense_1 (Dense) (None, 10) 2570
======================================================
Total params: 225,610
Trainable params: 225,610
Non-trainable params: 0
Figure 6.16 shows the confusion matrix of the trained model that was
tested by test data.
6.5.1 Import Libraries
We use TensorFlow, Keras, and matplotlib for implementing the network
and visualization in this example.
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import matplotlib.pyplot as plt
model = models.Sequential()
model.add(layers.Conv2D(32, kernel_size=(3, 3),
activation='relu', input_shape=(32, 32, 3)))
model.add(layers.MaxPooling2D(pool_size=(2, 2)))
Here is the summary of the model that shows the network structure
and the total and trainable parameters.
______________________________________________________
Layer (type) Output Shape Param #
======================================================
conv2d_12 (Conv2D) (None, 30, 30, 32) 896
______________________________________________________
max_pooling2d_11
(MaxPooling (None, 15, 15, 32) 0
______________________________________________________
conv2d_13 (Conv2D) (None, 13, 13, 64) 18496
______________________________________________________
max_pooling2d_12
(MaxPooling (None, 6, 6, 64) 0
______________________________________________________
conv2d_14 (Conv2D) (None, 4, 4, 128) 73856
______________________________________________________
max_pooling2d_13
(MaxPooling (None, 2, 2, 128) 0
______________________________________________________
dropout_3 (Dropout) (None, 2, 2, 128) 0
______________________________________________________
flatten_5 (Flatten) (None, 512) 0
______________________________________________________
dense_12 (Dense) (None, 256) 131328
______________________________________________________
dense_13 (Dense) (None, 128) 32896
______________________________________________________
dense_14 (Dense) (None, 10) 1290
======================================================
Total params: 258,762
Trainable params: 258,762
Non-trainable params: 0
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))
6.6 IMAGE SEGMENTATION
Image segmentation partitions an image into meaningful segments that make
it easier to analyze. Several traditional image segmentation methods do
the segmentation based on the image's color, intensity, or texture. Image
segmentation is one of the image analysis applications of CNNs. Here, we
implement an example to show you how a CNN can help with image
segmentation.
6.6.1 Import Libraries
We use the TensorFlow, Python, and matplotlib libraries in this section for
model generation and visualization.
def load_image_train(datapoint):
    input_image1 = tf.image.resize(datapoint['image'], (128, 128))
    input_mask1 = tf.image.resize(datapoint['segmentation_mask'], (128, 128))
    if tf.random.uniform(()) > 0.5:
        input_image1 = tf.image.flip_left_right(input_image1)
        input_mask1 = tf.image.flip_left_right(input_mask1)
    input_image1, input_mask1 = normalize(input_image1, input_mask1)
    return input_image1, input_mask1
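The normalize helper used above is not printed on these pages; a minimal sketch consistent with how it is called (scaling the image to [0, 1] and shifting the mask labels) could be:

def normalize(input_image, input_mask):
    # scale pixel values to [0, 1] and shift mask labels to start at 0
    input_image = tf.cast(input_image, tf.float32) / 255.0
    input_mask = input_mask - 1
    return input_image, input_mask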
TRAIN_LENGTH = info.splits['train'].num_examples
# define the dataset parameters
BATCH_SIZE = 64
BUFFER_SIZE = 1000
STEPS_PER_EPOCH = TRAIN_LENGTH // BATCH_SIZE
train = dataset['train'].map(load_image_train,
                             num_parallel_calls=tf.data.experimental.AUTOTUNE)
test = dataset['test'].map(load_image_test)
train_dataset = train.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
train_dataset = train_dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_dataset = test.batch(BATCH_SIZE)
6.6.3 Segmentation Map
Here we define the true mask based on the input image. Figure 6.18 shows
the original and true mask of an image.
OUTPUT_CHANNELS = 3
base_model = tf.keras.applications.MobileNetV2(input_shape=[128, 128, 3],
                                               include_top=False)
# use the activations of these layers
layer_names = [
    'block_1_expand_relu',
    'block_3_expand_relu',
    'block_6_expand_relu',
    'block_13_expand_relu',
    'block_16_project',
]
layers = [base_model.get_layer(name).output for name in layer_names]
# create the feature extraction model
down_stack = tf.keras.Model(inputs=base_model.input, outputs=layers)
down_stack.trainable = False
up_stack = [
    pix2pix.upsample(512, 3),
    pix2pix.upsample(256, 3),
    pix2pix.upsample(128, 3),
    pix2pix.upsample(64, 3),
]
def unet_model(output_channels):
    inputs = tf.keras.layers.Input(shape=[128, 128, 3])
    x = inputs
    # downsampling through the model
    skips = down_stack(x)
    x = skips[-1]
    skips = reversed(skips[:-1])
    # upsampling and adding the skip connections
    for up, skip in zip(up_stack, skips):
        x = up(x)
        concat = tf.keras.layers.Concatenate()
        x = concat([x, skip])
    # this is the last layer
    last = tf.keras.layers.Conv2DTranspose(output_channels, 3, strides=2,
                                           padding='same')
    x = last(x)
    return tf.keras.Model(inputs=inputs, outputs=x)

base_model.summary()
The model summary shows the network architecture and the number
of trainable and total parameters. The network here is large, but it is inter-
esting to follow.
_____________________________________________________________________________
Layer (type) Output Shape Param # Connected to
=============================================================================
input_1
(InputLayer) [(None, 128, 128, 3) 0
_____________________________________________________________________________
Conv1_pad
(ZeroPadding2D) (None, 129, 129, 3) 0 input_1[0][0]
_____________________________________________________________________________
Conv1 (Conv2D) (None, 64, 64, 32) 864 Conv1_pad[0][0]
_____________________________________________________________________________
bn_Conv1
(BatchNormalization) (None, 64, 64, 32) 128 Conv1[0][0]
_____________________________________________________________________________
Conv1_relu (ReLU) (None, 64, 64, 32) 0 bn_Conv1[0][0]
_____________________________________________________________________________
expanded_conv_
depthwise (Depthw (None, 64, 64, 32) 288 Conv1_relu[0][0]
_____________________________________________________________________________
expanded_conv_
depthwise_BN (Bat (None, 64, 64, 32) 128 expanded_conv_depthwise[0][0]
_____________________________________________________________________________
expanded_conv_
depthwise_relu (R (None, 64, 64, 32) 0 expanded_conv_depthwise_BN[0][0]
_____________________________________________________________________________
expanded_conv_
project (Conv2D) (None, 64, 64, 16) 512 expanded_conv_depthwise_relu[0]
[0]
_____________________________________________________________________________
expanded_conv_
project_BN (Batch (None, 64, 64, 16) 64 expanded_conv_project[0][0]
_____________________________________________________________________________
block_1_expand
(Conv2D) (None, 64, 64, 96) 1536 expanded_conv_project_BN[0][0]
_____________________________________________________________________________
block_1_expand_BN
(BatchNormali (None, 64, 64, 96) 384 block_1_expand[0][0]
_____________________________________________________________________________
block_1_expand_
relu (ReLU) (None, 64, 64, 96) 0 block_1_expand_BN[0][0]
_____________________________________________________________________________
block_1_pad
(ZeroPadding2D) (None, 65, 65, 96) 0 block_1_expand_relu[0][0]
_____________________________________________________________________________
block_1_depthwise
(DepthwiseCon (None, 32, 32, 96) 864 block_1_pad[0][0]
_____________________________________________________________________________
block_1_depthwise
_BN (BatchNorm (None, 32, 32, 96) 384 block_1_depthwise[0][0]
_____________________________________________________________________________
block_1_depthwise_
relu (ReLU) (None, 32, 32, 96) 0 block_1_depthwise_BN[0][0]
_____________________________________________________________________________
block_1_project
(Conv2D) (None, 32, 32, 24) 2304 block_1_depthwise_relu[0][0]
_____________________________________________________________________________
block_1_project_
BN (BatchNormal (None, 32, 32, 24) 96 block_1_project[0][0]
_____________________________________________________________________________
block_2_expand
(Conv2D) (None, 32, 32, 144) 3456 block_1_project_BN[0][0]
_____________________________________________________________________________
block_2_expand_
BN (BatchNormali (None, 32, 32, 144) 576 block_2_expand[0][0]
_____________________________________________________________________________
block_2_expand_
relu (ReLU) (None, 32, 32, 144) 0 block_2_expand_BN[0][0]
_____________________________________________________________________________
block_2_depthwise
(DepthwiseCon (None, 32, 32, 144) 1296 block_2_expand_relu[0][0]
_____________________________________________________________________________
block_2_depthwise_
BN (BatchNorm (None, 32, 32, 144) 576 block_2_depthwise[0][0]
_____________________________________________________________________________
block_2_depthwise_
relu (ReLU) (None, 32, 32, 144) 0 block_2_depthwise_BN[0][0]
_____________________________________________________________________________
block_2_project
(Conv2D) (None, 32, 32, 24) 3456 block_2_depthwise_relu[0][0]
_____________________________________________________________________________
block_2_project_
BN (BatchNormal (None, 32, 32, 24) 96 block_2_project[0][0]
_____________________________________________________________________________
block_2_add (Add) (None, 32, 32, 24) 0 block_1_project_BN[0][0]
block_2_project_
BN[0][0]
_____________________________________________________________________________
block_3_expand
(Conv2D) (None, 32, 32, 144) 3456 block_2_add[0][0]
_____________________________________________________________________________
block_3_expand_BN
(BatchNormali (None, 32, 32, 144) 576 block_3_expand[0][0]
_____________________________________________________________________________
block_3_expand_
relu (ReLU) (None, 32, 32, 144) 0 block_3_expand_BN[0][0]
_____________________________________________________________________________
block_3_pad
(ZeroPadding2D) (None, 33, 33, 144) 0 block_3_expand_relu[0][0]
_____________________________________________________________________________
block_3_depthwise
(DepthwiseCon (None, 16, 16, 144) 1296 block_3_pad[0][0]
_____________________________________________________________________________
block_3_depthwise_
BN (BatchNorm (None, 16, 16, 144) 576 block_3_depthwise[0][0]
_____________________________________________________________________________
block_3_depthwise_
relu (ReLU) (None, 16, 16, 144) 0 block_3_depthwise_BN[0][0]
_____________________________________________________________________________
block_3_project
(Conv2D) (None, 16, 16, 32) 4608 block_3_depthwise_relu[0][0]
_____________________________________________________________________________
block_3_project_BN
(BatchNormal (None, 16, 16, 32) 128 block_3_project[0][0]
_____________________________________________________________________________
block_4_expand
(Conv2D) (None, 16, 16, 192) 6144 block_3_project_BN[0][0]
_____________________________________________________________________________
block_4_expand_BN
(BatchNormali (None, 16, 16, 192) 768 block_4_expand[0][0]
_____________________________________________________________________________
block_4_expand_
relu (ReLU) (None, 16, 16, 192) 0 block_4_expand_BN[0][0]
_____________________________________________________________________________
block_4_depthwise
(DepthwiseCon (None, 16, 16, 192) 1728 block_4_expand_relu[0][0]
_____________________________________________________________________________
block_4_depthwise_
BN (BatchNorm (None, 16, 16, 192) 768 block_4_depthwise[0][0]
_____________________________________________________________________________
block_4_depthwise_
relu (ReLU) (None, 16, 16, 192) 0 block_4_depthwise_BN[0][0]
_____________________________________________________________________________
block_4_project
(Conv2D) (None, 16, 16, 32) 6144 block_4_depthwise_relu[0][0]
_____________________________________________________________________________
block_4_project_
BN (BatchNormal (None, 16, 16, 32) 128 block_4_project[0][0]
_____________________________________________________________________________
block_4_add (Add) (None, 16, 16, 32) 0 block_3_project_BN[0][0]
block_4_project_
BN[0][0]
_____________________________________________________________________________
block_5_expand
(Conv2D) (None, 16, 16, 192) 6144 block_4_add[0][0]
_____________________________________________________________________________
block_5_expand_BN
(BatchNormali (None, 16, 16, 192) 768 block_5_expand[0][0]
_____________________________________________________________________________
block_5_expand_
relu (ReLU) (None, 16, 16, 192) 0 block_5_expand_BN[0][0]
_____________________________________________________________________________
block_5_depthwise
(DepthwiseCon (None, 16, 16, 192) 1728 block_5_expand_relu[0][0]
_____________________________________________________________________________
block_5_depthwise_
BN (BatchNorm (None, 16, 16, 192) 768 block_5_depthwise[0][0]
_____________________________________________________________________________
block_5_depthwise_
relu (ReLU) (None, 16, 16, 192) 0 block_5_depthwise_BN[0][0]
_____________________________________________________________________________
block_5_project
(Conv2D) (None, 16, 16, 32) 6144 block_5_depthwise_relu[0][0]
_____________________________________________________________________________
block_5_project_
BN (BatchNormal (None, 16, 16, 32) 128 block_5_project[0][0]
_____________________________________________________________________________
block_5_add (Add) (None, 16, 16, 32) 0 block_4_add[0][0]
block_5_project_
BN[0][0]
_____________________________________________________________________________
block_6_expand
(Conv2D) (None, 16, 16, 192) 6144 block_5_add[0][0]
_____________________________________________________________________________
block_6_expand_
BN (BatchNormali (None, 16, 16, 192) 768 block_6_expand[0][0]
_____________________________________________________________________________
block_6_expand_
relu (ReLU) (None, 16, 16, 192) 0 block_6_expand_BN[0][0]
_____________________________________________________________________________
block_6_pad
(ZeroPadding2D) (None, 17, 17, 192) 0 block_6_expand_relu[0][0]
_____________________________________________________________________________
block_6_depthwise
(DepthwiseCon (None, 8, 8, 192) 1728 block_6_pad[0][0]
_____________________________________________________________________________
block_6_depthwise_
BN (BatchNorm (None, 8, 8, 192) 768 block_6_depthwise[0][0]
_____________________________________________________________________________
block_6_depthwise_
relu (ReLU) (None, 8, 8, 192) 0 block_6_depthwise_BN[0][0]
_____________________________________________________________________________
block_6_project
(Conv2D) (None, 8, 8, 64) 12288 block_6_depthwise_relu[0][0]
_____________________________________________________________________________
block_6_project_
BN (BatchNormal (None, 8, 8, 64) 256 block_6_project[0][0]
_____________________________________________________________________________
block_7_expand
(Conv2D) (None, 8, 8, 384) 24576 block_6_project_BN[0][0]
_____________________________________________________________________________
block_7_expand_BN
(BatchNormali (None, 8, 8, 384) 1536 block_7_expand[0][0]
_____________________________________________________________________________
block_7_expand_
relu (ReLU) (None, 8, 8, 384) 0 block_7_expand_BN[0][0]
_____________________________________________________________________________
block_7_depthwise
(DepthwiseCon (None, 8, 8, 384) 3456 block_7_expand_relu[0][0]
_____________________________________________________________________________
block_7_depthwise_
BN (BatchNorm (None, 8, 8, 384) 1536 block_7_depthwise[0][0]
_____________________________________________________________________________
block_7_depthwise_
relu (ReLU) (None, 8, 8, 384) 0 block_7_depthwise_BN[0][0]
_____________________________________________________________________________
block_7_project
(Conv2D) (None, 8, 8, 64) 24576 block_7_depthwise_relu[0][0]
_____________________________________________________________________________
block_7_project_
BN (BatchNormal (None, 8, 8, 64) 256 block_7_project[0][0]
_____________________________________________________________________________
block_7_add (Add) (None, 8, 8, 64) 0 block_6_project_BN[0][0]
block_7_project_
BN[0][0]
_____________________________________________________________________________
block_8_expand
(Conv2D) (None, 8, 8, 384) 24576 block_7_add[0][0]
_____________________________________________________________________________
block_8_expand_
BN (BatchNormali (None, 8, 8, 384) 1536 block_8_expand[0][0]
_____________________________________________________________________________
block_8_expand_
relu (ReLU) (None, 8, 8, 384) 0 block_8_expand_BN[0][0]
_____________________________________________________________________________
block_8_depthwise
(DepthwiseCon (None, 8, 8, 384) 3456 block_8_expand_relu[0][0]
_____________________________________________________________________________
block_8_depthwise_
BN (BatchNorm (None, 8, 8, 384) 1536 block_8_depthwise[0][0]
_____________________________________________________________________________
block_8_depthwise_
relu (ReLU) (None, 8, 8, 384) 0 block_8_depthwise_BN[0][0]
_____________________________________________________________________________
block_8_project
(Conv2D) (None, 8, 8, 64) 24576 block_8_depthwise_relu[0][0]
_____________________________________________________________________________
block_8_project_
BN (BatchNormal (None, 8, 8, 64) 256 block_8_project[0][0]
_____________________________________________________________________________
block_8_add (Add) (None, 8, 8, 64) 0 block_7_add[0][0]
block_8_project_
BN[0][0]
_____________________________________________________________________________
block_9_expand
(Conv2D) (None, 8, 8, 384) 24576 block_8_add[0][0]
_____________________________________________________________________________
block_9_expand_
BN (BatchNormali (None, 8, 8, 384) 1536 block_9_expand[0][0]
_____________________________________________________________________________
block_9_expand_
relu (ReLU) (None, 8, 8, 384) 0 block_9_expand_BN[0][0]
_____________________________________________________________________________
block_9_depthwise
(DepthwiseCon (None, 8, 8, 384) 3456 block_9_expand_relu[0][0]
_____________________________________________________________________________
block_9_depthwise_
BN (BatchNorm (None, 8, 8, 384) 1536 block_9_depthwise[0][0]
_____________________________________________________________________________
block_9_depthwise_
relu (ReLU) (None, 8, 8, 384) 0 block_9_depthwise_BN[0][0]
_____________________________________________________________________________
block_9_project
(Conv2D) (None, 8, 8, 64) 24576 block_9_depthwise_relu[0][0]
_____________________________________________________________________________
block_9_project_
BN (BatchNormal (None, 8, 8, 64) 256 block_9_project[0][0]
_____________________________________________________________________________
block_9_add (Add) (None, 8, 8, 64) 0 block_8_add[0][0]
block_9_project_
BN[0][0]
_____________________________________________________________________________
block_10_expand
(Conv2D) (None, 8, 8, 384) 24576 block_9_add[0][0]
_____________________________________________________________________________
block_10_expand_
BN (BatchNormal (None, 8, 8, 384) 1536 block_10_expand[0][0]
_____________________________________________________________________________
block_10_expand_
relu (ReLU) (None, 8, 8, 384) 0 block_10_expand_BN[0][0]
_____________________________________________________________________________
block_10_depthwise
(DepthwiseCo (None, 8, 8, 384) 3456 block_10_expand_relu[0][0]
_____________________________________________________________________________
block_10_depthwise_
BN (BatchNor (None, 8, 8, 384) 1536 block_10_depthwise[0][0]
_____________________________________________________________________________
block_10_depthwise_
relu (ReLU) (None, 8, 8, 384) 0 block_10_depthwise_BN[0][0]
_____________________________________________________________________________
block_10_project
(Conv2D) (None, 8, 8, 96) 36864 block_10_depthwise_relu[0][0]
_____________________________________________________________________________
block_10_project_
BN (BatchNorma (None, 8, 8, 96) 384 block_10_project[0][0]
_____________________________________________________________________________
block_11_expand
(Conv2D) (None, 8, 8, 576) 55296 block_10_project_BN[0][0]
_____________________________________________________________________________
block_11_expand_
BN (BatchNormal (None, 8, 8, 576) 2304 block_11_expand[0][0]
_____________________________________________________________________________
block_11_expand_
relu (ReLU) (None, 8, 8, 576) 0 block_11_expand_BN[0][0]
_____________________________________________________________________________
block_11_depthwise
(DepthwiseCo (None, 8, 8, 576) 5184 block_11_expand_relu[0][0]
_____________________________________________________________________________
block_11_depthwise_
BN (BatchNor (None, 8, 8, 576) 2304 block_11_depthwise[0][0]
_____________________________________________________________________________
block_11_depthwise_
relu (ReLU) (None, 8, 8, 576) 0 block_11_depthwise_BN[0][0]
_____________________________________________________________________________
block_11_project
(Conv2D) (None, 8, 8, 96) 55296 block_11_depthwise_relu[0][0]
_____________________________________________________________________________
block_11_project_
BN (BatchNorma (None, 8, 8, 96) 384 block_11_project[0][0]
_____________________________________________________________________________
block_11_add (Add) (None, 8, 8, 96) 0 block_10_project_BN[0][0]
block_11_project_
BN[0][0]
_____________________________________________________________________________
block_12_expand
(Conv2D) (None, 8, 8, 576) 55296 block_11_add[0][0]
_____________________________________________________________________________
block_12_expand_
BN (BatchNormal (None, 8, 8, 576) 2304 block_12_expand[0][0]
_____________________________________________________________________________
block_12_expand_
relu (ReLU) (None, 8, 8, 576) 0 block_12_expand_BN[0][0]
_____________________________________________________________________________
block_12_depthwise
(DepthwiseCo (None, 8, 8, 576) 5184 block_12_expand_relu[0][0]
_____________________________________________________________________________
block_12_depthwise_
BN (BatchNor (None, 8, 8, 576) 2304 block_12_depthwise[0][0]
_____________________________________________________________________________
block_12_depthwise_
relu (ReLU) (None, 8, 8, 576) 0 block_12_depthwise_BN[0][0]
_____________________________________________________________________________
block_12_project
(Conv2D) (None, 8, 8, 96) 55296 block_12_depthwise_relu[0][0]
_____________________________________________________________________________
block_12_project_
BN (BatchNorma (None, 8, 8, 96) 384 block_12_project[0][0]
_____________________________________________________________________________
block_12_add (Add) (None, 8, 8, 96) 0 block_11_add[0][0]
block_12_project_
BN[0][0]
_____________________________________________________________________________
block_13_expand
(Conv2D) (None, 8, 8, 576) 55296 block_12_add[0][0]
_____________________________________________________________________________
block_13_expand_BN
(BatchNormal (None, 8, 8, 576) 2304 block_13_expand[0][0]
_____________________________________________________________________________
block_13_expand_
relu (ReLU) (None, 8, 8, 576) 0 block_13_expand_BN[0][0]
_____________________________________________________________________________
block_13_pad
(ZeroPadding2D) (None, 9, 9, 576) 0 block_13_expand_relu[0][0]
_____________________________________________________________________________
block_13_depthwise
(DepthwiseCo (None, 4, 4, 576) 5184 block_13_pad[0][0]
_____________________________________________________________________________
block_13_depthwise_
BN (BatchNor (None, 4, 4, 576) 2304 block_13_depthwise[0][0]
_____________________________________________________________________________
block_13_depthwise_
relu (ReLU) (None, 4, 4, 576) 0 block_13_depthwise_BN[0][0]
_____________________________________________________________________________
block_13_project
(Conv2D) (None, 4, 4, 160) 92160 block_13_depthwise_relu[0][0]
_____________________________________________________________________________
block_13_project_
BN (BatchNorma (None, 4, 4, 160) 640 block_13_project[0][0]
_____________________________________________________________________________
block_14_expand
(Conv2D) (None, 4, 4, 960) 153600 block_13_project_BN[0][0]
_____________________________________________________________________________
block_14_expand_
BN (BatchNormal (None, 4, 4, 960) 3840 block_14_expand[0][0]
_____________________________________________________________________________
block_14_expand_
relu (ReLU) (None, 4, 4, 960) 0 block_14_expand_BN[0][0]
_____________________________________________________________________________
block_14_depthwise
(DepthwiseCo (None, 4, 4, 960) 8640 block_14_expand_relu[0][0]
_____________________________________________________________________________
block_14_depthwise_
BN (BatchNor (None, 4, 4, 960) 3840 block_14_depthwise[0][0]
_____________________________________________________________________________
block_14_depthwise_
relu (ReLU) (None, 4, 4, 960) 0 block_14_depthwise_BN[0][0]
_____________________________________________________________________________
block_14_project
(Conv2D) (None, 4, 4, 160) 153600 block_14_depthwise_relu[0][0]
_____________________________________________________________________________
block_14_project_
BN (BatchNorma (None, 4, 4, 160) 640 block_14_project[0][0]
_____________________________________________________________________________
block_14_add (Add) (None, 4, 4, 160) 0 block_13_project_BN[0][0]
block_14_project_
BN[0][0]
_____________________________________________________________________________
block_15_expand
(Conv2D) (None, 4, 4, 960) 153600 block_14_add[0][0]
_____________________________________________________________________________
block_15_expand_
BN (BatchNormal (None, 4, 4, 960) 3840 block_15_expand[0][0]
_____________________________________________________________________________
block_15_expand_
relu (ReLU) (None, 4, 4, 960) 0 block_15_expand_BN[0][0]
_____________________________________________________________________________
block_15_depthwise
(DepthwiseCo (None, 4, 4, 960) 8640 block_15_expand_relu[0][0]
_____________________________________________________________________________
block_15_depthwise_
BN (BatchNor (None, 4, 4, 960) 3840 block_15_depthwise[0][0]
_____________________________________________________________________________
block_15_depthwise_
relu (ReLU) (None, 4, 4, 960) 0 block_15_depthwise_BN[0][0]
_____________________________________________________________________________
block_15_project
(Conv2D) (None, 4, 4, 160) 153600 block_15_depthwise_relu[0][0]
_____________________________________________________________________________
block_15_project_
BN (BatchNorma (None, 4, 4, 160) 640 block_15_project[0][0]
_____________________________________________________________________________
block_15_add (Add) (None, 4, 4, 160) 0 block_14_add[0][0]
block_15_project_
BN[0][0]
_____________________________________________________________________________
block_16_expand
(Conv2D) (None, 4, 4, 960) 153600 block_15_add[0][0]
_____________________________________________________________________________
block_16_expand_
BN (BatchNormal (None, 4, 4, 960) 3840 block_16_expand[0][0]
_____________________________________________________________________________
block_16_expand_
relu (ReLU) (None, 4, 4, 960) 0 block_16_expand_BN[0][0]
_____________________________________________________________________________
block_16_depthwise
(DepthwiseCo (None, 4, 4, 960) 8640 block_16_expand_relu[0][0]
_____________________________________________________________________________
block_16_depthwise_
BN (BatchNor (None, 4, 4, 960) 3840 block_16_depthwise[0][0]
_____________________________________________________________________________
block_16_depthwise_
relu (ReLU) (None, 4, 4, 960) 0 block_16_depthwise_BN[0][0]
_____________________________________________________________________________
block_16_project
(Conv2D) (None, 4, 4, 320) 307200 block_16_depthwise_relu[0][0]
_____________________________________________________________________________
block_16_project_
BN (BatchNorma (None, 4, 4, 320) 1280 block_16_project[0][0]
_____________________________________________________________________________
Conv_1 (Conv2D) (None, 4, 4, 1280) 409600 block_16_project_BN[0][0]
_____________________________________________________________________________
Conv_1_bn
(BatchNormalization) (None, 4, 4, 1280) 5120 Conv_1[0][0]
_____________________________________________________________________________
out_relu (ReLU) (None, 4, 4, 1280) 0 Conv_1_bn[0][0]
=============================================================================
Total params: 2,257,984
Trainable params: 412,800
Non-trainable params: 1,845,184
EPOCHS = 25
VAL_SUBSPLITS = 5
VALIDATION_STEPS = info.splits['test'].num_examples // BATCH_SIZE // VAL_SUBSPLITS
model_history = model.fit(train_dataset,
                          epochs=EPOCHS,
                          steps_per_epoch=STEPS_PER_EPOCH,
                          validation_steps=VALIDATION_STEPS,
                          validation_data=test_dataset,
                          callbacks=[DisplayCallback()])
For the predicted mask after 3,000 iterations, we get a better output (Figure 6.20).
import numpy as np
loss = model_history.history['loss']
val_loss = model_history.history['val_loss']
n = np.shape(loss)
print(n)
m = (np.ones(n) - loss)
epochs = range(EPOCHS)
plt.figure()
plt.plot(epochs, m, 'g', label='accuracy')
plt.title('accuracy')
plt.xlabel('Epoch')
plt.ylabel('accuracy Value')
plt.ylim([0, 1])
plt.legend()
plt.show()
FIGURE 6.19 The predicted mask using CNN after one iteration.
FIGURE 6.20 The predicted mask using CNN after 3000 iterations.
6.7.1 Import Libraries
Here, we use Python, NumPy, matplotlib, and TensorFlow for
computation, visualization, and network modeling.
lfw_url = "http://vis-www.cs.umass.edu/lfw/lfw-deepfunneled.tgz"
lfw_path = "lfw-deepfunneled.tgz"
keras.utils.get_file(lfw_path, lfw_url, cache_dir=".", cache_subdir="")
print("Extracting images: ", end="")
data = np.float32([image for image in
                   progress(read_images(lfw_path, size_x=36, size_y=36), every=200)])
CODE_SIZE = 256

def generator():
    model = keras.Sequential()
    model.add(layers.Input(shape=(CODE_SIZE,), name='code'))
    model.add(layers.Dense(6*6*32, activation='relu'))
    model.add(layers.Reshape((6, 6, 32)))
    model.add(layers.Conv2DTranspose(128, kernel_size=5, activation='relu'))
    model.add(layers.Conv2DTranspose(128, kernel_size=3, activation='relu'))
    model.add(layers.Conv2DTranspose(64, kernel_size=3, activation='relu'))
    model.add(layers.UpSampling2D())
    model.add(layers.Conv2DTranspose(64, kernel_size=3, activation='relu'))
    model.add(layers.Conv2DTranspose(32, kernel_size=3, activation='relu'))
    model.add(layers.Conv2DTranspose(32, kernel_size=3, activation='relu'))
    model.add(layers.Conv2DTranspose(3, kernel_size=3))
    return model
def discriminator():
    model = keras.Sequential()
    model.add(layers.Input(shape=IMAGE_SHAPE, name="image"))
    model.add(layers.Conv2D(32, kernel_size=3, activation='elu'))
    model.add(layers.Conv2D(32, kernel_size=5, activation='elu'))
    model.add(layers.Conv2D(64, kernel_size=3, activation='elu'))
    model.add(layers.MaxPool2D())
    model.add(layers.Conv2D(64, kernel_size=3, activation='elu'))
    model.add(layers.Conv2D(128, kernel_size=5, activation='elu'))
    model.add(layers.Conv2D(128, kernel_size=3, activation='elu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation='tanh',
                           kernel_regularizer=regularizers.l2()))
    model.add(layers.Dense(1, activation='sigmoid'))
    return model
Now, you can call the generator and check the model summary.
keras.backend.clear_session()
gen = generator()
gen.summary()
print("Inputs :", gen.inputs)
print("Outputs:", gen.outputs)
The model summary here shows you the layers and the parameters
(trainable and non-trainable) and the total parameters of the network.
Model: "sequential"
______________________________________________________
Layer (type) Output Shape Param #
======================================================
dense (Dense) (None, 1152) 296064
______________________________________________________
reshape (Reshape) (None, 6, 6, 32) 0
______________________________________________________
conv2d_transpose
(Conv2DTran (None, 10, 10, 128) 102528
______________________________________________________
conv2d_transpose_1
(Conv2DTr (None, 12, 12, 128) 147584
______________________________________________________
conv2d_transpose_2
(Conv2DTr (None, 14, 14, 64) 73792
______________________________________________________
up_sampling2d
(UpSampling2D) (None, 28, 28, 64) 0
______________________________________________________
conv2d_transpose_3
(Conv2DTr (None, 30, 30, 64) 36928
______________________________________________________
conv2d_transpose_4
(Conv2DTr (None, 32, 32, 32) 18464
______________________________________________________
conv2d_transpose_5
(Conv2DTr (None, 34, 34, 32) 9248
______________________________________________________
conv2d_transpose_6
(Conv2DTr (None, 36, 36, 3) 867
======================================================
Total params: 685,475
Trainable params: 685,475
Non-trainable params: 0
Now call the discriminator function and then check the model
summary.
disc = discriminator()
disc.summary()
print("Inputs :", disc.inputs)
print("Outputs:", disc.outputs)
manager = tf.train.CheckpointManager(ckpt, directory="./checkpoints",
                                     max_to_keep=10)
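If training is interrupted, the latest saved checkpoint can be restored from the same manager, for example:

ckpt.restore(manager.latest_checkpoint)
if manager.latest_checkpoint:
    print("Restored from", manager.latest_checkpoint)
else:
    print("Training from scratch.")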
6.7.5 Generate Images
We can generate images in this step. Figure 6.21 shows examples of images
that the network has generated. More epochs generate better results.
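The training loop below relies on two sampling helpers defined elsewhere in the original listing; a minimal sketch of what they could look like (the standard-normal latent codes are our assumption) is:

import numpy as np

def sample_codes(n):
    # random latent codes for the generator (a standard normal is assumed)
    return np.random.normal(size=(n, CODE_SIZE)).astype('float32')

def sample_images(n):
    # a random batch of the real LFW face images loaded earlier
    idx = np.random.choice(len(data), size=n, replace=False)
    return data[idx]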
for _ in progress(range(30000)):
    codes = sample_codes(100)
    images = sample_images(100)
    for n in range(5):
        disc_opt.minimize(lambda: disc_loss(images, codes),
                          disc.trainable_weights)
    gen_opt.minimize(lambda: gen_loss(codes), gen.trainable_weights)
    if epoch.numpy() % 100 == 0:
        display.clear_output(wait=True)
        print("Epoch:", epoch.numpy())
        plot_images(2, 3)
        plot_probas(1000)
        manager.save()
    epoch.assign_add(1)
Figure 7.1 shows the general architecture of the project. The virtual robot gets audio and visual data through the camera and microphone and interacts with the user.
7.2.2 Face Detection
In the first step, the face should be detected in the scene by a face detection algorithm. One of the classic face detection algorithms is Viola-Jones, which uses Haar-like features (a set of black-and-white rectangular filters that slide over the image to find the segments most similar to the black or white parts). This section presents the steps and code samples for implementing face detection using a CNN model (Figure 7.3).
7.2.2.1 Import Libraries
Import some libraries to ease the computation and plotting. The main libraries are TensorFlow, matplotlib, and NumPy, which help in creating the network, plotting, and computing.
import tensorflow as tf
import matplotlib.pyplot as plt
import numpy as np
import functools
tf.enable_eager_execution()   # only needed in TensorFlow 1.x; eager mode is the default in 2.x
7.2.2.2 Dataset
In this step, choose a dataset (there are many datasets for this purpose) to create training, validation, and testing data. Then, you can use the Keras utilities to import the data and create the data segments.
import keras
training_data = tf.keras.utils.get_file(…)
Here is the code for defining a CNN model using TensorFlow and Keras:
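The book's exact architecture is not reproduced here, so the following is a minimal sketch of a binary face/no-face classifier that would fit the training loop below: it exposes trainable variables and returns logits, since the loss applies the sigmoid. The 64 x 64 x 3 input shape and the filter counts are assumptions for illustration.

import tensorflow as tf

n_filters = 12   # base number of filters (an assumption)

def make_standard_classifier(n_outputs=1):
    model = tf.keras.Sequential([
        # 64x64 RGB input is an assumption; adjust it to your dataset
        tf.keras.layers.Conv2D(n_filters, 5, strides=2, activation='relu',
                               input_shape=(64, 64, 3)),
        tf.keras.layers.Conv2D(2 * n_filters, 5, strides=2, activation='relu'),
        tf.keras.layers.Conv2D(4 * n_filters, 3, strides=2, activation='relu'),
        tf.keras.layers.Conv2D(6 * n_filters, 3, strides=2, activation='relu'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dense(n_outputs)   # logits; the sigmoid is applied in the loss
    ])
    return model

standard_classifier = make_standard_classifier()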
7.2.2.4 Model Training
Train the model by defining some parameters like the batch size, number of epochs, and learning rate. After loading the training images, convert them to tensors and then train as follows:
batch_size = 24
num_epochs = 100
learning_rate = 1e-2

# optimizer definition
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate)
# the loss definition
loss_history = util.LossHistory(smoothing_factor=0.99)

# the training loop!
for epoch in range(num_epochs):
    custom_msg = util.custom_progress_text("Epoch: %(epoch).0f Loss: %(loss)2.2f")
    bar = util.create_progress_bar(custom_msg)
    for idx in bar(range(loader.get_train_size() // batch_size)):
        # convert the images to tensors
        x, y = loader.get_batch(batch_size)
        x = tf.convert_to_tensor(x, dtype=tf.float32)
        y = tf.convert_to_tensor(y, dtype=tf.float32)
        with tf.GradientTape() as tape:
            logits = standard_classifier(x)
            # compute the loss
            loss_value = tf.nn.sigmoid_cross_entropy_with_logits(labels=y,
                                                                 logits=logits)
        custom_msg.update_mapping(epoch=epoch, loss=loss_value.numpy().mean())
        # backpropagation
        grads = tape.gradient(loss_value, standard_classifier.variables)
        optimizer.apply_gradients(zip(grads, standard_classifier.variables),
                                  global_step=tf.train.get_or_create_global_step())
        loss_history.append(loss_value.numpy().mean())
return loss_history.get()   # if this loop is wrapped inside a training function
7.2.2.5 Evaluate Performance
Evaluate the performance of the model by using validation data.
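A hedged sketch of this check, assuming val_images and val_labels are prepared the same way as the training batches (with the labels shaped like the logits):

val_logits = standard_classifier(tf.convert_to_tensor(val_images, dtype=tf.float32))
val_probs = tf.nn.sigmoid(val_logits)
val_preds = tf.cast(val_probs > 0.5, tf.float32)
val_acc = tf.reduce_mean(tf.cast(tf.equal(val_preds, val_labels), tf.float32))
print("Validation accuracy:", val_acc.numpy())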
7.2.3 Landmark Detection
In this step, the algorithm finds the face landmarks/poses. These points are needed for face alignment (here, we have already collected the data and detected the face) (Figure 7.4).
7.2.3.1 CNN Model
There are different CNN methods for finding the landmarks. VGG-16 is one of the early CNN architectures, and here we use a VGG-style stack of convolution and pooling layers to find the landmarks.
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
# the (96, 96, 1) input shape matches the model summary below
model.add(Conv2D(filters=16, kernel_size=3, activation='relu',
                 input_shape=(96, 96, 1)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=32, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=64, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=128, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(filters=256, kernel_size=3, activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(Dense(30))   # 30 outputs = 15 (x, y) landmark coordinates
model.summary()
The output is the model summary, which shows the layers and their parameters.
Layer (type)                        Output Shape          Param #
==============================================================
conv2d_70 (Conv2D)                  (None, 94, 94, 16)    160
max_pooling2d_62 (MaxPooling2D)     (None, 47, 47, 16)    0
conv2d_71 (Conv2D)                  (None, 45, 45, 32)    4640
max_pooling2d_63 (MaxPooling2D)     (None, 22, 22, 32)    0
conv2d_72 (Conv2D)                  (None, 20, 20, 64)    18496
max_pooling2d_64 (MaxPooling2D)     (None, 10, 10, 64)    0
conv2d_73 (Conv2D)                  (None, 8, 8, 128)     73856
max_pooling2d_65 (MaxPooling2D)     (None, 4, 4, 128)     0
conv2d_74 (Conv2D)                  (None, 2, 2, 256)     295168
max_pooling2d_66 (MaxPooling2D)     (None, 1, 1, 256)     0
flatten_5 (Flatten)                 (None, 256)           0
dense_10 (Dense)                    (None, 512)           131584
dropout_5 (Dropout)                 (None, 512)           0
dense_11 (Dense)                    (None, 30)            15390
==============================================================
Total params: 539,294
Trainable params: 539,294
Non-trainable params: 0
7.2.3.2 Model Training
Now compile and train the model:
epochs = 10
batch_size = 32
filepath = 'model_weights.ckpt'
checkpointer = ModelCheckpoint(filepath, verbose=1, save_best_only=True,
                               period=5)
# compile the model
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
history = model.fit(train_input, train_output, validation_split=0.2,
                    callbacks=[checkpointer, hist], batch_size=batch_size,
                    epochs=epochs, verbose=1)
model.save('my_model.h5')
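As a short usage sketch (not from the book), you can reload the saved model and predict the 30 landmark values, which correspond to 15 (x, y) coordinate pairs, for the test images:

from keras.models import load_model

best_model = load_model('my_model.h5')
pred_landmarks = best_model.predict(test_input)   # shape: (num_images, 30)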
7.2.4 Spoof Detection
Anti-spoofing is a very important step in facial recognition. For example, people can use a photo of a person to access his/her account, and this is a real problem for real-world systems that deploy facial recognition for security. There are different methods for this purpose. One of them is using infrared (IR) sensors. You can also use deep learning; for example, you can deploy the VGG-style architecture from the previous section. Depending on the data you have, you can train the live-image detector in a supervised way (when you have labeled live and fake images) or an unsupervised way (when you do not have enough labeled data for training). In real-time problems, using IR gives you a real-time response (Figure 7.5).
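As one possible illustration, a supervised live/spoof detector can reuse the same style of convolution and pooling stack with a single sigmoid output; the layer sizes and the 96 x 96 x 1 input shape in this sketch are assumptions, not the book's exact model.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

spoof_model = Sequential()
spoof_model.add(Conv2D(16, 3, activation='relu', input_shape=(96, 96, 1)))
spoof_model.add(MaxPooling2D(2))
spoof_model.add(Conv2D(32, 3, activation='relu'))
spoof_model.add(MaxPooling2D(2))
spoof_model.add(Conv2D(64, 3, activation='relu'))
spoof_model.add(MaxPooling2D(2))
spoof_model.add(Flatten())
spoof_model.add(Dense(128, activation='relu'))
spoof_model.add(Dense(1, activation='sigmoid'))   # 1 = live, 0 = spoof
spoof_model.compile(optimizer='adam', loss='binary_crossentropy',
                    metrics=['accuracy'])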
7.2.6 Training
After extracting the features in the training step, a loss function such as the Euclidean distance or the angular distance is used to obtain the final trained model. The most popular loss function is the Euclidean distance (Figure 7.7).
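For illustration, a Euclidean-distance (contrastive-style) loss on pairs of face encodings could be sketched as follows, where y = 1 for a matching pair and y = 0 otherwise; the margin value is an assumption.

import tensorflow as tf

def contrastive_loss(emb_a, emb_b, y, margin=1.0):
    # Euclidean distance between the two encodings
    d = tf.sqrt(tf.reduce_sum(tf.square(emb_a - emb_b), axis=1) + 1e-9)
    same = y * tf.square(d)                                    # pull matching pairs together
    diff = (1.0 - y) * tf.square(tf.maximum(margin - d, 0.0))  # push non-matching pairs apart
    return tf.reduce_mean(same + diff)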
7.2.7 Testing
After feature extraction on the test images, the algorithm uses threshold comparison and metric learning to do the face matching. These techniques compare the encoding vectors from the training database with the encoding vectors of the faces in the test database to recognize the person. The accuracy can be calculated with different metrics (discussed earlier) (Figure 7.8).
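A minimal sketch of this threshold comparison, assuming the training encodings and names are already stored, might look like this (the 0.6 threshold is an assumption):

import numpy as np

def match_face(test_encoding, known_encodings, known_names, threshold=0.6):
    # Euclidean distance from the test encoding to every stored encoding
    distances = np.linalg.norm(known_encodings - test_encoding, axis=1)
    best = np.argmin(distances)
    return known_names[best] if distances[best] < threshold else "unknown"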
7.3 EMOTION RECOGNITION
These are the main steps for building the emotion recognition module:
1. dataset collection,
2. data preprocessing,
3. model training,
4. model evaluation, and
5. testing the trained model.
7.3.1 Dataset Collection
There are several verified datasets, like RAVDESS (an audiovisual dataset). However, if you plan to run a specific project, it is better to collect your own dataset. Depending on the output and the features you plan to classify (your data should contain those features), you can decide on the data collection type; for example, criteria like 3D data or indoor versus outdoor data for images. This step extracts the data values. For instance, in the RAVDESS dataset (Figure 7.9), the values are path, actor, gender, intensity, repetition, and emotion state. Here the basic emotion states are happiness, sadness, disgust, anger, calm, fear, surprise, and neutral.
# 'path' is a placeholder for the dataset directory; data_df and count are
# assumed to be initialized earlier
for i in dir_list1:
    file_list = os.listdir('path' + i)
    for f in file_list:
        nm = f.split('.')[0].split('-')
        path = 'path' + i + '/' + f
        src = int(nm[1])
        emotion = int(nm[2])
        actor = int(nm[6])          # the actor id is needed for the gender check
        if int(actor) % 2 == 0:
            gender = "female"
        else:
            gender = "male"
        if nm[3] == '01':
            intensity = 0
        else:
            intensity = 1
        if nm[4] == '01':
            statement = 0
        else:
            statement = 1
        if nm[5] == '01':
            repeat = 0
        else:
            repeat = 1
        data_df.loc[count] = [path, src, actor, gender, intensity,
                              statement, repeat, emotion]
        count += 1
7.3.2 Data Preprocessing
In this step, do some preprocessing on the data to make it ready for training. This may include data cleaning, feature extraction, labeling, and data augmentation.
7.3.2.1 Labeling
If the data come from existing databases, they are usually already annotated; if you plan to collect your own data, you should label them yourself. The labeling depends on the type of the problem and the features you would like to classify. For example, in this project, the labels are male or female and positive or negative, and there are eight different emotion states:
label_list = []
for i in range(len(data_df)):
    if data_df.emotion[i] == 1:
        lb = "_neutral"
    elif data_df.emotion[i] == 2:
        lb = "_calm"
    elif data_df.emotion[i] == 3:
        lb = "_happy"
    elif data_df.emotion[i] == 4:
        lb = "_sad"
    elif data_df.emotion[i] == 5:
        lb = "_angry"
    elif data_df.emotion[i] == 6:
        lb = "_fearful"
    elif data_df.emotion[i] == 7:
        lb = "_disgust"
    elif data_df.emotion[i] == 8:
        lb = "_surprised"
    else:
        lb = "_none"
    label_list.append(data_df.gender[i] + lb)
7.3.3 Feature Extraction
For training, extract some audio features that distinguish the sounds from one another. The raw audio data are in the time domain, and with transforms like the FFT, you can move them to the frequency domain. Audio signals have features like frequency (frequency domain), the mel-frequency cepstrum (MFC), and mel-frequency cepstral coefficients (MFCCs, which describe the texture of the sound) that you can extract and use as a feature vector for training. Standard libraries like librosa help you do the signal analysis, such as loading the data and finding the sample rate:
data = pd.DataFrame(columns=['feature'])
for i in tqdm(range(len(data_df))):
    X, sample_rate = librosa.load(data_df.path[i], res_type='kaiser_fast',
                                  duration=input_duration, sr=22050 * 2,
                                  offset=0.5)
    sample_rate = np.array(sample_rate)
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13),
                    axis=0)
    feature = mfccs
    data.loc[i] = [feature]
7.3.3.1 Data Augmentation
In deep learning, you need a large amount of data to train the model (there are always some limitations on data). You can use data augmentation techniques to generate new data and increase the dataset size. For audio signals, you can add noise and shift time using NumPy, and change pitch and speed using librosa.
def noise(data):
    noise_amp = 0.005 * np.random.uniform() * np.amax(data)
    data = data.astype('float64') + noise_amp * np.random.normal(size=data.shape[0])
    return data

def shift(data):
    s_range = int(np.random.uniform(low=-5, high=5) * 500)
    return np.roll(data, s_range)

def speedNpitch(data):
    length_change = np.random.uniform(low=0.8, high=1)
    speed_fac = 1.0 / length_change
    tmp = np.interp(np.arange(0, len(data), speed_fac),
                    np.arange(0, len(data)), data)
    minlen = min(data.shape[0], tmp.shape[0])
    data *= 0
    data[0:minlen] = tmp[0:minlen]
    return data
7.3.4 Model Training
In this step, define the model and the optimizer for the CNN model. Then you can train the model with part of the dataset (the training data, about 70 to 80 percent of the original dataset):
model = Sequential()
model.add(Conv1D(256, 5, padding='same', input_shape=(X_train.shape[1], 1)))
model.add(Activation('relu'))
model.add(Conv1D(256, 5, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(MaxPooling1D(pool_size=8))
model.add(Conv1D(128, 5, padding='same'))
model.add(Activation('relu'))
model.add(Conv1D(128, 5, padding='same'))
model.add(Activation('relu'))
model.add(Conv1D(128, 5, padding='same'))
model.add(Activation('relu'))
model.add(Conv1D(128, 5, padding='same'))
model.add(BatchNormalization())
model.add(Activation('relu'))
model.add(Dropout(0.25))
model.add(MaxPooling1D(pool_size=8))
model.add(Conv1D(64, 5, padding='same'))
model.add(Activation('relu'))
model.add(Conv1D(64, 5, padding='same'))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dense(14))
model.add(Activation('softmax'))
opt = keras.optimizers.SGD(lr=0.0001, momentum=0.0, decay=0.0, nesterov=False)
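The compile and fit calls are not shown above; a hedged sketch, assuming X_train, y_train, X_test, and y_test are the prepared splits (with one-hot labels), is:

model.compile(loss='categorical_crossentropy', optimizer=opt,
              metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=16, epochs=50,
                    validation_data=(X_test, y_test))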
7.3.5 Model Evaluation
Now, you can validate and evaluate your model using the validation and test datasets. You can change different parameters and hyperparameters in the previous steps to achieve the desired accuracy, optimize the model, and move to the next step.
data_test = pd.DataFrame(columns=['feature'])
for i in tqdm(range(len(data_df))):
    X, sample_rate = librosa.load(data_df.path[i], res_type='kaiser_fast',
                                  duration=input_duration, sr=22050 * 2,
                                  offset=0.5)
    sample_rate = np.array(sample_rate)
    mfccs = np.mean(librosa.feature.mfcc(y=X, sr=sample_rate, n_mfcc=13),
                    axis=0)
    feature = mfccs
    data_test.loc[i] = [feature]

test_valid = pd.DataFrame(data_test['feature'].values.tolist())
test_valid = np.array(test_valid)
test_valid_lb = np.array(data_df.label)
lb = LabelEncoder()
test_valid_lb = np_utils.to_categorical(lb.fit_transform(test_valid_lb))
test_valid = np.expand_dims(test_valid, axis=2)
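A hedged sketch of the evaluation itself, reusing the model, label encoder, and test tensors prepared above:

preds = model.predict(test_valid, batch_size=16)
pred_labels = lb.inverse_transform(preds.argmax(axis=1))
score = model.evaluate(test_valid, test_valid_lb, verbose=0)
print("Test loss:", score[0], "Test accuracy:", score[1])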
7.4 SPEECH TO TEXT
You can do speech-to-text in a few steps by using different deep learning models. Here the model input is raw audio data, and the output is the transcription of what the person said. In this section, some parts like loading or splitting the data have been skipped. There are several datasets that you can use for this purpose; for example, LibriSpeech, which contains 1000 hours of speech derived from audiobooks, is one choice. You can define three main steps here: feature extraction, an acoustic model (for example, an RNN that maps the audio features to character probabilities), and a decoder.
7.4.1 Feature Extraction
Do some standard processing on the speech to extract the features. Here, we extract two features: spectrograms and mel-frequency cepstral coefficients (MFCCs); there are several references if you would like to learn more about them. The spectrogram function's output is a 2D tensor where the first dimension is time and the second dimension is the frequency values. The MFCC has the same concept as the spectrogram, but its feature vector has a lower dimension. Several libraries like librosa (which we used in the previous section) can be used to extract these features.
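A minimal sketch of extracting both features with librosa; the file name, FFT size, hop length, and number of coefficients are assumptions:

import librosa
import numpy as np

signal, sr = librosa.load('speech.wav', sr=22050)

# spectrogram: magnitude of the STFT, transposed to (time, frequency)
spectrogram = np.abs(librosa.stft(signal, n_fft=512, hop_length=160)).T

# MFCCs: a lower-dimensional (time, n_mfcc) description of the same signal
mfccs = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13).T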
7.4.3 Decoder
In this step, the encoded data should be converted back to the original text format that was encoded in the first step. Here is sample code for the RNN model using Keras.
from keras import backend as K
from keras.models import Model
from keras.layers import (BatchNormalization, Conv1D, Dense, Input,
                          TimeDistributed, Activation, Bidirectional,
                          SimpleRNN, GRU, LSTM)
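A minimal sketch of an acoustic model built from these imported layers; the input feature size (161 spectrogram bins), the number of GRU units, and the 29-character output alphabet are assumptions, not the book's exact model:

def rnn_model(input_dim=161, units=200, output_dim=29):
    input_data = Input(name='the_input', shape=(None, input_dim))
    x = Bidirectional(GRU(units, return_sequences=True))(input_data)
    x = BatchNormalization()(x)
    x = TimeDistributed(Dense(output_dim))(x)
    y_pred = Activation('softmax', name='softmax')(x)
    return Model(inputs=input_data, outputs=y_pred)

model = rnn_model()
model.summary()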
7.4.4 Predictions Calculation
You can write a function to decode the predictions of your acoustic model.
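A hedged sketch of such a function, using greedy (CTC-style) decoding; the int_to_char lookup table and the blank index are assumptions:

import numpy as np

def decode_predictions(audio_features, model, int_to_char, blank_index=28):
    # (time, characters) probabilities for one utterance
    probs = model.predict(np.expand_dims(audio_features, axis=0))[0]
    best_path = np.argmax(probs, axis=1)
    chars, prev = [], None
    for idx in best_path:
        if idx != prev and idx != blank_index:   # collapse repeats, drop blanks
            chars.append(int_to_char[idx])
        prev = idx
    return ''.join(chars)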
7.5 SENTIMENT ANALYSIS
There are many applications for sentiment analysis. For example, suppose a company created a product, and they would like to know about the user feedback to improve it. One method is to check customer feedback through social media comments, for example on Twitter and Facebook. If there are thousands of words of customer feedback, sentiment analysis helps get a general opinion about whether the reaction to a product is positive or negative. To do sentiment analysis, you should know about text processing. Text is another type of data (and it is the output of the previous step). Here, we use these text data (extracted in the previous step) for sentiment analysis to make a decision or generate a report. You should know two definitions: a) bag of words: a representation of a piece of text as the collection of its words, ignoring order, and b) word embedding: the process of converting a word to a vector. If you only have a bag of words, simple methods like an MLP are not well suited for this purpose. You can use a deep neural network to convert a sequence of words to a vector (encoding) and then convert the vector to a sequence in the desired format (decoding). After converting each word to a vector, there is a matrix (each row in this matrix is a vector that represents a word) for the set of words. The algorithm then estimates the probability of each word's occurrence together with the words around it. Then, by using softmax, it converts each word's values and probabilities into a single probability. Words that appear in the same contexts get similar vectors; for example, the words kitchen and oven are more similar than kitchen and history.
There are several methods for this purpose, and one of them is the deep averaging network (DAN). This network computes word embeddings for all words in the sentence, averages these vectors (encoding), and then feeds the average into the network. The network's output is a scalar (decoding) that determines whether the sentence is positive or negative. In a DAN, we first get a sequence of words, convert these words to vectors, find the average of these vectors, pass this average vector to the network, and train using a method like Stochastic Gradient Descent (SGD). However, DAN has a limitation: it does not take the order of the sequence into account. For example, the sentence "The fire destroyed the houses" and the sentence "The houses destroyed the fire" produce the same averaged representation, although their meanings are different. For such cases, we can use a recurrent neural network.
These are the steps for sentiment analysis using DAN:
1. load dataset,
2. create DAN network,
3. train the network, and
4. evaluate model.
7.5.1 Load Dataset
In this step, load the dataset, split the data into training and testing categories, and label them as positive or negative. Create a dictionary of the words for the word embedding process; you need a function to convert the words to vectors. The train and test reviews can be stored as data frames and the vocabulary as a dictionary with the word as the key and its index as the value. The vocabulary is used to build the bag-of-words features.
It means the sentence has one "deep", two "learning", one "in", and one "practice".
There are two main parts in the DAN implementation:
1. The average function takes tensors as inputs, each of which is a bag-of-words representation of a review, and its output is a tensor for the averaged review.
2. The forward function takes a tensor with the bag-of-words representation of the reviews, calls the average function to find the averaged review, and sends it through the layers to produce the model output.
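A minimal sketch of these two parts, written with NumPy and Keras; the vocabulary size, embedding dimension, layer sizes, and the randomly initialized embedding matrix are assumptions (in practice the embeddings would be learned or pretrained):

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

vocab_size, embedding_dim = 10000, 64
# placeholder embedding matrix; in practice it is learned or pretrained
embedding_matrix = np.random.randn(vocab_size, embedding_dim).astype('float32')

def average(bows):
    # weighted average of word embeddings for each bag-of-words review
    counts = bows.sum(axis=1, keepdims=True) + 1e-9
    return (bows @ embedding_matrix) / counts

def forward(bows, model):
    # feed the averaged review vectors through the feed-forward layers
    return model(average(bows))

dan = keras.Sequential([
    layers.Dense(64, activation='relu', input_shape=(embedding_dim,)),
    layers.Dense(1, activation='sigmoid')   # positive/negative score
])
dan.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

You could then train it with dan.fit(average(train_bows), train_labels) using SGD, as described above.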
8.1 DATA PREPROCESSING
The goal of this chapter is to learn how to find the best-trained model with
the minimum error. In the first step, let us review the data categories that
we need in our projects. In total, we can divide the original data into three
categories:
1. training data,
2. validation data, and
3. test data.
We have discussed these three categories before, and you know their definitions. The key point here is the percentage assigned to each category; common choices are numbers like 70% for training, 15% for validation, and 15% for testing. If the model tested on the test data reaches the desired accuracy, we can use it with new real-world data. These ratios are not fixed, and you can change them and check the results to find the best percentages. Figure 8.1 shows these three segments.
After the training step and finding the first model using the training data, you can use the validation data to check whether the trained model is good enough for your goal. If it is not, you can look for the reasons and improve the model. The reasons are directly or indirectly related to several elements, like the nature of the data, the features of the data, the network parameters, and the hyperparameters. The question is, which of these values can make the model less appropriate for our project, and how can we change them?
There is a lot of active research in this field, and in this chapter, we discuss
and analyze this question and present some solutions for it.
Figure 8.2 shows the validation error segments; calculating these segments is the first step in analyzing and evaluating the model. The expected error decomposes into variance, bias, and noise, which can be calculated as follows:

$$\text{Variance} = E_{\mathbf{x},D}\left[\left(h_{D}(\mathbf{x})-\bar{h}(\mathbf{x})\right)^{2}\right]$$

$$\text{Bias}^{2} = E_{\mathbf{x}}\left[\left(\bar{h}(\mathbf{x})-\bar{y}(\mathbf{x})\right)^{2}\right]$$

$$\text{Noise} = E_{\mathbf{x},y}\left[\left(\bar{y}(\mathbf{x})-y(\mathbf{x})\right)^{2}\right]$$

so that the expected test error is

$$E_{\mathbf{x},y,D}\left[\left(h_{D}(\mathbf{x})-y\right)^{2}\right] = E\left[\left(h_{D}(\mathbf{x})-\bar{h}(\mathbf{x})\right)^{2}\right] + E\left[\left(\bar{h}(\mathbf{x})-\bar{y}(\mathbf{x})\right)^{2}\right] + E\left[\left(\bar{y}(\mathbf{x})-y(\mathbf{x})\right)^{2}\right].$$

Note that the noise term can be estimated on a sample of $n$ points as

$$E\left[\left(\bar{y}(\mathbf{x})-y(\mathbf{x})\right)^{2}\right] \approx \frac{1}{n}\sum_{i=1}^{n}\left(\bar{y}(\mathbf{x}_{i})-y(\mathbf{x}_{i})\right)^{2},$$

where, for a two-class problem with labels 1 and 2,

$$\bar{y}(\mathbf{x}) = 1\cdot p(y{=}1\mid \mathbf{x}) + 2\cdot p(y{=}2\mid \mathbf{x}).$$
For example, if the sample is an apple, then $x_1$ is the first sample, and $y_1$ is the value that shows whether this sample is an apple or not. Here, $x_1$ is the feature vector, for example the price, shape, and color (so the dimension of $\mathbf{x}$ is 3). $\bar{y}(\mathbf{x})$ is the expected label, $y(\mathbf{x})$ is the label, and $\mathbf{x}=\{x_{1},\ldots,x_{n}\}$. Sometimes there is noise on the data, and in this case, you can do some preprocessing, like noise removal, before using the data for training. But if you do all the preprocessing and realize the noise value is still high, you should look for other reasons to reduce or remove the noise.
Two main reasons that can make the noise high are incorrect labeling and incorrect features; both are discussed below.
Remember that the definitions here are for the individual data samples and not about the nature of the data. Figure 8.3 shows noise on image data that has corrupted the image completely.
FIGURE 8.3 Noise on the data can change the results (here, the noise is on the nature of the data and can be removed by some preprocessing).
8.3.1 Labeling
When the noise is high, there is a possibility that the labeling is not correct, and you should label the data again. Let us explain with an example. If you label some orange samples as apple, you have incorrectly labeled data; this acts like noisy data, makes the training problematic, and makes the model inaccurate. To address this problem, relabel the suspect data, retrain the model, and recalculate the noise value. If you find that the noise has decreased after this step, labeling was the main reason; otherwise, you should check other reasons to remove the noise. Figure 8.4 shows how incorrect labeling can act as noise and make the error value high.
FIGURE 8.4 Incorrectly labeled data can make the noise value high (this is the noise we discuss here).
8.3.2 Features
Another reason that can make the noise high is incorrect features. For example, if you choose a feature like apple seed, it may not be a feature that helps train the model. Also, some features give a better model than others; for example, if you select color and shape for an object classification problem, features like shape and material might describe the data more accurately. The key point is to analyze the problem correctly and determine which features give a better representation of the data. For example, for speech data, the mel frequencies and the spectrum show the audio data structure better. So, if you realize the noise is high, try different features, retrain the model, and recalculate the noise. Now, if the noise has been reduced, it shows that incorrect features were the main reason.
$$\text{Bias}^{2} = E_{\mathbf{x}}\left[\left(\bar{h}(\mathbf{x})-\bar{y}(\mathbf{x})\right)^{2}\right]$$

where

$$\bar{h}(\mathbf{x}) = \frac{1}{m}\sum_{i=1}^{m} h_{D_{i}}(\mathbf{x}), \qquad \bar{y}(\mathbf{x}) = 1\cdot p(y{=}1\mid \mathbf{x}) + 2\cdot p(y{=}2\mid \mathbf{x}).$$

The same discussion and examples apply if you choose incorrect features and they lead to inaccurate modeling (Figure 8.5). Here, $\bar{y}(\mathbf{x})$ is the expected label, $y(\mathbf{x})$ is the label, and $\mathbf{x}=\{x_{1},\ldots,x_{n}\}$.
8.4.1 Incorrect Classifier
When the bias is high, there is a possibility that the classifier is not correct and cannot capture the behavior of the data; for example, when you use a linear classifier on nonlinear data (Figure 8.6).
8.4.2 Incorrect Features
Choosing the correct features is one of the key points in finding the proper model. The features directly and indirectly affect the training and the classification results (Figure 8.7).
FIGURE 8.7 The wrong feature can make the error high.
$$\text{Variance} = E_{\mathbf{x},D}\left[\left(h_{D}(\mathbf{x})-\bar{h}(\mathbf{x})\right)^{2}\right] \approx \frac{1}{m}\sum_{i=1}^{m}\left(h_{D_{i}}(\mathbf{x})-\bar{h}(\mathbf{x})\right)^{2}$$

where

$$\bar{h}(\mathbf{x}) = \frac{1}{m}\sum_{i=1}^{m} h_{D_{i}}(\mathbf{x}), \qquad \bar{y}(\mathbf{x}) = 1\cdot p(y{=}1\mid \mathbf{x}) + 2\cdot p(y{=}2\mid \mathbf{x}).$$
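A small NumPy sketch (not from the book) that estimates these three quantities from the predictions of m models, each trained on a different dataset D_i; preds is an assumed (m, n) array of predictions on n points and y the true labels:

import numpy as np

def bias_variance_noise(preds, y, y_bar=None):
    h_bar = preds.mean(axis=0)                   # average predictor h_bar(x)
    variance = ((preds - h_bar) ** 2).mean()     # spread of h_D around h_bar
    y_bar = y if y_bar is None else y_bar        # expected label, if known
    bias2 = ((h_bar - y_bar) ** 2).mean()        # squared bias of h_bar w.r.t. y_bar
    noise = ((y_bar - y) ** 2).mean()            # label noise around y_bar
    return bias2, variance, noise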
8.5.1 New Dataset
You can switch to a different dataset or do more preprocessing, like data augmentation, on your data. Then calculate the variance, check how the new data change it, and check whether this change improves the trained model (Figure 8.9).
FIGURE 8.9 Different datasets or doing data preprocessing like data augmentation
can change the variance.
8.6 BIAS/VARIANCE
Now let us see the relationship between bias and variance and how their values can be close to the target. In total, there are four main states for the tradeoff between bias and variance (L: Low, B: Bias, V: Variance, H: High).
1. LB/LV
2. HB/LV
3. LB/HV
4. HB/HV
Figure 8.10 shows these four states. The question is, what is the noise value and what are its relations to these two parameters? Noise is a part of the data, and as mentioned in the first step, you can do some preprocessing on the data to remove or decrease the noise and then check other reasons, like the features or the labels, to see whether they have been chosen correctly and take the right action. As you can see in Figure 8.10, the best state is LB/LV (Low Bias and Low Variance), where the values are very close to the target.
Another question is how we can find the best values for bias and variance. Figure 8.11 demonstrates the tradeoff between bias and variance and their relation to model complexity.
Finding these best values corresponds to minimizing the total error, which is equal to the summation of bias and variance. The dotted vertical line shows the point of optimum model complexity; this point in Figure 8.11 is the best value for the total error. Thus, you can find the bias and variance values in your model and then find the point where the total error value starts to increase.
The process is based on the training (or validation) error, test error, and
acceptable (or target) error. So, in the first step, check the value of these
two parameters and then follow the steps to find the reasons and solve
them.
One discussion is about high variance and high bias in the data. When the model has a small variance and its bias is high, the model underfits the target; on the other hand, a model with high variance leads to overfitting. Underfitting and overfitting are problems that make the model inaccurate. To avoid these two problems, we can identify the situations that make the model's variance or bias high and then try to find solutions for these high values. Here we review these two situations.
8.7.1 High Variance
High variance happens when the data are very spread out from the mean value. When the variance is high, overfitting can happen, and to solve this problem, we should first measure the variance and determine whether its value is high. To detect high variance, you can look at the gap between the training error and the validation (or test) error: a training error that is much lower than the validation error is a sign of high variance.
After finding that the variance is high, you should know how to solve the problem. The most common solutions are collecting more training data (or using data augmentation), adding regularization such as L1/L2 or dropout, reducing the model complexity, and using early stopping. As mentioned, these are the most popular solutions for the high-variance problem; however, there is still active research in this field.
8.7.2 High Bias
When the prediction or classification is not correct, the bias is high. When the data are very biased, the conclusions and final results are not correct. With high bias, underfitting can happen. Based on Figure 8.12, the bias is high when the training error is much higher than the acceptable (target) error.