Basics of TensorFlow
This chapter covers the basics of TensorFlow, the deep learning
framework. Deep learning does a wonderful job in pattern recognition,
especially in the context of images, sound, speech, language, and time-
series data. With the help of deep learning, you can classify, predict,
cluster, and extract features. Fortunately, in November 2015, Google
released TensorFlow, which has been used in most of Google’s products
such as Google Search, spam detection, speech recognition, Google
Assistant, Google Now, and Google Photos. Explaining the basic
components of TensorFlow is the aim of this chapter.
TensorFlow can execute partial subgraphs of a computation, which allows distributed training by partitioning a neural network across devices. In other words, TensorFlow supports both model parallelism and data parallelism. TensorFlow provides multiple APIs.
The lowest level API—TensorFlow Core—provides you with complete
programming control.
With that context in place, let’s look at the basic building blocks of TensorFlow.
Tensors
Before you jump into the TensorFlow library, let’s get comfortable with
the basic unit of data in TensorFlow. A tensor is a mathematical object
and a generalization of scalars, vectors, and matrices. A tensor can be
represented as a multidimensional array. A tensor of zero rank (order) is
nothing but a scalar. A vector/array is a tensor of rank 1, whereas a matrix is a tensor of rank 2. More generally, a tensor of rank n can be thought of as an n-dimensional array.
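Here is a minimal sketch of these ranks (TensorFlow 1.x is assumed throughout this chapter; the values are illustrative):

import tensorflow as tf

scalar = tf.constant(7)                   # rank 0
vector = tf.constant([1, 2, 3])           # rank 1
matrix = tf.constant([[1, 2], [3, 4]])    # rank 2

print(scalar.get_shape())   # ()
print(vector.get_shape())   # (3,)
print(matrix.get_shape())   # (2, 2)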
So, a TensorFlow program has two phases, shown here:
1. Building the computational graph
2. Running the computational graph
To actually evaluate the nodes, you must run the computational graph
within a session.
A session encapsulates the control and state of the TensorFlow runtime.
The following code creates a Session object:
sess = tf.Session()
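As a sketch of the two phases (the node names and values here are illustrative), the following builds a tiny graph and then evaluates it inside the session:

import tensorflow as tf

node1 = tf.constant(3.0, tf.float32)
node2 = tf.constant(4.0, tf.float32)
node3 = tf.add(node1, node2)     # construction phase: nothing is computed yet

sess = tf.Session()              # execution phase starts here
print(sess.run(node3))           # 7.0
sess.close()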
Generally, you deal with many images in deep learning, so you have to feed the pixel values of each image into the graph and iterate over all of the images.
To train the model, you need to be able to modify the graph to tune parameters such as the weights and biases. In short, variables enable you to add trainable parameters to a graph. They are constructed with a type and an initial value.
Let’s create a constant in TensorFlow and print it.
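Here is a minimal sketch of those steps (the constant value and name are illustrative):

# Step 1 (construction): import TensorFlow
import tensorflow as tf
# Step 2 (construction): create a constant node
x = tf.constant(12, name='x')
# Step 3 (execution): create a session
sess = tf.Session()
# Step 4 (execution): run the graph and print the constant's value
print(sess.run(x))   # 12
sess.close()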
The first two steps belong to the construction phase, and the last two
steps belong to the execution phase. I will discuss the construction and
execution phases of TensorFlow now.
You can rewrite the previous code in another way, as shown here:
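One common alternative form, likely what is meant here, uses a with block so that the session is closed automatically:

import tensorflow as tf

x = tf.constant(12, name='x')
with tf.Session() as sess:
    print(sess.run(x))   # 12; the session is closed automatically on exit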
Now you will explore how you create a variable and initialize it. Here is
the code that does it:
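As a sketch (the variable names and initial values are illustrative), a variable is created from an initial value and becomes usable only after it is initialized in the session:

import tensorflow as tf

W = tf.Variable([0.3], dtype=tf.float32, name='W')    # weight, with a type and initial value
b = tf.Variable([-0.3], dtype=tf.float32, name='b')   # bias

init = tf.global_variables_initializer()   # op that initializes all variables
with tf.Session() as sess:
    sess.run(init)
    print(sess.run([W, b]))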
Placeholders
A placeholder is a promise to provide a value at a later time; it is meant to accept external inputs. Placeholders can have one or multiple dimensions, meant for storing n-dimensional arrays.
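Here is a minimal sketch of a placeholder holding a 1D array (the doubling operation and the values are illustrative); the data is supplied through feed_dict when the graph is run:

import tensorflow as tf

x = tf.placeholder("float", None)   # shape left unspecified
y = x * 2

with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: [1, 2, 3, 4]}))   # [2. 4. 6. 8.]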
You can also consider a 2D array in place of the 1D array. Here is the
code:
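A sketch of the 2D version, where the first dimension is left as None so any number of rows can be fed:

import tensorflow as tf

x = tf.placeholder("float", [None, 4])   # any number of rows, 4 columns
y = x * 2

data = [[1, 2, 3, 4],
        [5, 6, 7, 8]]                    # a 2x4 matrix
with tf.Session() as sess:
    print(sess.run(y, feed_dict={x: data}))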
This is a 2×4 matrix. So, if you replace None with 2, you can see the
same output.
But if you create a placeholder of [3, 4] shape (note that you will feed
a 2×4 matrix at a later time), there is an error, as shown here:
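A sketch of that mismatch; feeding a 2×4 array into a [3, 4] placeholder raises a ValueError roughly like the one in the comment:

import tensorflow as tf

x = tf.placeholder("float", [3, 4])
y = x * 2

with tf.Session() as sess:
    # Raises roughly: "ValueError: Cannot feed value of shape (2, 4)
    # for Tensor ..., which has shape '(3, 4)'"
    print(sess.run(y, feed_dict={x: [[1, 2, 3, 4], [5, 6, 7, 8]]}))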
Constants are initialized when you call tf.constant, and their values
can never change. By contrast, variables are not initialized when you call
tf.Variable. To initialize all the variables in a TensorFlow program, you
must explicitly call a special operation as follows.
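That special operation is tf.global_variables_initializer(); a minimal sketch:

import tensorflow as tf

init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)   # all variables now hold their initial values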
Creating Tensors
An image is a third-order tensor whose dimensions correspond to height, width, and number of channels (red, green, and blue).
Here you can see how an image is converted into a tensor:
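The original listing reads an image from disk; as a self-contained sketch, the following builds a synthetic RGB image with NumPy and converts it to a rank-3 tensor (the file-reading step is omitted):

import numpy as np
import tensorflow as tf

# A synthetic 64x64 RGB image (height, width, channels)
image = np.random.randint(0, 256, size=(64, 64, 3)).astype(np.uint8)

image_tensor = tf.convert_to_tensor(image)   # rank-3 tensor
print(image_tensor.get_shape())              # (64, 64, 3)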
Fixed Tensors
Here is a fixed tensor:
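A sketch of the usual fixed-value constructors (the shapes and fill values are illustrative):

import tensorflow as tf

zeros = tf.zeros([2, 3], tf.float32)     # 2x3 tensor of zeros
ones = tf.ones([2, 3], tf.float32)       # 2x3 tensor of ones
filled = tf.fill([2, 3], 7.0)            # 2x3 tensor filled with 7.0
constant = tf.constant([1, 2, 3, 4])     # tensor from an explicit list

with tf.Session() as sess:
    print(sess.run([zeros, ones, filled, constant]))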
Sequence Tensors
tf.range creates a sequence of numbers that starts at a specified value, increments by a specified delta, and stops before a specified limit.
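A sketch using tf.range (and tf.linspace, which spaces values evenly between two endpoints); the start, limit, and delta are illustrative:

import tensorflow as tf

seq = tf.range(3, 18, 3)            # [3, 6, 9, 12, 15]; the limit (18) is excluded
lin = tf.linspace(10.0, 12.0, 5)    # [10.0, 10.5, 11.0, 11.5, 12.0]

with tf.Session() as sess:
    print(sess.run([seq, lin]))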
Random Tensors
tf.random_uniform generates random values from a uniform distribution within a specified range.
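A sketch of uniform and normal random tensors (the shapes and ranges are illustrative):

import tensorflow as tf

uniform = tf.random_uniform([2, 3], minval=0, maxval=4, dtype=tf.float32)
normal = tf.random_normal([2, 3], mean=0.0, stddev=1.0)

with tf.Session() as sess:
    print(sess.run([uniform, normal]))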
If you have trouble reproducing these results, review the earlier discussion of how tensors are created.
Working on Matrices
Once you are comfortable creating tensors, you can enjoy working on
matrices (2D tensors).
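A sketch of common matrix operations on 2D tensors (the values are illustrative):

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0, 6.0], [7.0, 8.0]])

sum_ab = tf.add(a, b)          # element-wise addition
prod_ab = tf.matmul(a, b)      # matrix multiplication
a_t = tf.transpose(a)          # transpose

with tf.Session() as sess:
    print(sess.run([sum_ab, prod_ab, a_t]))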
Activation Functions
The idea of an activation function comes from the analysis of how a
neuron works in the human brain (see Figure 1-1). The neuron becomes
active beyond a certain threshold, better known as the activation potential.
In most cases, an activation function also squashes the output into a small range. Sigmoid, hyperbolic tangent (tanh), ReLU, and ELU are the most popular activation functions.
Let’s look at the popular activation functions.
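As a brief sketch of how these activations are applied in TensorFlow (the input values are illustrative):

import tensorflow as tf

x = tf.constant([-2.0, -1.0, 0.0, 1.0, 2.0])

with tf.Session() as sess:
    print(sess.run(tf.nn.sigmoid(x)))   # squashes values into (0, 1)
    print(sess.run(tf.nn.tanh(x)))      # squashes values into (-1, 1)
    print(sess.run(tf.nn.relu(x)))      # max(0, x)
    print(sess.run(tf.nn.elu(x)))       # exponential linear unit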
ReLU6
ReLU6 is similar to ReLU except that the output can never exceed 6.
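A short sketch (the input values are illustrative); values above 6 are clipped to 6:

import tensorflow as tf

x = tf.constant([-3.0, 2.0, 5.0, 10.0])

with tf.Session() as sess:
    print(sess.run(tf.nn.relu6(x)))   # [0. 2. 5. 6.]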
Loss Functions
The loss function (cost function) is to be minimized so as to get the best
values for each parameter of the model. For example, you need to get the
best value of the weight (slope) and bias (y-intercept) so as to explain the
target (y) in terms of the predictor (X). The way to achieve the best values of the slope and y-intercept is to minimize the cost function (loss function, or sum of squared errors). For any model, there are numerous parameters,
and the model structure in prediction or classification is expressed in
terms of the values of the parameters.
You need to evaluate your model, and for that you need to define the
cost function (loss function). The minimization of the loss function can
be the driving force for finding the optimum value of each parameter. For reference, here is a list of the loss functions provided in the tf.contrib.losses module:
tf.contrib.losses.absolute_difference
tf.contrib.losses.add_loss
tf.contrib.losses.hinge_loss
tf.contrib.losses.compute_weighted_loss
tf.contrib.losses.cosine_distance
tf.contrib.losses.get_losses
tf.contrib.losses.get_regularization_losses
tf.contrib.losses.get_total_loss
tf.contrib.losses.log_loss
tf.contrib.losses.mean_pairwise_squared_error
tf.contrib.losses.mean_squared_error
tf.contrib.losses.sigmoid_cross_entropy
tf.contrib.losses.softmax_cross_entropy
tf.contrib.losses.sparse_softmax_cross_entropy
For example: tf.contrib.losses.log_loss(predictions, labels, weight=2.0)
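As a sketch of defining a loss for a simple linear model (the sum-of-squared-errors loss is written out directly; the variable names and data are illustrative):

import tensorflow as tf

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)              # target values

W = tf.Variable([0.3], dtype=tf.float32)    # slope
b = tf.Variable([-0.3], dtype=tf.float32)   # y-intercept
linear_model = W * x + b

loss = tf.reduce_sum(tf.square(linear_model - y))   # sum of squared errors

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(loss, feed_dict={x: [1, 2, 3, 4], y: [0, -1, -2, -3]}))   # 23.66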
Optimizers
Now you should be convinced that you need to use a loss function to
get the best value of each parameter of the model. How can you get the
best value?
Initially you assume starting values of the weights and biases for the model (linear regression, etc.). Then you need a way to move toward the best values of those parameters; the optimizer provides that way. In each iteration, each parameter value changes in a direction
suggested by the optimizer. Suppose you have 16 weight values (w1, w2,
w3, …, w16) and 4 biases (b1, b2, b3, b4). Initially you can assume every
weight and bias to be zero (or one or any number). The optimizer suggests
whether w1 (and other parameters) should increase or decrease in the
next iteration while keeping the goal of minimization in mind. After many
iterations, w1 (and the other parameters) would stabilize at their best values.
In other words, TensorFlow, and every other deep learning framework,
provides optimizers that slowly change each parameter in order to
minimize the loss function. The purpose of the optimizers is to give
direction to the weight and bias for the change in the next iteration.
Assume that you have 64 weights and 16 biases; you try to change the
weight and bias values in each iteration (during backpropagation) so that
you get the correct values of weights and biases after many iterations while
trying to minimize the loss function.
Selecting the best optimizer for the model to converge fast and to learn
weights and biases properly is a tricky task.
Adaptive techniques (adadelta, adagrad, etc.) are good optimizers for converging faster on complex neural networks. Adam is supposedly
the best optimizer for most cases. It also outperforms other adaptive
techniques (adadelta, adagrad, etc.), but it is computationally costly. For
sparse data sets, methods such as SGD, NAG, and momentum are not the
best options; the adaptive learning rate methods are. An additional benefit
is that you won’t need to adjust the learning rate but can likely achieve the
best results with the default value.
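As a sketch of an optimizer at work, the following uses gradient descent to fit the slope and y-intercept of a simple linear model (the learning rate, number of iterations, and data are illustrative; tf.train.AdamOptimizer could be substituted for the optimizer):

import tensorflow as tf

x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
W = tf.Variable([0.3], dtype=tf.float32)
b = tf.Variable([-0.3], dtype=tf.float32)
loss = tf.reduce_sum(tf.square(W * x + b - y))   # sum of squared errors

train = tf.train.GradientDescentOptimizer(learning_rate=0.01).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):   # each iteration nudges W and b to reduce the loss
        sess.run(train, feed_dict={x: [1, 2, 3, 4], y: [0, -1, -2, -3]})
    print(sess.run([W, b]))   # converges toward W = -1, b = 1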
Common Optimizers
The following is a list of common optimizers:
tf.train.GradientDescentOptimizer
tf.train.AdadeltaOptimizer
tf.train.AdagradOptimizer
tf.train.MomentumOptimizer
tf.train.AdamOptimizer
tf.train.FtrlOptimizer
tf.train.ProximalGradientDescentOptimizer
tf.train.ProximalAdagradOptimizer
tf.train.RMSPropOptimizer
Metrics
Having learned some ways to build a model, it is time to evaluate the
model. So, you need to evaluate the regressor or classifier.
There are many evaluation metrics, among which classification
accuracy, logarithmic loss, and area under ROC curve are the most popular
ones.
Classification accuracy is the ratio of the number of correct predictions
to the total number of predictions. When the observations in each class are not heavily skewed, accuracy can be considered a good metric.
tf.contrib.metrics.accuracy(actual_labels, predictions)
Metrics Examples
This section shows code that demonstrates the accuracy metric.
Here you create actual values (calling them x) and predicted values (calling them y) and then check the accuracy. Accuracy is the ratio of the number of instances where the actual value equals the predicted value to the total number of instances.
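A sketch of that computation (the actual and predicted labels are illustrative); the manual calculation is shown, with the contrib helper noted in a comment:

import tensorflow as tf

x = tf.constant([1, 0, 1, 1, 0, 1], dtype=tf.int32)   # actual labels
y = tf.constant([1, 1, 1, 0, 0, 1], dtype=tf.int32)   # predicted labels

# Fraction of positions where the prediction equals the actual label
accuracy = tf.reduce_mean(tf.cast(tf.equal(x, y), tf.float32))
# tf.contrib.metrics.accuracy(y, x) should give the same result

with tf.Session() as sess:
    print(sess.run(accuracy))   # 4 of 6 match, so approximately 0.667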
Common Metrics
The following is a list of common metrics:
tf.contrib.metrics.streaming_accuracy
tf.contrib.metrics.streaming_mean
tf.contrib.metrics.streaming_precision
tf.contrib.metrics.streaming_recall
tf.contrib.metrics.streaming_auc
tf.contrib.metrics.streaming_mean_absolute_error
tf.contrib.metrics.streaming_mean_squared_error
tf.contrib.metrics.streaming_root_mean_squared_error