Deep Learning - DL-2


Deep Learning

Dr. Adven
DL
ML vs DL
So, what do you think: what is Deep Learning?
Deep learning is a machine learning technique that learns features and tasks directly from the data, where the data may be images, text, or sound.
So, What we learn in this course
• Artificial Neural Network
• Convolutional Neural Network
• Recurrent Neural Network
• Boltzmann Machine vs Deep Boltzmann Machine
• Self-Organizing Maps
• Autoencoder
• GAN (Generative Adversarial Network)
• Deep Q Learning
• Pre-trained Models (CNN architectures) and many more…
• Keras vs TensorFlow
So, What we learn in this course
Supervised Unsupervised Reinforcement Learning

Artificial Neural Network


Regression and Classification
What is a Neuron?
How a Neuron Works
Artificial Neural Network
Normalize/Standardize
$$\sum_{i=1}^{m} w_i x_i$$

Applying the Activation Function
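As a rough sketch of what a single artificial neuron computes, the following hypothetical NumPy snippet forms the weighted sum of its inputs and then applies a simple threshold activation; the inputs and weights are made up for illustration only.

```python
import numpy as np

def threshold(z, theta=0.0):
    # fires (outputs 1) only if the weighted sum reaches the threshold theta
    return 1.0 if z >= theta else 0.0

# hypothetical neuron with m = 3 inputs
x = np.array([0.5, -1.2, 3.0])   # inputs x_i
w = np.array([0.8, 0.1, -0.4])   # weights w_i

z = np.dot(w, x)                 # weighted sum: sum over i of w_i * x_i
print(z, threshold(z))           # -0.92 -> 0.0, so the neuron stays inactive
```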
What is an Activation Function?
Activation functions are an extremely important feature of artificial neural networks.
They basically decide whether a neuron should be activated or not: whether the
information that the neuron is receiving is relevant for the given input or should
be ignored.

The activation function is the non-linear transformation that we apply to the input
signal. This transformed output is then sent to the next layer of neurons as input.

• Linear Activation Function


• Non-Linear Activation Function
What is an Activation Function?
Linear Function
The function is a line, so the output of the function is not confined to any range.

Non-Linear Function
Non-linear functions make it easy for the model to generalize or adapt to a variety of data and to differentiate between outputs.
The non-linear activation functions are mainly divided on the basis of their range or curves:
1. Threshold
2. Sigmoid
3. Tanh
4. ReLU
5. Leaky ReLU
6. Softmax
Threshold Function?
Sigmoid Function?
The Sigmoid function curve looks like an S-shape.
This function reduces extreme values or outliers in data without removing them.
It converts independent variables of near-infinite range into simple probabilities
between 0 and 1, and most of its output will be very close to 0 or 1.
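A minimal sketch of the sigmoid, assuming the standard formula 1 / (1 + e^(-x)); the sample inputs are made up just to show how extreme values get squashed toward 0 or 1.

```python
import numpy as np

def sigmoid(x):
    # maps any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-10.0, -1.0, 0.0, 1.0, 10.0])))
# roughly [0.00005, 0.27, 0.5, 0.73, 0.99995]: extreme inputs end up very close to 0 or 1
```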
Rectifier (ReLU) Function?
ReLU is the most widely used activation function when designing networks today.
First things first, the ReLU function is non-linear, which means we can easily
backpropagate the errors and have multiple layers of neurons being activated by the
ReLU function.
Leaky ReLU Function?
The Leaky ReLU function is nothing but an improved version of the ReLU function. As we saw,
for the ReLU function the gradient is 0 for x < 0, which makes the neurons die for activations in
that region. Leaky ReLU is defined to address this problem: instead of defining the ReLU
function as 0 for x less than 0, we define it as a small linear component of x.

What we have done here is simply replace the horizontal line with a non-zero, non-horizontal line,
where a is a small value like 0.01 or so.
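A small NumPy sketch contrasting the two, assuming the usual definitions max(0, x) for ReLU and a small slope a = 0.01 in the negative region for Leaky ReLU.

```python
import numpy as np

def relu(x):
    # 0 for x < 0, identity for x >= 0, so the gradient is 0 in the negative region
    return np.maximum(0.0, x)

def leaky_relu(x, a=0.01):
    # replaces the flat negative part with a small linear component a * x
    return np.where(x >= 0, x, a * x)

x = np.array([-5.0, -0.5, 0.0, 0.5, 5.0])
print(relu(x))        # [0.    0.    0.    0.5   5.  ]
print(leaky_relu(x))  # [-0.05  -0.005  0.    0.5   5.  ]
```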
Hyperbolic tangent (tanh)?
Pronounced "tanch," tanh is a hyperbolic trigonometric function.
Just as the tangent represents the ratio between the opposite and adjacent sides of a right triangle,
tanh represents the ratio of the hyperbolic sine to the hyperbolic cosine: tanh(x) = sinh(x) / cosh(x).
Unlike the Sigmoid function, the normalized range of tanh is –1 to 1. The advantage of tanh is
that it can deal more easily with negative numbers.
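A quick NumPy check that tanh really is the ratio sinh(x) / cosh(x) and that its outputs stay between -1 and 1; the sample values are arbitrary.

```python
import numpy as np

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
print(np.tanh(x))                # values in (-1, 1), symmetric around 0
print(np.sinh(x) / np.cosh(x))   # same numbers: tanh(x) = sinh(x) / cosh(x)
```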
Softmax Function (for Multi-Class Classification)?
The Softmax function calculates the probability distribution of an event over 'n' different events. In a general way of
saying, this function calculates the probability of each target class over all possible target classes. Later, the
calculated probabilities are helpful for determining the target class for the given inputs.

The main advantage of using Softmax is the output probability range. The range will be 0 to 1, and the sum of all
the probabilities will be equal to one. If the softmax function is used for a multi-classification model, it returns the
probabilities of each class, and the target class will have the highest probability.

The formula computes the exponential (e-power) of the given input value and the sum of the exponential values of
all the values in the input. The ratio of the exponential of the input value to the sum of exponential values is then
the output of the softmax function.
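A minimal softmax sketch following the description above (exponentiate each input, then divide by the sum of the exponentials); the three class scores are hypothetical.

```python
import numpy as np

def softmax(logits):
    # subtracting the max is a common numerical-stability trick; it does not change the result
    e = np.exp(logits - np.max(logits))
    return e / np.sum(e)

scores = np.array([2.0, 1.0, 0.1])   # made-up scores for three target classes
probs = softmax(scores)
print(probs)         # ~[0.66, 0.24, 0.10]: each value lies between 0 and 1
print(probs.sum())   # 1.0: the probabilities over all classes sum to one
```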
Activation Function Example
How Neural Networks Work
and
Back Propagation in deep learning
How Neural Networks Work with many neurons
Back Propagation in deep learning
Back-propagation is the essence of neural net training. It is the method of fine-
tuning the weights of a neural net based on the error rate obtained in the
previous epoch (i.e., iteration). Proper tuning of the weights allows you to reduce
error rates and to make the model reliable by increasing its generalization.

Backpropagation is short for "backward propagation of errors." It is a
standard method of training artificial neural networks. This method helps to
calculate the gradient of a loss function with respect to all the weights in the
network.
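As an illustration only, here is a tiny NumPy sketch of gradient-based training for a single sigmoid neuron: each epoch runs a forward pass, measures the error, and uses the gradient of the loss with respect to the weights to fine-tune them. The data, learning rate, and epoch count are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical training data: 4 samples, 2 features, binary targets (an OR-like pattern)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([0., 1., 1., 1.])

w, b, lr = np.zeros(2), 0.0, 0.5

for epoch in range(1000):
    y_hat = sigmoid(X @ w + b)        # forward pass with the current weights
    error = y_hat - y                 # error from this epoch's predictions
    grad_w = X.T @ error / len(y)     # gradient of the loss w.r.t. each weight
    grad_b = error.mean()
    w -= lr * grad_w                  # fine-tune the weights against the gradient
    b -= lr * grad_b

print(np.round(sigmoid(X @ w + b), 2))  # predictions approach [0, 1, 1, 1]
```

A real backpropagation pass repeats the same idea layer by layer, propagating the error from the output back through every hidden layer.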
Back Propagation in deep learning
Back Propagation in deep learning
Back Propagation in deep learning (epoch)
For further assistance

Visit Stack Exchange

https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
Back Propagation in deep learning

What is Bias
Back Propagation (reducing the Cost function)
(Batch Gradient Descent, Stochastic Gradient Descent, Mini-batch Gradient Descent)
What is Bias
Bias is just like the intercept added in a linear equation. It is an additional
parameter in the Neural Network which is used to adjust the output along with
the weighted sum of the inputs to the neuron. Moreover, the bias value allows you to
shift the activation function either to the right or to the left.
output = sum(weights * inputs) + bias
The output is calculated by multiplying the inputs with their weights, adding the bias, and
then passing the result through an activation function like the Sigmoid function. Here, bias
acts like a constant which helps the model fit the given data. The steepness of
the Sigmoid depends on the weights of the inputs.
A simpler way to understand bias is through the constant c of a linear function:
y = mx + c
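A tiny sketch of how the bias shifts a neuron's output, assuming the formula output = sum(weights * inputs) + bias given above and a Sigmoid activation; the inputs and weights are made up.

```python
import numpy as np

def neuron(x, w, bias):
    # weighted sum plus bias, passed through the Sigmoid activation
    z = np.dot(w, x) + bias
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.25])
print(neuron(x, w, bias=0.0))    # 0.5: the weighted sum alone is 0 here
print(neuron(x, w, bias=2.0))    # ~0.88: a positive bias shifts the activation toward 1
print(neuron(x, w, bias=-2.0))   # ~0.12: a negative bias shifts it toward 0
```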
What is Bias

W*x+b
What is Gradient Descent (BGD)
Gradient Descent is an optimization technique that is used to improve deep learning
and neural network-based models by minimizing the cost function.
Gradient Descent is a process that occurs in the backpropagation phase, where the
goal is to repeatedly update the model's parameters in the direction opposite to the
gradient with respect to the weights w, until we reach the global minimum of the
cost function J(w).

More precisely,
Gradient descent is an algorithm which iterates through different combinations of
weights in an optimal way, in order to find the combination of weights that has the
minimum error.
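A minimal sketch of batch gradient descent on a simple cost J(w) = (w - 3)^2, whose global minimum we already know is at w = 3; the starting point and learning rate are arbitrary choices.

```python
# cost J(w) = (w - 3)**2, so the gradient is dJ/dw = 2 * (w - 3)
w = 0.0      # initial weight guess
lr = 0.1     # learning rate (assumed)

for step in range(100):
    grad = 2 * (w - 3)   # gradient of the cost at the current weight
    w = w - lr * grad    # step in the direction opposite to the gradient

print(w)  # converges very close to the global minimum w = 3
```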
Brute force algorithm

Curse of dimensionality

A brute force algorithm refers to a programming style that does not include any shortcuts to
improve performance, but instead relies on sheer computing power to try all possibilities until the
solution to a problem is found. A classic example is the traveling salesman problem (TSP).
What is Gradient Descent
What is Gradient Descent
Useful link

https://towardsdatascience.com/understanding-the-mathematics-behind-gradient-descent-dde5dc9be06e
Stochastic gradient descent
The word 'stochastic' means a system or a process that is linked with a random
probability. Hence, in Stochastic Gradient Descent, a few samples are selected
randomly instead of the whole data set for each iteration. In Gradient Descent,
there is a term called "batch" which denotes the total number of samples from a
dataset that is used for calculating the gradient in each iteration. In typical
Gradient Descent optimization, like Batch Gradient Descent, the batch is taken to
be the whole dataset. Although using the whole dataset is really useful for
getting to the minima in a less noisy or less random manner, the problem
arises when our datasets get really huge.
Stochastic gradient descent
Stochastic gradient descent (often abbreviated SGD) is an iterative method for
optimizing an objective function with suitable smoothness properties (e.g. differentiable
or subdifferentiable), typically a convex loss function.
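A rough sketch of the difference in practice: instead of computing the gradient over the whole dataset, SGD updates the parameters after each randomly selected sample. The linear model, data, and learning rate below are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 4.0 * X[:, 0] + 1.0 + rng.normal(scale=0.1, size=100)  # true slope 4, intercept 1

w, b, lr = 0.0, 0.0, 0.01

for epoch in range(20):
    for i in rng.permutation(len(X)):   # visit the samples in random order
        x_i, y_i = X[i, 0], y[i]
        err = (w * x_i + b) - y_i       # error on this single sample only
        w -= lr * err * x_i             # noisy, per-sample gradient step
        b -= lr * err

print(round(w, 2), round(b, 2))         # close to the true values 4 and 1
```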
Stochastic gradient descent
Stochastic gradient descent
Mini Batch gradient descent
Mini-batch gradient descent is a variation of the gradient descent algorithm
that splits the training dataset into small batches that are used to calculate
model error and update model coefficients.
Implementations may choose to sum the gradient over the mini-batch which
further reduces the variance of the gradient.

Mini-batch gradient descent seeks to find a balance between the robustness


of stochastic gradient descent and the efficiency of batch gradient descent. It
is the most common implementation of gradient descent used in the field of
deep learning.
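Continuing the same made-up regression example, a hypothetical mini-batch loop shuffles the data each epoch, slices it into small batches, and averages the gradient within each batch before updating.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 4.0 * X[:, 0] + 1.0

w, b, lr, batch_size = 0.0, 0.0, 0.1, 16

for epoch in range(50):
    idx = rng.permutation(len(X))                # shuffle once per epoch
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]    # one small batch of sample indices
        err = (w * X[batch, 0] + b) - y[batch]
        w -= lr * np.mean(err * X[batch, 0])     # gradient averaged over the batch
        b -= lr * np.mean(err)

print(round(w, 2), round(b, 2))                  # again approaches 4 and 1
```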
Mini Batch gradient descent

BGD vs SGD vs MBGD


Different types of Neural Network
• Perceptron (Multilayer Perceptron) & ANN
• Feedforward Neural Network – Artificial Neuron
• Convolutional Neural Network
• Recurrent Neural Network (RNN) – Long Short-Term Memory
ANN
A perceptron is a network with two layers, one input and one output. ... An artificial neural network which has
an input layer, an output layer, and two or more trainable weight layers (consisting of perceptrons) is called a
multilayer perceptron or MLP.
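A hedged Keras sketch of a multilayer perceptron with an input layer, two trainable hidden layers, and an output layer; the feature count, layer sizes, and random data are assumptions made only to keep the example self-contained.

```python
import numpy as np
from tensorflow import keras

# hypothetical data: 200 samples with 8 features each, binary labels
X = np.random.rand(200, 8)
y = np.random.randint(0, 2, size=200)

model = keras.Sequential([
    keras.Input(shape=(8,)),                      # input layer
    keras.layers.Dense(16, activation="relu"),    # first trainable hidden layer
    keras.layers.Dense(8, activation="relu"),     # second trainable hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # output layer
])

model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```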
Feedforward Neural Network
It is one of the simplest types of artificial neural networks. In a feedforward neural network,
the data passes through different input nodes until it reaches the output node. In other words,
the data moves in only one direction, from the first layer until it reaches the output node. This is
also known as a front-propagating wave, which is usually obtained using a graded activation
function. Unlike in more complex types of neural networks, there is no backpropagation and data
moves in only one direction. A feedforward neural network may consist of a single layer or may
contain hidden layers. In a feedforward neural network, the products of the inputs and their
weights are calculated. This is then fed to the output.

whereas
Backpropagation is a training algorithm consisting of 2 steps:
• Feedforward the values.
• Calculate the error and propagate it back to the earlier layers.
Convolutional Neural Network
Convolutional Neural Networks (CNNs) are one of the variants of neural networks used heavily in
the field of Computer Vision. A CNN derives its name from the type of hidden layers it consists of.
The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully
connected layers, and normalization layers. This simply means that, instead of using only the
normal activation functions defined above, convolution and pooling functions are used in the
hidden layers.
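A hedged Keras sketch of the hidden-layer stack described above (convolutional, pooling, and fully connected layers); the 28x28 grayscale input size and 10 output classes are assumptions, not part of the slides.

```python
from tensorflow import keras

model = keras.Sequential([
    keras.Input(shape=(28, 28, 1)),
    keras.layers.Conv2D(32, kernel_size=3, activation="relu"),  # convolutional layer
    keras.layers.MaxPooling2D(pool_size=2),                     # pooling layer
    keras.layers.Conv2D(64, kernel_size=3, activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(64, activation="relu"),                  # fully connected layer
    keras.layers.Dense(10, activation="softmax"),               # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
model.summary()
```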
Recurrent Neural Network
Recurrent Neural Networks (RNNs) are a type of neural network where the output from the
previous step is fed as input to the current step. In traditional neural networks, all the inputs
and outputs are independent of each other, but in cases such as predicting the next word of a
sentence, the previous words are required, and hence there is a need to remember them. Thus
RNNs came into existence, solving this issue with the help of a hidden layer. The main and most
important feature of an RNN is its hidden state, which remembers some information about a
sequence.
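A minimal Keras sketch of a recurrent model whose hidden state carries information from earlier steps of a sequence to later ones; the vocabulary size, sequence length, and the use of an LSTM layer (the Long Short-Term Memory variant listed earlier) are illustrative assumptions.

```python
from tensorflow import keras

# assumes sequences of 20 word indices drawn from a 5000-word vocabulary
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Embedding(input_dim=5000, output_dim=32),  # word index -> dense vector
    keras.layers.LSTM(64),                                   # hidden state remembers earlier words
    keras.layers.Dense(1, activation="sigmoid"),             # e.g. one binary prediction per sequence
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```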
