ANN MJJ-1
Prepared by MJJ
Introduction
Neural Unit
ANNs
[Figure: a single neural unit with inputs $x_1, \dots, x_n$, weights $w_1, \dots, w_n$, a summation node $\Sigma$, and an activation function $\varphi$ producing the output $y$]
$y = \varphi\left(\sum_{i=1}^{n} w_i x_i\right)$
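As a concrete illustration of this unit, here is a minimal Python sketch; the function names, the example numbers, and the choice of a sigmoid activation are assumptions for illustration, not taken from the slides.

```python
import math

def neural_unit(inputs, weights, activation):
    # Weighted sum of the inputs: sum_i w_i * x_i
    s = sum(w * x for w, x in zip(weights, inputs))
    # Pass the sum through the activation function phi
    return activation(s)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Example: three inputs, three weights, sigmoid activation
y = neural_unit([0.5, -1.0, 2.0], [0.1, 0.4, 0.3], sigmoid)
print(y)
```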
Contd.
The training error of the linear unit is defined as
$E(\vec{w}) = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$
where $D$ is the set of training examples, $t_d$ is the target output for training example $d$, and $o_d$ is the output of the linear unit for training example $d$.
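A minimal sketch of this error measure in Python; the function name and the example values are made up for illustration.

```python
def sum_squared_error(targets, outputs):
    # E(w) = 1/2 * sum over training examples d of (t_d - o_d)^2
    return 0.5 * sum((t - o) ** 2 for t, o in zip(targets, outputs))

# Example: targets t_d and linear-unit outputs o_d for four training examples
print(sum_squared_error([1.0, 0.0, 1.0, 1.0], [0.8, 0.2, 0.9, 0.4]))
```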
DERIVATION OF THE GRADIENT DESCENT RULE
This vector derivative is called the gradient of E with respect to $\vec{w}$, written $\nabla E(\vec{w}) = \left[\frac{\partial E}{\partial w_0}, \frac{\partial E}{\partial w_1}, \dots, \frac{\partial E}{\partial w_n}\right]$.
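For a linear unit with the error E defined above, differentiating E gives the gradient components and weight-update rule below; this is a sketch of the standard derivation, with learning rate $\eta$, added here rather than copied from the slides.

$\frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_{d \in D}(t_d - o_d)^2 = \sum_{d \in D}(t_d - o_d)(-x_{id})$

$\Delta w_i = -\eta\,\frac{\partial E}{\partial w_i} = \eta \sum_{d \in D}(t_d - o_d)\,x_{id}$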
Step 1: Initialize the weights (a & b) with random values and calculate the error (SSE).
Step 2: Calculate the gradient, i.e. the change in SSE when the weights (a & b) are changed by a very small value from their original randomly initialized values. This helps us move the values of a & b in the direction in which SSE is minimized.
Step 3: Adjust the weights with the gradients to reach the optimal values where SSE is minimized.
Step 4: Use the new weights for prediction and to calculate the new SSE.
Step 5: Repeat steps 2 and 3 until further adjustments to the weights no longer significantly reduce the error.
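The steps above can be written as a short Python sketch for fitting a straight line y = a*x + b by minimizing SSE; the data, learning rate, tolerance, and starting values are made-up choices for illustration.

```python
def gradient_descent(xs, ys, lr=0.01, tol=1e-8, max_iters=10000):
    a, b = 0.5, 0.5          # Step 1: start from (arbitrary) initial weights
    def sse(a, b):           # sum of squared errors for the line y = a*x + b
        return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    prev = sse(a, b)
    for _ in range(max_iters):
        # Step 2: gradient of SSE with respect to a and b
        grad_a = sum(-2 * x * (y - (a * x + b)) for x, y in zip(xs, ys))
        grad_b = sum(-2 * (y - (a * x + b)) for x, y in zip(xs, ys))
        # Step 3: move the weights against the gradient
        a -= lr * grad_a
        b -= lr * grad_b
        # Step 4: recompute the error with the new weights
        cur = sse(a, b)
        # Step 5: stop when the improvement becomes insignificant
        if prev - cur < tol:
            break
        prev = cur
    return a, b

print(gradient_descent([1, 2, 3, 4], [3, 5, 7, 9]))   # should approach a=2, b=1
```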
STOCHASTIC APPROXIMATION TO GRADIENT DESCENT
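As a reminder of the idea behind stochastic (incremental) gradient descent: instead of summing the error gradient over all of D before each update, the weights are nudged after every single training example using $\Delta w_i = \eta\,(t_d - o_d)\,x_{id}$. A minimal Python sketch, with made-up data and learning rate (not from the original slides), might look like this:

```python
import random

def stochastic_gradient_descent(examples, lr=0.05, epochs=100):
    # examples: list of (inputs, target) pairs; inputs is a list of feature values
    n = len(examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):
        for x, t in examples:
            o = sum(wi * xi for wi, xi in zip(w, x))   # linear unit output
            # Delta rule: update the weights after each individual example
            w = [wi + lr * (t - o) * xi for wi, xi in zip(w, x)]
    return w

# Example: learn weights for targets t = 2*x1 + 1 (second input fixed at 1 acts as a bias)
data = [([x, 1.0], 2 * x + 1) for x in [0.0, 1.0, 2.0, 3.0]]
print(stochastic_gradient_descent(data))   # should approach [2, 1]
```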
Contd.
A linear activation function takes the inputs, multiplies them by the weights for each neuron, and creates an output signal proportional to the input. In one sense, a linear function is better than a step function because it allows multiple output values, not just yes and no.
However, a linear activation function has two major problems:
1. It is not possible to use backpropagation (gradient descent) to train the model: the derivative of the function is a constant and has no relation to the input X, so it is not possible to go back and understand which weights in the input neurons can provide a better prediction.
2. All layers of the neural network collapse into one: with linear activation functions, no matter how many layers the neural network has, the last layer will be a linear function of the first layer (because a linear combination of linear functions is still a linear function). So a linear activation function turns the neural network into just one layer (a small numerical demonstration follows below).
A neural network with a linear activation function is simply a linear regression model. It has limited power and a limited ability to handle complex, varying patterns in the input data.
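To illustrate the collapse argument above, here is a small NumPy sketch (layer sizes and random values are arbitrary) showing that two stacked linear layers compute exactly the same function as one linear layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # input vector
W1 = rng.normal(size=(3, 4))      # first linear layer
W2 = rng.normal(size=(2, 3))      # second linear layer

two_layers = W2 @ (W1 @ x)        # output of the two-layer "network" (no nonlinearity)
one_layer = (W2 @ W1) @ x         # a single linear layer with weights W2 @ W1

print(np.allclose(two_layers, one_layer))   # True: the layers collapse into one
```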
Sigmoid
Tanh
ReLU
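For reference, here is a minimal Python sketch of the usual definitions of these three activation functions; the derivative helpers are illustrative additions, not taken from the slides.

```python
import math

def sigmoid(x):
    # Squashes any real input into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    # Squashes any real input into (-1, 1), zero-centred
    return math.tanh(x)

def relu(x):
    # Passes positive inputs through unchanged, zeroes out negative inputs
    return max(0.0, x)

# Derivatives, as used by backpropagation
def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def tanh_prime(x):
    return 1.0 - math.tanh(x) ** 2

def relu_prime(x):
    return 1.0 if x > 0 else 0.0

for f in (sigmoid, tanh, relu):
    print(f.__name__, [round(f(v), 3) for v in (-2.0, 0.0, 2.0)])
```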
Choosing the right Activation Function