Presentation 1

The softmax function is used in classification problems to output class probabilities. It exponentiates the output for each class and divides by the sum of these exponentials, squeezing each value between 0 and 1 so that the results can be read as the probability of an input belonging to each class. Backpropagation is used to minimize the loss function and optimize the weights by propagating the error backwards through the network. It calculates the partial derivative of the error with respect to each weight to determine how much changing that weight would affect the total error, then updates the weights to reduce the error. Repeating this process iteratively trains the network to better map inputs to the correct outputs.


Softmax

• The softmax function is a generalization of the sigmoid function that is useful when we are trying to handle classification problems.
• The sigmoid function can handle just two classes. The softmax function exponentiates the output for each class and divides by the sum of these exponentials, squeezing each result between 0 and 1.
• This essentially gives the probability of the input being in a particular class. For class i it can be defined as:

softmax(z_i) = e^(z_i) / ∑_j e^(z_j)

• Let’s say, for example, we have the outputs [1.2, 0.9, 0.75]. When we apply the softmax function we get [0.42, 0.31, 0.27], so we can now use these as the probabilities of the input belonging to each class (a short worked sketch follows this list).
• The softmax function is ideally used in the output layer of the classifier, where we actually want to obtain the probabilities that define the class of each input.
• For regression, we can instead use a linear function at the output layer.
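
To make this concrete, here is a minimal Python sketch of the softmax calculation applied to the example outputs above; the softmax helper and the use of NumPy are illustrative additions, not part of the original slides.

import numpy as np

def softmax(z):
    # Exponentiate each score, then normalize by the sum of the exponentials.
    exp_z = np.exp(z - np.max(z))  # subtracting the max improves numerical stability
    return exp_z / exp_z.sum()

scores = np.array([1.2, 0.9, 0.75])
print(softmax(scores))  # roughly [0.42, 0.31, 0.27]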
Understand Back-Propagation
• The goal of back-propagation is to reduce the loss function/error of the model and optimize the weights so that the neural network can learn how to correctly map arbitrary inputs to outputs.
• The current error is typically propagated backwards to the previous layer, where it is used to modify the weights and biases in such a way that the error is minimized.
• To understand back-propagation, we first need to understand how a neural network computes its output, so we work through an example: a network with two inputs (i1, i2), one hidden layer of two neurons (h1, h2), and two output neurons (o1, o2).
• In order to have some numbers to work with, here are the initial weights, the biases, and the training inputs/outputs:

• Here the inputs i1 and i2 are 0.05 and 0.10 respectively.

• The weights w1 to w8 are 0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55 respectively.

• The biases b1 and b2 are 0.35 and 0.60.

• And the expected (target) output values for o1 and o2 are 0.01 and 0.99.
The Forward Pass

• In this stage the neural network tries to predict the expected output values from the initial inputs i1 and i2 and the weights and biases given above.

• For each neuron we take its total net input, computed from the inputs, the respective weights, and the bias, and apply the activation function to it.

• Here we use the sigmoid function.

• This is how we calculate the total net input for the first hidden-layer neuron h1:
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h1 = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775

• Then we apply the sigmoid function to get the output of h1:

out_h1 = 1 / (1 + e^(-net_h1)) = 1 / (1 + e^(-0.3775)) = 0.593269992

• We perform the same process for h2 as well:

net_h2 = w3 * i1 + w4 * i2 + b1 * 1 = 0.25 * 0.05 + 0.3 * 0.1 + 0.35 * 1 = 0.3925

out_h2 = 0.596884378
• We repeat the same process for the output layer to get o1:

net_o1 = w5 * out_h1 + w6 * out_h2 + b2 * 1

net_o1 = 0.4 * 0.593269992 + 0.45 * 0.596884378 + 0.6 * 1 = 1.105905967

out_o1 = 1 / (1 + e^(-net_o1)) = 1 / (1 + e^(-1.105905967)) = 0.75136507

• We also find o2 by applying the same process:

net_o2 = w7 * out_h1 + w8 * out_h2 + b2 * 1 = 0.5 * 0.593269992 + 0.55 * 0.596884378 + 0.6 * 1 = 1.224921404

out_o2 = 1 / (1 + e^(-net_o2)) = 0.772928465
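
The forward pass above can be reproduced with a short Python sketch, assuming the inputs, weights, and biases listed earlier; the variable names are chosen here for readability.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# inputs, weights, and biases from the example
i1, i2 = 0.05, 0.10
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
b1, b2 = 0.35, 0.60

# hidden layer
net_h1 = w1 * i1 + w2 * i2 + b1
net_h2 = w3 * i1 + w4 * i2 + b1
out_h1, out_h2 = sigmoid(net_h1), sigmoid(net_h2)  # 0.593269992, 0.596884378

# output layer
net_o1 = w5 * out_h1 + w6 * out_h2 + b2
net_o2 = w7 * out_h1 + w8 * out_h2 + b2
out_o1, out_o2 = sigmoid(net_o1), sigmoid(net_o2)  # 0.75136507, 0.772928465

print(out_o1, out_o2)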
Calculating the total error
• We calculate the error for each output neuron using the squared error function and sum them to get the total error:

E_total = ∑1/2(target - output)^2

• The target output for o1 is 0.01 but the neural network outputs 0.75136507, therefore its error is:

E_o1 = 1/2 (target_o1 - out_o1)^2 = 1/2 (0.01 - 0.75136507)^2 = 0.274811083

• We repeat the same process to find the error of o2:

E_o2 = 0.023560026

• So, the total error of the neural network is the sum of both errors:

• E_total = E_o1 + E_o2 = 0.274811083 + 0.023560026 = 0.298371109
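
A small Python sketch of the total-error calculation; the outputs from the forward pass are hard-coded here so the snippet runs on its own.

# outputs from the forward pass and the target values
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2  # 0.274811083
E_o2 = 0.5 * (target_o2 - out_o2) ** 2  # 0.023560026
E_total = E_o1 + E_o2                   # 0.298371109
print(E_total)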


The Backwards Pass
• Our goal with back-propagation is to update each of the weights in the network so that they bring the actual output closer to the target output, minimizing the error for each output neuron and for the network as a whole.

Output Layer

• Consider w5. We want to know how much a change in w5 affects the total error, i.e. ∂E_total/∂w5.

• This is known as the partial derivative of E_total with respect to w5.

• By applying the chain rule we can write this derivative as:

∂E_total/∂w5 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 * ∂net_o1/∂w5

• Visually, here’s what we’re doing:


• Now we need to find the value of each term in this equation.
• First, how much does the total error change with respect to the output?

∂E_total/∂out_o1 = -(target_o1 - out_o1) = -(0.01 - 0.75136507) = 0.74136507

• When we take the partial derivative of the total error with respect to out_o1, the quantity 1/2 (target_o2 - out_o2)^2 becomes zero because out_o1 does not affect it, which means we are taking the derivative of a constant, which is zero.

• Next, how much does the output of o1 change with respect to its total net input? The partial derivative of the logistic function is the output multiplied by 1 minus the output:

∂out_o1/∂net_o1 = out_o1 * (1 - out_o1) = 0.75136507 * (1 - 0.75136507) = 0.186815602

• Finally, how much does the total net input of o1 change with respect to w5? It is simply the hidden-layer output that w5 multiplies:

∂net_o1/∂w5 = out_h1 = 0.593269992

• Now we bring all the values together:

∂E_total/∂w5 = 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041

• To decrease the error, we then subtract this value from the current weight (multiplied by a learning rate, eta, which we’ll set to 0.5):

w5_new = w5 - eta * ∂E_total/∂w5 = 0.4 - 0.5 * 0.082167041 = 0.35891648

• We repeat this process to get the new weights for w6, w7, and w8:
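
Here is a hedged Python sketch of the output-layer gradient and weight update described above; the delta_* names are illustrative shorthand for the ∂E/∂net term of each output neuron.

# values carried over from the forward pass
out_h1, out_h2 = 0.593269992, 0.596884378
out_o1, out_o2 = 0.75136507, 0.772928465
target_o1, target_o2 = 0.01, 0.99
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55
eta = 0.5  # learning rate

# delta for each output neuron: dE/dout * dout/dnet
delta_o1 = -(target_o1 - out_o1) * out_o1 * (1 - out_o1)  # ~0.138498562
delta_o2 = -(target_o2 - out_o2) * out_o2 * (1 - out_o2)  # ~-0.038098236

# dE_total/dw5 = delta_o1 * dnet_o1/dw5, then the gradient-descent step
dE_dw5 = delta_o1 * out_h1  # ~0.082167041
w5_new = w5 - eta * dE_dw5  # ~0.35891648

# the same pattern gives the new w6, w7, and w8
w6_new = w6 - eta * delta_o1 * out_h2
w7_new = w7 - eta * delta_o2 * out_h1
w8_new = w8 - eta * delta_o2 * out_h2
print(w5_new, w6_new, w7_new, w8_new)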
Hidden Layer
• We continue the backwards pass by calculating new values for w1, w2, w3, and w4.
• So, for w1 we need to calculate:

∂E_total/∂w1 = ∂E_total/∂out_h1 * ∂out_h1/∂net_h1 * ∂net_h1/∂w1

• Visually, we are doing


• We are going to use a process similar to the one for the output layer, but slightly different to account for the fact that the output of each hidden-layer neuron contributes to the output (and therefore the error) of multiple output neurons.
• We know that out_h1 affects both out_o1 and out_o2, therefore ∂E_total/∂out_h1 needs to take into consideration its effect on both output neurons:

∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1

• Starting with ∂E_o1/∂out_h1:

∂E_o1/∂out_h1 = ∂E_o1/∂net_o1 * ∂net_o1/∂out_h1

• We can calculate ∂E_o1/∂net_o1 using values we calculated earlier:

∂E_o1/∂net_o1 = ∂E_o1/∂out_o1 * ∂out_o1/∂net_o1 = 0.74136507 * 0.186815602 = 0.138498562

• And ∂net_o1/∂out_h1 is equal to w5:

∂net_o1/∂out_h1 = w5 = 0.40

• Bringing all the values together:

∂E_o1/∂out_h1 = 0.138498562 * 0.40 = 0.055399425

• Following the same process for ∂E_o2/∂out_h1 we get:

∂E_o2/∂out_h1 = -0.019049119

• So:

∂E_total/∂out_h1 = 0.055399425 + (-0.019049119) = 0.036350306

• Now that we have ∂E_total/∂out_h1, we need to figure out ∂out_h1/∂net_h1 and then ∂net_h1/∂w for each weight:

∂out_h1/∂net_h1 = out_h1 * (1 - out_h1) = 0.593269992 * (1 - 0.593269992) = 0.241300709

• We calculate the partial derivative of the total net input to h1 with respect to w1 the same way as we did for the output neuron:

∂net_h1/∂w1 = i1 = 0.05

• Putting it all together:

∂E_total/∂w1 = 0.036350306 * 0.241300709 * 0.05 = 0.000438568

• We can now update w1:

w1_new = w1 - eta * ∂E_total/∂w1 = 0.15 - 0.5 * 0.000438568 = 0.149780716

• Repeating this gives the new values for w2, w3, and w4 as well (a compact sketch of the hidden-layer calculation appears below).
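
A matching Python sketch for the hidden-layer update of w1, reusing the numbers computed earlier; delta_o1 and delta_o2 are the output-layer ∂E/∂net values, and the variable names are illustrative.

# values carried over from the forward pass and the output-layer deltas
i1 = 0.05
out_h1 = 0.593269992
w1, w5, w7 = 0.15, 0.40, 0.50
delta_o1, delta_o2 = 0.138498562, -0.038098236
eta = 0.5

# how the total error changes with out_h1: summed over both output neurons
dE_douth1 = delta_o1 * w5 + delta_o2 * w7        # ~0.036350306

# derivative of the sigmoid at h1, then of net_h1 with respect to w1
douth1_dneth1 = out_h1 * (1 - out_h1)            # ~0.241300709
dneth1_dw1 = i1                                  # 0.05

dE_dw1 = dE_douth1 * douth1_dneth1 * dneth1_dw1  # ~0.000438568
w1_new = w1 - eta * dE_dw1                       # ~0.149780716
print(w1_new)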


• Finally, we’ve updated all of our weights in the model.

• When we fed forward the 0.05 and 0.1 inputs originally, the error on the network was 0.298371109.

• After this first round of back propagation, the total error is now down to 0.291027924.

• It might not seem like much, but after repeating this process 10,000 times, for example, the error plummets to
0.0000351085.

• At that point, when we feed forward 0.05 and 0.1, the two output neurons generate 0.015912196 (vs the 0.01 target) and 0.984065734 (vs the 0.99 target).
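
For completeness, here is a compact, hedged sketch that puts the whole procedure in a loop (forward pass, backward pass, simultaneous weight update, with the biases left fixed as in the walkthrough); repeated 10,000 times it should drive the error down to roughly the value quoted above.

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def forward(w, b1, b2, i1, i2):
    # One forward pass; returns the hidden and output activations.
    out_h1 = sigmoid(w[0] * i1 + w[1] * i2 + b1)
    out_h2 = sigmoid(w[2] * i1 + w[3] * i2 + b1)
    out_o1 = sigmoid(w[4] * out_h1 + w[5] * out_h2 + b2)
    out_o2 = sigmoid(w[6] * out_h1 + w[7] * out_h2 + b2)
    return out_h1, out_h2, out_o1, out_o2

# initial parameters and the single training pair from the example
i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99
w = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45, 0.50, 0.55]  # w1..w8
b1, b2 = 0.35, 0.60
eta = 0.5

for _ in range(10000):
    out_h1, out_h2, out_o1, out_o2 = forward(w, b1, b2, i1, i2)

    # output-layer deltas: dE/dnet for o1 and o2
    d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)

    # hidden-layer deltas (using the pre-update output weights)
    d_h1 = (d_o1 * w[4] + d_o2 * w[6]) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w[5] + d_o2 * w[7]) * out_h2 * (1 - out_h2)

    # gradient of E_total with respect to each weight, then the update
    grads = [d_h1 * i1, d_h1 * i2, d_h2 * i1, d_h2 * i2,
             d_o1 * out_h1, d_o1 * out_h2, d_o2 * out_h1, d_o2 * out_h2]
    w = [wi - eta * g for wi, g in zip(w, grads)]

_, _, out_o1, out_o2 = forward(w, b1, b2, i1, i2)
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(E_total, out_o1, out_o2)  # error falls to roughly 3.5e-5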
