Presentation 1
• The softmax function is also a type of sigmoid function, but it is useful when we are trying to handle multi-class classification problems.
• The sigmoid function can handle only two classes. The softmax function squeezes the output for each class between 0 and 1 and also divides each output by the sum of all the outputs.
• This essentially gives the probability of the input being in a particular class. It can be defined as softmax(z_i) = e^(z_i) / sum_j e^(z_j); a small Python sketch follows this list.
• The expected output values of the model, o1 and o2, are 0.01 and 0.99.
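• A minimal sketch of softmax in Python (NumPy; the function name and the example scores are just for illustration):

import numpy as np

def softmax(z):
    # Subtract the max for numerical stability, exponentiate, then normalize
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

# Three raw class scores -> probabilities that sum to 1
print(softmax(np.array([2.0, 1.0, 0.1])))  # approx [0.659, 0.242, 0.099]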
The Forward Pass
• In this stage the neural network tries to predict the expected output values from the initial input values i1 and i2 and the weights and biases given above.
• For this we take each neuron's total net input (the inputs multiplied by their respective weights, plus the bias) and apply the activation function to it.
• This is how we calculate the total net input for the first hidden-layer neuron, h1:
net_h1 = w1 * i1 + w2 * i2 + b1 * 1
net_h1 = 0.15 * 0.05 + 0.2 * 0.1 + 0.35 * 1 = 0.3775
• We then squash net_h1 with the logistic (sigmoid) activation, out_h1 = 1 / (1 + e^(-0.3775)) = 0.593269992, repeat the same calculation for h2, and carry the process through the output layer to get the network's outputs:
out_o1 = 0.75136507
out_o2 = 0.772928465
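• A sketch of the forward pass in Python. The inputs, w1, w2 and b1 are the values shown above; the remaining initial weights and biases (w3, w4, w5-w8, b2) are assumed, since the setup slide is not reproduced here:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
w1, w2, b1 = 0.15, 0.20, 0.35             # given above
w3, w4, b2 = 0.25, 0.30, 0.60             # assumed initial values
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55   # assumed initial values

# Hidden layer
net_h1 = w1 * i1 + w2 * i2 + b1           # 0.3775
out_h1 = sigmoid(net_h1)                  # 0.593269992
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)

# Output layer
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)  # ~0.7514
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)  # ~0.7729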
Calculating the total error
• We calculate the error for each output neuron using the squared error function, E = 1/2 * (target - output)^2, and sum them to get the total error.
• The target output for o1 is 0.01 but the neural network outputs 0.75136507, therefore its error is:
E_o1 = 1/2 * (target_o1 - out_o1)^2 = 1/2 * (0.01 - 0.75136507)^2 = 0.274811083
• Repeating this process for the second output neuron (target 0.99) gives:
E_o2 = 0.023560026
• So, the total error of the neural network is the sum of both errors:
E_total = E_o1 + E_o2 = 0.274811083 + 0.023560026 = 0.298371109
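• The same calculation in Python, continuing from the forward-pass sketch above:

target_o1, target_o2 = 0.01, 0.99

E_o1 = 0.5 * (target_o1 - out_o1) ** 2   # 0.274811083
E_o2 = 0.5 * (target_o2 - out_o2) ** 2   # 0.023560026
E_total = E_o1 + E_o2                    # 0.298371109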
Output Layer
• Consider w5. We want to know how much a change in w5 affects the total error, i.e. the partial derivative ∂E_total/∂w5. By the chain rule,
∂E_total/∂w5 = ∂E_total/∂out_o1 * ∂out_o1/∂net_o1 * ∂net_o1/∂w5
• When we take the partial derivative of the total error with respect to out_o1, the quantity 1/2 * (target_o2 - out_o2)^2 becomes zero because out_o1 does not affect it, which means we're taking the derivative of a constant, which is zero. So,
∂E_total/∂out_o1 = -(target_o1 - out_o1) = -(0.01 - 0.75136507) = 0.74136507
• Next, how much does the output of o1 change with respect to its total net input? The partial derivative of the logistic function is the output multiplied by 1 minus the output:
∂out_o1/∂net_o1 = out_o1 * (1 - out_o1) = 0.75136507 * (1 - 0.75136507) = 0.186815602
• Finally, how much does the total net input of o1 change with respect to w5? Since net_o1 is w5 * out_h1 plus terms that do not involve w5,
∂net_o1/∂w5 = out_h1 = 0.593269992
• Now we put all of these values together:
∂E_total/∂w5 = 0.74136507 * 0.186815602 * 0.593269992 = 0.082167041
• To decrease the error, we then subtract this value from the current weight, optionally multiplied by some learning rate, eta, which we'll set to 0.5:
w5_new = w5 - eta * ∂E_total/∂w5
• We repeat this process to get the new weights for w6, w7, and w8 (sketched below).
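• The whole output-layer update for w5 in Python, continuing from the sketches above; recall that the starting value w5 = 0.40 is an assumption:

# Chain rule: dE_total/dw5 = dE/dout_o1 * dout_o1/dnet_o1 * dnet_o1/dw5
dE_dout_o1   = -(target_o1 - out_o1)              # 0.74136507
dout_dnet_o1 = out_o1 * (1 - out_o1)              # 0.186815602
dnet_dw5     = out_h1                             # 0.593269992
dE_dw5 = dE_dout_o1 * dout_dnet_o1 * dnet_dw5     # ~0.082167041

eta = 0.5                                         # learning rate
w5_new = w5 - eta * dE_dw5                        # ~0.3589 with the assumed w5 = 0.40

# w6 uses out_h2 instead of out_h1; w7 and w8 use the o2 error terms instead.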
Hidden Layer
• We continue the backwards pass by calculating new values for w1, w2, w3, and w4.
• So, for that we need to calculate
∂E_total/∂w1 = ∂E_total/∂out_h1 * ∂out_h1/∂net_h1 * ∂net_h1/∂w1
• Starting with ∂E_total/∂out_h1: because out_h1 affects both out_o1 and out_o2, it needs to take both output neurons into account:
∂E_total/∂out_h1 = ∂E_o1/∂out_h1 + ∂E_o2/∂out_h1
• Each of these terms is found with the same chain rule as before, e.g. ∂E_o1/∂out_h1 = ∂E_o1/∂net_o1 * ∂net_o1/∂out_h1, where ∂net_o1/∂out_h1 = w5.
• Now that we have ∂E_total/∂out_h1, we need to figure out ∂out_h1/∂net_h1 and then ∂net_h1/∂w for each weight:
∂out_h1/∂net_h1 = out_h1 * (1 - out_h1) = 0.593269992 * (1 - 0.593269992) = 0.241300709
• We calculate the partial derivative of the total net input to h1 with respect to w1 the same way as we did for the output neuron:
∂net_h1/∂w1 = i1 = 0.05
• Putting these three factors together gives ∂E_total/∂w1, and we update w1 (and likewise w2, w3, and w4) with the same learning rate as before.
• When we fed forward the 0.05 and 0.1 inputs originally, the error on the network was 0.298371109.
• After this first round of back propagation, the total error is now down to 0.291027924.
• It might not seem like much, but after repeating this process 10,000 times, for example, the error plummets to
0.0000351085.
• At that point, when we feed forward 0.05 and 0.1, the two output neurons generate 0.015912196 (vs 0.01 target) and 0.984065734 (vs 0.99 target).
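• A compact end-to-end sketch of the whole procedure in Python. The initial weights w3-w8 and b2 are again assumed, the biases are left fixed as in the walkthrough above, and the exact error after 10,000 iterations may differ in the last digits depending on exactly when the weights are refreshed:

import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

i1, i2 = 0.05, 0.10
t1, t2 = 0.01, 0.99                          # target outputs
w1, w2, w3, w4 = 0.15, 0.20, 0.25, 0.30      # w3, w4 assumed
w5, w6, w7, w8 = 0.40, 0.45, 0.50, 0.55      # assumed
b1, b2 = 0.35, 0.60                          # b2 assumed
eta = 0.5

for _ in range(10000):
    # Forward pass
    out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
    out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
    out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
    out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)

    # Output-layer deltas: dE/dnet_o = -(target - out) * out * (1 - out)
    d_o1 = -(t1 - out_o1) * out_o1 * (1 - out_o1)
    d_o2 = -(t2 - out_o2) * out_o2 * (1 - out_o2)

    # Hidden-layer deltas, using the output weights from before this update
    d_h1 = (d_o1 * w5 + d_o2 * w7) * out_h1 * (1 - out_h1)
    d_h2 = (d_o1 * w6 + d_o2 * w8) * out_h2 * (1 - out_h2)

    # Weight updates (gradient = delta * upstream activation)
    w5 -= eta * d_o1 * out_h1
    w6 -= eta * d_o1 * out_h2
    w7 -= eta * d_o2 * out_h1
    w8 -= eta * d_o2 * out_h2
    w1 -= eta * d_h1 * i1
    w2 -= eta * d_h1 * i2
    w3 -= eta * d_h2 * i1
    w4 -= eta * d_h2 * i2

# Final forward pass and total error
out_h1 = sigmoid(w1 * i1 + w2 * i2 + b1)
out_h2 = sigmoid(w3 * i1 + w4 * i2 + b1)
out_o1 = sigmoid(w5 * out_h1 + w6 * out_h2 + b2)
out_o2 = sigmoid(w7 * out_h1 + w8 * out_h2 + b2)
E_total = 0.5 * (t1 - out_o1) ** 2 + 0.5 * (t2 - out_o2) ** 2
print(E_total, out_o1, out_o2)   # error on the order of 3.5e-5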