Feed Forward Neural Networks: Prof. Adel Abdennour


Prof. Adel Abdennour
Feed Forward Neural Networks
Overview
Feed Forward Neural Networks
Training Feed Forward Neural Nets by the Error Back Propagation Method
Example
Feed Forward Neural Networks
Learning in a multilayer network proceeds the same way as for a perceptron.
A training set of input patterns is presented to the network.
The network computes its output pattern, and if there is an error (a difference between the actual and desired output patterns), the weights are adjusted to reduce this error.
The classical learning algorithm for FFNNs is based on the gradient descent rule and the generalized delta rule.
Feed Forward Neural Networks
The activation functions used in FFNNs are continuous and differentiable everywhere, so the network output is a differentiable function of the weights.
FFNNs with a typical activation function (the sigmoid function) are capable of approximating any continuous function to any desired degree of accuracy.

[Figure: plots of the sigmoid and linear activation functions]

Feed Forward Neural Networks
The mathematical form of the sigmoid function is:

    f(x) = 1 / (1 + e^(-x))
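As a quick illustration (not part of the original slides), a minimal Python sketch of the sigmoid, assuming NumPy is available:

```python
import numpy as np

def sigmoid(x):
    # Logistic sigmoid: maps any real input into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(0.0))   # 0.5
print(sigmoid(1.3))   # ~0.79, the value used in the worked example later
```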
Feed Forward Neural Networks
Gradient Descent Learning Rule:
Requires the definition of an error (or objective) function.
The sum of squared errors is usually used:

    E = Σ_p (t_p - o_p)^2, summed over the P_T patterns in the training set

where t_p and o_p are the target and actual output for the p-th pattern, P_T is the total number of input-target vector pairs in the training set, and E is the total error.
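A small Python sketch (illustrative, not from the slides) of the sum-of-squared-errors computation, with hypothetical target and output values:

```python
import numpy as np

def sum_squared_error(targets, outputs):
    # E = sum over the P_T training patterns of (t_p - o_p)^2
    t = np.asarray(targets, dtype=float)
    o = np.asarray(outputs, dtype=float)
    return np.sum((t - o) ** 2)

# Hypothetical targets and actual outputs for a four-pattern training set
print(sum_squared_error([0, 1, 1, 1], [0.03, 0.99, 0.98, 0.99]))
```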
Feed Forward Neural Networks
Gradient Descent Learning Rule:
Minimize the error by moving the weights along the decreasing slope (negative gradient) of the error surface.
Iterate through the training set and adjust the weights in the direction that reduces the error.
Feed Forward Neural Networks
Given a single training pattern, the weights are updated using the delta rule:

    Δw = -η · ∂E_p/∂w

where η is the learning rate and E_p is the error for the given pattern.
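An illustrative sketch of this update for a single linear output unit, assuming the per-pattern error E_p = 0.5·(t_p - o_p)^2 so the update reduces to w ← w + η·(t_p - o_p)·x; this is a simplification, not the full multilayer rule:

```python
import numpy as np

def delta_rule_update(w, x, t, eta=0.5):
    # For a linear unit o = w·x and E_p = 0.5*(t - o)^2,
    # dE_p/dw = -(t - o)*x, so  w <- w - eta*dE_p/dw = w + eta*(t - o)*x
    o = np.dot(w, x)
    return w + eta * (t - o) * x

w = np.array([0.1, -0.2])
print(delta_rule_update(w, x=np.array([1.0, 0.5]), t=1.0))
```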
Feed Forward Neural Networks
Generalized Delta Learning Rule:
Assumes an activation function that is at least once differentiable.
Here it is assumed that the sigmoid function is used.
Taking the derivative of the sigmoid function with respect to net_p, we get:

    f'(net_p) = f(net_p) · (1 - f(net_p)) = o_p · (1 - o_p)
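A short sketch that checks this identity numerically; the test point x = 0.7 is arbitrary:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    # f'(x) = f(x) * (1 - f(x))
    s = sigmoid(x)
    return s * (1.0 - s)

# Numerical check of the analytic derivative at an arbitrary point
x, h = 0.7, 1e-6
numeric = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)
print(sigmoid_derivative(x), numeric)   # both ~0.2217
```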
Feed Forward Neural Networks
Example:

[Figure: a small network with input x1 = 0.1; the hidden and output activations are computed with the sigmoid]

    1 / (1 + e^(-1.3)) = 0.79
    1 / (1 + e^(-1.6)) = 0.83
Feed Forward Neural Networks
Error Back Propagation Method
Different methods are available to train a neural network, but the back-propagation method is the most widely used.
It has been used since the 1980s to adjust the weights.
Basic Technique:
Calculate the error by taking the difference between the calculated result and the desired result.
The error is fed back through the network and the weights are adjusted to minimize the error.
Back-propagation training algorithm illustrated:
Back propagation adjusts the weights of the NN in order to minimize the network's total mean squared error.
Forward step: network activation and error computation.
Backward step: error propagation.

[Figure: a network with a hidden layer and an output layer, showing the forward and backward passes]

Feed Forward Neural Networks
Error Back Propagation Method
Symbols used for the layers:

[Figure: notation used for the input, hidden, and output layers]
Feed Forward Neural Networks
Error Back Propagation Method
First Step:
Calculate the weighted sum of the signals arriving at each neuron in the output layer:

    N_k = Σ_j W_jk · O_j

Then apply the sigmoid function to the net input of each output neuron:

    O_k = f(N_k) = 1 / (1 + e^(-N_k))
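A minimal sketch of this forward step for a hypothetical 2-2-1 network (the same shape as the worked example later in these slides); the weight values here are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward_layer(O_prev, W):
    # N_k = sum_j W_jk * O_j for every neuron k in the layer, then O_k = f(N_k)
    return sigmoid(O_prev @ W)

x = np.array([0.0, 1.0])
W_hidden = np.array([[1.0, 0.0],
                     [0.0, 1.0]])   # W_ij: input i -> hidden j
W_output = np.array([[1.0],
                     [1.0]])        # W_jk: hidden j -> output k
h = forward_layer(x, W_hidden)
o = forward_layer(h, W_output)
print(h, o)
```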
Feed Forward Neural Networks
Error Back Propagation Method
Second Step:
Determine the amount of error at each output neuron:

    δ_k = (t_k - O_k) · f'(N_k)

Since f is the sigmoid, the equation can be simplified as:

    δ_k = (t_k - O_k) · O_k · (1 - O_k)

Calculate the corrected weights from the following expression (η is the learning rate):

    W_jk ← W_jk + η · δ_k · O_j
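A sketch of the output-layer correction, using the values that appear in the worked example later in these slides (η = 0.5); the function name is illustrative:

```python
import numpy as np

def output_layer_update(W_jk, O_hidden, O_k, t_k, eta=0.5):
    # delta_k = (t_k - O_k) * O_k * (1 - O_k)
    delta_k = (t_k - O_k) * O_k * (1.0 - O_k)
    # W_jk <- W_jk + eta * delta_k * O_j
    W_new = W_jk + eta * delta_k * O_hidden
    return W_new, delta_k

W_out = np.array([1.0, 1.0])      # W10, W20
h = np.array([0.5, 0.5])          # hidden-layer outputs
W_new, delta = output_layer_update(W_out, h, O_k=0.73106, t_k=0.0)
print(delta)   # ~ -0.14373
print(W_new)   # ~ [0.9641, 0.9641]
```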
Feed Forward Neural Networks
Error Back Propagation Method
Second Step (continued):
Determine the amount of error in the hidden layer by using the updated weights W_jk:

    δ_j = O_j · (1 - O_j) · Σ_k δ_k · W_jk

Now the new weights between the input and the hidden layer can be calculated as:

    W_ij ← W_ij + η · δ_j · O_i

Apply these steps to every input pattern and iterate several times until you reach the lowest possible error.
Then the network is ready to use.
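A sketch of the hidden-layer correction for a single-output network; it follows the slides' convention of propagating the error through the already-corrected weights W_jk, and the values are the ones from the worked example:

```python
import numpy as np

def hidden_layer_update(W_ij, x, O_hidden, W_jk_new, delta_k, eta=0.5):
    # delta_j = O_j * (1 - O_j) * delta_k * W_jk  (one output neuron)
    delta_j = O_hidden * (1.0 - O_hidden) * (W_jk_new * delta_k)
    # W_ij <- W_ij + eta * delta_j * x_i
    W_new = W_ij + eta * np.outer(x, delta_j)
    return W_new, delta_j

W_in = np.array([[1.0, 0.0],
                 [0.0, 1.0]])
W_new, delta_h = hidden_layer_update(W_in,
                                     x=np.array([0.0, 0.0]),
                                     O_hidden=np.array([0.5, 0.5]),
                                     W_jk_new=np.array([0.9641, 0.9641]),
                                     delta_k=-0.14373)
print(delta_h)   # ~ [-0.0346, -0.0346]
print(W_new)     # unchanged, because both inputs are zero
```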
Feed Forward Neural Networks
Error Back Propagation Method
Let us understand this process through a simplified example.
We shall select a learning rate η = 0.5 to simplify the operations.

[Figure: the network to be trained — two inputs, two hidden neurons, one output]
Feed Forward Neural Networks
Error Back Propagation Method
Inputs and Outputs are:

    x1  x2  Target (t)
    0   0   0
    0   1   1
    1   0   1
    1   1   1

We shall assume random weights initially and start using the first row of the table of inputs and outputs:

    x1  x2  t   W11  W12  W21  W22  W10  W20
    0   0   0   1    0    0    1    1    1
Feed Forward Neural Networks
Error Back Propagation Method
We shall use the following notation for the network:

    h_i1 = input to the first neuron of the hidden layer
    h_i2 = input to the second neuron of the hidden layer
    h_o1 = output of neuron 1 of the hidden layer
    h_o2 = output of neuron 2 of the hidden layer
    N    = input to the neuron of the output layer
    O    = output of the network
Feed Forward Neural Networks
Error Back Propagation Method
Thus we get the following values:

    h_i1 = W11·x1 + W21·x2 = (1)(0) + (0)(0) = 0
    h_i2 = W12·x1 + W22·x2 = (0)(0) + (1)(0) = 0
Feed Forward Neural Networks
Error Back Propagation Method
    h_o1 = 1 / (1 + e^(-h_i1)) = 1 / (1 + e^0) = 0.5
    h_o2 = 1 / (1 + e^(-h_i2)) = 1 / (1 + e^0) = 0.5
Feed Forward Neural Networks
Error Back Propagation Method
Now the total signal going to the output layer can be calculated as:

    N = W10·h_o1 + W20·h_o2 = (1)(0.5) + (1)(0.5) = 1

And the output of the network would be:

    O = 1 / (1 + e^(-N)) = 1 / (1 + e^(-1)) = 0.73106
Feed Forward Neural Networks
Error Back Propagation Method
Now determining the error:

    δ_O = (t - O) · O · (1 - O) = (0 - 0.73106)(0.73106)(1 - 0.73106) = -0.14373

Now calculating the corrected weights between the hidden and the output layer:

    W10 ← W10 + η·δ_O·h_o1 = 1 + (0.5)(-0.14373)(0.5) = 0.9641
    W20 ← W20 + η·δ_O·h_o2 = 1 + (0.5)(-0.14373)(0.5) = 0.9641
Feed Forward Neural Networks
Error Back Propagation Method
Now we continue with the same approach towards the input layer:

    δ_h1 = h_o1·(1 - h_o1)·W10·δ_O = (0.5)(1 - 0.5)(0.9641)(-0.14373) = -0.0346
    δ_h2 = h_o2·(1 - h_o2)·W20·δ_O = (0.5)(1 - 0.5)(0.9641)(-0.14373) = -0.0346
Feed Forward Neural Networks
Error Back Propagation Method
Now calculating the corrected weights between the input and the hidden layer:

    W11 ← W11 + η·δ_h1·x1 = 1 + (0.5)(-0.0346)(0) = 1
    W12 ← W12 + η·δ_h2·x1 = 0 + (0.5)(-0.0346)(0) = 0
    W21 ← W21 + η·δ_h1·x2 = 0 + (0.5)(-0.0346)(0) = 0
    W22 ← W22 + η·δ_h2·x2 = 1 + (0.5)(-0.0346)(0) = 1

We note here that these weights have not changed, which is expected because both inputs are zero. The situation will change with the other input patterns.
Feed Forward Neural Networks
Error Back Propagation Method
So, the result of the first pass of the training process is:

    x1  x2  t   W11  W12  W21  W22  W10     W20
    0   0   0   1    0    0    1    0.9641  0.9641

Now we will take the second row of the data and re-train the network in the same way, following the same steps.
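The whole first pass can be reproduced with a short Python sketch (illustrative, not from the slides; it follows the slides' convention of using the already-corrected output weights when computing the hidden-layer deltas):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_pattern(x1, x2, t, W11, W12, W21, W22, W10, W20, eta=0.5):
    # Forward pass through the 2-2-1 network
    h_o1 = sigmoid(W11 * x1 + W21 * x2)
    h_o2 = sigmoid(W12 * x1 + W22 * x2)
    O = sigmoid(W10 * h_o1 + W20 * h_o2)
    # Output-layer error and corrected hidden-to-output weights
    delta_O = (t - O) * O * (1 - O)
    W10_new = W10 + eta * delta_O * h_o1
    W20_new = W20 + eta * delta_O * h_o2
    # Hidden-layer errors (using the corrected weights) and input-to-hidden updates
    delta_h1 = h_o1 * (1 - h_o1) * W10_new * delta_O
    delta_h2 = h_o2 * (1 - h_o2) * W20_new * delta_O
    W11 += eta * delta_h1 * x1
    W12 += eta * delta_h2 * x1
    W21 += eta * delta_h1 * x2
    W22 += eta * delta_h2 * x2
    return W11, W12, W21, W22, W10_new, W20_new

# First pass with the first row (x1=0, x2=0, t=0) and the initial weights:
print(train_pattern(0, 0, 0, 1, 0, 0, 1, 1, 1))
# -> (1, 0, 0, 1, ~0.9641, ~0.9641), matching the table above
```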
Feed Forward Neural Networks
Error Back Propagation Method
Using the values and weights acquired in the previous pass as a result of training:

    x1  x2  t   W11  W12  W21       W22     W10      W20
    0   1   1   1    0    0.005115  1.0043  0.97054  0.97544
Feed Forward Neural Networks
Error Back Propagation Method
The training process requires the same steps to be performed many times to reach the lowest value of the error.
The following table shows the weights after a thousand iterations:

    W11      W12     W21      W22     W10       W20
    -3.5402  4.0244  -3.5248  4.5814  -11.9103  4.6940
Feed Forward Neural Networks
Error Back Propagation Method
As we see in the table, the actual results are very close to the desired results:

    x1  x2  Target (t)  Output (O)
    0   0   0           0.0264
    0   1   1           0.9867
    1   0   1           0.9863
    1   1   1           0.9908

This example shows that the difficulty of training does not lie in understanding it, but in the effort required, especially when the operations are repeated, sometimes thousands of times. That is why this process is usually performed by computer.
Applications of Feed Forward Nets
Pattern recognition
Face Recognition
Character Recognition
Sonar mine/rock recognition
Navigation of a car
Stock-market prediction
Pronunciation (NETtalk)
Applications of Feed Forward Nets
Example: Voice Recognition
Task: learn to discriminate between two different voices saying "Hello".
Data
Sources: Ahmed, Naseer
Format: frequency distribution (60 bins)
Applications of Feed Forward Nets
Network architecture
Feed-forward network
60 inputs (one for each frequency bin)
6 hidden neurons
2 outputs (0-1 for Ahmed, 1-0 for Naseer)
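A hypothetical sketch of a network with this shape, using randomly initialised weights; the function and variable names are illustrative, not from the original material:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)

# Randomly initialised 60-6-2 network (60 inputs, 6 hidden, 2 outputs)
W_hidden = rng.normal(scale=0.1, size=(60, 6))
W_output = rng.normal(scale=0.1, size=(6, 2))

def predict(freq_bins):
    # freq_bins: length-60 frequency distribution of one "Hello" sample
    h = sigmoid(freq_bins @ W_hidden)
    return sigmoid(h @ W_output)   # two outputs, one per speaker code

print(predict(rng.random(60)))
```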
Applications of Feed Forward Nets
Presenting the data

[Figure: frequency distributions of "Hello" for Ahmed and Naseer]

Applications of Feed Forward Nets
Presenting the data (untrained network)

[Figure: untrained network outputs — Ahmed: 0.43, 0.26; Naseer: 0.73, 0.55]
Applications of Feed Forward Nets
Calculate error

    Ahmed:  |0.43 - 0| = 0.43,  |0.26 - 1| = 0.74
    Naseer: |0.73 - 1| = 0.27,  |0.55 - 0| = 0.55

Applications of Feed Forward Nets
Back-propagate error and adjust weights

    Ahmed:  |0.43 - 0| = 0.43,  |0.26 - 1| = 0.74   (total 1.17)
    Naseer: |0.73 - 1| = 0.27,  |0.55 - 0| = 0.55   (total 0.82)
Applications of Feed Forward Nets
Repeat process (sweep) for all training pairs
Present data
Calculate error
Back propagate error
Adjust weights
Repeat process multiple times
Applications of Feed Forward Nets
Presenting the data (trained network)

[Figure: trained network outputs — Ahmed: 0.01, 0.99; Naseer: 0.99, 0.01]
Applications of Feed Forward Nets
Results: Voice Recognition
Performance of the trained network:
Discrimination accuracy between known "Hello" samples = 100%
Discrimination accuracy between new "Hello" samples = 100%
Additional Issues of Training Neural Networks
When training these networks, attention must be given to the following issues:
Optimization
Overfitting
Underfitting
Network size selection
Learning rate
Neglecting any one of these issues can lead to an incomplete or ineffective neural network.
Additional Issues of Training Neural Networks
Optimization:
Several optimization algorithms for training NNs have been developed. These algorithms fall into two classes:
Local optimization: the algorithm may get stuck in a local optimum without finding a global optimum. Gradient descent is an example of a local optimizer.
Global optimization: the algorithm searches for the global optimum by employing mechanisms to explore larger parts of the search space.

[Figure: a function with one global maximum and several local maxima]
Additional Issues of Training Neural Networks
Local and global optimization techniques can be combined to form hybrid training algorithms.
Additional Issues of Training Neural Networks
Learning consists of adjusting weights until an acceptable empirical error has been reached. Two types of supervised training algorithms exist, based on when the weights are updated:
Stochastic/online training:
Weights are adjusted after each pattern presentation. In this case the next input pattern is selected randomly from the training set, to prevent any bias that may arise from the order in which patterns occur in the training set.
Batch/offline training:
Weight changes are accumulated and used to adjust the weights only after all training patterns have been presented. A code sketch contrasting the two update schedules follows below.
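The sketch below contrasts the two schedules for a single linear unit (a simplification of the full FFNN, with illustrative function names):

```python
import numpy as np

def grad(w, x, t):
    # Gradient of E_p = 0.5*(t - w·x)^2 for a single linear unit (illustrative only)
    return -(t - np.dot(w, x)) * x

def online_epoch(w, X, T, eta=0.1, rng=np.random.default_rng()):
    # Stochastic/online: update after every pattern, visiting patterns in random order
    for p in rng.permutation(len(X)):
        w = w - eta * grad(w, X[p], T[p])
    return w

def batch_epoch(w, X, T, eta=0.1):
    # Batch/offline: accumulate the gradient over all patterns, then update once
    total = sum(grad(w, X[p], T[p]) for p in range(len(X)))
    return w - eta * total

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([0, 1, 1, 1], dtype=float)
w = np.zeros(2)
print(online_epoch(w, X, T))
print(batch_epoch(w, X, T))
```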
Additional Issues of Training Neural Networks
[Figure: fitting a data set with insufficient training (underfitting), proper training, and excessive training (overfitting)]

Additional Issues of Training Neural Networks
There are a large number of methods used to avoid such problems; the most important is early stopping.
In early stopping, the data is divided into three sections (a sketch follows below):
Training
Validation
Testing
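A hypothetical sketch of early stopping with such a three-way split; `train_step` and `error` are placeholders for one training epoch and an error measure, and the 70/15/15 split is an assumed example, not prescribed by the slides:

```python
import numpy as np

def split_data(X, T, rng=np.random.default_rng(0)):
    # Split the data set into training (70%), validation (15%), and testing (15%)
    idx = rng.permutation(len(X))
    n_train, n_val = int(0.7 * len(X)), int(0.15 * len(X))
    tr, va, te = idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
    return (X[tr], T[tr]), (X[va], T[va]), (X[te], T[te])

def train_with_early_stopping(init_w, train_step, error, data, patience=10):
    # Stop when the validation error has not improved for `patience` epochs
    (X_tr, T_tr), (X_va, T_va), _ = data
    w, best_w = init_w, init_w
    best_err, waited = np.inf, 0
    while waited < patience:
        w = train_step(w, X_tr, T_tr)       # one training epoch
        val_err = error(w, X_va, T_va)      # monitor error on the validation set
        if val_err < best_err:
            best_err, best_w, waited = val_err, w, 0
        else:
            waited += 1
    return best_w                           # weights with the lowest validation error
```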