Artificial Neural Network: Lecture Module 22
Neural Networks
An artificial neural network (ANN) is a machine learning
approach that models the human brain and consists of a
number of interconnected artificial neurons.
Neurons in ANNs tend to have fewer connections than
biological neurons.
Each neuron in an ANN receives a number of inputs.
An activation function is applied to these inputs, which
results in the activation level of the neuron (the output
value of the neuron).
Knowledge about the learning task is given in the
form of examples called training examples.
The output of the neuron is $y = \varphi(u + b)$, where $u$ is the
weighted sum of the inputs and $b$ is the bias.
The Neuron Diagram
[Figure: input values $x_1, \ldots, x_m$ are multiplied by weights
$w_1, \ldots, w_m$ and combined by a summing function into the induced
field $v$; the activation function $\varphi(\cdot)$, together with the
bias $b$, produces the output $y$.]
Bias of a Neuron
The bias can be treated as an extra weight on a constant input:
$v = \sum_{j=0}^{m} w_j x_j$, with $w_0 = b$ and $x_0 = +1$.
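A minimal sketch of this computation in Python (the weight, input, and bias values below are made up for illustration):

def induced_field(weights, inputs, bias):
    # Fold the bias in as weight w0 acting on a constant input x0 = +1.
    w = [bias] + list(weights)
    x = [1.0] + list(inputs)
    return sum(wj * xj for wj, xj in zip(w, x))

v = induced_field(weights=[0.5, -0.2], inputs=[1.0, 2.0], bias=0.1)
print(v)  # 0.1 + 0.5*1.0 + (-0.2)*2.0 = 0.2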
Neuron Models
The choice of activation function determines the
neuron model.
Examples:
step function:
$\varphi(v) = \begin{cases} a & \text{if } v \ge c \\ b & \text{if } v < c \end{cases}$
ramp function:
$\varphi(v) = \begin{cases} a & \text{if } v \le c \\ b & \text{if } v \ge d \\ a + \frac{(v - c)(b - a)}{d - c} & \text{otherwise} \end{cases}$
Ramp Function
[Figure: the ramp function rises linearly from $a$ at $v = c$ to $b$ at $v = d$.]
Sigmoid function
The sigmoid function squashes the induced field smoothly into the
interval (0, 1): $\varphi(v) = \frac{1}{1 + e^{-v}}$.
The Gaussian function is the probability density function of the
normal distribution, sometimes also called the frequency curve.
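A sketch of these activation functions (parameter names a, b, c, d follow the slide; the default values are arbitrary):

import math

def step(v, a=1.0, b=0.0, c=0.0):
    # a if v >= c, b otherwise
    return a if v >= c else b

def ramp(v, a=0.0, b=1.0, c=-1.0, d=1.0):
    # Linear interpolation from a to b over [c, d], clamped outside.
    if v <= c:
        return a
    if v >= d:
        return b
    return a + (v - c) * (b - a) / (d - c)

def sigmoid(v):
    # Smooth, differentiable squashing function with outputs in (0, 1).
    return 1.0 / (1.0 + math.exp(-v))

def gaussian(v, mu=0.0, sigma=1.0):
    # Density of the normal distribution (the "frequency curve").
    return math.exp(-((v - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))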
Network Architectures
Three different classes of network architectures
single-layer feed-forward
multi-layer feed-forward
recurrent
[Figure: a 3-4-2 network, with a 3-node input layer, a 4-node hidden
layer, and a 2-node output layer.]
FFNN for XOR
The ANN for XOR has two hidden nodes that realize this non-
linear separation and use the sign (step) activation function.
Arrows from input nodes to two hidden nodes indicate the
directions of the weight vectors (1,-1) and (-1,1).
The output node is used to combine the outputs of the two hidden
nodes.
Since we are representing the two states by 0 (false) and 1 (true),
we map negative outputs (e.g. -1, -0.5) of the hidden and output
layers to 0 and positive outputs (e.g. 0.5) to 1, as sketched below.
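A sketch of this XOR network with the slide's hidden weight vectors (1, -1) and (-1, 1); the bias values of -0.5 and the output weights (1, 1) are assumptions chosen to reproduce the behaviour described above:

def sign_step(v):
    # Map a positive induced field to 1 (true) and the rest to 0 (false).
    return 1 if v > 0 else 0

def xor_net(x1, x2):
    # Hidden weight vectors (1, -1) and (-1, 1); biases of -0.5 are assumed.
    h1 = sign_step(1 * x1 - 1 * x2 - 0.5)
    h2 = sign_step(-1 * x1 + 1 * x2 - 0.5)
    # The output node combines the hidden outputs (weights (1, 1) assumed).
    return sign_step(h1 + h2 - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))  # prints 0, 1, 1, 0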
FFNN NEURON MODEL
The classical learning algorithm for FFNNs is based on
the gradient descent method.
For this reason the activation functions used in FFNNs
are continuous functions of the weights, differentiable
everywhere.
The activation function for node i may be defined as a
simple form of the sigmoid function in the following
manner:
$\varphi(v_i) = \frac{1}{1 + e^{-v_i}}$
Network activation: Forward Step
Error propagation: Backward Step
The average squared error over the N training examples of an epoch is
$E_{AV} = \frac{1}{N} \sum_{n=1}^{N} E(n)$
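In code, with $E(n)$ taken as the squared error summed over the output nodes for example n (a minimal sketch):

def example_error(d, y):
    # E(n) = 1/2 * sum over output nodes k of (d_k - y_k)^2
    return 0.5 * sum((dk - yk) ** 2 for dk, yk in zip(d, y))

def average_error(per_example_errors):
    # E_AV = (1/N) * sum over n = 1..N of E(n)
    return sum(per_example_errors) / len(per_example_errors)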
Weight Update Rule
The Backprop weight update rule is based on the
gradient descent method:
It takes a step in the direction yielding the maximum
decrease of the network error E.
This direction is the opposite of the gradient of E.
Iteration of the Backprop algorithm is usually
terminated when the sum of squares of errors of the
output values for all training data in an epoch is less
than some threshold such as 0.01
$w_{ij} \leftarrow w_{ij} + \Delta w_{ij}$, with $\Delta w_{ij} = -\eta \frac{\partial E}{\partial w_{ij}}$
Backprop learning algorithm
(incremental-mode)
n=1;
initialize weights randomly;
while (stopping criterion not satisfied and n < max_iterations)
for each example (x,d)
- run the network with input x and compute the output y
- update the weights in backward order starting from
those of the output layer:
$w_{ji} \leftarrow w_{ji} + \Delta w_{ji}$
with $\Delta w_{ji}$ computed using the (generalized) Delta rule
end-for
n = n+1;
end-while;
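A runnable sketch of the incremental algorithm above, on a toy 2-4-1 sigmoid network trained on XOR; the learning rate, architecture, initial weight range, and seed are arbitrary choices, not values from the lecture:

import math, random

random.seed(0)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

eta = 0.5
w_hid = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(5)]
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

for epoch in range(20000):
    total_error = 0.0
    for x, d in data:
        xb = x + [1]                                   # constant bias input
        h = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hid]
        hb = h + [1]
        y = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
        total_error += 0.5 * (d - y) ** 2
        # Generalized Delta rule: output delta first, then hidden deltas
        # (computed with the not-yet-updated output weights).
        delta_out = (d - y) * y * (1 - y)
        delta_hid = [h[j] * (1 - h[j]) * delta_out * w_out[j] for j in range(4)]
        # Update in backward order: output-layer weights, then hidden-layer.
        for j in range(5):
            w_out[j] += eta * delta_out * hb[j]
        for j in range(4):
            for i in range(3):
                w_hid[j][i] += eta * delta_hid[j] * xb[i]
    if total_error < 0.01:   # stopping criterion from the slide
        break

print(epoch, total_error)    # converges well before the epoch limit for most seeds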
Stopping criteria
Total mean squared error change:
Back-prop is considered to have converged when the
absolute rate of change in the average squared error per
epoch is sufficiently small (typically in the range [0.01, 0.1]).
Generalization-based criterion:
After each epoch, the NN is tested for generalization.
If the generalization performance is adequate, then stop.
If this stopping criterion is used, then the part of the training
set used for testing the network's generalization is not used
for updating the weights (see the sketch below).
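A sketch of the generalization-based criterion; the patience mechanism and the callables train_one_epoch and validation_error are illustrative assumptions standing in for the backprop steps shown earlier:

def train_with_early_stopping(train_one_epoch, validation_error,
                              patience=10, max_epochs=1000):
    # Hold out part of the training set; it is used only to measure
    # generalization, never for weight updates.
    best_err, epochs_without_improvement = float("inf"), 0
    for _ in range(max_epochs):
        train_one_epoch()            # weight updates on the training split only
        err = validation_error()     # error on the held-out split
        if err < best_err:
            best_err, epochs_without_improvement = err, 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            break                    # generalization has stopped improving
    return best_err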
NN DESIGN ISSUES
Data representation
Network Topology
Network Parameters
Training
Validation
Data Representation
Data representation depends on the problem.
In general ANNs work on continuous (real valued) attributes.
Therefore symbolic attributes are encoded into continuous
ones.
Attributes of different types may have different ranges of
values which affect the training process.
Normalization may be used, such as the following, which scales
each attribute to values between 0 and 1:
$x_i' = \frac{x_i - \min_i}{\max_i - \min_i}$
for each value $x_i$ of the $i$-th attribute, where $\min_i$ and $\max_i$ are the
minimum and maximum values of that attribute over the training set.
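A minimal sketch of this normalization for one attribute (column) of a training set:

def min_max_normalize(column):
    # Scale one attribute to [0, 1] using its min and max over the training set.
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]  # a constant attribute carries no information
    return [(x - lo) / (hi - lo) for x in column]

print(min_max_normalize([10.0, 20.0, 15.0]))  # [0.0, 1.0, 0.5]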
Network Topology
The number of layers and neurons depend on the
specific task.
In practice this issue is solved by trial and error.
Two types of adaptive algorithms can be used:
start from a large network and successively remove some
neurons and links until network performance degrades.
begin with a small network and introduce new neurons until
performance is satisfactory.
Network parameters
[Slide formulas, garbled in this copy: initial weights $w_{ij}$ from the
input to the first layer are scaled using the number of inputs N and the
input magnitudes $|x_i|$, $i = 1, \ldots, N$; initial weights $w_{jk}$
from the first to the second layer are scaled analogously using the
first-layer outputs $\varphi(\sum_i w_{ij} x_i)$.]
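Since the slide's exact formulas are not recoverable here, the following sketch uses a common fan-in scaling heuristic in the same spirit; the 1/sqrt(N) bound is an assumption, not the slide's formula:

import random

def init_weights(fan_in, fan_out):
    # Small random weights in [-1/sqrt(fan_in), +1/sqrt(fan_in)], so that the
    # initial induced fields fall in the roughly linear region of the sigmoid.
    bound = 1.0 / fan_in ** 0.5
    return [[random.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]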
Choice of learning rate
The learning rate η sets the size of each weight update: too small a
value makes learning slow, while too large a value can cause
oscillation around the error minimum.
Learning and Training
During the learning phase,
a recurrent network feeds its inputs through the network,
including feeding data back from outputs to inputs;
the process is repeated until the values of the outputs do not
change.
This state is called equilibrium or stability.
Recurrent networks can be trained using the back-propagation
algorithm.
In this method, at each step, the activation of the
output is compared with the desired activation and
errors are propagated backward through the network.
Once this training process is completed, the network
becomes capable of performing a sequence of actions.
Hopfield Network
A Hopfield network is a kind of recurrent network in which the
output values are fed back to the inputs in an undirected
way.
It consists of a set of N connected neurons whose weights are
symmetric ($w_{ij} = w_{ji}$) and in which no unit is connected to itself ($w_{ii} = 0$).
There are no special input and output neurons.
The activation of a neuron is a binary value decided by the sign
of the weighted sum of the connections to it.
A threshold value for each neuron determines if it is a firing
neuron.
A firing neuron is one that activates all neurons that are
connected to it with a positive weight.
The input is simultaneously applied to all neurons, which then
output to each other.
This process continues until a stable state is reached.
Activation Algorithm
An active unit is represented by 1 and an inactive unit by 0.
Repeat
Choose any unit randomly. The chosen unit may be
active or inactive.
For the chosen unit, compute the sum of the weights
on the connections to the active neighbours only, if any.
If the sum > 0 (the threshold is assumed to be 0), then the
chosen unit becomes active; otherwise it becomes
inactive.
If the chosen unit has no active neighbours then ignore it,
and its status remains the same.
Until the network reaches a stable state (a sketch follows).
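A sketch of this activation loop; a fixed step budget stands in for the "until stable" test, which would strictly require checking that no unit wants to change state:

import random

def hopfield_update(weights, state, steps=1000):
    # Asynchronous activation: pick a random unit and set it active when the
    # sum of weights to its currently active neighbours is positive.
    n = len(state)
    for _ in range(steps):
        u = random.randrange(n)
        has_active_neighbour = any(weights[u][v] != 0 and state[v] == 1
                                   for v in range(n))
        if not has_active_neighbour:
            continue  # no active neighbours: status remains the same
        s = sum(weights[u][v] for v in range(n) if state[v] == 1)
        state[u] = 1 if s > 0 else 0
    return state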
Stable Networks
Weight Computation Method
Weights are determined from the training examples using the
Hebbian (outer-product) rule:
$W = \sum_{i=1}^{M} X_i X_i^{T} - M I$
Here
W is the weight matrix,
$X_i$ is an input example represented by a vector of N values
from the set {-1, 1}, and M is the number of training examples.
Here, N is the number of units in the network; 1 and -1
represent active and inactive units, respectively.
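A minimal sketch of this computation (equivalent to summing the outer products and zeroing the diagonal):

def hopfield_weights(examples):
    # W = sum_i X_i X_i^T - M*I for +/-1 training vectors X_i; skipping the
    # diagonal entries implements the -M*I term (no self-connections).
    n = len(examples[0])
    W = [[0] * n for _ in range(n)]
    for X in examples:
        for a in range(n):
            for b in range(n):
                if a != b:
                    W[a][b] += X[a] * X[b]
    return W

W = hopfield_weights([[1, -1, 1], [-1, 1, 1]])
print(W)  # symmetric, with a zero diagonal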
RBF Networks
[Figure: inputs $x_1, \ldots, x_m$ feed one hidden layer of $m_1$ RBF
units $\varphi_1, \ldots, \varphi_{m_1}$, whose outputs are combined with
weights $w_1, \ldots, w_{m_1}$ into a single output y.]
One hidden layer with RBF activation functions.
Output layer with linear activation function.