A Brief Review of Feed-Forward Neural Networks
MURAT H. SAZLI
ABSTRACT
Artificial neural networks, or neural networks for short, find applications across a very wide
spectrum. In this paper, following a brief presentation of the basic aspects of feed-forward neural
networks, their most widely used learning/training algorithm, the so-called back-propagation algorithm, is
described.
1. INTRODUCTION
Artificial neural networks, as the name implies, are inspired by their
biological counterparts, the biological brain and the nervous system. The biological
brain is entirely different from a conventional digital computer in terms of its
structure and the way it processes information. In many ways, the biological brain
(with the human brain as its most highly developed example) is far more advanced than and superior to
conventional computers. The most important distinctive feature of a biological brain
is its ability to “learn” and “adapt”, while a conventional computer has no
such abilities. Conventional computers accomplish specific tasks based upon the
instructions loaded onto them, the so-called “programs” or “software”.
The basic building block of a neural network is the “neuron”. A neuron can be
viewed as a processing unit. In a neural network, neurons are connected to each
other through “synaptic weights”, or “weights” for short. Each neuron in a network
receives “weighted” information via these synaptic connections from the neurons
it is connected to, and produces an output by passing the weighted sum of its
input signals (either external inputs from the environment or the outputs of other
neurons) through an “activation function”.
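To make this concrete, the following is a minimal sketch of a single neuron in Python. The logistic sigmoid is used as the activation function purely for illustration (the paper does not fix a particular choice), and all names and numbers here are the sketch's own:

```python
import math

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus bias, passed through an activation.

    The logistic sigmoid below is an illustrative choice; any suitable
    activation function could be substituted.
    """
    v = bias + sum(w * x for w, x in zip(weights, inputs))  # weighted sum
    return 1.0 / (1.0 + math.exp(-v))                       # sigmoid activation

# Example: a neuron with two inputs
print(neuron_output([0.5, -1.0], [0.8, 0.3], bias=0.1))
```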
There are two main categories of network architectures, depending on the
type of the connections between the neurons: “feed-forward neural networks” and
“recurrent neural networks”. If there is no “feedback” from the outputs of the
neurons towards the inputs anywhere in the network, the network is referred to as a
“feed-forward neural network”. Otherwise, if such feedback exists, i.e. a
synaptic connection from the outputs towards the inputs (either their own inputs or
the inputs of other neurons), the network is called a “recurrent neural network”.
Usually, neural networks are arranged in the form of “layers”. Feed-forward neural
networks fall into two categories depending on the number of layers: either
“single layer” or “multi-layer”.
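For illustration, the sketch below propagates an input vector through a small multi-layer feed-forward network: each layer sees only the previous layer's outputs, and nothing ever flows back towards the inputs. The layer sizes, the sigmoid activation, and the random parameters are arbitrary assumptions for the example:

```python
import numpy as np

def feed_forward(x, layer_weights, layer_biases):
    """Propagate an input vector through successive layers.

    Information flows strictly from inputs to outputs with no feedback
    connections, which is what makes the network feed-forward.
    """
    sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))
    y = x
    for W, b in zip(layer_weights, layer_biases):
        y = sigmoid(W @ y + b)  # each layer consumes the previous layer's outputs
    return y

rng = np.random.default_rng(0)
# A 3-4-2 multi-layer network with randomly initialized parameters
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
bs = [rng.standard_normal(4), rng.standard_normal(2)]
print(feed_forward(np.ones(3), Ws, bs))
```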
3. BACK-PROPAGATION ALGORITHM
The error signal at the output of neuron $j$ at iteration $n$ is defined in Equation (1):

$$e_j(n) = d_j - y_j(n) \qquad (1)$$

where $d_j$ is the desired output for neuron $j$ and $y_j(n)$ is the actual output for
neuron $j$, calculated by using the current weights of the network at iteration $n$. For a
certain input there is a certain desired output, which the network is expected to
produce. The presentation of each training example from the training set is defined as an
“iteration”.
The instantaneous value of the error energy for neuron $j$ is given in Equation (2):

$$\varepsilon_j(n) = \tfrac{1}{2} e_j^2(n) \qquad (2)$$
Since the only visible neurons are the ones in the output layer, error signals for those
neurons can be calculated directly. Hence, the instantaneous value $\varepsilon(n)$ of the
total error energy is the sum of the $\varepsilon_j(n)$'s over all neurons in the output layer, as
given in Equation (3):

$$\varepsilon(n) = \frac{1}{2} \sum_{j \in Q} e_j^2(n) \qquad (3)$$

where $Q$ denotes the set of neurons in the output layer.
Suppose there are $N$ patterns (examples) in the training set. The average
squared error energy for the network is found by Equation (4):

$$\varepsilon_{av} = \frac{1}{N} \sum_{n=1}^{N} \varepsilon(n) \qquad (4)$$
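To make Equations (1) through (4) concrete, the short sketch below evaluates them for a made-up training set; the desired and actual outputs are arbitrary numbers chosen only to exercise the formulas:

```python
import numpy as np

# Hypothetical desired and actual outputs for two output neurons
# over N = 3 training patterns (values are arbitrary).
d = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # desired outputs
y = np.array([[0.8, 0.1], [0.3, 0.7], [0.9, 0.6]])   # actual outputs

e = d - y                          # error signals, Eq. (1)
eps_j = 0.5 * e**2                 # per-neuron error energies, Eq. (2)
eps_n = eps_j.sum(axis=1)          # instantaneous total error energy, Eq. (3)
eps_av = eps_n.mean()              # average error energy over N patterns, Eq. (4)
print(eps_n, eps_av)
```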
It is important to note that the instantaneous error energy $\varepsilon(n)$, and therefore the
average error energy $\varepsilon_{av}$, is a function of all the free parameters (i.e., synaptic
weights and bias levels) of the network. The back-propagation algorithm, as explained
in the following, provides the means to adjust the free parameters of the network so as to
minimize the average error energy $\varepsilon_{av}$. There are two different modes of the back-propagation
algorithm: “sequential mode” and “batch mode”. In sequential mode,
weight updates are performed after the presentation of each training example. One
complete presentation of the entire training set is called an “epoch”. In batch mode,
weight updates are performed after the presentation of all training examples, i.e.
after an epoch is completed. Sequential mode is also referred to as on-line, pattern, or
stochastic mode. It is the most frequently used mode of operation and is explained
in the following.
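As a concrete (and deliberately simplified) illustration of the two schedules, the sketch below trains a single linear neuron under the squared-error energy defined above. The `gradient` helper and the two toy examples are assumptions made for this sketch, not constructs from the paper:

```python
import numpy as np

def gradient(w, x, d):
    # Gradient of the instantaneous error energy (1/2)e^2 for a
    # single linear neuron; used only to make the two modes concrete.
    e = d - w @ x
    return -e * x

def sequential_epoch(examples, w, lr=0.1):
    # Sequential (on-line) mode: one weight update per training example.
    for x, d in examples:
        w = w - lr * gradient(w, x, d)
    return w

def batch_epoch(examples, w, lr=0.1):
    # Batch mode: gradients are accumulated over the whole epoch,
    # then a single weight update is performed.
    total = sum(gradient(w, x, d) for x, d in examples)
    return w - lr * total

examples = [(np.array([1.0, 2.0]), 1.0), (np.array([2.0, 1.0]), 0.0)]
print(sequential_epoch(examples, np.zeros(2)))
print(batch_epoch(examples, np.zeros(2)))
```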
Let us start by giving the output expression for neuron $j$ in Equation (5):

$$y_j(n) = f\!\left( \sum_{i=0}^{m} w_{ji}(n) \, y_i(n) \right) \qquad (5)$$

where $m$ is the total number of inputs to neuron $j$ (excluding the bias) from the
previous layer, and $f$ is the activation function used in neuron $j$, which is some
nonlinear function. Here $w_{j0}$ equals the bias $b_j$ applied to neuron $j$, and it
corresponds to the fixed input $y_0 = +1$.
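As a small illustration of this convention, the bias can be folded into the weight vector by prepending the fixed input $y_0 = +1$ to the inputs; the numbers below are arbitrary:

```python
import numpy as np

# Treat the bias as the weight w_j0 attached to a constant input y_0 = +1.
w = np.array([0.1, 0.8, 0.3])             # [b_j, w_j1, w_j2]: bias first
y_prev = np.array([0.5, -1.0])            # outputs of the previous layer
y_aug = np.concatenate(([1.0], y_prev))   # prepend the fixed input y_0 = +1
v = w @ y_aug                             # the weighted sum appearing in Eq. (5)
print(v)                                  # equals 0.1 + 0.8*0.5 + 0.3*(-1.0)
```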
The weight update to be applied to the weights of neuron $j$ is
proportional to the partial derivative of the instantaneous error energy $\varepsilon(n)$ with
respect to the corresponding weight, i.e. $\partial \varepsilon(n) / \partial w_{ji}(n)$, and using the chain rule
of calculus it can be expressed as in Equation (6):

$$\frac{\partial \varepsilon(n)}{\partial w_{ji}(n)} = \frac{\partial \varepsilon(n)}{\partial e_j(n)} \, \frac{\partial e_j(n)}{\partial y_j(n)} \, \frac{\partial y_j(n)}{\partial w_{ji}(n)} \qquad (6)$$

From Equations (2), (1) and (5) respectively, the three factors in Equations (7), (8) and (9) are obtained.
$$\frac{\partial \varepsilon(n)}{\partial e_j(n)} = e_j(n) \qquad (7)$$

$$\frac{\partial e_j(n)}{\partial y_j(n)} = -1 \qquad (8)$$
$$\frac{\partial y_j(n)}{\partial w_{ji}(n)} = f'\!\left( \sum_{i=0}^{m} w_{ji}(n) \, y_i(n) \right) \frac{\partial}{\partial w_{ji}(n)} \sum_{i=0}^{m} w_{ji}(n) \, y_i(n) = f'\!\left( \sum_{i=0}^{m} w_{ji}(n) \, y_i(n) \right) y_i(n) \qquad (9)$$
where

$$f'\!\left( \sum_{i=0}^{m} w_{ji}(n) \, y_i(n) \right) = \frac{\partial f\!\left( \sum_{i=0}^{m} w_{ji}(n) \, y_i(n) \right)}{\partial \left( \sum_{i=0}^{m} w_{ji}(n) \, y_i(n) \right)} \qquad (10)$$
Substituting Equations (7), (8) and (9) in Equation (6) yields Equation (11):

$$\frac{\partial \varepsilon(n)}{\partial w_{ji}(n)} = -e_j(n) \, f'\!\left( \sum_{i=0}^{m} w_{ji}(n) \, y_i(n) \right) y_i(n) \qquad (11)$$
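One way to gain confidence in Equation (11) is to compare it against a central finite-difference approximation of $\partial \varepsilon(n) / \partial w_{ji}(n)$. The sketch below performs this check for a single sigmoid neuron; the activation choice and all numerical values are assumptions made only for the check:

```python
import numpy as np

def f(v):
    # Sigmoid activation, assumed here; the paper leaves f open.
    return 1.0 / (1.0 + np.exp(-v))

def f_prime(v):
    s = f(v)
    return s * (1.0 - s)

rng = np.random.default_rng(1)
w = rng.standard_normal(3)            # weights w_j0, w_j1, w_j2 (w_j0 is the bias)
y_in = np.array([1.0, 0.4, -0.7])     # inputs, with the fixed y_0 = +1 prepended
d = 0.9                               # desired output (arbitrary)

def eps(w):
    # Instantaneous error energy, Eqs. (1) and (2)
    e = d - f(w @ y_in)
    return 0.5 * e**2

v = w @ y_in
analytic = -(d - f(v)) * f_prime(v) * y_in   # Eq. (11)

h = 1e-6
numeric = np.array([
    (eps(w + h * np.eye(3)[i]) - eps(w - h * np.eye(3)[i])) / (2 * h)
    for i in range(3)
])
print(np.allclose(analytic, numeric))  # True: Eq. (11) matches the numerics
```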
The correction $\Delta w_{ji}(n)$ applied to $w_{ji}(n)$ is defined by the delta rule, given in
Equation (12):

$$\Delta w_{ji}(n) = -\eta \, \frac{\partial \varepsilon(n)}{\partial w_{ji}(n)} \qquad (12)$$

where $\eta$ is the learning-rate parameter. The minus sign implements gradient descent in weight space: the weights are moved in the direction that decreases the error energy.
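Substituting Equation (11) into Equation (12) gives the sequential-mode update $\Delta w_{ji}(n) = \eta \, e_j(n) \, f'(v_j(n)) \, y_i(n)$, where $v_j(n)$ denotes the weighted sum in Equation (5). A minimal sketch of this update for a single output neuron, assuming a sigmoid activation (for which $f'(v) = f(v)(1 - f(v))$) and arbitrary toy values:

```python
import numpy as np

def delta_rule_update(w, y_in, d, eta=0.5):
    """One sequential-mode weight update for a single output neuron,
    combining Eqs. (11) and (12): delta_w_ji = eta * e_j * f'(v_j) * y_i."""
    v = w @ y_in                          # weighted sum of Eq. (5)
    out = 1.0 / (1.0 + np.exp(-v))        # sigmoid activation (assumed)
    e = d - out                           # error signal, Eq. (1)
    return w + eta * e * out * (1.0 - out) * y_in

w = np.zeros(3)
y_in = np.array([1.0, 0.5, -0.2])         # y_0 = +1 carries the bias
for _ in range(200):                      # repeated presentations of one example
    w = delta_rule_update(w, y_in, d=0.9)
print(1.0 / (1.0 + np.exp(-(w @ y_in))))  # output approaches the desired 0.9
```

With repeated presentations the error signal shrinks and the output converges towards the desired value, which is exactly the gradient-descent behaviour that Equation (12) encodes.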
4. CONCLUSIONS
SUMMARY
Artificial neural networks, or neural networks for short, find applications across a very wide
spectrum. In this paper, following a brief introduction to the basic features of feed-forward neural
networks, the back-propagation algorithm, the most widely used learning/training algorithm in such
networks, is described.
KEYWORDS: Artificial neural networks, feed-forward neural networks, back-propagation algorithm