Unit 2 - Soft Computing

Part- 3
ARTIFICIAL NEURAL NETWORKS: AN INTRODUCTION
DEFINITION OF NEURAL NETWORKS
According to the DARPA Neural Network Study (1988, AFCEA
International Press, p. 60):
• ... a neural network is a system composed of many simple processing
elements operating in parallel whose function is determined by network
structure, connection strengths, and the processing performed at
computing elements or nodes.
According to Haykin (1994):
A neural network is a massively parallel distributed processor that has a
natural propensity for storing experiential knowledge and making it
available for use. It resembles the brain in two respects:
• Knowledge is acquired by the network through a learning process.
• Interneuron connection strengths known as synaptic weights are
used to store the knowledge.
BRAIN COMPUTATION
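The human brain contains tens of billions of nerve cells, or
neurons (about 10 billion is a commonly quoted older figure; more
recent estimates are closer to 86 billion). On average, each neuron
is connected to other neurons through approximately 10,000 synapses.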
BIOLOGICAL (MOTOR) NEURON
ARTIFICIAL NEURAL NET
Information-processing system.
Neurons process the information.
The signals are transmitted by means of connection links.
The links possess an associated weight.
The output signal is obtained by applying an activation function to
the net input.
MOTIVATION FOR NEURAL NET
Scientists are challenged to use machines more effectively for
tasks currently solved by humans.
Symbolic rules don't reflect processes actually used by humans.
Traditional computing excels in many areas, but not in others.
Biological neural systems have characteristics that conventional
computers lack, the major ones being:
• Massive parallelism
• Distributed representation and computation
• Learning ability
• Generalization ability
• Adaptivity
• Inherent contextual information processing
• Fault tolerance
• Low energy consumption

ARTIFICIAL NEURAL NET
[Figure: input neurons X1 and X2 connected to the output neuron Y
through the weights W1 and W2.]
The figure shows a simple artificial neural net with two input neurons
(X1, X2) and one output neuron (Y). The interconnection weights are
given by W1 and W2.
ASSOCIATION OF BIOLOGICAL NET WITH ARTIFICIAL NET
PROCESSING OF AN ARTIFICIAL NET
The neuron is the basic information processing unit of a NN. It consists
of:
1. A set of links, describing the neuron inputs, with weights W1, W2,
…, Wm.
2. An adder function (linear combiner) for computing the weighted
sum of the inputs (real numbers):
$u = \sum_{j=1}^{m} W_j X_j$
3. An activation function $\varphi$ for limiting the amplitude of the
neuron output, where b is the bias:
$y = \varphi(u + b)$
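The three components above can be sketched in a few lines of Python. This is only an illustration: the function names, the choice of the logistic function as the activation, and the numeric values are assumptions, not part of the original slides.

```python
import math

def neuron_output(inputs, weights, bias, activation):
    """Compute y = phi(u + b), where u is the weighted sum of the inputs."""
    # Adder (linear combiner): u = sum_j W_j * X_j
    u = sum(w * x for w, x in zip(weights, inputs))
    # The activation function limits the amplitude of the output
    return activation(u + bias)

def binary_sigmoid(v):
    """Logistic function: squashes any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))

# Hypothetical example values, chosen only for illustration
y = neuron_output(inputs=[1.0, 0.5], weights=[0.4, -0.2], bias=0.1,
                  activation=binary_sigmoid)
print(y)
```

Here u = 0.4·1.0 + (-0.2)·0.5 = 0.3, so the neuron outputs the logistic function evaluated at 0.3 + 0.1 = 0.4.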
BIAS OF AN ARTIFICIAL NEURON
The bias value b is added to the weighted sum $\sum_i w_i x_i$ so that
the decision line can be shifted away from the origin:
$y_{in} = \sum_i w_i x_i + b$, where b is the bias

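[Figure: the parallel lines x1 - x2 = -1, x1 - x2 = 0 and x1 - x2 = 1
in the (x1, x2) plane; only x1 - x2 = 0 passes through the origin.]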
MULTI LAYER ARTIFICIAL NEURAL NET
INPUT: records without the class attribute, with normalized attribute
values.
INPUT VECTOR: X = {x1, x2, …, xn}, where n is the number of
(non-class) attributes.
INPUT LAYER: there are as many nodes as non-class attributes, i.e.
as the length of the input vector.
HIDDEN LAYER: the number of nodes in the hidden layer and the
number of hidden layers depend on the implementation.
OPERATION OF A NEURAL NET
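[Figure: operation of a single unit j. The input vector x = (x0, x1,
…, xn) is combined with the weight vector w = (w0j, w1j, …, wnj) into
a weighted sum ∑; together with the bias, the sum is passed through
the activation function f to give the output y.]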
WEIGHT AND BIAS UPDATION
Per Sample Updating
• Weights and biases are updated after the presentation of each sample.
Per Training Set Updating (Epoch or Iteration)
• Weight and bias increments are accumulated in variables and
the weights and biases are updated after all the samples of the
training set have been presented.
STOPPING CONDITION
Training stops when:
• All changes in weights (∆wij) in the previous epoch are below some
threshold, or
• The percentage of samples misclassified in the previous epoch is
below some threshold, or
• A pre-specified number of epochs has expired.
In practice, several hundreds of thousands of epochs may be
required before the weights converge.
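The three stopping conditions can be combined into a single check at the end of each epoch. The sketch below is an assumption about how such a check might look; the function name, argument names and default thresholds are illustrative and do not come from the slides.

```python
def should_stop(weight_changes, misclassified_fraction, epoch,
                delta_threshold=1e-4, error_threshold=0.01, max_epochs=1000):
    """Return True if any of the three stopping conditions holds."""
    # 1. Every weight change in the previous epoch is below the threshold
    small_changes = all(abs(dw) < delta_threshold for dw in weight_changes)
    # 2. The fraction of misclassified samples is below the threshold
    few_errors = misclassified_fraction < error_threshold
    # 3. The pre-specified number of epochs has been reached
    out_of_epochs = epoch >= max_epochs
    return small_changes or few_errors or out_of_epochs
```

A training loop would call `should_stop` once per epoch, passing the list of weight changes and the misclassification rate measured during that epoch.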
NEURAL NETWORKS
A neural network learns by adjusting its weights so that it can
correctly classify the training data and hence, after a testing phase,
classify unknown data.
A neural network needs a long time for training.
A neural network has a high tolerance to noisy and incomplete
data.
BUILDING BLOCKS OF ARTIFICIAL NEURAL NET
Network Architecture (Connection between Neurons)
Setting the Weights (Training)
Activation Function
LAYER PROPERTIES
Input Layer: Each input unit may be designated by an attribute
value possessed by the instance.
Hidden Layer: Not directly observable, provides nonlinearities for
the network.
Output Layer: Encodes possible values.

TRAINING METHODS
Supervised Training - The network is provided with a series of
sample inputs and its output is compared with the expected
responses.
Unsupervised Training - No expected responses are given; similar
input vectors are assigned to the same output unit.
Reinforcement Training - The right answer is not provided, but an
indication of whether the output is ‘right’ or ‘wrong’ is provided.
ACTIVATION FUNCTION
ACTIVATION LEVEL – DISCRETE OR CONTINUOUS
HARD LIMIT FUNCTION (DISCRETE)
• Binary activation function
• Bipolar activation function
LINEAR FUNCTION (CONTINUOUS)
• Identity function
SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)
• Binary sigmoidal activation function
• Bipolar sigmoidal activation function
ACTIVATION FUNCTION
Activation functions:
(A) Identity
(B) Binary step
(C) Bipolar step
(D) Binary sigmoidal
(E) Bipolar sigmoidal
(F) Ramp
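The six functions listed above can be written out as follows. This is a sketch using common textbook definitions; the threshold `theta`, the steepness parameter `lam`, and the convention that the step functions fire when the input equals the threshold are assumptions, since the slides show only the names.

```python
import math

def identity(x):
    return x

def binary_step(x, theta=0.0):
    # 1 if the net input reaches the threshold, otherwise 0
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    # 1 if the net input reaches the threshold, otherwise -1
    return 1 if x >= theta else -1

def binary_sigmoid(x, lam=1.0):
    # Continuous, output in (0, 1)
    return 1.0 / (1.0 + math.exp(-lam * x))

def bipolar_sigmoid(x, lam=1.0):
    # Continuous, output in (-1, 1)
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0

def ramp(x):
    # 0 below 0, linear between 0 and 1, saturates at 1
    return max(0.0, min(1.0, x))
```

With `lam = 1` the bipolar sigmoid is a rescaled binary sigmoid, `2 * binary_sigmoid(x) - 1`, which is why the two share the same shape.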
CONSTRUCTING ANN
Determine the network properties:
• Network topology
• Types of connectivity
• Order of connections
• Weight range
Determine the node properties:
• Activation range
Determine the system dynamics:
• Weight initialization scheme
• Activation-calculating formula
• Learning rule
PROBLEM SOLVING
Select a suitable NN model based on the nature of the problem.
Construct a NN according to the characteristics of the application
domain.
Train the neural network with the learning procedure of the
selected model.
Use the trained network for making inference or solving problems.

SALIENT FEATURES OF ANN
Adaptive learning
Self-organization
Real-time operation
Fault tolerance via redundant information coding
Massive parallelism
Learning and generalizing ability
Distributed representation
McCULLOCH–PITTS NEURON
Neurons are sparsely and randomly connected
Firing state is binary (1 = firing, 0 = not firing)
All but one neuron are excitatory (tend to increase voltage of other
cells)
• One inhibitory neuron connects to all other neurons
• It functions to regulate network activity (prevent too many
firings)
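A minimal sketch of a single McCulloch–Pitts unit in its usual textbook form, assuming binary inputs, fixed (not learned) weights and a firing threshold `theta`. The weights and thresholds below are illustrative choices, not values from the slides.

```python
def mp_neuron(inputs, weights, theta):
    """Fire (1) when the weighted sum of binary inputs reaches theta."""
    net = sum(w * x for w, x in zip(weights, inputs))
    return 1 if net >= theta else 0

# AND: both excitatory inputs (weight 1) must be active to reach theta = 2
and_table = {(a, b): mp_neuron((a, b), (1, 1), theta=2)
             for a in (0, 1) for b in (0, 1)}

# AND-NOT (x1 and not x2): the second input is inhibitory (weight -1)
andnot_table = {(a, b): mp_neuron((a, b), (1, -1), theta=1)
                for a in (0, 1) for b in (0, 1)}

print(and_table)
print(andnot_table)
```

The inhibitory connection in the second example shows how a negative weight keeps the unit from firing even when an excitatory input is active.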
LINEAR SEPARABILITY
Linear separability is the concept wherein the input space is
separated into regions according to whether the network response
is positive or negative.
Consider a network having a positive response in the first
quadrant and a negative response in all other quadrants (the AND
function), with either binary or bipolar data. The decision line is
then drawn to separate the positive response region from the
negative response region.
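Whether a given line separates the two response regions can be checked directly. In the sketch below, the helper `separates` and the candidate line x1 + x2 - 1 = 0 are illustrative assumptions; the XOR data is included only as a contrast case that no single line can separate.

```python
def separates(w, b, samples):
    """True if the line w.x + b = 0 puts every sample on its target side."""
    for x, target in samples:
        net = sum(wi * xi for wi, xi in zip(w, x)) + b
        if (net > 0) != (target == 1):
            return False
    return True

# Bipolar AND: positive response only at (1, 1)
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
# Bipolar XOR: positive response when the inputs differ
xor_samples = [((1, 1), -1), ((1, -1), 1), ((-1, 1), 1), ((-1, -1), -1)]

# Candidate decision line x1 + x2 - 1 = 0 (an illustrative choice)
print(separates((1, 1), -1, and_samples))
print(separates((1, 1), -1, xor_samples))
```

The line separates the AND data but not the XOR data.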
HEBB NETWORK
Donald Hebb stated in 1949 that learning in the brain is performed
by changes in the synaptic gap. Hebb explained it:
“When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or
metabolic change takes place in one or both cells such that A’s
efficiency, as one of the cells firing B, is increased.”
HEBB LEARNING
The weights between neurons whose activities are positively
correlated are increased:
$\frac{dw_{ij}}{dt} \sim \mathrm{correlation}(x_i, x_j)$
Associative memory is produced automatically.
The Hebb rule can be used for pattern association, pattern
categorization, pattern classification and over a range of other
areas.
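A common discrete form of the Hebb rule, assumed here because the slides give only the continuous proportionality, updates each weight by the product of input and target: w_i ← w_i + x_i·y and b ← b + y. The sketch below applies one pass of that rule to the AND function with bipolar data; the function name and data layout are illustrative.

```python
def hebb_train(samples):
    """One pass of the Hebb rule: w_i += x_i * y and b += y for each pair."""
    n = len(samples[0][0])
    w = [0] * n
    b = 0
    for x, y in samples:
        for i in range(n):
            w[i] += x[i] * y
        b += y
    return w, b

# Bipolar AND: (input vector, target)
and_samples = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
w, b = hebb_train(and_samples)
print(w, b)  # -> [2, 2] -2
```

The resulting decision line 2·x1 + 2·x2 - 2 = 0, i.e. x2 = -x1 + 1, has only the point (1, 1) on its positive side.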
DEFINITION OF SUPERVISED LEARNING NETWORKS
Training and test data sets
Training set: input and target are specified

PERCEPTRON NETWORKS
Linear threshold unit (LTU)
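[Figure: linear threshold unit. The inputs x1, …, xn with weights
w1, …, wn, plus the bias weight w0 on a constant input x0 = 1, feed a
summation unit Σ that produces the output o.]
Weighted sum: $\sum_{i=0}^{n} w_i x_i$
$o = f(x) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$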
PERCEPTRON LEARNING
$w_i \leftarrow w_i + \Delta w_i$
$\Delta w_i = \eta (t - o) x_i$
where
t = c(x) is the target value,
o is the perceptron output,
η is a small constant (e.g., 0.1) called the learning rate.
If the output is correct (t = o), the weights $w_i$ are not changed.
If the output is incorrect (t ≠ o), the weights $w_i$ are changed such
that the output of the perceptron for the new weights is closer to t.
The algorithm converges to the correct classification if
• the training data is linearly separable, and
• η is sufficiently small.
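The learning rule above can be sketched as follows. This assumes the linear threshold unit with outputs +1 and -1, weights initialized to zero, and bipolar AND as training data; the function names and the `max_epochs` cap are illustrative assumptions.

```python
def predict(w, x):
    """Linear threshold unit: 1 if sum_i w_i x_i > 0, otherwise -1."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > 0 else -1

def train_perceptron(samples, eta=0.1, max_epochs=100):
    """Apply w_i <- w_i + eta * (t - o) * x_i until an epoch has no errors."""
    w = [0.0] * len(samples[0][0])
    for epoch in range(max_epochs):
        errors = 0
        for x, t in samples:
            o = predict(w, x)
            if o != t:
                errors += 1
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        if errors == 0:
            break
    return w

# Bipolar AND; the leading 1 in each input is x0, so w[0] acts as the bias w0
and_samples = [((1, 1, 1), 1), ((1, 1, -1), -1),
               ((1, -1, 1), -1), ((1, -1, -1), -1)]
w = train_perceptron(and_samples)
print(w)
print([predict(w, x) for x, _ in and_samples])
```

Because the AND data is linearly separable, the loop ends with an error-free epoch well before `max_epochs` is reached.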
LEARNING ALGORITHM
Epoch: Presentation of the entire training set to the neural
network. In the case of the AND function, an epoch consists of four
sets of inputs being presented to the network (i.e. [0,0], [0,1],
[1,0], [1,1]).
Error: The error value is the amount by which the value output by
the network differs from the target value (Error = T - O). For
example, if we required the network to output 0 and it outputs 1,
then Error = -1.
Target Value, T: When we are training a network we not only
present it with the input but also with a value that we require the
network to produce. For example, if we present the network with
[1,1] for the AND function, the target value will be 1.
Output, O: The output value from the neuron.
Ij: Inputs being presented to the neuron.
Wj: Weight from input neuron (Ij) to the output neuron.
LR: The learning rate. This dictates how quickly the network
converges. It is set by experimentation and is typically 0.1.
TRAINING ALGORITHM
Adjust neural network weights to map inputs to outputs.
Use a set of sample patterns where the desired output (given the
inputs presented) is known.
The purpose is to learn to:
• Recognize features which are common to good and bad
exemplars
MULTILAYER PERCEPTRON
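[Figure: multilayer perceptron. Input signals enter at the input
layer and pass through layers of adjustable weights to the output
layer, which produces the output values.]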
LAYERS IN NEURAL NETWORK
The input layer:
• Introduces input values into the network.
• No activation function or other processing.
The hidden layer(s):
• Perform classification of features.
• In theory, two hidden layers with enough units are sufficient to
represent arbitrarily complex decision regions.
• In practice, more complex features may be easier to learn with
more layers.
The output layer:
• Functionally is just like the hidden layers.
• Outputs are passed on to the world outside the neural
network.
Backpropagation is a training procedure which allows multilayer
feedforward neural networks to be trained. Such networks:
• Can theoretically perform “any” input-output mapping.
• Can learn to solve linearly inseparable problems.

MULTILAYER FEEDFORWARD NETWORK
[Figure: multilayer feedforward network with four inputs (I0 to I3),
three hidden units (h0 to h2) and two outputs (o0, o1).]
MULTILAYER FEEDFORWARD NETWORK: ACTIVATION AND TRAINING
For feedforward networks:
• A continuous activation function can be differentiated, allowing
gradient descent.
• Back propagation is an example of a gradient-descent technique.
• Uses a sigmoid (binary or bipolar) activation function.
In multilayer networks, the activation function is
usually more complex than just a threshold function,
like $1/(1 + e^{-x})$ or even $2/(1 + e^{-x}) - 1$ to allow for
inhibition, etc.
GRADIENT DESCENT
Gradient-Descent(training_examples, η)
Each training example is a pair of the form <(x1,…,xn), t>, where
(x1,…,xn) is the vector of input values and t is the target output
value; η is the learning rate (e.g. 0.1).
• Initialize each wi to some small random value
• Until the termination condition is met, Do
    • Initialize each ∆wi to zero
    • For each <(x1,…,xn), t> in training_examples, Do
        • Input the instance (x1,…,xn) to the linear unit and compute
          the output o
        • For each linear unit weight wi, Do
            ∆wi = ∆wi + η (t - o) xi
    • For each linear unit weight wi, Do
        wi = wi + ∆wi
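The procedure above translates almost line by line into Python. The sketch below assumes a linear unit o = Σ w_i x_i, a fixed number of epochs as the termination condition, and a small hypothetical data set generated by t = 1 + 2·x1 - x2; none of these specifics appear in the slides.

```python
import random

def gradient_descent(training_examples, eta=0.1, epochs=500, seed=0):
    """Batch gradient descent for a linear unit o = sum_i w_i x_i."""
    rng = random.Random(seed)
    n = len(training_examples[0][0])
    # Initialize each w_i to some small random value
    w = [rng.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):  # termination condition: fixed number of epochs
        delta_w = [0.0] * n  # initialize each delta_w_i to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
            for i in range(n):
                delta_w[i] += eta * (t - o) * x[i]
        # Update the weights once, after all examples have been seen
        w = [wi + dwi for wi, dwi in zip(w, delta_w)]
    return w

# Hypothetical data generated by t = 1 + 2*x1 - x2; x0 = 1 carries the bias
examples = [((1, 0, 0), 1), ((1, 0, 1), 0), ((1, 1, 0), 3), ((1, 1, 1), 2)]
print(gradient_descent(examples))
```

For this data the learned weights approach (1, 2, -1), the coefficients used to generate the targets.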
MODES OF GRADIENT DESCENT
Batch mode: gradient descent over the entire data D
$w \leftarrow w - \eta \nabla E_D[w]$
$E_D[w] = \frac{1}{2} \sum_d (t_d - o_d)^2$
Incremental mode: gradient descent over individual training examples d
$w \leftarrow w - \eta \nabla E_d[w]$
$E_d[w] = \frac{1}{2} (t_d - o_d)^2$
Incremental gradient descent can approximate batch gradient
descent arbitrarily closely if η is small enough.
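The difference between the two modes, and the claim that they agree for small η, can be illustrated by running one epoch of each from the same starting weights. The sketch assumes a linear unit and a small hypothetical data set; all names and values are illustrative.

```python
def batch_epoch(w, examples, eta):
    """One epoch in batch mode: accumulate over D, then update once."""
    delta = [0.0] * len(w)
    for x, t in examples:
        o = sum(wi * xi for wi, xi in zip(w, x))
        delta = [d + eta * (t - o) * xi for d, xi in zip(delta, x)]
    return [wi + d for wi, d in zip(w, delta)]

def incremental_epoch(w, examples, eta):
    """One epoch in incremental mode: update after every example d."""
    w = list(w)
    for x, t in examples:
        o = sum(wi * xi for wi, xi in zip(w, x))
        w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
    return w

# Hypothetical data; the leading 1 in each input carries the bias weight
examples = [((1, 0, 0), 1), ((1, 0, 1), 0), ((1, 1, 0), 3), ((1, 1, 1), 2)]
w0 = [0.0, 0.0, 0.0]
for eta in (0.1, 0.001):
    w_batch = batch_epoch(w0, examples, eta)
    w_incr = incremental_epoch(w0, examples, eta)
    gap = max(abs(a - b) for a, b in zip(w_batch, w_incr))
    print(eta, gap)
```

The gap between the two results shrinks much faster than η itself, because the modes differ only in whether later examples see the weight changes made by earlier ones.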
SIGMOID ACTIVATION FUNCTION
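[Figure: sigmoid unit. The inputs x1, …, xn with weights w1, …, wn,
plus the bias weight w0 on the constant input x0 = 1, are summed to
give net, which is passed through σ to give the output o.]
$net = \sum_{i=0}^{n} w_i x_i$, $o = \sigma(net) = \frac{1}{1 + e^{-net}}$
σ(x) is the sigmoid function: $\sigma(x) = \frac{1}{1 + e^{-x}}$
$\frac{d\sigma(x)}{dx} = \sigma(x)(1 - \sigma(x))$
Derive gradient descent rules to train:
• one sigmoid unit:
$\frac{\partial E}{\partial w_i} = -\sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}$
where $x_{i,d}$ is the i-th input of training example d
• multilayer networks of sigmoid units: backpropagation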
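The gradient for a single sigmoid unit follows from the chain rule. The steps are written out below, assuming the batch error $E = \frac{1}{2}\sum_d (t_d - o_d)^2$ and writing $net_d$ and $x_{i,d}$ for the net input and the i-th input of training example d (this indexing is a notational assumption).

```latex
\begin{align*}
\frac{\partial E}{\partial w_i}
  &= \frac{\partial}{\partial w_i}\,\frac{1}{2}\sum_d (t_d - o_d)^2
   = \sum_d (t_d - o_d)\,\frac{\partial}{\partial w_i}(t_d - o_d) \\
  &= -\sum_d (t_d - o_d)\,
      \frac{\partial o_d}{\partial net_d}\,
      \frac{\partial net_d}{\partial w_i} \\
  &= -\sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d},
\end{align*}
% using o_d = \sigma(net_d), d\sigma(x)/dx = \sigma(x)(1 - \sigma(x)),
% and net_d = \sum_i w_i x_{i,d}.
```

Gradient descent then moves each weight against this gradient: $\Delta w_i = -\eta\, \partial E / \partial w_i = \eta \sum_d (t_d - o_d)\, o_d (1 - o_d)\, x_{i,d}$.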
BACKPROPAGATION TRAINING ALGORITHM
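Initialize each $w_{i,j}$ to some small random value.
Until the termination condition is met, Do
• For each training example <(x1,…,xn), t>, Do
    • Input the instance (x1,…,xn) to the network and compute the
      network outputs $o_k$
    • For each output unit k:
        $\delta_k = o_k (1 - o_k)(t_k - o_k)$
    • For each hidden unit h:
        $\delta_h = o_h (1 - o_h) \sum_k w_{h,k} \delta_k$
    • For each network weight $w_{i,j}$, Do
        $w_{i,j} \leftarrow w_{i,j} + \Delta w_{i,j}$, where
        $\Delta w_{i,j} = \eta\, \delta_j\, x_{i,j}$
        and $x_{i,j}$ is the input from unit i to unit j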
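The algorithm above can be sketched in plain Python for a network with one hidden layer of sigmoid units. The network size, the learning rate of 0.5, the XOR training data and all function names are assumptions made for illustration. Whether training reaches a low error on XOR depends on the initial weights.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def init_network(n_in, n_hidden, n_out, seed=0):
    """Small random weights; the last weight of each unit is its bias."""
    rng = random.Random(seed)
    def layer(n_units, n_inputs):
        return [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs + 1)]
                for _ in range(n_units)]
    return layer(n_hidden, n_in), layer(n_out, n_hidden)

def forward(net, x):
    """Return (hidden outputs o_h, network outputs o_k) for input x."""
    hidden, output = net
    xs = list(x) + [1.0]  # constant input for the bias weight
    o_h = [sigmoid(sum(w * v for w, v in zip(unit, xs))) for unit in hidden]
    hs = o_h + [1.0]
    o_k = [sigmoid(sum(w * v for w, v in zip(unit, hs))) for unit in output]
    return o_h, o_k

def train_step(net, x, t, eta=0.5):
    """One per-example backpropagation update, modifying net in place."""
    hidden, output = net
    o_h, o_k = forward(net, x)
    # delta_k = o_k (1 - o_k) (t_k - o_k) for each output unit
    d_k = [o * (1 - o) * (tk - o) for o, tk in zip(o_k, t)]
    # delta_h = o_h (1 - o_h) sum_k w_{h,k} delta_k for each hidden unit
    d_h = [o_h[h] * (1 - o_h[h]) *
           sum(output[k][h] * d_k[k] for k in range(len(output)))
           for h in range(len(hidden))]
    # w_{i,j} <- w_{i,j} + eta * delta_j * x_{i,j}
    hs = o_h + [1.0]
    for k, unit in enumerate(output):
        for i in range(len(unit)):
            unit[i] += eta * d_k[k] * hs[i]
    xs = list(x) + [1.0]
    for h, unit in enumerate(hidden):
        for i in range(len(unit)):
            unit[i] += eta * d_h[h] * xs[i]

def total_error(net, data):
    """E = 1/2 * sum over examples and output units of (t - o)^2."""
    return 0.5 * sum((tk - ok) ** 2
                     for x, t in data
                     for tk, ok in zip(t, forward(net, x)[1]))

xor = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
net = init_network(2, 3, 1)
print("error before:", total_error(net, xor))
for _ in range(5000):
    for x, t in xor:
        train_step(net, x, t)
print("error after:", total_error(net, xor))
```

The hidden-unit deltas are computed before any weight is changed, so they use the same output-layer weights that produced the forward pass.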
BACKPROPAGATION
Gradient descent over the entire network weight vector
Easily generalized to arbitrary directed graphs
Will find a local, not necessarily global, error minimum; in practice
it often works well (can be invoked multiple times with different
initial weights)
Often includes a weight momentum term:
$\Delta w_{i,j}(t) = \eta\, \delta_j\, x_{i,j} + \alpha\, \Delta w_{i,j}(t-1)$
Minimizes error over the training examples
Will it generalize well to unseen instances (over-fitting)?
Training can be slow, typically 1000-10000 iterations
(Levenberg-Marquardt can be used instead of gradient descent)
APPLICATIONS OF BACKPROPAGATION NETWORK
Load forecasting problems in power systems.
Image processing.
Fault diagnosis and fault detection.
Gesture recognition, speech recognition.
Signature verification.
Bioinformatics.
Structural engineering design (civil).
