Unit 2 - Soft Computing


Part- 3

ARTIFICIAL NEURAL NETWORKS: AN INTRODUCTION
DEFINITION OF NEURAL NETWORKS
According to the DARPA Neural Network Study (1988, AFCEA
International Press, p. 60):

• ... a neural network is a system composed of many simple processing
elements operating in parallel whose function is determined by network
structure, connection strengths, and the processing performed at
computing elements or nodes.

According to Haykin (1994)

A neural network is a massively parallel distributed processor that has a
natural propensity for storing experiential knowledge and making it
available for use. It resembles the brain in two respects:
• Knowledge is acquired by the network through a learning process.
• Interneuron connection strengths known as synaptic weights are
used to store the knowledge.
BRAIN COMPUTATION
The human brain contains tens of billions of nerve cells, or
neurons (older texts cite about 10 billion; more recent estimates are
closer to 86 billion). On average, each neuron is connected to other
neurons through approximately 10,000 synapses.
BIOLOGICAL (MOTOR) NEURON
ARTIFICIAL NEURAL NET
 Information-processing system.

 Neurons process the information.

 The signals are transmitted by means of connection links.

 The links possess an associated weight.

 The output signal is obtained by applying an activation function to
the net input.
MOTIVATION FOR NEURAL NET

 Scientists are challenged to use machines more effectively for
tasks currently solved by humans.

 Symbolic rules don't reflect processes actually used by humans.

 Traditional computing excels in many areas, but not in others.

The major characteristics that motivate neural nets are:

 Massive parallelism

 Distributed representation and computation

 Learning ability

 Generalization ability

 Adaptivity

 Inherent contextual information processing

 Fault tolerance

 Low energy consumption.


ARTIFICIAL NEURAL NET

[Figure: input neurons X1 and X2 connected to the output neuron Y through the weights W1 and W2.]

The figure shows a simple artificial neural net with two input neurons
(X1, X2) and one output neuron (Y). The interconnection weights are
given by W1 and W2.
ASSOCIATION OF BIOLOGICAL NET WITH ARTIFICIAL NET
PROCESSING OF AN ARTIFICIAL NET
The neuron is the basic information processing unit of a NN. It consists
of:

1. A set of links, describing the neuron inputs, with weights W1, W2,
…, Wm.

2. An adder function (linear combiner) for computing the weighted
sum of the inputs (real numbers):

$$u = \sum_{j=1}^{m} W_j X_j$$

3. An activation function for limiting the amplitude of the neuron output:

$$y = \varphi(u + b)$$
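The three parts listed above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample weights and inputs, and the choice of the binary sigmoid as the activation are assumptions, not part of the slides.

```python
import math


def neuron_output(weights, inputs, bias, activation):
    """Compute y = phi(u + b), where u = sum_j W_j * X_j."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    # Adder function (linear combiner): weighted sum of the inputs.
    u = sum(w * x for w, x in zip(weights, inputs))
    # Activation function applied to the weighted sum plus the bias.
    return activation(u + bias)


def binary_sigmoid(v):
    """One possible activation; it limits the output to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical values: two inputs, two weights, bias b = 0.1.
y = neuron_output([0.5, -0.25], [1.0, 2.0], 0.1, binary_sigmoid)
print(y)
```

Here u = 0.5*1.0 - 0.25*2.0 = 0, so the neuron output is the sigmoid of the bias alone.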
BIAS OF AN ARTIFICIAL NEURON

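The bias value is added to the weighted sum ∑wixi so that the decision
boundary can be shifted away from the origin.

Yin = ∑wixi + b, where b is the bias

[Figure: three parallel lines in the (x1, x2) plane, x1 - x2 = -1, x1 - x2 = 0 and x1 - x2 = 1, showing how the bias shifts the line away from the origin.]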
MULTI LAYER ARTIFICIAL NEURAL NET
INPUT: records without the class attribute, with normalized attribute
values.

INPUT VECTOR: X = { x1, x2, …, xn }, where n is the number of
(non-class) attributes.

INPUT LAYER: there are as many nodes as non-class attributes, i.e.
as the length of the input vector.

HIDDEN LAYER: the number of nodes in the hidden layer and the
number of hidden layers depend on the implementation.
OPERATION OF A NEURAL NET

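[Figure: a single unit j. The input vector x = (x0, x1, …, xn) is combined with the weight vector w = (w0j, w1j, …, wnj) and a bias into a weighted sum ∑, which passes through the activation function f to give the output y.]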
WEIGHT AND BIAS UPDATION
Per Sample Updating

• Weights and biases are updated after the presentation of each sample.

Per Training Set Updating (Epoch or Iteration)

• Weight and bias increments are accumulated in variables, and
the weights and biases are updated after all the samples of the
training set have been presented.
STOPPING CONDITION

 All changes in weights (∆wij) in the previous epoch are below some
threshold, or

 The percentage of samples misclassified in the previous epoch is
below some threshold, or

 A pre-specified number of epochs has expired.

 In practice, several hundreds of thousands of epochs may be
required before the weights converge.
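The three stopping conditions can be combined into one check that a training loop calls at the end of each epoch. The sketch below is an assumption about how that might look; the function name, the parameter names and the default thresholds are hypothetical.

```python
def should_stop(weight_changes, misclassified, n_samples, epoch,
                delta_threshold=1e-4, error_threshold=0.01,
                max_epochs=100_000):
    """Return True if any of the three stopping conditions holds.

    weight_changes: the delta-w values applied in the previous epoch.
    misclassified:  number of samples misclassified in the previous epoch.
    All thresholds are illustrative defaults, not values from the slides.
    """
    # Condition 1: every weight change is below the threshold.
    small_changes = all(abs(dw) < delta_threshold for dw in weight_changes)
    # Condition 2: the fraction of misclassified samples is below the threshold.
    low_error = (misclassified / n_samples) < error_threshold
    # Condition 3: the pre-specified number of epochs has expired.
    out_of_epochs = epoch >= max_epochs
    return small_changes or low_error or out_of_epochs


print(should_stop([0.1, -0.2], misclassified=50, n_samples=100, epoch=3))
```

Note that an empty `weight_changes` list makes the first condition true, so a caller should only pass the changes from a completed epoch.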
NEURAL NETWORKS
 A neural network learns by adjusting its weights so that it
correctly classifies the training data and hence, after the testing phase,
can classify unknown data.

 A neural network needs a long time for training.

 A neural network has a high tolerance to noisy and incomplete
data.
BUILDING BLOCKS OF ARTIFICIAL NEURAL NET
 Network Architecture (Connection between Neurons)

 Setting the Weights (Training)

 Activation Function
LAYER PROPERTIES
 Input Layer: Each input unit may be designated by an attribute
value possessed by the instance.

 Hidden Layer: Not directly observable; provides nonlinearities for
the network.

 Output Layer: Encodes possible values.


TRAINING METHODS
 Supervised Training - The network is given a series of
sample inputs, and its output is compared with the expected
responses.

 Unsupervised Training - Similar input vectors are assigned to
the same output unit.

 Reinforcement Training - The right answer is not provided, but an
indication of whether the output is ‘right’ or ‘wrong’ is provided.
ACTIVATION FUNCTION
 ACTIVATION LEVEL – DISCRETE OR CONTINUOUS

 HARD LIMIT FUNCTION (DISCRETE)

• Binary activation function
• Bipolar activation function

 IDENTITY AND SIGMOIDAL ACTIVATION FUNCTIONS (CONTINUOUS)

• Identity function
• Binary sigmoidal activation function
• Bipolar sigmoidal activation function
ACTIVATION FUNCTION

Activation functions:

(A) Identity

(B) Binary step

(C) Bipolar step

(D) Binary sigmoidal

(E) Bipolar sigmoidal

(F) Ramp
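The six functions (A) to (F) can be written out as below. The slides give closed forms only for the two sigmoids, so the threshold convention (fire when x >= theta), the steepness parameter `lam`, and the ramp limits of 0 and 1 are assumptions based on common textbook definitions.

```python
import math


def identity(x):
    """(A) Identity: f(x) = x."""
    return x


def binary_step(x, theta=0.0):
    """(B) Binary step: 1 if x >= theta, else 0 (threshold convention assumed)."""
    return 1 if x >= theta else 0


def bipolar_step(x, theta=0.0):
    """(C) Bipolar step: 1 if x >= theta, else -1 (threshold convention assumed)."""
    return 1 if x >= theta else -1


def binary_sigmoid(x, lam=1.0):
    """(D) Binary sigmoid: 1 / (1 + exp(-lam * x)), output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-lam * x))


def bipolar_sigmoid(x, lam=1.0):
    """(E) Bipolar sigmoid: 2 / (1 + exp(-lam * x)) - 1, output in (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-lam * x)) - 1.0


def ramp(x):
    """(F) Ramp: 0 below 0, x between 0 and 1, 1 above 1 (limits assumed)."""
    return max(0.0, min(1.0, x))


for f in (identity, binary_step, bipolar_step, binary_sigmoid, bipolar_sigmoid, ramp):
    print(f.__name__, [f(x) for x in (-2.0, 0.0, 0.5, 2.0)])
```

With `lam = 1` the bipolar sigmoid equals tanh(x/2), which is why it is often swapped for the hyperbolic tangent in practice.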
CONSTRUCTING ANN
 Determine the network properties:
• Network topology
• Types of connectivity
• Order of connections
• Weight range

 Determine the node properties:


• Activation range

 Determine the system dynamics


• Weight initialization scheme
• Activation-calculating formula
• Learning rule
PROBLEM SOLVING
 Select a suitable NN model based on the nature of the problem.

 Construct a NN according to the characteristics of the application
domain.

 Train the neural network with the learning procedure of the
selected model.

 Use the trained network for making inferences or solving problems.


SALIENT FEATURES OF ANN

 Adaptive learning
 Self-organization
 Real-time operation
 Fault tolerance via redundant information coding
 Massive parallelism
 Learning and generalizing ability
 Distributed representation
McCULLOCH–PITTS NEURON
 Neurons are sparsely and randomly connected

 Firing state is binary (1 = firing, 0 = not firing)

 All but one neuron are excitatory (tend to increase voltage of other
cells)

• One inhibitory neuron connects to all other neurons


• It functions to regulate network activity (prevent too many
firings)
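The bullets above describe the network-level picture. A single McCulloch–Pitts unit is classically modelled as a binary threshold device with absolute inhibition. The sketch below follows that standard textbook formulation, which is an assumption here: the slide itself does not state a threshold or an update rule, and the function and parameter names are hypothetical.

```python
def mcculloch_pitts(excitatory, inhibitory, threshold):
    """Return 1 (firing) or 0 (not firing) for binary inputs.

    excitatory: list of 0/1 signals on excitatory connections.
    inhibitory: list of 0/1 signals on inhibitory connections.
    threshold:  number of active excitatory inputs needed to fire (assumed).
    """
    # Absolute inhibition: any active inhibitory input prevents firing.
    if any(inhibitory):
        return 0
    return 1 if sum(excitatory) >= threshold else 0


# With two excitatory inputs and threshold 2, the unit behaves like AND.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, mcculloch_pitts([a, b], [], threshold=2))
```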
LINEAR SEPARABILITY

 Linear separability is the concept wherein the separation of the
input space into regions is based on whether the network response
is positive or negative.

 Consider a network having a positive response in the first
quadrant and a negative response in all other quadrants (AND
function) with either binary or bipolar data; the decision
line is then drawn separating the positive response region from
the negative response region.
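For the AND function with bipolar data, one possible decision line is x1 + x2 - 1 = 0. The weights w1 = w2 = 1 and bias b = -1 used below are an illustrative assumption (many other lines separate the same points); the sketch only checks on which side of the line each input falls.

```python
def net_input(x1, x2, w1=1.0, w2=1.0, b=-1.0):
    """Net input b + w1*x1 + w2*x2; the decision line is where this equals 0."""
    return b + w1 * x1 + w2 * x2


# Bipolar AND: only (1, 1) should give a positive response.
bipolar_and = {(1, 1): 1, (1, -1): -1, (-1, 1): -1, (-1, -1): -1}

# Positive net input -> response +1, otherwise -1.
responses = {p: (1 if net_input(*p) > 0 else -1) for p in bipolar_and}

for point, target in bipolar_and.items():
    print(point, "net =", net_input(*point), "response =", responses[point], "target =", target)
```

Because a single straight line reproduces every target, the bipolar AND problem is linearly separable.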
HEBB NETWORK
Donald Hebb stated in 1949 that learning in the brain is performed
by changes in the synaptic gap. Hebb explained it:

“When an axon of cell A is near enough to excite cell B, and repeatedly
or persistently takes part in firing it, some growth process or
metabolic change takes place in one or both the cells such that A’s
efficiency, as one of the cells firing B, is increased.”
HEBB LEARNING
 The weights between neurons whose activities are positively
correlated are increased:

$$\frac{dw_{ij}}{dt} \sim \mathrm{correlation}(x_i, x_j)$$

 Associative memory is produced automatically.

 The Hebb rule can be used for pattern association, pattern
categorization, pattern classification and a range of other
areas.
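A common discrete form of the Hebb rule is w_i(new) = w_i(old) + x_i y, with the bias updated as b(new) = b(old) + y. The slides give only the continuous correlation form, so this discrete update, the function name and the bipolar AND data are assumptions chosen for illustration.

```python
def hebb_train(samples):
    """Single pass of the discrete Hebb rule over (input, target) pairs.

    Each weight grows when input and target have the same sign and
    shrinks when they differ, which mirrors the correlation idea.
    """
    n = len(samples[0][0])
    weights = [0.0] * n
    bias = 0.0
    for x, y in samples:
        for i in range(n):
            weights[i] += x[i] * y  # w_i(new) = w_i(old) + x_i * y
        bias += y                   # b(new) = b(old) + y
    return weights, bias


# Bipolar AND patterns: (x1, x2) -> target.
bipolar_and = [((1, 1), 1), ((1, -1), -1), ((-1, 1), -1), ((-1, -1), -1)]
weights, bias = hebb_train(bipolar_and)
print(weights, bias)
```

For this data a single pass gives w1 = w2 = 2 and b = -2, and the resulting line 2x1 + 2x2 - 2 = 0 separates the AND patterns.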
DEFINITION OF SUPERVISED LEARNING NETWORKS

 Training and test data sets

 Training set: inputs and targets are specified


PERCEPTRON NETWORKS
 Linear threshold unit (LTU)

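[Figure: linear threshold unit. Inputs x1, x2, …, xn with weights w1, w2, …, wn, together with the bias weight w0 on a constant input x0 = 1, feed a summation unit Σ that produces the output o.]

$$o = f(x) = \begin{cases} 1 & \text{if } \sum_{i=0}^{n} w_i x_i > 0 \\ -1 & \text{otherwise} \end{cases}$$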
PERCEPTRON LEARNING

wi = wi + ∆wi
∆wi = η (t - o) xi
where
t = c(x) is the target value,
o is the perceptron output,
η is a small constant (e.g., 0.1) called the learning rate.

 If the output is correct (t = o), the weights wi are not changed.

 If the output is incorrect (t ≠ o), the weights wi are changed such
that the output of the perceptron for the new weights is closer to t.

 The algorithm converges to the correct classification if
• the training data is linearly separable, and
• η is sufficiently small.
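The update rule ∆wi = η (t - o) xi can be tried on the AND function. The sketch below is a minimal illustration under stated assumptions: targets are coded as -1/+1 to match the threshold unit's outputs, weights start at zero, x0 = 1 carries the bias weight w0, and the function names and the `max_epochs` cap are hypothetical.

```python
def ltu(weights, x):
    """Linear threshold unit: +1 if sum_i w_i x_i > 0 (with x0 = 1), else -1."""
    net = sum(w * xi for w, xi in zip(weights, (1.0,) + tuple(x)))
    return 1 if net > 0 else -1


def train_perceptron(samples, eta=0.1, max_epochs=100):
    """Apply w_i = w_i + eta * (t - o) * x_i until an epoch has no errors."""
    n = len(samples[0][0])
    weights = [0.0] * (n + 1)  # weights[0] is the bias weight w0
    for _ in range(max_epochs):
        errors = 0
        for x, t in samples:
            o = ltu(weights, x)
            if o != t:  # weights change only when the output is incorrect
                errors += 1
                for i, xi in enumerate((1.0,) + tuple(x)):
                    weights[i] += eta * (t - o) * xi
        if errors == 0:
            break
    return weights


# AND function with binary inputs and -1/+1 targets.
and_samples = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
weights = train_perceptron(and_samples)
print(weights, [ltu(weights, x) for x, _ in and_samples])
```

Since AND is linearly separable, the loop stops after a handful of epochs; on a non-separable problem such as XOR it would simply run until `max_epochs`.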
LEARNING ALGORITHM
 Epoch : Presentation of the entire training set to the neural
network.

 In the case of the AND function, an epoch consists of four sets of
inputs being presented to the network (i.e. [0,0], [0,1], [1,0],
[1,1]).

 Error: The error value is the amount by which the value output by
the network differs from the target value. For example, if we
required the network to output 0 and it outputs 1, then Error = -1.
 Target Value, T : When we are training a network we not only
present it with the input but also with a value that we require the
network to produce. For example, if we present the network with
[1,1] for the AND function, the target value will be 1.

 Output , O : The output value from the neuron.

 Ij : Inputs being presented to the neuron.

 Wj : Weight from input neuron (Ij) to the output neuron.

 LR : The learning rate. This dictates how quickly the network
converges. It is set by experimentation and is typically
0.1.
TRAINING ALGORITHM
 Adjust neural network weights to map inputs to outputs.

 Use a set of sample patterns where the desired output (given the
inputs presented) is known.

 The purpose is to learn to
• recognize features which are common to good and bad
exemplars.
MULTILAYER PERCEPTRON

[Figure: multilayer perceptron. Input signals enter the input layer, pass through adjustable weights, and leave the output layer as output values.]
LAYERS IN NEURAL NETWORK
 The input layer:
• Introduces input values into the network.
• No activation function or other processing.

 The hidden layer(s):
• Perform classification of features.
• In theory, two hidden layers are sufficient to approximate any
input-output mapping, given enough hidden units.
• More complex features may benefit from more layers.

 The output layer:
• Functionally is just like the hidden layers.
• Outputs are passed on to the world outside the neural
network.
 Back propagation is a training procedure which allows multilayer
feedforward neural networks to be trained.

 Such networks can theoretically perform “any” input-output mapping.

 They can learn to solve linearly inseparable problems.


MULTILAYER FEEDFORWARD NETWORK

[Figure: multilayer feedforward network with inputs I0 to I3, hidden units h0 to h2, and outputs o0 and o1.]
MULTILAYER FEEDFORWARD NETWORK: ACTIVATION AND TRAINING
 For feed forward networks:
• A continuous function can be differentiated, allowing
gradient descent.
• Back propagation is an example of a gradient-descent technique.
• It uses a sigmoid (binary or bipolar) activation function.
In multilayer networks, the activation function is
usually more complex than just a threshold function,
like 1/[1+exp(-x)] or even 2/[1+exp(-x)] - 1 to allow for
inhibition, etc.
GRADIENT DESCENT
 Gradient-Descent(training_examples, η)

 Each training example is a pair of the form <(x1,…,xn),t> where
(x1,…,xn) is the vector of input values and t is the target output
value; η is the learning rate (e.g. 0.1).

 Initialize each wi to some small random value.

 Until the termination condition is met, Do
• Initialize each ∆wi to zero.
• For each <(x1,…,xn),t> in training_examples, Do
   Input the instance (x1,…,xn) to the linear unit and compute
    the output o.
   For each linear unit weight wi, Do
    • ∆wi = ∆wi + η (t-o) xi
• For each linear unit weight wi, Do
   wi = wi + ∆wi
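The procedure above translates almost line for line into Python. In this sketch the termination condition is simplified to a fixed number of epochs, and the function name, the seed, the initial weight range and the toy data (generated from t = 1 + 2·x1 with a constant bias input x0 = 1) are all assumptions made for illustration.

```python
import random


def gradient_descent(training_examples, eta=0.05, epochs=500, seed=0):
    """Batch gradient descent for a single linear unit o = sum_i w_i x_i."""
    rng = random.Random(seed)
    n = len(training_examples[0][0])
    # Initialize each w_i to some small random value.
    w = [rng.uniform(-0.05, 0.05) for _ in range(n)]
    for _ in range(epochs):  # termination condition: fixed epoch count (assumed)
        delta = [0.0] * n    # initialize each delta-w_i to zero
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))  # linear unit output
            for i in range(n):
                delta[i] += eta * (t - o) * x[i]
        # Weights are updated once per pass over the whole training set.
        for i in range(n):
            w[i] += delta[i]
    return w


# Hypothetical noise-free data: t = 1 + 2 * x1, with x0 = 1 as the bias input.
examples = [((1.0, x1), 1.0 + 2.0 * x1) for x1 in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w = gradient_descent(examples)
print(w)
```

On this data the learned weights approach (1, 2), the coefficients used to generate the targets. A much larger η would make the accumulated update overshoot and diverge.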
MODES OF GRADIENT DESCENT
 Batch mode: gradient descent over the entire data D

$$w = w - \eta \nabla E_D[w], \qquad E_D[w] = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2$$

 Incremental mode: gradient descent over individual training examples d

$$w = w - \eta \nabla E_d[w], \qquad E_d[w] = \frac{1}{2} (t_d - o_d)^2$$

 Incremental gradient descent can approximate batch gradient
descent arbitrarily closely if η is small enough.
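In incremental mode the weights change after every example instead of once per pass. A minimal sketch for a linear unit follows; the function name, the zero initial weights, the fixed epoch count and the toy data (t = 1 + 2·x1 with bias input x0 = 1) are assumptions.

```python
def incremental_gradient_descent(training_examples, eta=0.05, epochs=500):
    """Incremental (per-example) gradient descent for a linear unit."""
    n = len(training_examples[0][0])
    w = [0.0] * n
    for _ in range(epochs):
        for x, t in training_examples:
            o = sum(wi * xi for wi, xi in zip(w, x))
            # Step along the negative gradient of E_d = 1/2 (t_d - o_d)^2,
            # applied immediately rather than accumulated.
            for i in range(n):
                w[i] += eta * (t - o) * x[i]
    return w


# Hypothetical noise-free data: t = 1 + 2 * x1, with x0 = 1 as the bias input.
examples = [((1.0, x1), 1.0 + 2.0 * x1) for x1 in (-1.0, -0.5, 0.0, 0.5, 1.0)]
w_inc = incremental_gradient_descent(examples)
print(w_inc)
```

For this noise-free data the per-example updates settle on the same weights, about (1, 2), that a batch update would reach. With noisy data the incremental weights keep jittering around the batch solution unless η is small.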
SIGMOID ACTIVATION FUNCTION
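[Figure: sigmoid unit. Inputs x1, …, xn with weights w1, …, wn, plus the bias weight w0 on the constant input x0 = 1, feed a summation unit Σ followed by the sigmoid σ, giving the output o.]

$$\mathrm{net} = \sum_{i=0}^{n} w_i x_i, \qquad o = \sigma(\mathrm{net}) = \frac{1}{1 + e^{-\mathrm{net}}}$$

σ(x) is the sigmoid function:

$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x))$$

Derive gradient descent rules to train:
• one sigmoid unit:

$$\frac{\partial E}{\partial w_i} = -\sum_{d} (t_d - o_d)\, o_d\, (1 - o_d)\, x_{i,d}$$

• multilayer networks of sigmoid units: backpropagation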
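The gradient formula for a single sigmoid unit can be sanity-checked numerically by comparing it with a central finite difference of E = 1/2 Σd (td - od)². The data, the weights and the step size `eps` below are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output(w, x):
    """o = sigma(net), with net = sum_i w_i x_i and x[0] = 1 as the bias input."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def error(w, examples):
    """E = 1/2 * sum_d (t_d - o_d)^2."""
    return 0.5 * sum((t - output(w, x)) ** 2 for x, t in examples)


def gradient(w, examples):
    """dE/dw_i = -sum_d (t_d - o_d) * o_d * (1 - o_d) * x_i."""
    grad = [0.0] * len(w)
    for x, t in examples:
        o = output(w, x)
        for i, xi in enumerate(x):
            grad[i] += -(t - o) * o * (1.0 - o) * xi
    return grad


examples = [((1.0, 0.0, 1.0), 1.0), ((1.0, 1.0, 0.0), 0.0), ((1.0, 1.0, 1.0), 1.0)]
w = [0.1, -0.2, 0.3]

analytic = gradient(w, examples)
eps = 1e-6
numeric = []
for i in range(len(w)):
    w_plus, w_minus = w[:], w[:]
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric.append((error(w_plus, examples) - error(w_minus, examples)) / (2 * eps))

print(analytic)
print(numeric)
```

The two lists should agree to several decimal places, which is a quick way to catch sign or indexing mistakes before building the multilayer version.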
BACKPROPAGATION TRAINING ALGORITHM

 Initialize each wi to some small random value.

 Until the termination condition is met, Do
• For each training example <(x1,…,xn),t>, Do
   Input the instance (x1,…,xn) to the network and compute the
    network outputs ok.
   For each output unit k:
    δk = ok (1-ok) (tk-ok)
   For each hidden unit h:
    δh = oh (1-oh) Σk wh,k δk
   For each network weight wi,j, Do
    wi,j = wi,j + ∆wi,j, where ∆wi,j = η δj xi,j
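A compact sketch of these steps for a network with one hidden layer of sigmoid units is given below. The layout is an assumption: each weight row stores the bias weight at index 0, and the function names, layer sizes, seed, learning rate and epoch count are hypothetical. The XOR run at the end only exercises the update; whether it reaches a good solution depends on the initial weights.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w_hidden, w_out, x):
    """Return (hidden outputs, network outputs); row index 0 is the bias weight."""
    xb = [1.0] + list(x)
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_hidden]
    hb = [1.0] + h
    o = [sigmoid(sum(w * v for w, v in zip(row, hb))) for row in w_out]
    return h, o


def backprop_step(w_hidden, w_out, x, t, eta=0.5):
    """One per-example backpropagation update, applied in place."""
    h, o = forward(w_hidden, w_out, x)
    xb = [1.0] + list(x)
    hb = [1.0] + h
    # Output units: delta_k = o_k (1 - o_k) (t_k - o_k)
    delta_o = [ok * (1.0 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Hidden units: delta_h = o_h (1 - o_h) sum_k w_{h,k} delta_k
    # (w_out[k][j + 1] skips the bias weight stored at index 0).
    delta_h = [
        h[j] * (1.0 - h[j]) * sum(w_out[k][j + 1] * delta_o[k] for k in range(len(o)))
        for j in range(len(h))
    ]
    # Every weight: w_{i,j} = w_{i,j} + eta * delta_j * x_{i,j}
    for k, row in enumerate(w_out):
        for i in range(len(row)):
            row[i] += eta * delta_o[k] * hb[i]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * delta_h[j] * xb[i]


def total_error(w_hidden, w_out, data):
    """Sum over examples of 1/2 * sum_k (t_k - o_k)^2."""
    return sum(
        0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, forward(w_hidden, w_out, x)[1]))
        for x, t in data
    )


# Hypothetical 2-2-1 network trained on XOR, a linearly inseparable problem.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(2)]
w_out = [[rng.uniform(-0.5, 0.5) for _ in range(3)]]
xor = [((0, 0), (0,)), ((0, 1), (1,)), ((1, 0), (1,)), ((1, 1), (0,))]
for _ in range(2000):
    for x, t in xor:
        backprop_step(w_hidden, w_out, x, t)
print(total_error(w_hidden, w_out, xor))
```

The hidden deltas are computed before any weight is changed, so they use the same output-layer weights that produced the forward pass.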
BACKPROPAGATION

 Gradient descent over the entire network weight vector.

 Easily generalized to arbitrary directed graphs.

 Will find a local, not necessarily global, error minimum; in practice
it often works well (can be invoked multiple times with different initial
weights).

 Often includes a weight momentum term:

∆wi,j(t) = η δj xi,j + α ∆wi,j(t-1)

 Minimizes error over the training examples.

 Will it generalize well to unseen instances (over-fitting)?

 Training can be slow, typically 1000-10000 iterations
(Levenberg-Marquardt can be used instead of gradient descent).
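The momentum term can be illustrated in isolation. In the sketch below the quantity η δj xi,j is held at a constant, made-up value of 0.1 to show how repeated steps in the same direction build up; the function name and the choice α = 0.9 are assumptions.

```python
def momentum_step(gradient_term, previous_delta, alpha=0.9):
    """Return delta_w(t) = gradient_term + alpha * delta_w(t-1).

    gradient_term stands for eta * delta_j * x_ij from the update rule.
    """
    return gradient_term + alpha * previous_delta


# Constant gradient term: the step size grows towards 0.1 / (1 - 0.9) = 1.0.
delta_w = 0.0
history = []
for _ in range(500):
    delta_w = momentum_step(0.1, delta_w, alpha=0.9)
    history.append(delta_w)

print(history[0], history[1], history[-1])
```

With α = 0 the rule reduces to the plain update. With α close to 1 the effective step in a consistent direction is amplified by roughly 1/(1 - α), which helps on flat regions of the error surface but can overshoot narrow minima.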
APPLICATIONS OF BACKPROPAGATION NETWORK
 Load forecasting problems in power systems.

 Image processing.

 Fault diagnosis and fault detection.

 Gesture recognition, speech recognition.

 Signature verification.

 Bioinformatics.

 Structural engineering design (civil).
