Soft Computing Unit I


Soft Computing

Unit I
Introduction to Soft
Computing
Introduction to Soft Computing
 Concept of computation
 Hard computing
 Soft computing
 How soft computing?

 Hard computing vs. Soft computing

 Hybrid computing
CONCEPT OF COMPUTATION

Figure: Basics of computing — an antecedent (input) is mapped by the
computing block to a consequent (output); the mapping is the control
action.
Important characteristics of
computing
• Should provide a precise solution.

• Control action should be unambiguous and accurate.

• Suitable for problems that are easy to model mathematically.

Hard computing
• In 1996, L. A. Zadeh (LAZ) introduced the term hard computing.
• According to LAZ, we term a computing approach hard computing if:
• A precise result is guaranteed.
• The control action is unambiguous.
• The control action is formally defined (i.e., with a mathematical
model or algorithm).
Examples of hard computing
• Solving numerical problems (e.g., roots of polynomials,
integration, etc.).
• Searching and sorting techniques.
• Solving computational geometry problems (e.g., shortest
tour in a graph, finding the closest pair among a given set
of points, etc.), and many more.
Soft computing
• The term soft computing was proposed by the inventor
of fuzzy logic, Lotfi A. Zadeh. He describes it as
follows.

Definition: Soft computing

Soft computing is a collection of methodologies that aim to
exploit the tolerance for imprecision and uncertainty to achieve
tractability, robustness, and low solution cost. Its principal
constituents are fuzzy logic, neuro-computing, and probabilistic
reasoning. The role model for soft computing is the human
mind.
Characteristics of soft
computing
• It does not require a mathematical model of the problem
being solved.
• It may not yield a precise solution.

• Algorithms are adaptive (i.e., they can adjust to changes
in a dynamic environment).

• It uses biologically inspired methodologies such as
genetics, evolution, ant behavior, particle swarming, the
human nervous system, etc.
Examples of soft computing

Example: Handwritten character recognition (Artificial Neural
Networks)

Example: Money allocation problem — choosing the bank with
maximum return (Evolutionary Computing)

Example: Robot movement (Fuzzy Logic)


How soft computing?
How does a student learn from his teacher?
• The teacher asks questions and then tells the answers.
• The teacher puts questions, hints at answers, and asks
whether the student's answers are correct or not.
• The student thus learns a topic and stores it in his memory.
Based on this knowledge he solves new problems.
• This is the way the human brain works.
• Based on this concept, the Artificial Neural Network is used
to solve problems.
How soft computing?
How does the world select the best?
• It starts with a (random) population.
• It reproduces another population (the next generation).
• It ranks the population and selects the superior individuals.
The genetic algorithm is based on this natural phenomenon;
a minimal sketch follows.
• Population is synonymous with solutions.
• Selection of superior solutions is synonymous with
exploring the optimal solution.
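A minimal sketch of this select-and-reproduce loop in Python (the
fitness function, mutation scheme, and population size here are
illustrative assumptions, not part of the slides):

```python
import random

def evolve(fitness, pop_size=20, generations=50):
    # Start with a random population (here: real numbers in [-10, 10]).
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the population and keep the superior half.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Reproduce the next generation by mutating the parents.
        children = [p + random.gauss(0, 0.5) for p in parents]
        population = parents + children
    return max(population, key=fitness)

# Example: maximize f(x) = -(x - 3)^2, optimum at x = 3.
best = evolve(lambda x: -(x - 3) ** 2)
print(best)
```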
How soft computing?
How does a doctor treat his patient?
• The doctor asks the patient about his suffering.
• The doctor finds the symptoms of the disease.
• The doctor prescribes tests and medicines.
This is exactly the way Fuzzy Logic works.
• Symptoms are correlated with diseases with
uncertainty.
• The doctor prescribes tests/medicines fuzzily.
Advantages of soft
computing
• Only simple mathematical calculations are performed.
• Good efficiency.
• Applicable in real time.
• Based on human reasoning.
Disadvantages of soft
computing
• It gives an approximate output value.
• If a small error occurs, the entire system may stop working; to
overcome this, the entire system must be corrected from the
beginning, which is a time-consuming process.
Hard computing vs. Soft
computing
Hard computing:
• It requires a precisely stated analytical model and often a
lot of computation time.
• It is based on binary logic, crisp systems, numerical
analysis, and crisp software.
• It has the characteristics of precision and categoricity.
• It is deterministic.
• It requires exact input data.
• It is strictly sequential.
• It produces precise answers.

Soft computing:
• It is tolerant of imprecision, uncertainty, partial truth, and
approximation.
• It is based on fuzzy logic, neural nets, and probabilistic
reasoning.
• It has the characteristics of approximation and
dispositionality.
• It is non-deterministic.
• It can deal with ambiguous and noisy data.
• It allows parallel computations.
• It does not guarantee precise answers.
Hybrid computing
It is a combination of conventional hard computing and
emerging soft computing.

Figure: Concept of hybrid computing — hard computing (HC)
and soft computing (SC) combined.


What is Fuzzy logic?

Fuzzy logic is a mathematical language to express something.
This means it has grammar, syntax, and semantics like a
language for communication.

There are some other well-known mathematical languages:

• Relational algebra (operations on sets)
• Boolean algebra (operations on Boolean variables)
• Predicate logic (operations on well-formed formulae (wff),
also called predicate propositions)

Fuzzy logic deals with fuzzy sets.


A brief history of Fuzzy Logic

First introduced by Lotfi A. Zadeh (1965),
University of California, Berkeley, USA.

He is fondly nicknamed LAZ.


A brief history of Fuzzy logic

The dictionary meaning of fuzzy is "not clear", "noisy", etc.

Example: Is the picture on this slide fuzzy?

The antonym of fuzzy is crisp.

Example: Are the chips crisp?
Example: Fuzzy logic vs. Crisp logic

Crisp answer: Yes or No, True or False.

Figure: "Is the liquid colorless?" — for a milk/water liquid, a
crisp system answers either Yes or No.
Example: Fuzzy logic vs. Crisp logic

Fuzzy answer: May be, May not be, Absolutely, Partially, etc.
Example: Fuzzy logic vs. Crisp logic

"Is the person honest?" — a fuzzy answer grades each person
(Ankit, Rajesh, Santosh, Kabita) on a score scale:

Score 99: Extremely honest
Score 75: Very honest
Score 55: Honest at times
Score 35: Extremely dishonest
The world is fuzzy!

Our world is better described fuzzily!
Concept of fuzzy system

Figure: A fuzzy system takes fuzzy element(s), fuzzy set(s),
and fuzzy rule(s) as INPUT, applies fuzzy implication(s)
(inferences), and produces the OUTPUT.
Concept of fuzzy set
To understand the concept of a fuzzy set, it is better if we first
recall the idea of a crisp set.
X = The entire population of India.
H = All Hindu population = { h1, h2, h3, ... , hL }
M = All Muslim population = { m1, m2, m3, ... , mN }

Figure: Universe of discourse X containing the sets H and M.

Here, all are sets of a finite number of individuals. Such a
set is called a crisp set.
Example of fuzzy set

Now let us discuss fuzzy sets. X = All students in IT60108.
S = All good students.
S = { (s, g(s)) | s ∈ X }, where g(s) is a measurement of the
goodness of the student s.
Example:
S = { (Rajat, 0.8), (Kabita, 0.7), (Salman, 0.1), (Ankit, 0.9) },
etc.
Fuzzy set vs. Crisp set

Crisp Set:
1. S = { s | s ∈ X }
2. It is a collection of elements.
3. Inclusion of an element s ∈ X into S is crisp, i.e., it has a
strict boundary: yes or no.

Fuzzy Set:
1. F = { (s, µ(s)) | s ∈ X }, where µ(s) is the degree of
membership of s.
2. It is a collection of ordered pairs.
3. Inclusion of an element s ∈ X into F is fuzzy, i.e., if present,
then with a degree of membership.
Fuzzy set vs. Crisp set

Note: A crisp set is a fuzzy set, but a fuzzy set is not
necessarily a crisp set.

Example:
H = { (h1, 1), (h2, 1), ... , (hL, 1) }
Person = { (p1, 1), (p2, 0), ... , (pN, 1) }
In the case of a crisp set, the elements have extreme values
of degree of membership, namely either 1 or 0.

How to decide the degree of membership of elements in a
fuzzy set?

City: Bangalore  Bombay  Hyderabad  Kharagpur  Madras  Delhi
DoM:    0.95      0.90     0.80       0.01      0.65    0.75

How can the cities of comfort be judged?


Example: Course evaluation in a crisp way

EX = Marks ≥ 90
A = 80 ≤ Marks < 90
B = 70 ≤ Marks < 80
C = 60 ≤ Marks < 70
D = 50 ≤ Marks < 60
P = 35 ≤ Marks < 50
F = Marks < 35

Example: Course evaluation in a crisp way

Figure: Crisp grading — each grade (F, P, D, C, B, A, EX) has
membership 1 on its marks interval (boundaries at 35, 50, 60,
70, 80, 90, 100) and 0 elsewhere.

Example: Course evaluation in a fuzzy way

Figure: Fuzzy grading — the grade regions (F, P, D, C, B, A, EX)
overlap, and membership varies gradually between 0 and 1
across the marks axis.
Few examples of fuzzy set

High Temperature

Low Pressure

Color of Apple

Sweetness of Orange

Weight of Mango

Note: Degree of membership values lie in the range [0, 1].


Some basic terminologies and notations

Definition 1: Membership function (and fuzzy set)

If X is a universe of discourse and x ∈ X, then a fuzzy set A in
X is defined as a set of ordered pairs, that is,
A = { (x, µA(x)) | x ∈ X }, where µA(x) is called the membership
function for the fuzzy set A.

Note:
µA(x) maps each element of X onto a membership grade (or
membership value) between 0 and 1 (both inclusive).

Question:
How (and who) decides µA(x) for a fuzzy set A in X?
Some basic terminologies and notations

Example:
X = All cities in India
A = City of comfort
A = {(New Delhi, 0.7), (Bangalore, 0.9), (Chennai, 0.8),
(Hyderabad, 0.6), (Kolkata, 0.3), (Kharagpur, 0)}
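A discrete fuzzy set like A maps naturally onto a Python dictionary;
this small sketch (city names and values taken from the example
above) shows membership lookup, treating absent elements as
membership 0:

```python
# Fuzzy set "City of comfort" as element -> degree of membership.
A = {"New Delhi": 0.7, "Bangalore": 0.9, "Chennai": 0.8,
     "Hyderabad": 0.6, "Kolkata": 0.3, "Kharagpur": 0.0}

def mu(fuzzy_set, x):
    # Elements outside the set have membership 0.
    return fuzzy_set.get(x, 0.0)

print(mu(A, "Bangalore"))  # 0.9
print(mu(A, "Mumbai"))     # 0.0 (not listed)
```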
Membership function with discrete membership values

The membership values may take discrete values.

Either the elements or their membership values (or both)
may be discrete.

A = {(0, 0.1), (1, 0.30), (2, 0.78), ..., (10, 0.1)}

Figure: µ plotted against the number of children (X = discrete
values). A = "Happy family". How do you measure happiness?
Membership function with continuous membership values

µB(x) = 1 / (1 + ((x − 50) / 10)^4)

Figure: µB plotted against Age (X). Note: x takes real values
(X = R+). B = "Middle aged".
Fuzzy terminologies: Support

Support: The support of a fuzzy set A is the set of all points
x ∈ X such that µA(x) > 0.

support(A) = { x | µA(x) > 0 }
Fuzzy terminologies: Core

Core: The core of a fuzzy set A is the set of all points x ∈ X
such that µA(x) = 1.

core(A) = { x | µA(x) = 1 }
Fuzzy terminologies: Normality

Normality: A fuzzy set A is normal if its core is non-empty.
In other words, we can always find a point x ∈ X such that
µA(x) = 1.
Fuzzy terminologies: Crossover points

Crossover point: A crossover point of a fuzzy set A is a point
x ∈ X at which µA(x) = 0.5. That is,
crossover(A) = { x | µA(x) = 0.5 }.
Fuzzy terminologies: Fuzzy Singleton

Fuzzy singleton: A fuzzy set whose support is a single point
in X with µA(x) = 1 is called a fuzzy singleton.
Fuzzy terminologies: α-cut and strong α-cut

The α-cut of a fuzzy set A is a crisp set defined by

Aα = { x | µA(x) ≥ α }

The strong α-cut is defined similarly:

Aα′ = { x | µA(x) > α }

Note: support(A) = A0′ and core(A) = A1.
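For a discrete fuzzy set represented as a dictionary (as in the
earlier sketch), these terminologies reduce to one-line filters; a
small illustrative sketch:

```python
def alpha_cut(fuzzy_set, alpha, strong=False):
    # Crisp set of elements with membership >= alpha (> alpha if strong).
    if strong:
        return {x for x, m in fuzzy_set.items() if m > alpha}
    return {x for x, m in fuzzy_set.items() if m >= alpha}

def support(fuzzy_set):
    return alpha_cut(fuzzy_set, 0, strong=True)   # A0'

def core(fuzzy_set):
    return alpha_cut(fuzzy_set, 1)                # A1

A = {"x1": 0.2, "x2": 0.5, "x3": 1.0}
print(support(A))           # {'x1', 'x2', 'x3'}
print(core(A))              # {'x3'}
print(alpha_cut(A, 0.5))    # {'x2', 'x3'}
```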


Fuzzy terminologies: Convexity
Convexity: A fuzzy set A is convex if and only if for any
x1, x2 ∈ X and any λ ∈ [0, 1],

µA(λx1 + (1 − λ)x2) ≥ min(µA(x1), µA(x2))

Note:
• A is convex if all its α-level sets are convex.
• Convexity =⇒ each Aα is composed of a single line
segment only.

Figure: A convex membership function vs. a non-convex
membership function.
Fuzzy terminologies: Bandwidth

Bandwidth:

For a normal and convex fuzzy set, the bandwidth (or width)
is defined as the distance between the two unique crossover
points:
Bandwidth(A) = | x1 − x2 |
where µA(x1) = µA(x2) = 0.5.
Fuzzy terminologies: Symmetry

Symmetry:

A fuzzy set A is symmetric if its membership function is
symmetric around a certain point x = c, namely
µA(c + x) = µA(c − x) for all x ∈ X.
Fuzzy terminologies: Open and Closed
A fuzzy set A is:
Open left:
if lim(x→−∞) µA(x) = 1 and lim(x→+∞) µA(x) = 0
Open right:
if lim(x→−∞) µA(x) = 0 and lim(x→+∞) µA(x) = 1
Closed:
if lim(x→−∞) µA(x) = lim(x→+∞) µA(x) = 0

Figure: Examples of open left, closed, and open right
membership functions.
Fuzzy vs. Probability

Fuzzy: When we speak about the certainty of a thing.

Example: A patient comes to the doctor, and the doctor has to
diagnose him so that medicine can be prescribed.
The doctor prescribes a medicine with certainty 60% that the
patient is suffering from flu. So, the disease will be cured
with certainty 60% and uncertainty 40%. Here, instead of
flu, other diseases with some other certainties may be present.

Probability: When we speak about the chance of an event
occurring.

Example: "India will win the T20 tournament with a 60%
chance" means that out of 100 matches, India wins 60.
Prediction vs. Forecasting

The fuzzy vs. probability distinction is analogous to
prediction vs. forecasting.

Prediction: When you start guessing about things.

Forecasting: When you take information from past jobs and
apply it to a new job.

The main difference:

Prediction is based on a best guess from experience.
Forecasting is based on data you have actually recorded
from previous jobs.
Fuzzy Membership Functions

Fuzzy membership functions
A fuzzy set is completely characterized by its membership
function (sometimes abbreviated as MF and denoted as µ).
So, it is important to learn how a membership function can
be expressed (mathematically or otherwise).
Note: A membership function can be defined on
a) a discrete universe of discourse, or
b) a continuous universe of discourse.
Example:

Figure: A = fuzzy set of "Happy family", µA over the number
of children (discrete X); B = "Young age", µB over age
(continuous X).


Fuzzy membership functions
So, a membership function on a discrete universe of discourse
is trivial. However, a membership function on a continuous
universe of discourse needs special attention.
The following figures show typical examples of membership
functions.

Figure: Typical membership function shapes — triangular,
trapezoidal, curve, and non-uniform.


Fuzzy MFs: Formulation and parameterization
In the following, we parameterize the different MFs on a
continuous universe of discourse.
Triangular MFs: A triangular MF is specified by three
parameters {a, b, c} and can be formulated as follows:

triangle(x; a, b, c) =
  0,                 if x ≤ a
  (x − a)/(b − a),   if a ≤ x ≤ b        (1)
  (c − x)/(c − b),   if b ≤ x ≤ c
  0,                 if c ≤ x

Figure: A triangular MF with break points a, b, c.
Fuzzy MFs: Trapezoidal
A trapezoidal MF is specified by four parameters {a, b, c, d}
and can be defined as follows:

trapezoid(x; a, b, c, d) =
  0,                 if x ≤ a
  (x − a)/(b − a),   if a ≤ x ≤ b
  1,                 if b ≤ x ≤ c        (2)
  (d − x)/(d − c),   if c ≤ x ≤ d
  0,                 if d ≤ x

Figure: A trapezoidal MF with break points a, b, c, d.
Fuzzy MFs: Gaussian

A Gaussian MF is specified by two parameters {c, σ} and can
be defined as below:

gaussian(x; c, σ) = e^(−(1/2)·((x − c)/σ)^2)

Figure: A Gaussian MF centered at c; σ controls the spread.
Fuzzy MFs: Generalized bell

It is also called the Cauchy MF. A generalized bell MF is
specified by three parameters {a, b, c} and is defined as:

bell(x; a, b, c) = 1 / (1 + |(x − c)/a|^(2b))

Figure: A generalized bell MF centered at c with half-width a;
the slope at the crossover points x = c − a and x = c + a is
±b/(2a).
Example: Generalized bell MFs

Example: µ(x) = 1 / (1 + x^2);
that is, a = b = 1 and c = 0.

Figure: The bell MF 1/(1 + x^2), peaking at x = 0.
Generalized bell MFs: Different shapes

Figure: Effect on the bell MF shape of changing a, changing b,
changing c, and changing a and b together.
Fuzzy MFs: Sigmoidal MFs

Parameters: {a, c}, where c = crossover point and a = slope
at c:

sigmoid(x; a, c) = 1 / (1 + e^(−a(x − c)))

Figure: A sigmoidal MF with crossover value 0.5 at x = c.
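A sketch of these parameterized MFs in Python, following the
formulas above (vectorization via NumPy is an implementation
choice, not something the slides prescribe):

```python
import numpy as np

def triangle(x, a, b, c):
    # Rises on [a, b], falls on [b, c], zero outside.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoid(x, a, b, c, d):
    # Rises on [a, b], flat at 1 on [b, c], falls on [c, d].
    up = (x - a) / (b - a)
    down = (d - x) / (d - c)
    return np.maximum(np.minimum(np.minimum(up, 1.0), down), 0.0)

def gaussian(x, c, sigma):
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def bell(x, a, b, c):
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def sigmoid(x, a, c):
    return 1.0 / (1.0 + np.exp(-a * (x - c)))

x = np.linspace(0, 100, 5)
print(bell(x, 20, 2, 50))  # peaks at x = 50
```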
Fuzzy MFs: Example

Example: Consider the following grading system for a course:

Excellent = Marks ≥ 90
Very good = 75 ≤ Marks ≤ 90
Good = 60 ≤ Marks ≤ 75
Average = 50 ≤ Marks ≤ 60
Poor = 35 ≤ Marks ≤ 50
Bad = Marks ≤ 35
Grading System

A fuzzy implementation will look like the following.

Figure: Overlapping fuzzy grades (Bad, Poor, Average, Good,
Very Good, Excellent) over the marks axis (roughly 10 to 100),
with membership values between 0 and 1.

You can choose a standard fuzzy MF for each of the fuzzy
grades.
Operations on Fuzzy Sets
Basic fuzzy set operations: Union

Union (A ∪ B):

µA∪B(x) = max{ µA(x), µB(x) }

Example:
A = {(x1, 0.5), (x2, 0.1), (x3, 0.4)} and
B = {(x1, 0.2), (x2, 0.3), (x3, 0.5)};
C = A ∪ B = {(x1, 0.5), (x2, 0.3), (x3, 0.5)}

Figure: µA, µB, and µA∪B — the union takes the pointwise
maximum of the two membership curves.
Basic fuzzy set operations: Intersection

Intersection (A ∩ B):

µA∩B(x) = min{ µA(x), µB(x) }

Example:
A = {(x1, 0.5), (x2, 0.1), (x3, 0.4)} and
B = {(x1, 0.2), (x2, 0.3), (x3, 0.5)};
C = A ∩ B = {(x1, 0.2), (x2, 0.1), (x3, 0.4)}

Figure: µA, µB, and µA∩B — the intersection takes the
pointwise minimum of the two membership curves.
Basic fuzzy set operations: Complement

Complement (Aᶜ):

µAᶜ(x) = 1 − µA(x)

Example:
A = {(x1, 0.5), (x2, 0.1), (x3, 0.4)}
C = Aᶜ = {(x1, 0.5), (x2, 0.9), (x3, 0.6)}

Figure: µA and its complement µAᶜ = 1 − µA.
Basic fuzzy set operations: Products

Algebraic product or vector product (A•B):

µA•B(x) = µA(x) · µB(x)

Scalar product (α × A):

µαA(x) = α · µA(x)
Basic fuzzy set operations: Sum and Difference

Sum (A + B):

µA+B(x) = µA(x) + µB(x) − µA(x) · µB(x)

Difference (A − B = A ∩ Bᶜ):

µA−B(x) = µA∩Bᶜ(x)

Disjunctive sum: A ⊕ B = (Aᶜ ∩ B) ∪ (A ∩ Bᶜ)

Bounded sum: | A(x) ⊕ B(x) |

µ|A(x)⊕B(x)|(x) = min{ 1, µA(x) + µB(x) }

Bounded difference: | A(x) ⊖ B(x) |

µ|A(x)⊖B(x)|(x) = max{ 0, µA(x) + µB(x) − 1 }
Basic fuzzy set operations: Equality and Power

Equality (A = B):

µA(x) = µB(x) for all x ∈ X

Power of a fuzzy set (Aα):

µAα(x) = { µA(x) }^α

If α < 1, the operation is called dilation.
If α > 1, it is called concentration.
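These pointwise operations are easy to check on the discrete
example sets used above; a small sketch (the dictionary
representation is an assumption carried over from earlier
sketches):

```python
A = {"x1": 0.5, "x2": 0.1, "x3": 0.4}
B = {"x1": 0.2, "x2": 0.3, "x3": 0.5}

union        = {x: max(A[x], B[x]) for x in A}          # max
intersection = {x: min(A[x], B[x]) for x in A}          # min
complement_A = {x: 1 - A[x] for x in A}                 # 1 - mu
alg_sum      = {x: A[x] + B[x] - A[x] * B[x] for x in A}
bounded_sum  = {x: min(1, A[x] + B[x]) for x in A}

print(union)         # {'x1': 0.5, 'x2': 0.3, 'x3': 0.5}
print(intersection)  # {'x1': 0.2, 'x2': 0.1, 'x3': 0.4}
```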
Basic fuzzy set operations: Cartesian product

Cartesian product (A × B):

µA×B(x, y) = min{ µA(x), µB(y) }

Example:
A(x) = {(x1, 0.2), (x2, 0.3), (x3, 0.5), (x4, 0.6)}
B(y) = {(y1, 0.8), (y2, 0.6), (y3, 0.3)}

A × B = min{ µA(x), µB(y) } =

        y1    y2    y3
  x1   0.2   0.2   0.2
  x2   0.3   0.3   0.3
  x3   0.5   0.5   0.3
  x4   0.6   0.6   0.3
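The relation matrix above can be reproduced with a nested min; a
brief sketch:

```python
A = {"x1": 0.2, "x2": 0.3, "x3": 0.5, "x4": 0.6}
B = {"y1": 0.8, "y2": 0.6, "y3": 0.3}

# Cartesian product: mu_{A x B}(x, y) = min(mu_A(x), mu_B(y)).
R = {(x, y): min(ma, mb) for x, ma in A.items() for y, mb in B.items()}

print(R[("x3", "y3")])  # 0.3
print(R[("x4", "y1")])  # 0.6
```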
Properties of fuzzy sets

Commutativity:

A ∪ B = B ∪ A
A ∩ B = B ∩ A

Associativity:

A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C

Distributivity:

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

Properties of fuzzy sets
Idempotence:

A ∪ A = A
A ∩ A = A
A ∪ ∅ = A
A ∩ ∅ = ∅

Transitivity:
If A ⊆ B and B ⊆ C, then A ⊆ C

Involution:

(Aᶜ)ᶜ = A

De Morgan's law:

(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
Few Illustrations on Fuzzy
Sets
Example 1: Fuzzy Set Operations

Let A and B be two fuzzy sets defined over a universe of
discourse X, with membership functions µA(x) and µB(x),
respectively. The two MFs µA(x) and µB(x) are shown
graphically.

Figure: µA(x) is triangular over [a1, a4] (peak between a2 and
a3); µB(x) is triangular over [b1, b4], with b2 = a1 and b3 = a2.
Example 1: Plotting two sets on the same graph

Let us plot the two membership functions on the same graph.

Figure: µA and µB overlaid on a common x-axis (b1, a1, a2,
b4, a3, a4).
Example 1: Union and Intersection
The plots of the union A ∪ B and the intersection A ∩ B are
shown in the following.

Figure: µA∪B(x) follows the pointwise maximum of µA and µB;
µA∩B(x) follows the pointwise minimum.
Example 1: Complement

The plot of the complement µĀ(x) of the fuzzy set A is shown
in the following.

Figure: µA(x) and its complement µĀ(x) = 1 − µA(x).
Fuzzy set operations: Practice

Consider the following two fuzzy sets A and B defined over a
universe of discourse [0, 5] of real numbers, with membership
functions

µA(x) = x / (x + 1) and µB(x) = 2^(−x)

Determine the membership functions of the following and
draw them graphically.

i. Aᶜ, Bᶜ
ii. A ∪ B
iii. A ∩ B
iv. (A ∪ B)ᶜ [Hint: Use De Morgan's law]
Example 2: A real-life example

Two fuzzy sets A and B with membership functions µA(x) and
µB(x), respectively, are defined as below:
A = Cold climate, with µA(x) as the MF.
B = Hot climate, with µB(x) as the MF.

Figure: µA falls from 1 to 0 and µB rises from 0 to 1 over the
temperature axis (roughly −15 to 50).

Here, X is the universe of discourse representing the entire
range of temperatures.
Example 2: A real-life example
What are the fuzzy sets representing the following?

1 Not cold climate
2 Not hot climate
3 Extreme climate
4 Pleasant climate

Note: "Not cold climate" ≠ "Hot climate", and vice-versa.
Example 2: A real-life example
The answers are the following.
1 Not cold climate
Ā, with 1 − µA(x) as the MF.
2 Not hot climate
B̄, with 1 − µB(x) as the MF.
3 Extreme climate
A ∪ B, with µA∪B(x) = max(µA(x), µB(x)) as the MF.
4 Pleasant climate
A ∩ B, with µA∩B(x) = min(µA(x), µB(x)) as the MF.

The plots of the MFs of A ∪ B and A ∩ B are shown in the
following.

Figure: "Extreme climate" (A ∪ B) is high at both temperature
extremes; "Pleasant climate" (A ∩ B) is a narrow region in the
middle of the temperature range.
Few More on Membership Functions

Generation of MFs
Given a membership function of a fuzzy set representing a
linguistic hedge, we can derive many more MFs representing
several other linguistic hedges using the concepts of
concentration and dilation.
Concentration:
A^k = [µA(x)]^k ; k > 1

Dilation:
A^k = [µA(x)]^k ; k < 1

Example: Age = { Young, Middle-aged, Old }

Thus, corresponding to Young, we have: Not young, Very
young, Not very young, and so on.
Similarly, with Old we can have: old, very old, very very old,
extremely old, etc.
Thus, Extremely old = (((old)^2)^2)^2, and so on.
Or, More or less old = A^0.5 = (old)^0.5
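A quick sketch of concentration and dilation on a discrete fuzzy
set (the "old" values below are made up for illustration):

```python
old = {40: 0.1, 55: 0.4, 70: 0.8, 85: 1.0}

def hedge(fuzzy_set, k):
    # k > 1: concentration ("very ..."); k < 1: dilation ("more or less ...").
    return {x: m ** k for x, m in fuzzy_set.items()}

very_old         = hedge(old, 2)     # concentration
more_or_less_old = hedge(old, 0.5)   # dilation
extremely_old    = hedge(old, 8)     # (((old)^2)^2)^2

print(very_old[55])          # 0.16
print(more_or_less_old[55])  # ~0.632
```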
Linguistic variables and values

Figure: MFs for Young, Middle-aged, Old, Very young, and
Very old over X = Age (0 to 100).

µyoung(x) = bell(x, 20, 2, 0) = 1 / (1 + (x/20)^4)

µold(x) = bell(x, 30, 3, 100) = 1 / (1 + ((x − 100)/30)^6)

µmiddle-aged(x) = bell(x, 30, 60, 50)

Not young: µ¬young(x) = 1 − µyoung(x)

Young but not too young: µyoung(x) ∩ µ¬young(x)
Any questions??
Unit II : Artificial Neural Network - I 6L
1. Biological neuron, artificial neuron model, concept of bias and
threshold,
2. Activation functions,
3. McCulloch-Pitts neuron model,
4. Learning paradigms: supervised, unsupervised, reinforcement,
5. Linear neuron model: concept of error energy,
6. Gradient descent algorithm and application of the linear neuron
for linear regression,
7. Learning mechanisms: Hebbian, Delta Rule,
8. Perceptron and its limitations
Books Referred for this unit:

Text Books:

1. Fundamentals of Neural Networks: Architectures, Algorithms,
and Applications, Laurene Fausett, Pearson Education, Inc.,
2008.

2. Principles of Soft Computing, S. N. Sivanandam, S. N.
Deepa, John Wiley & Sons, 2007.
Practical Sessions: (Use MATLAB / OCTAVE / SCILAB / any
appropriate open-source software.) (any 8 experiments)

1. Design & implementation of different activation functions.

2. Implement a simple logic network using the MP neuron model.

3. Implement a simple linear regressor with a single neuron
model.
Introduction
• BRAIN COMPUTATION
- A highly complex, non-linear, and parallel computer with
structural constituents called "NEURONS"
- The human brain contains about 10^11 nerve cells, or neurons
- On average, each neuron is connected to other neurons
through approximately 10^4 synapses

How is it then that the brain outperforms a digital computer?
ANN characteristics:
1. It is a neurally implemented mathematical model.
2. There exist a large number of highly interconnected processing
elements called neurons in an ANN.

3. The interconnections with their weighted linkages hold the
informative knowledge.

4. The input signals arrive at the processing elements through
connections and connection weights.

5. The processing elements of an ANN have the ability to learn,
recall, and generalize from the given data by suitable assignment
or adjustment of weights.
6. The computational power can be demonstrated only by the
collective behavior of neurons; it should be noted that no single
neuron carries specific information.
Application of Neural Networks
• Photos and fingerprints can be recognized by imposing a fine grid
over the photo. Each square of the grid becomes an input to the NN.
• Lake water levels can be predicted based upon precipitation patterns
and river/dam flows.
• Medical diagnosis is an ideal application for NNs.
• Scheduling of buses, airplanes, and elevators can be optimized by
predicting demand.
• Voice recognition can be obtained by analyzing the audio
oscilloscope pattern, much like a stock market graph.
• Weather prediction may be possible. Inputs would include weather
reports from surrounding areas. Output(s) would be the future weather
in specific areas based on the input information. Effects such as ocean
currents and jet streams could be included.
• Air traffic control could be automated with the location, altitude,
direction, and speed of each radar blip taken as input to the network.
The output would be the air traffic controller's instruction in response
to each blip.
Application of Neural Networks
• Appraisal and valuation of property, buildings, automobiles,
machinery, etc. should be an easy task for a NN.

• Criminal sentencing could be predicted using a large sample of
crime details as input and the resulting sentences as output.

• Data mining, cleaning, and validation could be achieved by
determining which records suspiciously diverge from the pattern of
their peers.

• Employee hiring could be optimized if the NN were able to predict
which job applicants would show the best job performance.

• Expert consultants could package their intuitive expertise into
a NN.

• Fraud detection regarding credit cards, insurance, or taxes could
be automated using a NN analysis of past incidents.
Application of Neural Networks
• Handwriting and typewriting could be recognized by imposing a grid
over the writing; each square of the grid then becomes an input to
the NN. This is called "Optical Character Recognition".

• Staff scheduling requirements for restaurants, retail stores, police
stations, banks, etc. could be predicted based on customer flow, day
of week, paydays, holidays, weather, season, etc.

• Traffic flows could be predicted so that signal timing could be
optimized. The NN could recognize "a weekday morning rush hour
during a school holiday" or "a typical winter Sunday morning".
Advantages & Disadvantages of ANN:
Advantages:

1. A NN can perform tasks that a linear program cannot.
2. When an element of the NN fails, it can continue without any
problem, owing to its parallel nature.
3. A NN learns and does not need to be reprogrammed.
4. It can be implemented in any application.
5. It can be implemented without any problem.

Disadvantages:

1. The NN needs training to operate.
2. The architecture of a NN is different from the architecture of
microprocessors and therefore needs to be emulated.
3. It requires high processing time for large NNs.
Biological Neuron
Cntd…

• An ANN resembles the brain in two aspects:

1. Knowledge is acquired by the network through a
learning process,
2. Inter-neuron connection strengths known as synaptic
weights are used to store the knowledge.

• An artificial neuron is characterized by:

1. Architecture (connections between neurons)
2. Training or learning (determining weights on the
connections)
3. Activation function
Biological Neuron

• Cell structures:
• Soma
• Dendrites
• Axon
• Synaptic terminals (synapses)
A Typical Nerve Cell
• Dendrite: receives the signals from
other neurons (accepts inputs).

• Soma: sums all the incoming signals
(processes the inputs).

• Axon: when a particular amount of
input is received, the cell fires. It
transmits the signal through the axon to
other cells (turns the processed inputs
into outputs).

• Synapses: a biochemical device
which converts a pre-synaptic electrical
signal into a chemical signal and then
back into a post-synaptic electrical
signal (the electrochemical contact
between neurons).
Operation:
• Strength of synaptic connections decide the net
signal coming to the cell body.
• Free parameters refer to the strength of the
synapse.

• Every input is associated with a synaptic strength.

• Assuming some a-priori synaptic strengths, a cell
computes the response.
• If the response differs from the desired value, the
synaptic strengths are adjusted.
• This process may take many iterations.
Model of a Neuron
Three basic elements of the neuronal model:
• A set of synapses or connecting links, each of which is
characterized by a weight or strength of its own.
- Specifically, a signal xj at the input of synapse j connected to
neuron k is multiplied by the synaptic weight wkj.

- Unlike a synapse in the brain, the synaptic weight of an
artificial neuron may lie in a range that includes negative as
well as positive values.

• An adder for summing the input signals, weighted by the
respective synapses of the neuron; the operations described
here constitute a linear combiner.

• An activation function for limiting the amplitude of the output
of a neuron.
Electrical Model of a Neuron

y_ink = bk + Σ(i=1..m) xi·wki = bk + x1·wk1 + x2·wk2 + ... + xm·wkm

yk = f(y_ink)
Effect of bias
• The neuronal model also includes
an externally applied bias, denoted
by bk
• The bias bk has the effect of
increasing or lowering the net input
of the activation function, depending
on whether it is positive or negative,
respectively
•The use of bias bk , has the effect of
applying an affine transformation to the
output y_ink of the linear combiner in
the model
• An affine transformation is any
transformation that preserves
collinearity (i.e., all points lying on a line
initially still lie on a line after
transformation)
•Depending on whether the bias bk is positive or negative, the
relationship between the induced local field or activation potential
y_ink of neuron k and the linear combiner output uk is modified
Another nonlinear model of a neuron;
wk0 accounts for the bias bk.
Artificial Neuron Model

y_ink = Σ(i=0..m) xi·wki = (1)·wk0 + x1·wk1 + x2·wk2 + ... + xm·wkm
(with x0 = 1, so that wk0 plays the role of the bias bk)

yk = f(y_ink)
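A sketch of this single-neuron forward pass in Python (the
binary-step activation is one of several choices listed later;
the weight and bias values are illustrative):

```python
def neuron_output(x, w, b, f):
    # y_in = b + sum_i x_i * w_i ; y = f(y_in)
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return f(y_in)

step = lambda v: 1 if v >= 0 else 0  # binary step activation

print(neuron_output([1, 0], [0.6, 0.6], -1.0, step))  # 0
print(neuron_output([1, 1], [0.6, 0.6], -1.0, step))  # 1
```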
ANN Topologies
ANN Architectures

1. single-layer feed-forward network;
2. multilayer feed-forward network;
3. single node with its own feedback;
4. single-layer recurrent network;
5. multilayer recurrent network;
6. competitive network.

ANN Architectures

1. Single-layer feed-forward network.
2. Multilayer feed-forward network: a fully connected feed-forward
network with one hidden layer and one output layer.
4. Single-layer recurrent network: a recurrent network with no
self-feedback loops and no hidden neurons, where

Si = Σ(j=1..n) Wji·Xj

5. Multilayer recurrent network: a recurrent network with hidden
neurons.
6. Competitive network.
Learning
• The main property of an ANN is its capability to learn

1. Parameter Learning:
It updates the connecting weights in a neural net

2. Structure Learning:
It focuses on the change in network structure
Learning Paradigms
LEARNING WITH A TEACHER (SUPERVISED or ASSOCIATIVE LEARNING)

Figure: The input x feeds a neural network producing the actual
output Y; an error signal generator compares Y with the desired
output D and feeds the error signal (D − Y) back to the network.
Learning Paradigms
LEARNING WITH A TEACHER (SUPERVISED or ASSOCIATIVE LEARNING)

• In this, the network is trained by providing it with input and
matching output patterns.
• These input-output pairs can be provided by an external teacher,
or by the system which contains the NN (self-supervised).

LEARNING WITHOUT A TEACHER

• There is no teacher to oversee the learning process.
• Hence there are no labeled examples of the function to be
learned by the network.

• Two sub-types:
- Reinforcement learning / neurodynamic programming
- Unsupervised or self-organized learning
Unsupervised Learning

Figure: The input X feeds the ANN (weights W), which produces
the actual output Y; no teacher signal is involved.

Reinforcement learning

Figure: The input x feeds the neural network (weights W),
producing the actual output Y; an error signal generator
compares Y with a reinforcement signal R and feeds the error
signal back to the network.
Reinforcement learning

• Learning of an input-output mapping is performed through


continued interaction with the environment in order to minimize a
scalar index of performance.

• Built around a critic that converts a primary reinforcement signal


received from the environment into a higher quality reinforcement
signal called the heuristic reinforcement signal, both of which are
scalar inputs.

• The system observes a temporal sequence of stimuli (i.e., state


vectors) also received from the environment, which eventually
result in the generation of the heuristic reinforcement signal.

• The goal of learning is to minimize a cost-to-go function, defined as


the expectation of the cumulative cost of actions taken over a
sequence of steps instead of simply the immediate cost.
Activation Functions
• To map non-linear output functions we require non-
linear activation functions

• The activation functions should be Continuous

• Monotonically increasing

• Differentiable
Monotonicity

• Monotonically increasing function

• Monotonically decreasing function

• Non Monotonic function


ACTIVATION FUNCTION
• ACTIVATION LEVEL – DISCRETE OR CONTINUOUS

* HARD LIMIT FUNCTION (DISCRETE)

• Binary activation function
• Bipolar activation function
• Identity function

* SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)

• Binary sigmoidal activation function
• Bipolar sigmoidal activation function
Activation Functions - Identity Function
(or Linear Function, or "Lazy man" operator)

f(x) = x for all x

yk = y_ink
Activation Functions - Threshold Function or Binary Step Function

yk = 1 if y_ink ≥ θ
yk = 0 otherwise

i.e., f(x) = 1 if x ≥ θ; 0 if x < θ
Activation Functions - Signum Function or Bipolar Step Function

yk = 1 if y_ink ≥ θ
yk = −1 otherwise

i.e., f(x) = 1 if x ≥ θ; −1 if x < θ
Activation Functions - Binary Sigmoidal Function
(or Logistic / Unipolar Sigmoidal Function)

f(x) = 1 / (1 + e^(−σx))

σ is the steepness parameter. The range is 0 to 1.

The derivative of f(x) is given by

f′(x) = σ·f(x)·[1 − f(x)]
Activation Functions - Bipolar Sigmoidal Function
The function is defined as:

f(x) = (1 − e^(−σx)) / (1 + e^(−σx))

σ is the steepness parameter. The range is −1 to 1.

The derivative of f(x) is given by

f′(x) = (σ/2)·[1 + f(x)]·[1 − f(x)]

The bipolar sigmoidal function is closely related to the
hyperbolic tangent function, which is written as

h(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = (1 − e^(−2x)) / (1 + e^(−2x))

The derivative of the hyperbolic tangent function is

h′(x) = [1 + h(x)]·[1 − h(x)]
Activation Functions - Ramp Function

• The ramp function is defined as:

f(x) = 1,  if x > 1
       x,  if 0 ≤ x ≤ 1
       0,  if x < 0
ACTIVATION FUNCTION

Activation functions:

(A) Identity

(B) Binary step

(C) Bipolar step

(D) Binary sigmoidal

(E) Bipolar sigmoidal

(F) Ramp
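A sketch of the activation functions listed above in Python (the
σ and θ defaults are illustrative):

```python
import math

def identity(x):
    return x

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    return 1 if x >= theta else -1

def binary_sigmoid(x, sigma=1.0):
    return 1.0 / (1.0 + math.exp(-sigma * x))

def bipolar_sigmoid(x, sigma=1.0):
    return (1.0 - math.exp(-sigma * x)) / (1.0 + math.exp(-sigma * x))

def ramp(x):
    return max(0.0, min(1.0, x))

for f in (identity, binary_step, bipolar_step, binary_sigmoid,
          bipolar_sigmoid, ramp):
    print(f.__name__, f(0.5))
```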
IMP terminologies in ANN
Weights:
• Each neuron is connected to other neurons by means of directed
communication links, associated with weights.
• A weight contains information about the input signal.

Threshold:
• It is a set value based upon which the final output of the network
may be calculated.
• It is used in the activation function.
• A comparison is made between the calculated net input and the
threshold to obtain the network output.
• For each and every application, there is a threshold limit.

f(net) = 1 if net ≥ θ
        −1 if net < θ
IMP terminologies in ANN
Weight matrix or weight connection matrix:

Consider a recurrent network with 3 neurons. The network
weights at the current time instant are given by the following
symmetric matrix:

      w11 w12 w13       0   −0.4   0.2
W  =  w21 w22 w23  =  −0.4    0    0.5
      w31 w32 w33      0.2   0.5    0

Learning rate:
• The learning rate is denoted by "α". It is used to control the
amount of weight adjustment at each step of training.
• The learning rate, ranging from 0 to 1, determines the rate of
learning at each time step.
IMP terminologies in ANN
Bias:
The bias included in the network has its impact in calculating the
net input. The bias is included by adding a component x0 = 1 to
the input vector X. Thus, the input vector becomes
X = (1, x1, x2, ..., xn),
and the bias acts like the intercept c in the line y = mx + c
(with the weight playing the role of the slope m).

y_inj = Σ(i=0..n) xi·wij = x0·w0j + x1·w1j + x2·w2j + ... + xn·wnj
      = w0j + Σ(i=1..n) xi·wij
      = bj + Σ(i=1..n) xi·wij
Advantages of bias over threshold

• Using a bias can remove the dependency on a
threshold.
• We can modify the bias (more adaptable), whereas a
threshold is fixed.
• Bias helps in simplifying the separation
boundaries.
• Computational flexibility and ease are greater if we
use a bias, as thresholds may be different for
different output neurons.
• Hence the use of bias is preferable.
McCULLOCH–PITTS NEURON - MP
• Neurons are sparsely and randomly connected.

• The firing state is binary (1 = firing, 0 = not firing).

• All but one neuron are excitatory (they tend to increase the
voltage of other cells).

• One inhibitory neuron connects to all other neurons.

• It functions to regulate network activity (prevents too many
firings).
Mc Culloch-Pitts Neuron
• McCulloch-Pitts
neuron Y may
receive signals from
any number of
other neurons

• Each connection
path is either
excitatory, with
weight w > 0, or
inhibitory, with
weight - p (p > 0)
Cntd…
• The condition that inhibition is absolute requires that θ
for the activation function satisfy the inequality

θ > n·w − p

• Y will fire if it receives k or more excitatory inputs and no
inhibitory inputs, where

k·w ≥ θ > (k − 1)·w

• Although all excitatory weights coming into any particular unit
must be the same, the weights coming into one unit, say Y1,
do not have to be the same as the weights coming into
another unit, say Y2.
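A minimal sketch of a McCulloch-Pitts unit in Python, here wired
for the two-input AND function (the weights w = 1 and threshold
θ = 2 are the usual choice for AND; they are an illustrative
assignment, not fixed by the slides):

```python
def mp_neuron(inputs, weights, theta):
    # Fires (outputs 1) iff the weighted sum reaches the threshold.
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= theta else 0

# AND function: w1 = w2 = 1, theta = 2 (fires only when both inputs are 1).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], [1, 1], theta=2))
```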
LINEAR SEPARABILITY

 Linear separability is the concept wherein the separation of the


input space into regions is based on whether the network response
is positive or negative.

 Consider a network having


positive response in the first
quadrant and negative response
in all other quadrants (AND
function) with either binary or
bipolar data, then the decision
line is drawn separating the
positive response region from
the negative response region.
Cntd…

• The output is

yin = b + Σ(i=1..n) xi·wi

• There exists a boundary between the regions where yin > 0 and
yin < 0. This boundary is called the decision boundary and can be
determined by the relation

b + Σ(i=1..n) xi·wi = 0

• If there exist weights (with bias) for which all the training input
vectors having positive response (+1) lie on one side of the
decision boundary and all the other vectors having negative
response (−1) lie on the other side, then we conclude that the
problem is "linearly separable".

For two inputs:
yin = b + x1·w1 + x2·w2
b + x1·w1 + x2·w2 = 0
Cntd…
• If w2 ≠ 0, then

x2 = −(w1/w2)·x1 − (b/w2)

• The requirement for a positive response of the net is

b + x1·w1 + x2·w2 > 0

If a threshold value is used instead of a bias:
net input received > θ (threshold)
yin > θ
x1·w1 + x2·w2 > θ

The separating line equation will then be

x1·w1 + x2·w2 = θ
x2 = −(w1/w2)·x1 + θ/w2   (with w2 ≠ 0)
AND Function using MP model

Figure: Worked example of the AND function using the MP model.

AND-NOT Function using MP model

Figure: Worked example of the AND-NOT function using the MP
model.

XOR Function using MP model

Figure: Worked example of the XOR function using the MP model
(XOR is realized by composing two layers of MP units).

LINEAR SEPARABILITY: OR Function

Figure: Decision boundary for the OR function.
Linear neuron model: application of the linear
neuron for linear regression
Linear Regression:
• In statistics, linear regression is an
approach to modeling the relationship
between a scalar variable y and one or
more explanatory variables denoted X.
• Once we know the equation of this
fitted line, we can use it to find y given
X.
• Say we have conducted an
experiment that gives us some output
value for a set of inputs.
• There are various conventional
methods to do this, e.g., ordinary least
squares.
• The best-fit line has the equation:
Y = mX + c
• We have a choice of two parameters:
m and c.
Cntd…
• A linear neuron can be used for this task,
since the output of a linear neuron is:
y = w20·x0 + w21·x1 (i.e., y = w0 + w1·x1 with x0 = 1)
• If we keep x0 = 1, then w21 (w1) is the slope
and w20 (w0) is the intercept; hence we
have to modify w21 (w1) and w20 (w0)
to obtain the best-fit line.
• If we make a good choice of w21 (w1) and
w20 (w0), our job is over!
• Many real-world problems involve more
than one independent variable; the
regression is then called multiple regression.
• For example, if we have two independent
variables, it becomes a 2-D problem and the
network would be different: we then have to
adjust two slopes and one intercept.
• This can be extended to solve
n-dimensional problems.
Concept of error energy

• We can form a combined error measure by squaring
and adding all the errors and dividing by the number
of observations, giving the MEAN SQUARE ERROR (MSE);
this, in fact, is the ERROR ENERGY.

• The MSE shows how well (or badly) the line is fitted!

Cntd…

• Considering a 1-D problem, the error energy at a point p is:

ep = (dp − yp)^2 = (target output − calculated output)^2

• The total error energy is given by:

E = (1/N) · Σp (dp − yp)^2

• Why do we have to measure this error energy?
• In order to get the best fit, the error energy should be minimum.
Gradient descent algorithm:

Figure: Gradient descent — the weights are updated in the
direction of the negative gradient of the error energy.
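The slides present the update rule only in figures; a minimal
sketch of gradient descent for a single linear neuron (the
learning rate and data below are illustrative assumptions) that
minimizes the MSE defined above:

```python
# Fit y = w1*x + w0 by gradient descent on the MSE (error energy).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ds = [1.1, 2.9, 5.2, 7.1, 8.8]   # targets, roughly y = 2x + 1

w0, w1, lr = 0.0, 0.0, 0.01      # intercept, slope, learning rate
for _ in range(5000):
    # Gradients of E = (1/N) * sum (d - y)^2 w.r.t. w0 and w1.
    n = len(xs)
    g0 = sum(-2 * (d - (w1 * x + w0)) for x, d in zip(xs, ds)) / n
    g1 = sum(-2 * (d - (w1 * x + w0)) * x for x, d in zip(xs, ds)) / n
    w0 -= lr * g0
    w1 -= lr * g1

print(w1, w0)  # close to slope 2 and intercept 1
```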
Hebb network
• "When an axon of cell A is near enough to excite cell B, and
repeatedly or persistently takes part in firing it, some growth
process or metabolic change takes place in one or both cells such
that A's efficiency, as one of the cells firing B, is increased."

• The weight update in the Hebb rule is given by

wi(new) = wi(old) + xi·y

• It is better suited to bipolar data than binary data:
* Bipolar: the equation is satisfied.
* Binary: the equation is not satisfied for the two training pairs
(on, off) and (off, off).
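A sketch of Hebb training for the AND function with bipolar inputs
and targets (the classic worked example; the single-pass loop
mirrors the flowchart that follows):

```python
# Hebb rule: w_i(new) = w_i(old) + x_i * y, b(new) = b(old) + y.
samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]

w, b = [0, 0], 0
for x, y in samples:          # one pass over the training pairs
    w = [wi + xi * y for wi, xi in zip(w, x)]
    b = b + y

print(w, b)  # [2, 2], -2 — matching the final weights in the slides
```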
Hebb Network – Flowchart of training algorithm

• The training algorithm is used for the calculation and
adjustment of weights.

• The flowchart of the training algorithm of Hebb's network is
shown in the figure.

• s:t refers to each training input and target output pair.

Hebb Network – Training algorithm

Figure: Flowchart and step-by-step training algorithm of the
Hebb network.
Hebb Problem

Figure: Worked Hebb-training example — weight updates after
each training pair, with the corresponding decision boundaries.

• The decision boundary after the early training pairs
differentiates only the first and fourth inputs; not all negative
responses are separated from the positive responses.
• After further updates, the decision boundary separates (1, 1)
from (1, −1) and (−1, −1), but not from (−1, 1).
Hebb Problem

• The decision boundary is the same for the third and fourth
training pairs.
• The decision boundary line obtained from these input training
pairs separates the positive response region from the
negative response region.
• Hence, the weights obtained from this are the final weights,
and are given by

w1 = 2, w2 = 2, b = −2
Delta network

Figure: The delta network and the delta learning rule.
Perceptron Networks
• These are single-layer feed-forward networks called simple
perceptrons.
• The perceptron was designed by Rosenblatt in 1962 and analyzed
by Minsky-Papert (1969, 1988).
• It was basically discovered by Block in 1962.

• Mainly it consists of three units:
1. Sensory unit
2. Associator unit
3. Response unit

Inputs xi → Sensory unit → Associator unit → Response unit → desired o/p
PERCEPTRON NETWORKS
• Linear threshold unit (LTU)

Figure: Inputs x1, ..., xn with weights w1, ..., wn (and bias
weight w0) feed a summing unit computing Σ(i=0..n) wi·xi,
followed by a hard limiter:

o = f(Σ(i=0..n) wi·xi), where
f(net) = 1 if net > 0, and −1 otherwise.
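A sketch of the perceptron training rule (w ← w + α·t·x applied
on misclassification, with bipolar targets; the AND data and the
learning rate are illustrative):

```python
# Perceptron learning for bipolar AND. Bias handled as weight w0 with x0 = 1.
samples = [([1, 1, 1], 1), ([1, 1, -1], -1), ([1, -1, 1], -1), ([1, -1, -1], -1)]

w = [0.0, 0.0, 0.0]
alpha = 1.0
for _ in range(10):                      # epochs
    for x, t in samples:
        net = sum(wi * xi for wi, xi in zip(w, x))
        y = 1 if net > 0 else -1
        if y != t:                       # update only on misclassification
            w = [wi + alpha * t * xi for wi, xi in zip(w, x)]

print(w)  # a separating weight vector for AND
```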
Flowchart and training algorithm

Figure: Flowchart of the perceptron training algorithm, followed
by worked problems.
Thank you
