Soft Computing Unit I


Soft Computing

Unit I
Introduction to Soft
Computing
Introduction to Soft Computing
 Concept of computation
 Hard computing
 Soft computing
 How soft computing?

 Hard computing vs. Soft computing

 Hybrid computing
CONCEPT OF COMPUTATION

Figure: Basics of computing — an antecedent (input) is mapped by the
computing block to a consequent (output); the mapping is the control
action.
Important characteristics of
computing
• Should provide a precise solution.

• Control action should be unambiguous and accurate.

• Suitable for problems that are easy to model mathematically.

Hard computing
• In 1996, L. A. Zadeh (LAZ) introduced the term hard computing.
• According to LAZ, we term a computing approach hard computing if:
• A precise result is guaranteed.
• The control action is unambiguous.
• The control action is formally defined (i.e., with a mathematical
model or algorithm).
Examples of hard computing
• Solving numerical problems (e.g., roots of polynomials,
integration, etc.).
• Searching and sorting techniques.
• Solving computational geometry problems (e.g., shortest
tour in a graph, finding the closest pair among a given set
of points, etc.), and many more.
Soft computing
• The term soft computing was proposed by the inventor
of fuzzy logic, Lotfi A. Zadeh. He describes it as
follows.

Definition: Soft computing

Soft computing is a collection of methodologies that aim to
exploit the tolerance for imprecision and uncertainty to achieve
tractability, robustness, and low solution cost. Its principal
constituents are fuzzy logic, neuro-computing, and probabilistic
reasoning. The role model for soft computing is the human
mind.
Characteristics of soft
computing
• It does not require a mathematical model of the problem
being solved.
• It may not yield a precise solution.

• Algorithms are adaptive (i.e., they can adjust to changes
in a dynamic environment).

• It uses biologically inspired methodologies such as
genetics, evolution, ant behavior, particle swarming, the
human nervous system, etc.
Examples of soft computing

Example: Handwritten character recognition (Artificial Neural
Networks)

Example: Money allocation problem — choosing the bank with
maximum return (Evolutionary Computing)

Example: Robot movement (Fuzzy Logic)


How soft computing?
How does a student learn from his teacher?
• The teacher asks questions and then tells the answers.
• The teacher puts questions, hints at answers, and asks
whether the student's answers are correct or not.
• The student thus learns a topic and stores it in his memory.
Based on this knowledge he solves new problems.
• This is the way the human brain works.
• Based on this concept, the Artificial Neural Network is used
to solve problems.
How soft computing?
How does the world select the best?
• It starts with a (random) population.
• It reproduces another population (the next generation).
• It ranks the population and selects the superior individuals.
The genetic algorithm is based on this natural phenomenon;
a minimal sketch follows.
• Population is synonymous with solutions.
• Selection of superior solutions is synonymous with
exploring the optimal solution.
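A minimal sketch of this select-and-reproduce loop in Python (the
fitness function, mutation scheme, and population size here are
illustrative assumptions, not part of the slides):

```python
import random

def evolve(fitness, pop_size=20, generations=50):
    # Start with a random population (here: real numbers in [-10, 10]).
    population = [random.uniform(-10, 10) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank the population and keep the superior half.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        # Reproduce the next generation by mutating the parents.
        children = [p + random.gauss(0, 0.5) for p in parents]
        population = parents + children
    return max(population, key=fitness)

# Example: maximize f(x) = -(x - 3)^2, optimum at x = 3.
best = evolve(lambda x: -(x - 3) ** 2)
print(best)
```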
How soft computing?
How does a doctor treat his patient?
• The doctor asks the patient about his suffering.
• The doctor finds the symptoms of the disease.
• The doctor prescribes tests and medicines.
This is exactly the way Fuzzy Logic works.
• Symptoms are correlated with diseases with
uncertainty.
• The doctor prescribes tests/medicines fuzzily.
Advantages of soft
computing
• Only simple mathematical calculations are performed.
• Good efficiency.
• Applicable in real time.
• Based on human reasoning.
Disadvantages of soft
computing
• It gives an approximate output value.
• If a small error occurs, the entire system may stop working; to
overcome this, the entire system must be corrected from the
beginning, which is a time-consuming process.
Hard computing vs. Soft
computing
Hard computing:
• It requires a precisely stated analytical model and often a
lot of computation time.
• It is based on binary logic, crisp systems, numerical
analysis, and crisp software.
• It has the characteristics of precision and categoricity.
• It is deterministic.
• It requires exact input data.
• It is strictly sequential.
• It produces precise answers.

Soft computing:
• It is tolerant of imprecision, uncertainty, partial truth, and
approximation.
• It is based on fuzzy logic, neural nets, and probabilistic
reasoning.
• It has the characteristics of approximation and
dispositionality.
• It is non-deterministic.
• It can deal with ambiguous and noisy data.
• It allows parallel computations.
• It does not guarantee precise answers.
Hybrid computing
It is a combination of conventional hard computing and
emerging soft computing.

Figure: Concept of hybrid computing — hard computing (HC)
and soft computing (SC) combined.


What is Fuzzy logic?

Fuzzy logic is a mathematical language to express something.
This means it has grammar, syntax, and semantics like a
language for communication.

There are some other well-known mathematical languages:

• Relational algebra (operations on sets)
• Boolean algebra (operations on Boolean variables)
• Predicate logic (operations on well-formed formulae (wff),
also called predicate propositions)

Fuzzy logic deals with fuzzy sets.


A brief history of Fuzzy Logic

First introduced by Lotfi A. Zadeh (1965),
University of California, Berkeley, USA.

He is fondly nicknamed LAZ.


A brief history of Fuzzy logic

The dictionary meaning of fuzzy is "not clear", "noisy", etc.

Example: Is the picture on this slide fuzzy?

The antonym of fuzzy is crisp.

Example: Are the chips crisp?
Example: Fuzzy logic vs. Crisp logic

Crisp answer: Yes or No, True or False.

Figure: "Is the liquid colorless?" — for a milk/water liquid, a
crisp system answers either Yes or No.
Example: Fuzzy logic vs. Crisp logic

Fuzzy answer: May be, May not be, Absolutely, Partially, etc.
Example: Fuzzy logic vs. Crisp logic

"Is the person honest?" — a fuzzy answer grades each person
(Ankit, Rajesh, Santosh, Kabita) on a score scale:

Score 99: Extremely honest
Score 75: Very honest
Score 55: Honest at times
Score 35: Extremely dishonest
The world is fuzzy!

Our world is better described fuzzily!
Concept of fuzzy system

Figure: A fuzzy system takes fuzzy element(s), fuzzy set(s),
and fuzzy rule(s) as INPUT, applies fuzzy implication(s)
(inferences), and produces the OUTPUT.
Concept of fuzzy set
To understand the concept of a fuzzy set, it is better if we first
recall the idea of a crisp set.
X = The entire population of India.
H = All Hindu population = { h1, h2, h3, ... , hL }
M = All Muslim population = { m1, m2, m3, ... , mN }

Figure: Universe of discourse X containing the sets H and M.

Here, all are sets of a finite number of individuals. Such a
set is called a crisp set.
Example of fuzzy set

Now let us discuss fuzzy sets. X = All students in IT60108.
S = All good students.
S = { (s, g(s)) | s ∈ X }, where g(s) is a measurement of the
goodness of the student s.
Example:
S = { (Rajat, 0.8), (Kabita, 0.7), (Salman, 0.1), (Ankit, 0.9) },
etc.
Fuzzy set vs. Crisp set

Crisp Set:
1. S = { s | s ∈ X }
2. It is a collection of elements.
3. Inclusion of an element s ∈ X into S is crisp, i.e., it has a
strict boundary: yes or no.

Fuzzy Set:
1. F = { (s, µ(s)) | s ∈ X }, where µ(s) is the degree of
membership of s.
2. It is a collection of ordered pairs.
3. Inclusion of an element s ∈ X into F is fuzzy, i.e., if present,
then with a degree of membership.
Fuzzy set vs. Crisp set

Note: A crisp set is a fuzzy set, but a fuzzy set is not
necessarily a crisp set.

Example:
H = { (h1, 1), (h2, 1), ... , (hL, 1) }
Person = { (p1, 1), (p2, 0), ... , (pN, 1) }
In the case of a crisp set, the elements have extreme values
of degree of membership, namely either 1 or 0.

How to decide the degree of membership of elements in a
fuzzy set?

City: Bangalore  Bombay  Hyderabad  Kharagpur  Madras  Delhi
DoM:    0.95      0.90     0.80       0.01      0.65    0.75

How can the cities of comfort be judged?


Example: Course evaluation in a crisp way

EX = Marks ≥ 90
A = 80 ≤ Marks < 90
B = 70 ≤ Marks < 80
C = 60 ≤ Marks < 70
D = 50 ≤ Marks < 60
P = 35 ≤ Marks < 50
F = Marks < 35

Example: Course evaluation in a crisp way

Figure: Crisp grading — each grade (F, P, D, C, B, A, EX) has
membership 1 on its marks interval (boundaries at 35, 50, 60,
70, 80, 90, 100) and 0 elsewhere.

Example: Course evaluation in a fuzzy way

Figure: Fuzzy grading — the grade regions (F, P, D, C, B, A, EX)
overlap, and membership varies gradually between 0 and 1
across the marks axis.
Few examples of fuzzy set

High Temperature

Low Pressure

Color of Apple

Sweetness of Orange

Weight of Mango

Note: Degree of membership values lie in the range [0, 1].


Some basic terminologies and notations

Definition 1: Membership function (and fuzzy set)

If X is a universe of discourse and x ∈ X, then a fuzzy set A in
X is defined as a set of ordered pairs, that is,
A = { (x, µA(x)) | x ∈ X }, where µA(x) is called the membership
function for the fuzzy set A.

Note:
µA(x) maps each element of X onto a membership grade (or
membership value) between 0 and 1 (both inclusive).

Question:
How (and who) decides µA(x) for a fuzzy set A in X?
Some basic terminologies and notations

Example:
X = All cities in India
A = City of comfort
A = {(New Delhi, 0.7), (Bangalore, 0.9), (Chennai, 0.8),
(Hyderabad, 0.6), (Kolkata, 0.3), (Kharagpur, 0)}
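A discrete fuzzy set like A maps naturally onto a Python dictionary;
this small sketch (city names and values taken from the example
above) shows membership lookup, treating absent elements as
membership 0:

```python
# Fuzzy set "City of comfort" as element -> degree of membership.
A = {"New Delhi": 0.7, "Bangalore": 0.9, "Chennai": 0.8,
     "Hyderabad": 0.6, "Kolkata": 0.3, "Kharagpur": 0.0}

def mu(fuzzy_set, x):
    # Elements outside the set have membership 0.
    return fuzzy_set.get(x, 0.0)

print(mu(A, "Bangalore"))  # 0.9
print(mu(A, "Mumbai"))     # 0.0 (not listed)
```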
Membership function with discrete membership values

The membership values may take discrete values.

Either the elements or their membership values (or both)
may be discrete.

A = {(0, 0.1), (1, 0.30), (2, 0.78), ..., (10, 0.1)}

Figure: µ plotted against the number of children (X = discrete
values). A = "Happy family". How do you measure happiness?
Membership function with continuous membership values

µB(x) = 1 / (1 + ((x − 50) / 10)^4)

Figure: µB plotted against Age (X). Note: x takes real values
(X = R+). B = "Middle aged".
Fuzzy terminologies: Support

Support: The support of a fuzzy set A is the set of all points
x ∈ X such that µA(x) > 0.

support(A) = { x | µA(x) > 0 }
Fuzzy terminologies: Core

Core: The core of a fuzzy set A is the set of all points x ∈ X
such that µA(x) = 1.

core(A) = { x | µA(x) = 1 }
Fuzzy terminologies: Normality

Normality: A fuzzy set A is normal if its core is non-empty.
In other words, we can always find a point x ∈ X such that
µA(x) = 1.
Fuzzy terminologies: Crossover points

Crossover point: A crossover point of a fuzzy set A is a point
x ∈ X at which µA(x) = 0.5. That is,
crossover(A) = { x | µA(x) = 0.5 }.
Fuzzy terminologies: Fuzzy Singleton

Fuzzy singleton: A fuzzy set whose support is a single point
in X with µA(x) = 1 is called a fuzzy singleton.
Fuzzy terminologies: α-cut and strong α-cut

The α-cut of a fuzzy set A is a crisp set defined by

Aα = { x | µA(x) ≥ α }

The strong α-cut is defined similarly:

Aα′ = { x | µA(x) > α }

Note: support(A) = A0′ and core(A) = A1.
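For a discrete fuzzy set represented as a dictionary (as in the
earlier sketch), these terminologies reduce to one-line filters; a
small illustrative sketch:

```python
def alpha_cut(fuzzy_set, alpha, strong=False):
    # Crisp set of elements with membership >= alpha (> alpha if strong).
    if strong:
        return {x for x, m in fuzzy_set.items() if m > alpha}
    return {x for x, m in fuzzy_set.items() if m >= alpha}

def support(fuzzy_set):
    return alpha_cut(fuzzy_set, 0, strong=True)   # A0'

def core(fuzzy_set):
    return alpha_cut(fuzzy_set, 1)                # A1

A = {"x1": 0.2, "x2": 0.5, "x3": 1.0}
print(support(A))           # {'x1', 'x2', 'x3'}
print(core(A))              # {'x3'}
print(alpha_cut(A, 0.5))    # {'x2', 'x3'}
```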


Fuzzy terminologies: Convexity
Convexity: A fuzzy set A is convex if and only if for any
x1, x2 ∈ X and any λ ∈ [0, 1],

µA(λx1 + (1 − λ)x2) ≥ min(µA(x1), µA(x2))

Note:
• A is convex if all its α-level sets are convex.
• Convexity =⇒ each Aα is composed of a single line
segment only.

Figure: A convex membership function vs. a non-convex
membership function.
Fuzzy terminologies: Bandwidth

Bandwidth:

For a normal and convex fuzzy set, the bandwidth (or width)
is defined as the distance between the two unique crossover
points:
Bandwidth(A) = | x1 − x2 |
where µA(x1) = µA(x2) = 0.5.
Fuzzy terminologies: Symmetry

Symmetry:

A fuzzy set A is symmetric if its membership function is
symmetric around a certain point x = c, namely
µA(c + x) = µA(c − x) for all x ∈ X.
Fuzzy terminologies: Open and Closed
A fuzzy set A is:
Open left:
if lim(x→−∞) µA(x) = 1 and lim(x→+∞) µA(x) = 0
Open right:
if lim(x→−∞) µA(x) = 0 and lim(x→+∞) µA(x) = 1
Closed:
if lim(x→−∞) µA(x) = lim(x→+∞) µA(x) = 0

Figure: Examples of open left, closed, and open right
membership functions.
Fuzzy vs. Probability

Fuzzy: When we speak about the certainty of a thing.

Example: A patient comes to the doctor, and the doctor has to
diagnose him so that medicine can be prescribed.
The doctor prescribes a medicine with certainty 60% that the
patient is suffering from flu. So, the disease will be cured
with certainty 60% and uncertainty 40%. Here, instead of
flu, other diseases with some other certainties may be present.

Probability: When we speak about the chance of an event
occurring.

Example: "India will win the T20 tournament with a 60%
chance" means that out of 100 matches, India wins 60.
Prediction vs. Forecasting

The fuzzy vs. probability distinction is analogous to
prediction vs. forecasting.

Prediction: When you start guessing about things.

Forecasting: When you take information from past jobs and
apply it to a new job.

The main difference:

Prediction is based on a best guess from experience.
Forecasting is based on data you have actually recorded
from previous jobs.
Fuzzy Membership Functions

Fuzzy membership functions
A fuzzy set is completely characterized by its membership
function (sometimes abbreviated as MF and denoted as µ).
So, it is important to learn how a membership function can
be expressed (mathematically or otherwise).
Note: A membership function can be defined on
a) a discrete universe of discourse, or
b) a continuous universe of discourse.
Example:

Figure: A = fuzzy set of "Happy family", µA over the number
of children (discrete X); B = "Young age", µB over age
(continuous X).


Fuzzy membership functions
So, a membership function on a discrete universe of discourse
is trivial. However, a membership function on a continuous
universe of discourse needs special attention.
The following figures show typical examples of membership
functions.

Figure: Typical membership function shapes — triangular,
trapezoidal, curve, and non-uniform.


Fuzzy MFs: Formulation and parameterization
In the following, we parameterize the different MFs on a
continuous universe of discourse.
Triangular MFs: A triangular MF is specified by three
parameters {a, b, c} and can be formulated as follows:

triangle(x; a, b, c) =
  0,                 if x ≤ a
  (x − a)/(b − a),   if a ≤ x ≤ b        (1)
  (c − x)/(c − b),   if b ≤ x ≤ c
  0,                 if c ≤ x

Figure: A triangular MF with break points a, b, c.
Fuzzy MFs: Trapezoidal
A trapezoidal MF is specified by four parameters {a, b, c, d}
and can be defined as follows:

trapezoid(x; a, b, c, d) =
  0,                 if x ≤ a
  (x − a)/(b − a),   if a ≤ x ≤ b
  1,                 if b ≤ x ≤ c        (2)
  (d − x)/(d − c),   if c ≤ x ≤ d
  0,                 if d ≤ x

Figure: A trapezoidal MF with break points a, b, c, d.
Fuzzy MFs: Gaussian

A Gaussian MF is specified by two parameters {c, σ} and can
be defined as below:

gaussian(x; c, σ) = e^(−(1/2)·((x − c)/σ)^2)

Figure: A Gaussian MF centered at c; σ controls the spread.
Fuzzy MFs: Generalized bell

It is also called the Cauchy MF. A generalized bell MF is
specified by three parameters {a, b, c} and is defined as:

bell(x; a, b, c) = 1 / (1 + |(x − c)/a|^(2b))

Figure: A generalized bell MF centered at c with half-width a;
the slope at the crossover points x = c − a and x = c + a is
±b/(2a).
Example: Generalized bell MFs

Example: µ(x) = 1 / (1 + x^2);
that is, a = b = 1 and c = 0.

Figure: The bell MF 1/(1 + x^2), peaking at x = 0.
Generalized bell MFs: Different shapes

Figure: Effect on the bell MF shape of changing a, changing b,
changing c, and changing a and b together.
Fuzzy MFs: Sigmoidal MFs

Parameters: {a, c}, where c = crossover point and a = slope
at c:

sigmoid(x; a, c) = 1 / (1 + e^(−a(x − c)))

Figure: A sigmoidal MF with crossover value 0.5 at x = c.
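A sketch of these parameterized MFs in Python, following the
formulas above (vectorization via NumPy is an implementation
choice, not something the slides prescribe):

```python
import numpy as np

def triangle(x, a, b, c):
    # Rises on [a, b], falls on [b, c], zero outside.
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

def trapezoid(x, a, b, c, d):
    # Rises on [a, b], flat at 1 on [b, c], falls on [c, d].
    up = (x - a) / (b - a)
    down = (d - x) / (d - c)
    return np.maximum(np.minimum(np.minimum(up, 1.0), down), 0.0)

def gaussian(x, c, sigma):
    return np.exp(-0.5 * ((x - c) / sigma) ** 2)

def bell(x, a, b, c):
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))

def sigmoid(x, a, c):
    return 1.0 / (1.0 + np.exp(-a * (x - c)))

x = np.linspace(0, 100, 5)
print(bell(x, 20, 2, 50))  # peaks at x = 50
```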
Fuzzy MFs: Example

Example: Consider the following grading system for a course:

Excellent = Marks ≥ 90
Very good = 75 ≤ Marks ≤ 90
Good = 60 ≤ Marks ≤ 75
Average = 50 ≤ Marks ≤ 60
Poor = 35 ≤ Marks ≤ 50
Bad = Marks ≤ 35
Grading System

A fuzzy implementation will look like the following.

Figure: Overlapping fuzzy grades (Bad, Poor, Average, Good,
Very Good, Excellent) over the marks axis (roughly 10 to 100),
with membership values between 0 and 1.

You can choose a standard fuzzy MF for each of the fuzzy
grades.
Operations on Fuzzy Sets
Basic fuzzy set operations: Union

Union (A ∪ B):

µA∪B(x) = max{ µA(x), µB(x) }

Example:
A = {(x1, 0.5), (x2, 0.1), (x3, 0.4)} and
B = {(x1, 0.2), (x2, 0.3), (x3, 0.5)};
C = A ∪ B = {(x1, 0.5), (x2, 0.3), (x3, 0.5)}

Figure: µA, µB, and µA∪B — the union takes the pointwise
maximum of the two membership curves.
Basic fuzzy set operations: Intersection

Intersection (A ∩ B):

µA∩B(x) = min{ µA(x), µB(x) }

Example:
A = {(x1, 0.5), (x2, 0.1), (x3, 0.4)} and
B = {(x1, 0.2), (x2, 0.3), (x3, 0.5)};
C = A ∩ B = {(x1, 0.2), (x2, 0.1), (x3, 0.4)}

Figure: µA, µB, and µA∩B — the intersection takes the
pointwise minimum of the two membership curves.
Basic fuzzy set operations: Complement

Complement (Aᶜ):

µAᶜ(x) = 1 − µA(x)

Example:
A = {(x1, 0.5), (x2, 0.1), (x3, 0.4)}
C = Aᶜ = {(x1, 0.5), (x2, 0.9), (x3, 0.6)}

Figure: µA and its complement µAᶜ = 1 − µA.
Basic fuzzy set operations: Products

Algebraic product or vector product (A•B):

µA•B(x) = µA(x) · µB(x)

Scalar product (α × A):

µαA(x) = α · µA(x)
Basic fuzzy set operations: Sum and Difference

Sum (A + B):

µA+B(x) = µA(x) + µB(x) − µA(x) · µB(x)

Difference (A − B = A ∩ Bᶜ):

µA−B(x) = µA∩Bᶜ(x)

Disjunctive sum: A ⊕ B = (Aᶜ ∩ B) ∪ (A ∩ Bᶜ)

Bounded sum: | A(x) ⊕ B(x) |

µ|A(x)⊕B(x)|(x) = min{ 1, µA(x) + µB(x) }

Bounded difference: | A(x) ⊖ B(x) |

µ|A(x)⊖B(x)|(x) = max{ 0, µA(x) + µB(x) − 1 }
Basic fuzzy set operations: Equality and Power

Equality (A = B):

µA(x) = µB(x) for all x ∈ X

Power of a fuzzy set (Aα):

µAα(x) = { µA(x) }^α

If α < 1, the operation is called dilation.
If α > 1, it is called concentration.
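These pointwise operations are easy to check on the discrete
example sets used above; a small sketch (the dictionary
representation is an assumption carried over from earlier
sketches):

```python
A = {"x1": 0.5, "x2": 0.1, "x3": 0.4}
B = {"x1": 0.2, "x2": 0.3, "x3": 0.5}

union        = {x: max(A[x], B[x]) for x in A}          # max
intersection = {x: min(A[x], B[x]) for x in A}          # min
complement_A = {x: 1 - A[x] for x in A}                 # 1 - mu
alg_sum      = {x: A[x] + B[x] - A[x] * B[x] for x in A}
bounded_sum  = {x: min(1, A[x] + B[x]) for x in A}

print(union)         # {'x1': 0.5, 'x2': 0.3, 'x3': 0.5}
print(intersection)  # {'x1': 0.2, 'x2': 0.1, 'x3': 0.4}
```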
Basic fuzzy set operations: Cartesian product

Cartesian product (A × B):

µA×B(x, y) = min{ µA(x), µB(y) }

Example:
A(x) = {(x1, 0.2), (x2, 0.3), (x3, 0.5), (x4, 0.6)}
B(y) = {(y1, 0.8), (y2, 0.6), (y3, 0.3)}

A × B = min{ µA(x), µB(y) } =

        y1    y2    y3
  x1   0.2   0.2   0.2
  x2   0.3   0.3   0.3
  x3   0.5   0.5   0.3
  x4   0.6   0.6   0.3
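The relation matrix above can be reproduced with a nested min; a
brief sketch:

```python
A = {"x1": 0.2, "x2": 0.3, "x3": 0.5, "x4": 0.6}
B = {"y1": 0.8, "y2": 0.6, "y3": 0.3}

# Cartesian product: mu_{A x B}(x, y) = min(mu_A(x), mu_B(y)).
R = {(x, y): min(ma, mb) for x, ma in A.items() for y, mb in B.items()}

print(R[("x3", "y3")])  # 0.3
print(R[("x4", "y1")])  # 0.6
```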
Properties of fuzzy sets

Commutativity:

A ∪ B = B ∪ A
A ∩ B = B ∩ A

Associativity:

A ∪ (B ∪ C) = (A ∪ B) ∪ C
A ∩ (B ∩ C) = (A ∩ B) ∩ C

Distributivity:

A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

Properties of fuzzy sets
Idempotence:

A ∪ A = A
A ∩ A = A
A ∪ ∅ = A
A ∩ ∅ = ∅

Transitivity:
If A ⊆ B and B ⊆ C, then A ⊆ C

Involution:

(Aᶜ)ᶜ = A

De Morgan's law:

(A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
(A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
Few Illustrations on Fuzzy
Sets
Example 1: Fuzzy Set Operations

Let A and B be two fuzzy sets defined over a universe of
discourse X, with membership functions µA(x) and µB(x),
respectively. The two MFs µA(x) and µB(x) are shown
graphically.

Figure: µA(x) is triangular over [a1, a4] (peak between a2 and
a3); µB(x) is triangular over [b1, b4], with b2 = a1 and b3 = a2.
Example 1: Plotting two sets on the same graph

Let us plot the two membership functions on the same graph.

Figure: µA and µB overlaid on a common x-axis (b1, a1, a2,
b4, a3, a4).
Example 1: Union and Intersection
The plots of the union A ∪ B and the intersection A ∩ B are
shown in the following.

Figure: µA∪B(x) follows the pointwise maximum of µA and µB;
µA∩B(x) follows the pointwise minimum.
Example 1: Complement

The plot of the complement µĀ(x) of the fuzzy set A is shown
in the following.

Figure: µA(x) and its complement µĀ(x) = 1 − µA(x).
Fuzzy set operations: Practice

Consider the following two fuzzy sets A and B defined over a
universe of discourse [0, 5] of real numbers, with membership
functions

µA(x) = x / (x + 1) and µB(x) = 2^(−x)

Determine the membership functions of the following and
draw them graphically.

i. Aᶜ, Bᶜ
ii. A ∪ B
iii. A ∩ B
iv. (A ∪ B)ᶜ [Hint: Use De Morgan's law]
Example 2: A real-life example

Two fuzzy sets A and B with membership functions µA(x) and
µB(x), respectively, are defined as below:
A = Cold climate, with µA(x) as the MF.
B = Hot climate, with µB(x) as the MF.

Figure: µA falls from 1 to 0 and µB rises from 0 to 1 over the
temperature axis (roughly −15 to 50).

Here, X is the universe of discourse representing the entire
range of temperatures.
Example 2: A real-life example
What are the fuzzy sets representing the following?

1 Not cold climate
2 Not hot climate
3 Extreme climate
4 Pleasant climate

Note: "Not cold climate" ≠ "Hot climate", and vice-versa.
Example 2: A real-life example
The answers are the following.
1 Not cold climate
Ā, with 1 − µA(x) as the MF.
2 Not hot climate
B̄, with 1 − µB(x) as the MF.
3 Extreme climate
A ∪ B, with µA∪B(x) = max(µA(x), µB(x)) as the MF.
4 Pleasant climate
A ∩ B, with µA∩B(x) = min(µA(x), µB(x)) as the MF.

The plots of the MFs of A ∪ B and A ∩ B are shown in the
following.

Figure: "Extreme climate" (A ∪ B) is high at both temperature
extremes; "Pleasant climate" (A ∩ B) is a narrow region in the
middle of the temperature range.
Few More on Membership Functions

Generation of MFs
Given a membership function of a fuzzy set representing a
linguistic hedge, we can derive many more MFs representing
several other linguistic hedges using the concepts of
concentration and dilation.
Concentration:
A^k = [µA(x)]^k ; k > 1

Dilation:
A^k = [µA(x)]^k ; k < 1

Example: Age = { Young, Middle-aged, Old }

Thus, corresponding to Young, we have: Not young, Very
young, Not very young, and so on.
Similarly, with Old we can have: old, very old, very very old,
extremely old, etc.
Thus, Extremely old = (((old)^2)^2)^2, and so on.
Or, More or less old = A^0.5 = (old)^0.5
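A quick sketch of concentration and dilation on a discrete fuzzy
set (the "old" values below are made up for illustration):

```python
old = {40: 0.1, 55: 0.4, 70: 0.8, 85: 1.0}

def hedge(fuzzy_set, k):
    # k > 1: concentration ("very ..."); k < 1: dilation ("more or less ...").
    return {x: m ** k for x, m in fuzzy_set.items()}

very_old         = hedge(old, 2)     # concentration
more_or_less_old = hedge(old, 0.5)   # dilation
extremely_old    = hedge(old, 8)     # (((old)^2)^2)^2

print(very_old[55])          # 0.16
print(more_or_less_old[55])  # ~0.632
```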
Linguistic variables and values

Figure: MFs for Young, Middle-aged, Old, Very young, and
Very old over X = Age (0 to 100).

µyoung(x) = bell(x, 20, 2, 0) = 1 / (1 + (x/20)^4)

µold(x) = bell(x, 30, 3, 100) = 1 / (1 + ((x − 100)/30)^6)

µmiddle-aged(x) = bell(x, 30, 60, 50)

Not young: µ¬young(x) = 1 − µyoung(x)

Young but not too young: µyoung(x) ∩ µ¬young(x)
Any questions??
Unit II : Artificial Neural Network - I 6L
1. Biological neuron, artificial neuron model, concept of bias and
threshold,
2. Activation functions,
3. McCulloch-Pitts neuron model,
4. Learning paradigms: supervised, unsupervised, reinforcement,
5. Linear neuron model: concept of error energy,
6. Gradient descent algorithm and application of the linear neuron
for linear regression,
7. Learning mechanisms: Hebbian, Delta Rule,
8. Perceptron and its limitations
Books Referred for this unit:

Text Books:

1. Fundamentals of Neural Networks: Architectures, Algorithms,
and Applications, Laurene Fausett, Pearson Education, Inc.,
2008.

2. Principles of Soft Computing, S. N. Sivanandam, S. N.
Deepa, John Wiley & Sons, 2007.
Practical Sessions: (Use MATLAB / OCTAVE / SCILAB / any
appropriate open-source software.) (any 8 experiments)

1. Design & implementation of different activation functions.

2. Implement a simple logic network using the MP neuron model.

3. Implement a simple linear regressor with a single neuron
model.
Introduction
• BRAIN COMPUTATION
- A highly complex, non-linear, and parallel computer with
structural constituents called "NEURONS"
- The human brain contains about 10^11 nerve cells, or neurons
- On average, each neuron is connected to other neurons
through approximately 10^4 synapses

How is it then that the brain outperforms a digital computer?
ANN characteristics:
1. It is a neurally implemented mathematical model.
2. There exist a large number of highly interconnected processing
elements called neurons in an ANN.

3. The interconnections with their weighted linkages hold the
informative knowledge.

4. The input signals arrive at the processing elements through
connections and connection weights.

5. The processing elements of an ANN have the ability to learn,
recall, and generalize from the given data by suitable assignment
or adjustment of weights.
6. The computational power can be demonstrated only by the
collective behavior of neurons; it should be noted that no single
neuron carries specific information.
Application of Neural Networks
• Photos and fingerprints can be recognized by imposing a fine grid
over the photo. Each square of the grid becomes an input to the NN.
• Lake water levels can be predicted based upon precipitation patterns
and river/dam flows.
• Medical diagnosis is an ideal application for NNs.
• Scheduling of buses, airplanes, and elevators can be optimized by
predicting demand.
• Voice recognition can be obtained by analyzing the audio
oscilloscope pattern, much like a stock market graph.
• Weather prediction may be possible. Inputs would include weather
reports from surrounding areas. Output(s) would be the future weather
in specific areas based on the input information. Effects such as ocean
currents and jet streams could be included.
• Air traffic control could be automated with the location, altitude,
direction, and speed of each radar blip taken as input to the network.
The output would be the air traffic controller's instruction in response
to each blip.
Application of Neural Networks
• Appraisal and valuation of property, buildings, automobiles,
machinery, etc. should be an easy task for a NN.

• Criminal sentencing could be predicted using a large sample of
crime details as input and the resulting sentences as output.

• Data mining, cleaning, and validation could be achieved by
determining which records suspiciously diverge from the pattern of
their peers.

• Employee hiring could be optimized if the NN were able to predict
which job applicants would show the best job performance.

• Expert consultants could package their intuitive expertise into
a NN.

• Fraud detection regarding credit cards, insurance, or taxes could
be automated using a NN analysis of past incidents.
Application of Neural Networks
• Handwriting and typewriting could be recognized by imposing a grid
over the writing; each square of the grid then becomes an input to
the NN. This is called "Optical Character Recognition".

• Staff scheduling requirements for restaurants, retail stores, police
stations, banks, etc. could be predicted based on customer flow, day
of week, paydays, holidays, weather, season, etc.

• Traffic flows could be predicted so that signal timing could be
optimized. The NN could recognize "a weekday morning rush hour
during a school holiday" or "a typical winter Sunday morning".
Advantages & Disadvantages of ANN:
Advantages:

1. A NN can perform tasks that a linear program cannot.
2. When an element of the NN fails, it can continue without any
problem, owing to its parallel nature.
3. A NN learns and does not need to be reprogrammed.
4. It can be implemented in any application.
5. It can be implemented without any problem.

Disadvantages:

1. The NN needs training to operate.
2. The architecture of a NN is different from the architecture of
microprocessors and therefore needs to be emulated.
3. It requires high processing time for large NNs.
Biological Neuron
Cntd…

• An ANN resembles the brain in two aspects:

1. Knowledge is acquired by the network through a
learning process,
2. Inter-neuron connection strengths known as synaptic
weights are used to store the knowledge.

• An artificial neuron is characterized by:

1. Architecture (connections between neurons)
2. Training or learning (determining weights on the
connections)
3. Activation function
Biological Neuron

• Cell structures:
• Soma
• Dendrites
• Axon
• Synaptic terminals (synapses)
A Typical Nerve Cell
• Dendrite: receives the signals from
other neurons (accepts inputs).

• Soma: sums all the incoming signals
(processes the inputs).

• Axon: when a particular amount of
input is received, the cell fires. It
transmits the signal through the axon to
other cells (turns the processed inputs
into outputs).

• Synapses: a biochemical device
which converts a pre-synaptic electrical
signal into a chemical signal and then
back into a post-synaptic electrical
signal (the electrochemical contact
between neurons).
Operation:
• Strength of synaptic connections decide the net
signal coming to the cell body.
• Free parameters refer to the strength of the
synapse.

• Every input is associated with a synaptic strength.

• Assuming some a-priori synaptic strengths, a cell
computes the response.
• If the response differs from the desired value, the
synaptic strengths are adjusted.
• This process may take many iterations.
Model of a Neuron
Three basic elements of the neuronal model:
• A set of synapses or connecting links, each of which is
characterized by a weight or strength of its own.
- Specifically, a signal xj at the input of synapse j connected to
neuron k is multiplied by the synaptic weight wkj.

- Unlike a synapse in the brain, the synaptic weight of an
artificial neuron may lie in a range that includes negative as
well as positive values.

• An adder for summing the input signals, weighted by the
respective synapses of the neuron; the operations described
here constitute a linear combiner.

• An activation function for limiting the amplitude of the output
of a neuron.
Electrical Model of a Neuron

y_ink = bk + Σ(i=1..m) xi·wki = bk + x1·wk1 + x2·wk2 + ... + xm·wkm

yk = f(y_ink)
Effect of bias
• The neuronal model also includes
an externally applied bias, denoted
by bk
• The bias bk has the effect of
increasing or lowering the net input
of the activation function, depending
on whether it is positive or negative,
respectively
•The use of bias bk , has the effect of
applying an affine transformation to the
output y_ink of the linear combiner in
the model
• An affine transformation is any
transformation that preserves
collinearity (i.e., all points lying on a line
initially still lie on a line after
transformation)
•Depending on whether the bias bk is positive or negative, the
relationship between the induced local field or activation potential
y_ink of neuron k and the linear combiner output uk is modified
Another nonlinear model of a neuron;
wk0 accounts for the bias bk.
Artificial Neuron Model

y_ink = Σ(i=0..m) xi·wki = (1)·wk0 + x1·wk1 + x2·wk2 + ... + xm·wkm
(with x0 = 1, so that wk0 plays the role of the bias bk)

yk = f(y_ink)
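A sketch of this single-neuron forward pass in Python (the
binary-step activation is one of several choices listed later;
the weight and bias values are illustrative):

```python
def neuron_output(x, w, b, f):
    # y_in = b + sum_i x_i * w_i ; y = f(y_in)
    y_in = b + sum(xi * wi for xi, wi in zip(x, w))
    return f(y_in)

step = lambda v: 1 if v >= 0 else 0  # binary step activation

print(neuron_output([1, 0], [0.6, 0.6], -1.0, step))  # 0
print(neuron_output([1, 1], [0.6, 0.6], -1.0, step))  # 1
```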
ANN Topologies
ANN Architectures

1. single-layer feed-forward network;
2. multilayer feed-forward network;
3. single node with its own feedback;
4. single-layer recurrent network;
5. multilayer recurrent network;
6. competitive network.

ANN Architectures

1. Single-layer feed-forward network.
2. Multilayer feed-forward network: a fully connected feed-forward
network with one hidden layer and one output layer.
4. Single-layer recurrent network: a recurrent network with no
self-feedback loops and no hidden neurons, where

Si = Σ(j=1..n) Wji·Xj

5. Multilayer recurrent network: a recurrent network with hidden
neurons.
6. Competitive network.
Learning
• The main property of an ANN is its capability to learn

1. Parameter Learning:
It updates the connecting weights in a neural net

2. Structure Learning:
It focuses on the change in network structure
Learning Paradigms
LEARNING WITH A TEACHER (SUPERVISED or ASSOCIATIVE LEARNING)

Figure: The input x feeds a neural network producing the actual
output Y; an error signal generator compares Y with the desired
output D and feeds the error signal (D − Y) back to the network.
Learning Paradigms
LEARNING WITH A TEACHER (SUPERVISED or ASSOCIATIVE LEARNING)

• In this, the network is trained by providing it with input and
matching output patterns.
• These input-output pairs can be provided by an external teacher,
or by the system which contains the NN (self-supervised).

LEARNING WITHOUT A TEACHER

• There is no teacher to oversee the learning process.
• Hence there are no labeled examples of the function to be
learned by the network.

• Two sub-types:
- Reinforcement learning / neurodynamic programming
- Unsupervised or self-organized learning
Unsupervised Learning

Figure: The input X feeds the ANN (weights W), which produces
the actual output Y; no teacher signal is involved.

Reinforcement learning

Figure: The input x feeds the neural network (weights W),
producing the actual output Y; an error signal generator
compares Y with a reinforcement signal R and feeds the error
signal back to the network.
Reinforcement learning

• Learning of an input-output mapping is performed through


continued interaction with the environment in order to minimize a
scalar index of performance.

• Built around a critic that converts a primary reinforcement signal


received from the environment into a higher quality reinforcement
signal called the heuristic reinforcement signal, both of which are
scalar inputs.

• The system observes a temporal sequence of stimuli (i.e., state


vectors) also received from the environment, which eventually
result in the generation of the heuristic reinforcement signal.

• The goal of learning is to minimize a cost-to-go function, defined as


the expectation of the cumulative cost of actions taken over a
sequence of steps instead of simply the immediate cost.
Activation Functions
• To map non-linear output functions we require non-
linear activation functions

• The activation functions should be Continuous

• Monotonically increasing

• Differentiable
Monotonicity

• Monotonically increasing function

• Monotonically decreasing function

• Non Monotonic function


ACTIVATION FUNCTION
• ACTIVATION LEVEL – DISCRETE OR CONTINUOUS

* HARD LIMIT FUNCTION (DISCRETE)

• Binary activation function
• Bipolar activation function
• Identity function

* SIGMOIDAL ACTIVATION FUNCTION (CONTINUOUS)

• Binary sigmoidal activation function
• Bipolar sigmoidal activation function
Activation Functions - Identity Function
(or Linear Function, or "Lazy man" operator)

f(x) = x for all x

yk = y_ink
Activation Functions - Threshold Function or Binary Step Function

yk = 1 if y_ink ≥ θ
yk = 0 otherwise

i.e., f(x) = 1 if x ≥ θ; 0 if x < θ
Activation Functions - Signum Function or Bipolar Step Function

yk = 1 if y_ink ≥ θ
yk = −1 otherwise

i.e., f(x) = 1 if x ≥ θ; −1 if x < θ
Activation Functions - Binary Sigmoidal Function
(or Logistic / Unipolar Sigmoidal Function)

f(x) = 1 / (1 + e^(−σx))

σ is the steepness parameter. The range is 0 to 1.

The derivative of f(x) is given by

f′(x) = σ·f(x)·[1 − f(x)]
Activation Functions - Bipolar Sigmoidal Function
The function is defined as:

f(x) = (1 − e^(−σx)) / (1 + e^(−σx))

σ is the steepness parameter. The range is −1 to 1.

The derivative of f(x) is given by

f′(x) = (σ/2)·[1 + f(x)]·[1 − f(x)]

The bipolar sigmoidal function is closely related to the
hyperbolic tangent function, which is written as

h(x) = (e^x − e^(−x)) / (e^x + e^(−x)) = (1 − e^(−2x)) / (1 + e^(−2x))

The derivative of the hyperbolic tangent function is

h′(x) = [1 + h(x)]·[1 − h(x)]
Activation Functions - Ramp Function

• The ramp function is defined as:

f(x) = 1,  if x > 1
       x,  if 0 ≤ x ≤ 1
       0,  if x < 0
ACTIVATION FUNCTION

Activation functions:

(A) Identity

(B) Binary step

(C) Bipolar step

(D) Binary sigmoidal

(E) Bipolar sigmoidal

(F) Ramp
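A sketch of the activation functions listed above in Python (the
σ and θ defaults are illustrative):

```python
import math

def identity(x):
    return x

def binary_step(x, theta=0.0):
    return 1 if x >= theta else 0

def bipolar_step(x, theta=0.0):
    return 1 if x >= theta else -1

def binary_sigmoid(x, sigma=1.0):
    return 1.0 / (1.0 + math.exp(-sigma * x))

def bipolar_sigmoid(x, sigma=1.0):
    return (1.0 - math.exp(-sigma * x)) / (1.0 + math.exp(-sigma * x))

def ramp(x):
    return max(0.0, min(1.0, x))

for f in (identity, binary_step, bipolar_step, binary_sigmoid,
          bipolar_sigmoid, ramp):
    print(f.__name__, f(0.5))
```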
IMP terminologies in ANN
Weights:
• Each neuron is connected to other neurons by means of directed
communication links, associated with weights.
• A weight contains information about the input signal.

Threshold:
• It is a set value based upon which the final output of the network
may be calculated.
• It is used in the activation function.
• A comparison is made between the calculated net input and the
threshold to obtain the network output.
• For each and every application, there is a threshold limit.

f(net) = 1 if net ≥ θ
        −1 if net < θ
IMP terminologies in ANN
Weight matrix or weight connection matrix:

Consider a recurrent network with 3 neurons. The network
weights at the current time instant are given by the following
symmetric matrix:

      w11 w12 w13       0   −0.4   0.2
W  =  w21 w22 w23  =  −0.4    0    0.5
      w31 w32 w33      0.2   0.5    0

Learning rate:
• The learning rate is denoted by "α". It is used to control the
amount of weight adjustment at each step of training.
• The learning rate, ranging from 0 to 1, determines the rate of
learning at each time step.
IMP terminologies in ANN
Bias:
The bias included in the network has its impact in calculating the
net input. The bias is included by adding a component x0 = 1 to
the input vector X. Thus, the input vector becomes
X = (1, x1, x2, ..., xn),
and the bias acts like the intercept c in the line y = mx + c
(with the weight playing the role of the slope m).

y_inj = Σ(i=0..n) xi·wij = x0·w0j + x1·w1j + x2·w2j + ... + xn·wnj
      = w0j + Σ(i=1..n) xi·wij
      = bj + Σ(i=1..n) xi·wij
Advantages of bias over threshold

• Using a bias can remove the dependency on a
threshold.
• We can modify the bias (more adaptable), whereas a
threshold is fixed.
• Bias helps in simplifying the separation
boundaries.
• Computational flexibility and ease are greater if we
use a bias, as thresholds may be different for
different output neurons.
• Hence the use of bias is preferable.
McCULLOCH–PITTS NEURON - MP
• Neurons are sparsely and randomly connected.

• The firing state is binary (1 = firing, 0 = not firing).

• All but one neuron are excitatory (they tend to increase the
voltage of other cells).

• One inhibitory neuron connects to all other neurons.

• It functions to regulate network activity (prevents too many
firings).
Mc Culloch-Pitts Neuron
• McCulloch-Pitts
neuron Y may
receive signals from
any number of
other neurons

• Each connection
path is either
excitatory, with
weight w > 0, or
inhibitory, with
weight - p (p > 0)
Cntd…
• The condition that inhibition is absolute requires that θ
for the activation function satisfy the inequality

θ > n·w − p

• Y will fire if it receives k or more excitatory inputs and no
inhibitory inputs, where

k·w ≥ θ > (k − 1)·w

• Although all excitatory weights coming into any particular unit
must be the same, the weights coming into one unit, say Y1,
do not have to be the same as the weights coming into
another unit, say Y2.
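A minimal sketch of a McCulloch-Pitts unit in Python, here wired
for the two-input AND function (the weights w = 1 and threshold
θ = 2 are the usual choice for AND; they are an illustrative
assignment, not fixed by the slides):

```python
def mp_neuron(inputs, weights, theta):
    # Fires (outputs 1) iff the weighted sum reaches the threshold.
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= theta else 0

# AND function: w1 = w2 = 1, theta = 2 (fires only when both inputs are 1).
for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, mp_neuron([x1, x2], [1, 1], theta=2))
```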
LINEAR SEPARABILITY

 Linear separability is the concept wherein the separation of the


input space into regions is based on whether the network response
is positive or negative.

 Consider a network having


positive response in the first
quadrant and negative response
in all other quadrants (AND
function) with either binary or
bipolar data, then the decision
line is drawn separating the
positive response region from
the negative response region.
Cntd…

• The output is

yin = b + Σ(i=1..n) xi·wi

• There exists a boundary between the regions where yin > 0 and
yin < 0. This boundary is called the decision boundary and can be
determined by the relation

b + Σ(i=1..n) xi·wi = 0

• If there exist weights (with bias) for which all the training input
vectors having positive response (+1) lie on one side of the
decision boundary and all the other vectors having negative
response (−1) lie on the other side, then we conclude that the
problem is "linearly separable".

For two inputs:
yin = b + x1·w1 + x2·w2
b + x1·w1 + x2·w2 = 0
Cntd…
• If w2 ≠ 0, then

x2 = −(w1/w2)·x1 − (b/w2)

• The requirement for a positive response of the net is

b + x1·w1 + x2·w2 > 0

If a threshold value is used instead of a bias:
net input received > θ (threshold)
yin > θ
x1·w1 + x2·w2 > θ

The separating line equation will then be

x1·w1 + x2·w2 = θ
x2 = −(w1/w2)·x1 + θ/w2   (with w2 ≠ 0)
AND Function using MP model

Figure: Worked example of the AND function using the MP model.

AND-NOT Function using MP model

Figure: Worked example of the AND-NOT function using the MP
model.

XOR Function using MP model

Figure: Worked example of the XOR function using the MP model
(XOR is realized by composing two layers of MP units).

LINEAR SEPARABILITY: OR Function

Figure: Decision boundary for the OR function.
Linear neuron model: application of the linear
neuron for linear regression
Linear Regression:
• In statistics, linear regression is an
approach to modeling the relationship
between a scalar variable y and one or
more explanatory variables denoted X.
• Once we know the equation of this
fitted line, we can use it to find y given
X.
• Say we have conducted an
experiment that gives us some output
value for a set of inputs.
• There are various conventional
methods to do this, e.g., ordinary least
squares.
• The best-fit line has the equation:
Y = mX + c
• We have a choice of two parameters:
m and c.
Cntd…
• A linear neuron can be used for this task,
since the output of a linear neuron is:
y = w20·x0 + w21·x1 (i.e., y = w0 + w1·x1 with x0 = 1)
• If we keep x0 = 1, then w21 (w1) is the slope
and w20 (w0) is the intercept; hence we
have to modify w21 (w1) and w20 (w0)
to obtain the best-fit line.
• If we make a good choice of w21 (w1) and
w20 (w0), our job is over!
• Many real-world problems involve more
than one independent variable; the
regression is then called multiple regression.
• For example, if we have two independent
variables, it becomes a 2-D problem and the
network would be different: we then have to
adjust two slopes and one intercept.
• This can be extended to solve
n-dimensional problems.
Concept of error energy

• We can form a combined error measure by squaring
and adding all the errors and dividing by the number
of observations, giving the MEAN SQUARE ERROR (MSE);
this, in fact, is the ERROR ENERGY.

• The MSE shows how well (or badly) the line is fitted!

Cntd…

• Considering a 1-D problem, the error energy at a point p is:

ep = (dp − yp)^2 = (target output − calculated output)^2

• The total error energy is given by:

E = (1/N) · Σp (dp − yp)^2

• Why do we have to measure this error energy?
• In order to get the best fit, the error energy should be minimum.
Gradient descent algorithm:

Figure: Gradient descent — the weights are updated in the
direction of the negative gradient of the error energy.
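The slides present the update rule only in figures; a minimal
sketch of gradient descent for a single linear neuron (the
learning rate and data below are illustrative assumptions) that
minimizes the MSE defined above:

```python
# Fit y = w1*x + w0 by gradient descent on the MSE (error energy).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ds = [1.1, 2.9, 5.2, 7.1, 8.8]   # targets, roughly y = 2x + 1

w0, w1, lr = 0.0, 0.0, 0.01      # intercept, slope, learning rate
for _ in range(5000):
    # Gradients of E = (1/N) * sum (d - y)^2 w.r.t. w0 and w1.
    n = len(xs)
    g0 = sum(-2 * (d - (w1 * x + w0)) for x, d in zip(xs, ds)) / n
    g1 = sum(-2 * (d - (w1 * x + w0)) * x for x, d in zip(xs, ds)) / n
    w0 -= lr * g0
    w1 -= lr * g1

print(w1, w0)  # close to slope 2 and intercept 1
```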
Hebb network
• "When an axon of cell A is near enough to excite cell B, and
repeatedly or persistently takes part in firing it, some growth
process or metabolic change takes place in one or both cells such
that A's efficiency, as one of the cells firing B, is increased."

• The weight update in the Hebb rule is given by

wi(new) = wi(old) + xi·y

• It is better suited to bipolar data than binary data:
* Bipolar: the equation is satisfied.
* Binary: the equation is not satisfied for the two training pairs
(on, off) and (off, off).
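A sketch of Hebb training for the AND function with bipolar inputs
and targets (the classic worked example; the single-pass loop
mirrors the flowchart that follows):

```python
# Hebb rule: w_i(new) = w_i(old) + x_i * y, b(new) = b(old) + y.
samples = [([1, 1], 1), ([1, -1], -1), ([-1, 1], -1), ([-1, -1], -1)]

w, b = [0, 0], 0
for x, y in samples:          # one pass over the training pairs
    w = [wi + xi * y for wi, xi in zip(w, x)]
    b = b + y

print(w, b)  # [2, 2], -2 — matching the final weights in the slides
```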
Hebb Network – Flowchart of training algorithm

• The training algorithm is used for the calculation and
adjustment of weights.

• The flowchart of the training algorithm of Hebb's network is
shown in the figure.

• s:t refers to each training input and target output pair.

Hebb Network – Training algorithm

Figure: Flowchart and step-by-step training algorithm of the
Hebb network.
Hebb Problem

Figure: Worked Hebb-training example — weight updates after
each training pair, with the corresponding decision boundaries.

• The decision boundary after the early training pairs
differentiates only the first and fourth inputs; not all negative
responses are separated from the positive responses.
• After further updates, the decision boundary separates (1, 1)
from (1, −1) and (−1, −1), but not from (−1, 1).
Hebb Problem

• The decision boundary is the same for the third and fourth
training pairs.
• The decision boundary line obtained from these input training
pairs separates the positive response region from the
negative response region.
• Hence, the weights obtained from this are the final weights,
and are given by

w1 = 2, w2 = 2, b = −2
Delta network

Figure: The delta network and the delta learning rule.
Perceptron Networks
• These are single-layer feed-forward networks called simple
perceptrons.
• The perceptron was designed by Rosenblatt in 1962 and analyzed
by Minsky-Papert (1969, 1988).
• It was basically discovered by Block in 1962.

• Mainly it consists of three units:
1. Sensory unit
2. Associator unit
3. Response unit

Inputs xi → Sensory unit → Associator unit → Response unit → desired o/p
PERCEPTRON NETWORKS
• Linear threshold unit (LTU)

Figure: Inputs x1, ..., xn with weights w1, ..., wn (and bias
weight w0) feed a summing unit computing Σ(i=0..n) wi·xi,
followed by a hard limiter:

o = f(Σ(i=0..n) wi·xi), where
f(net) = 1 if net > 0, and −1 otherwise.
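A sketch of the perceptron training rule (w ← w + α·t·x applied
on misclassification, with bipolar targets; the AND data and the
learning rate are illustrative):

```python
# Perceptron learning for bipolar AND. Bias handled as weight w0 with x0 = 1.
samples = [([1, 1, 1], 1), ([1, 1, -1], -1), ([1, -1, 1], -1), ([1, -1, -1], -1)]

w = [0.0, 0.0, 0.0]
alpha = 1.0
for _ in range(10):                      # epochs
    for x, t in samples:
        net = sum(wi * xi for wi, xi in zip(w, x))
        y = 1 if net > 0 else -1
        if y != t:                       # update only on misclassification
            w = [wi + alpha * t * xi for wi, xi in zip(w, x)]

print(w)  # a separating weight vector for AND
```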
Flowchart and training algorithm

Figure: Flowchart of the perceptron training algorithm, followed
by worked problems.
Thank you
