Fuzzy Book
Robert Fullér
Åbo 1995
Contents
0.1 Preface

1 Fuzzy Systems
1.1 An introduction to fuzzy logic
1.2 Operations on fuzzy sets
1.3 Fuzzy relations
1.3.1 The extension principle
1.3.2 Metrics for fuzzy numbers
1.3.3 Fuzzy implications
1.3.4 Linguistic variables
1.4 The theory of approximate reasoning
1.5 An introduction to fuzzy logic controllers
1.5.1 Defuzzification methods
1.5.2 Inference mechanisms
1.5.3 Construction of data base and rule base of FLC
1.5.4 Ball and beam problem
1.6 Aggregation in fuzzy system modeling
1.6.1 Averaging operators
1.7 Fuzzy screening systems
1.8 Applications of fuzzy systems

3.1.1 Fuzzy neurons
3.2 Hybrid neural nets
3.2.1 Computation of fuzzy logic inferences by hybrid neural net
3.3 Trainable neural nets for fuzzy IF-THEN rules
3.3.1 Implementation of fuzzy rules by regular FNN of Type 2
3.3.2 Implementation of fuzzy rules by regular FNN of Type 3
3.4 Tuning fuzzy control parameters by neural nets
3.5 Fuzzy rule extraction from numerical data
3.6 Neuro-fuzzy classifiers
3.7 FULLINS
3.8 Applications of fuzzy neural systems

4 Appendix
4.1 Case study: A portfolio problem
4.2 Exercises
0.1 Preface
These Lecture Notes contain the material of the course on Neural Fuzzy Systems delivered
by the author at the Turku Center for Computer Science in 1995.
Fuzzy sets were introduced by Zadeh (1965) as a means of representing and manipulating
data that is not precise, but rather fuzzy. Fuzzy logic provides an inference morphology
that enables approximate human reasoning capabilities to be applied to knowledge-based
systems. The theory of fuzzy logic provides the mathematical means to capture the uncertainties
associated with human cognitive processes, such as thinking and reasoning. The
conventional approaches to knowledge representation lack the means for representing
the meaning of fuzzy concepts. As a consequence, the approaches based on first order
logic and classical probability theory do not provide an appropriate conceptual framework
for dealing with the representation of commonsense knowledge, since such knowledge is
by its nature both lexically imprecise and noncategorical.
The development of fuzzy logic was motivated in large measure by the need for a conceptual
framework which can address the issues of uncertainty and lexical imprecision.
Some of the essential characteristics of fuzzy logic relate to the following [120].
There are two main characteristics of fuzzy systems that give them better performance
for specific applications:

• Fuzzy systems are suitable for uncertain or approximate reasoning, especially for
systems whose mathematical model is difficult to derive.

• Fuzzy logic allows decision making with estimated values under incomplete or uncertain
information.
of the kind readily modeled on the von Neumann computer. For a variety of reasons,
the symbol-processing approach became the dominant theme in artificial intelligence. The
1980s saw a rebirth of interest in neural computing: Hopfield (1985) provided the
mathematical foundation for understanding the dynamics of an important class of net-
works; Rumelhart and McClelland (1986) introduced the backpropagation learning algo-
rithm for complex, multi-layer networks and thereby provided an answer to one of the
most severe criticisms of the original perceptron work.
Perhaps the most important advantage of neural networks is their adaptivity. Neural
networks can automatically adjust their weights to optimize their behavior as pattern
recognizers, decision makers, system controllers, predictors, etc. Adaptivity allows the
neural network to perform well even when the environment or the system being con-
trolled varies over time. There are many control problems that can benefit from continual
nonlinear modeling and adaptation.
While fuzzy logic performs an inference mechanism under cognitive uncertainty, compu-
tational neural networks offer exciting advantages, such as learning, adaptation, fault-
tolerance, parallelism and generalization. A brief comparative study between fuzzy sys-
tems and neural networks in their operations in the context of knowledge acquisition,
uncertainty, reasoning and adaptation is presented in the following table [58]:
To enable a system to deal with cognitive uncertainties in a manner more like humans,
one may incorporate the concept of fuzzy logic into the neural networks. The resulting
hybrid system is called fuzzy neural, neural fuzzy, neuro-fuzzy or fuzzy-neuro network.
Neural networks are used to tune membership functions of fuzzy systems that are employed
as decision-making systems for controlling equipment. Although fuzzy logic can encode
expert knowledge directly using rules with linguistic labels, it usually takes a lot of time
to design and tune the membership functions which quantitatively define these linguistic
labels. Neural network learning techniques can automate this process and substantially
reduce development time and cost while improving performance.
In theory, neural networks and fuzzy systems are equivalent in that they are convertible,
yet in practice each has its own advantages and disadvantages. For neural networks, the
knowledge is automatically acquired by the backpropagation algorithm, but the learning
process is relatively slow and analysis of the trained network is difficult (black box).
Neither is it possible to extract structural knowledge (rules) from the trained neural
network, nor can we integrate special information about the problem into the neural
network in order to simplify the learning procedure.
Fuzzy systems are more favorable in that their behavior can be explained based on fuzzy
rules and thus their performance can be adjusted by tuning the rules. But since, in general,
knowledge acquisition is difficult and also the universe of discourse of each input variable
needs to be divided into several intervals, applications of fuzzy systems are restricted to
the fields where expert knowledge is available and the number of input variables is small.
To overcome the problem of knowledge acquisition, neural networks are extended to au-
tomatically extract fuzzy rules from numerical data.
Cooperative approaches use neural networks to optimize certain parameters of an ordinary
fuzzy system, or to preprocess data and extract fuzzy (control) rules from data.
The basic processing elements of neural networks are called artificial neurons, or simply
neurons. The signal flow from the neuron inputs, xj, is considered to be unidirectional, as
indicated by arrows, as is a neuron's output signal flow. Consider a simple neural net in
Figure 0.1. All signals and weights are real numbers. The input neurons do not change
the input signals so their output is the same as their input. The signal xi interacts with
the weight wi to produce the product pi = wi xi , i = 1, . . . , n. The input information pi
is aggregated, by addition, to produce the input
net = p1 + · · · + pn = w1 x1 + · · · + wn xn
to the neuron. The neuron uses its transfer function f, which could be a sigmoidal
function,

f(t) = 1/(1 + exp(−t)),
to compute the output
y = f (net) = f (w1 x1 + · · · + wn xn ).
This simple neural net, which employs multiplication, addition, and a sigmoidal transfer
function f, will be called a regular (or standard) neural net.
Figure 0.1 A simple neural net, y = f(⟨w, x⟩).
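As a small illustration, the following Python sketch (the function name and the sample values are ours, not taken from the text) computes the output of such a regular neuron with a sigmoidal transfer function.

```python
import math

def regular_neuron(x, w):
    """Output of a regular (standard) neuron: y = f(<w, x>) with sigmoidal f."""
    net = sum(wi * xi for wi, xi in zip(w, x))   # net = w1*x1 + ... + wn*xn
    return 1.0 / (1.0 + math.exp(-net))          # f(t) = 1/(1 + exp(-t))

# Example with two inputs: net = 0.5*1.0 + 0.25*(-2.0) = 0, so y = f(0) = 0.5
print(regular_neuron(x=[1.0, -2.0], w=[0.5, 0.25]))
```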
If we employ other operations like a t-norm, or a t-conorm, to combine the incoming
data to a neuron we obtain what we call a hybrid neural net. These modifications lead
to a fuzzy neural architecture based on fuzzy arithmetic operations. A hybrid neural net
may not use multiplication, addition, or a sigmoidal function (because the results of these
operations are not necessarily in the unit interval).
A hybrid neural net is a neural net with crisp signals and weights and crisp transfer
function. However, (i) we can combine xi and wi using a t-norm, t-conorm, or some other
continuous operation; (ii) we can aggregate the pi ’s with a t-norm, t-conorm, or any other
continuous function; (iii) f can be any continuous function from input to output.
We emphasize here that all inputs, outputs and the weights of a hybrid neural net are
real numbers taken from the unit interval [0, 1]. A processing element of a hybrid neural
net is called a fuzzy neuron.
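For instance, a hybrid neuron of the kind just described might combine each input with its weight using the product t-norm and aggregate the results with the maximum t-conorm. The sketch below is only one possible realization under these assumptions; all names are ours and all signals are assumed to lie in [0, 1].

```python
def t_norm_prod(a, b):
    return a * b              # product t-norm

def t_conorm_max(a, b):
    return max(a, b)          # maximum t-conorm

def hybrid_neuron(x, w, combine=t_norm_prod, aggregate=t_conorm_max, f=lambda t: t):
    """Crisp hybrid neuron: p_i = combine(w_i, x_i), aggregated by a t-conorm."""
    net = 0.0
    for wi, xi in zip(w, x):
        net = aggregate(net, combine(wi, xi))
    return f(net)             # f may be any continuous function on [0, 1]

# max(0.4*0.8, 0.9*0.5) = max(0.32, 0.45) = 0.45
print(hybrid_neuron(x=[0.4, 0.9], w=[0.8, 0.5]))
```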
It is well-known that regular nets are universal approximators, i.e. they can approximate
any continuous function on a compact set to arbitrary accuracy. In a discrete fuzzy
expert system one inputs a discrete approximation to the fuzzy sets and obtains a discrete
approximation to the output fuzzy set. Usually discrete fuzzy expert systems and fuzzy
controllers are continuous mappings. Thus we can conclude that given a continuous fuzzy
expert system, or continuous fuzzy controller, there is a regular net that can uniformly
approximate it to any degree of accuracy on compact sets. The problem with this result
is that it is non-constructive and does not tell us how to build the net.
Hybrid neural nets can be used to implement fuzzy IF-THEN rules in a constructive way.
Though hybrid neural nets cannot directly use the standard error backpropagation algorithm
for learning, they can be trained by steepest descent methods to learn the parameters
of the membership functions representing the linguistic terms in the rules.
The direct fuzzification of conventional neural networks is to extend connection weights
and/or inputs and/or desired outputs (or targets) to fuzzy numbers. This extension
is summarized in Table 0.2.
Fuzzy neural networks of Type 1 are used in classification problems, to classify a fuzzy
input vector into a crisp class. The networks of Type 2, 3 and 4 are used to implement
fuzzy IF-THEN rules. However, the last three types in Table 0.2 are unrealistic:
• In Type 5, outputs are always real numbers because both inputs and weights are
real numbers.
• In Type 6 and 7, the fuzzification of weights is not necessary because targets are
real numbers.
A regular fuzzy neural network is a neural network with fuzzy signals and/or fuzzy weights,
sigmoidal transfer function and all the operations are defined by Zadeh’s extension prin-
ciple. Consider the simple regular fuzzy neural net in Figure 0.2, with fuzzy inputs
X1, . . . , Xn, fuzzy weights W1, . . . , Wn and output Y = f(W1 X1 + · · · + Wn Xn).
All signals and weights are fuzzy numbers. The input neurons do not change the input
signals so their output is the same as their input. The signal Xi interacts with the weight
Wi to produce the product Pi = Wi Xi , i = 1, . . . , n, where we use the extension principle
to compute Pi . The input information Pi is aggregated, by standard extended addition,
to produce the input
net = P1 + · · · + Pn = W1 X1 + · · · + Wn Xn
to the neuron. The neuron uses its transfer function f , which is a sigmoidal function, to
compute the output
Y = f (net) = f (W1 X1 + · · · + Wn Xn )
where f is a sigmoidal function and the membership function of the output fuzzy set Y
is computed by the extension principle.
The main disadvantage of regular fuzzy neural networks is that they are not universal
approximators. Therefore we must abandon the extension principle if we are to obtain a
universal approximator.
A hybrid fuzzy neural network is a neural network with fuzzy signals and/or fuzzy weights.
However, (i) we can combine Xi and Wi using a t-norm, t-conorm, or some other con-
tinuous operation; (ii) we can aggregate the Pi ’s with a t-norm, t-conorm, or any other
continuous function; (iii) f can be any function from input to output.
Buckley and Hayashi [28] showed that hybrid fuzzy neural networks are universal approx-
imators, i.e. they can approximate any continuous fuzzy functions on a compact domain.
These Lecture Notes are organized in four Chapters. The First Chapter deals with
inference mechanisms in fuzzy expert systems. The Second Chapter provides a brief
description of learning rules of feedforward multi-layer supervised neural networks, and
Kohonen's unsupervised learning algorithm for classification of input patterns. In the
Third Chapter we explain the basic principles of fuzzy neural hybrid systems. In the
Fourth Chapter we present some exercises for the Reader.
Chapter 1
Fuzzy Systems
1.1 An introduction to fuzzy logic

A classical (crisp) subset A of a nonempty set X can be characterized by its characteristic
function

χA : X → {0, 1}.
This mapping may be represented as a set of ordered pairs, with exactly one ordered pair
present for each element of X. The first element of the ordered pair is an element of the
set X, and the second element is an element of the set {0, 1}. The value zero is used to
represent non-membership, and the value one is used to represent membership. The truth
or falsity of the statement
”x is in A”
is determined by the ordered pair (x, χA (x)). The statement is true if the second element
of the ordered pair is 1, and the statement is false if it is 0.
Similarly, a fuzzy subset A of a set X can be defined as a set of ordered pairs, each with
the first element from X, and the second element from the interval [0, 1], with exactly
one ordered pair present for each element of X. This defines a mapping, µA , between
elements of the set X and values in the interval [0, 1]. The value zero is used to represent
complete non-membership, the value one is used to represent complete membership, and
values in between are used to represent intermediate degrees of membership. The set X
is referred to as the universe of discourse for the fuzzy subset A. Frequently, the mapping
µA is described as a function, the membership function of A. The degree to which the
statement
”x is in A”
is true is determined by finding the ordered pair (x, µA (x)). The degree of truth of the
statement is the second element of the ordered pair. It should be noted that the terms
membership function and fuzzy subset get used interchangeably.
Definition 1.1.1 [113] Let X be a nonempty set. A fuzzy set A in X is characterized by
its membership function
µA : X → [0, 1]
and µA (x) is interpreted as the degree of membership of element x in fuzzy set A for each
x ∈ X.
A = {(x, µA (x))|x ∈ X}
Frequently we will write simply A(x) instead of µA (x). The family of all fuzzy (sub)sets
in X is denoted by F(X). Fuzzy subsets of the real line are called fuzzy quantities.
If X = {x1, . . . , xn} is a finite set and A is a fuzzy set in X then we often use the notation

A = µ1/x1 + . . . + µn/xn

where the term µi/xi, i = 1, . . . , n, signifies that µi is the grade of membership of xi in A.
Example 1.1.1 Suppose we want to define the set of natural numbers ”close to 1”. This
can be expressed by
Figure 1.1 A discrete membership function for ”x is close to 1”.
Example 1.1.2 The membership function of the fuzzy set of real numbers "close to 1"
can be defined as
A(t) = exp(−β(t − 1)2 )
where β is a positive real number.
Example 1.1.3 Assume someone wants to buy a cheap car. Cheap can be represented as
a fuzzy set on a universe of prices, and depends on the buyer's purse. For instance, from
Fig. 1.3, cheap is roughly interpreted as follows:

• Below $3000 cars are considered cheap, and prices make no real difference in the
buyer's eyes.

• Between $3000 and $4500, a variation in the price induces a weak preference in
favor of the cheapest car.

• Between $4500 and $6000, a small variation in the price induces a clear preference
in favor of the cheapest car.
Definition 1.1.3 (normal fuzzy set) A fuzzy subset A of a classical set X is called normal
if there exists an x ∈ X such that A(x) = 1. Otherwise A is subnormal.
Definition 1.1.4 (α-cut) An α-level set of a fuzzy set A of X is a non-fuzzy set denoted
by [A]α and is defined by
[A]α = {t ∈ X | A(t) ≥ α}   if α > 0
       cl(supp A)            if α = 0

where cl(supp A) denotes the closure of the support of A.
For the discrete fuzzy set "close to 1" of Example 1.1.1 we get, in this case,

[A]α = {−1, 0, 1, 2, 3}   if 0 ≤ α ≤ 0.3
       {0, 1, 2}           if 0.3 < α ≤ 0.6
       {1}                 if 0.6 < α ≤ 1
Definition 1.1.5 (convex fuzzy set) A fuzzy set A of X is called convex if [A]α is a
convex subset of X ∀α ∈ [0, 1].
In many situations people are only able to characterize numeric information imprecisely.
For example, people use terms such as, about 5000, near zero, or essentially bigger than
5000. These are examples of what are called fuzzy numbers. Using the theory of fuzzy
subsets we can represent these fuzzy numbers as fuzzy subsets of the set of real numbers.
More exactly,
Definition 1.1.6 (fuzzy number) A fuzzy number A is a fuzzy set of the real line with
a normal, (fuzzy) convex and continuous membership function of bounded support. The
family of fuzzy numbers will be denoted by F.
Definition 1.1.7 (quasi fuzzy number) A quasi fuzzy number A is a fuzzy set of the real
line with a normal, fuzzy convex and continuous membership function satisfying the limit
conditions
lim_{t→∞} A(t) = 0,   lim_{t→−∞} A(t) = 0.
Let [A]γ = [a1(γ), a2(γ)] denote the γ-level set of a fuzzy number A. In other words,
a1(γ) denotes the left-hand side and a2(γ) denotes the right-hand side of the γ-cut. It is
easy to see that

a1 : [0, 1] → IR

is monotone increasing and lower semicontinuous, and the right-hand side function

a2 : [0, 1] → IR

is monotone decreasing and upper semicontinuous. If A is not a fuzzy number then there
exists a γ ∈ [0, 1] such that [A]γ is not a convex subset of IR.
Definition 1.1.8 (triangular fuzzy number) A fuzzy set A is called triangular fuzzy num-
ber with peak (or center) a, left width α > 0 and right width β > 0 if its membership
function has the following form
1 − (a − t)/α if a − α ≤ t ≤ a
A(t) = 1 − (t − a)/β if a ≤ t ≤ a + β
0 otherwise
and we use the notation A = (a, α, β). It can easily be verified that
[A]γ = [a − (1 − γ)α, a + (1 − γ)β], ∀γ ∈ [0, 1].
The support of A is (a − α, a + β).
Figure 1.7 Triangular fuzzy number.
Definition 1.1.9 (trapezoidal fuzzy number) A fuzzy set A is called trapezoidal fuzzy
number with tolerance interval [a, b], left width α and right width β if its membership
function has the following form
A(t) = 1 − (a − t)/α   if a − α ≤ t ≤ a
       1               if a ≤ t ≤ b
       1 − (t − b)/β   if b ≤ t ≤ b + β
       0               otherwise
and we use the notation A = (a, b, α, β). It can easily be shown that
[A]γ = [a − (1 − γ)α, b + (1 − γ)β], ∀γ ∈ [0, 1].
The support of A is (a − α, b + β).
Figure 1.8 Trapezoidal fuzzy number.
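The definitions above translate directly into code; the sketch below (function names and sample numbers are ours) evaluates the triangular and trapezoidal membership functions and the γ-cut formula [A]γ = [a − (1 − γ)α, b + (1 − γ)β].

```python
def triangular(t, a, alpha, beta):
    """Membership of the triangular fuzzy number A = (a, alpha, beta) at t."""
    if a - alpha <= t <= a:
        return 1 - (a - t) / alpha
    if a <= t <= a + beta:
        return 1 - (t - a) / beta
    return 0.0

def trapezoidal(t, a, b, alpha, beta):
    """Membership of the trapezoidal fuzzy number A = (a, b, alpha, beta) at t."""
    if a - alpha <= t <= a:
        return 1 - (a - t) / alpha
    if a <= t <= b:
        return 1.0
    if b <= t <= b + beta:
        return 1 - (t - b) / beta
    return 0.0

def gamma_cut(a, b, alpha, beta, gamma):
    """[A]^gamma = [a - (1-gamma)*alpha, b + (1-gamma)*beta]."""
    return (a - (1 - gamma) * alpha, b + (1 - gamma) * beta)

print(triangular(1.5, a=2, alpha=1, beta=1))   # 0.5
print(gamma_cut(2, 3, 1, 1, gamma=0.5))        # (1.5, 3.5)
```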
Definition 1.1.10 (LR-representation of fuzzy numbers) Any fuzzy number A ∈ F can
be described with the following membership function:

A(t) = L((a − t)/α)   if t ∈ [a − α, a]
       1              if t ∈ [a, b]
       R((t − b)/β)   if t ∈ [b, b + β]
       0              otherwise

where [a, b] is the peak or core of A, and L, R : [0, 1] → [0, 1]
are continuous and non-increasing shape functions with L(0) = R(0) = 1 and R(1) =
L(1) = 0. We call this fuzzy interval of LR-type and refer to it by
A = (a, b, α, β)LR
Definition 1.1.11 (quasi fuzzy number of type LR) Any quasi fuzzy number A ∈ F(IR)
can be described with the following membership function:

A(t) = L((a − t)/α)   if t ≤ a
       1              if t ∈ [a, b]
       R((t − b)/β)   if t ≥ b

where [a, b] is the peak or core of A, and L, R : [0, ∞) → [0, 1]
are continuous and non-increasing shape functions with L(0) = R(0) = 1 and

lim_{t→∞} L(t) = 0,   lim_{t→∞} R(t) = 0.
Let A = (a, b, α, β)LR be a fuzzy number of type LR. If a = b then we use the notation

A = (a, α, β)LR.

Furthermore, if the shape functions are linear, L(x) = R(x) = 1 − x, then instead of
A = (a, b, α, β)LR we simply write

A = (a, b, α, β).
Definition 1.1.12 (subsethood) Let A and B be fuzzy subsets of a classical set X. We
say that A is a subset of B if A(t) ≤ B(t), ∀t ∈ X.

Definition 1.1.13 (equality of fuzzy sets) Let A and B be fuzzy subsets of a classical
set X. A and B are said to be equal, denoted A = B, if A ⊂ B and B ⊂ A. We note
that A = B if and only if A(x) = B(x) for all x ∈ X.
Definition 1.1.14 (empty fuzzy set) The empty fuzzy subset of X is defined as the fuzzy
subset ∅ of X such that ∅(x) = 0 for each x ∈ X.
It is easy to see that ∅ ⊂ A holds for any fuzzy subset A of X.
Definition 1.1.15 The largest fuzzy set in X, called universal fuzzy set in X, denoted
by 1X , is defined by 1X (t) = 1, ∀t ∈ X.
Figure 1.11 The graph of the universal fuzzy subset in X = [0, 10].
Definition 1.1.16 (Fuzzy point) Let A be a fuzzy number. If supp(A) = {x0 } then A is
called a fuzzy point and we use the notation A = x̄0 .
Figure 1.11a Fuzzy point.
Let A = x̄0 be a fuzzy point. It is easy to see that [A]γ = [x0 , x0 ] = {x0 }, ∀γ ∈ [0, 1].
Exercise 1.1.1 Let X = [0, 2] be the universe of discourse of fuzzy number A defined by
the membership function A(t) = 1 − t if t ∈ [0, 1] and A(t) = 0, otherwise. Interpret A
linguistically.
Exercise 1.1.2 Let A = (a, b, α, β)LR and A0 = (a0 , b0 , α0 , β 0 )LR be fuzzy numbers of type
LR. Give necessary and sufficient conditions for the subsethood of A in A0 .
Exercise 1.1.3 Let A = (a, α) be a symmetrical triangular fuzzy number. Calculate [A]γ
as a function of a and α.
Exercise 1.1.5 Let A = (a, b, α, β) be a trapezoidal fuzzy number. Calculate [A]γ as a
function of a, b, α and β.
Exercise 1.1.6 Let A = (a, b, α, β)LR be a fuzzy number of type LR. Calculate [A]γ as a
function of a, b, α, β, L and R.
1.2 Operations on fuzzy sets
In this section we extend the classical set theoretic operations from ordinary set theory
to fuzzy sets. We note that all those operations which are extensions of crisp concepts
reduce to their usual meaning when the fuzzy subsets have membership degrees that are
drawn from {0, 1}. For this reason, when extending operations to fuzzy sets we use the
same symbol as in set theory.
Let A and B be fuzzy subsets of a nonempty (crisp) set X.
A closely related pair of properties which hold in ordinary set theory are the law of
excluded middle

A ∨ ¬A = X

and the law of noncontradiction

A ∧ ¬A = ∅.
It is clear that ¬1X = ∅ and ¬∅ = 1X , however, the laws of excluded middle and noncon-
tradiction are not satisfied in fuzzy logic.
Lemma 1.2.1 The law of excluded middle is not valid. Let A(t) = 1/2, ∀t ∈ IR, then it
is easy to see that
(¬A ∨ A)(t) = max{¬A(t), A(t)} = max{1 − 1/2, 1/2} = 1/2 6= 1
Lemma 1.2.2 The law of noncontradiction is not valid. Let A(t) = 1/2, ∀t ∈ IR, then
it is easy to see that
(¬A ∧ A)(t) = min{¬A(t), A(t)} = min{1 − 1/2, 1/2} = 1/2 6= 0
Triangular norms were introduced by Schweizer and Sklar [91] to model distances in
probabilistic metric spaces. In fuzzy set theory triangular norms are extensively used to
model the logical connective and.
T (x, y) = T (y, x) (symmetricity)
All t-norms may be extended, through associativity, to n > 2 arguments. The t-norm
MIN is automatically extended, and

PAND(a1, . . . , an) = a1 a2 · · · an

LAND(a1, . . . , an) = max{a1 + · · · + an − n + 1, 0}
If T is a t-norm then the equality S(a, b) := 1 − T (1 − a, 1 − b) defines a t-conorm and
we say that S is derived from T .
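A minimal sketch of the three t-norms mentioned above (minimum, product, Łukasiewicz) together with the t-conorms derived from them via S(a, b) = 1 − T(1 − a, 1 − b); the names MIN, PAND and LAND follow the text, the remaining names are ours.

```python
def MIN(a, b):   return min(a, b)                 # minimum t-norm
def PAND(a, b):  return a * b                     # product t-norm
def LAND(a, b):  return max(a + b - 1.0, 0.0)     # Lukasiewicz t-norm

def derived_conorm(T):
    """S(a, b) = 1 - T(1 - a, 1 - b): the t-conorm derived from the t-norm T."""
    return lambda a, b: 1.0 - T(1.0 - a, 1.0 - b)

MAX = derived_conorm(MIN)    # maximum t-conorm
POR = derived_conorm(PAND)   # probabilistic or: a + b - a*b
LOR = derived_conorm(LAND)   # bounded sum: min(a + b, 1)

a, b = 0.6, 0.7
print(MIN(a, b), PAND(a, b), LAND(a, b))   # approximately 0.6, 0.42, 0.3
print(MAX(a, b), POR(a, b), LOR(a, b))     # approximately 0.7, 0.88, 1.0
```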
Lemma 1.2.5 T (a, a) = a holds for any a ∈ [0, 1] if and only if T is the minimum norm.
Proof. If T (a, b) = M IN (a, b) then T (a, a) = a holds obviously. Suppose T (a, a) = a for
any a ∈ [0, 1], and a ≤ b ≤ 1. We can obtain the following expression using monotonicity
of T
a = T (a, a) ≤ T (a, b) ≤ min{a, b}.
From the commutativity of T it follows that the same holds when b ≤ a, i.e. T(a, b) =
T(b, a) = min{a, b}. These equations show that T(a, b) = min{a, b} for any a, b ∈ [0, 1].
Lemma 1.2.6 The distributive law of T over the max operator holds for any a, b, c ∈ [0, 1]:

T(max{a, b}, c) = max{T(a, c), T(b, c)}.
In general, the law of the excluded middle and the noncontradiction principle properties
are not satisfied by t-norms and t-conorms defining the intersection and union operations.
However, the Łukasiewicz t-norm and t-conorm do satisfy these properties.
Lemma 1.2.7 If T(x, y) = LAND(x, y) = max{x + y − 1, 0} then the law of
noncontradiction is valid.
Proof. Let A be a fuzzy set in X. Then from the definition of t-norm-based intersection
we get

(A ∧ ¬A)(t) = LAND(A(t), 1 − A(t)) = max{A(t) + (1 − A(t)) − 1, 0} = 0, ∀t ∈ X,

i.e. A ∧ ¬A = ∅.
Lemma 1.2.8 If S(x, y) = LOR(x, y) = min{1, x + y} then the law of excluded middle
is valid.
Exercise 1.2.5 Show that if γ ≤ γ′ then HANDγ(x, y) ≥ HANDγ′(x, y) holds for all
x, y ∈ [0, 1], i.e. the family HANDγ is monotone decreasing.
1.3 Fuzzy relations
A classical relation can be considered as a set of tuples, where a tuple is an ordered pair.
A binary tuple is denoted by (u, v), an example of a ternary tuple is (u, v, w) and an
example of n-ary tuple is (x1 , . . . , xn ).
Definition 1.3.1 (classical n-ary relation) Let X1 , . . . , Xn be classical sets. The subsets
of the Cartesian product X1 × · · · × Xn are called n-ary relations. If X1 = · · · = Xn and
R ⊂ X n then R is called an n-ary relation in X.
Let R be a binary relation in IR. Then the characteristic function of R is defined as
χR(u, v) = 1   if (u, v) ∈ R
           0   otherwise
Example 1.3.1 Let X be the domain of men {John, Charles, James} and Y the domain
of women {Diana, Rita, Eva}, then the relation ”married to” on X × Y is, for example
{(Charles, Diana), (John, Eva), (James, Rita) }
Example 1.3.2 Consider the following relation (u, v) ∈ R iff u ∈ [a, b] and v ∈ [0, c]:
χR(u, v) = 1   if (u, v) ∈ [a, b] × [0, c]
           0   otherwise
Figure 1.14 Graph of a crisp relation.
Let R be a binary relation in a classical set X. Then
Example 1.3.3 Consider the classical inequality relations on the real line IR. It is clear
that ≤ is reflexive, anti-symmetric and transitive, while < is anti-reflexive, anti-
symmetric and transitive.
Definition 1.3.9 (total order) R is a total order relation if it is partial order and (u, v) ∈
R or (v, u) ∈ R hold for any u and v.
Example 1.3.4 Let us consider the binary relation ”subset of”. It is clear that it is a
partial order relation. The relation ≤ on natural numbers is a total order relation.
Definition 1.3.10 (fuzzy relation) Let X and Y be nonempty sets. A fuzzy relation R
is a fuzzy subset of X × Y . In other words, R ∈ F(X × Y ). If X = Y then we say that
R is a binary fuzzy relation in X.
Let R be a binary fuzzy relation on IR. Then R(u, v) is interpreted as the degree of
membership of (u, v) in R.
Example 1.3.6 A simple example of a binary fuzzy relation on U = {1, 2, 3}, called
”approximately equal” can be defined as
R(1, 1) = R(2, 2) = R(3, 3) = 1, R(1, 2) = R(2, 1) = R(2, 3) = R(3, 2) = 0.8
R(1, 3) = R(3, 1) = 0.3
The membership function of R is given by
1 if u = v
R(u, v) = 0.8 if |u − v| = 1
0.3 if |u − v| = 2
In matrix notation it can be represented as

R =
       1     2     3
  1    1     0.8   0.3
  2    0.8   1     0.8
  3    0.3   0.8   1
Fuzzy relations are very important because they can describe interactions between vari-
ables. Let R and S be two binary fuzzy relations on X × Y .
Note that R : X × Y → [0, 1], i.e. the domain of R is the whole Cartesian product X × Y .
Example 1.3.7 Let us define two binary relations R = ”x is considerable smaller than
y” and G = ”x is very close to y”
R =
       y1    y2    y3    y4
  x1   0.5   0.1   0.1   0.7
  x2   0     0.8   0     0
  x3   0.9   1     0.7   0.8

G =
       y1    y2    y3    y4
  x1   0.4   0     0.9   0.6
  x2   0.9   0.4   0.5   0.7
  x3   0.3   0     0.8   0.5
The projection of R on X, denoted by ΠX(R), is defined as

ΠX(R)(x) = sup{R(x, y) | y ∈ Y}

and, similarly, the projection of R on Y, denoted by ΠY(R), is defined as

ΠY(R)(y) = sup{R(x, y) | x ∈ X}.

For the relation R of Example 1.3.7, for instance:
• x1 is assigned the highest membership degree from the tuples (x1 , y1 ), (x1 , y2 ),
(x1 , y3 ), (x1 , y4 ), i.e. ΠX (x1 ) = 0.7, which is the maximum of the first row.
• x2 is assigned the highest membership degree from the tuples (x2 , y1 ), (x2 , y2 ),
(x2 , y3 ), (x2 , y4 ), i.e. ΠX (x2 ) = 0.8, which is the maximum of the second row.
• x3 is assigned the highest membership degree from the tuples (x3 , y1 ), (x3 , y2 ),
(x3 , y3 ), (x3 , y4 ), i.e. ΠX (x3 ) = 1, which is the maximum of the third row.
It is clear that the Cartesian product of two fuzzy sets A ∈ F(X) and B ∈ F(Y ) is a
binary fuzzy relation in X × Y , i.e.
A × B ∈ F(X × Y ).
Definition 1.3.17 (t-conorm-based union) Let S be a t-conorm and let R and G be binary
fuzzy relations in X × Y . Their S-union is defined by
(R ∪ G)(u, v) = S(R(u, v), G(u, v)), (u, v) ∈ X × Y.
Definition 1.3.18 (sup-min composition) Let R ∈ F(X × Y ) and G ∈ F(Y × Z). The
sup-min composition of R and G, denoted by R ◦ G is defined as
(R ◦ G)(u, w) = sup_{v∈Y} min{R(u, v), G(v, w)}
It is clear that R ◦ G is a binary fuzzy relation in X × Z.
Example 1.3.9 Consider two fuzzy relations R = ”x is considerable smaller than y” and
G = ”y is very close to z”
R =
       y1    y2    y3    y4
  x1   0.5   0.1   0.1   0.7
  x2   0     0.8   0     0
  x3   0.9   1     0.7   0.8

G =
       z1    z2    z3
  y1   0.4   0.9   0.3
  y2   0     0.4   0
  y3   0.9   0.5   0.8
  y4   0.6   0.7   0.5

Then their sup-min composition is

R ◦ G =
       z1    z2    z3
  x1   0.6   0.7   0.5
  x2   0     0.4   0
  x3   0.7   0.9   0.7
Formally, the composition of R and G is nothing else but the max–min product of the
matrices of R and G: instead of addition we use the maximum and instead of
multiplication we use the minimum operator. For example,
(R ◦ G)(x1 , z1 ) = max{0.5 ∧ 0.4, 0.1 ∧ 0, 0.1 ∧ 0.9, 0.7 ∧ 0.6} = 0.6
(R ◦ G)(x1 , z2 ) = max{0.5 ∧ 0.9, 0.1 ∧ 0.4, 0.1 ∧ 0.5, 0.7 ∧ 0.7} = 0.7
(R ◦ G)(x1 , z3 ) = max{0.5 ∧ 0.3, 0.1 ∧ 0, 0.1 ∧ 0.8, 0.7 ∧ 0.5} = 0.5
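The sup-min composition of this example is easy to reproduce numerically; the sketch below (helper name is ours, matrix layout as in the example) computes R ◦ G entry by entry.

```python
def sup_min_composition(R, G):
    """(R o G)(u, w) = max over v of min(R[u][v], G[v][w])."""
    rows, inner, cols = len(R), len(G), len(G[0])
    return [[max(min(R[i][k], G[k][j]) for k in range(inner))
             for j in range(cols)]
            for i in range(rows)]

R = [[0.5, 0.1, 0.1, 0.7],     # "x is considerably smaller than y"
     [0.0, 0.8, 0.0, 0.0],
     [0.9, 1.0, 0.7, 0.8]]
G = [[0.4, 0.9, 0.3],          # "y is very close to z"
     [0.0, 0.4, 0.0],
     [0.9, 0.5, 0.8],
     [0.6, 0.7, 0.5]]

print(sup_min_composition(R, G))
# [[0.6, 0.7, 0.5], [0.0, 0.4, 0.0], [0.7, 0.9, 0.7]]
```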
Definition 1.3.19 (sup-T composition) Let T be a t-norm and let R ∈ F(X × Y ) and
G ∈ F(Y × Z). The sup-T composition of R and G, denoted by R ◦ G is defined as
(R ◦ G)(u, w) = sup_{v∈Y} T(R(u, v), G(v, w))
Following Zadeh [115] we can define the sup-min composition of a fuzzy set and fuzzy
relation as follows
Definition 1.3.20 Let C ∈ F(X) and R ∈ F(X × Y ). The membership function of the
composition of a fuzzy set C and a fuzzy relation R is defined by
(C ◦ R)(y) = sup_{x∈X} min{C(x), R(x, y)}, ∀y ∈ Y.
The composition of a fuzzy set C and a fuzzy relation R can be considered as the shadow
of the relation R on the fuzzy set C.
Figure 1.16 Composition of a fuzzy number and a fuzzy relation.
In the above definition we can use any t-norm T for modeling the compositional operator,

(C ◦ R)(y) = sup_{x∈X} T(C(x), R(x, y))

for all y ∈ Y. For example, if PAND(x, y) = xy is the product t-norm then the sup-T
composition of a fuzzy set C and a fuzzy relation R is defined by

(C ◦ R)(y) = sup_{x∈X} PAND(C(x), R(x, y)) = sup_{x∈X} C(x)R(x, y)

for all y ∈ Y.
Example 1.3.10 Let A and B be fuzzy numbers and let R = A × B be a fuzzy relation.
Observe the following property of composition:

A ◦ R = A ◦ (A × B) = B,   B ◦ R = B ◦ (A × B) = A.

This fact can be interpreted as follows: if A and B are related via R = A × B, then the
composition of A with A × B is exactly B, and the composition of B with A × B is
exactly A.
Example 1.3.11 Let C be a fuzzy set in the universe of discourse {1, 2, 3} and let R be
a binary fuzzy relation in {1, 2, 3}. Assume that C = 0.2/1 + 1/2 + 0.2/3 and

R =
       1     2     3
  1    1     0.8   0.3
  2    0.8   1     0.8
  3    0.3   0.8   1

Then, using the definition of sup-min composition, we obtain C ◦ R = 0.8/1 + 1/2 + 0.8/3.
Example 1.3.12 Let C be a fuzzy set in the universe of discourse [0, 1] and let R be a
binary fuzzy relation in [0, 1]. Assume that C(x) = x and R(x, y) = 1 − |x − y|. Using
the definition of sup-min composition (1.3.20) we get
(C ◦ R)(y) = sup_{x∈[0,1]} min{x, 1 − |x − y|} = (1 + y)/2
Example 1.3.13 Let C be a fuzzy set in the universe of discourse {1, 2, 3} and let R be
a binary fuzzy relation in {1, 2, 3}. Assume that C = 1/1 + 0.2/2 + 1/3 and

R =
       1     2     3
  1    0.4   0.8   0.3
  2    0.8   0.4   0.8
  3    0.3   0.8   0

Then, using the definition of sup-min composition, we obtain C ◦ R = 0.4/1 + 0.8/2 + 0.3/3.
1.3.1 The extension principle
In order to use fuzzy numbers and relations in any intelligent system we must be able to
perform arithmetic operations with these fuzzy quantities. In particular, we must be able
to add, subtract, multiply and divide fuzzy quantities. The process of doing these
operations is called fuzzy arithmetic.
We shall first introduce an important concept from fuzzy set theory called the extension
principle. We then use it to provide for these arithmetic operations on fuzzy numbers.
In general the extension principle plays a fundamental role in enabling us to extend any
point operation to operations involving fuzzy sets. In the following we define this
principle.
Definition 1.3.22 (extension principle) Assume X and Y are crisp sets and let f be a
mapping from X to Y ,
f: X →Y
such that for each x ∈ X, f (x) = y ∈ Y . Assume A is a fuzzy subset of X, using the
extension principle, we can define f (A) as a fuzzy subset of Y such that
f(A)(y) = sup_{x∈f^{-1}(y)} A(x)   if f^{-1}(y) ≠ ∅                    (1.1)
          0                        otherwise
where f −1 (y) = {x ∈ X | f (x) = y}.
It should be noted that if f is strictly increasing (or strictly decreasing) then (1.1) turns
into

f(A)(y) = A(f^{-1}(y))   if y ∈ Range(f)
          0              otherwise
where Range(f ) = {y ∈ Y | ∃x ∈ X such that f (x) = y}.
Figure 1.17 Extension of a monotone increasing function.
Example 1.3.14 Let f (x) = x2 and let A ∈ F be a symmetric triangular fuzzy number
with membership function
A(x) = 1 − |a − x|/α   if |a − x| ≤ α
       0               otherwise
Then using the extension principle we get
f(A)(y) = A(√y)   if y ≥ 0
          0       otherwise

that is

f(A)(y) = 1 − |a − √y|/α   if |a − √y| ≤ α and y ≥ 0
          0                otherwise
Figure 1.18 The quadratic image of a symmetric triangular fuzzy number.
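The extension principle can also be evaluated numerically by discretizing the universe. The sketch below is a brute-force approximation under our own choice of grid and names; it recovers the quadratic image of a symmetric triangular fuzzy number as in Figure 1.18.

```python
def extend(f, A, xs):
    """Approximate f(A)(y) = sup{A(x) : f(x) = y} on a discretized universe xs."""
    image = {}
    for x in xs:
        y = round(f(x), 6)                  # group grid points with (numerically) equal images
        image[y] = max(image.get(y, 0.0), A(x))
    return image

A = lambda x: max(0.0, 1.0 - abs(x - 1.0))  # symmetric triangular fuzzy number (1, 1)
xs = [i / 100.0 for i in range(-100, 301)]  # grid on [-1, 3]
fA = extend(lambda x: x * x, A, xs)

print(fA[1.0])    # membership of y = 1 is 1.0, since A(1) = 1
print(fA[0.25])   # sup over x in {-0.5, 0.5}: max(A(-0.5), A(0.5)) = 0.5
```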
Example 1.3.15 Let f (x) = 1/(1 + e−x ) be a sigmoidal function and let A be a fuzzy
number. Then from
f^{-1}(y) = ln(y/(1 − y))   if 0 < y < 1

it follows that

f(A)(y) = A(ln(y/(1 − y)))   if 0 < y < 1
          0                  otherwise
Example 1.3.16 Let λ 6= 0 be a real number and let f (x) = λx be a linear function.
Suppose A ∈ F is a fuzzy number. Then using the extension principle we obtain

f(A)(y) = (λA)(y) = A(y/λ).

Figure 1.19 The fuzzy number λA for λ = 2.
Example 1.3.17 (extended addition) Let f : X × X → X be defined as
f (x1 , x2 ) = x1 + x2 ,
i.e. f is the addition operator. Suppose A1 and A2 are fuzzy subsets of X. Then using
the extension principle we get
f(A1, A2)(y) = sup_{x1+x2=y} min{A1(x1), A2(x2)}
Note that, in general, the extended difference A − A is not equal to the fuzzy number 0̄,
where 0̄(t) = 1 if t = 0 and 0̄(t) = 0 otherwise.
Example 1.3.20 (extended multiplication) Let f : X × X → X be defined as

f(x1, x2) = x1 x2,

i.e. f is the multiplication operator. Suppose A1 and A2 are fuzzy subsets of X. Then
using the extension principle we get

f(A1, A2)(y) = sup_{x1 x2 = y} min{A1(x1), A2(x2)}.

Similarly, let f(x1, x2) = x1/x2, i.e. f is the division operator. Suppose A1 and A2 are
fuzzy subsets of X. Then using the extension principle we get

f(A1, A2)(y) = sup_{x1/x2 = y} min{A1(x1), A2(x2)}.
Definition 1.3.24 Let X 6= ∅ and Y 6= ∅ be crisp sets and let f be a function from F(X)
to F(Y ). Then f is called a fuzzy function (or mapping) and we use the notation
f : F(X) → F(Y ).
It should be noted, however, that a fuzzy function is not necessarily defined by Zadeh’s
extension principle. It can be any function which maps a fuzzy set A ∈ F(X) into a fuzzy
set B := f (A) ∈ F(Y ).
Theorem 1.3.1 Let X 6= ∅ and Y 6= ∅ be crisp sets. Then every fuzzy mapping
f : F(X) → F(Y) defined by the extension principle is monotone increasing.
Proof Let A, A′ ∈ F(X) such that A ⊂ A′. Then using the definition of the sup-min
extension principle we get

f(A)(y) = sup_{f(x)=y} A(x) ≤ sup_{f(x)=y} A′(x) = f(A′)(y)

for all y ∈ Y.
Lemma 1.3.1 Let A, B ∈ F be fuzzy numbers and let f(A, B) = A + B be defined by the
sup-min extension principle. Then f is monotone increasing.
Proof Let A, A0 , B, B 0 ∈ F such that A ⊂ A0 and B ⊂ B 0 . Then using the definition of
sup-min extension principle we get
(A + B)(z) = sup_{x+y=z} min{A(x), B(y)} ≤ sup_{x+y=z} min{A′(x), B′(y)} = (A′ + B′)(z)
Lemma 1.3.2 Let A, B ∈ F be fuzzy numbers, let λ1, λ2 be real numbers and let

f(A, B) = λ1 A + λ2 B

be defined by the sup-min extension principle. Then f is monotone increasing.
Let A = (a1 , a2 , α1 , α2 )LR and B = (b1 , b2 , β1 , β2 )LR be fuzzy numbers of LR-type. Us-
ing the (sup-min) extension principle we can verify the following rules for addition and
subtraction of fuzzy numbers of LR-type.
A + B = (a1 + b1 , a2 + b2 , α1 + β1 , α2 + β2 )LR
A − B = (a1 − b2 , a2 − b1 , α1 + β1 , α2 + β2 )LR
furthermore, if λ ∈ IR is a real number then λA can be represented as

λA = (λa1, λa2, λα1, λα2)LR          if λ ≥ 0
     (λa2, λa1, |λ|α2, |λ|α1)LR      if λ < 0
In particular, if A = (a, α1, α2) and B = (b, β1, β2) are fuzzy numbers of triangular form
then

A + B = (a + b, α1 + β1, α2 + β2)
A − B = (a − b, α1 + β2, α2 + β1)
and if A = (a, α) and B = (b, β) are fuzzy numbers of symmetrical triangular form then
A + B = (a + b, α + β)
A − B = (a − b, α + β)
λA = (λa, |λ|α).
The above results can be generalized to linear combinations of fuzzy numbers.
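A direct transcription of the addition, subtraction and scalar multiplication rules for symmetric triangular fuzzy numbers A = (a, α); the helper names and sample values are ours.

```python
def add(A, B):
    """(a, alpha) + (b, beta) = (a + b, alpha + beta)."""
    (a, alpha), (b, beta) = A, B
    return (a + b, alpha + beta)

def sub(A, B):
    """(a, alpha) - (b, beta) = (a - b, alpha + beta)."""
    (a, alpha), (b, beta) = A, B
    return (a - b, alpha + beta)

def scale(lam, A):
    """lambda * (a, alpha) = (lambda * a, |lambda| * alpha)."""
    a, alpha = A
    return (lam * a, abs(lam) * alpha)

A, B = (2.0, 1.0), (5.0, 0.5)
print(add(A, B), sub(A, B), scale(-3.0, A))   # (7.0, 1.5) (-3.0, 1.5) (-6.0, 3.0)
```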
Lemma 1.3.3 Let Ai = (ai, αi) be a fuzzy number of symmetrical triangular form and
let λi be a real number, i = 1, . . . , n. Then their linear combination

λ1 A1 + · · · + λn An

can be represented as

λ1 A1 + · · · + λn An = (λ1 a1 + · · · + λn an, |λ1|α1 + · · · + |λn|αn)
Assume Ai = (ai, α), i = 1, . . . , n, are fuzzy numbers of symmetrical triangular form and
λi ∈ [0, 1] such that λ1 + . . . + λn = 1. Then their convex linear combination can be
represented as

λ1 A1 + · · · + λn An = (λ1 a1 + · · · + λn an, λ1 α + · · · + λn α) = (λ1 a1 + · · · + λn an, α)
Let A and B be fuzzy numbers with [A]α = [a1 (α), a2 (α)] and [B]α = [b1 (α), b2 (α)]. Then
it can easily be shown that
[A + B]α = [a1 (α) + b1 (α), a2 (α) + b2 (α)]
[−A]α = [−a2 (α), −a1 (α)]
[A − B]α = [a1 (α) − b2 (α), a2 (α) − b1 (α)]
[λA]α = [λa1 (α), λa2 (α)], λ ≥ 0
[λA]α = [λa2 (α), λa1 (α)], λ < 0
for all α ∈ [0, 1], i.e. any α-level set of the extended sum of two fuzzy numbers is equal
to the sum of their α-level sets. The following two theorems show that this property is
valid for any continuous function.
Theorem 1.3.2 [87] Let f : X → X be a continuous function and let A be a fuzzy number.
Then
[f (A)]α = f ([A]α )
where f (A) is defined by the extension principle (1.1) and
f ([A]α ) = {f (x) | x ∈ [A]α }.
If [A]α = [a1(α), a2(α)] and f is monotone increasing then from the above theorem we get

[f(A)]α = f([A]α) = [f(a1(α)), f(a2(α))].
Let f(x, y) = xy and let [A]α = [a1(α), a2(α)] and [B]α = [b1(α), b2(α)] be the α-level
sets of two fuzzy numbers. Applying Theorem 1.3.3 (the analogue of Theorem 1.3.2 for
continuous two-place functions) we get [f(A, B)]α = f([A]α, [B]α) = [A]α [B]α. However,
the simple formula

[A]α [B]α = [a1(α)b1(α), a2(α)b2(α)]

holds if and only if A and B are both nonnegative, i.e. A(x) = B(x) = 0 for x ≤ 0.
If B is nonnegative then we have

[A]α [B]α = [min{a1(α)b1(α), a1(α)b2(α)}, max{a2(α)b1(α), a2(α)b2(α)}].
In the general case we obtain a more complicated expression for the α-level sets of the
product AB:

[A]α [B]α = [min{a1(α)b1(α), a1(α)b2(α), a2(α)b1(α), a2(α)b2(α)},
             max{a1(α)b1(α), a1(α)b2(α), a2(α)b1(α), a2(α)b2(α)}].
The above properties of the extended operations addition, subtraction and multiplication
by a scalar of fuzzy numbers of type LR are often used in fuzzy neural networks.
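In practice these α-level formulas are applied interval-wise. The sketch below (interval helpers and sample intervals are ours) computes the α-cuts of A + B, −A, λA and of the general product from the endpoint values at a fixed level α.

```python
def add_cut(A, B):
    """[A + B]^a = [a1 + b1, a2 + b2] for intervals A = (a1, a2), B = (b1, b2)."""
    return (A[0] + B[0], A[1] + B[1])

def neg_cut(A):
    return (-A[1], -A[0])

def scale_cut(lam, A):
    lo, hi = lam * A[0], lam * A[1]
    return (min(lo, hi), max(lo, hi))

def mul_cut(A, B):
    """General case: take min and max over all endpoint products."""
    p = [A[0] * B[0], A[0] * B[1], A[1] * B[0], A[1] * B[1]]
    return (min(p), max(p))

A, B = (1.0, 3.0), (-2.0, 2.0)   # alpha-cuts of two fuzzy numbers at some fixed level
print(add_cut(A, B))             # (-1.0, 5.0)
print(scale_cut(-2.0, A))        # (-6.0, -2.0)
print(mul_cut(A, B))             # (-6.0, 6.0)
```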
Definition 1.3.26 (fuzzy max) Let f(x, y) = max{x, y} and let [A]α = [a1(α), a2(α)] and
[B]α = [b1(α), b2(α)] be two fuzzy numbers. Applying Theorem 1.3.3 we get

[f(A, B)]α = f([A]α, [B]α) = max{[A]α, [B]α} = [a1(α) ∨ b1(α), a2(α) ∨ b2(α)]
Definition 1.3.27 (fuzzy min)
Let f (x, y) = min{x, y} and let [A]α = [a1 (α), a2 (α)] and [B]α = [b1 (α), b2 (α)] be two
fuzzy numbers. Applying Theorem 1.3.3 we get
[f (A, B)]α = f ([A]α , [B]α ) = min{[A]α , [B]α } = [a1 (α) ∧ b1 (α), a2 (α) ∧ b2 (α)]
Definition 1.3.28 (sup-T extension principle) Let T be a t-norm and let f be a mapping
from X1 × X2 × · · · × Xn to Y. Assume A1, . . . , An are fuzzy subsets of X1, . . . , Xn,
respectively. Using the extension principle, we can define f(A1, . . . , An) as a fuzzy subset
of Y such that

f(A1, . . . , An)(y) = sup{T(A1(x1), . . . , An(xn)) | (x1, . . . , xn) ∈ f^{-1}(y)}   if f^{-1}(y) ≠ ∅
                      0                                                              otherwise
Example 1.3.22 Let P AN D(u, v) = uv be the product t-norm and let f (x1 , x2 ) = x1 +x2
be the addition operation on the real line. If A and B are fuzzy numbers then their sup-T
extended sum, denoted by A ⊕ B, is defined by
(A ⊕ B)(y) = sup_{x1+x2=y} PAND(A(x1), B(x2)) = sup_{x1+x2=y} A(x1)B(x2)
Example 1.3.23 Let T(u, v) = max{0, u + v − 1} be the Łukasiewicz t-norm and let
f(x1, x2) = x1 + x2 be the addition operation on the real line. If A and B are fuzzy
numbers then their sup-T extended sum, denoted by A ⊕ B, is defined by

(A ⊕ B)(y) = sup_{x1+x2=y} LAND(A(x1), B(x2)) = sup_{x1+x2=y} max{0, A(x1) + B(x2) − 1}
The reader can find some results on t-norm-based operations on fuzzy numbers in [45, 46,
52].
Exercise 1.3.1 Let A1 = (a1 , α) and A2 = (a2 , α) be fuzzy numbers of symmetric trian-
gular form. Compute analytically the membership function of their product-sum, A1 ⊕ A2 ,
defined by
(A1 ⊕ A2)(y) = sup_{x1+x2=y} PAND(A1(x1), A2(x2)) = sup_{x1+x2=y} A1(x1)A2(x2).
1.3.2 Metrics for fuzzy numbers
Let A and B be fuzzy numbers with [A]α = [a1 (α), a2 (α)] and [B]α = [b1 (α), b2 (α)]. We
metricize the set of fuzzy numbers by the metrics
• Hausdorff distance

D(A, B) = sup_{α∈[0,1]} max{|a1(α) − b1(α)|, |a2(α) − b2(α)|},

i.e. D(A, B) is the maximal distance between the α-level sets of A and B. For example,
if A = (a, α) and B = (b, α) are symmetric triangular fuzzy numbers with the same width
then D(A, B) = |a − b|.
• C∞ distance

C∞(A, B) = sup{|A(u) − B(u)| : u ∈ IR},

i.e. C∞(A, B) is the maximal distance between the membership grades of A and B. In
particular, C∞(A, B) = 1 whenever the supports of A and B are disjoint.
• Hamming distance Suppose A and B are fuzzy sets in X. Then their Hamming
distance, denoted by H(A, B), is defined by
H(A, B) = ∫_X |A(x) − B(x)| dx.
• Discrete Hamming distance Suppose A and B are discrete fuzzy sets defined on
X = {x1, . . . , xn}. Then their discrete Hamming distance is

H(A, B) = |A(x1) − B(x1)| + · · · + |A(xn) − B(xn)|.
It should be noted that D(A, B) is a better measure of similarity than C∞ (A, B), because
C∞ (A, B) ≤ 1 holds even though the supports of A and B are very far from each other.
Definition 1.3.29 Let f be a fuzzy function from F to F. Then f is said to be continuous
in metric D if ∀ε > 0 there exists δ > 0 such that if

D(A, B) ≤ δ

then

D(f(A), f(B)) ≤ ε.
Definition 1.3.30 Let f be a fuzzy function from F(IR) to F(IR). Then f is said to be
continuous in metric C∞ if ∀ε > 0 there exists δ > 0 such that if

C∞(A, B) ≤ δ

then

C∞(f(A), f(B)) ≤ ε.
We note that in the definition of continuity in metric C∞ the domain and the range of f
can be the family of all fuzzy subsets of the real line, while in the case of continuity in
metric D the domain and the range of f is the set of fuzzy numbers.
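For symmetric triangular fuzzy numbers the three metrics above are easy to evaluate numerically. The rough sketch below (grid-based approximations; the function names, grid and sample numbers are ours) assumes A = (a, α) and B = (b, β).

```python
def tri(t, a, alpha):
    return max(0.0, 1.0 - abs(t - a) / alpha)

def c_inf(a, alpha, b, beta, lo=-10.0, hi=10.0, n=20001):
    """C_inf(A, B): maximum of |A(t) - B(t)|, approximated on a grid."""
    ts = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    return max(abs(tri(t, a, alpha) - tri(t, b, beta)) for t in ts)

def hamming(a, alpha, b, beta, lo=-10.0, hi=10.0, n=20001):
    """H(A, B): integral of |A(t) - B(t)| dt, approximated by the rectangle rule."""
    dt = (hi - lo) / (n - 1)
    return sum(abs(tri(lo + i * dt, a, alpha) - tri(lo + i * dt, b, beta))
               for i in range(n)) * dt

def hausdorff(a, alpha, b, beta):
    """D(A, B) for symmetric triangles: the sup is attained at the centers or the support endpoints."""
    return max(abs((a - alpha) - (b - beta)), abs((a + alpha) - (b + beta)), abs(a - b))

print(hausdorff(0.0, 1.0, 2.0, 1.0))          # 2.0
print(round(c_inf(0.0, 1.0, 2.0, 1.0), 3))    # 1.0, since the supports are disjoint
print(round(hamming(0.0, 1.0, 2.0, 1.0), 3))  # about 2.0, the total area of the two triangles
```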
Exercise 1.3.2 Let f (x) = sin x and let A = (a, α) be a fuzzy number of symmetric
triangular form. Calculate the membership function of the fuzzy set f (A).
Exercise 1.3.3 Let B1 = (b1, β1) and B2 = (b2, β2) be fuzzy numbers of symmetric
triangular form. Calculate the α-level set of their product B1 B2.

Exercise 1.3.4 Let B1 = (b1, β1) and B2 = (b2, β2) be fuzzy numbers of symmetric
triangular form. Calculate the α-level set of their fuzzy max max{B1, B2}.

Exercise 1.3.5 Let B1 = (b1, β1) and B2 = (b2, β2) be fuzzy numbers of symmetric
triangular form. Calculate the α-level set of their fuzzy min min{B1, B2}.
Exercise 1.3.6 Let A = (a, α) and B = (b, β) be fuzzy numbers of symmetrical triangular
form. Calculate the distances D(A, B), H(A, B) and C∞ (A, B) as a function of a, b, α
and β.
Exercise 1.3.7 Let A = (a, α1 , α2 ) and B = (b, β1 , β2 ) be fuzzy numbers of triangular
form. Calculate the distances D(A, B), H(A, B) and C∞ (A, B) as a function of a, b, α1 ,
α2 , β1 and β2 .
Exercise 1.3.9 Let A = (a1 , a2 , α1 , α2 )LR and B = (b1 , b2 , β1 , β2 )LR be fuzzy numbers of
type LR. Calculate the distances D(A, B), H(A, B) and C∞ (A, B).
Exercise 1.3.10 Let A and B be discrete fuzzy subsets of X = {−2, −1, 0, 1, 2, 3, 4}.
1.3.3 Fuzzy implications

τ(p)   τ(q)   τ(p → q)
 1      1        1
 0      1        1
 0      0        1
 1      0        0

Table 1.3 Truth table for the material implication.
Example 1.3.24 Let p = ”x is bigger than 10” and let q = ”x is bigger than 9”. It is
easy to see that p → q is true, because it can never happen that x is bigger than 10 and
at the same time x is not bigger than 9.
Consider the implication statement: if ”pressure is high” then ”volume is small”. The
membership function of the fuzzy set A = ”big pressure”,
A(u) = 1               if u ≥ 5
       1 − (5 − u)/4   if 1 ≤ u ≤ 5
       0               otherwise
can be interpreted as
• x is in the fuzzy set big pressure with grade of membership zero, for all 0 ≤ x ≤ 1
• x is in the fuzzy set big pressure with grade of membership one, for all x ≥ 5
The membership function of the fuzzy set B = "small volume",

B(v) = 1               if v ≤ 1
       1 − (v − 1)/4   if 1 ≤ v ≤ 5
       0               otherwise

can be interpreted as
• y is in the fuzzy set small volume with grade of membership zero, for all y ≥ 5
• y is in the fuzzy set small volume with grade of membership one, for all y ≤ 1
In our interpretation A(u) is considered as the truth value of the proposition ”u is big
pressure”, and B(v) is considered as the truth value of the proposition ”v is small volume”.
One possible extension of material implication to implications with intermediate truth
values is

A(u) → B(v) = 1   if A(u) ≤ B(v)
              0   otherwise
This implication operator is called Standard Strict.
However, it is easy to see that this fuzzy implication operator is not appropriate for
real-life applications. Namely, let A(u) = 0.8 and B(v) = 0.8. Then we have

A(u) → B(v) = 0.8 → 0.8 = 1.

Let us suppose that there is a small error of measurement or a small rounding error of
digital computation in the value of B(v), and instead of 0.8 we have to proceed with
0.7999. Then from the definition of the Standard Strict implication operator it follows
that

A(u) → B(v) = 0.8 → 0.7999 = 0.
This example shows that small changes in the input can cause a big deviation in the
output, i.e. our system is very sensitive to rounding errors of digital computation and
small errors of measurement.
A smoother extension of material implication operator can be derived from the equation
X → Y = sup{Z|X ∩ Z ⊂ Y }
that is,

A(u) → B(v) = 1      if A(u) ≤ B(v)
              B(v)   otherwise
This operator is called Gödel implication. Using the definitions of negation and union of
fuzzy subsets the material implication p → q = ¬p ∨ q can be extended by
• S-implications: defined by
x → y = S(n(x), y)
where S is a t-conorm and n is a negation on [0, 1]. These implications arise from
the Boolean formalism p → q = ¬p ∨ q. Typical examples of S-implications are the
Łukasiewicz and Kleene-Dienes implications.
• R-implications: obtained by residuation of a continuous t-norm T, i.e.

x → y = sup{z ∈ [0, 1] | T(x, z) ≤ y}.

These implications arise from the Intuitionistic Logic formalism. Typical examples
of R-implications are the Gödel and Gaines implications.

• t-norm implications: defined by

x → y = T(x, y)

where T is a t-norm. Although these operators are not implications in the strict
logical sense, they are widely used as a model of implication in fuzzy applications
(e.g. Mamdani's implication x → y = min{x, y}).
The most often used fuzzy implication operators are listed in the following table.
1.3.4 Linguistic variables
The use of fuzzy sets provides a basis for a systematic way for the manipulation of vague
and imprecise concepts. In particular, we can employ fuzzy sets to represent linguistic
variables. A linguistic variable can be regarded either as a variable whose value is a fuzzy
number or as a variable whose values are defined in linguistic terms.
For example, if speed is interpreted as a linguistic variable, then its term set T (speed)
could be

T = {slow, moderate, fast, very slow, more or less fast, slightly slow, . . . }
• NB (Negative Big), NM (Negative Medium), NS (Negative Small), ZE (Zero),
PS (Positive Small), PM (Positive Medium), PB (Positive Big)
Figure 1.30 A possible fuzzy partition of [−1, 1].
If A is a fuzzy set in X then we can modify the meaning of A with the help of words such
as very, more or less, slightly, etc. For example, the membership functions of the fuzzy
sets "very A" and "more or less A" can be defined by

(very A)(x) = (A(x))^2,   (more or less A)(x) = √(A(x)),   ∀x ∈ X
Figure 1.31 Membership functions of fuzzy sets old and very old.
Figure 1.32 Membership functions of fuzzy sets old and more or less old.
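The modifiers very and more or less are straightforward to apply pointwise; a small sketch follows, where the S-shaped membership function chosen for old is only an illustrative assumption, not taken from the text.

```python
import math

def old(x):
    """An illustrative S-shaped membership function for 'old' (our own choice)."""
    return 1.0 / (1.0 + math.exp(-0.25 * (x - 45.0)))

very_old         = lambda x: old(x) ** 2            # (very A)(x) = A(x)^2
more_or_less_old = lambda x: math.sqrt(old(x))      # (more or less A)(x) = sqrt(A(x))

for age in (30, 45, 60):
    print(age, round(old(age), 2), round(very_old(age), 2), round(more_or_less_old(age), 2))
```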
1.4 The theory of approximate reasoning
In 1979 Zadeh introduced the theory of approximate reasoning [118]. This theory provides
a powerful framework for reasoning in the face of imprecise and uncertain information.
Central to this theory is the representation of propositions as statements assigning fuzzy
sets as values to variables.
Suppose we have two interactive variables x ∈ X and y ∈ Y and the causal relationship
between x and y is completely known. Namely, we know that y is a function of x
y = f (x)
Then we can make inferences easily
premise y = f (x)
fact x = x0
consequence y = f (x0 )
This inference rule says that if we have y = f (x), ∀x ∈ X and we observe that x = x0
then y takes the value f (x0 ).
Figure 1.33 Simple crisp inference.
More often than not we do not know the complete causal link f between x and y; we
only know the values of f(x) for some particular values of x:
ℜ1 : If x = x1 then y = y1
also
ℜ2 : If x = x2 then y = y2
also
. . .
also
ℜn : If x = xn then y = yn
ℜ1 : If x = x1 then y = y1
also
ℜ2 : If x = x2 then y = y2
also
. . .
also
ℜn : If x = xn then y = yn
fact: x = x0
consequence: y = y0
In [118] Zadeh introduces a number of translation rules which allow us to represent some
common linguistic statements in terms of propositions in our language. In the following
we describe some of these translation rules.
Definition 1.4.4 Projection rule:
In fuzzy logic and approximate reasoning, the most important fuzzy implication inference
rule is the Generalized Modus Ponens (GMP). The classical Modus Ponens inference rule
says:
premise if p then q
fact p
consequence q
This inference rule can be interpreted as: If p is true and p → q is true then q is true.
The fuzzy implication inference is based on the compositional rule of inference for ap-
proximate reasoning suggested by Zadeh [115].
premise if x is A then y is B
fact x is A0
consequence: y is B 0
where the consequence B 0 is determined as a composition of the fact and the fuzzy impli-
cation operator
B 0 = A0 ◦ (A → B)
that is,
B′(v) = sup_{u∈U} min{A′(u), (A → B)(u, v)}, v ∈ V.
The Generalized Modus Ponens, which reduces to classical modus ponens when A′ = A
and B′ = B, is closely related to the forward data-driven inference which is particularly
useful in Fuzzy Logic Control.
In many practical cases instead of sup-min composition we use sup-T composition, where
T is a t-norm.
Definition 1.4.7 (sup-T compositional rule of inference)
premise if x is A then y is B
fact x is A0
consequence: y is B 0
where the consequence B 0 is determined as a composition of the fact and the fuzzy impli-
cation operator
B 0 = A0 ◦ (A → B)
that is,
B 0 (v) = sup{T (A0 (u), (A → B)(u, v)) | u ∈ U }, v ∈ V.
It is clear that T cannot be chosen independently of the implication operator.
Figure 1.34 A ◦ A × B = B.
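The compositional rule of inference is easy to carry out on discretized universes. The sketch below (grids, membership functions and names are illustrative choices of ours) computes B′ = A′ ◦ (A → B) with Mamdani's implication (A → B)(u, v) = min{A(u), B(v)}.

```python
def tri(t, a, alpha):
    return max(0.0, 1.0 - abs(t - a) / alpha)

U = [i / 10.0 for i in range(0, 101)]   # universe of x, discretized [0, 10]
V = [i / 10.0 for i in range(0, 101)]   # universe of y, discretized [0, 10]

A  = lambda u: tri(u, 3.0, 2.0)         # antecedent "x is A"
B  = lambda v: tri(v, 7.0, 2.0)         # consequent "y is B"
A1 = lambda u: tri(u, 3.4, 2.0)         # observed fact "x is A'"

def gmp_mamdani(A, B, A1, U, V):
    """B'(v) = sup_u min{A'(u), min{A(u), B(v)}}  (sup-min composition)."""
    return [max(min(A1(u), min(A(u), B(v))) for u in U) for v in V]

B1 = gmp_mamdani(A, B, A1, U, V)
print(round(max(B1), 2))                # height of B': here 0.9 (< 1, since A' differs from A)
```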
The classical Modus Tollens inference rule says: If p → q is true and q is false then p is
false. The Generalized Modus Tollens,
premise if x is A then y is B
fact y is B 0
consequence: x is A0
which reduces to "Modus Tollens" when B′ = ¬B and A′ = ¬A, is closely related to the
backward goal-driven inference which is commonly used in expert systems, especially in
the realm of medical diagnosis.
Suppose that A, B and A0 are fuzzy numbers. The Generalized Modus Ponens should
satisfy some rational properties
Property 1.4.1 Basic property:

if x is A then y is B
x is A
y is B

For example: if pressure is big then volume is small, and the pressure is big, then we
conclude that the volume is small (i.e. A′ = A entails B′ = B).

Property 1.4.2 Total indeterminance:

if x is A then y is B
x is ¬A
y is unknown

Property 1.4.3 Subset:

if x is A then y is B
x is A′ ⊂ A
y is B
Property 1.4.4 Superset:
if x is A then y is B
x is A0
y is B 0 ⊃ B
Suppose that A, B and A0 are fuzzy numbers. We show that the Generalized Modus
Ponens with Mamdani’s implication operator does not satisfy all the four properties listed
above.
if x is A then y is B
x is A0
y is B 0
Basic property: with A′ = A we get B′(y) = sup_x min{A(x), min{A(x), B(y)}} =
min{sup_x A(x), B(y)} = B(y), so the basic property is satisfied.

Total indeterminance: let A′ = ¬A, i.e. A′(x) = 1 − A(x). Then

B′(y) = sup_x min{1 − A(x), min{A(x), B(y)}} = sup_x min{A(x), 1 − A(x), B(y)} = min{1/2, B(y)} < 1,

so the total indeterminance property is not satisfied.

Subset: let A′ ⊂ A (with A′ normal). Then
B′(y) = sup_x min{A′(x), min{A(x), B(y)}} = sup_x min{A(x), A′(x), B(y)} =
min{B(y), sup_x A′(x)} = min{B(y), 1} = B(y),

so the subset property is satisfied.

Superset: for an arbitrary A′ we have
B′(y) = sup_x min{A′(x), min{A(x), B(y)}} = sup_x min{A(x), A′(x), B(y)} ≤ B(y).
Figure 1.39 The GMP with Mamdani’s implication operator.
if x is A then y is B
x is A0
y is B 0
So, the superset property is not satisfied.
Figure 1.39a The GMP with Larsen’s implication operator.
ℜ1 : if x is A1 then z is C1,
ℜ2 : if x is A2 then z is C2,
· · ·
ℜn : if x is An then z is Cn
fact: x is A
consequence: z is C

where the i-th fuzzy rule has the form

ℜi : if x is Ai then z is Ci.
There are two main approaches to determine the membership function of consequence C.
• Combine the rules first. In this approach, we first combine all the rules by an
aggregation operator Agg into one rule, which is then used to obtain C from A.
If the sentence connective also is interpreted as and then the rules are combined by
intersection, R = R1 ∩ · · · ∩ Rn, that is

R(u, w) = min{R1(u, w), . . . , Rn(u, w)} = min_i (Ai(u) → Ci(w)).
If the sentence connective also is interpreted as or then we get
R = R1 ∪ · · · ∪ Rn

that is

R(u, w) = max{R1(u, w), . . . , Rn(u, w)} = max_i (Ai(u) → Ci(w)),

and the consequence is computed as
C = A ◦ R = A ◦ Agg (R1 , R2 , · · · , Rn )
• Fire the rules first. Fire the rules individually, given A, and then combine their
results into C.
We first compose A with each Ri producing intermediate results

C′i = A ◦ Ri

for i = 1, . . . , n, and then combine the C′i componentwise into C′ by some aggregation
operator Agg,

C′ = Agg(C′1, . . . , C′n).
We show that the sup-min compositional operator and the connective also interpreted
as the union operator are commutative. Thus the consequence, C, inferred from the
complete set of rules is equivalent to the aggregated result, C 0 , derived from individual
rules.
and let

C′ = (A ◦ R1) ∪ · · · ∪ (A ◦ Rn).

Then C(w) = C′(w) holds for each w from the universe of discourse W.
Proof. Using the distributivity of ∧ over ∨ we get

C(w) = sup_u {A(u) ∧ max{R1(u, w), . . . , Rn(u, w)}} = sup_u max{A(u) ∧ R1(u, w), . . . , A(u) ∧ Rn(u, w)}
     = max{sup_u (A(u) ∧ R1(u, w)), . . . , sup_u (A(u) ∧ Rn(u, w))} = C′(w).
and let

C′ = (A ◦ R1) ∪ · · · ∪ (A ◦ Rn).
Then C(w) = C 0 (w) holds for each w from the universe of discourse W .
Proof. Using the distributivity of multiplication over max we get

C(w) = sup_u A(u) max{R1(u, w), . . . , Rn(u, w)} = sup_u max{A(u)R1(u, w), . . . , A(u)Rn(u, w)}
     = max{sup_u A(u)R1(u, w), . . . , sup_u A(u)Rn(u, w)} = C′(w).
and let

C′ = (A ◦ R1) ∩ · · · ∩ (A ◦ Rn).
Then C ⊂ C 0 , i.e C(w) ≤ C 0 (w) holds for all w from the universe of discourse W .
and let

C′ = (A ◦ R1) ∩ · · · ∩ (A ◦ Rn).
Then C ⊂ C 0 , i.e C(w) ≤ C 0 (w) holds for all w from the universe of discourse W .
Example 1.4.3 We illustrate Lemma 1.4.3 by a simple example. Assume we have two
fuzzy rules of the form
ℜ1 : if x is A1 then z is C1
ℜ2 : if x is A2 then z is C2
where A1 , A2 and C1 , C2 are discrete fuzzy numbers of the universe of discourses {x1 , x2 }
and {z1 , z2 }, respectively. Suppose that we input a fuzzy set A = a1 /x1 + a2 /x2 to the
system and let
R1 =
       z1   z2
  x1   0    1
  x2   1    0

R2 =
       z1   z2
  x1   1    0
  x2   0    1
represent the fuzzy rules. We first compute the consequence C by

C = A ◦ (R1 ∩ R2).

Since R1 ∩ R2 is the empty relation (all of its entries are zero), we get C(z1) = C(z2) = 0,
i.e. C = ∅. We now fire the rules individually and compute

C′ = (A ◦ R1) ∩ (A ◦ R2).

From

A ◦ R1 = (a1/x1 + a2/x2) ◦ R1

we get

A ◦ R1 = a2/z1 + a1/z2
and from

A ◦ R2 = (a1/x1 + a2/x2) ◦ R2

we get
A ◦ R2 = a1 /z1 + a2 /z2 .
Finally,
C′ = (a2/z1 + a1/z2) ∩ (a1/z1 + a2/z2) = (a1 ∧ a2)/z1 + (a1 ∧ a2)/z2.

This means that C is a proper subset of C′ whenever min{a1, a2} ≠ 0.
Suppose now that the fact of the GMP is given by a fuzzy singleton. Then the process of
computation of the membership function of the consequence becomes very simple.
Figure 1.39b Fuzzy singleton.
rule 1: if x is A1 then z is C1
fact: x is x̄0
consequence: z is C
Observing that x̄0(u) = 0, ∀u ≠ x0, the supremum turns into a simple minimum:

C(w) = sup_u min{x̄0(u), (A1 → C1)(u, w)} = min{1, (A1 → C1)(x0, w)} = A1(x0) → C1(w).

In particular, for Mamdani's implication we obtain C(w) = min{A1(x0), C1(w)}.
Figure 1.40 Inference with Mamdani’s implication operator.
Similarly, with the Gödel implication operator we obtain

C(w) = 1       if A1(x0) ≤ C1(w)
       C1(w)   otherwise
Figure 1.41 Inference with Gödel implication operator.
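With a singleton input the supremum disappears, so firing a rule only needs the value A1(x0). The sketch below (membership functions and the observation are illustrative choices of ours) compares the Mamdani and Gödel consequences C(w) at a few points.

```python
def tri(t, a, alpha):
    return max(0.0, 1.0 - abs(t - a) / alpha)

A1 = lambda u: tri(u, 2.0, 1.0)      # antecedent of the rule
C1 = lambda w: tri(w, 5.0, 2.0)      # consequent of the rule
x0 = 1.6                             # crisp (singleton) observation

level = A1(x0)                       # A1(x0): the degree to which the rule fires

def mamdani(w):
    """C(w) = min{A1(x0), C1(w)}."""
    return min(level, C1(w))

def goedel(w):
    """C(w) = 1 if A1(x0) <= C1(w), else C1(w)."""
    return 1.0 if level <= C1(w) else C1(w)

for w in (3.5, 5.0, 6.9):
    print(w, round(mamdani(w), 2), round(goedel(w), 2))
```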
Proof. Suppose that the input of the system A = x̄0 is a fuzzy singleton. On the one
hand we have

C(w) = (A ◦ Agg[R1, . . . , Rn])(w) = sup_u {x̄0(u) ∧ Agg[R1, . . . , Rn](u, w)}
     = Agg[R1, . . . , Rn](x0, w) = Agg[R1(x0, w), . . . , Rn(x0, w)].

On the other hand, firing the rules individually gives

C′(w) = Agg[(A ◦ R1)(w), . . . , (A ◦ Rn)(w)] = Agg[sup_u min{x̄0(u), R1(u, w)}, . . . , sup_u min{x̄0(u), Rn(u, w)}]
      = Agg[R1(x0, w), . . . , Rn(x0, w)] = C(w).

Which ends the proof.
Consider one block of fuzzy rules of the form

ℜ = {Ai → Ci, 1 ≤ i ≤ n}
where Ai and Ci are fuzzy numbers.
Lemma 1.4.6 Suppose that in ℜ the supports of the Ai are pairwise disjoint:

supp(Ai) ∩ supp(Aj) = ∅, for i ≠ j.
If the implication operator is defined by
x → z = 1   if x ≤ z
        z   otherwise
(Gödel implication) then
Ai ◦ [(A1 → C1) ∩ · · · ∩ (An → Cn)] = Ci
holds for i = 1, . . . , n
Proof. Since the GMP with Gödel implication satisfies the basic property we get

Ai ◦ (Ai → Ci) = Ci.
Furthermore, since the supports are pairwise disjoint,

Ai ◦ (Aj → Cj) = 1X,  i ≠ j,

and the statement of the lemma follows.
Definition 1.4.8 The rule-base ℜ is said to be separated if the core of Ai, defined by
core(Ai) = {x | Ai(x) = 1}, is not contained in the union of the supports of the other
antecedents, supp(A1) ∪ · · · ∪ supp(Ai−1) ∪ supp(Ai+1) ∪ · · · ∪ supp(An), for any i.
This property means that deleting any of the rules from < leaves a point x̂ to which no
rule applies. It means that every rule is useful.
The following theorem shows that Lemma 1.4.6 remains valid for separated rule-bases.
Theorem 1.4.1 [23] Let ℜ be separated. If the implication is modelled by the Gödel
implication operator then

Ai ◦ [(A1 → C1) ∩ · · · ∩ (An → Cn)] = Ci
holds for i = 1, . . . , n
Proof. Since the Gödel implication satisfies the basic property of the GMP we get

Ai ◦ (Ai → Ci) = Ci.

Since ℜ is separated, there exists an element x̂ such that x̂ ∈ core(Ai) and x̂ ∉ supp(Aj),
i ≠ j, that is, Ai(x̂) = 1 and Aj(x̂) = 0, i ≠ j. Applying the
compositional rule of inference with the Gödel implication operator we get the statement
of the theorem.
Exercise 1.4.1 Show that the GMP with Gödel implication operator satisfies properties
(1)-(4).
Exercise 1.4.3 Show that the statement of Lemma 1.4.6 also holds for the Łukasiewicz
implication operator.

Exercise 1.4.4 Show that the statement of Theorem 1.4.1 also holds for the Łukasiewicz
implication operator.
1.5 An introduction to fuzzy logic controllers
Conventional controllers are derived from control theory techniques based on mathemat-
ical models of the open-loop process, called system, to be controlled.
The purpose of the feedback controller is to guarantee a desired response of the output
y. The process of keeping the output y close to the setpoint (reference input) y*, despite the presence of disturbances in the system parameters and of measurement noise, is called regulation. The output of the controller (which is the input of the system) is the control action u. The general form of the discrete-time control law is

u(k) = f(e(k), e(k − 1), . . . , e(k − τ), u(k − 1), . . . , u(k − τ))     (1.2)

providing a control action that describes the relationship between the input and the output of the controller. In (1.2), e represents the error between the desired setpoint y* and the
output of the system y; parameter τ defines the order of the controller, and f is in general
a nonlinear function.
Figure: the closed-loop system; the controller receives the error e = y* − y and produces the control action u, which drives the system output y.

The classical controller types, namely
• proportional (P)
• integral (I)
• derivative (D)
and their combinations can be derived from control law (1.2) for different values of pa-
rameter τ and for different functions f .
The seminal work by L.A. Zadeh on fuzzy algorithms [115] introduced the idea of formu-
lating the control algorithm by logical rules.
In a fuzzy logic controller (FLC), the dynamic behavior of a fuzzy system is characterized
by a set of linguistic description rules based on expert knowledge. The expert knowledge
is usually of the form
IF (a set of conditions are satisfied) THEN (a set of consequences can be inferred).
Since the antecedents and the consequents of these IF-THEN rules are associated with
fuzzy concepts (linguistic terms), they are often called fuzzy conditional statements. In our
terminology, a fuzzy control rule is a fuzzy conditional statement in which the antecedent
is a condition in its application domain and the consequent is a control action for the
system under control.
Basically, fuzzy control rules provide a convenient way for expressing control policy and
domain knowledge. Furthermore, several linguistic variables might be involved in the
antecedents and the conclusions of these rules. When this is the case, the system will be
referred to as a multi-input-multi-output (MIMO) fuzzy system. For example, in the case
of two-input-single-output (MISO) fuzzy systems, fuzzy control rules have the form
<1 : if x is A1 and y is B1 then z is C1
also
<2 : if x is A2 and y is B2 then z is C2
also
...
also
<n : if x is An and y is Bn then z is Cn
where x and y are the process state variables, z is the control variable, Ai , Bi , and Ci
are linguistic values of the linguistic variables x, y and z in the universes of discourse U ,
V , and W , respectively, and an implicit sentence connective also links the rules into a
rule set or, equivalently, a rule-base. We can represent the FLC in a form similar to the
conventional control law (1.2)
u(k) = F (e(k), e(k − 1), . . . , e(k − τ ), u(k − 1), . . . , u(k − τ )) (1.3)
where the function F is described by a fuzzy rule-base. However, this does not mean that
the FLC is a kind of transfer function or difference equation. The knowledge-based
nature of FLC dictates a limited usage of the past values of the error e and control u
because it is rather unreasonable to expect meaningful linguistic statements for e(k − 3),
e(k − 4), . . . , e(k − τ ). A typical FLC describes the relationship between the change of
the control
∆u(k) = u(k) − u(k − 1)
on the one hand, and the error e(k) and its change
∆e(k) = e(k) − e(k − 1)

on the other hand. Such a control law can be formalized as

∆u(k) = F(e(k), ∆e(k))     (1.4)
and is a manifestation of the general FLC expression (1.3) with τ = 1. The actual output
of the controller u(k) is obtained from the previous value of control u(k − 1) that is
updated by ∆u(k)
u(k) = u(k − 1) + ∆u(k).
This type of controller was suggested originally by Mamdani and Assilian in 1975 [81]
and is called the Mamdani-type FLC. In a prototypical rule-base of a simple FLC realising the control law (1.4), the error e and its change ∆e take the linguistic values N (negative), ZE (zero) and P (positive).
So, our task is to find a crisp control action z0 from the fuzzy rule-base and from the
actual crisp inputs x0 and y0 :
<1 : if x is A1 and y is B1 then z is C1
also
<2 : if x is A2 and y is B2 then z is C2
also
... ...
also
<n : if x is An and y is Bn then z is Cn
input x is x0 and y is y0
output z0
Of course, the inputs of fuzzy rule-based systems should be given by fuzzy sets, and
therefore, we have to fuzzify the crisp inputs. Furthermore, the output of a fuzzy system
is always a fuzzy set, and therefore to get crisp value we have to defuzzify it.
Fuzzy logic control systems usually consist of four major parts: Fuzzification interface,
Fuzzy rule-base, Fuzzy inference machine and Defuzzification interface.
Figure: architecture of a fuzzy logic controller. A crisp input x in U is turned by the Fuzzifier into a fuzzy set in U; the Fuzzy Inference Engine, using the Fuzzy Rule Base, produces a fuzzy set in V, which the Defuzzifier turns into a crisp output y in V.
A fuzzification operator has the effect of transforming crisp data into fuzzy sets. In most
of the cases we use fuzzy singletons as fuzzifiers
f uzzif ier(x0 ) := x̄0
where x0 is a crisp input value from a process.
Figure 1.44a Fuzzy singleton as fuzzifier.
Suppose now that we have two input variables x and y. A fuzzy control rule
<i : if (x is Ai and y is Bi ) then (z is Ci )
is implemented by a fuzzy implication Ri and is defined as
Ri(u, v, w) = [Ai(u) and Bi(v)] → Ci(w)

where the logical connective and is implemented by the Cartesian product, i.e.

[Ai(u) and Bi(v)] → Ci(w) = [Ai(u) × Bi(v)] → Ci(w) = min{Ai(u), Bi(v)} → Ci(w).
Of course, we can use any t-norm to model the logical connective and.
An FLC consists of a set of fuzzy control rules which are related by the dual concepts of
fuzzy implication and the sup–t-norm compositional rule of inference. These fuzzy control
rules are combined by using the sentence connective also. Since each fuzzy control rule
is represented by a fuzzy relation, the overall behavior of a fuzzy system is characterized
by these fuzzy relations. In other words, a fuzzy system can be characterized by a single fuzzy relation, namely the combination of the individual rule relations by means of the sentence connective also.
Symbolically, if we have the collection of rules
<1 : if x is A1 and y is B1 then z is C1
also
<2 : if x is A2 and y is B2 then z is C2
also
··· ···
also
<n : if x is An and y is Bn then z is Cn
The procedure for obtaining the fuzzy output of such a knowledge base consists of three steps: determining the firing level of each rule, computing the output of each rule, and aggregating the individual rule outputs.
To infer the output z from the given process states x, y and fuzzy relations Ri , we apply
the compositional rule of inference:
<1 : if x is A1 and y is B1 then z is C1
<2 : if x is A2 and y is B2 then z is C2
············
<n : if x is An and y is Bn then z is Cn
fact : x is x̄0 and y is ȳ0
consequence : z is C
That is,
C = Agg(x̄0 × ȳ0 ∘ R1, . . . , x̄0 × ȳ0 ∘ Rn).

Taking into consideration that x̄0(u) = 0 for u ≠ x0 and ȳ0(v) = 0 for v ≠ y0, the computation of the membership function of C is very simple:

C(w) = Agg{R1(x0, y0, w), . . . , Rn(x0, y0, w)}

for all w ∈ W.
The procedure for obtaining the fuzzy output of such a knowledge base can be formulated
as
• The firing level of the i-th rule is determined by Ai(x0) × Bi(y0).

• The output of the i-th rule is calculated by

C′i(w) := Ai(x0) × Bi(y0) → Ci(w)

for all w ∈ W.

• The overall system output, C, is obtained from the individual rule outputs C′i by

C(w) = Agg{C′1(w), . . . , C′n(w)}

for all w ∈ W.
Example 1.5.2 If the sentence connective also is interpreted as anding the rules by using
minimum-norm then the membership function of the consequence is computed as

C = (x̄0 × ȳ0 ∘ R1) ∩ . . . ∩ (x̄0 × ȳ0 ∘ Rn).

That is,

C(w) = min{R1(x0, y0, w), . . . , Rn(x0, y0, w)}

for all w ∈ W.
Example 1.5.3 If the sentence connective also is interpreted as oring the rules by using the maximum-norm then the membership function of the consequence is computed as

C = (x̄0 × ȳ0 ∘ R1) ∪ . . . ∪ (x̄0 × ȳ0 ∘ Rn).

That is,

C(w) = max{R1(x0, y0, w), . . . , Rn(x0, y0, w)}

for all w ∈ W.
Example 1.5.4 Suppose that the Cartesian product and the implication operator are im-
plemented by the t-norm T(u, v) = uv. If the sentence connective also is interpreted as
oring the rules by using the maximum-norm then the membership function of the consequence
is computed as
C = (x̄0 × ȳ0 ◦ R1 ) ∪ . . . ∪ (x̄0 × ȳ0 ◦ Rn ).
That is,
C(w) = max{A1 (x0 )B1 (y0 )C1 (w), . . . , An (x0 )Bn (y0 )Cn (w)}
for all w ∈ W .
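The formula of Example 1.5.4 is easy to program. The following Python sketch is a minimal illustration with two rules; the triangular membership functions and the crisp inputs x0, y0 are made up for the example and are not taken from the text.

# Sketch of Example 1.5.4: singleton inputs, product t-norm for the Cartesian
# product and implication, the connective "also" interpreted as max (union).
import numpy as np

def tri(a, b, c):
    """Triangular membership function with support [a, c] and peak at b."""
    return lambda t: max(min((t - a) / (b - a), (c - t) / (c - b)), 0.0)

A = [tri(0, 2, 4), tri(2, 4, 6)]     # antecedents on U (illustrative)
B = [tri(0, 3, 6), tri(3, 6, 9)]     # antecedents on V (illustrative)
C = [tri(0, 1, 2), tri(1, 2, 3)]     # consequents on W (illustrative)

x0, y0 = 2.5, 4.0                    # crisp (singleton) inputs
w = np.linspace(0, 3, 301)           # discretized output universe W

# C(w) = max_i { A_i(x0) * B_i(y0) * C_i(w) }
Cw = np.max([A[i](x0) * B[i](y0) * np.array([C[i](t) for t in w])
             for i in range(2)], axis=0)
print(Cw.max())                      # height of the fuzzy output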
1.5.1 Defuzzification methods
The output of the inference process so far is a fuzzy set, specifying a possibility distribution
of control action. In the on-line control, a nonfuzzy (crisp) control action is usually
required. Consequently, one must defuzzify the fuzzy control action (output) inferred
from the fuzzy control algorithm, namely

z0 = defuzzifier(C),

where z0 is the nonfuzzy control output and defuzzifier is the defuzzification operator.
The most often used defuzzification methods are the Center-of-Area (Center-of-Gravity), the Center-of-Sums and Center-of-Largest-Area, the First-of-Maxima, the Middle-of-Maxima and the Max-Criterion methods.
Figure 1.45 First-of-Maxima defuzzification method.
• Middle-of-Maxima. The defuzzified value is defined as the mean of the maximizing elements,

z0 = (1/N) Σ_{j=1}^{N} zj,
where {z1 , . . . , zN } is the set of elements of the universe W which attain the max-
imum value of C. If C is not discrete then the defuzzified value of C is defined as

z0 = ∫_G z dz / ∫_G dz,

where G denotes the set of maximizing elements of C.
Figure 1.46 Middle-of-Maxima defuzzification method.
• Max-Criterion. This method chooses an arbitrary value from the set of maximizing elements of C, i.e.

z0 ∈ {z ∈ W | C(z) = max_w C(w)}.

A short numerical sketch of the maxima-based and center-of-gravity methods is given below.
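The following Python sketch illustrates the Center-of-Gravity and Middle-of-Maxima methods on a sampled fuzzy output; the particular membership function is chosen only for illustration.

# Sketch of two defuzzification methods on a sampled membership function:
# Center-of-Gravity (Center-of-Area) and Middle-of-Maxima.
import numpy as np

def center_of_gravity(w, C):
    return np.sum(w * C) / np.sum(C)

def middle_of_maxima(w, C):
    G = w[np.isclose(C, C.max())]     # the set of maximizing elements of C
    return G.mean()                   # mean of the maximizing elements

w = np.linspace(0, 4, 401)
C = np.clip(np.minimum(w, 2.0 - 0.5 * w), 0, None)   # some clipped fuzzy output
print(center_of_gravity(w, C), middle_of_maxima(w, C))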
Example 1.5.5 [128] Consider a fuzzy controller steering a car in a way to avoid ob-
stacles. If an obstacle occurs right ahead, the plausible control action depicted in Figure
1.46a could be interpreted as "turn right or left". Both the Center-of-Area and the Middle-of-Maxima defuzzification methods result in a control action "drive straight ahead", which causes an accident.
Figure 1.46a Undesired result by Center-of-Area and Middle-of-Maxima defuzzification methods.
A suitable defuzzification method would have to choose between different control actions
(choose one of two triangles in the Figure) and then transform the fuzzy set into a crisp
value.
Exercise 1.5.1 Let the overall system output, C, have the following membership function:

C(x) = x² if 0 ≤ x ≤ 1, C(x) = 2 − √x if 1 ≤ x ≤ 4, and C(x) = 0 otherwise.

Compute the defuzzified value of C using the Center-of-Area and Middle-of-Maxima methods.
Exercise 1.5.2 Let C = (a, α, β) be a triangular fuzzy number. Compute the defuzzified value of C using the Center-of-Area and Middle-of-Maxima methods.
Figure 1.46b z0 is the defuzzified value of C.
Exercise 1.5.3 Let C = (a, b, α, β) be a trapezoidal fuzzy number. Compute the defuzzi-
fied value of C using the Center-of-Area and Middle-of-Maxima methods.
Figure 1.46c z0 is the defuzzified value of C.
Exercise 1.5.4 Let C = (a, b, α, β)LR be a fuzzy number of type LR. Compute the de-
fuzzified value of C using the Center-of-Area and Middle-of-Maxima methods.
1.5.2 Inference mechanisms
We present four well-known inference mechanisms in fuzzy logic control systems. For
simplicity we assume that we have two fuzzy control rules of the form
<1 : if x is A1 and y is B1 then z is C1
also
<2 : if x is A2 and y is B2 then z is C2
fact : x is x̄0 and y is ȳ0
consequence : z is C
• Mamdani. The fuzzy implication is modelled by Mamdani's minimum operator and the sentence connective also is interpreted as oring the propositions by the max operator. The firing levels of the rules, αi = Ai(x0) ∧ Bi(y0), i = 1, 2, clip the consequents, and the consequence is

C(w) = (α1 ∧ C1(w)) ∨ (α2 ∧ C2(w)).

Figure 1.47 Making inferences with Mamdani's implication operator.
• Tsukamoto. All linguistic terms are supposed to have monotonic membership
functions.
The firing levels of the rules, denoted by αi , i = 1, 2, are computed by
α1 = A1 (x0 ) ∧ B1 (y0 ), α2 = A2 (x0 ) ∧ B2 (y0 )
In this mode of reasoning the individual crisp control actions z1 and z2 are computed
from the equations
α1 = C1 (z1 ), α2 = C2 (z2 )
and the overall crisp control action is expressed as
z0 = (α1 z1 + α2 z2)/(α1 + α2),
i.e. z0 is computed by the discrete Center-of-Gravity method.
If we have n rules in our rule-base then the crisp control action is computed as
z0 = (α1 z1 + · · · + αn zn)/(α1 + · · · + αn) = Σ_{i=1}^{n} αi zi / Σ_{i=1}^{n} αi,

where αi is the firing level and zi is the (crisp) output of the i-th rule, i = 1, . . . , n.
Figure 1.48 Tsukamoto's inference mechanism.
Example 1.5.6 We illustrate Tsukamoto’s reasoning method by the following simple ex-
ample
<1 : if x is A1 and y is B1 then z is C1
also
<2 : if x is A2 and y is B2 then z is C2
fact : x is x̄0 and y is ȳ0
consequence : z is C
Then according to Fig. 1.48 we see that A1(x0) = 0.7 and B1(y0) = 0.3, so the firing level of the first rule is

α1 = min{A1(x0), B1(y0)} = min{0.7, 0.3} = 0.3,

and from

A2(x0) = 0.6, B2(y0) = 0.8

it follows that the firing level of the second rule is

α2 = min{A2(x0), B2(y0)} = min{0.6, 0.8} = 0.6.

The individual rule outputs z1 = 8 and z2 = 4 are derived from the equations C1(z1) = 0.3 and C2(z2) = 0.6, and the overall crisp control action is

z0 = (8 × 0.3 + 4 × 0.6)/(0.3 + 0.6) ≈ 5.33.
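The computation above can be sketched in Python as follows; the inverse mappings used for C1 and C2 are hypothetical monotone functions chosen only so that C1(8) = 0.3 and C2(4) = 0.6, as in Figure 1.48.

# Sketch of Tsukamoto's reasoning for the two-rule example above.
A1_x0, B1_y0 = 0.7, 0.3
A2_x0, B2_y0 = 0.6, 0.8

alpha1 = min(A1_x0, B1_y0)           # 0.3
alpha2 = min(A2_x0, B2_y0)           # 0.6

def inv_C1(alpha):                   # solves C1(z) = alpha for a monotone C1
    return 11 - 10 * alpha           # hypothetical: C1(z) = (11 - z)/10, so C1(8) = 0.3

def inv_C2(alpha):                   # solves C2(z) = alpha for a monotone C2
    return 10 * alpha - 2            # hypothetical: C2(z) = (z + 2)/10, so C2(4) = 0.6

z1, z2 = inv_C1(alpha1), inv_C2(alpha2)          # 8 and 4
z0 = (alpha1 * z1 + alpha2 * z2) / (alpha1 + alpha2)
print(z1, z2, round(z0, 2))          # 8 4 5.33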
• Sugeno. Sugeno-type controllers use rules whose consequents are crisp functions of the inputs,

<i: if x is Ai and y is Bi then zi = ai x + bi y.

The firing levels are computed as above, α1 = A1(x0) ∧ B1(y0) and α2 = A2(x0) ∧ B2(y0); then the individual rule outputs are derived from the relationships

z1* = a1 x0 + b1 y0, z2* = a2 x0 + b2 y0,

and the crisp control action is expressed as

z0 = (α1 z1* + α2 z2*)/(α1 + α2).

If we have n rules in our rule-base then the crisp control action is computed as

z0 = Σ_{i=1}^{n} αi zi* / Σ_{i=1}^{n} αi.
Example 1.5.7 We illustrate Sugeno’s reasoning method by the following simple example
<1 : if x is BIG and y is SMALL then z1 = x + y
also
<2 : if x is MEDIUM and y is BIG then z2 = 2x − y
fact : x0 is 3 and y0 is 2
conseq : z0
Then the firing levels of the rules are α1 = 0.2 and α2 = 0.6 (see Figure 1.49), the individual rule outputs are z1 = x0 + y0 = 3 + 2 = 5 and z2 = 2x0 − y0 = 2 · 3 − 2 = 4, and the crisp control action is

z0 = (5 × 0.2 + 4 × 0.6)/(0.2 + 0.6) = 4.25.

Figure 1.49 Sugeno's inference mechanism.
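A Python sketch of this Sugeno-type computation, using the firing levels read off Figure 1.49:

# Sketch of Sugeno's reasoning for Example 1.5.7.
x0, y0 = 3.0, 2.0

alpha1 = min(0.8, 0.2)               # firing level of rule 1 (memberships from Figure 1.49)
alpha2 = min(0.9, 0.6)               # firing level of rule 2 (memberships from Figure 1.49)

z1 = x0 + y0                         # consequent of rule 1: z1 = x + y = 5
z2 = 2 * x0 - y0                     # consequent of rule 2: z2 = 2x - y = 4

z0 = (alpha1 * z1 + alpha2 * z2) / (alpha1 + alpha2)
print(z0)                            # 4.25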
• Larsen. The fuzzy implication is modelled by Larsen's product operator and the sentence connective also is interpreted as oring the propositions and defined by the max operator. Let αi denote the firing level of the i-th rule, i = 1, 2; then the consequence is computed as

C(w) = α1 C1(w) ∨ α2 C2(w)

for all w ∈ W.

Figure 1.50 Making inferences with Larsen's product operation rule.
1.5.3 Construction of data base and rule base of FLC
The knowledge base of a fuzzy logic controller is comprised of two components, namely, a data base and a fuzzy rule base. The concepts associated with a data base are
used to characterize fuzzy control rules and fuzzy data manipulation in an FLC. These
concepts are subjectively defined and based on experience and engineering judgment. It
should be noted that the correct choice of the membership functions of a linguistic term
set plays an essential role in the success of an application.
Drawing heavily on [79] we discuss some of the important aspects relating to the con-
struction of the data base and rule base in an FLC.
Figure 1.50a ε-complete fuzzy partition of [0, 1] with ε = 0.5.
In this sense, a dominant rule always exists and is associated with the degree of
belief greater than 0.5. In the extreme case, two dominant rules are activated with
equal belief 0.5.
A look-up table based on discrete universes, which defines the output of a controller
for all possible combinations of the input signals, can be implemented by off-line
processing in order to shorten the running time of the controller.
Such a look-up table typically uses the linguistic terms NB, NM, NS, ZE, PS, PM and PB (negative big, . . . , positive big) for the quantized input and output ranges.
Figure: a fuzzy partition of the normalized domain with three terms N, Z and P.
Since a normalized universe implies the knowledge of the input/output space via
appropriate scale mappings, a well-formed term set can be achieved as shown. If this
is not the case, or a nonnormalized universe is used, the terms could be asymmetrical
and unevenly distributed in the universe. Furthermore, the cardinality of a term set
in a fuzzy input space determines the maximum number of fuzzy control rules that
we can construct.
NB NM NS ZE PS PM PB
-1 1
Figure 1.53 A finer fuzzy partition of [−1, 1].
• Completeness.
Intuitively, a fuzzy control algorithm should always be able to infer a proper control action for every state of the process. This property is called "completeness". The
completeness of an FLC relates to its data base, rule base, or both.
to a change in the normalization of a universe. Either a numerical definition or a functional definition may be used to assign the grades of membership; the choice is based on the subjective criteria of the decision maker.

Numerical. In this case, the grade of membership of a fuzzy set is represented as a vector of numbers whose dimension depends on the degree of discretization.

Functional. In this case, the membership function of each primary fuzzy set is given in functional form, e.g. triangular, trapezoidal or bell-shaped.
• Rule base.
A fuzzy system is characterized by a set of linguistic statements based on expert
knowledge. The expert knowledge is usually in the form of ”IF-THEN” rules, which
are easily implemented by fuzzy conditional statements in fuzzy logic. The collection
of fuzzy control rules that are expressed as fuzzy conditional statements forms the
rule base or the rule set of an FLC.
– Operator’s Control Actions
In many industrial man-machine control systems, the input-output relations
are not known with sufficient precision to make it possible to employ classical
control theory for modeling and simulation.
And yet skilled human operators can control such systems quite successfully
without having any quantitative models in mind. In effect, a human operator employs, consciously or subconsciously, a set of fuzzy IF-THEN rules to control the process.
As was pointed out by Sugeno, to automate such processes, it is expedient to
express the operator’s control rules as fuzzy IF-THEN rules employing linguis-
tic variables. In practice, such rules can be deduced from the observation of
human controller’s actions in terms of the input-output operating data.
– Fuzzy Model of a Process
In the linguistic approach, the linguistic description of the dynamic characteristics of a controlled process may be viewed as a fuzzy model of the process. Based
on the fuzzy model, we can generate a set of fuzzy control rules for attaining
optimal performance of a dynamic system.
The set of fuzzy control rules forms the rule base of an FLC.
Although this approach is somewhat more complicated, it yields better performance and reliability, and provides an effective FLC.
– Learning
Many fuzzy logic controllers have been built to emulate human decision-making
behavior, but few are focused on human learning, namely, the ability to create
fuzzy control rules and to modify them based on experience. A very interesting
example of a fuzzy rule based system which has a learning capability is Sugeno’s
fuzzy car. Sugeno’s fuzzy car can be trained to park by itself.
1.5.4 Ball and beam problem
We illustrate the applicability of fuzzy logic control systems by the ball and beam problem.
The ball and beam system can be found in many undergraduate control laboratories. The
beam is made to rotate in a vertical plane by applying a torque at the center of rotation
and the ball is free to roll along the beam. We require that the ball remain in contact
with the beam. Let x = (r, ṙ, θ, θ̇)T be the state of the system, and y = r be the output
of the system. Then the system can be represented by the state-space model

ẋ1 = x2, ẋ2 = B(x1 x4² − G sin x3), ẋ3 = x4, ẋ4 = u, y = x1,
where the control u is the acceleration of θ. The purpose of control is to determine
u(x) such that the closed-loop system output y will converge to zero from certain initial
conditions. The input-output linearization algorithm determines the control law u(x) as
follows: for the state x compute v(x), a linear combination of the functions φ1(x), . . . , φ4(x), where φ1 = x1, φ2 = x2 and the coefficients αi are chosen so that

s⁴ + α3 s³ + α2 s² + α1 s + α0

is a Hurwitz polynomial. Compute a(x) = −BG cos x3 and b(x) = BG x4² sin x3; then
u(x) = (v(x) − b(x))/a(x).
Figure 1.55 The beam and ball problem.
Wang and Mendel [98] use the following four common-sense linguistic control rules for
the beam and ball problem:
<1 : if x1 is ”positive” and x2 is ”near zero” and x3 is ”positive”
and x4 is ”near zero” then ”u is negative”
<2 : if x1 is ”positive” and x2 is ”near zero” and x3 is ”negative”
and x4 is ”near zero” then ”u is positive big”
<3 : if x1 is ”negative” and x2 is ”near zero” and x3 is ”positive”
and x4 is ”near zero”then ”u is negative big”
<4 : if x1 is ”negative” and x2 is ”near zero” and x3 is ”negative”
and x4 is ”near zero” then ”u is positive”
where all fuzzy numbers have Gaussian membership functions, e.g. the value "near zero" of the linguistic variable x2 is defined by exp(−x²/2).
Using the Stone-Weierstrass theorem Wang [99] showed that fuzzy logic control systems
of the form
<i : if x is Ai and y is Bi then z is Ci , i = 1, . . . , n
with
• Centroid defuzzification method [80],

z = Σ_{i=1}^{n} αi3 Ai(x) Bi(y) / Σ_{i=1}^{n} Ai(x) Bi(y),

where αi3 denotes the center of the consequent fuzzy set Ci,
are universal approximators, i.e. they can approximate any continuous function on a
compact set to arbitrary accuracy. Namely, he proved the following theorem
Theorem 1.5.1 For a given real-valued continuous function g on the compact set U and arbitrary ε > 0, there exists a fuzzy logic control system with output function f such that

sup_{x∈U} ‖g(x) − f(x)‖ ≤ ε.
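The following Python sketch shows a fuzzy system of the above centroid form with Gaussian membership functions; the centers and widths of the rules are illustrative and are not the ones used in [99].

# Sketch of a fuzzy logic system in the centroid form above, with Gaussian
# membership functions; all parameters are made up for illustration.
import numpy as np

def gauss(c, s):
    return lambda t: np.exp(-((t - c) / s) ** 2)

# n rules: if x is A_i and y is B_i then z is C_i (with center alpha_i3)
A      = [gauss(-1.0, 1.0), gauss(0.0, 1.0), gauss(1.0, 1.0)]
B      = [gauss(-1.0, 1.0), gauss(0.0, 1.0), gauss(1.0, 1.0)]
alpha3 = np.array([-0.5, 0.0, 0.5])          # centers of the consequents C_i

def f(x, y):
    mu = np.array([A[i](x) * B[i](y) for i in range(3)])   # A_i(x) * B_i(y)
    return np.sum(alpha3 * mu) / np.sum(mu)                 # centroid formula

print(f(0.2, -0.3))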
1.6 Aggregation in fuzzy system modeling
Many applications of fuzzy set theory involve the use of a fuzzy rule base to model
complex and perhaps ill-defined systems. These applications include fuzzy logic control,
fuzzy expert systems and fuzzy systems modeling. Typical of these situations is a set of n rules of the form
<1 : if x is A1 then y is C1
also
<2 : if x is A2 then y is C2
also
············
also
<n : if x is An then y is Cn
The fuzzy inference process consists of the following four step algorithm [107]:
• Determination of the relevance or matching of each rule to the current input value.
• Determination of the output of each rule as fuzzy subset of the output space. We
shall denote these individual rule outputs as Rj .
• Aggregation of the individual rule outputs to obtain the overall fuzzy system output
as fuzzy subset of the output space. We shall denote this overall output as R.
Our purpose here is to investigate the requirements for the operations that can be used
to implement this reasoning process. We are particularly concerned with the third step,
the rule output aggregation.
Let us look at the process for combining the individual rule outputs. A basic assumption
we shall make is that the operation is pointwise and likewise. By pointwise we mean
that for every y, R(y) just depends upon Rj (y), j = 1, . . . , n. By likewise we mean that
the process used to combine the Rj is the same for all of the y.
Let us denote the pointwise process we use to combine the individual rule outputs as

R(y) = Agg(R1(y), . . . , Rn(y)).

In the above, Agg is called the aggregation operator and the Rj(y) are the arguments.
More generally, we can consider this as an operator
a = Agg(a1 , . . . , an )
where the ai and a are values from the membership grade space, normally the unit interval.
Let us look at the minimal requirements associated with Agg. We first note that the
combination of the individual rule outputs should be independent of the choice of indexing of the rules. This implies that a required property that we must associate with the Agg operator is that of commutativity: the indexing of the arguments does not matter.
We note that the commutativity property allows us to represent the arguments of the Agg operator as an unordered collection of possibly duplicate values; such an object is a bag.
For an individual rule output, Rj , the membership grade Rj (y) indicates the degree or
strength to which this rule suggests that y is the appropriate solution. In particular, if for a pair of elements y′ and y′′ it is the case that

Rj(y′) ≥ Rj(y′′),

then we are saying that rule j is preferring y′ as the system output over y′′. From this we
can reasonably conclude that if all rules prefer y 0 over y 00 as output then the overall system
output should prefer y 0 over y 00 . This observation requires us to impose a monotonicity
condition on the Agg operation: in particular, if

Rj(y′) ≥ Rj(y′′) for all j,

then the overall output should satisfy R(y′) ≥ R(y′′).

The arguments of Agg thus form a bag. If A and B are two bags, their sum C = A ⊕ B is the bag obtained by putting together all the elements of A and B; for example, if A = < a, b, c, d > and B = < b, c, c >, then

A ⊕ B = < a, b, c, d, b, c, c >.
In the following we let Bag(X) indicate the set of all bags of the set X.
Definition 1.6.1 A function
F : Bag(X) → X
is called a bag mapping from Bag(X) into the set X.
An important property of bag mappings is that they are commutative in the sense that the ordering of the elements does not matter.
Definition 1.6.2 Assume A =< a1 , . . . , an > and B =< b1 , . . . , bn > are two bags of the
same cardinality n. If the elements in A and B can be indexed in such way that ai ≥ bi
for all i then we shall denote this A ≥ B.
• For every bag A there exists an element, u ∈ [0, 1], called the identity of A such that
if C = A⊕ < u > then M (C) = M (A) (identity)
Thus the MICA operator is endowed with two properties in addition to the inherent
commutativity of the bag operator, monotonicity and identity:
• The property of identity allows us to have the facility for aggregating data which
does not affect the overall result. This becomes useful for enabling us to include
importances among other characteristics.
Fuzzy set theory provides a host of attractive aggregation connectives for integrating mem-
bership values representing uncertain information. These connectives can be categorized
into the following three classes union, intersection and compensation connectives.
Union produces a high output whenever any one of the input values representing degrees
of satisfaction of different features or criteria is high. Intersection connectives produce a
high output only when all of the inputs have high values. Compensative connectives have
the property that a higher degree of satisfaction of one of the criteria can compensate for a lower degree of satisfaction of another criterion to a certain extent. In this sense, union connectives provide full compensation and intersection connectives
provide no compensation.
1.6.1 Averaging operators
In a decision process the idea of trade-offs corresponds to viewing the global evaluation
of an action as lying between the worst and the best local ratings.
This occurs in the presence of conflicting goals, when a compensation between the corresponding compatibilities is allowed.
Averaging operators realize trade-offs between objectives, by allowing a positive compen-
sation between ratings.
• M is continuous.

Indeed, by the idempotency and monotonicity of an averaging operator M,

min{x, y} = M(min{x, y}, min{x, y}) ≤ M(x, y) ≤ M(max{x, y}, max{x, y}) = max{x, y},

which ends the proof. The interesting properties of averaging operators are the following [25]:
An important family of averaging operators is formed by quasi-arithmetic means
M(a1, . . . , an) = f⁻¹( (1/n) Σ_{i=1}^{n} f(ai) ).
This family has been characterized by Kolmogorov as being the class of all decomposable
continuous averaging operators.
Table: commonly used averaging operators, listed by name together with their value M(x, y).
The process of information aggregation appears in many applications related to the de-
velopment of intelligent systems. One sees aggregation in neural networks, fuzzy logic
controllers, vision systems, expert systems and multi-criteria decision aids. In [104] Yager
introduced a new aggregation technique based on the ordered weighted averaging (OWA)
operators.
An OWA operator of dimension n is a mapping F: IRⁿ → IR that has an associated weighting vector W = (w1, . . . , wn)^T with wi ∈ [0, 1] and w1 + · · · + wn = 1. Furthermore,

F(a1, . . . , an) = Σ_{j=1}^{n} wj bj,

where bj is the j-th largest element of the collection of aggregated objects a1, . . . , an.
Example 1.6.3 Assume W = (0.4, 0.3, 0.2, 0.1)T then
F (0.7, 1, 0.2, 0.6) = 0.4 × 1 + 0.3 × 0.7 + 0.2 × 0.6 + 0.1 × 0.2 = 0.75.
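The OWA computation is straightforward to implement; the following Python sketch reproduces Example 1.6.3.

# Sketch of the OWA operator: the weights act on the ordered (descending) arguments.
def owa(weights, args):
    b = sorted(args, reverse=True)           # b_j = j-th largest argument
    return sum(w * bj for w, bj in zip(weights, b))

W = [0.4, 0.3, 0.2, 0.1]
print(round(owa(W, [0.7, 1.0, 0.2, 0.6]), 2))    # 0.75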
A fundamental aspect of this operator is the re-ordering step, in particular an aggregate
ai is not associated with a particular weight wi but rather a weight is associated with a
particular ordered position of aggregate.
When we view the OWA weights as a column vector we shall find it convenient to refer to
the weights with the low indices as weights at the top and those with the higher indices
with weights at the bottom.
It is noted that different OWA operators are distinguished by their weighting function.
In [104] Yager pointed out three important special cases of OWA aggregations:
• F*: In this case W = W* = (1, 0, . . . , 0)^T and

F*(a1, . . . , an) = max(a1, . . . , an).

• F∗: In this case W = W∗ = (0, 0, . . . , 1)^T and

F∗(a1, . . . , an) = min(a1, . . . , an).

• FA: In this case W = WA = (1/n, . . . , 1/n)^T and

FA(a1, . . . , an) = (a1 + · · · + an)/n.
A number of important properties can be associated with the OWA operators. We shall
now discuss some of these.
For any OWA operator F
F∗ (a1 , . . . , an ) ≤ F (a1 , . . . , an ) ≤ F ∗ (a1 , . . . , an ).
Thus the upper and lower star OWA operators are its boundaries. From the above it becomes clear that for any F,

min(a1, . . . , an) ≤ F(a1, . . . , an) ≤ max(a1, . . . , an).
The OWA operator can be seen to be commutative. Let {a1 , . . . , an } be a bag of aggregates
and let {d1 , . . . , dn } be any permutation of the ai . Then for any OWA operator
F (a1 , . . . , an ) = F (d1 , . . . , dn ).
A third characteristic associated with these operators is monotonicity. Assume ai and ci
are a collection of aggregates, i = 1, . . . , n such that for each i, ai ≥ ci . Then
F (a1 , . . . , an ) ≥ F (c1 , c2 , . . . , cn )
where F is some fixed weight OWA operator.
Another characteristic associated with these operators is idempotency. If ai = a for all i
then for any OWA operator
F (a1 , . . . , an ) = a.
From the above we can see the OWA operators have the basic properties associated with
an averaging operator.
Example 1.6.4 A window type OWA operator takes the average of the m arguments
about the center. For this class of operators we have
wi = 0 if i < k; wi = 1/m if k ≤ i < k + m; wi = 0 if i ≥ k + m.

Figure 1.57 Window type OWA operator.
In order to classify OWA operators with regard to their location between and and or, a measure of orness associated with any vector W was introduced by Yager [104] as follows:

orness(W) = (1/(n − 1)) Σ_{i=1}^{n} (n − i) wi.
It is easy to see that for any W the value orness(W) is always in the unit interval. Furthermore, note that the nearer W is to an or, the closer its measure is to one, while the nearer it is to an and, the closer it is to zero.
Lemma 1.6.2 Let us consider the vectors W* = (1, 0, . . . , 0)^T, W∗ = (0, 0, . . . , 1)^T and WA = (1/n, . . . , 1/n)^T. Then it can easily be shown that
• orness(W ∗ ) = 1
• orness(W∗ ) = 0
• orness(WA ) = 0.5
A measure of andness is defined as the complement of orness:

andness(W) = 1 − orness(W).
Generally, an OWA operator with most of its nonzero weights near the top will be an orlike operator,

orness(W) ≥ 0.5,

and when most of the nonzero weights are near the bottom, the OWA operator will be andlike,

andness(W) ≥ 0.5.
For example, let W = (0.8, 0.2, 0.0)^T. Then

orness(W) = (2 × 0.8 + 0.2)/3 = 0.6
and
andness(W ) = 1 − orness(W ) = 1 − 0.6 = 0.4.
This means that the OWA operator, defined by
F (a1 , a2 , a3 ) = 0.8b1 + 0.2b2 + 0.0b3 = 0.8b1 + 0.2b2
where bj is the j-th largest element of the bag < a1 , a2 , a3 >, is an orlike aggregation.
The following theorem shows that as we move weight up the vector we increase the orness,
while moving weight down causes us to decrease orness(W ).
Theorem 1.6.1 [105] Assume W and W′ are two n-dimensional OWA vectors such that

W = (w1, . . . , wn)^T, W′ = (w1, . . . , wj + ε, . . . , wk − ε, . . . , wn)^T,

where ε > 0 and j < k. Then orness(W′) > orness(W).
In fact,

orness(W′) = orness(W) + (1/(n − 1)) ε (k − j),

and since k > j, orness(W′) > orness(W).
In [104] Yager defined the measure of dispersion (or entropy) of an OWA vector by

disp(W) = − Σ_{i} wi ln wi.
We can see that, when using the OWA operator as an averaging operator, disp(W) measures the degree to which we use all the aggregates equally.
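The following Python sketch computes the orness, andness and dispersion measures for a few weighting vectors, reproducing the values of Lemma 1.6.2 for n = 3.

# Sketch of the orness, andness and dispersion measures of an OWA weight vector.
import math

def orness(W):
    n = len(W)
    return sum((n - i) * w for i, w in enumerate(W, start=1)) / (n - 1)

def andness(W):
    return 1.0 - orness(W)

def disp(W):
    return -sum(w * math.log(w) for w in W if w > 0)   # 0 * ln 0 taken as 0

W_star  = [1.0, 0.0, 0.0]     # "or"    : orness = 1
W_lower = [0.0, 0.0, 1.0]     # "and"   : orness = 0
W_avg   = [1/3, 1/3, 1/3]     # average : orness = 0.5, disp = ln 3
print(orness(W_star), orness(W_lower), orness(W_avg), disp(W_avg))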
If F is an OWA aggregation with weights wi, the dual of F, denoted F̂, is an OWA aggregation of the same dimension with weights ŵi defined by

ŵi = wn−i+1.

We can easily see that if F and F̂ are duals then

disp(F̂) = disp(F), orness(F̂) = 1 − orness(F) = andness(F).

Thus if F is orlike, its dual is andlike.
An important application of the OWA operators is in the area of quantifier guided aggre-
gations [104]. Assume
{A1 , . . . , An }
is a collection of criteria. Let x be an object such that for any criterion Ai , Ai (x) ∈ [0, 1]
indicates the degree to which this criterion is satisfied by x. If we want to find out
the degree to which x satisfies "all the criteria", denoting this by D(x), then following Bellman and Zadeh [2] we get

D(x) = min{A1(x), . . . , An(x)}.

In this case we are essentially requiring x to satisfy A1 and A2 and . . . and An. If we desire to find out the degree to which x satisfies "at least one of the criteria", denoting this E(x), we get

E(x) = max{A1(x), . . . , An(x)}.
• regular unimodal.

Figure 1.58a Unimodal linguistic quantifier.
Proof. We first see that, from the non-decreasing property, Q(i/n) ≥ Q((i − 1)/n), hence wi ≥ 0, and since Q(r) ≤ 1 we have wi ≤ 1. Furthermore,

Σ_{i} wi = Σ_{i} ( Q(i/n) − Q((i − 1)/n) ) = Q(n/n) − Q(0/n) = 1 − 0 = 1.
We call any function satisfying the conditions of a regular non-decreasing quantifier an acceptable OWA weight generating function.
Let us look at the weights generated from some basic types of quantifiers. The quantifier,
for all Q∗ , is defined such that
Q∗(r) = 0 for r < 1, and Q∗(r) = 1 for r = 1.

Using the weights

wi = Q∗(i/n) − Q∗((i − 1)/n)

we get

wi = 0 for i < n, and wi = 1 for i = n.
This is exactly what we previously denoted as W∗ .
Figure 1.59a The quantifier all.
For the quantifier there exists we have
Q*(r) = 0 for r = 0, and Q*(r) = 1 for r > 0.

In this case we get w1 = 1 and wi = 0 for i > 1, i.e. the weighting vector W*.

Figure 1.60 The quantifier there exists.
Consider next the quantifier defined by
Q(r) = r.
This is an identity or linear type quantifier.
In this case we get
wi = Q(i/n) − Q((i − 1)/n) = i/n − (i − 1)/n = 1/n.

This gives us the pure averaging OWA aggregation operator.

Figure 1.60a The identity quantifier.
Recapitulating, using the approach suggested by Yager, if we desire to calculate FQ(a1, . . . , an) for Q being a regular non-decreasing quantifier, we proceed as follows:
• (1) Calculate the weights

wi = Q(i/n) − Q((i − 1)/n).
• (2) Calculate the OWA aggregate

FQ(a1, . . . , an) = Σ_{i=1}^{n} wi bi,

where bi is the i-th largest element of the collection a1, . . . , an (a small computational sketch follows below).
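The following Python sketch implements this two-step procedure for the quantifier Q(r) = r² mentioned in Exercise 1.6.4; the argument values are illustrative.

# Sketch of quantifier guided aggregation: derive OWA weights from a regular
# non-decreasing quantifier Q and aggregate the ordered arguments.
def quantifier_weights(Q, n):
    return [Q(i / n) - Q((i - 1) / n) for i in range(1, n + 1)]

def owa(weights, args):
    b = sorted(args, reverse=True)
    return sum(w * bj for w, bj in zip(weights, b))

Q = lambda r: r ** 2
W = quantifier_weights(Q, 4)          # [1/16, 3/16, 5/16, 7/16]
print(W, owa(W, [0.9, 0.6, 0.5, 0.2]))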
Exercise 1.6.1 Let W = (0.4, 0.2, 0.1, 0.1, 0.2)T . Calculate disp(W ).
Exercise 1.6.2 Let W = (0.3, 0.3, 0.1, 0.1, 0.2)T . Calculate orness(F ), where the OWA
operator F is derived from W .
Exercise 1.6.3 Prove that 0 ≤ disp(W ) ≤ ln(n) for any n-dimensional weight vector W .
Exercise 1.6.4 Let Q(x) = x² be a linguistic quantifier. Assume the weights of an OWA
operator F are derived from Q. Calculate the value F (a1 , a2 , a3 , a4 ) for a1 = a2 = 0.6,
a3 = 0.4 and a4 = 0.2. What is the orness measure of F ?
Exercise 1.6.5 Let Q(x) = √x be a linguistic quantifier. Assume the weights of an OWA
operator F are derived from Q. Calculate the value F (a1 , a2 , a3 , a4 ) for a1 = a2 = 0.6,
a3 = 0.4 and a4 = 0.2. What is the orness measure of F ?
1.7 Fuzzy screening systems
In screening problems one usually starts with a large subset, X, of possible alternative
solutions. Each alternative is essentially represented by a minimal amount of information
supporting its appropriateness as the best solution. This minimal amount of informa-
tion provided by each alternative is used to help select a subset A of X to be further
investigated.
Two prototypical examples of this kind of problem can be mentioned.
In the above examples the process of selecting the subset A, required to provide further
information, is called a screening process. In [106] Yager suggests a technique, called fuzzy
screening system, for managing this screening process.
The kinds of screening problems described above, besides being characterized as decision making with minimal information, generally involve multiple participants in the selection process. The people whose opinion must be considered in the selection process are called experts. Thus screening problems are a class of multiple expert decision problems. In
addition each individual expert’s decision is based upon the use of multiple criteria. So we
have ME-MCDM (Multi Expert-Multi Criteria Decision Making) problem with minimal
information.
The fact that we have minimal information associated with each of the alternatives com-
plicates the problem because it limits the operations which can be performed in the
aggregation processes needed to combine the multi-experts as well as multi-criteria. The
Arrow impossibility theorem [1] is a reflection of this difficulty.
Yager [106] suggests an approach to the screening problem which allows for the requisite
aggregations but which respects the lack of detail provided by the information associated
with each alternative. The technique only requires that preference information be ex-
pressed in by elements draw from a scale that essentially only requires a linear ordering.
This property allows the experts to provide information about satisfactions in the form of
a linguistic values such as high, medium, low. This ability to perform the necessary oper-
ations will only requiring imprecise linguistic preference valuations will enable the experts
to comfortably use the kinds of minimally informative sources of information about the
objects described above. The fuzzy screening system is a two stage process.
• In the first stage, individual experts are asked to provide an evaluation of the al-
ternatives. This evaluation consists of a rating for each alternative on each of the
criteria.
• In the second stage, the methodology introduced in [104] is used to aggregate the
individual experts evaluations to obtain an overall linguistic value for each object.
This overall evaluation can then be used by the decision maker as an aid in the
selection process.
Assume we have a set

X = {X1, . . . , Xp}

of alternatives (objects), a panel

A = {A1, . . . , Ar}

of experts, and a set

C = {C1, . . . , Cn}

of criteria which are considered relevant in the choice of the objects to be further considered.
For each alternative each expert is required to provide his opinion. In particular for each
alternative an expert is asked to evaluate how well that alternative satisfies each of the
criteria in the set C. These evaluations of alternative satisfaction to criteria will be given
in terms of elements from the following scale S:
Outstanding (OU) S7
Very High (VH) S6
High (H) S5
Medium (M) S4
Low (L) S3
Very Low (VL) S2
None (N) S1
The use of such a scale provides a natural ordering, Si > Sj if i > j, and the maximum and minimum of any two scores are defined by
max(Si , Sj ) = Si if Si ≥ Sj , min(Si , Sj ) = Sj if Sj ≤ Si
We shall denote the max by ∨ and the min by ∧. Thus for an alternative an expert
provides a collection of n values
{P1 , . . . , Pn }
where Pj is the rating of the alternative on the j-th criteria by the expert. Each Pj is an
element in the set of allowable scores S.
Assuming n = 5, a typical scoring for an alternative from one expert would be a 5-tuple (P1, . . . , P5) of elements from S. An important operation on the scale is the negation, defined by Neg(Si) = Sq−i+1, where q is the number of scale points, so that

Neg(OU) = N
Neg(VH) = VL
Neg(H) = L
Neg(M) = M
Neg(L) = H
Neg(VL) = VH
Neg(N) = OU
Then the unit score of each alternative by each expert, denoted by U, is calculated as follows:

U = min_j { Neg(I(j)) ∨ P(j) },     (1.6)

where I(j) denotes the importance of the j-th criterion and P(j) the rating of the alternative on that criterion.
Example 1.7.1 Consider some alternative with the following scores on five criteria
Criteria: C1 C2 C3 C4 C5
Importance: VH VH M L VL
Score: M L OU VH OU

In this case the unit score is

U = min{Neg(VH) ∨ M, Neg(VH) ∨ L, Neg(M) ∨ OU, Neg(L) ∨ VH, Neg(VL) ∨ OU} = min{M, L, OU, VH, OU} = L.
The essential reason for the low performance of this object is that it performed low on
the second criterion, which has a very high importance. The formulation of Equation 1.6 can be seen as a generalization of a weighted averaging. Linguistically, this formulation is saying that an alternative receives a high unit score only if it satisfies all the important criteria.
As a result of the first stage, we have for each alternative a collection of evaluations

{Xi1, Xi2, . . . , Xir},

where Xik is the unit evaluation of the i-th alternative by the k-th expert.
In the second stage the technique for combining the expert’s evaluation to obtain an
overall evaluation for each alternative is based upon the OWA operators.
The first step in this process is for the decision making body to provide an aggregation
function which we shall denote as Q. This function can be seen as a generalization of the
idea of how many experts it feels need to agree on an alternative for it to be acceptable
to pass the screening process. In particular for each number i, where i runs from 1 to r,
the decision making body must provide a value Q(i) indicating how satisfied it would be in passing an alternative with which i of the experts were satisfied. The values for Q(i)
should be drawn from the scale S described above.
It should be noted that Q should have certain characteristics to make it rational:
• As more experts agree the decision maker’s satisfaction or confidence should increase
• If all the experts are satisfied then his satisfaction should be the highest possible
Q(r) = Outstanding.
• If the decision making body requires all experts to support an alternative then we get

Q(i) = None for i < r,
Q(r) = Outstanding.
• If the support of just one expert is enough to make a alternative worthy of consid-
eration then
Q(i) = Outstanding for all i
In order to define function Q, we introduce the operation Int[a] as returning the integer
value that is closest to the number a. In the following, we shall let q be the number of
points on the scale and r be the number of experts participating. This function which
emulates the average is denoted as QA (k) and is defined by
QA(k) = Sb(k), where

b(k) = Int[1 + k · (q − 1)/r]

for all k = 0, 1, . . . , r.
We note that whatever the values of q and r it is always the case that
QA (0) = S1 , QA (r) = Sq
For example, if r = 3 and q = 7 then

b(k) = Int[1 + k · (6/3)] = Int[1 + 2k],
and
QA (0) = S1 , QA (1) = S3 , QA (2) = S5 , QA (3) = S7 .
If r = 4 and q = 7 then
b(k) = Int[1 + k × 1.5]
and
QA (0) = S1 , QA (1) = S3 , QA (2) = S4 , QA (3) = S6 , QA (4) = S7 .
Having appropriately selected Q we are now in the position to use the OWA method for
aggregating the expert opinions. Assume we have r experts, each of which has a unit
evaluation for the i-th project, denoted Xik .
The first step in the OWA procedure is to order the Xik ’s in descending order, thus we
shall denote Bj as the j-th highest score among the experts unit scores for the project.
To find the overall evaluation for the i-th project, denoted Xi , we calculate
Xi = max{Q(j) ∧ Bj }.
j
In order to appreciate the workings for this formulation we must realize that
• Q(j) can be seen as an indication of how important the decision maker feels the support of at least j experts is.
• The term Q(j) ∧ Bj can be seen as a weighting of an objects j best scores, Bj , and
the decision maker requirement that j people support the project, Q(j).
• The max operator plays a role akin to the summation in the usual numeric averaging
procedure.
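The second-stage aggregation can be sketched in Python as follows; the ordinal scale is represented by its index, and QA is the "average-like" quantifier defined earlier (ties in Int[·] are rounded up, matching the worked values above).

# Sketch of the second stage of the fuzzy screening system:
# combine the unit evaluations of r experts with X_i = max_j { Q(j) ∧ B_j }.
S = ["N", "VL", "L", "M", "H", "VH", "OU"]        # S1 ... S7
rank = {s: i + 1 for i, s in enumerate(S)}

def Q_A(k, r, q=7):
    """Q_A(k) = S_b(k) with b(k) = Int[1 + k(q-1)/r]; ties rounded up."""
    b = int(1 + k * (q - 1) / r + 0.5)
    return S[b - 1]

def overall(unit_scores, r):
    B = sorted(unit_scores, key=lambda s: rank[s], reverse=True)   # descending
    candidates = [min(Q_A(j, r), B[j - 1], key=lambda s: rank[s])  # Q(j) ∧ B_j
                  for j in range(1, r + 1)]
    return max(candidates, key=lambda s: rank[s])

print(overall(["M", "H", "H", "VH"], r=4))        # H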
Example 1.7.2 Assume we have four experts each providing a unit evaluation for project
i obtained by the methodology discussed in the previous section.
Xi1 = M
Xi2 = H
Xi3 = H
Xi4 = VH

If the decision making body uses the quantifier Q = QA with r = 4 and q = 7, then Q(1) = L, Q(2) = M, Q(3) = VH and Q(4) = OU. Ordering the unit scores in descending order we get B1 = VH, B2 = H, B3 = H, B4 = M, and therefore

Xi = max{Q(1) ∧ B1, Q(2) ∧ B2, Q(3) ∧ B3, Q(4) ∧ B4} = max{L, M, H, M} = H.
Using the methodology suggested thus far we obtain for each alternative an overall rating Xi. These ratings allow us to obtain an evaluation of all the alternatives without resorting to a numeric scale. The decision making body is now in the position to make its selection of the alternatives that are to be passed through the screening process. A level S* from the scale S is selected, and all those alternatives that have an overall evaluation of S* or better are passed to the next step in the decision process.
Exercise 1.7.1 Consider some alternative with the following scores on six criteria:

Criteria: C1 C2 C3 C4 C5 C6
Importance: H VH M L VL M
Score: L VH OU VH OU M

Calculate the unit score of this alternative.
1.8 Applications of fuzzy systems
For the past few years, particularly in Japan, USA and Germany, approximately 1,000
commercial and industrial fuzzy systems have been successfully developed. The number of
industrial and commercial applications worldwide appears likely to increase significantly
in the near future.
The first application of fuzzy logic is due to Mamdani of the University of London, U.K.,
who in 1974 designed an experimental fuzzy control for a steam engine. In 1980, a Danish
company (F.L. Smidth & Co. A/S) used fuzzy theory in cement kiln control. Three years
later, Fuji Electric Co., Ltd. (Japan) implemented fuzzy control of chemical injection for
water purification plants.
The first fuzzy controller was exhibited at Second IFSA Congress in 1987. This controller
originated from Omron Corp., a Japanese company which began research in fuzzy logic
in 1984 and has since applied for over 700 patents. Also in 1987, the Sendai Subway
Automatic Train Operations Controller, designed by the Hitachi team, started operat-
ing in Sendai, Japan. The fuzzy logic in this subway system makes the journey more
comfortable with smooth braking and acceleration. In 1989, Omron Corp. demonstrated
fuzzy workstations at the Business Show in Harumi, Japan. Such a workstation is just a
RISC–based computer, equipped with a fuzzy inference board. This fuzzy inference board
is used to store and retrieve fuzzy information, and to make fuzzy inferences.
Table: some commercial products based on fuzzy logic and the companies producing them.
television sets etc. In 1993, Sony introduced the Sony PalmTop, which uses a fuzzy
logic decision tree algorithm to perform handwritten (using a computer lightpen) Kanji
character recognition. For instance, if one writes 253, the Sony PalmTop can distinguish the number 5 from the letter S.
There are many products based on fuzzy logic on the market today. Most of the consumer products in SEA/Japan advertise fuzzy logic based features, and we are beginning to see many automotive applications based on fuzzy logic as well. The examples mentioned here by no means include all fuzzy logic based products on the market.
The most successful domain has been in fuzzy control of various physical or chemical
characteristics such as temperature, electric current, flow of liquid/gas, motion of ma-
chines, etc. Also, fuzzy systems can be obtained by applying the principles of fuzzy sets
and logic to other areas, for example, fuzzy knowledge-based systems such as fuzzy ex-
pert systems which may use fuzzy IF-THEN rules; ”fuzzy software engineering” which
may incorporate fuzziness in programs and data; fuzzy databases which store and retrieve
fuzzy information; fuzzy pattern recognition which deals with fuzzy visual or audio sig-
nals; applications to medicine, economics, and management problems which involve fuzzy
information processing.
Year Number
1986 ... 8
1987 ... 15
1988 ... 50
1989 ... 100
1990 ... 150
1991 ... 300
1992 ... 800
1993 ... 1500
When fuzzy systems are applied to appropriate problems, particularly the type of problems
described previously, their typical characteristics are faster and smoother response than
with conventional systems. This translates into efficient and more comfortable operation for such tasks as controlling temperature or cruising speed, for example. Furthermore, this
will save energy, reduce maintenance costs, and prolong machine life. In fuzzy systems,
describing the control rules is usually simpler and easier, often requiring fewer rules, and
thus the systems execute faster than conventional systems. Fuzzy systems often achieve
tractability, robustness, and overall low cost. In turn, all these contribute to better
performance. In short, conventional methods are good for simple problems, while fuzzy
systems are suitable for complex problems or applications that involve human descriptive
or intuitive thinking.
However we have to note some problems and limitations of fuzzy systems which include
[85]
• Stability: a major issue for fuzzy control.
There is no theoretical guarantee that a general fuzzy system does not go chaotic
and remains stable, although such a possibility appears to be extremely slim from
the extensive experience.
• Learning capability: Fuzzy systems lack capabilities of learning and have no memory
as stated previously.
This is why hybrid systems, particularly neuro-fuzzy systems, are becoming more
and more popular for certain applications.
• Determining or tuning good membership functions and fuzzy rules is not always easy.
Even after extensive testing, it is difficult to say how many membership functions
are really required. Questions such as why a particular fuzzy expert system needs so
many rules or when can a developer stop adding more rules are not easy to answer.
The basic steps for developing a fuzzy system are the following
• Determine whether a fuzzy system is a right choice for the problem. If the knowledge
about the system behavior is described in approximate form or heuristic rules, then
fuzzy is suitable. Fuzzy logic can also be useful in understanding and simplifying the
processing when the system behavior requires a complicated mathematical model.
• Identify inputs and outputs and their ranges. Range of sensor measurements typ-
ically corresponds to the range of input variable, and the range of control actions
provides the range of output variable.
• Define a primary membership function for each input and output parameter. The
number of membership functions required is a choice of the developer and depends
on the system behavior.
• Construct a rule base. It is up to the designer to determine how many rules are
necessary.
• Verify that the rule base produces output within its range for some sample inputs, and further validate that this output is correct and proper according to the rule base for the given set of inputs.
Several studies show that fuzzy logic is applicable in Management Science (see e.g. [7]).
Bibliography
[1] K.J. Arrow, Social Choice and Individual Values (John Wiley & Sons, New York,
1951).
[2] R.E. Bellman and L.A. Zadeh, Decision-making in a fuzzy environment, Management Science, Ser. B 17(1970) 141-164.
[3] D. Butnariu and E.P. Klement, Triangular Norm-Based Measures and Games
with Fuzzy Coalitions (Kluwer, Dordrecht, 1993).
[4] D. Butnariu, E.P. Klement and S. Zafrany, On triangular norm-based proposi-
tional fuzzy logics, Fuzzy Sets and Systems, 69(1995) 241-255.
[5] E. Canestrelli and S. Giove, Optimizing a quadratic function with fuzzy linear
coefficients, Control and Cybernetics, 20(1991) 25-36.
[6] E. Canestrelli and S. Giove, Bidimensional approach to fuzzy linear goal pro-
gramming, in: M. Delgado, J. Kacprzyk, J.L. Verdegay and M.A. Vila eds.,
Fuzzy Optimization (Physica-Verlag, Heidelberg, 1994) 234-245.
[7] C. Carlsson, On the relevance of fuzzy sets in management science methodology,
TIMS/Studies in the Management Sciences, 20(1984) 11-28.
[8] C. Carlsson, Fuzzy multiple criteria for decision support systems, in: M.M. Gup-
ta, A. Kandel and J.B. Kiszka eds., Approximate Reasoning in Expert Systems
(North-Holland, Amsterdam, 1985) 48-60.
[9] C. Carlsson and R.Fullér, Interdependence in fuzzy multiple objective program-
ming, Fuzzy Sets and Systems 65(1994) 19-29.
[10] C. Carlsson and R. Fullér, Fuzzy if-then rules for modeling interdependencies
in FMOP problems, in: Proceedings of EUFIT’94 Conference, September 20-23,
1994 Aachen, Germany (Verlag der Augustinus Buchhandlung, Aachen, 1994)
1504-1508.
[11] C. Carlsson and R. Fullér, Interdependence in Multiple Criteria Decision Making,
Technical Report, Institute for Advanced Management Systems Research, Åbo
Akademi University, No. 1994/6.
[12] C. Carlsson and R.Fullér, Fuzzy reasoning for solving fuzzy multiple objective
linear programs, in: R.Trappl ed., Cybernetics and Systems ’94, Proceedings
of the Twelfth European Meeting on Cybernetics and Systems Research (World
Scientific Publisher, London, 1994) 295-301.
[13] C. Carlsson and R.Fullér, Multiple Criteria Decision Making: The Case for In-
terdependence, Computers & Operations Research 22(1995) 251-260.
[15] J.L. Castro, Fuzzy logic controllers are universal approximators, IEEE Transac-
tions on Syst. Man Cybernet., 25(1995) 629-635.
[16] S.M. Chen, A weighted fuzzy reasoning algorithm for medical diagnosis, Decision
Support Systems, 11(1994) 37-43.
[17] E. Cox, The Fuzzy Systems Handbook. A Practitioner's Guide to Building, Using,
and Maintaining Fuzzy Systems (Academic Press, New York, 1994).
[18] M. Delgado, E. Trillas, J.L. Verdegay and M.A. Vila, The generalized ”modus
ponens” with linguistic labels, in: Proceedings of the Second International Con-
ference on Fuzzy Logic and Neural Networks, Iizuka, Japan, 1990 725-729.
[20] J. Dombi, A general class of fuzzy operators, the DeMorgan class of fuzzy opera-
tors and fuzziness measures induced by fuzzy operators, Fuzzy Sets and Systems,
8(1982) 149-163.
[24] D. Dubois and H. Prade, Fuzzy Sets and Systems: Theory and Applications
(Academic Press, London, 1980).
[26] D.Dubois and H.Prade, Possibility Theory (Plenum Press, New York,1988).
[27] D.Dubois, H.Prade and R.R Yager eds., Readings in Fuzzy Sets for Intelligent
Systems (Morgan & Kaufmann, San Mateo, CA, 1993).
[29] M. Fedrizzi, J. Kacprzyk and S. Zadrozny, An interactive multi-user decision
support system for consensus reaching processes using fuzzy logic with linguistic
quantifiers, Decision Support Systems, 4(1988) 313-327.
[30] M. Fedrizzi and L. Mich, Decision using production rules, in: Proc. of Annual
Conference of the Operational Research Society of Italy, September 18-10, Riva
del Garda. Italy, 1991 118-121.
[31] M. Fedrizzi and R.Fullér, On stability in group decision support systems under
fuzzy production rules, in: R.Trappl ed., Proceedings of the Eleventh European
Meeting on Cybernetics and Systems Research (World Scientific Publisher, Lon-
don, 1992) 471-478.
[35] J.C. Fodor, A remark on constructing t-norms, Fuzzy Sets and Systems, 41(1991)
195–199.
[36] J.C. Fodor, On fuzzy implication operators, Fuzzy Sets and Systems, 42(1991)
293–300.
[37] J.C. Fodor, Strict preference relations based on weak t-norms, Fuzzy Sets and
Systems, 43(1991) 327–336.
[38] J.C. Fodor, Traces of fuzzy binary relations, Fuzzy Sets and Systems, 50(1992)
331–342.
[39] J.C. Fodor, An axiomatic approach to fuzzy preference modelling, Fuzzy Sets
and Systems, 52(1992) 47–52.
[40] J.C. Fodor and M. Roubens, Aggregation and scoring procedures in multicriteria
decision making methods, in: Proceedings of the IEEE International Conference
on Fuzzy Systems, San Diego, 1992 1261–1267.
[41] J.C. Fodor, Fuzzy connectives via matrix logic, Fuzzy Sets and Systems, 56(1993)
67–77.
[42] J.C. Fodor, A new look at fuzzy connectives, Fuzzy Sets and System, 57(1993)
141–148.
[43] J.C. Fodor and M. Roubens, Preference modelling and aggregation procedures
with valued binary relations, in: R. Lowen and M. Roubens eds., Fuzzy Logic:
State of the Art (Kluwer, Dordrecht, 1993) 29–38.
[44] J.C. Fodor and M. Roubens, Fuzzy Preference Modelling and Multicriteria De-
cision Aid (Kluwer Academic Publisher, Dordrecht, 1994).
[46] R. Fullér, On Hamacher-sum of triangular fuzzy numbers, Fuzzy Sets and Sys-
tems, 42(1991) 205-212.
[47] R. Fullér, Well-posed fuzzy extensions of ill-posed linear equality systems, Fuzzy
Systems and Mathematics, 5(1991) 43-48.
[48] R. Fullér and B. Werners, The compositional rule of inference: introduction, the-
oretical considerations, and exact calculation formulas, Working Paper, RWTH
Aachen, institut für Wirtschaftswissenschaften, No.1991/7.
[49] R. Fullér, On law of large numbers for L-R fuzzy numbers, in: R. Lowen and
M. Roubens eds., Proceedings of the Fourth IFSA Congress, Volume: Mathemat-
ics, Brussels, 1991 74-77.
[53] R. Fullér and B. Werners, The compositional rule of inference with several re-
lations, in: B.Riecan and M.Duchon eds., Proceedings of the international Con-
ference on Fuzzy Sets and its Applications, Liptovsky Mikulás, Czecho-Slovakia,
February 17-21, 1992 (Math. Inst. Slovak Academy of Sciences, Bratislava, 1992)
39–44.
[54] R. Fullér and H.-J.Zimmermann, Fuzzy reasoning for solving fuzzy mathe-
matical programming problems, Working Paper, RWTH Aachen, institut für
Wirtschaftswissenschaften, No.1992/01.
[56] R. Fullér and H.-J. Zimmermann, Fuzzy reasoning for solving fuzzy mathematical
programming problems, Fuzzy Sets and Systems 60(1993) 121-133.
[57] R. Fullér and E. Triesch, A note on law of large numbers for fuzzy variables,
Fuzzy Sets and Systems, 55(1993).
[58] M.M. Gupta and D.H. Rao, On the principles of fuzzy neural networks, Fuzzy
Sets and Systems, 61(1994) 1-18.
[64] J. Kacprzyk, Group decision making with a fuzzy linguistic majority, Fuzzy Sets
and Systems, 18(1986) 105-118.
[65] O. Kaleva, Fuzzy differential equations, Fuzzy Sets and Systems, 24(1987) 301-
317.
[66] A. Kaufmann and M.M. Gupta, Introduction to Fuzzy Arithmetic: Theory and
Applications (Van Nostrand Reinhold, New York, 1991).
[69] L.T. Kóczy, Fuzzy graphs in the evaluation and optimization of networks, Fuzzy
Sets and Systems, 46(1992) 307-319.
[70] L.T. Kóczy and K.Hirota, A Fast algorithm for fuzzy inference by compact rules,
in: L.A. Zadeh and J. Kacprzyk eds., Fuzzy Logic for the Management of Un-
certainty (J. Wiley, New York, 1992) 297-317.
[71] L.T. Kóczy, Approximate reasoning and control with sparse and/or inconsistent
fuzzy rule bases, in: B. Reusch ed., Fuzzy Logic Theorie and Praxis, Springer,
Berlin, 1993 42-65.
[72] L.T. Kóczy and K. Hirota, Ordering, distance and Closeness of Fuzzy Sets, Fuzzy
Sets and Systems, 59(1993) 281-293.
[73] L.T. Kóczy, A fast algorithm for fuzzy inference by compact rules, in: L.A. Zadeh
and J. Kacprzyk eds., Fuzzy Logic for the Management of Uncertainty (J. Wiley,
New York, 1993) 297-317.
[74] B.Kosko, Neural networks and fuzzy systems, Prentice-Hall, New Jersey, 1992.
[75] B. Kosko, Fuzzy systems as universal approximators, in: Proc. IEEE 1992 Int.
Conference Fuzzy Systems, San Diego, 1992 1153-1162.
[76] M.Kovács and L.H. Tran, Algebraic structure of centered M -fuzzy numbers,
Fuzzy Sets and Systems, 39(1991) 91–99.
[77] M. Kovács, A stable embedding of ill-posed linear systems into fuzzy systems,
Fuzzy Sets and Systems, 45(1992) 305–312.
[78] J.R.Layne, K.M.Passino and S.Yurkovich, Fuzzy learning control for antiskid
braking system, IEEE Transactions on Contr. Syst. Tech., 1(1993) 122-129.
[79] C.-C. Lee, Fuzzy logic in control systems: Fuzzy logic controller - Part I, IEEE
Transactions on Syst., Man, Cybern., 20(1990) 419-435.
[80] C.-C. Lee, Fuzzy logic in control systems: Fuzzy logic controller - Part II, IEEE
Transactions on Syst., Man, Cybern., 20(1990) 404-418.
[81] E.H. Mamdani and S. Assilian, An experiment in linguistic synthesis with a fuzzy
logic controller. International Journal of Man-Machine Studies 7(1975) 1-13.
[82] J.K. Mattila, On some logical points of fuzzy conditional decision making, Fuzzy
Sets and Systems, 20(1986) 137-145.
[83] G.F. Mauer, A fuzzy logic controller for an ABS braking system, IEEE Trans-
actions on Fuzzy Systems, 3(1995) 381-388.
[84] D. McNeil and P. Freiberger, Fuzzy Logic (Simon and Schuster, New York, 1993).
[87] H.T. Nguyen, A note on the extension principle for fuzzy sets, Journal of Math-
ematical Analysis and Applications, 64(1978) 369-380.
[88] S.A. Orlovsky, Calculus of Decomposable Properties. Fuzzy Sets and Decisions
(Allerton Press, 1994).
[91] B.Schweizer and A.Sklar, Associative functions and abstract semigroups, Publ.
Math. Debrecen, 10(1963) 69-81.
[92] T. Sudkamp, Similarity, interpolation, and fuzzy rule construction, Fuzzy Sets
and Systems, 58(1993) 73-86.
[93] T.Takagi and M.Sugeno, Fuzzy identification of systems and its applications to
modeling and control, IEEE Trans. Syst. Man Cybernet., 1985, 116-132.
[95] T. Tilli, Fuzzy Logik: Grundlagen, Anwendungen, Hard- und Software (Franzis-
Verlag, München, 1992).
[97] I.B. Turksen, Fuzzy normal forms, Fuzzy Sets and Systems, 69(1995) 319-346.
[98] L.-X. Wang and J.M. Mendel, Fuzzy basis functions, universal approximation,
and orthogonal least-squares learning, IEEE Transactions on Neural Networks,
3(1992) 807-814.
[99] L.-X. Wang, Fuzzy systems are universal approximators, in: Proc. IEEE 1992
Int. Conference Fuzzy Systems, San Diego, 1992 1163-1170.
[101] R.R. Yager, Fuzzy decision making using unequal objectives, Fuzzy Sets and
Systems,1(1978) 87-95.
[102] R.R. Yager, A new methodology for ordinal multiple aspect decisions based on
fuzzy sets, Decision Sciences 12(1981) 589-600.
[103] R.R. Yager ed., Fuzzy Sets and Applications. Selected Papers by L.A.Zadeh (John
Wiley & Sons, New York, 1987).
[105] R.R.Yager, Families of OWA operators, Fuzzy Sets and Systems, 59(1993) 125-
148.
[106] R.R.Yager, Fuzzy Screening Systems, in: R.Lowen and M.Roubens eds., Fuzzy
Logic: State of the Art (Kluwer, Dordrecht, 1993) 251-261.
[107] R.R.Yager, Aggregation operators and fuzzy systems modeling, Fuzzy Sets and
Systems, 67(1994) 129-145.
[108] R.R.Yager and D.Filev, Essentials of Fuzzy Modeling and Control (Wiley, New
York, 1994).
[109] T. Yamakawa and K. Sasaki, Fuzzy memory device, in: Proceedings of 2nd IFSA
Congress, Tokyo, Japan, 1987 551-555.
[110] T. Yamakawa, Fuzzy controller hardware system, in: Proceedings of 2nd IFSA
Congress, Tokyo, Japan, 1987.
[111] T. Yamakawa, Fuzzy microprocessors - rule chip and defuzzifier chip, in: Inter-
national Workshop on Fuzzy System Applications, Iizuka, Japan, 1988 51-52.
[112] J. Yen, R. Langari and L.A. Zadeh eds., Industrial Applications of Fuzzy Logic
and Intelligent Systems (IEEE Press, New York, 1995).
[113] L.A. Zadeh, Fuzzy Sets, Information and Control, 8(1965) 338-353.
[114] L.A. Zadeh, Towards a theory of fuzzy systems, in: R.E. Kalman and N. DeClaris eds., Aspects of Network and System Theory (Holt, Rinehart and Winston, New York, 1971) 469-490.
[115] L.A. Zadeh, Outline of a new approach to the analysis of complex systems and decision processes, IEEE Transactions on Systems, Man and Cybernetics, 3(1973) 28-44.
[116] L.A. Zadeh, Concept of a linguistic variable and its application to approximate
reasoning, I, II, III, Information Sciences, 8(1975) 199-249, 301-357; 9(1975)
43-80.
[117] L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and
Systems, 1(1978) 3-28.
[118] L.A. Zadeh, A theory of approximate reasoning, In: J.Hayes, D.Michie and
L.I.Mikulich eds., Machine Intelligence, Vol.9 (Halstead Press, New York, 1979)
149-194.
[120] L.A. Zadeh, Knowledge representation in fuzzy logic, In: R.R.Yager and L.A.
Zadeh eds., An introduction to fuzzy logic applications in intelligent systems
(Kluwer Academic Publisher, Boston, 1992) 2-25.
[121] H.-J. Zimmermann and P. Zysno, Latent connectives in human decision making,
Fuzzy Sets and Systems, 4(1980) 37-51.
[122] H.-J. Zimmermann, Fuzzy set theory and its applications (Kluwer, Dordrecht,
1985).
[123] H.-J. Zimmermann, Fuzzy sets, Decision Making and Expert Systems (Kluwer
Academic Publisher, Boston, 1987).
[124] H.-J.Zimmermann and B.Werners, Uncertainty representation in knowledge-
based systems, in: A.S. Jovanovic, K.F. Kussmal, A.C. Lucia and P.P. Bonissone
eds., Proc. of an International Course on Expert Systems in Structural Safety
Assessment Stuttgart, October 2-4, 1989, (Springer-Verlag, Berlin, Heidelberg,
1989) 151-166.
[125] H.-J.Zimmermann, Cognitive sciences, decision technology, and fuzzy sets, In-
formation Sciences, 57-58(1991) 287-295.
Chapter 2
1982 Hopfield [14] provided the mathematical foundation for understanding the dynamics
of an important class of networks.
1984 Kohonen [16] developed unsupervised learning networks for feature mapping into
regular arrays of neurons.
1986 Rumelhart and McClelland [22] introduced the backpropagation learning algorithm
for complex, multilayer networks.
Beginning in 1986-87, many neural networks research programs were initiated. The list
of applications that can be solved by neural networks has expanded from small test-size
examples to large practical tasks. Very-large-scale integrated neural network chips have
been fabricated.
In the long term, we could expect that artificial neural systems will be used in applications
involving vision, speech, decision making, and reasoning, but also as signal processors such
as filters, detectors, and quality control systems.
Definition 2.1.1 [32] Artificial neural systems, or neural networks, are physical cellular systems which can acquire, store, and utilize experiential knowledge.
The knowledge is in the form of stable states or mappings embedded in networks that can
be recalled in response to the presentation of cues.
Figure: a network mapping input patterns to output patterns.
The basic processing elements of neural networks are called artificial neurons, or simply
neurons or nodes.
Each processing unit is characterized by an activity level (representing the state of po-
larization of a neuron), an output value (representing the firing rate of the neuron), a set
of input connections, (representing synapses on the cell and its dendrite), a bias value
(representing an internal resting level of the neuron), and a set of output connections
(representing a neuron's axonal projections). Each of these aspects of the unit is represented mathematically by real numbers. Thus, each connection has an associated weight
(synaptic strength) which determines the effect of the incoming input on the activation
level of the unit. The weights may be positive (excitatory) or negative (inhibitory).
Figure 2.1a A processing element with a single output connection.
The output signal o of the processing element is given by the following relationship
$$o = f(\langle w, x \rangle) = f(w^T x) = f\Big(\sum_{j=1}^n w_j x_j\Big)$$
where $w = (w_1, \ldots, w_n)^T \in \mathbb{R}^n$ is the weight vector. The function $f(w^Tx)$ is often referred to as an activation (or transfer) function. Its domain is the set of activation values, $net$, of the neuron model, so we often write this function as $f(net)$. The variable $net$ is defined as the scalar product of the weight and input vectors
$$net = \langle w, x\rangle = w^T x = w_1x_1 + \cdots + w_nx_n$$
and in the simplest case the output value o is computed as
$$o = f(net) = \begin{cases} 1 & \text{if } w^T x \ge \theta \\ 0 & \text{otherwise,} \end{cases}$$
where θ is called threshold-level and this type of node is called a linear threshold unit.
Example 2.1.1 Suppose we have two Boolean inputs x1 , x2 ∈ {0, 1}, one Boolean output
o ∈ {0, 1} and the training set is given by the following input/output pairs
x1 x2 o(x1 , x2 ) = x1 ∧ x2
1. 1 1 1
2. 1 0 0
3. 0 1 0
4. 0 0 0
Then the learning problem is to find weights $w_1$ and $w_2$ and a threshold (or bias) value $\theta$ such that the computed output of our network (which is given by the linear threshold function) is equal to the desired output for all examples. A straightforward solution is $w_1 = w_2 = 1/2$, $\theta = 0.6$. Indeed, from the equation
$$o(x_1, x_2) = \begin{cases} 1 & \text{if } x_1/2 + x_2/2 \ge 0.6 \\ 0 & \text{otherwise} \end{cases}$$
it follows that the output neuron fires if and only if both inputs are on.
Figure 2.2 A solution to the learning problem of the Boolean AND function.
Example 2.1.2 Suppose we have two Boolean inputs x1 , x2 ∈ {0, 1}, one Boolean output
o ∈ {0, 1} and the training set is given by the following input/output pairs
x1 x2 o(x1 , x2 ) = x1 ∨ x2
1. 1 1 1
2. 1 0 1
3. 0 1 1
4. 0 0 0
Then the learning problem is to find weights $w_1$ and $w_2$ and a threshold value $\theta$ such that the computed output of our network is equal to the desired output for all examples. A straightforward solution is $w_1 = w_2 = 1$, $\theta = 0.8$. Indeed, from the equation
$$o(x_1, x_2) = \begin{cases} 1 & \text{if } x_1 + x_2 \ge 0.8 \\ 0 & \text{otherwise} \end{cases}$$
it follows that the output neuron fires if and only if at least one of the inputs is on.
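To make the two examples concrete, here is a minimal Python sketch (not part of the original notes) of a linear threshold unit; the weights and thresholds are exactly those given in Examples 2.1.1 and 2.1.2.

```python
# Linear threshold unit: o = 1 if <w, x> >= theta, otherwise 0.
def threshold_unit(w, theta, x):
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net >= theta else 0

# Boolean AND with w1 = w2 = 1/2, theta = 0.6 (Example 2.1.1),
# Boolean OR  with w1 = w2 = 1,   theta = 0.8 (Example 2.1.2).
for x in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(x, "AND:", threshold_unit((0.5, 0.5), 0.6, x),
             "OR:",  threshold_unit((1.0, 1.0), 0.8, x))
```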
The threshold can easily be removed from our network by increasing the dimension of the input patterns. Indeed, the identity
$$w_1x_1 + \cdots + w_nx_n > \theta \iff w_1x_1 + \cdots + w_nx_n - 1\times\theta > 0$$
means that by adding an extra neuron to the input layer with fixed input value $-1$ and weight $\theta$ the value of the threshold becomes zero. This is why in the following we suppose that the thresholds are always equal to zero.
Figure: a network with threshold θ and the equivalent network with an extra input fixed at −1 carrying the weight θ.
We now define the scalar product of n-dimensional vectors, which plays a very important role in the theory of neural networks.

Definition 2.1.2 Let $w = (w_1, \ldots, w_n)^T$ and $x = (x_1, \ldots, x_n)^T$ be two vectors from $\mathbb{R}^n$. The scalar (or inner) product of w and x, denoted by $\langle w, x\rangle$ or $w^Tx$, is defined by
$$\langle w, x\rangle = w_1x_1 + \cdots + w_nx_n = \sum_{j=1}^n w_jx_j$$
An alternative definition of the scalar product in the two-dimensional case is
$$\langle w, x\rangle = \|w\|\,\|x\|\cos(w, x)$$
where $\|\cdot\|$ denotes the Euclidean norm in the real plane, i.e.
$$\|w\| = \sqrt{w_1^2 + w_2^2}, \qquad \|x\| = \sqrt{x_1^2 + x_2^2}.$$
Figure 2.4 $w = (w_1, w_2)^T$ and $x = (x_1, x_2)^T$.
Proof. Let (w, 1st axis) and (x, 1st axis) denote the angles between the vectors and the first axis. Then
$$\cos(w, x) = \cos\big((w, \text{1st axis}) - (x, \text{1st axis})\big) = \cos(w, \text{1st axis})\cos(x, \text{1st axis}) + \sin(w, \text{1st axis})\sin(x, \text{1st axis})$$
$$= \frac{w_1x_1}{\sqrt{w_1^2 + w_2^2}\,\sqrt{x_1^2 + x_2^2}} + \frac{w_2x_2}{\sqrt{w_1^2 + w_2^2}\,\sqrt{x_1^2 + x_2^2}}.$$
That is,
$$\|w\|\,\|x\|\cos(w, x) = \sqrt{w_1^2 + w_2^2}\,\sqrt{x_1^2 + x_2^2}\,\cos(w, x) = w_1x_1 + w_2x_2.$$
From $\cos\pi/2 = 0$ it follows that $\langle w, x\rangle = 0$ whenever w and x are perpendicular. If $\|w\| = 1$ (we say that w is normalized) then $|\langle w, x\rangle|$ is nothing else but the length of the projection of x onto the direction of w. Indeed, if $\|w\| = 1$ then we get
$$\langle w, x\rangle = \|w\|\,\|x\|\cos(w, x) = \|x\|\cos(w, x).$$
The problem of learning in neural networks is simply the problem of finding a set of con-
nection strengths (weights) which allow the network to carry out the desired computation.
The network is provided with a set of example input/output pairs (a training set) and is to
modify its connections in order to approximate the function from which the input/output
pairs have been drawn. The networks are then tested for ability to generalize.
The error correction learning procedure is simple enough in conception. The procedure is
as follows: During training an input is put into the network and flows through the network
generating a set of values on the output units. Then, the actual output is compared with
the desired target, and a match is computed. If the output and target match, no change
is made to the net. However, if the output differs from the target a change must be made
to some of the connections.
The perceptron learning rule, introduced by Rosenblatt [21], is a typical error correction
learning algorithm of single-layer feedforward networks with linear threshold activation
function.
Figure 2.6 Single-layer feedforward network.
Usually, wij denotes the weight from the j-th input unit to the i-th output unit and wi
denotes the weight vector of the i-th output node.
We are given a training set of input/output pairs
$$\{(x^1, y^1), \ldots, (x^K, y^K)\}.$$
The task is to find weight vectors $w_i$ such that the computed outputs equal the desired outputs,
$$o_i(x^k) = y_i^k,$$
for all training patterns k.
The activation function of the output nodes is a linear threshold function of the form
$$o_i(x) = \mathrm{sign}(\langle w_i, x\rangle) = \begin{cases} +1 & \text{if } \langle w_i, x\rangle \ge 0 \\ -1 & \text{if } \langle w_i, x\rangle < 0 \end{cases}$$
and the weight adjustments in the perceptron learning method are performed by
$$w_i := w_i + \eta\,(y_i^k - o_i^k)\,x^k,$$
where $\eta > 0$ is the learning rate. The training set defines two classes: x belongs to class $C_1$ if there exists an input/output pair $(x, 1)$, and x belongs to class $C_2$ if there exists an input/output pair $(x, -1)$.
Taking into consideration the definition of the activation function it is easy to see that
we are searching for a weight vector w such that
< w, x > ≥ 0 for each x ∈ C1 , and < w, x > < 0 for each x ∈ C2 .
Given are K training pairs arranged in the training set
$$\{(x^1, y^1), \ldots, (x^K, y^K)\}$$
• Step 2 Weights $w_i$ are initialized at small random values; the running error E is set to 0, k := 1
• Step 5 Cumulative cycle error is computed by adding the present error to E
$$E := E + \frac{1}{2}\|y - o\|^2$$
• Step 7 The training cycle is completed. For E = 0 terminate the training session.
If E > 0 then E is set to 0, k := 1 and we initiate a new training cycle by going to
Step 3
The following theorem shows that if the problem has solutions then the perceptron learn-
ing algorithm will find one of them.
Theorem 2.1.1 (Convergence theorem) If the problem is linearly separable then the program will go to Step 3 only finitely many times.
The learning constant is assumed to be 0.1. The initial weight vector is w0 = (1, −1, 0)T .
Then the learning according to the perceptron learning rule progresses as follows.
Step 2 Input is $x^2$, desired output is 1. For the present $w^1$ we compute the activation value
$$\langle w^1, x^2\rangle = (0.8,\, -1,\, -0.2)\begin{pmatrix}0\\-1\\-1\end{pmatrix} = 1.2$$
Step 5 Input is $x^2$, desired output is 1. For the present $w^4$ we compute the activation value
$$\langle w^4, x^2\rangle = (0.4,\, -1.1,\, -0.6)\begin{pmatrix}0\\-1\\-1\end{pmatrix} = 1.7$$
Correction is not performed in this step since 1 = sign(0.75), so we let w6 := w5 .
This terminates the learning process, because
< w6 , x1 >= −0.2 < 0, < w6 , x2 >= 1.7 > 0, < w6 , x3 >= 0.75 > 0
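The error-correction procedure above can be coded in a few lines (a hedged illustration, not the author's own program); it assumes the update $w := w + \eta(y - o)x$ with sign activation and targets ±1, and the two-class data below are made up for illustration.

```python
import numpy as np

def sign(t):
    return 1 if t >= 0 else -1

def perceptron_train(X, y, eta=0.1, max_cycles=100):
    """X: list of input vectors, y: desired outputs in {-1, +1}."""
    w = np.zeros(len(X[0]))
    for _ in range(max_cycles):
        errors = 0
        for xk, yk in zip(X, y):
            o = sign(np.dot(w, xk))
            if o != yk:                              # correction only on mismatch
                w = w + eta * (yk - o) * np.array(xk)
                errors += 1
        if errors == 0:                              # cycle error E = 0: stop
            return w
    return w

# illustrative linearly separable data
X = [(1.0, 1.0), (2.0, 0.5), (-1.0, -1.0), (-2.0, -0.5)]
y = [1, 1, -1, -1]
print(perceptron_train(X, y))
```

By the convergence theorem, the outer loop stops after finitely many correction cycles whenever the data are linearly separable.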
Minsky and Papert [20] provided a very careful analysis of conditions under which the perceptron learning rule is capable of carrying out the required mappings. They showed that the perceptron cannot successfully solve the problem
x1 x2 o(x1 , x2 )
1. 1 1 0
2. 1 0 1
3. 0 1 1
4. 0 0 0
This Boolean function is known in the literature as exclusive or (XOR). We will refer to
the above function as two-dimensional parity function.
The n-dimensional parity function is a binary Boolean function which takes the value 1 if there is an odd number of 1s in the input vector, and zero otherwise. For example, the 3-dimensional parity function is defined as
x1 x2 x3 o(x1 , x2 , x3 )
1. 1 1 1 1
2. 1 1 0 0
3. 1 0 1 0
4. 1 0 0 1
5. 0 0 1 1
6. 0 1 1 0
7. 0 1 0 1
8. 0 0 0 0
2.2 The delta learning rule
The error correction learning procedure is simple enough in conception. The procedure is
as follows: During training an input is put into the network and flows through the network
generating a set of values on the output units. Then, the actual output is compared with
the desired target, and a match is computed. If the output and target match, no change
is made to the net. However, if the output differs from the target a change must be made
to some of the connections.
Let us first recall the definition of the derivative of single-variable functions.

Definition 2.2.1 The derivative of f at (an interior point of its domain) x, denoted by $f'(x)$, is defined by
$$f'(x) = \lim_{x_n \to x}\frac{f(x) - f(x_n)}{x - x_n}.$$
A differentiable function is increasing in the direction of its derivative and decreasing in the opposite direction. This means that if we want to find a local minimum of a function f starting from a point $x^0$, we should search for the next candidate to the right of $x^0$ if $f'(x^0) < 0$ (f is decreasing at $x^0$) and to the left of $x^0$ if $f'(x^0) > 0$ (f is increasing at $x^0$).
The equation of the tangent line at the point $(x^0, f(x^0))$ is
$$\frac{y - f(x^0)}{x - x^0} = f'(x^0),$$
that is,
$$y = f(x^0) + (x - x^0)f'(x^0).$$
The next approximation, denoted by $x^1$, is the solution to the equation
$$f(x^0) + (x - x^0)f'(x^0) = 0,$$
which is
$$x^1 = x^0 - \frac{f(x^0)}{f'(x^0)}.$$
This idea can be applied successively, that is
$$x^{n+1} = x^n - \frac{f(x^n)}{f'(x^n)}.$$
Figure 2.9 The downhill direction is negative at $x^0$.
The above procedure is a typical descent method. In a descent method the next iterate $w^{n+1}$ should satisfy the property
$$f(w^{n+1}) < f(w^n),$$
i.e. the value of f at $w^{n+1}$ is smaller than its value at the previous approximation $w^n$.

Each iteration of a descent method calculates a downhill direction (the opposite of the direction of the derivative) at $w^n$, which means that for a sufficiently small $\eta > 0$ the update
$$w^{n+1} = w^n - \eta f'(w^n)$$
satisfies this inequality.
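Both iterations above are easy to code; the following short sketch (an illustration with assumed example functions, not part of the notes) applies the Newton-type iteration to find a root of $f(x) = x^2 - 2$ and the descent iteration to minimize $g(w) = w^2$.

```python
def newton(f, df, x0, steps=20):
    """x_{n+1} = x_n - f(x_n)/f'(x_n): root-finding iteration."""
    x = x0
    for _ in range(steps):
        x = x - f(x) / df(x)
    return x

def descent(dg, w0, eta=0.1, steps=100):
    """w_{n+1} = w_n - eta * g'(w_n): downhill (gradient descent) iteration."""
    w = w0
    for _ in range(steps):
        w = w - eta * dg(w)
    return w

print(newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))  # approx. 1.41421 (root of x^2 - 2)
print(descent(lambda w: 2 * w, w0=3.0))                     # approx. 0 (minimizer of w^2)
```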
Let $f : \mathbb{R}^n \to \mathbb{R}$ be a real-valued function and let $e \in \mathbb{R}^n$ with $\|e\| = 1$ be a given direction. The derivative of f with respect to e at w is defined as
$$\partial_e f(w) = \lim_{t\to +0}\frac{f(w + te) - f(w)}{t}.$$
If $e = (0, \ldots, 1, \ldots, 0)^T$, with the 1 in the i-th position, i.e. e is the i-th basic direction, then instead of $\partial_e f(w)$ we write $\partial_i f(w)$, which is defined by
$$\partial_i f(w) = \lim_{t\to +0}\frac{f(w_1, \ldots, w_i + t, \ldots, w_n) - f(w_1, \ldots, w_i, \ldots, w_n)}{t}.$$
Figure 2.10 The derivative of f with respect to the direction e.
Example 2.2.1 Let $f(w_1, w_2) = w_1^2 + w_2^2$; then the gradient of f is given by
$$f'(w) = (\partial_1 f(w), \partial_2 f(w))^T = (2w_1, 2w_2)^T.$$
The gradient vector always points in the uphill direction of f. The downhill (steepest descent) direction of f at w is the opposite of the uphill direction, i.e. the downhill direction is $-f'(w)$, which is
$$-f'(w) = (-\partial_1 f(w), \ldots, -\partial_n f(w))^T.$$
Suppose we are given a single-layer network with n input units and m linear output units, i.e. the output of the i-th neuron can be written as
$$o_i = net_i = \langle w_i, x\rangle = w_{i1}x_1 + \cdots + w_{in}x_n, \quad i = 1, \ldots, m,$$
and we are given the training set
$$\{(x^1, y^1), \ldots, (x^K, y^K)\}$$
Figure 2.11 Single-layer feedforward network with m output units.
The basic idea of the delta learning rule is to define a measure of the overall performance
of the system and then to find a way to optimize that performance. In our network, we
can define the performance of the system as
$$E = \sum_{k=1}^K E_k = \frac{1}{2}\sum_{k=1}^K \|y^k - o^k\|^2$$
That is,
$$E = \frac{1}{2}\sum_{k=1}^K\sum_{i=1}^m (y_i^k - o_i^k)^2 = \frac{1}{2}\sum_{k=1}^K\sum_{i=1}^m \big(y_i^k - \langle w_i, x^k\rangle\big)^2$$
where i indexes the output units; k indexes the input/output pairs to be learned; yik
indicates the target for a particular output unit on a particular pattern; oki :=< wi , xk >
indicates the actual output for that unit on that pattern; and E is the total error of
the system. The goal, then, is to minimize this function. It turns out, if the output
functions are differentiable, that this problem has a simple solution: namely, we can
assign a particular unit blame in proportion to the degree to which changes in that unit’s
activity lead to changes in the error. That is, we change the weights of the system in
proportion to the derivative of the error with respect to the weights.
The rule for changing weights following presentation of input/output pair (xk , y k ) is given
by the gradient descent method, i.e. we minimize the quadratic error function by using
the following iteration process
$$w_{ij} := w_{ij} - \eta\,\frac{\partial E_k}{\partial w_{ij}}$$
where η > 0 is the learning rate.
Let us compute now the partial derivative of the error function Ek with respect to wij
$$\frac{\partial E_k}{\partial w_{ij}} = \frac{\partial E_k}{\partial net_i^k}\,\frac{\partial net_i^k}{\partial w_{ij}} = -(y_i^k - o_i^k)x_j^k$$
where $net_i^k = w_{i1}x_1^k + \cdots + w_{in}x_n^k$.
That is,
wij := wij + η(yik − oki )xkj
for j = 1, . . . , n.
Definition 2.2.3 The error signal term, denoted by δik and called delta, produced by the
i-th output neuron is defined as
$$\delta_i^k = -\frac{\partial E_k}{\partial net_i^k} = (y_i^k - o_i^k)$$
For linear output units δik is nothing else but the difference between the desired and com-
puted output values of the i-th neuron.
Summary 2.2.1 The delta learning rule with linear activation functions.
Given are K training pairs arranged in the training set
{(x1 , y 1 ), . . . , (xK , y K )}
• Step 2 Weights wij are initialized at small random values, k := 1, and the running
error E is set to 0
The weights are adjusted by
$$w_i := w_i + \eta(y_i - o_i)x, \quad i = 1, \ldots, m,$$
and the cumulative cycle error is computed by adding the present error to E
$$E := E + \frac{1}{2}\|y - o\|^2$$
• Step 7 The training cycle is completed. For E < Emax terminate the training
session. If E > Emax then E is set to 0 and we initiate a new training cycle by going
back to Step 3
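A compact Python sketch of the delta rule with linear output units, following Summary 2.2.1; the stopping threshold, learning rate and training data below are illustrative assumptions, not values from the notes.

```python
import numpy as np

def delta_rule_linear(X, Y, eta=0.05, E_max=1e-4, max_cycles=10000, seed=0):
    """X: (K, n) input patterns, Y: (K, m) desired outputs; returns weights W of shape (m, n)."""
    rng = np.random.default_rng(seed)
    K, n = X.shape
    m = Y.shape[1]
    W = 0.1 * rng.standard_normal((m, n))
    for _ in range(max_cycles):
        E = 0.0
        for x, y in zip(X, Y):
            o = W @ x                        # linear output units: o_i = <w_i, x>
            W += eta * np.outer(y - o, x)    # w_ij := w_ij + eta * (y_i - o_i) * x_j
            E += 0.5 * np.sum((y - o) ** 2)  # cumulative cycle error
        if E < E_max:
            break
    return W

# illustrative training set: learn the linear mapping y = (x1 + x2, x1 - x2)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0., 0.], [1., -1.], [1., 1.], [2., 0.]])
print(delta_rule_linear(X, Y).round(3))
```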
2.2.1 The delta learning rule with semilinear activation function
In many practical cases instead of linear activation functions we use semi-linear ones. The
next table shows the most-often used types of activation functions.
The derivatives of sigmoidal activation functions are extensively used in learning algo-
rithms.
For instance, if f is the unipolar sigmoidal (logistic) activation function
$$f(t) = \frac{1}{1 + \exp(-t)}$$
then $f'$ satisfies the following equality
$$f'(t) = f(t)(1 - f(t)).$$
We shall describe now the delta learning rule with semilinear activation function. For
simplicity we explain the learning algorithm in the case of a single-output network.
Figure 2.14 Single neuron network.
The training set consists of K input/output pairs, where $x^k = (x_1^k, \ldots, x_n^k)$ and $y^k$ denotes the desired output, $k = 1, \ldots, K$.
The system first uses the input vector $x^k$ to produce its own output $o^k$ and then compares this with the desired output $y^k$. Let
$$E_k = \frac{1}{2}(y^k - o^k)^2 = \frac{1}{2}\big(y^k - o(\langle w, x^k\rangle)\big)^2 = \frac{1}{2}\left(y^k - \frac{1}{1 + \exp(-w^Tx^k)}\right)^2$$
be our measure of the error on input/output pattern k and let
$$E = \sum_{k=1}^K E_k$$
be our overall measure of the error.
The rule for changing weights following presentation of input/output pair k is given by
the gradient descent method, i.e. we minimize the quadratic error function by using the
following iteration process
w := w − ηEk0 (w).
Let us now compute the gradient vector of the error function $E_k$ at the point w:
$$E_k'(w) = \frac{d}{dw}\left(\frac{1}{2}\left[y^k - \frac{1}{1+\exp(-w^Tx^k)}\right]^2\right) = \frac{1}{2}\,\frac{d}{dw}\left[y^k - \frac{1}{1+\exp(-w^Tx^k)}\right]^2 = -(y^k - o^k)o^k(1 - o^k)x^k$$
where $o^k = 1/(1 + \exp(-w^Tx^k))$.
Therefore our learning rule for w is
$$w := w + \eta(y^k - o^k)o^k(1 - o^k)x^k.$$
Summary 2.2.2 The delta learning rule with unipolar sigmoidal activation function.
Given are K training pairs arranged in the training set
{(x1 , y 1 ), . . . , (xK , y K )}
• Step 2 Weights w are initialized at small random values, k := 1, and the running error E is set to 0
$$E := E + \frac{1}{2}(y - o)^2$$
• Step 6 If k < K then k := k + 1 and we continue the training by going back to
Step 3, otherwise we go to Step 7
• Step 7 The training cycle is completed. For E < Emax terminate the training
session. If E > Emax then E is set to 0 and we initiate a new training cycle by going
back to Step 3
In this case, without hidden units, the error surface is shaped like a bowl with only one
minimum, so gradient descent is guaranteed to find the best set of weights. With hidden
units, however, it is not so obvious how to compute the derivatives, and the error surface
is not concave upwards, so there is the danger of getting stuck in local minima.
We illustrate the delta learning rule with bipolar sigmoidal activation function f (t) =
2/(1 + exp −t) − 1.
Example 2.2.2 The delta learning rule with bipolar sigmoidal activation function.
Given are K training pairs arranged in the training set
{(x1 , y 1 ), . . . , (xK , y K )}
• Step 2 Weights w are initialized at small random values, k := 1, and the running error E is set to 0
$$w := w + \frac{1}{2}\,\eta(y - o)(1 - o^2)x$$
$$E := E + \frac{1}{2}(y - o)^2$$
• Step 7 The training cycle is completed. For E < Emax terminate the training
session. If E > Emax then E is set to 0 and we initiate a new training cycle by going
back to Step 3
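The sigmoidal delta rules of Summary 2.2.2 and Example 2.2.2 differ from the linear case only in the derivative factor; a minimal single-neuron sketch (with illustrative data and learning rate, added here for clarity and not taken from the notes) is:

```python
import numpy as np

def train_sigmoid_neuron(X, y, eta=0.5, cycles=5000, bipolar=False, seed=0):
    rng = np.random.default_rng(seed)
    w = 0.1 * rng.standard_normal(X.shape[1])
    for _ in range(cycles):
        for xk, yk in zip(X, y):
            net = np.dot(w, xk)
            if bipolar:
                o = 2.0 / (1.0 + np.exp(-net)) - 1.0
                w += 0.5 * eta * (yk - o) * (1 - o**2) * xk   # f'(net) = (1 - o^2)/2
            else:
                o = 1.0 / (1.0 + np.exp(-net))
                w += eta * (yk - o) * o * (1 - o) * xk        # f'(net) = o(1 - o)
    return w

# illustrative data: the OR function, with an extra input fixed at -1 to absorb the threshold
X = np.array([[0., 0., -1.], [0., 1., -1.], [1., 0., -1.], [1., 1., -1.]])
y = np.array([0., 1., 1., 1.])
w = train_sigmoid_neuron(X, y)
print(np.round(1.0 / (1.0 + np.exp(-X @ w)), 2))
```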
2.3 The generalized delta learning rule
We now focus on generalizing the delta learning rule for feedforward layered neural net-
works. The architecture of the two-layer network considered below is shown in Figure
2.16. It has, strictly speaking, two layers of processing neurons. If, however, the layers of nodes are counted, then the network can also be labeled as a three-layer network. There is no agreement in the literature as to which approach is to be used to describe network architectures. In this text we will use the term layer in reference to the actual number of existing and processing neuron layers. Layers with neurons whose outputs are not directly accessible are called internal or hidden layers. Thus the network of Figure 2.16 is
a two-layer network, which can be called a single hidden-layer network.
Figure 2.16 Layered neural network with two continuous perceptron layers (n input nodes, L hidden nodes, m output nodes).
The generalized delta rule is the most often used supervised learning algorithm of feedfor-
ward multi-layer neural networks. For simplicity we consider only a neural network with
one hidden layer and one output node.
Figure 2.16a Two-layer neural network with one output node.
The measure of the error on an input/output training pattern $(x^k, y^k)$ is defined by
$$E_k(W, w) = \frac{1}{2}(y^k - O^k)^2$$
where $O^k$ is the computed output and the overall measure of the error is
$$E(W, w) = \sum_{k=1}^K E_k(W, w).$$
If an input vector $x^k$ is presented to the network then it generates the following output
$$O^k = \frac{1}{1 + \exp(-W^To^k)}$$
where $o^k$ is the output vector of the hidden layer
$$o_l^k = \frac{1}{1 + \exp(-w_l^Tx^k)}$$
and $w_l$ denotes the weight vector of the l-th hidden neuron, $l = 1, \ldots, L$.
The rule for changing weights following presentation of input/output pair k is given by
the gradient descent method, i.e. we minimize the quadratic error function by using the
following iteration process
$$W := W - \eta\,\frac{\partial E_k(W, w)}{\partial W}, \qquad w_l := w_l - \eta\,\frac{\partial E_k(W, w)}{\partial w_l},$$
for l = 1, . . . , L, and η > 0 is the learning rate.
By using the chain rule for derivatives of composed functions we get
$$\frac{\partial E_k(W, w)}{\partial W} = \frac{1}{2}\,\frac{\partial}{\partial W}\left[y^k - \frac{1}{1 + \exp(-W^To^k)}\right]^2 = -(y^k - O^k)O^k(1 - O^k)o^k$$
i.e. the rule for changing weights of the output unit is
W := W + η(y k − Ok )Ok (1 − Ok )ok = W + ηδk ok
that is
Wl := Wl + ηδk okl ,
for l = 1, . . . , L, and we have used the notation δk = (y k − Ok )Ok (1 − Ok ).
Let us now compute the partial derivative of $E_k$ with respect to $w_l$
$$\frac{\partial E_k(W, w)}{\partial w_l} = -(y^k - O^k)O^k(1 - O^k)W_l\,o_l^k(1 - o_l^k)x^k = -\delta_k W_l\,o_l^k(1 - o_l^k)x^k$$
i.e. the rule for changing weights of the hidden units is
wl := wl + ηδk Wl okl (1 − okl )xk , l = 1, . . . , L.
that is
wlj := wlj + ηδk Wl okl (1 − okl )xkj , j = 1, . . . , n.
Summary 2.3.1 The generalized delta learning rule (error backpropagation learning)
We are given the training set
{(x1 , y 1 ), . . . , (xK , y K )}
• Step 2 Weights w are initialized at small random values, k := 1, and the running error E is set to 0
$$o_l = \frac{1}{1 + \exp(-w_l^Tx)}$$
$$W := W + \eta\delta o$$
$$w_l := w_l + \eta\delta W_l o_l(1 - o_l)x, \quad l = 1, \ldots, L$$
$$E := E + \frac{1}{2}(y - O)^2$$
• Step 8 The training cycle is completed. For E < Emax terminate the training
session. If E > Emax then E := 0, k := 1 and we initiate a new training cycle by
going back to Step 3
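A short Python sketch of the generalized delta rule derived above for a network with one hidden layer and one output node. The hidden-layer size, learning rate and the two-dimensional parity (XOR) training data are illustrative assumptions, and a constant −1 input is appended to play the role of the removed thresholds.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def backprop(X, y, L=6, eta=0.5, cycles=20000, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    w = 0.5 * rng.standard_normal((L, n))   # hidden weight vectors w_l
    W = 0.5 * rng.standard_normal(L)        # output weight vector W
    for _ in range(cycles):
        for xk, yk in zip(X, y):
            o = sigmoid(w @ xk)                               # hidden outputs o_l
            O = sigmoid(np.dot(W, o))                         # network output
            delta = (yk - O) * O * (1 - O)
            W += eta * delta * o                              # W_l := W_l + eta*delta*o_l
            w += eta * delta * np.outer(W * o * (1 - o), xk)  # w_l := w_l + eta*delta*W_l*o_l*(1-o_l)*x
    return w, W

# two-dimensional parity (XOR) with a constant -1 input appended
X = np.array([[0., 0., -1.], [0., 1., -1.], [1., 0., -1.], [1., 1., -1.]])
y = np.array([0., 1., 1., 0.])
w, W = backprop(X, y)
print([round(float(sigmoid(np.dot(W, sigmoid(w @ x)))), 2) for x in X])  # usually close to 0, 1, 1, 0
```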
Exercise 2.3.1 Derive the backpropagation learning rule with bipolar sigmoidal activa-
tion function f (t) = 2/(1 + exp −t) − 1.
2.3.1 Effectivity of neural networks
Funahashi [8] showed that infinitely large neural networks with a single hidden layer
are capable of approximating all continuous functions. Namely, he proved the following
theorem
Theorem 2.3.1 Let φ(x) be a nonconstant, bounded and monotone increasing continuous function. Let $K \subset \mathbb{R}^n$ be a compact set and let
$$f : K \to \mathbb{R}$$
be a continuous function. Then for an arbitrary $\varepsilon > 0$ there exist an integer N and real constants $w_i$, $w_{ij}$ such that
$$\tilde f(x_1, \ldots, x_n) = \sum_{i=1}^N w_i\,\phi\Big(\sum_{j=1}^n w_{ij}x_j\Big)$$
satisfies
$$\|f - \tilde f\|_\infty = \sup_{x\in K}|f(x) - \tilde f(x)| \le \varepsilon.$$
In other words, any continuous mapping can be approximated in the sense of the uniform topology on K by the input-output mappings of two-layer networks whose output functions for the hidden layer are φ(x) and are linear for the output layer.
Figure: a two-layer network computing $o = \sum_i w_i\,\phi(o_i)$ with hidden outputs $o_i = \phi(\sum_j w_{ij}x_j)$.
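As a small numerical illustration of Theorem 2.3.1 (my own sketch, not from the notes), a one-hidden-layer network with the bounded monotone increasing function φ(t) = 1/(1 + e^(−t)) can approximate a continuous function on a compact interval. Hidden biases are included here, as in the usual statement of such results, and the output weights are fitted by linear least squares.

```python
import numpy as np

phi = lambda t: 1.0 / (1.0 + np.exp(-t))     # nonconstant, bounded, monotone increasing

rng = np.random.default_rng(0)
N = 60                                        # number of hidden units
x = np.linspace(-np.pi, np.pi, 200)           # compact set K = [-pi, pi]
f = np.sin(x)                                 # continuous target function

a = rng.uniform(-4.0, 4.0, N)                 # hidden weights, chosen at random for illustration
b = rng.uniform(-4.0, 4.0, N)                 # hidden biases
H = phi(np.outer(x, a) + b)                   # H[k, i] = phi(a_i * x_k + b_i)
w, *_ = np.linalg.lstsq(H, f, rcond=None)     # output weights w_i by least squares
f_tilde = H @ w

print("sup |f - f_tilde| on the grid:", float(np.max(np.abs(f - f_tilde))))
```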
Theorem 2.3.2 (Stone-Weierstrass) Let domain K be a compact space of n dimensions,
and let G be a set of continuous real-valued functions on K, satisfying the following cri-
teria:
1. The constant function f (x) = 1 is in G.
2. For any two points x1 6= x2 in K, there is an f in G such that f (x1 ) 6= f (x2 ).
3. If $f_1$ and $f_2$ are two functions in G, then $f_1f_2$ and $\alpha_1 f_1 + \alpha_2 f_2$ are in G for any two real numbers $\alpha_1$ and $\alpha_2$.
Then G is dense in C(K), the set of continuous real-valued functions on K. In other words, for any $\varepsilon > 0$ and any function g in C(K), there exists an f in G such that
$$\|f - g\|_\infty = \sup_{x\in K}|f(x) - g(x)| \le \varepsilon.$$
2.4 Winner-take-all learning
Unsupervised classification learning is based on clustering of input data. No a priori
knowledge is assumed to be available regarding an input’s membership in a particular
class. Rather, gradually detected characteristics and a history of training will be used to
assist the network in defining classes and possible boundaries between them.
Clustering is understood to be the grouping of similar objects and separating of dissimilar
ones.
We discuss Kohonen’s network [16], which classifies input vectors into one of the specified
number of m categories, according to the clusters detected in the training set
{x1 , . . . , xK }.
The learning algorithm treats the set of m weight vectors as variable vectors that need
to be learned. Prior to the learning, the normalization of all (randomly chosen) weight
vectors is required.
Figure 2.18 The winner-take-all learning network.
The weight adjustment criterion for this mode of training is the selection of wr such that
$$\|x - w_r\| = \min_{i=1,\ldots,m}\|x - w_i\|$$
The index r denotes the winning neuron number corresponding to the vector $w_r$, which is the closest approximation of the current input x. Using the equality
$$\|x - w_i\|^2 = \langle x, x\rangle - 2\langle w_i, x\rangle + \langle w_i, w_i\rangle = \langle x, x\rangle - 2\langle w_i, x\rangle + 1,$$
we see that minimizing $\|x - w_i\|$ is equivalent to maximizing the scalar product $\langle w_i, x\rangle$. Taking into consideration that $\|w_i\| = 1$, $\forall i \in \{1, \ldots, m\}$, the scalar product $\langle w_i, x\rangle$ is nothing else but the projection of x on the direction of $w_i$. It is clear that the closer the vector $w_i$ is to x, the bigger the projection of x on $w_i$.
Note that < wr , x > is the activation value of the winning neuron which has the largest
value neti , i = 1, . . . , m.
The derivative of the squared distance with respect to w is
$$\frac{d\|x - w\|^2}{dw} = \frac{d}{dw}\langle x - w, x - w\rangle = \frac{d}{dw}\big(\langle x, x\rangle - 2\langle w, x\rangle + \langle w, w\rangle\big)$$
$$= -2\,\frac{d}{dw}(w_1x_1 + \cdots + w_nx_n) + \frac{d}{dw}(w_1^2 + \cdots + w_n^2)$$
$$= -2\Big[\frac{d}{dw_1}(w_1x_1 + \cdots + w_nx_n), \ldots, \frac{d}{dw_n}(w_1x_1 + \cdots + w_nx_n)\Big]^T + \Big[\frac{d}{dw_1}(w_1^2 + \cdots + w_n^2), \ldots, \frac{d}{dw_n}(w_1^2 + \cdots + w_n^2)\Big]^T$$
$$= -2(x_1, \ldots, x_n)^T + 2(w_1, \ldots, w_n)^T = -2(x - w).$$
It seems reasonable to reward the weights of the winning neuron with an increment of
weight in the negative gradient direction, thus in the direction (x − wr ). We thus have
wr := wr + η(x − wr ) (2.1)
where η is a small learning constant selected heuristically, usually between 0.1 and 0.7.
The remaining weight vectors are left unaffected.
Summary 2.4.1 Kohonen’s learning algorithm can be summarized in the following three
steps
From the identity
$$w_r := w_r + \eta(x - w_r) = (1 - \eta)w_r + \eta x$$
it follows that the updated weight vector is a convex linear combination of the old weight and the pattern vectors.
Figure: the updated weight vector $w_2 := (1 - \eta)w_2 + \eta x$ moves toward the pattern x.
In the end of the training process the final weight vectors point to the center of gravity
of classes.
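A minimal sketch of the winner-take-all rule (2.1); the number of output neurons, the learning constant and the clustered sample data below are illustrative assumptions.

```python
import numpy as np

def winner_take_all(X, m=3, eta=0.4, cycles=50, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((m, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)         # normalize the initial weight vectors
    for _ in range(cycles):
        for x in X:
            r = np.argmin(np.linalg.norm(x - W, axis=1))  # winning neuron: closest weight vector
            W[r] += eta * (x - W[r])                      # w_r := w_r + eta * (x - w_r)
    return W

# three illustrative clusters around (1, 0), (0, 1) and (-1, -1)
rng = np.random.default_rng(1)
centers = np.array([[1., 0.], [0., 1.], [-1., -1.]])
X = np.vstack([c + 0.1 * rng.standard_normal((30, 2)) for c in centers])
print(winner_take_all(X).round(2))   # the weight vectors should end up near the cluster centers
```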
The network will only be trainable if classes/clusters of patterns are linearly separable
from other classes by hyperplanes passing through origin.
To ensure separability of clusters with a priori unknown numbers of training clusters,
the unsupervised training can be performed with an excessive number of neurons, which
provides a certain separability safety margin.
Figure 2.21 The final weight vectors point to the center of gravity of the classes.
During the training, some neurons are likely not to develop their weights, and if their
weights change chaotically, they will not be considered as indicative of clusters.
Therefore such weights can be omitted during the recall phase, since their output does not
provide any essential clustering information. The weights of remaining neurons should
settle at values that are indicative of clusters.
Another learning extension is possible for this network when the proper class for some
patterns is known a priori [29]. Although this means that the encoding of data into
weights is then becoming supervised, this information accelerates the learning process
significantly. Weight adjustments are computed in the supervised mode as in (2.1), i.e.
$$w_r := w_r + \eta(x - w_r) \qquad (2.2)$$
and only for correct classifications. For improper clustering responses of the network, the weight adjustment carries the opposite sign compared to (2.2). That is, η > 0 for proper node responses, and η < 0 otherwise, in the supervised learning mode for the Kohonen layer.
Another modification of the winner-take-all learning rule is that both the winners' and losers' weights are adjusted in proportion to their level of responses. This is called leaky competitive learning and provides more subtle learning in the case for which clusters may be hard to distinguish.
2.5 Applications of artificial neural networks
There are large classes of problems that appear to be more amenable to solution by neural
networks than by other available techniques. These tasks often involve ambiguity, such as
that inherent in handwritten character recognition. Problems of this sort are difficult to
tackle with conventional methods such as matched filtering or nearest neighbor classifica-
tion, in part because the metrics used by the brain to compare patterns may not be very
closely related to those chosen by an engineer designing a recognition system. Likewise,
because reliable rules for recognizing a pattern are usually not at hand, fuzzy logic and
expert system designers also face the difficult and sometimes impossible task of finding
acceptable descriptions of the complex relations governing class inclusion. In trainable
neural network systems, these relations are abstracted directly from training data. More-
over, because neural networks can be constructed with numbers of inputs and outputs
ranging into thousands, they can be used to attack problems that require consideration of
more input variables than could be feasibly utilized by most other approaches. It should
be noted, however, that neural networks will not work well at solving problems for which
sufficiently large and general sets of training data are not obtainable. Drawing heavily
on [25] we provide a comprehensive list of applications of neural networks in Industry,
Business and Science.
• Control of sound and vibration Active control of vibration and noise is accom-
plished by using an adaptive actuator to generate equal and opposite vibration and
noise. This is being used in air-conditioning systems, in automotive systems, and
in industrial applications.
• Credit card fraud detection. Several banks and credit card companies including
American Express, Mellon Bank, First USA Bank, and others are currently using
neural networks to study patterns of credit card usage and to detect transactions that are potentially fraudulent.
order-processing center and at the state of Wyoming's Department of Revenue. In the June 1992 issue of Systems Integration Business, Dennis Livingston reports that before implementing the system, Wyoming was losing an estimated $300,000 per year in interest income because so many checks were being deposited late. Cardiff
Software offers a product called Teleform which uses Nestor’s hand-printed char-
acter recognition system to convert a fax machine into an OCR scanner. Poqet
Computer, now a subsidiary of Fujitsu, uses Nestor’s NestorWriter neural network
software to perform handwriting recognition for the penbased PC it announced in
January 1992 [26].
• Cursive handwriting recognition. Neural networks have proved useful in the
development of algorithms for on-line cursive handwriting recognition [23]: Lexicus, a recent startup company in Palo Alto, has built on this basic technology to develop an impressive PC-based cursive handwriting system.
• Quality control in manufacturing. Neural networks are being used in a large
number of quality control and quality assurance programs throughout industry.
Applications include contaminant-level detection from spectroscopy data at chemical
plants and loudspeaker defect classification by CTS Electronics.
• Event detection in particle accelerators.
• Petroleum exploration. Oil companies including Arco and Texaco are using
neural networks to help determine the locations of underground oil and gas deposits.
• Medical applications. Commercial products by Neuromedical Systems Inc. are
used for cancer screening and other medical applications [28]. The company markets
electrocardiograph and pap smear systems that rely on neural network technology. The pap smear system, Papnet, is able to help cytotechnologists spot cancerous cells, drastically reducing false-negative classifications. The system is used by the
U.S. Food and Drug Administration [7].
• Financial forecasting and portfolio management. Neural networks are used
for financial forecasting at a large number of investment firms and financial entities
including Merrill Lynch & Co., Salomon Brothers, Shearson Lehman Brothers Inc., Citibank, and the World Bank. Using neural networks trained by genetic algorithms, Citibank's Andrew Colin claims to be able to earn 25% returns per year investing
in the currency markets. A startup company, Promised Land Technologies, offers a
$249 software package that is claimed to yield impressive annual returns [27].
• Loan approval. Chase Manhattan Bank reportedly uses a hybrid system utilizing
pattern analysis and neural networks to evaluate corporate loan risk. Robert Marose
reports in the May 1990 issue of AI Expert that the system, Creditview, helps loan
officers estimate the credit worthiness of corporate loan candidates.
• Real estate analysis
• Marketing analysis. The Target Marketing System developed by Churchill Sys-
tem is currently in use by Veratex Corp. to optimize marketing strategy and cut
marketing costs by removing unlikely future customers from a list of potential cus-
tomers [10].
• Electric arc furnace electrode position control. Electric arc furnaces are used
to melt scrap steel. The Intelligent Arc furnace controller systems installed by
Neural Applications Corp. are reportedly saving millions of dollars per year per
furnace in increased furnace throughput and reduced electrode wear and electricity
consumption. The controller is currently being installed at furnaces worldwide.
• Semiconductor process control. Kopin Corp. has used neural networks to cut
dopant concentration and deposition thickness errors in solar cell manufacturing by
more than a factor of two.
• Automobile applications. Ford Motor Co., General Motors, and other auto-
mobile manufacturers are currently researching the possibility of widespread use of
neural networks in automobiles and in automobile production. Some of the areas
that are yielding promising results in the laboratory include engine fault detection
and diagnosis, antilock brake control, active-suspension control, and idle-speed con-
trol. General Motors is having preliminary success using neural networks to model
subjective customer ratings of automobiles based on their dynamic characteristics
to help engineers tailor vehicles to the market.
• Biomedical applications. Neural networks are rapidly finding diverse applica-
tions in the biomedical sciences. They are being used widely in research on amino
acid sequencing in RNA and DNA, ECG and EEG waveform classification, pre-
diction of patients’ reactions to drug treatments, prevention of anesthesia-related
accidents, arrhythmia recognition for implantable defibrillators patient mortality
predictions, quantitative cytology, detection of breast cancer from mammograms,
modeling schizophrenia, clinical diagnosis of lowerback pain, enhancement and clas-
sification of medical images, lung nodule detection, diagnosis of hepatic masses,
prediction of pulmonary embolism likelihood from ventilation-perfusion lung scans,
and the study of interstitial lung disease.
• Control of copiers. The Ricoh Corp. has successfully employed neural learning techniques for control of several voltages in copiers in order to preserve uniform copy quality despite changes in temperature, humidity, time since last copy, time since
change in toner cartridge, and other variables. These variables influence copy quality
in highly nonlinear ways, which were learned through training of a backpropagation
network.
Perhaps the most important advantage of neural networks is their adaptivity. Neural
networks can automatically adjust their parameters (weights) to optimize their behavior
as pattern recognizers, decision makers, system controllers, predictors, and so on.
Self-optimization allows the neural network to ”design” itself. The system designer first
defines the neural network architecture, determines how the network connects to other
parts of the system, and chooses a training methodology for the network. The neural
network then adapts to the application. Adaptivity allows the neural network to perform
well even when the environment or the system being controlled varies over time. There
are many control problems that can benefit from continual nonlinear modeling and adap-
tation. Neural networks, such as those used by Pavilion in chemical process control, and
by Neural Application Corp. in arc furnace control, are ideally suited to track problem
solutions in changing environments. Additionally, with some ”programmability”, such as
the choices regarding the number of neurons per layer and number of layers, a practitioner
can use the same neural network in a wide variety of applications. Engineering time is
thus saved.
Another example of the advantages of self-optimization is in the field of Expert Systems. In
some cases, instead of obtaining a set of rules through interaction between an experienced
expert and a knowledge engineer, a neural system can be trained with examples of expert
behavior.
Bibliography
[16] T.Kohonen, Self-organization and Associative Memory, (Springer-Verlag, New
York 1984).
[17] S.Y. Kung, Digital Neural Networks (Prentice Hall, Englewood Cliffs, NJ, 1993).
[18] V. Kurkova, Kolmogorov’s theorem and multilayer neural networks, Neural Net-
works, 5(1992) 501-506.
[19] W.S. McCulloch and W. Pitts, A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophys. 5(1943) 115-133.
[20] M. Minsky and S. Papert, Perceptrons (MIT Press, Cambridge, Mass., 1969).
[21] F. Rosenblatt, The perceptron: A probabilistic model for information storage and organization in the brain, Psychological Review, 65(1958) 386-408.
[22] D.E.Rumelhart and J.L. McClelland and the PDP Research Group, Parallel
Distributed Processing: Explorations in the Microstructure of Cognition (MIT
Press/Bradford Books, Cambridge, Mass., 1986).
[23] D.E. Rumelhart, Theory to practice: A case study - recognizing cursive hand-
writing. In Proceedings of the Third NEC Research Symposium. SIAM, Philadel-
phia, Pa., 1993.
[24] D.E.Rumelhart, B.Widrow and M.A.Lehr, The basic ideas in neural networks,
Communications of ACM, 37(1994) 87-92.
[25] D.E.Rumelhart, B.Widrow and M.A.Lehr, Neural Networks: Applications in
Industry, Business and Science, Communications of ACM, 37(1994) 93-105.
[26] E.I. Schwartz and J.B. Treece, Smart programs go to work: How applied intelli-
gence software makes decisions for the real world. Business Week (Mar. 2, 1992)
97-105.
[27] E.I. Schwartz, Where neural networks are already at work: Putting AI to work
in the markets, Business Week (Nov. 2, 1992) 136-137.
[28] J. Shandle, Neural networks are ready for prime time, Elect. Des., (February 18,
1993), 51-58.
[29] P.I.Simpson, Artificial Neural Systems: Foundations, Paradigms, Applications,
and Implementation (Pergamon Press, New York, 1990).
[30] P.D.Wasserman, Advanced Methods in Neural Computing, Van Nostrand Rein-
hold, New York 1993.
[31] H. White, Connectionist Nonparametric Regression: Multilayer feedforward Net-
works Can Learn Arbitrary Mappings, Neural Networks 3(1990) 535-549.
[32] J.M.Zurada, Introduction to Artificial Neural Systems (West Publishing Com-
pany, New York, 1992).
Chapter 3
The computational process envisioned for fuzzy neural systems is as follows. It starts with
the development of a ”fuzzy neuron” based on the understanding of biological neuronal
morphologies, followed by learning mechanisms. This leads to the following three steps
in a fuzzy neural computational process
• development of fuzzy neural models motivated by biological neurons,
(Figures: two fuzzy neural architectures. In the first, a fuzzy interface converts perceptions and linguistic statements into neural inputs for a neural network with a learning algorithm, whose neural outputs are the decisions. In the second, a neural network with a learning algorithm processes neural inputs, and its neural outputs feed a fuzzy inference module that produces the decisions.)
Neural networks are used to tune membership functions of fuzzy systems that are employed
as decision-making systems for controlling equipment. Although fuzzy logic can encode
expert knowledge directly using rules with linguistic labels, it usually takes a lot of time
to design and tune the membership functions which quantitatively define these linguistic
labels. Neural network learning techniques can automate this process and substantially
reduce development time and cost while improving performance.
In theory, neural networks and fuzzy systems are equivalent in that they are convertible, yet in practice each has its own advantages and disadvantages. For neural networks, the knowledge is automatically acquired by the backpropagation algorithm, but the learning process is relatively slow and analysis of the trained network is difficult (black box).
Neither is it possible to extract structural knowledge (rules) from the trained neural
network, nor can we integrate special information about the problem into the neural
network in order to simplify the learning procedure.
Fuzzy systems are more favorable in that their behavior can be explained based on fuzzy
rules and thus their performance can be adjusted by tuning the rules. But since, in general,
knowledge acquisition is difficult and also the universe of discourse of each input variable
needs to be divided into several intervals, applications of fuzzy systems are restricted to
the fields where expert knowledge is available and the number of input variables is small.
To overcome the problem of knowledge acquisition, neural networks are extended to au-
tomatically extract fuzzy rules from numerical data.
Cooperative approaches use neural networks to optimize certain parameters of an ordinary
fuzzy system, or to preprocess data and extract fuzzy (control) rules from data.
Based upon the computational process involved in a fuzzy-neuro system, one may broadly
classify the fuzzy neural structure as feedforward (static) and feedback (dynamic).
A typical fuzzy-neuro system is Berenji’s ARIC (Approximate Reasoning Based Intelligent
Control) architecture [9]. It is a neural network model of a fuzzy controller and learns by updating its prediction of the physical system's behavior and fine tunes a predefined
control knowledge base.
(Figure: the ARIC architecture. The AEN receives the system state x and the error signal r, predicts the system behavior, and updates its weights; the ASN's fuzzy inference part produces the control value u(t) and a confidence value p, which the stochastic action modifier combines into u'(t) applied to the physical system.)
This kind of architecture makes it possible to combine the advantages of neural networks and fuzzy controllers. The system is able to learn, and the knowledge used within the system has the form of fuzzy IF-THEN rules. Because these rules are predefined, the system does not have to learn from scratch, so it learns faster than a standard neural control system.
ARIC consists of two coupled feed-forward neural networks, the Action-state Evaluation
Network (AEN) and the Action Selection Network (ASN). The ASN is a multilayer neural
network representation of a fuzzy controller. In fact, it consists of two separated nets,
where the first one is the fuzzy inference part and the second one is a neural network
that calculates p[t, t + 1], a measure of confidence associated with the fuzzy inference value
u(t + 1), using the weights of time t and the system state of time t + 1. A stochastic
modifier combines the recommended control value u(t) of the fuzzy inference part and the
so called ”probability” value p and determines the final output value
of the ASN. The hidden units zi of the fuzzy inference network represent the fuzzy rules,
the input units xj the rule antecedents, and the output unit u represents the control
action, that is the defuzzified combination of the conclusions of all rules (output of hid-
den units). In the input layer the system state variables are fuzzified. Only monotonic
membership functions are used in ARIC, and the fuzzy labels used in the control rules
are adjusted locally within each rule. The membership values of the antecedents of a rule
are then multiplied by weights attached to the connection of the input unit to the hidden
unit. The minimum of those values is its final input. In each hidden unit a special mono-
tonic membership function representing the conclusion of the rule is stored. Because of
the monotonicity of this function the crisp output value belonging to the minimum mem-
bership value can be easily calculated by the inverse function. This value is multiplied
with the weight of the connection from the hidden unit to the output unit. The output
value is then calculated as a weighted average of all rule conclusions.
The AEN tries to predict the system behavior. It is a feed-forward neural network with
one hidden layer, that receives the system state as its input and an error signal r from the
physical system as additional information. The output v[t, t'] of the network is viewed as a prediction of future reinforcement, which depends on the weights of time t and the system state of time t', where t' may be t or t + 1. Better states are characterized by higher reinforcements. The weight changes are determined by a reinforcement procedure that uses the output of the ASN and the AEN. The ARIC architecture was applied to cart-pole
balancing and it was shown that the system is able to solve this task [9].
3.1.1 Fuzzy neurons
Consider a simple neural net in Figure 3.4. All signals and weights are real numbers. The
two input neurons do not change the input signals so their output is the same as their
input. The signal xi interacts with the weight wi to produce the product
pi = wi xi , i = 1, 2.
The input information pi is aggregated, by addition, to produce the input
net = p1 + p2 = w1 x1 + w2 x2
to the neuron. The neuron uses its transfer function f , which could be a sigmoidal
function, f (x) = (1 + e−x )−1 , to compute the output
y = f (net) = f (w1 x1 + w2 x2 ).
This simple neural net, which employs multiplication, addition, and a sigmoidal f, will be called a regular (or standard) neural net.
Figure 3.4 A simple neural net computing $y = f(w_1x_1 + w_2x_2)$.
If we employ other operations like a t-norm, or a t-conorm, to combine the incoming data
to a neuron we obtain what we call a hybrid neural net.
These modifications lead to a fuzzy neural architecture based on fuzzy arithmetic op-
erations. Let us express the inputs (which are usually membership degrees of a fuzzy
concept) $x_1, x_2$ and the weights $w_1, w_2$ over the unit interval [0, 1]. A hybrid neural net may not use multiplication, addition, or a sigmoidal function (because the results of these operations are not necessarily in the unit interval).
Definition 3.1.1 A hybrid neural net is a neural net with crisp signals and weights and crisp transfer function. However, (i) we can combine $x_i$ and $w_i$ using a t-norm, t-conorm, or some other continuous operation; (ii) we can aggregate $p_1$ and $p_2$ with a t-norm, t-conorm, or any other continuous function; (iii) f can be any function from input to output.

We emphasize here that all inputs, outputs and the weights of a hybrid neural net are real numbers taken from the unit interval [0, 1]. A processing element of a hybrid neural net is called a fuzzy neuron. In the following we present some fuzzy neurons.
Definition 3.1.2 (AND fuzzy neuron [74, 75]) The signals $x_i$ and $w_i$ are combined by a triangular conorm S to produce
$$p_i = S(w_i, x_i), \quad i = 1, 2.$$
The input information $p_i$ is then aggregated by a triangular norm T to produce the output
$$y = T(p_1, p_2) = T(S(w_1, x_1), S(w_2, x_2))$$
of the neuron. So, if T = min and S = max then the AND neuron realizes the min-max composition
$$y = \min\{w_1 \vee x_1,\, w_2 \vee x_2\}.$$
Figure: AND fuzzy neuron with output $y = T(S(w_1, x_1), S(w_2, x_2))$.
The OR fuzzy neuron [74, 75] combines the signals $x_i$ and $w_i$ by a triangular norm T to produce
$$p_i = T(w_i, x_i), \quad i = 1, 2,$$
and aggregates the $p_i$ by a triangular conorm S to produce the output
$$y = S(p_1, p_2) = S(T(w_1, x_1), T(w_2, x_2))$$
of the neuron.
Figure: OR fuzzy neuron with output $y = S(T(w_1, x_1), T(w_2, x_2))$.
So, if T = min and S = max then the OR neuron realizes the max-min composition
$$y = \max\{w_1 \wedge x_1,\, w_2 \wedge x_2\}.$$
The AND and OR fuzzy neurons realize pure logic operations on the membership values.
The role of the connections is to differentiate between particular levels of impact that
the individual inputs might have on the result of aggregation. We note that (i) the higher
the value wi the stronger the impact of xi on the output y of an OR neuron, (ii) the lower
the value wi the stronger the impact of xi on the output y of an AND neuron.
The range of the output value y for the AND neuron is computed by letting all xi equal
to zero or one. In virtue of the monotonicity property of triangular norms, we obtain
y ∈ [T (w1 , w2 ), 1]
and for the OR neuron one derives the boundaries
y ∈ [0, S(w1 , w2 )].
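A small sketch of the AND and OR fuzzy neurons with T = min and S = max; the membership degrees and weights used below are illustrative values in [0, 1].

```python
from functools import reduce

def and_neuron(x, w, t=min, s=max):
    """y = T(S(w1, x1), ..., S(wn, xn)): S-combine each input with its weight, then T-aggregate."""
    return reduce(t, (s(wi, xi) for wi, xi in zip(w, x)))

def or_neuron(x, w, t=min, s=max):
    """y = S(T(w1, x1), ..., T(wn, xn)): T-combine each input with its weight, then S-aggregate."""
    return reduce(s, (t(wi, xi) for wi, xi in zip(w, x)))

x, w = (0.7, 0.2), (0.1, 0.9)
print(and_neuron(x, w))   # min(max(0.1, 0.7), max(0.9, 0.2)) = 0.7
print(or_neuron(x, w))    # max(min(0.1, 0.7), min(0.9, 0.2)) = 0.2
```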
Figure: implication-OR fuzzy neuron with output $y = S(w_1 \leftarrow x_1,\, w_2 \leftarrow x_2)$.
Figure 3.8 Kwan and Cai's fuzzy neuron.
For the max fuzzy neuron the signal $x_i$ interacts with the weight $w_i$ to produce $p_i = w_ix_i$, $i = 1, 2$, and the output is
$$z = \max\{p_1, p_2\} = \max\{w_1x_1, w_2x_2\}.$$
Figure: max fuzzy neuron with output $z = \max\{w_1x_1, w_2x_2\}$.
For the min fuzzy neuron we similarly have $p_i = w_ix_i$, $i = 1, 2$, and the output is
$$y = \min\{p_1, p_2\} = \min\{w_1x_1, w_2x_2\}.$$
Figure: min fuzzy neuron with output $y = \min\{w_1x_1, w_2x_2\}$.
It is well-known that regular nets are universal approximators, i.e. they can approximate
any continuous function on a compact set to arbitrary accuracy. In a discrete fuzzy
expert system one inputs a discrete approximation to the fuzzy sets and obtains a discrete
approximation to the output fuzzy set. Usually discrete fuzzy expert systems and fuzzy
controllers are continuous mappings. Thus we can conclude that given a continuous fuzzy
expert system, or continuous fuzzy controller, there is a regular net that can uniformly
approximate it to any degree of accuracy on compact sets. The problem with this result is that it is non-constructive and only approximate. The main problem is that the theorems are of existence type and do not tell you how to build the net.
Hybrid neural nets can be used to implement fuzzy IF-THEN rules in a constructive way.
Following Buckley & Hayashi [30], and, Keller, Yager & Tahani [99] we will show how to
construct hybrid neural nets which are computationally equivalent to fuzzy expert systems
and fuzzy controllers. It should be noted that these hybrid nets are for computation and
they do not have to learn anything.
Though hybrid neural nets cannot directly use the standard error backpropagation algorithm for learning, they can be trained by steepest descent methods to learn the parameters
of the membership functions representing the linguistic terms in the rules (supposing that
the system output is a differentiable function of these parameters).
The direct fuzzification of conventional neural networks is to extend connection weights and/or inputs and/or fuzzy desired outputs (or targets) to fuzzy numbers. This extension
is summarized in Table 3.1.
Fuzzy neural networks (FNN) of Type 1 are used in the classification of a fuzzy input vector into a crisp class [84, 114]. The networks of Type 2, 3 and 4 are used to implement
fuzzy IF-THEN rules [93, 95].
However, the last three types in Table 3.1 are unrealistic.
• In Type 5, outputs are always real numbers because both inputs and weights are
real numbers.
• In Type 6 and 7, the fuzzification of weights is not necessary because targets are
real numbers.
Definition 3.1.8 A regular fuzzy neural network is a neural network with fuzzy signals
and/or fuzzy weights, sigmoidal transfer function and all the operations are defined by
Zadeh’s extension principle.
Consider a simple regular fuzzy neural net in Figure 3.11. All signals and weights are
fuzzy numbers. The two input neurons do not change the input signals so their output
is the same as their input. The signal Xi interacts with the weight Wi to produce the
product
Pi = Wi Xi , i = 1, 2.
where we use the extension principle to compute Pi . The input information Pi is aggre-
gated, by standard extended addition, to produce the input
net = P1 + P2 = W1 X1 + W2 X2
to the neuron. The neuron uses its transfer function f , which is a sigmoidal function, to
compute the output
Y = f (net) = f (W1 X1 + W2 X2 )
where f (x) = (1 + e−x )−1 and the membership function of the output fuzzy set Y is
computed by the extension principle
$$Y(y) = \begin{cases} (W_1X_1 + W_2X_2)(f^{-1}(y)) & \text{if } 0 \le y \le 1 \\ 0 & \text{otherwise} \end{cases}$$
Figure 3.11 A simple regular fuzzy neural net.
Buckley and Hayashi [28] showed that regular fuzzy neural nets are monotonic, i.e. if $X_1 \subset X_1'$ and $X_2 \subset X_2'$ then
$$f(W_1X_1 + W_2X_2) \subset f(W_1X_1' + W_2X_2')$$
where f is the sigmoid transfer function, and all the operations are defined by Zadeh’s
extension principle.
This means that fuzzy neural nets based on the extension principle might be universal
approximators only for continuous monotonic functions. If a fuzzy function is not mono-
tonic there is no hope of approximating it with a fuzzy neural net which uses the extension
principle.
The following example shows a continuous fuzzy function which is non-monotonic. There-
fore we must abandon the extension principle if we are to obtain a universal approximator.
Consider, for example, the fuzzy function defined by
$$f(A) = (D(A, \bar 0), 1)$$
where A is a fuzzy number, 0̄ is a fuzzy point with center zero, D(A, 0̄) denotes the Hausdorff distance between A and 0̄, and (D(A, 0̄), 1) denotes a symmetrical triangular fuzzy number with center D(A, 0̄) and width one. This function is continuous: if $A_n \to A$ in the metric D then
$$D(f(A_n), f(A)) = D\big((D(A_n, \bar 0), 1), (D(A, \bar 0), 1)\big) = |D(A_n, \bar 0) - D(A, \bar 0)| \le D(A_n, A) \to 0.$$
Figure: a fuzzy number A and its image f(A) = (c, 1), a symmetrical triangular fuzzy number with center c = D(A, 0̄).
Let $A, A' \in \mathcal F$ be such that $A \subset A'$. Then $f(A) = (D(A, \bar 0), 1)$ and $f(A') = (D(A', \bar 0), 1)$ are both symmetrical triangular fuzzy numbers with different centers, so neither $f(A) \subset f(A')$ nor $f(A') \subset f(A)$ can occur, i.e. f is not monotonic.
Definition 3.1.9 A hybrid fuzzy neural network is a neural network with fuzzy signals and/or fuzzy weights. However, (i) we can combine $X_i$ and $W_i$ using a t-norm, t-conorm, or some other continuous operation; (ii) we can aggregate $P_1$ and $P_2$ with a t-norm, t-conorm, or any other continuous function; (iii) f can be any function from input to output.
Buckley and Hayashi [28] showed that hybrid fuzzy neural networks are universal approx-
imators, i.e. they can approximate any continuous fuzzy functions on a compact domain.
Figure 3.13 Simple hybrid fuzzy neural net for the compositional rule of inference, Y = (X1 × X2 ) ∗ R.
Buckley, Hayashi and Czogala [22] showed that any continuous feedforward neural
net can be approximated to any degree of accuracy by a discrete fuzzy expert system:
Assume that all the νj in the input signals and all the oi in the output from the neural net belong to [0, 1]. Therefore, o = G(ν), with ν ∈ [0, 1]^n , o ∈ [0, 1]^m and G continuous, represents the net. Given any input-output pair (ν, o) for the net we now show how to construct the corresponding rule in the fuzzy expert system. Define the fuzzy set A as A(j) = νj , j = 1, . . . , n, and zero otherwise.
Figure 3.14 Definition of A.
Also let C(i) = oi , i = 1, . . . , m, and zero otherwise.
Figure 3.15 Definition of C.
Then the rule obtained from the pair (ν, o) is
<(ν) : If x is A then z is C,
That is, in rule construction ν is identified with A and o with C.
Theorem 3.1.1 [22] Given ε > 0, there exists a fuzzy expert system so that
‖F (u) − G(u)‖ ≤ ε, ∀u ∈ [0, 1]^n ,
where F is the input-output function of the fuzzy expert system < = {<(ν)}.
164
3.2 Hybrid neural nets
Drawing heavily on Buckley and Hayashi [23] we show how to construct hybrid neural
nets that are computationally identical to discrete fuzzy expert systems and the Sugeno
and Expert system elementary fuzzy controller. Hybrid neural nets employ more general
operations (t-norms, t-conorms, etc.) in combining signals and weights for input to a
neuron.
Consider a fuzzy expert system with one block of rules
<i : If x is Ai then y is Bi , 1 ≤ i ≤ n.
For simplicity we have only one clause in the antecedent but our results easily extend to
many clauses in the antecedent.
Given some data on x, say A0 , the fuzzy expert system comes up with its final conclusion
y is B 0 . In computer applications we usually use discrete versions of the continuous fuzzy
sets. Let [α1 , α2 ] contain the support of all the Ai , plus the support of all the A0 we might
have as input to the system. Also, let [β1 , β2 ] contain the support of all the Bi , plus the
support of all the B 0 we can obtain as outputs from the system. Let M ≥ 2 and N ≥ 2 be
positive integers. Let
xj = α1 + (j − 1)(α2 − α1 )/(M − 1)
for 1 ≤ j ≤ M .
yi = β1 + (i − 1)(β2 − β1 )/(N − 1)
for 1 ≤ i ≤ N . The discrete version of the system is to input a0 = (a01 , . . . , a0M ), where a0i = A0 (xi ), and to obtain as output b0 = (b01 , . . . , b0N ), where b0j = B 0 (yj ).
Figure 3.16 A discrete version of fuzzy expert system.
165
We now need to describe the internal workings of the fuzzy expert system. There are two
cases:
Case 1. Combine all the rules into one rule which is used to obtain b0 from a0 .
We first construct a fuzzy relation Rk to model rule
<k : If x is Ak , then y is Bk , 1 ≤ k ≤ n.
This is called modeling the implication and there are many ways to do this. One takes
the data Ak (xi ) and Bk (yj ) to obtain Rk (xi , yj ) for each rule. One way to do this is
Rk (xi , yj ) = min{Ak (xi ), Bk (yj )}.
Then we combine all the Rk into one R, which may be performed in many different ways
and one procedure would be to intersect the Rk to get R. In any case, let
rij = R(xi , yj ), 1 ≤ i ≤ M , 1 ≤ j ≤ N , and
λij = a0i ∗ rij ,
where a0i = A0 (xi ) and ∗ is some method (usually a t-norm) of combining the data into λij .
Then set b0 = (b01 , . . . , b0N ) with
b0j = Agg(λ1j , . . . , λM j ), 1 ≤ j ≤ N,
for some aggregation operator Agg.
Figure 3.17 Combine the rules first.
We first combine the signals (a0i ) and the weights (ri1 ) and then aggregate the data using Agg, so the input to the neuron is b01 . Now the transfer function is the identity function f (t) = t, t ∈ [0, 1], so that the output is b01 . Similarly for all neurons, which implies the net gives b0 from a0 . The hybrid neural net in Figure 3.17 provides fast parallel computation for a discrete fuzzy expert system. However, it can get too large to be useful. For example, let [α1 , α2 ] = [β1 , β2 ] = [−10, 10] with discrete increments of 0.01, so that M = N = 2000. Then there will be: 2000 input neurons, 2000² connections from the input nodes to the output nodes, and 2000 output nodes.
Case 2. Fire the rules individually, given a0 , and combine their results into b0 .
We compose a0 with each Rk producing the intermediate result b0k = (b0k1 , . . . , b0kN ). Then we combine all the b0k into b0 .
• One takes the data Ak (xi ) and Bk (yj ) to obtain Rk (xi , yj ) for each rule. One way
to do this is
Rk (xi , yj ) = min{Ak (xi ), Bk (yj )}.
In any case, let Rk (xi , yj ) = rkij . Then we have λkij = a0i ∗ rkij and
b0kj = Agg(λk1j , . . . , λkM j ), 1 ≤ j ≤ N.
The method of combining the b0k would be done component wise, so let
b0j = Agg1 (b01j , . . . , b0nj ), 1 ≤ j ≤ N,
for some other aggregating operator Agg1 . A hybrid neural net computationally
equal to this type of fuzzy expert system is shown in Figure 3.18. For simplicity we
have drawn the figure for M = N = 2.
Figure 3.18 Fire the rules first.
In the hidden layer the top two nodes operate as the first rule <1 , and the bottom two nodes model the second rule <2 . In the two output nodes: the top node, in which all weights are one, aggregates b011 and b021 using Agg1 to produce b01 ; the bottom node, whose weights are also one, computes b02 .
167
Therefore, the hybrid neural net computes the same output b0 given a0 as the fuzzy expert
system.
As in the previous case this hybrid net quickly gets too big to be practical at this time. Suppose there are 10 rules and [α1 , α2 ] = [β1 , β2 ] = [−10, 10] with discrete increments of 0.01, so that M = N = 2000. Then there will be: 2000 input neurons, 40 million (10 × 2000²) connections from the input nodes to the hidden layer, 20000 neurons in the hidden layer, 10 × 2000 connections from the hidden layer to the output neurons, and 2000 output nodes. And this hybrid net has only one clause in each rule’s antecedent.
Buckley [19] identifies three basic types of elementary fuzzy controllers: Sugeno, Expert system, and Mamdani. We show how to build a hybrid neural net to be computationally identical to the Sugeno and Expert system fuzzy controllers. Actually, depending on how one computes the defuzzifier, the hybrid neural net could only be approximately the same as the Mamdani controller.
Sugeno control rules are of the type
<i : if e is Ai and ∆e is Bi then zi = αi e + βi (∆e) + γi , 1 ≤ i ≤ n,
where Ai , Bi , αi , βi , and γi are all given, e is the error, and ∆e is the change in error.
The input to the controller is values for e and ∆e and one first evaluates each rule’s
antecedent as follows:
σi = T (Ai (e), Bi (∆e)),
where T is some t-norm. Next, we evaluate each conclusion given e and ∆e as
zi = αi e + βi (∆e) + γi ,
The output is
δ = ( Σ_{i=1}^{n} σi zi ) / ( Σ_{i=1}^{n} σi ).
A hybrid neural net computationally identical to this controller (for two rules) is shown in Figure 3.19. In the rest of the net all weights are equal to one. The output node produces δ because we aggregate the two input signals using the division (ε1 + ε2 )/(σ1 + σ2 ).
Figure 3.19 Hybrid neural net computationally identical to the Sugeno controller.
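The computation performed by the net of Figure 3.19 can be sketched as follows (plain Python, my own function names; T = min and the membership functions are placeholders rather than a concrete design from the text).

def sugeno_controller(e, de, rules, tnorm=min):
    """rules: list of (A, B, alpha, beta, gamma) with A, B membership functions.
    Returns delta = sum(sigma_i * z_i) / sum(sigma_i)."""
    num, den = 0.0, 0.0
    for A, B, alpha, beta, gamma in rules:
        sigma = tnorm(A(e), B(de))            # firing level of the rule
        z = alpha * e + beta * de + gamma     # individual rule output
        num += sigma * z
        den += sigma
    return num / den if den > 0 else 0.0

# Two illustrative rules with simple triangular membership functions on [-1, 1].
tri = lambda c, w: (lambda u: max(0.0, 1.0 - abs(u - c) / w))
rules = [
    (tri(-0.5, 1.0), tri(-0.5, 1.0), 1.0, 0.5, 0.0),
    (tri(+0.5, 1.0), tri(+0.5, 1.0), 0.8, 0.2, 0.1),
]
print(sugeno_controller(0.2, -0.1, rules))

The Expert system controller below is computed in the same way, with the crisp rule outputs zi replaced by the centers ci of the consequent fuzzy numbers.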
The fuzzy controller based on a fuzzy expert system was introduced by Buckley, Hayashi and Czogala in [22]. The fuzzy control rules are
<i : if e is Ai and ∆e is Bi then y is Ci , 1 ≤ i ≤ n,
where the Ci are triangular shaped fuzzy numbers with centers ci , e is the error, and ∆e is the change in error. Given input values e and ∆e each rule is evaluated producing the σi given by
σi = T (Ai (e), Bi (∆e)).
Then σi is assigned to the rule’s consequence Ci and the controller takes all the data
(σi , Ci ) and defuzzifies to output δ. Let
δ = ( Σ_{i=1}^{n} σi ci ) / ( Σ_{i=1}^{n} σi ).
A hybrid neural net computationally identical to this controller is shown in the Figure
3.20 (again, for simplicity we assume only two control rules).
Figure 3.20 Hybrid neural net as a fuzzy expert system controller.
169
The operations in this hybrid net are similar to those in Figure 3.19.
As an example, we show how to construct a hybrid neural net (called an adaptive network by Jang [97]) which is functionally equivalent to Sugeno’s inference mechanism.
Sugeno and Takagi use the following rules [93]
<1 : if x is A1 and y is B1 then z1 = a1 x + b1 y
<2 : if x is A2 and y is B2 then z2 = a2 x + b2 y.
The firing levels of the rules are computed by
α1 = A1 (x0 ) × B1 (y0 ), α2 = A2 (x0 ) × B2 (y0 ),
where the logical and can be modelled by any continuous t-norm, e.g.
α1 = A1 (x0 ) ∧ B1 (y0 ), α2 = A2 (x0 ) ∧ B2 (y0 ),
then the individual rule outputs are derived from the relationships
z1 = a1 x0 + b1 y0 , z2 = a2 x0 + b2 y0
Figure 3.21 Sugeno’s inference mechanism.
170
A hybrid neural net computationally identical to this type of reasoning is shown in Figure 3.22.
Figure 3.22 ANFIS architecture for Sugeno’s reasoning method.
For simplicity, we have assumed only two rules, and two linguistic values for each input
variable.
• Layer 1 The output of the node is the degree to which the given input satisfies the linguistic label associated to this node. Usually, we choose bell-shaped membership functions
Ai (u) = exp( −((u − ai1 )/bi1 )² / 2 ),
Bi (v) = exp( −((v − ai2 )/bi2 )² / 2 ),
to represent the linguistic terms, where
{ai1 , ai2 , bi1 , bi2 }
is the parameter set. As the values of these parameters change, the bell-shaped functions vary accordingly, thus exhibiting various forms of membership functions on the linguistic labels Ai and Bi . In fact, any continuous membership function, such as a trapezoidal or triangular-shaped one, is also a qualified candidate for a node function in this layer. Parameters in this layer are referred to as premise parameters.
• Layer 2 Each node computes the firing strength of the associated rule. The output
of the top neuron is
α1 = A1 (x0 ) × B1 (y0 ),
and the output of the bottom neuron is
α2 = A2 (x0 ) × B2 (y0 ),
where × denotes a t-norm modeling the logical and. Both nodes in this layer are labeled by T , because we can choose other t-norms for modeling the logical and operator. The nodes of this layer are called rule nodes.
• Layer 3 Every node in this layer is labeled by N to indicate the normalization of
the firing levels.
The output of the top neuron is the normalized (with respect to the sum of firing levels) firing level of the first rule,
β1 = α1 /(α1 + α2 ),
and the output of the bottom neuron is the normalized firing level of the second rule,
β2 = α2 /(α1 + α2 ).
• Layer 4 The output of the top neuron is the product of the normalized firing level and the individual rule output of the first rule,
β1 z1 = β1 (a1 x0 + b1 y0 ),
and the output of the bottom neuron is the product of the normalized firing level and the individual rule output of the second rule,
β2 z2 = β2 (a2 x0 + b2 y0 ).
• Layer 5 The single node in this layer computes the overall system output as the
sum of all incoming signals, i.e.
z0 = β1 z1 + β2 z2 .
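The five layers can be traced in a short sketch (my own Python, with the Gaussian premise membership functions and the product t-norm described above; names and parameter values are illustrative assumptions, not part of the text).

import math

def gauss(u, a, b):
    """Bell-shaped membership function from Layer 1: exp(-0.5*((u-a)/b)**2)."""
    return math.exp(-0.5 * ((u - a) / b) ** 2)

def anfis_forward(x0, y0, premise, consequent):
    """premise: ((a11,b11),(a12,b12)) for A1,B1 and ((a21,b21),(a22,b22)) for A2,B2.
    consequent: ((a1,b1),(a2,b2)) giving z_i = a_i*x0 + b_i*y0."""
    (A1, B1), (A2, B2) = premise
    # Layer 1: degrees of match
    mA1, mB1 = gauss(x0, *A1), gauss(y0, *B1)
    mA2, mB2 = gauss(x0, *A2), gauss(y0, *B2)
    # Layer 2: firing strengths (product t-norm)
    alpha1, alpha2 = mA1 * mB1, mA2 * mB2
    # Layer 3: normalized firing strengths
    beta1 = alpha1 / (alpha1 + alpha2)
    beta2 = alpha2 / (alpha1 + alpha2)
    # Layer 4: weighted individual rule outputs
    z1 = consequent[0][0] * x0 + consequent[0][1] * y0
    z2 = consequent[1][0] * x0 + consequent[1][1] * y0
    # Layer 5: overall output
    return beta1 * z1 + beta2 * z2

premise = (((0.0, 1.0), (0.0, 1.0)), ((2.0, 1.0), (2.0, 1.0)))
consequent = ((1.0, 1.0), (-1.0, 2.0))
print(anfis_forward(0.5, 1.5, premise, consequent))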
If a crisp training set {(xk , y k ), k = 1, . . . , K} is given then the parameters of the hybrid
neural net (which determine the shape of the membership functions of the premises) can
be learned by descent-type methods. This architecture and learning procedure is called
ANFIS (adaptive-network-based fuzzy inference system) by Jang [97].
The error function for pattern k can be given by
Ek = (y k − ok )2
where y k is the desired output and ok is the computed output by the hybrid neural net.
If the membership functions are of triangular form,
Ai (u) = 1 − (ai1 − u)/ai2 if ai1 − ai2 ≤ u ≤ ai1 , Ai (u) = 1 − (u − ai1 )/ai3 if ai1 ≤ u ≤ ai1 + ai3 , and Ai (u) = 0 otherwise,
Bi (v) = 1 − (bi1 − v)/bi2 if bi1 − bi2 ≤ v ≤ bi1 , Bi (v) = 1 − (v − bi1 )/bi3 if bi1 ≤ v ≤ bi1 + bi3 , and Bi (v) = 0 otherwise,
then we can start the learning process from the initial values (see Figure 3.23).
Generally, the initial values of the parameters are set in such a way that the membership
functions along each axis satisfy ε-completeness, normality and convexity.
Figure 3.23 Two-input ANFIS with four fuzzy rules.
Nauck, Klawonn and Kruse [130] initialize the network with all rules that can be
constructed out of all combinations of input and output membership functions. During
the learning process all hidden nodes (rule nodes) that are not used or produce counter-
productive results are removed from the network.
It should be noted however, that these tuning methods have a weak point, because the
convergence of tuning depends on the initial condition.
Exercise 3.2.1 Construct a hybrid neural net implementing Tsukamoto’s reasoning mechanism with two input variables, two linguistic values for each input variable and two fuzzy IF-THEN rules.
Exercise 3.2.2 Construct a hybrid neural net implementing Larsen’s reasoning mechanism with two input variables, two linguistic values for each input variable and two fuzzy IF-THEN rules.
Exercise 3.2.3 Construct a hybrid neural net implementing Mamdani’s reasoning mechanism with two input variables, two linguistic values for each input variable and two fuzzy IF-THEN rules.
173
3.2.1 Computation of fuzzy logic inferences by hybrid neural
net
Keller, Yager and Tahani [99] proposed the following hybrid neural network architec-
ture for computation of fuzzy logic inferences. Each basic network structure implements
a single rule in the rule base of the form
If x1 is A1 and . . . and xn is An then y is B.
The input for the i-th antecedent clause, a fuzzy set A0i , is represented by the vector (a0i1 , . . . , a0iM ), these values being the membership grades of A0i at sampled points {ν1 , . . . , νM } over its domain of discourse.
There are two variations of the activities in the antecedent clause checking layer. In both
cases, each antecedent clause of the rule determines the weights. For the first variation,
the weights wij are the fuzzy set complement of the antecedent clause, i.e., for the i-th
clause
wij = 1 − aij
The weights are chosen this way because the first layer of the hybrid neural net will
generate a measure of disagreement between the input possibility distribution and the
antecedent clause distribution. This is done so that as the input moves away from the
antecedent, the amount of disagreement will rise to one. Hence, if each node calculates the
similarity between the input and the complement of the antecedent, then we will produce
such a local measure of disagreement. The next layer combines this evidence.
The purpose of the node is to determine the amount of disagreement present between the
antecedent clause and the corresponding input data. If the combination at the k-th node
is denoted by dk , then
or
d2k = max_j min{(1 − akj ), a0kj }.
The second form for the antecedent clause checking layer uses the fuzzy sets Ak themselves as the weights, i.e. in this case
wkj = akj ,
and the disagreement at the k-th node is
d3k = max_j |akj − a0kj |.
The combination node computes the total disagreement t = max_i {αi di } and transmits the value
1 − t = 1 − max_i {αi di }.
Figure 3.24 Hybrid neural network configuration for fuzzy logic inference.
The weights ui on the output nodes carry the information from the consequent of the rule. If the proposition ”y is B” is characterized by the discrete possibility distribution
B = {b1 , . . . , bN },
then the weights are defined as
ui = 1 − bi .
Each output node forms the value
b0i = 1 − ui (1 − t) = 1 − (1 − bi )(1 − t) = bi + t − bi t
From this equation, it is clear that if t = 0, then the rule fires with conclusion ”y is B”
exactly. On the other hand, if the total disagreement is one, then the conclusion of
firing the rule is a possibility distribution composed entirely of 1’s, hence the conclusion
is ”y is unknown”.
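For a single-clause rule the network’s computation can be sketched as follows (my own Python, not from [99]; it uses the d2 disagreement measure and takes the clause weight to be one, so that t equals the disagreement of the clause).

def fire_rule(a, b, a_prime):
    """a: antecedent A sampled at nu_1..nu_M, b: consequent B sampled at y_1..y_N,
    a_prime: input A' sampled at the same points as A.
    Returns the sampled conclusion b'_i = b_i + t - b_i*t."""
    # disagreement between the input and the antecedent (the d2 measure)
    t = max(min(1.0 - aj, apj) for aj, apj in zip(a, a_prime))
    return [bi + t - bi * t for bi in b]

A = [0, 0, 1, 1, 0]                      # crisp antecedent "x is A"
B = [0.0, 0.3, 1.0, 0.3, 0.0]            # consequent "y is B"
print(fire_rule(A, B, A))                # A' = A: t = 0, conclusion is exactly B
print(fire_rule(A, B, [1, 0, 0, 0, 0]))  # A' meets co(A): t = 1, conclusion is "unknown"

The two printed cases correspond to Theorems 3.2.1 and 3.2.2 below.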
This network extends classical (crisp) logic to fuzzy logic as shown by the following the-
orems. For simplicity, in each theorem we consider a single antecedent clause rule of the
form
If x is A then y is B.
Suppose A is a crisp subset of its domain of discourse. Let us denote by χA the characteristic function of A, i.e.
χA (u) = 1 if u ∈ A, and χA (u) = 0 otherwise,
and let A be represented by {a1 , . . . , aM }, where ai = χA (νi ).
Figure 3.25 Representation of a crisp subset A.
Theorem 3.2.1 [99] In the single antecedent clause rule, suppose A is a crisp subset of
its domain of discourse. Then the fuzzy logic inference network produces the standard
modus ponens result, i.e. if the input ”x is A0 ” is such that A0 = A, then the network
results in ”y is B”.
Proof. Since A0 = A, the disagreement measured at each antecedent clause checking node is zero. Hence, at the combination node, t = 0, and so the output layer will produce
b0i = bi + t − bi t = bi + t(1 − bi ) = bi .
176
Theorem 3.2.2 [99] Consider the inference network which uses d1 or d2 for clause checking. Suppose that A and A0 are proper crisp subsets of their domain of discourse and let co(A) = {x | x ∉ A} denote the complement of A.
(i) If co(A) ∩ A0 ≠ ∅, then the network produces the result ”y is unknown”, i.e. a possibility distribution for y that is identically equal to 1.
(ii) If A0 ⊂ A (i.e. A0 is more specific than A), then the result is ”y is B”.
Proof. (i) Since co(A) ∩ A0 ≠ ∅, there exists a point νi in the domain such that A0 (νi ) = 1 and A(νi ) = 0. Then the disagreement at that clause equals one, so t = 1 and
b0i = bi + t − bi t = bi + (1 − bi ) = 1.
(ii) Now suppose that A0 ⊂ A. Then A0 ∩ co(A) = ∅, and so d1 = d2 = 0, producing the result ”y is B”.
Theorem 3.2.3 [99] Consider the inference network which uses d3 for clause checking. Suppose that A and A0 are proper crisp subsets of their domain of discourse such that A0 ≠ A. Then the network result is ”y is unknown”.
Proof. Since A0 ≠ A, we have d3 = max_i {|ai − a0i |} = 1, which ensures that the result is ”y is unknown”.
Theorem 3.2.4 [99] (monotonocity theorem) Consider the single clause inference net-
work using d1 or d2 for clause checking. Suppose that A, A0 and A00 are three fuzzy sets
such that
A00 ⊂ A0 ⊂ A.
Let the results of inference with inputs ”x is A0 ” and ”x is A00 ” be ”y is B 0 ” and ”y is
B 00 ” respectively. Then
B ⊂ B 00 ⊂ B 0
that is, B 00 is closer to B than B 0 .
Proof. For each νi in the domain of discourse of A, A00 (νi ) ≤ A0 (νi ), and hence d00i ≤ d0i for i = 1, 2. Hence, t00 ≤ t0 . Finally,
b00i = bi + t00 (1 − bi ) ≤ bi + t0 (1 − bi ) = b0i .
Clearly, from the above equations, both b00i and b0i are greater than or equal to bi . This completes the proof.
Intuitively, this theorem states that as the input becomes more specific, the output con-
verges to the consequent.
Having to discretize all the fuzzy sets in a fuzzy expert system can lead to an enormous hybrid neural net, as we have seen above. It is the use of a hybrid neural net that dictates the discretization, because it processes real numbers. We can obtain much smaller networks if we use fuzzy neural nets.
Drawing heavily on Buckley and Hayashi [30] we represent fuzzy expert systems as hybrid
fuzzy neural networks.
We recall that a hybrid fuzzy neural network is a neural network with fuzzy signals and/or fuzzy weights, in which signals and weights may be combined by any continuous operations (t-norms, t-conorms, or other continuous functions) and the transfer functions may be any continuous functions mapping fuzzy sets to fuzzy sets.
Suppose the fuzzy expert system has only one block of rules of the form
<i : If x is Ai then y is Bi , 1 ≤ i ≤ n.
• Case 1 Combine all the rules into one fuzzy relation R and use the compositional rule of inference to obtain the conclusion
B 0 = A0 ◦ R.
For example, one could have
B 0 (y) = sup_{M1 ≤ x ≤ M2} min{A0 (x), R(x, y)}
for each y ∈ [N1 , N2 ]. A hybrid fuzzy neural net, the same as this fuzzy expert
system, is shown in Figure 3.26.
Figure 3.26 Combine the rules.
There is only one neuron, with input weight equal to one. The transfer function (which maps fuzzy sets into fuzzy sets) inside the neuron is the fuzzy relation R.
So, we have input A0 to the neuron with its output B 0 = A0 ◦ R.
We obtained the simplest possible hybrid fuzzy neural net for the fuzzy expert
system. The major drawback is that there is no hardware available to implement
fuzzy neural nets.
• Case 2 Fire the rules individually and then combine their results.
We first compose A0 with each Rk to get Bk0 , the conclusion of the k-th rule, and
then combine all the Bk0 into one final conclusion B 0 . Let Bk0 be defined by the
compositional rule of inference as
Bk0 = A0 ◦ Rk
for all y ∈ [N1 , N2 ]. Then
B 0 (y) = Agg(B10 (y), . . . , Bn0 (y))
for some aggregation operator Agg.
A hybrid fuzzy neural net the same as this fuzzy expert system is displayed in Figure
3.27.
Figure 3.27 Fire the rules individually.
All the weights are equal to one and the fuzzy relations Rk are the transfer functions
for the neurons in the hidden layer. The input signals to the output neuron are the
Bk0 which are aggregated by Agg. The transfer function in the output neuron is the
identity (no change) function.
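A discretized sketch of this Case 2 computation is given below (my own Python, not from [30]; the compositional rule ◦ is taken as sup–min and Agg as max, and the toy relations stand in for the Rk ). In a genuine hybrid fuzzy neural net these operations act on fuzzy sets directly; here they act on sampled membership vectors.

def compose(a_prime, Rk):
    """Discretized compositional rule of inference:
    B'_k(y_j) = max_i min(A'(x_i), R_k(x_i, y_j))."""
    M, N = len(Rk), len(Rk[0])
    return [max(min(a_prime[i], Rk[i][j]) for i in range(M)) for j in range(N)]

def fire_individually(a_prime, relations, agg=max):
    """Conclusion B'(y_j) = Agg_k B'_k(y_j) over all rules."""
    partial = [compose(a_prime, Rk) for Rk in relations]
    return [agg(bk[j] for bk in partial) for j in range(len(partial[0]))]

# Two toy rules given directly as 3x3 relations over coarse x- and y-grids.
R1 = [[1.0, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 0.0]]
R2 = [[0.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 1.0]]
a_prime = [1.0, 0.4, 0.0]        # discretized input A'
print(fire_individually(a_prime, [R1, R2]))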
179
3.3 Trainable neural nets for fuzzy IF-THEN rules
In this section we present some methods for implementing fuzzy IF-THEN rules by train-
able neural network architectures. Consider a block of fuzzy rules
{(A1 , B1 ), . . . , (An , Bn )}
{(Ai , Bi ), Ci }, 1 ≤ i ≤ n.
There are two main approaches to implement fuzzy IF-THEN rules (3.1) by standard
error backpropagation network.
• In the method proposed by Umano and Ezawa [166] a fuzzy set is represented by a
finite number of its membership values.
Let [α1 , α2 ] contain the support of all the Ai , plus the support of all the A0 we might have as input to the system. Also, let [β1 , β2 ] contain the support of all the Bi , plus the support of all the B 0 we can obtain as outputs from the system, i = 1, . . . , n. Let M ≥ 2 and N ≥ 2 be positive integers. Let
xj = α1 + (j − 1)(α2 − α1 )/(N − 1)
for 1 ≤ j ≤ N , and
yi = β1 + (i − 1)(β2 − β1 )/(M − 1)
for 1 ≤ i ≤ M .
A discrete version of the continuous training set consists of the input/output pairs
{(Ai (x1 ), . . . , Ai (xN )), (Bi (y1 ), . . . , Bi (yM ))}
for i = 1, . . . , n.
Figure 3.28 Representation of a fuzzy number by membership values.
Using the notations aij = Ai (xj ) and bij = Bi (yj ) our fuzzy neural network turns into
an N input and M output crisp network, which can be trained by the generalized
delta rule.
Figure 3.29 A network trained on membership values of fuzzy numbers.
Example 3.3.1 Assume our fuzzy rule base consists of three rules
<1 : If x is small then y is negative,
<2 : If x is medium then y is about zero,
<3 : If x is big then y is positive,
where the membership functions of the linguistic terms are defined by
µsmall (u) = 1 − 2u if 0 ≤ u ≤ 1/2, and 0 otherwise,
µmedium (u) = 1 − 2|u − 1/2| if 0 ≤ u ≤ 1, and 0 otherwise,
µbig (u) = 2u − 1 if 1/2 ≤ u ≤ 1, and 0 otherwise.
Figure 3.30 Membership functions for small, medium and big.
µnegative (u) = −u if −1 ≤ u ≤ 0, and 0 otherwise,
µabout zero (u) = 1 − 2|u| if −1/2 ≤ u ≤ 1/2, and 0 otherwise,
µpositive (u) = u if 0 ≤ u ≤ 1, and 0 otherwise.
Figure 3.31 Membership functions for negative, about zero and positive.
The training set derived from this rule base can be written in the form
{(small, negative), (medium, about zero), (big, positive)}.
Let [0, 1] contain the support of all the fuzzy sets we might have as input to the system.
Also, let [−1, 1] contain the support of all the fuzzy sets we can obtain as outputs from
the system. Let M = N = 5 and
xj = (j − 1)/4
for 1 ≤ j ≤ 5, and
yi = −1 + (i − 1)/2
for 1 ≤ i ≤ 5.
A discrete version of the continuous training set consists of three input/output pairs
{(a11 , . . . , a15 ), (b11 , . . . , b15 )}
{(a21 , . . . , a25 ), (b21 , . . . , b25 )}
{(a31 , . . . , a35 ), (b31 , . . . , b35 )}
where
a1j = µsmall (xj ), a2j = µmedium (xj ), a3j = µbig (xj )
for j = 1, . . . , 5, and
b1i = µnegative (yi ), b2i = µabout zero (yi ), b3i = µpositive (yi )
for i = 1, . . . , 5. Plugging into numerical values we obtain the following training set for a
standard backpropagation network
{(1, 0.5, 0, 0, 0), (1, 0.5, 0, 0, 0)}
{(0, 0.5, 1, 0.5, 0), (0, 0, 1, 0, 0)}
{(0, 0, 0, 0.5, 1), (0, 0, 0, 0.5, 1)}.
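The numerical training set above can be reproduced with a few lines (a Python sketch of the discretization step only, using the membership functions given above; the resulting five-dimensional patterns would then be fed to any standard backpropagation implementation).

def mu_small(u):      return max(0.0, 1.0 - 2.0 * u) if 0.0 <= u <= 0.5 else 0.0
def mu_medium(u):     return max(0.0, 1.0 - 2.0 * abs(u - 0.5)) if 0.0 <= u <= 1.0 else 0.0
def mu_big(u):        return max(0.0, 2.0 * u - 1.0) if 0.5 <= u <= 1.0 else 0.0
def mu_negative(v):   return max(0.0, -v) if -1.0 <= v <= 0.0 else 0.0
def mu_about_zero(v): return max(0.0, 1.0 - 2.0 * abs(v)) if -0.5 <= v <= 0.5 else 0.0
def mu_positive(v):   return max(0.0, v) if 0.0 <= v <= 1.0 else 0.0

xs = [j / 4.0 for j in range(5)]            # x_j = (j-1)/4, j = 1..5
ys = [-1.0 + i / 2.0 for i in range(5)]     # y_i = -1 + (i-1)/2, i = 1..5

rules = [(mu_small, mu_negative), (mu_medium, mu_about_zero), (mu_big, mu_positive)]
training_set = [([A(x) for x in xs], [B(y) for y in ys]) for A, B in rules]
for inp, out in training_set:
    print(inp, out)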
• Uehara and Fujise [165] use a finite number of α-level sets to represent fuzzy numbers. Let M ≥ 2 and let
αj = (j − 1)/(M − 1), j = 1, . . . , M
be a partition of [0, 1]. Let [Ai ]αj denote the αj -level set of the fuzzy number Ai ,
[Ai ]αj = {u | Ai (u) ≥ αj } = [aLij , aRij ],
for j = 1, . . . , M , and let [Bi ]αj denote the αj -level set of the fuzzy number Bi ,
[Bi ]αj = {u | Bi (u) ≥ αj } = [bLij , bRij ],
for j = 1, . . . , M . Then the discrete version of the continuous training set consists of the input/output pairs
{(aLi1 , aRi1 , . . . , aLiM , aRiM ), (bLi1 , bRi1 , . . . , bLiM , bRiM )}
for i = 1, . . . , n.
Figure 3.32 Representation of a fuzzy number by α-level sets.
The number of inputs and outputs depends on the number of α-level sets considered.
For example, in Figure 3.27, the fuzzy number Ai is represented by seven level sets,
i.e. by a fourteen-dimensional vector of real numbers.
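For triangular fuzzy numbers the α-level representation is easy to compute. The following sketch (my own Python, assuming a triangular parameterization (left, center, right)) produces the 2M-dimensional vector used as a network input or target.

def alpha_level_vector(tri, M):
    """Represent a triangular fuzzy number tri = (left, center, right) by the
    2*M-dimensional vector (a^L_1, a^R_1, ..., a^L_M, a^R_M) of alpha-cut endpoints,
    with alpha_j = (j-1)/(M-1)."""
    a, b, c = tri
    vec = []
    for j in range(M):
        alpha = j / (M - 1)
        vec += [a + alpha * (b - a), c - alpha * (c - b)]
    return vec

# A triangular fuzzy number centered at 0.5 with support [0.25, 0.75], six alpha-levels:
print(alpha_level_vector((0.25, 0.5, 0.75), M=6))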
183
Example 3.3.2 Assume our fuzzy rule base consists of three rules
{(aL11 , aR11 , . . . , aL16 , aR16 ), (bL11 , bR11 , . . . , bL16 , bR16 )}
{(aL21 , aR21 , . . . , aL26 , aR26 ), (bL21 , bR21 , . . . , bL26 , bR26 )}
{(aL31 , aR31 , . . . , aL36 , aR36 ), (bL31 , bR31 , . . . , bL36 , bR36 )}
where
[aL1j , aR1j ] = [bL1j , bR1j ] = [small]^{αj} ,
[aL2j , aR2j ] = [bL2j , bR2j ] = [medium]^{αj} ,
and
[aL3j , aR3j ] = [bL3j , bR3j ] = [big]^{αj} .
Plugging in numerical values we obtain the following training set
{(0, 0.5, 0, 0.44, 0, 0.38, 0, 0.32, 0, 0.26, 0, 0.2), (0, 0.5, 0, 0.44, 0, 0.38, 0, 0.32, 0, 0.26, 0, 0.2)}
{(0.5, 1, 0.56, 1, 0.62, 1, 0.68, 1, 0.74, 1, 0.8, 1), (0.5, 1, 0.56, 1, 0.62, 1, 0.68, 1, 0.74, 1, 0.8, 1)}
{(0.25, 0.75, 0.3, 0.7, 0.35, 0.65, 0.4, 0.6, 0.45, 0.55, 0.5, 0.5),
(0.25, 0.75, 0.3, 0.7, 0.35, 0.65, 0.4, 0.6, 0.45, 0.55, 0.5, 0.5)}.
Exercise 3.3.1 Assume our fuzzy rule base consists of three rules
Exercise 3.3.2 Assume our fuzzy rule base consists of three rules
µbig (u) = 1 − (0.8 − u)/0.3 if 1/2 ≤ u ≤ 0.8, µbig (u) = 1 if 0.8 ≤ u ≤ 1, and 0 otherwise,
µmedium (u) = 1 − 4|u − 1/2| if 0.25 ≤ u ≤ 0.75, and 0 otherwise.
Assume that [0, 1] contains the support of all the fuzzy sets we might have as input and
output for the system. Derive training sets for standard backpropagation network from 10
selected values of α-level sets of fuzzy terms.
186
3.3.1 Implementation of fuzzy rules by regular FNN of Type 2
Ishibuchi, Kwon and Tanaka [88] proposed an approach to the implementation of fuzzy IF-THEN rules by training neural networks on fuzzy training patterns.
Assume we are given the following set of fuzzy rules
{(X1 , B1 ), . . . , (Xm , Bm )}, (3.2)
where Xp = (Ap1 , . . . , Apn ) denotes the antecedent part and the fuzzy target output Bp
is the consequent part of the rule.
Our learning task is to train a neural network from fuzzy training pattern set (3.2) by a
regular fuzzy neural network of Type 2 from Table 3.1.
Ishibuchi, Fujioka and Tanaka [82] propose the following extension of the standard back-
propagation learning algorithm:
Suppose that Ap , the p-th training pattern, is presented to the network. The output of
the i-th hidden unit, oi , is computed as
opi = f ( Σ_{j=1}^{n} wij Apj ),
where f (t) = 1/(1 + exp(−t)) is a unipolar transfer function. It should be noted that the
input-output relation of each unit is defined by the extension principle.
Figure 3.35 Fuzzy input-output relation of each neuron.
The α-level sets of the computed output are written as [Op ]α = [OpL (α), OpR (α)], where OpL (α) denotes the left-hand side and OpR (α) denotes the right-hand side of the α-level sets of the computed output.
Since f is strictly monotone increasing we have
[Op ]α = [f ( Σ_{i=1}^{k} wi opi )]α = [ f ( Σ_{i=1}^{k} [wi opi ]L (α)), f ( Σ_{i=1}^{k} [wi opi ]R (α)) ],
where
[opi ]α = [f ( Σ_{j=1}^{n} wij Apj )]α = [ f ( Σ_{j=1}^{n} [wij Apj ]L (α)), f ( Σ_{j=1}^{n} [wij Apj ]R (α)) ].
Figure 3.36 An α-level set of the target output pattern Bp .
The α-level sets of the target output are written as [Bp ]α = [BpL (α), BpR (α)], where BpL (α) denotes the left-hand side and BpR (α) denotes the right-hand side of the α-level sets of the desired output.
188
A cost function to be minimized is defined for each α-level set as
ep (α) = eLp (α) + eRp (α),
where
eLp (α) = (BpL (α) − OpL (α))² / 2,
eRp (α) = (BpR (α) − OpR (α))² / 2,
i.e. eLp (α) denotes the error between the left-hand sides of the α-level sets of the desired and the computed outputs, and eRp (α) denotes the error between the right-hand sides of the α-level sets of the desired and the computed outputs.
Figure 3.37 An α-level set of the computed output pattern Op .
Then the error function for the p-th training pattern is
ep = Σ_α α ep (α) (3.3)
Theoretically this cost function satisfies the following equation if we use an infinite number of α-level sets in (3.3):
ep → 0 if and only if Op → Bp
From the cost function ep (α) the following learning rules can be derived:
wi := wi − η α ∂ep (α)/∂wi ,
for i = 1, . . . , k, and
wij := wij − η α ∂ep (α)/∂wij ,
for i = 1, . . . , k and j = 1, . . . , n.
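To illustrate how the α-level sets of the computed output can be obtained in practice, here is a small sketch (my own Python, not taken from [88]): the fuzzy inputs are given by their α-cut intervals, the weights are crisp, and the intervals are propagated through the increasing function f exactly as in the equations above. One α-level is processed per call.

import math

def f(t):
    return 1.0 / (1.0 + math.exp(-t))

def scale(w, iv):
    """alpha-cut of w * A for a real weight w and an interval iv = (lo, hi)."""
    lo, hi = w * iv[0], w * iv[1]
    return (min(lo, hi), max(lo, hi))

def unit(weights, input_cuts):
    """[o]^alpha = [f(sum_j w_j A_j)]^alpha for one unit; weights are reals,
    input_cuts are the alpha-cut intervals of the fuzzy inputs."""
    lo = sum(scale(w, iv)[0] for w, iv in zip(weights, input_cuts))
    hi = sum(scale(w, iv)[1] for w, iv in zip(weights, input_cuts))
    return (f(lo), f(hi))        # f is increasing, so it maps the endpoints

def network_output(W_hidden, w_out, input_cuts):
    """Hidden units followed by the output unit, all on the same alpha-level."""
    hidden = [unit(w_row, input_cuts) for w_row in W_hidden]
    return unit(w_out, hidden)

# One alpha-level of a two-input, two-hidden-unit network with fuzzy inputs.
input_cuts = [(0.8, 1.2), (-0.1, 0.1)]   # alpha-cuts of A_p1, A_p2
W_hidden = [[1.0, -2.0], [0.5, 0.5]]
w_out = [1.5, -1.0]
O_L, O_R = network_output(W_hidden, w_out, input_cuts)
print(O_L, O_R)                           # O_p^L(alpha) and O_p^R(alpha)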
The Reader can find the exact calculation of the partial derivatives
189
3.3.2 Implementation of fuzzy rules by regular FNN of Type 3
Following [95] we show how to implement fuzzy IF-THEN rules by regular fuzzy neural
nets of Type 3 (fuzzy input/output signals and fuzzy weights) from Table 3.1.
Assume we are given the following fuzzy rules
{(X1 , B1 ), . . . , (Xm , Bm )}
where Xp = (Ap1 , . . . , Apn ) denotes the antecedent part and the fuzzy target output Bp
is the consequent part of the rule.
The output of the i-th hidden unit, oi , is computed as
opi = f ( Σ_{j=1}^{n} Wij Apj ),
where Apj is a fuzzy input, Wi and Wij are fuzzy weights of triangular form and f (t) =
1/(1 + exp(−t)) is a unipolar transfer function.
The fuzzy output of each unit is numerically calculated for the α-level sets of the fuzzy inputs and weights. Let us denote the α-level set of the computed output Op by
[Op ]α = [OpL (α), OpR (α)],
the α-level set of the target output Bp by
[Bp ]α = [BpL (α), BpR (α)],
the α-level sets of the weights of the output unit by
[Wi ]α = [WiL (α), WiR (α)],
and the α-level sets of the weights of the hidden units by
[Wij ]α = [WijL (α), WijR (α)].
The computed output is obtained from
[Op ]α = [f ( Σ_{i=1}^{k} Wi opi )]α = [ f ( Σ_{i=1}^{k} [Wi opi ]L (α)), f ( Σ_{i=1}^{k} [Wi opi ]R (α)) ],
and a cost function is defined for each α-level set as ep (α) = eLp (α) + eRp (α), where
eLp (α) = (BpL (α) − OpL (α))² / 2, eRp (α) = (BpR (α) − OpR (α))² / 2,
i.e. eLp (α) denotes the error between the left-hand sides of the α-level sets of the desired and the computed outputs, and eRp (α) denotes the error between the right-hand sides of the α-level sets of the desired and the computed outputs.
Then the error function for the p-th training pattern is
ep = Σ_α α ep (α) (3.4)
Let us derive a learning algorithm of the fuzzy neural network from the error function
ep (α). Since the fuzzy weights of the hidden neurons are supposed to be of symmetrical triangular form, they can be represented by three parameters, Wij = (wij^1 , wij^2 , wij^3 ), where wij^1 denotes the lower limit, wij^2 the center and wij^3 the upper limit of Wij .
Figure 3.39 Representation of Wij .
191
Similarly, the weights of the output neuron can be represented by three parameter Wi =
(wi1 , wi2 , wi3 ), where wi1 denotes the lower limit, wi2 denotes the center and wi3 denotes the
upper limit of Wi .
Figure 3.40 Representation of Wi .
The lower and upper limits of the output weights are adjusted by
∆wi1 (t) = −η ∂ep (α)/∂wi1 + β ∆wi1 (t − 1),
∆wi3 (t) = −η ∂ep (α)/∂wi3 + β ∆wi3 (t − 1),
where η is a learning constant, β is a momentum constant and t indexes the number of
adjustments, for i = 1, . . . , k, and
∆wij^1 (t) = −η ∂ep (α)/∂wij^1 + β ∆wij^1 (t − 1),
∆wij^3 (t) = −η ∂ep (α)/∂wij^3 + β ∆wij^3 (t − 1),
where η is a learning constant, β is a momentum constant and t indexes the number of
adjustments, for i = 1, . . . , k and j = 1, . . . , n.
The explicit calculation of above derivatives can be found in ([95], pp. 291-292).
The fuzzy weight Wij = (wij^1 , wij^2 , wij^3 ) is updated by the following rules
wij^1 (t + 1) = wij^1 (t) + ∆wij^1 (t),
wij^3 (t + 1) = wij^3 (t) + ∆wij^3 (t),
wij^2 (t + 1) = ( wij^1 (t + 1) + wij^3 (t + 1) ) / 2,
192
for i = 1, . . . , k and j = 1, . . . , n. The fuzzy weight Wi = (wi1 , wi2 , wi3 ) is updated in a
similar manner, i.e.
wi1 (t + 1) = wi1 (t) + ∆wi1 (t)
wi3 (t + 1) = wi3 (t) + ∆wi3 (t)
wi2 (t + 1) = ( wi1 (t + 1) + wi3 (t + 1) ) / 2,
for i = 1, . . . , k.
After the adjustment of Wi it can occur that its lower limit becomes larger than its upper limit. In this case, we use the following simple heuristics:
wi1 (t + 1) := min{wi1 (t + 1), wi3 (t + 1)}, wi3 (t + 1) := max{wi1 (t + 1), wi3 (t + 1)}.
The network is trained on the fuzzy pattern set
{(X1 , B1 ), . . . , (Xm , Bm )}.
Summary 3.3.1 In this case, the learning algorithm can be summarized as follows:
• Step 1 Fuzzy weights are initialized at small random values, the running error E
is set to 0 and Emax > 0 is chosen
• Step 5 The training cycle is completed. For E < Emax terminate the training
session. If E > Emax then E is set to 0 and we initiate a new training cycle by
going back to Step 2.
193
3.4 Tuning fuzzy control parameters by neural nets
Fuzzy inference is applied to various problems. For the implementation of a fuzzy con-
troller it is necessary to determine membership functions representing the linguistic terms
of the linguistic inference rules. For example, consider the linguistic term approximately
one. Obviously, the corresponding fuzzy set should be a unimodal function reaching its
maximum at the value one. Neither the shape, which could be triangular or Gaussian, nor
the range, i.e. the support of the membership function is uniquely determined by approxi-
mately one. Generally, a control expert has some idea about the range of the membership
function, but he would not be able to argue about small changes of his specified range.
Figure 3.42 Triangular membership function for ”x is approximately one”.
Figure 3.43 Trapezoidal membership function for ”x is approximately one”.
The effectiveness of fuzzy models representing nonlinear input-output relationships depends on the fuzzy partition of the input space. Therefore, the tuning of membership functions becomes an important issue in fuzzy control. Since this tuning task can be viewed as an optimization problem, neural networks and genetic algorithms [96] offer a possibility to solve it.
A straightforward approach is to assume a certain shape for the membership functions
194
which depends on different parameters that can be learned by a neural network. This idea
was carried out in [139] where the membership functions are assumed to be symmetrical
triangular functions depending on two parameters, one of them determining where the
function reaches its maximum, the other giving the width of the support. Gaussian
membership functions were used in [81].
Both approaches require a set of training data in the form of correct input-output tuples
and a specification of the rules including a preliminary definition of the corresponding
membership functions.
We describe a simple method for learning of membership functions of the antecedent and
consequent parts of fuzzy IF-THEN rules.
Suppose the unknown nonlinear mapping to be realized by fuzzy systems can be represented as
y k = f (xk ) = f (xk1 , . . . , xkn ), k = 1, . . . , K, (3.5)
i.e. the mapping is known only at the points of the training set
{(x1 , y 1 ), . . . , (xK , y K )}.
For modeling the unknown mapping in (3.5), we employ simplified fuzzy IF-THEN rules of the following type
<i : if x1 is Ai1 and . . . and xn is Ain then y = zi , (3.6)
i = 1, . . . , m, where Aij are fuzzy numbers of triangular form and zi are real numbers.
In this context, the word simplified means that the individual rule outputs are given by
crisp numbers, and therefore, we can use their weighted sum (where the weights are the
firing strengths of the corresponding rules) to obtain the overall system output.
Let ok be the output from the fuzzy system corresponding to the input xk . Suppose the
firing level of the i-th rule, denoted by αi , is defined by Larsen’s product operator
αi = Π_{j=1}^{n} Aij (xkj )
(one can define other t-norm for modeling the logical connective and), and the output of
the system is computed by the discrete center-of-gravity defuzzification method as
ok = ( Σ_{i=1}^{m} αi zi ) / ( Σ_{i=1}^{m} αi ).
We define the measure of error for the k-th training pattern as usual,
Ek = (ok − y k )² / 2,
where ok is the computed output from the fuzzy system < corresponding to the input
pattern xk and y k is the desired output, k = 1, . . . , K.
195
The steepest descent method is used to learn zi in the consequent part of the fuzzy rule
<i . That is,
zi (t + 1) = zi (t) − η ∂Ek /∂zi = zi (t) − η (ok − y k ) αi /(α1 + · · · + αm ),
for i = 1, . . . , m, where η is the learning constant and t indexes the number of the
adjustments of zi .
Suppose that every linguistic variable in (3.6) can have seven linguistic terms
{N B, N M, N S, ZE, P S, P M, P B}
and their membership function are of triangular form characterized by three parame-
ters (center, left width, right width). Of course, the membership functions representing
the linguistic terms {N B, N M, N S, ZE, P S, P M, P B} can vary from input variable to
input variable, e.g. the linguistic term ”Negative Big” can have maximum n different
representations.
Figure 3.44 Initial linguistic terms for the input variables.
The parameters of triangular fuzzy numbers in the premises are also learned by the
steepest descent method.
We illustrate the above tuning process by a simple example. Consider two fuzzy rules of
the form (3.6) with one input and one output variable
<1 : if x is A1 then y = z1
<2 : if x is A2 then y = z2
where the fuzzy terms A1 ”small” and A2 ”big” have sigmoid membership functions
defined by
A1 (x) = 1/(1 + exp(b1 (x − a1 ))), A2 (x) = 1/(1 + exp(b2 (x − a2 ))),
where a1 , a2 , b1 and b2 are the parameter set for the premises.
Let x be the input to the fuzzy system. The firing levels of the rules are computed by
α1 = A1 (x) = 1/(1 + exp(b1 (x − a1 ))), α2 = A2 (x) = 1/(1 + exp(b2 (x − a2 ))),
and the output of the system is computed by the discrete center-of-gravity defuzzification
method as
o = (α1 z1 + α2 z2 )/(α1 + α2 ) = (A1 (x)z1 + A2 (x)z2 )/(A1 (x) + A2 (x)).
196
Suppose further that we are given a training set
{(x1 , y 1 ), . . . , (xK , y K )}
Our task is to construct the two fuzzy rules with appropriate membership functions and consequent parts to generate the given input-output pairs. That is, we have to learn the parameters a1 , b1 , a2 , b2 , z1 and z2 .
We define the measure of error for the k-th training pattern as usual,
Ek = Ek (a1 , b1 , a2 , b2 , z1 , z2 ) = (ok (a1 , b1 , a2 , b2 , z1 , z2 ) − y k )² / 2,
where ok is the computed output from the fuzzy system corresponding to the input pattern
xk and y k is the desired output, k = 1, . . . , K.
The steepest descent method is used to learn zi in the consequent part of the i-th fuzzy
rule. That is,
z1 (t + 1) = z1 (t) − η ∂Ek /∂z1 = z1 (t) − η (ok − y k ) α1 /(α1 + α2 ) = z1 (t) − η (ok − y k ) A1 (xk )/(A1 (xk ) + A2 (xk )),
z2 (t + 1) = z2 (t) − η ∂Ek /∂z2 = z2 (t) − η (ok − y k ) α2 /(α1 + α2 ) = z2 (t) − η (ok − y k ) A2 (xk )/(A1 (xk ) + A2 (xk )),
197
where η > 0 is the learning constant and t indexes the number of the adjustments of zi .
In a similar manner we can find the shape parameters (center and slope) of the membership
functions A1 and A2 .
a1 (t + 1) = a1 (t) − η ∂Ek /∂a1 , b1 (t + 1) = b1 (t) − η ∂Ek /∂b1 ,
a2 (t + 1) = a2 (t) − η ∂Ek /∂a2 , b2 (t + 1) = b2 (t) − η ∂Ek /∂b2 ,
where η > 0 is the learning constant and t indexes the number of the adjustments of the
parameters. We show now how to compute analytically the partial derivative of the error
function Ek with respect to a1 , the center of the fuzzy number A1 .
∂Ek /∂a1 = (∂/∂a1 ) [ (ok (a1 , b1 , a2 , b2 , z1 , z2 ) − y k )² / 2 ] = (ok − y k ) ∂ok /∂a1 ,
where
∂ok /∂a1 = (∂/∂a1 ) [ (A1 (xk )z1 + A2 (xk )z2 )/(A1 (xk ) + A2 (xk )) ]
= (∂/∂a1 ) [ ( z1 /(1 + exp(b1 (xk − a1 ))) + z2 /(1 + exp(b2 (xk − a2 ))) ) / ( 1/(1 + exp(b1 (xk − a1 ))) + 1/(1 + exp(b2 (xk − a2 ))) ) ]
= (∂/∂a1 ) [ ( z1 [1 + exp(b2 (xk − a2 ))] + z2 [1 + exp(b1 (xk − a1 ))] ) / ( 2 + exp(b1 (xk − a1 )) + exp(b2 (xk − a2 )) ) ]
= ( b1 ε1 [z1 (1 + ε2 ) + z2 (1 + ε1 )] − b1 z2 ε1 (2 + ε1 + ε2 ) ) / (2 + ε1 + ε2 )²
= b1 ε1 (z1 − z2 )(1 + ε2 ) / (2 + ε1 + ε2 )²,
where we used the notations ε1 = exp(b1 (xk − a1 )) and ε2 = exp(b2 (xk − a2 )).
The learning rules are simplified if we use the following fuzzy partition
A1 (x) = 1/(1 + exp(b(x − a))), A2 (x) = 1/(1 + exp(−b(x − a))),
where a and b are the shared parameters of A1 and A2 . In this case the equation
A1 (x) + A2 (x) = 1
holds for all x.
Figure 3.44b Symmetrical membership functions.
The updates become
z1 (t + 1) = z1 (t) − η ∂Ek /∂z1 = z1 (t) − η (ok − y k ) A1 (xk ),
z2 (t + 1) = z2 (t) − η ∂Ek /∂z2 = z2 (t) − η (ok − y k ) A2 (xk ),
a(t + 1) = a(t) − η ∂Ek (a, b)/∂a,
b(t + 1) = b(t) − η ∂Ek (a, b)/∂b,
where
∂Ek (a, b)/∂a = (ok − y k ) ∂ok /∂a = (ok − y k ) ∂/∂a [z1 A1 (xk ) + z2 A2 (xk )]
= (ok − y k ) ∂/∂a [z1 A1 (xk ) + z2 (1 − A1 (xk ))] = (ok − y k )(z1 − z2 ) ∂A1 (xk )/∂a
= (ok − y k )(z1 − z2 ) b A1 (xk )(1 − A1 (xk )) = (ok − y k )(z1 − z2 ) b A1 (xk )A2 (xk ),
and
∂Ek (a, b)/∂b = (ok − y k )(z1 − z2 ) ∂A1 (xk )/∂b = −(ok − y k )(z1 − z2 )(xk − a)A1 (xk )A2 (xk ).
Jang [97] showed that fuzzy inference systems with simplified fuzzy IF-THEN rules are
universal approximators, i.e. they can approximate any continuous function on a compact
set to arbitrary accuracy. It means that the more fuzzy terms (and consequently more
rules) are used in the rule base, the closer is the output of the fuzzy system to the desired
values of the function to be approximated.
A method which can cope with arbitrary membership functions for the input variables is
proposed in [68, 162, 163]. The training data have to be divided into r disjoint clusters
R1 , . . . , Rr . Each cluster Ri corresponds to a control rule Ri . Elements of the clusters are
199
tuples of input-output values of the form (x, y) where x can be a vector x = (x1 , . . . , xn )
of n input variables.
This means that the rules are not specified in terms of linguistic variables, but in the form
of crisp input-output tuples.
A multilayer perceptron with n input units, some hidden layers, and r output units can
be used to learn these clusters. The input data for this learning task are the input vectors
of all clusters, i.e. the set
{x | ∃i ∃y : (x, y) ∈ Ri }.
The target output tui (x) for input x at output unit ui is defined as
tui (x) = 1 if there exists y such that (x, y) ∈ Ri , and tui (x) = 0 otherwise.
After the network has learned its weights, arbitrary values for x can be taken as inputs.
Then the output at output unit ui can be interpreted as the degree to which x matches
the antecedent of rule Ri , i.e. the function
x → ou i
is the membership function for the fuzzy set representing the linguistic term on the left-
hand side of rule Ri .
In case of a Mamdani type fuzzy controller the same technique can be applied to the
output variable, resulting in a neural network which determines the fuzzy sets for the
right-hand sides of the rules.
For Sugeno type fuzzy controller, where each rule yields a crisp output value together
with a number, specifying the matching degree for the antecedent of the rule, another
technique can be applied. For each rule Ri a neural network is trained with the input-
output tuples of the set Ri . Thus these r neural networks determine the crisp output
values for the rules R1 , . . . , Rr .
These neural networks can also be used to eliminate unnecessary input variables in the
input vector x for the rules R1 , . . . , Rr by neglecting one input variable in one of the rules
and comparing the control result with the one, when the variable is not neglected. If the
performance of the controller is not influenced by neglecting input variable xj in rule Ri ,
xj is unnecessary for Ri and can be left out.
ANFIS (Adaptive Neural Fuzzy Inference Systems) [97] is a great example of an archi-
tecture for tuning fuzzy system parameters from input/output pairs of data. The fuzzy
inference process is implemented as a generalized neural network, which is then tuned by
gradient descent techniques. It is capable of tuning antecedent parameters as well as con-
sequent parameters of fuzzy rules which use a softened trapezoidal membership function.
It has been applied to a variety of problems, including chaotic time series prediction and
the IRIS cluster learning problem.
These tuning methods have a weak point, because the convergence of tuning depends on the
initial condition. Ishigami, Fukuda, Shibata and Arai [96] present a hybrid auto-tuning
method of fuzzy inference using genetic algorithms and the generalized delta learning rule,
which guarantees the optimal structure of the fuzzy model.
200
3.5 Fuzzy rule extraction from numerical data
Fuzzy systems and neural networks are widely used for function approximation. When
comparing these two technologies, fuzzy systems are more favorable in that their behavior
can be explained based on fuzzy rules and thus their performance can be adjusted by tun-
ing the rules. But since, in general, knowledge acquisition is difficult and also the universe
of discourse of each input variable needs to be divided into several intervals, applications
of fuzzy systems are restricted to the fields where expert knowledge is available and the
number of input variables is small. To overcome the problem of knowledge acquisition,
several methods for extracting fuzzy rules from numerical data have been developed.
In the previous section we described, how neural networks could be used to optimize
certain parameters of a fuzzy rule base.
We assumed that the fuzzy IF-THEN rules were already specified in linguistic form or
as a crisp clustering of a set of correct input-output tuples.
If we are given a set of crisp input-output tuples we can try to extract fuzzy (control)
rules from this set. This can either be done by fuzzy clustering methods [14] or by using
neural networks.
The input vectors of the input-output tuples can be taken as inputs for a Kohonen self-
organizing map, which can be interpreted in terms of linguistic variables [142]. The
main idea for this interpretation is to refrain from the winner take-all principle after the
weights for the self-organizing map are learned. Thus, instead of a single output unit ui being declared the ’winner’ for a given input vector x, a matching degree µi (x) can be specified for each unit, yielding the degree to which x satisfies the antecedent of the corresponding rule.
Finally, in order to obtain a Sugeno type controller, to each rule (output unit) a crisp
control output value has to be associated. Following the idea of the Sugeno type controller,
we could choose the value
( Σ_{(x,y)∈S} µi (x) y ) / ( Σ_{(x,y)∈S} µi (x) ),
where S is the set of known input-output tuples for the controller and i indexes the rules.
Another way to obtain directly a fuzzy clustering is to apply the modified Kohonen
network proposed in [13].
Kosko uses another approach to generate fuzzy-if-then rules from existing data [107].
Kosko shows that fuzzy sets can be viewed as points in a multidimensional unit hypercube.
This makes it possible to use fuzzy associative memories (FAM) to represent fuzzy rules.
Special adaptive clustering algorithms allow these representations to be learned (AFAM).
In [156] fuzzy rules with variable fuzzy regions (hyperboxes) are extracted for classification problems. This approach is potentially applicable to problems having a high-dimensional input space. But because the overlap of hyperboxes of different classes must be resolved by dynamically expanding, splitting and contracting hyperboxes, the approach is difficult to apply to problems in which several classes overlap.
Abe and Lan [2] suggest a method for extracting fuzzy rules for pattern classification.
The fuzzy rules with variable fuzzy regions were defined by activation hyperboxes which
show the existence region of data for a class and inhibition hyperboxes which inhibit the
existence of the data for that class. These rules were extracted directly from numerical
201
data by recursively resolving overlaps between two classes.
Abe and Lan [3] present a method for extracting fuzzy rules directly from numerical data
for function approximation. Suppose that the unknown function has a one-dimensional
output y and an m-dimensional input vector x. First we divide the universe of discourse [M1 , M2 ] of y into n intervals
[y0 , y1 ], (y1 , y2 ], . . . , (yn−1 , yn ],
where y0 = M1 and yn = M2 . We call the i-th interval the output interval i. Using the
input data whose outputs are in the output interval i, we recursively define the input
region that generates output in the output interval i.
Namely, first we determine activation hyperboxes, which define the input region corre-
sponding to the output interval i, by calculating the minimum and maximum values of
input data for each output interval.
If the activation hyperbox for the output interval i overlaps with the activation hyperbox
for the output interval j, the overlapped region is defined as an inhibition hyperbox.
If the input data for output intervals i or/and j exist in the inhibition hyperbox, within
this inhibition hyperbox, we define one or two additional activation hyperboxes; moreover,
if two activation hyperboxes are defined and they overlap, we further define an additional
inhibition hyperbox: this process is repeated until the overlap is resolved. Fig. 3.45
illustrates this process schematically.
Figure 3.45 Recursive definition of activation and inhibition hyperboxes.
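The first level of this recursion is easy to state in code. The sketch below (my own Python, a simplification rather than Abe and Lan’s actual procedure) computes an activation hyperbox as the componentwise minimum and maximum of the data of one output interval, and an inhibition hyperbox as the overlap of two activation hyperboxes; the recursion would then be repeated inside the inhibition hyperbox as described above.

def activation_hyperbox(points):
    """Smallest axis-parallel box containing the input data of one output interval."""
    dims = len(points[0])
    lows  = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    return lows, highs

def inhibition_hyperbox(box_i, box_j):
    """Overlap of two activation hyperboxes, or None if they do not overlap."""
    lows  = [max(a, b) for a, b in zip(box_i[0], box_j[0])]
    highs = [min(a, b) for a, b in zip(box_i[1], box_j[1])]
    return (lows, highs) if all(l <= h for l, h in zip(lows, highs)) else None

# Two-dimensional input data falling into two output intervals:
data_i = [(0.1, 0.2), (0.4, 0.5), (0.6, 0.3)]
data_j = [(0.5, 0.4), (0.8, 0.9), (0.7, 0.6)]
Aii = activation_hyperbox(data_i)
Ajj = activation_hyperbox(data_j)
print(Aii, Ajj, inhibition_hyperbox(Aii, Ajj))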
For each activation hyperbox, together with its inhibition hyperbox (if generated), a fuzzy rule is defined. Fig. 3.46 shows a fuzzy system
architecture, including a fuzzy inference net which calculates degrees of membership for
output intervals and a defuzzifier.
For an input vector x, degrees of membership for output intervals 1 to n are calculated
in the inference net and then the output y is calculated by defuzzifier using the degrees of
membership as inputs.
Figure 3.46 Fuzzy system architecture: the fuzzy inference net and the defuzzifier.
The fuzzy inference net consists of four layers at most. The inference net is sparsely
connected. Namely, different output intervals have different units for the second to fourth
layers and there is no connection among units of different output intervals.
• The second layer units consist of fuzzy rules which calculate the degrees of mem-
bership for an input vector x.
• The third layer units take the maximum values of inputs from the second layer, which
are the degrees of membership generated by resolving overlaps between two output
intervals. The number of third layer units for the output interval i is determined by
the number of output intervals whose input spaces overlap with that of the output
interval i. Therefore, if there is no overlap between the input space of the output
interval i and that of any other output intervals, the output interval i and that of
any other output intervals, the network for the output interval i is reduced to two
layers.
• The fourth layer unit for the output interval i takes the minimum value among the
maximum values generated by the preceding layer, each of them is associated with
an overlap between two output intervals. Therefore, if the output interval i overlaps
with only one output interval, the network for the output interval i is reduced to
three layers. Calculation of a minimum in the fourth layer resolves overlaps among
more than two output intervals. Thus in the process of generating hyperboxes, we
need to resolve only an overlap between two output intervals at a time.
203
3.6 Neuro-fuzzy classifiers
Conventional approaches of pattern classification involve clustering training samples and
associating clusters to given categories. The complexity and limitations of previous mech-
anisms are largely due to the lacking of an effective way of defining the boundaries among
clusters. This problem becomes more intractable when the number of features used for
classification increases. On the contrary, fuzzy classification assumes the boundary be-
tween two neighboring classes as a continuous, overlapping area within which an object
has partial membership in each class. This viewpoint not only reflects the reality of many
applications in which categories have fuzzy boundaries, but also provides a simple repre-
sentation of the potentially complex partition of the feature space. In brief, we use fuzzy
IF-THEN rules to describe a classifier. Assume that K patterns xp = (xp1 , . . . , xpn ),
p = 1, . . . , K are given from two classes, where xp is an n-dimensional crisp vector.
Typical fuzzy classification rules for n = 2 are like
If xp1 is small and xp2 is very large then xp = (xp1 , xp2 ) belongs to Class C1
If xp1 is large and xp2 is very small then xp = (xp1 , xp2 ) belongs to Class C2
where xp1 and xp2 are the features of pattern (or object) p, small and very large are
linguistic terms characterized by appropriate membership functions. The firing level of a
rule
<i : If xp1 is Ai and xp2 is Bi then xp = (xp1 , xp2 ) belongs to Class Ci
with respect to a given object xp is interpreted as the degree to which xp belongs to Ci . This firing level, denoted by αi , is usually determined as
αi = T (Ai (xp1 ), Bi (xp2 )),
where T is a triangular norm. To generate fuzzy classification rules, each given pattern can be combined with the linguistic term having the highest degree of membership for the respective input feature. If this combination is not identical to the antecedents of an already existing rule then a new rule is created.
already existing rule then a new rule is created.
However, it can occur that if the fuzzy partition is not set up correctly, or if the number
of linguistic terms for the input features is not large enough, then some patterns will be
misclassified.
Figure 3.47 Initial fuzzy partition with 9 fuzzy subspaces and 2 misclassified patterns. Closed and open circles represent the given patterns from Class 1 and Class 2, respectively.
The following 9 rules can be generated from the initial fuzzy partitions shown in Figure
3.47:
<1 : If x1 is small and x2 is big then x = (x1 , x2 ) belongs to Class C1
<2 : If x1 is small and x2 is medium then x = (x1 , x2 ) belongs to Class C1
<3 : If x1 is small and x2 is small then x = (x1 , x2 ) belongs to Class C1
<4 : If x1 is big and x2 is small then x = (x1 , x2 ) belongs to Class C1
<5 : If x1 is big and x2 is big then x = (x1 , x2 ) belongs to Class C1
<6 : If x1 is medium and x2 is small then xp = (x1 , x2 ) belongs to Class C2
<7 : If x1 is medium and x2 is medium then xp = (x1 , x2 ) belongs to Class C2
<8 : If x1 is medium and x2 is big then xp = (x1 , x2 ) belongs to Class C2
<9 : If x1 is big and x2 is medium then xp = (x1 , x2 ) belongs to Class C2
where we have used the linguistic terms small for A1 and B1 , medium for A2 and B2 , and
big for A3 and B3 .
However, the same rate of error can be reached by noticing that if ”x1 is medium” then the
pattern (x1 , x2 ) belongs to Class 2, independently from the value of x2 , i.e. the following
7 rules provides the same classification result
205
<1 : If x1 is small and x2 is big then x = (x1 , x2 ) belongs to Class C1
<2 : If x1 is small and x2 is medium then x = (x1 , x2 ) belongs to Class C1
<3 : If x1 is small and x2 is small then x = (x1 , x2 ) belongs to Class C1
<4 : If x1 is big and x2 is small then x = (x1 , x2 ) belongs to Class C1
<5 : If x1 is big and x2 is big then x = (x1 , x2 ) belongs to Class C1
<6 : If x1 is medium then xp = (x1 , x2 ) belongs to Class C2
<7 : If x1 is big and x2 is medium then xp = (x1 , x2 ) belongs to Class C2
Figure 3.47a is an example of fuzzy partitions (3 linguistic terms for the first input feature
and 5 for the second) which classify correctly the patterns.
As another example, let us consider a two-class classification problem [94]. In Figure 3.48 closed and open rectangles represent the given patterns from Class 1 and Class 2, respectively. If one tries to classify all the given patterns by fuzzy rules based on a simple fuzzy grid, a fine fuzzy partition with 6 × 6 = 36 rules is required.
206
Figure 3.49 Fuzzy partition with 36 fuzzy subspaces.
However, it is easy to see that the patterns from Figure 3.48 may be correctly classified
by the following five fuzzy IF-THEN rules
Sun and Jang [160] propose an adaptive-network-based fuzzy classifier to solve fuzzy
classification problems.
Figure 3.49a demonstrates this classifier architecture with two input variables x1 and x2 .
The training data are categorized by two classes C1 and C2 . Each input is represented by
two linguistic terms, thus we have four rules.
207
Figure 3.49a Architecture of the adaptive-network-based fuzzy classifier.
• Layer 1 The output of the node is the degree to which the given input satisfies the linguistic label associated to this node. Usually, we choose bell-shaped membership functions
Ai (u) = exp( −((u − ai1 )/bi1 )² / 2 ),
Bi (v) = exp( −((v − ai2 )/bi2 )² / 2 ),
to represent the linguistic terms, where
{ai1 , ai2 , bi1 , bi2 }
is the parameter set. As the values of these parameters change, the bell-shaped functions vary accordingly, thus exhibiting various forms of membership functions on the linguistic labels Ai and Bi . In fact, any continuous membership function, such as a trapezoidal or triangular-shaped one, is also a qualified candidate for a node function in this layer. The initial values of the parameters are set in such a way that the membership functions along each axis satisfy ε-completeness, normality and convexity. The parameters are then tuned with a descent-type method.
• Layer 2 Each node generates a signal corresponding to the conjunctive combination of individual degrees of match. The output signal is the firing strength of a fuzzy rule with respect to an object to be categorized.
In most pattern classification and query-retrieval systems, the conjunction operator plays an important role and its interpretation is context-dependent.
Since there is no single operator that is suitable for all applications, we can use parametrized t-norms to cope with this dynamic property of classifier design. For example, we can use Hamacher’s t-norm with parameter γ ≥ 0,
HANDγ (a, b) = ab / (γ + (1 − γ)(a + b − ab)),
208
or Yager’s t-norm with parameter p > 0,
1 − min{1, ((1 − a)^p + (1 − b)^p )^{1/p} }.
All nodes in this layer is labeled by T , because we can choose any t-norm for
modeling the logical and operator. The nodes of this layer are called rule nodes.
Features can be combined in a compensatory way. For instance, we can use the
generalized p-mean proposed by Dyckhoff and Pedrycz:

  \Big(\frac{x^p + y^p}{2}\Big)^{1/p}, \qquad p \ge 1.

• Layers 3 and 4 We take the linear combination of the firing strengths of the rules at
Layer 3 and apply a sigmoidal function at Layer 4 to calculate the degree of belonging
to a certain class.
If we are given a training set

  \{(x^k, y^k),\ k = 1, \ldots, K\},

then the parameters of the hybrid neural net (which determine the shape of the member-
ship functions of the premises) can be learned by descent-type methods. This architecture
and learning procedure is called ANFIS (adaptive-network-based fuzzy inference system)
by Jang [97].
The error function for pattern k can be defined by

  E_k = \frac{1}{2}\Big[(o_1^k - y_1^k)^2 + (o_2^k - y_2^k)^2\Big],

where y^k is the desired output and o^k is the output computed by the hybrid neural net.
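A compact sketch of the forward pass of such a classifier is given below. It is only an illustration, not the implementation of Sun and Jang: the function names (gauss, hamacher_and, classify), the parameter values and the use of a logistic function at the output layer are assumptions made for the example.

    import math

    def gauss(u, a, b):
        # bell-shaped membership function with center a and width b
        return math.exp(-0.5 * ((u - a) / b) ** 2)

    def hamacher_and(a, b, gamma=0.5):
        # parametrized Hamacher t-norm used as the conjunction at the rule nodes
        return a * b / (gamma + (1 - gamma) * (a + b - a * b))

    def classify(x1, x2, params, weights):
        # Layer 1: degrees of match of the inputs with the linguistic labels
        A = [gauss(x1, a, b) for (a, b) in params['A']]   # A1, A2
        B = [gauss(x2, a, b) for (a, b) in params['B']]   # B1, B2
        # Layer 2: firing strength of the four rules (all pairs Ai, Bj)
        alphas = [hamacher_and(ai, bj) for ai in A for bj in B]
        # Layers 3-4: linear combination of the firing strengths, then a sigmoid
        outputs = []
        for w in weights:                                  # one weight vector per class
            s = sum(wi * ai for wi, ai in zip(w, alphas))
            outputs.append(1.0 / (1.0 + math.exp(-s)))
        return outputs                                     # degrees of belonging (o1, o2)

    # illustrative parameters: two linguistic terms per input, two classes
    params = {'A': [(0.25, 0.2), (0.75, 0.2)], 'B': [(0.25, 0.2), (0.75, 0.2)]}
    weights = [[2.0, -1.0, -1.0, 2.0], [-1.0, 2.0, 2.0, -1.0]]
    o1, o2 = classify(0.3, 0.2, params, weights)
    e = 0.5 * ((o1 - 1.0) ** 2 + (o2 - 0.0) ** 2)          # error E_k for target class C1
    print(o1, o2, e)

The error e would then be propagated backwards by a descent-type method to tune the membership function parameters, exactly as described above.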
3.7 FULLINS
Sugeno and Park [158] proposed a framework of learning based on indirect linguistic
instruction, in which the performance of the system to be learned is improved by the
evaluation of rules.
[Figure: the architecture of FULLINS — the Supervisor observes the task performance, makes judgments and gives instructions through dialogue; the instructions are interpreted (Interpretation of Instructions) with the help of the Background Knowledge, Explanation, Dialogue, CER and SRE components, and the Self-Regulating component adjusts the task performance.]
• Interpretation Functional Component
Interprets instructions by the meaning elements using the Background Knowledge
and the Dialogue Components. An instruction is assumed to have some meaning
elements, and a meaning element is associated with values called trends.
We now describe the method of linguistic instructions. A direct instruction is in the form
of entire methods or individual IF-THEN rules. An individual IF-THEN rule is a rule
(basic performance knowledge) to perform a given goal. An entire method is a set of rules
that work together to satisfy a goal. It is difficult for the supervisor to give direct instructions,
because the instructions must be based on the precise structure and components of the rules
needed to reach a given goal. An indirect instruction is not a part of the basic performance knowledge
prepared for the system. An indirect instruction is not given in any specific format, and the
contents of the instruction have macroscopic properties. In FULLINS indirect instructions
are interpreted by meaning elements and their trends. For example, an instructor in a driving school
does not give minute instructions about the steering angle of the wheel, the degree of
stepping on the accelerator, etc. when he teaches a student higher level driving techniques
such as turning around a curve, safe driving, etc. After explaining and demonstrating
some driving techniques, the instructor gives the student a macroscopic indirect linguistic
instruction based on his judgement and evaluation of the student's performance.
For example, when teaching "turning around a curve", the instructor judges the performance of the
student's driving techniques and then gives him an instruction like this:
If you approach near the turning point, turn slowly, turn smoothly, etc.
If an instruction is given, the student interprets it with his internal knowledge through
dialogue with the instructor: Turn around the curve slowly, step on the brake slightly or
step on the accelerator weakly.
Indirect instructions Li have three components: linguistic hedges (LH), atomic words (AW) and
auxiliary phrases (AP).
An indirect instruction in a driving school is:
Li = [If you approach near the turning point, turn slowly, turn smoothly]
Then the following dialogue can take place between the instructor and the student
• SYS: Do you want me to "press the accelerator weaker than before"?
• SV: RIGHT
• SV: RIGHT
The supervisor's instruction is interpreted through two questions, because there exists a
causal relation between the brake and the accelerator in the above assumption:
Li = [press the brake slightly (m1 )] and [press on accelerator slowly (m2 )] and
[turn steering small (m3 )].
Instructions are entered by the supervisor’s input-key in FULLINS and by the instructor’s
voice in the driving school.
Definition 3.7.1 Meaning elements are words or phrases to interpret indirect linguistic
instructions.
• [degree of steering].
Figure 3.51 Three trends of the meaning element m1: ∆m1(+), ∆m1(0) and ∆m1(−).
A set of meaning elements consists of a set of dependent and independent meaning ele-
ments. If the meaning element and its trend, [press on brake slightly] is selected, then
[press on accelerator weakly] is also selected without having any dialogue with the
instructor. The causal net is
∆m1 (+) → ∆m2 (−)
For example, we may have the causal net

  ∆m1(+) → ∆m3(+) → ∆m4(−),  ∆m5(+) → ∆m6(+).

Figure 3.52 Causal net.
Meaning elements are searched through dialogue between the system and the supervisor,
and then Dialogue Meaning Elements Set and the Linguistic Instruction Knowledge Base
are used. Linguistic Instruction Knowledge Base consists of two memory modules: Atomic
Words Memory and Linguistic Hedges Memory module. Atomic Words Memory is a
module in which some atomic words are memorized. Some linguistic hedges are memorized
with each weight in Linguistic Hedges Memory:
[(non, 0), (slightly, 0.2), (rather, 0.4), (more, 0.6), (pretty, 0.8), (very, 1.0)].
LH entered together with AW is matched with each linguistic hedge prepared in Linguistic
Hedges Memory, then the weight allocated on the hedge is selected. The meaning of the
instruction is restricted by LH: the consequent parts of the evaluation rule constructed
by the searched meaning elements are modified by the weight of LH. The interpretation
of linguistic instruction
Li = (LHi )(∆m1 (+))
is the following
The amount of information about a learning object increases as the number of the supervisor's
instructions increases. If a supervisor's instruction is given, Atomic Words Memory is checked
to see whether the AWi of the instruction exists in it. If it does, AWi is interpreted by the
matched meaning element, without searching for the meaning element through dialogue.
The evaluation rule is constructed by a combination of the meaning element and its trend
using Constructing Evaluation Rules. The meaning of the linguistic instruction is restricted
by modifying the consequent part of the evaluation rule by
∆H = WLHi · ∆R
where ∆R is the maximum value for shifting the consequence parameter by LHi . The
figure shows an evaluation rule, where MOD is the modifying value of the parameter for
the consequent part in the basic performance rule. The modifying value by linguistic
hedge [more] is
∆H = 0.6 · ∆R.
The membership functions of the consequent part [fast] are ”zo”, ”nm” and ”nb” and
the ultimate membership functions of the consequent part are ”zo∗ ”, ”nm∗ ” and ”nb∗ ”.
[Figure: the consequent membership functions "zo", "nm", "nb" of [fast] (small, medium, big) and the modified membership functions "zo*", "nm*", "nb*" obtained with the linguistic hedge [more].]

Figure 3.54 Objective line following and figure eight flight systems (points P0, P1, P2, P3).
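The hedge-based modification of a consequent can be illustrated by a small sketch. The hedge weights below are the ones listed in the Linguistic Hedges Memory; representing a consequent fuzzy set by its peak value and the name shift_consequent are assumptions made only for this illustration.

    HEDGE_WEIGHTS = {'non': 0.0, 'slightly': 0.2, 'rather': 0.4,
                     'more': 0.6, 'pretty': 0.8, 'very': 1.0}

    def shift_consequent(peak, hedge, delta_r):
        # Delta_H = W_LH * Delta_R; the peak of the consequent is shifted by Delta_H
        delta_h = HEDGE_WEIGHTS[hedge] * delta_r
        return peak + delta_h

    # e.g. the hedge "more" with Delta_R = 1.0 shifts the peak of "fast" by 0.6
    new_peak = shift_consequent(peak=2.0, hedge='more', delta_r=1.0)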
The performance of the figure eight flight is improved by learning from the supervisor's
instructions. The measure of performance of the flight is given by the following goals: following
the objective line P1 P2, adjusting the diameter of the right turning circle, and making
both diameters small or large simultaneously.
3.8 Applications of fuzzy neural systems
The first applications of fuzzy neural networks to consumer products appeared on the
(Japanese and Korean) market in 1991. Some examples include air conditioners, electric
carpets, electric fans, electric thermo-pots, desk-type electric heaters, forced-flue kerosene
fan heaters, kerosene fan heaters, microwave ovens, refrigerators, rice cookers, vacuum
cleaners, washing machines, clothes driers, photocopying machines, and word processors.
Neural networks are used to design membership functions of fuzzy systems that are em-
ployed as decision-making systems for controlling equipment. Although fuzzy logic can
encode expert knowledge directly using rules with linguistic labels, it usually takes a lot
of time to design and tune the membership functions which quantitatively define these
linguistic labels. Neural network learning techniques can automate this process and sub-
stantially reduce development time and cost while improving performance. The idea
of using neural networks to design membership functions was proposed by Takagi and
Hayashi [162]. This was followed by applications of the gradient descent method to the
tuning of parameters that define the shape and position of membership functions. This
system tuning is equivalent to learning in a feed-forward network. This method has been
widely used to design triangle membership functions, Gaussian membership functions,
sigmoidal membership functions, and bell-shaped membership functions. Simple function
shapes, such as triangles, are used for most actual products. The center and widths of
the membership functions are tuned by a gradient method, reducing the error between
the actual fuzzy system output and the desired output. Figure 3.55 is an example of this
type of neural network usage.
[Figure 3.55: a neural network (used as a development tool) maps temperature and humidity to the exposure lamp control.]
Nikko Securities uses a neural network to improve the rating of convertible bonds [140].
The system learns from the reactions of an expert rating institution, which can change
according to the actual economic situation. It analyzes the results, and then uses them to
give advice. Their system consists of a seven-layer neural network. The neural network’s
internal connections and synaptic weights are initialized using the knowledge in the fuzzy
logic system; it then learns by backpropagation learning and changes its symbolic repre-
sentation accordingly. This representation is then returned to a fuzzy logic representation,
and the system has acquired knowledge. The system can then give advice based on the
knowledge it has acquired. Such a system is called a neural fuzzy system.
[Figure: the knowledge cycle of the system — the symbolic representation (fuzzy logic, mathematical models and algorithms, expert knowledge) is transferred to a structured neural network with a distributed representation, adapted to the actual system, and returned to symbolic form (steps 1-5).]
This seven-layer system had a ratio of correct answers of 96%. A similar learning system
with a conventional three-layered neural network had a ratio of correct answers of 84%,
and its internal representation was difficult to understand. The system learned 40 times
faster than the three-layer system. This comparison is evidence of the effectiveness of
neural fuzzy systems.
Another way to combine fuzzy systems and neural networks is to connect them up serially.
In the Sanyo electric fan [151], the fan must rotate toward the user - which requires
calculating the direction of the remote controller. Three infrared sensors in the fan’s
body detect the strengths of the signal from a remote controller. First, the distance to
the remote is calculated by a fuzzy system. Then, this distance value and the ratios of
sensor outputs are used by a neural network to compute the required direction. The
latter calculation is done by a neural net because neither mathematical models nor fuzzy
reasoning proved good at carrying out this function. The final product has an error of
±4° as opposed to the ±10° error of statistical regression methods [138].
Sanyo uses neural networks for adjusting auto-exposure in their photocopying machine.
Moreover, the toner control of this machine is controlled by fuzzy reasoning. Ricoh Co. has
applied two neural networks to control electrostatic latent image conditions at necessary
potential, as well as a neural network to determine the optimum developing bias voltage from
image density, temperature, humidity, and copy volume [127].
To make the control smoother, more sensitive, and more accurate one has to incorporate
more sensor data. This becomes more complicated as the input space increases in dimen-
sion. In this approach, a neural net handles the larger set of sensor inputs and corrects
the output of a fuzzy system (which was designed earlier for the old set of inputs). A
complete redesigning of the fuzzy system is thus avoided. This leads to substantial sav-
ings in development time (and cost), since redesigning the membership functions, which
becomes more difficult as the number of inputs increases, is obviated.
Figure 3.57 shows the schematic of a Hitachi washing machine. The fuzzy system shown
in the upper part was part of the first model. Later, an improved model incorporated ex-
tra information by using a correcting neural net, as shown. The additional input (fed only
to the net) is electrical conductivity, which is used to measure the opacity/transparency
of the water. Toshiba produces washing machines which have a similar control system
[136]. Sanyo uses a similar approach in its washing machine, although some of the in-
puts/outputs are different.
[Figure 3.57: the fuzzy system maps the clothes quality input to washing, rinsing and spinning times; the correcting neural net additionally receives the electrical conductivity input.]
Bibliography
[1] S. Abe and M.-S. Lan, A classifier using fuzzy rules extracted directly from
numerical data, in: Proceedings of IEEE Internat. Conf. on Fuzzy Systems, San
Francisco,1993 1191-1198.
[2] S. Abe and M.-S. Lan, Fuzzy rules extraction directly from numerical data for
function approximation, IEEE Trans. Syst., Man, and Cybernetics, 25(1995)
119-129.
[3] S. Abe and M.-S. Lan, A method for fuzzy rule extraction directly from nu-
merical data and its application to pattern classification, IEEE Transactions on
Fuzzy Systems, 3(1995) 18-28.
[4] F. Aminzadeh and M. Jamshidi eds., Fuzzy Sets, Neural Networks, and Dis-
tributed Artificial Intelligence (Prentice-Hall, Englewood Cliffs, 1994).
[5] P.E. An, S. Aslam-Mir, M. Brown, and C.J. Harris, A reinforcement learning
approach to on-line optimal control, in: Proc. of IEEE International Conference
on Neural Networks, Orlando, Fl, 1994 2465–2471.
[6] K. Asakawa and H. Takagi, Neural Networks in Japan Communications of ACM,
37(1994) 106-112.
[7] K. Asai, M. Sugeno and T. Terano, Applied Fuzzy Systems (Academic Press,
New York, 1994).
[8] A. Bastian, Handling the nonlinearity of a fuzzy logic controller at the transition
between rules, Fuzzy Sets and Systems, 71(1995) 369-387.
[9] H.R. Berenji, A reinforcement learning-based architecture for fuzzy logic control,
Int. Journal Approximate Reasoning, 6(1992) 267-292.
[10] H.R. Berenji and P. Khedkar, Learning and tuning fuzzy logic controllers
through reinforcements, IEEE Transactions on Neural Networks, 3(1992) 724-
740.
[11] H.R. Berenji, R.N. Lea, Y. Jani, P. Khedkar, A.Malkani and J. Hoblit, Space
shuttle attitude control by reinforcement learning and fuzzy logic, in: Proc.
IEEE Internat. Conf. on Fuzzy Systems, San Francisco,1993 1396-1401.
[12] H.R. Berenji, Fuzzy systems that can learn, in: J.M. Zurada, R.J. Marks and
C.J. Robinson eds., Computational Intelligence: Imitating Life (IEEE Press,
New York, 1994) 23-30.
[13] J.C. Bezdek, E.C. Tsao and N.K. Pal, Fuzzy Kohonen clustering networks, in:
Proc. IEEE Int. Conference on Fuzzy Systems 1992, San Diego, 1992 1035–1043.
[14] J.C. Bezdek and S.K. Pal eds., Fuzzy Models for Pattern Recognition (IEEE
Press, New York, 1992).
[15] S.A. Billings, H.B. Jamaluddin, and S. Chen. Properties of neural networks with
application to modelling nonlinear systems, Int. J. Control, 55(1992)193–224.
[16] A. Blanco, M. Delgado and I. Requena, Improved fuzzy neural networks for
solving relational equations, Fuzzy Sets and Systems, 72(1995) 311-322.
[17] M. Brown and C.J. Harris, A nonlinear adaptive controller: A comparison be-
tween fuzzy logic control and neurocontrol. IMA J. Math. Control and Info.,
8(1991) 239–265.
[18] M. Brown and C. Harris, Neurofuzzy Adaptive Modeling and Control (Prentice-
Hall, Englewood Cliffs, 1994).
[19] J.J. Buckley, Theory of the fuzzy controller: An introduction, Fuzzy Sets and
Systems, 51(1992) 249-258.
[20] J.J. Buckley and Y. Hayashi, Fuzzy neural nets and applications, Fuzzy Systems
and AI, 1(1992) 11-41.
[21] J.J. Buckley, Approximations between nets, controllers, expert systems and
processes, in: Proceedings of 2nd Internat. Conf. on Fuzzy Logic and Neural
Networks, Iizuka, Japan, 1992 89-90.
[22] J.J. Buckley, Y. Hayashi and E. Czogala, On the equivalence of neural nets and
fuzzy expert systems, Fuzzy Sets and Systems, 53(1993) 129-134.
[23] J.J.Buckley, Sugeno type controllers are universal controllers, Fuzzy Sets and
Systems, 53(1993) 299-304.
[24] J.J. Buckley and Y. Hayashi, Numerical relationships between neural networks,
continuous functions, and fuzzy systems, Fuzzy Sets and Systems, 60(1993) 1-8.
[25] J.J. Buckley and Y. Hayashi, Hybrid neural nets can be fuzzy controllers and
fuzzy expert systems, Fuzzy Sets and Systems, 60(1993) 135-142.
[26] J.J. Buckley and E. Czogala, Fuzzy models, fuzzy controllers and neural nets,
Arch. Theoret. Appl. Comput. Sci., 5(1993) 149-165.
[27] J.J. Buckley and Y. Hayashi, Can fuzzy neural nets approximate continuous
fuzzy functions? Fuzzy Sets and Systems, 61(1993) 43-51.
[28] J.J .Buckley and Y. Hayashi, Fuzzy neural networks, in: L.A. Zadeh and
R.R. Yager eds., Fuzzy Sets, Neural Networks and Soft Computing (Van Nos-
trand Reinhold, New York, 1994) 233-249.
[29] J.J .Buckley and Y. Hayashi, Fuzzy neural networks: A survey, Fuzzy Sets and
Systems, 66(1994) 1-13.
[30] J.J .Buckley and Y. Hayashi, Neural nets for fuzzy systems, Fuzzy Sets and
Systems, 71(1995) 265-276.
[31] G.A. Carpenter et al., Fuzzy ARTMAP: A neural network architecture for incre-
mental supervised learning of analog multidimensional maps, IEEE Transac-
tions on Neural Networks, 3(1992) 698-713.
[32] S. Chen, S.A. Billings, and P.M. Grant, Recursive hybrid algorithm for non-
linear system identification using radial basis function networks, Int. J. Control,
55(1992) 1051–1070.
[33] F.C. Chen and M.H. Lin, On the learning and convergence of radial basis net-
works, in: Proc. IEEE Int. Conf. Neural Networks, San Francisco, 1993 983–988.
[34] E. Cox, Adaptive fuzzy systems, IEEE Spectrum, , February 1993, 27–31.
[35] E. Cox, The Fuzzy system Handbook. A Practitioner’s Guide to Building, Using,
and Maintaining Fuzzy Systems (Academic Press, New York, 1994).
[36] D. Dumitrescu, Fuzzy training procedures I, Fuzzy Sets and Systems, 56(1993)
155-169.
[37] P. Eklund, H. Virtanen and T. Riissanen, On the fuzzy logic nature of neural
nets, in: Proceedings of Neuro-Nimes, 1991 293–300.
[38] P. Eklund and F. Klawonn, A Formal Framework for Fuzzy Logic Based Diagno-
sis, in: R.Lowen and M.Roubens eds., Proccedings of the Fourth IFSA Congress,
vol. Mathematics, Brussels, 1991, 58-61.
[40] P. Eklund, F. Klawonn, and D. Nauck, Distributing errors in neural fuzzy con-
trol. in: Proc. 2nd Internat Conf. on Fuzzy Logic and Neural Networks, Iizuka,
Japan, 1992 1139–1142.
[41] P. Eklund and F. Klawonn, Neural fuzzy logic programming, IEEE transactions
on Neural Networks 3(1992) 815-818.
[42] P. Eklund, Neural Logic: A Basis for Second Generation Fuzzy Controllers, in:
U.Höhle and E.P.Klement eds., Proceedings of 14th Linz Seminar on Fuzzy Set
Theory, Johannes Kepler Universität, 1992 19-23.
[44] P. Eklund, J. Forsström, A. Holm, M.. Nyström, and G. Selén, Rule generation
as an alternative to knowledge acquisition: A systems architecture for medical
informatics, Fuzzy Sets and Systems, 66(1994) 195-205.
[45] P. Eklund, Network size versus preprocessing, in: R.R. Yager and L.A. Zadeh
eds., Fuzzy Sets, Neural Networks and Soft Computing (Van Nostrand, New
York, 1994) 250-264.
[46] P. Eklund, A generic system for developing medical decision support, Fuzzy
Systems A.I. Rep. Letters, 3(1994) 71-78.
[48] A.O. Esogbue, A fuzzy adaptive controller using reinforcement learning neural
networks, in: Proc. IEEE Internat. Conf. on Fuzzy Systems, San Francisco,1993
178–183.
[52] S. Gallant, Neural Network Learning and Expert Systems, MIT Press, Cam-
bridge, Mass., USA, 1993
[53] A. Geyer-Schulz, Fuzzy rule based Expert Systems and Genetic Learning
(Physica-Verlag, Berlin, 1995).
[54] S. Giove, M. Nordio and A. Zorat, An Adaptive Fuzzy Control for Automatic
Dialysis, in: E.P. Klement and W. Slany eds., Fuzzy Logic in Artificial Intelli-
gence, (Springer-Verlag, Berlin 1993) 146-156.
[55] P.Y. Glorennec, Learning algorithms for neuro-fuzzy networks, in: A. Kandel
and G. Langholz eds., Fuzzy Control Systems (CRC Press, New York, 1994)
4-18.
[56] A. Gonzalez, R. Perez and J.L. Verdegay, Learning the structure of a fuzzy rule:
A genetic approach, Fuzzy Systems A.I.Rep. Letters, 3(1994) 57-70.
[57] S. Goonatilake and S. Khebbal eds., Intelligent Hybrid Systems, John Wiley and
Sons, New York 1995.
[58] M.M. Gupta and J. Qi, On fuzzy neuron models, in: Proceedings of International
Joint Conference on Neural Networks, Seattle, 1991 431-436.
[59] M.M. Gupta and J. Qi, On fuzzy neuron models, in: L.A. Zadeh and J. Kacprzyk
eds., Fuzzy Logic for the Management of Uncertainty (J. Wiley, New York, 1992)
479-491.
[60] M.M. Gupta, Fuzzy logic and neural networks, Proc. 2nd Internat. Conf. on
Fuzzy logic and Neural Networks, Iizuka, Japan, 1992 157-160.
[61] M.M. Gupta and M.B. Gorzalczany, Fuzzy neuro-computation technique and its
application to modeling and control, in: Proc. IEEE Internat. Conf on Fuzzy
Systems, San Diego, 1992 1271-1274.
[62] M.M. Gupta and D.H. Rao, On the principles of fuzzy neural networks, Fuzzy
Sets and Systems, 59(1993) 271-279.
[63] S.K. Halgamuge and M. Glesner, Neural networks in designing fuzzy systems
for real world applications, Fuzzy Sets and Systems, 65(1994) 1-12.
[64] C.J. Harris, C.G. Moore, and M. Brown, Intelligent control, aspects of fuzzy
logic and neural networks (World Scientific Press, 1993).
[65] C.J. Harris ed., Advances in Intelligent Control (Taylor and Francis, London,
1994).
[67] Y. Hayashi, J.J. Buckley and E. Czogala, Fuzzy neural controller, in: Proc.
IEEE Internat. Conf on Fuzzy Systems, San Diego, 1992 197-202.
[69] Y. Hayashi, Neural expert system using fuzzy teaching input, in: Proc. IEEE
Internat. Conf on Fuzzy Systems, San Diego, 1992 485-491.
[70] Y. Hayashi, J.J. Buckley and E. Czogala, Fuzzy neural network with fuzzy
signals and weight, International Journal of Intelligent Systems, 8(1992) 527-
537.
[71] Y. Hayashi, J.J. Buckley and E. Czogala, Direct fuzzification of neural network
and fuzzified delta rule, Proc. 2nd Internat. Conf. on Fuzzy logic and Neural
Networks, Iizuka, Japan, 1992 73-76.
[72] Y. Hayashi and J.J. Buckley, Direct fuzzification of neural networks, in: Pro-
ceedings of 1st Asian Fuzzy Systems Symposium, Singapore, 1993 560-567.
[73] Y. Hayashi and J.J. Buckley, Approximations between fuzzy expert systems and
neural networks, International Journal of Approximate Reasoning, 10(1994) 63-
73.
[75] K. Hirota and W. Pedrycz, OR/AND neuron in modeling fuzzy set connectives,
IEEE Transactions on Fuzzy Systems, 2(1994) 151-161.
[76] K. Hirota and W. Pedrycz, Fuzzy modelling environment for designing fuzzy
controllers, Fuzzy Sets and Systems, 70(1995) 287-301.
[77] Hitachi, Neuro and fuzzy logic automatic washing machine and fuzzy logic drier,
Hitachi News Rel., No. 91-024 (Feb. 26, 1991). Hitachi, 1991 (in Japanese).
[80] K.J. Hunt, D. Sbarbaro-Hofer, R. Zbikowski and P.J. Gawthrop, Neural net-
works for control systems - a survey, Automatica, 28(1992) 1083–1112.
[81] H. Ichihashi, Iterative fuzzy modelling and a hierarchical network, in: R.Lowen
and M.Roubens eds., Proceedings of the Fourth IFSA Congress, Vol. Engineer-
ing, Brussels, 1991 49-52.
[85] H. Ishibuchi, K. Nozaki and H. Tanaka, Efficient fuzzy partition of pattern space
for classification problems, Fuzzy Sets and Systems, 59(1993) 295-304.
[86] H. Ishibuchi, R. Fujioka and H. Tanaka, Neural networks that learn from fuzzy
IF-THEN rules, IEEE Transactions on Fuzzy Systems, 1(1993) 85-97.
[87] H. Ishibuchi, H. Okada and H. Tanaka, Fuzzy neural networks with fuzzy weights
and fuzzy biases, in: Proc. IEEE Internat. Conference on Neural Networks, San
Francisco, 1993 447-452.
[89] H. Ishibuchi, K. Kwon and H. Tanaka, Learning of fuzzy neural networks from
fuzzy inputs and fuzzy targets, in: Proc. 5th IFSA World Congress, Seoul,
Korea, 1993 147-150.
[90] H. Ishibuchi, K. Nozaki and H. Tanaka, Empirical study on learning in fuzzy
systems, in: Proc. 2nd IEEE Internat. Conference on Fuzzy Systems, San Fran-
cisco, 1993 606-611.
[97] J.-S. Roger Jang, ANFIS: Adaptive-network-based fuzzy inference system, IEEE
Trans. Syst., Man, and Cybernetics, 23(1993) 665-685.
[98] J.M. Keller and D. Hunt, Incorporating fuzzy membership functions into the
perceptron algorithm, IEEE Transactions on Pattern. Anal. Mach. Intell.,
7(1985) 693-699.
[99] J.M. Keller, R.R. Yager and H.Tahani, Neural network implementation of fuzzy
logic, Fuzzy Sets and Systems, 45(1992) 1-12.
[100] J.M. Keller and H.Tahani, Backpropagation neural networks for fuzzy logic,
Information Sciences, 6(1992) 205-221.
[101] J.M. Keller and H.Tahani, Implementation of conjunctive and disjunctive fuzzy
logic rules with neural networks, International Journal of Approximate Reason-
ing, 6(1992) 221-240.
[102] J.M. Keller, R. Krishnapuram, Z.H. Chen and O. Nasraoui, Fuzzy additive
hybrid operators for network-based decision making, International Journal of
Intelligent Systems 9(1994) 1001-1023.
[103] E. Khan and P. Venkatapuram, Neufuz: Neural network based fuzzy logic design
algorithms, in: Proceedings of IEEE International Conf. on Fuzzy Systems, San
Francisco, 1993 647–654.
[104] P.S. Khedkar, Learning as adaptive interpolation in neural fuzzy systems, in:
J.M. Zurada, R.J. Marks and C.J. Robinson eds., Computational Intelligence:
Imitating Life (IEEE Press, New York, 1994) 31-42.
[105] Y.S. Kim and S. Mitra, An adaptive integrated fuzzy clustering model for pat-
tern recognition, Fuzzy Sets and Systems, 65(1994) 297-310.
[106] S.G. Kong and B. Kosko, Adaptive fuzzy systems for backing up a truck-and-
trailer, IEEE Transactions on Neural Networks, 3(1992) 211-223.
[107] B. Kosko, Neural Networks and Fuzzy Systems (Prentice-Hall, Englewood Cliffs,
1992).
[109] R. Kruse, J. Gebhardt and R. Palm eds., Fuzzy Systems in Computer Science
(Vieweg, Braunschweig, 1994).
[111] H.K. Kwan and Y.Cai, A fuzzy neural network and its application to pattern
recognition, IEEE Transactions on Fuzzy Systems, 3(1994) 185-193.
[112] S.C. Lee and E.T. Lee, Fuzzy sets and neural networks, Journal of Cybernetics
4(1974) 83-103.
[113] S.C. Lee and E.T. Lee, Fuzzy neural networks, Math. Biosci. 23(1975) 151-177.
[114] H.-M. Lee and W.-T. Wang, A neural network architecture for classification of
fuzzy inputs, Fuzzy Sets and Systems, 63(1994) 159-173.
[115] M. Lee, S.Y. Lee and C.H. Park, Neuro-fuzzy identifiers and controllers, J. of
Intelligent Fuzzy Systems, 6(1994) 1-14.
[116] K.-M. Lee, D.-H. Kwang and H.L. Wang, A fuzzy neural network model for
fuzzy inference and rule tuning,International Journal of Uncertainty, Fuzziness
and Knowledge-Based Systems, 3(1994) 265-277.
[117] C.T. Lin and C.S.G. Lee, Neural-network-based fuzzy logic control and decision
system, IEEE Transactions on Computers, 40(1991) 1320-1336.
[118] Y. Lin and G.A. Cunningham III, A new approach to fuzzy-neural system mod-
eling, IEEE Transactions on Fuzzy systems, 3(1995) 190-198.
[119] C.T. Lin and Y.C. Lu, A neural fuzzy system with linguistic teaching signals,
IEEE Transactions on Fuzzy Systems, 3(1995) 169-189.
[120] R.J. Machado and A.F. Rocha, A hybrid architecture for fuzzy connectionist
expert systems, in: A. Kandel and G. Langholz eds., Hybrid Architectures for
Intelligent Systems (CRC Press, Boca Raton, FL, 1992).
[121] R.A. Marques Pereira, L. Mich and L. Gaio, Curve reconstruction with dynam-
ical fuzzy grading and weakly continuous constraints, in: Proceedings of the 2nd
Workshop on Current Issues in Fuzzy Technologies, Trento, June 1992, (Dipar-
timento di Informatica e Studi Aziendali, Universitá di Trento 1993) 77-85.
[122] L. Medsker, Hybrid Neural Network and Expert Systems (Kluwer Academic Pub-
lishers, Boston, 1994).
[123] S.Mitra and S.K.Pal, Neuro-fuzzy expert systems: overview with a case study,
in: S.Tzafestas and A.N. Venetsanopoulos eds., Fuzzy Reasoning in Information,
Decision and Control Systems ( Kluwer, Dordrecht, 1994) 121-143.
[124] S.Mitra and S.K.Pal, Self-organizing neural network as a fuzzy classifier, IEEE
Trans. Syst., Man, and Cybernetics, 24(1994) 385-399.
[125] S.Mitra and S.K.Pal, Fuzzy multi-layer perceptron, inferencing and rule gener-
ation, IEEE Transactions on Neural Networks, 6(1995) 51-63.
[126] S.Mitra, Fuzzy MLP based expert system for medical diagnosis, Fuzzy sets and
Systems, 65(1994) 285-296.
[127] T. Morita, M. Kanaya and T. Inagaki, Photo-copier image density control using
neural network and fuzzy theory. in: Proceedings of the Second International
Workshop on Industrial Fuzzy Control and Intelligent Systems, 1992 10-16.
[128] D. Nauck, F. Klawonn and R. Kruse, Fuzzy sets, fuzzy controllers and neu-
ral networks,Wissenschaftliche Zeitschrift der Humboldt-Universität zu Berlin,
reihe Medizin, 41(1992) 99-120.
[129] D. Nauck and R. Kruse, A fuzzy neural network learning fuzzy control rules and
membership functions by fuzzy error backpropagation, in: Proceedings of IEEE
Int. Conference on Neural Networks, San Francisco, 1993 1022-1027.
[130] D. Nauck, F. Klawonn and R. Kruse, Combining neural networks and fuzzy
controllers, in: E.P. Klement and W. Slany eds., Fuzzy Logic in Artificial Intel-
ligence, (Springer-Verlag, Berlin, 1993) 35-46.
[131] D. Nauck and R. Kruse, NEFCON-I: An X-Window based simulator for neural
fuzzy controllers, in: Proceedings of IEEE Int. Conference on Neural Networks,
Orlando, 1994 1638-1643.
[132] D. Nauck, Fuzzy neuro systems: An overview, in: R. Kruse, J. Gebhardt and
R. Palm eds., Fuzzy systems in Computer Science (Vieweg, Wiesbaden, 1994)
91-107.
[133] D. Nauck, Building neural fuzzy controllers with NEFCON-I. in: R. Kruse,
J. Gebhardt and R. Palm eds., Fuzzy systems in Computer Science (Vieweg,
Wiesbaden, 1994) 141-151.
[134] D. Nauck, F. Klawonn and R. Kruse, Neurale Netze und Fuzzy-Systeme (Vieweg,
Wiesbaden, 1994).
[135] D. Nauck and R. Kruse, NEFCLASS - A neuro-fuzzy approach for the classifi-
cation of data, in: K.M. George et al eds., Applied Computing, Proceedings of
the 1995 ACM Symposium on Applied Computing, Nashville, February 26-28,
1995, ACM Press, 1995.
[137] J. Nie and D. Linkens, Fuzzy Neural Control - Principles, Algorithms and Ap-
plications (Prentice-Hall, Englewood Cliffs, 1994).
[138] Nikkei Electronics, New trend in consumer electronics: Combining neural net-
works and fuzzy logic, Nikkei Elec., 528(1991) 165-169 (In Japanese).
[141] S.K.Pal and S.Mitra, Fuzzy versions of Kohonen’s net and MLP-based classi-
fication: Performance evaluation for certain nonconvex decision regions, Infor-
mation Sciences, 76(1994) 297-337.
[143] W. Pedrycz, Fuzzy Control and Fuzzy Systems (Wiley, New York, 1993).
[144] W. Pedrycz, Fuzzy Sets Engineering (CRC Press, Boca Raton, 1995).
[145] C. Posey, A.Kandel and G. Langholz, Fuzzy hybrid systems, in: A. Kandel and
G. Langholz eds., Hybrid architectures for Intelligent Systems (CRC Press, Boca
Raton, Florida, 1992) 174-196.
[146] G.V.S. Rajau and J Zhou, Adaptive hierarchical fuzzy controller, IEEE Trans.
Syst., Man, and Cybernetics, 23(1993) 973-980.
[147] A.L. Ralescu ed., Fuzzy Logic in Artificial Intelligence, Proc. IJCAI’93 Work-
shop, Chambéry, France, Lecture Notes in Artificial Intelligence, Vol. 847
(Springer, Berlin, 1994).
[148] J. Rasmussen, Diagnostic reasoning in action, IEEE Trans. Syst., Man, and
Cybernetics, 23(1993) 981-992.
[149] I. Requena and M. Delgado, R-FN: A model of fuzzy neuron, in: Proc. 2nd Int.
Conf. on Fuzzy Logic & Neural Networks, Iizuka, Japan, 1992 793-796.
[150] T. Riissanen, An Experiment with Clustering, Proceedings MEPP92, Interna-
tional Seminar on Fuzzy Control through Neural Interpretations of Fuzzy Sets,
Mariehamn, Åland, June 15-19, 1992, Åbo Akademi tryckeri, Åbo, 1992, 57-65.
[151] Sanyo, Electric fan series in 1991, Sanyo News Rel., (March 14, 1991). Sanyo,
1991 (In Japanese).
[152] E. Sanchez, Fuzzy logic knowledge systems and artificial neural networks in
medicine and biology, in: R.R. Yager and L.A. Zadeh eds., An Introduction to
Fuzzy Logic Applications in Intelligent Systems (Kluwer, Boston, 1992) 235-251.
[153] J.D. Schaffer, Combinations of genetic algorithms with neural networks or fuzzy
systems, in: J.M. Zurada, R.J. Marks and C.J. Robinson eds., Computational
Intelligence: Imitating Life (IEEE Press, New York, 1994) 371-382.
[154] R. Serra and G. Zanarini, Complex Systems and Cognitive Processes (Springer
Verlag, Berlin, 1990).
[155] J.J. Shann and H.C. Fu, A fuzzy neural network for rule acquiring on fuzzy
control system, Fuzzy Sets and Systems, 71(1995) 345-357.
[158] M. Sugeno and G.-K. Park, An approach to linguistic instruction based learning,
International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems,
1(1993) 19-56.
[160] C.-T. Sun and J.-S. Jang, A neuro-fuzzy classifier and its applications, in: Proc.
IEEE Int. Conference on Neural Networks, San Francisco, 1993 94–98.
[161] H. Takagi, Fusion technology of fuzzy theory and neural networks - survey and
future directions, in: Proc. First Int. Conf. on Fuzzy Logic & Neural Networks,
1990 13–26.
[164] I.B.Turksen, Fuzzy expert systems for IE/OR/MS, Fuzzy Sets and Systems,
51(1992) 1-27.
[165] K. Uehara and M. Fujise, Learning of fuzzy inference criteria with artificial
neural network, in: Proc. 1st Int. Conf. on Fuzzy Logic & Neural Networks,
Iizuka, Japan, 1990 193-198.
[167] H. Virtanen, Combining and incrementing fuzzy evidence - Heuristic and for-
mal approaches to fuzzy logic programming, in: R.Lowen and M.Roubens eds.,
Proceedings of the Fourth IFSA Congress, vol. Mathematics, Brussels, 1991 200-
203.
[168] L.-X. Wang and J.M. Mendel, Generating fuzzy rules by learning from examples,
IEEE Trans. Syst., Man, and Cybernetics, 22(1992) 1414-1427.
[170] P.J. Werbos, Neurocontrol and fuzzy logic: connections and designs, Interna-
tional Journal of Approximate Reasoning, 6(1992) 185-219.
[171] R.R. Yager, Using fuzzy logic to build neural networks, in: R.Lowen and
M.Roubens eds., Proceedings of the Fourth IFSA Congress, Vol. Artificial In-
telligence, Brussels, 1991 210-213.
[172] R.R. Yager, Implementing fuzzy logic controllers using a neural network frame-
work, Fuzzy Sets and Systems, 48(1992) 53-64.
[173] R.R. Yager and L.A. Zadeh eds., Fuzzy Sets, Neural Networks, and Soft Com-
puting (Van Nostrand Reinhold, New York, 1994).
[174] T. Yamakawa, A neo fuzzy neuron and its applications to system identifica-
tion and prediction of chaotic behaviour, in: J.M. Zurada, R.J. Marks and
C.J. Robinson eds., Computational Intelligence: Imitating Life (IEEE Press,
New York, 1994) 383-395.
[175] J. Yan, M. Ryan and J. Power, Using Fuzzy Logic - Towards Intelligent Systems
(Prentice-Hall, Englewood Cliffs, 1994).
[176] Y. Yam and K.S. Leung eds., Future Directions of Fuzzy Theory and Systems
(World Scientific, Singapore, 1994).
Chapter 4
Appendix
4.1 Case study: A portfolio problem
Consider a simple fuzzy rule base of three rules, where y is the portfolio value and the
linguistic variables x1, x2 and x3 denote the exchange rates between USD and DEM,
USD and SEK, and USD and FIM, respectively.
The rules should be interpreted as:
ℜ1: If the US dollar is weak against the German mark and the US dollar is strong against
the Swedish crown and the US dollar is weak against the Finnish mark then our
portfolio value is positive.
ℜ2: If the US dollar is medium against the German mark and the US dollar is medium
against the Swedish crown and the US dollar is medium against the Finnish mark
then our portfolio value is about zero.
ℜ3: If the US dollar is strong against the German mark and the US dollar is strong against
the Swedish crown and the US dollar is strong against the Finnish mark then our
portfolio value is negative.
Choose triangular membership functions for the primary fuzzy sets {Li, Mi, Hi}, i = 1, 2, 3,
take the actual daily exchange rates, a1, a2 and a3, from newspapers and evaluate the
daily portfolio value by Sugeno's reasoning mechanism, i.e.
• The individual rule outputs are derived from the relationships
y1 = 200a1 + 100a2 + 100a3
y2 = 200a1 − 100a2 + 100a3
y3 = 200a1 − 100a2 − 100a3
• The overall system output is expressed as

  y_0 = \frac{\alpha_1 y_1 + \alpha_2 y_2 + \alpha_3 y_3}{\alpha_1 + \alpha_2 + \alpha_3}.
[Figure: Sugeno's reasoning mechanism — the firing levels α1, α2, α3 are the minima of the memberships of (a1, a2, a3) in (L1, H2, L3), (M1, M2, M3) and (H1, H2, H3), and the corresponding rule outputs are y1, y2, y3.]
The fuzzy set L3 describing that "USD/FIM is low" can be given by the following membership
function

  L_3(t) = \begin{cases} 1 & \text{if } t \le 3.5 \\ 1 - 2(t - 3.5) & \text{if } 3.5 \le t \le 4 \\ 0 & \text{if } t \ge 4 \end{cases}

The fuzzy set M3 describing that "USD/FIM is medium" can be given by the following
membership function

  M_3(t) = \begin{cases} 1 - 2|t - 4| & \text{if } 3.5 \le t \le 4.5 \\ 0 & \text{otherwise} \end{cases}

The fuzzy set H3 describing that "USD/FIM is high" can be given by the following
membership function

  H_3(t) = \begin{cases} 1 - 2(4.5 - t) & \text{if } 4 \le t \le 4.5 \\ 1 & \text{if } t \ge 4.5 \\ 0 & \text{if } t \le 4 \end{cases}

Figure 4.2 Membership functions for "x3 is low", "x3 is medium" and "x3 is high".
The fuzzy set L2 describing that "USD/SEK is low" can be given by the following membership
function

  L_2(t) = \begin{cases} 1 & \text{if } t \le 6.5 \\ 1 - 2(t - 6.5) & \text{if } 6.5 \le t \le 7 \\ 0 & \text{if } t \ge 7 \end{cases}

The fuzzy set M2 describing that "USD/SEK is medium" can be given by the following
membership function

  M_2(t) = \begin{cases} 1 - 2|t - 7| & \text{if } 6.5 \le t \le 7.5 \\ 0 & \text{otherwise} \end{cases}

The fuzzy set H2 describing that "USD/SEK is high" can be given by the following
membership function

  H_2(t) = \begin{cases} 1 - 2(7.5 - t) & \text{if } 7 \le t \le 7.5 \\ 1 & \text{if } t \ge 7.5 \\ 0 & \text{if } t \le 7 \end{cases}

Figure 4.3 Membership functions for "x2 is low", "x2 is medium" and "x2 is high".
The fuzzy set L1 describing that "USD/DEM is low" can be given by the following
membership function

  L_1(t) = \begin{cases} 1 & \text{if } t \le 1 \\ 1 - 2(t - 1) & \text{if } 1 \le t \le 1.5 \\ 0 & \text{if } t \ge 1.5 \end{cases}

The fuzzy set M1 describing that "USD/DEM is medium" can be given by the following
membership function

  M_1(t) = \begin{cases} 1 - 2|t - 1.5| & \text{if } 1 \le t \le 2 \\ 0 & \text{otherwise} \end{cases}

The fuzzy set H1 describing that "USD/DEM is high" can be given by the following
membership function

  H_1(t) = \begin{cases} 1 - 2(2 - t) & \text{if } 1.5 \le t \le 2 \\ 1 & \text{if } t \ge 2 \\ 0 & \text{if } t \le 1.5 \end{cases}

Figure 4.4 Membership functions for "x1 is low", "x1 is medium" and "x1 is high".
Table 4.1 shows some mean exchange rates from 1995, and the portfolio values derived
from the fuzzy rule base ℜ = {ℜ1, ℜ2, ℜ3} with the initial membership functions
{Li, Mi, Hi}, i = 1, 2, 3, for the primary fuzzy sets.
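A possible realization of this reasoning scheme is sketched below. It is only an illustration of how the daily portfolio value could be computed from the three rules and the membership functions above: the helper names (tri_left, tri_mid, tri_right, portfolio_value) are invented, min is used for the and connective as in the inference figure, and the sample exchange rates are made up (they are not taken from Table 4.1).

    def tri_left(t, a, b):      # 1 below a, decreasing to 0 at b ("low")
        if t <= a: return 1.0
        if t >= b: return 0.0
        return (b - t) / (b - a)

    def tri_mid(t, c, w):       # symmetric triangle centered at c with half-width w ("medium")
        return max(0.0, 1.0 - abs(t - c) / w)

    def tri_right(t, a, b):     # 0 below a, increasing to 1 at b ("high")
        if t <= a: return 0.0
        if t >= b: return 1.0
        return (t - a) / (b - a)

    def portfolio_value(a1, a2, a3):
        # firing levels of the three rules (min models the "and" connective)
        alpha1 = min(tri_left(a1, 1.0, 1.5),  tri_right(a2, 7.0, 7.5), tri_left(a3, 3.5, 4.0))
        alpha2 = min(tri_mid(a1, 1.5, 0.5),   tri_mid(a2, 7.0, 0.5),   tri_mid(a3, 4.0, 0.5))
        alpha3 = min(tri_right(a1, 1.5, 2.0), tri_right(a2, 7.0, 7.5), tri_right(a3, 4.0, 4.5))
        # individual rule outputs
        y1 = 200 * a1 + 100 * a2 + 100 * a3
        y2 = 200 * a1 - 100 * a2 + 100 * a3
        y3 = 200 * a1 - 100 * a2 - 100 * a3
        s = alpha1 + alpha2 + alpha3
        return (alpha1 * y1 + alpha2 * y2 + alpha3 * y3) / s if s > 0 else 0.0

    print(portfolio_value(1.4, 7.2, 4.3))   # hypothetical rates USD/DEM, USD/SEK, USD/FIM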
4.2 Exercises
Exercise 4.1 Interpret the following fuzzy set.
Solution 4.1 The fuzzy set shown in Figure 4.5 can be interpreted as:
Exercise 4.2 Suppose we have a fuzzy partition of the universe of discourse [−1000, 1000]
with three fuzzy terms {N, ZE, P }, where
  N(t) = \begin{cases} 1 & \text{if } t \le -1000 \\ 1 - (t + 1000)/500 & \text{if } -1000 \le t \le -500 \\ 0 & \text{if } t \ge -500 \end{cases}

  P(t) = \begin{cases} 0 & \text{if } t \le 500 \\ 1 - (1000 - t)/500 & \text{if } 500 \le t \le 1000 \\ 1 & \text{if } t \ge 1000 \end{cases}

  ZE(t) = \begin{cases} 0 & \text{if } t \le -1000 \\ 1 + (t + 500)/500 & \text{if } -1000 \le t \le -500 \\ 1 & \text{if } -500 \le t \le 500 \\ 1 - (t - 500)/500 & \text{if } 500 \le t \le 1000 \\ 0 & \text{if } t \ge 1000 \end{cases}

Find the biggest ε for which this fuzzy partition satisfies the property of ε-completeness.
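One way to attack the exercise is to scan the universe numerically. The sketch below is not part of the original text; it simply evaluates the three membership functions on a grid and returns the largest ε for which every point of the universe is covered to degree at least ε.

    def N(t):
        if t <= -1000: return 1.0
        if t >= -500:  return 0.0
        return 1.0 - (t + 1000.0) / 500.0

    def P(t):
        if t >= 1000: return 1.0
        if t <= 500:  return 0.0
        return 1.0 - (1000.0 - t) / 500.0

    def ZE(t):
        if -500 <= t <= 500: return 1.0
        if t <= -1000 or t >= 1000: return 0.0
        if t < -500: return 1.0 + (t + 500.0) / 500.0
        return 1.0 - (t - 500.0) / 500.0

    # epsilon-completeness: every point of [-1000, 1000] must be covered to degree >= epsilon
    grid = [t / 10.0 for t in range(-10000, 10001)]
    epsilon = min(max(N(t), ZE(t), P(t)) for t in grid)
    print(epsilon)   # 0.5, attained at t = -750 and t = 750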
Exercise 4.3 Show that if γ ≤ γ′ then the relationship

  HAND_\gamma(a, b) \ge HAND_{\gamma'}(a, b)

holds for all a, b ∈ [0, 1], i.e. the family of parametrized Hamacher's t-norms, {HAND_γ},
is monotone decreasing.

Solution 4.3 From γ ≤ γ′ and a + b − ab ≤ 1 it follows that

  HAND_\gamma(a, b) = \frac{ab}{\gamma + (1 - \gamma)(a + b - ab)} \ge \frac{ab}{\gamma' + (1 - \gamma')(a + b - ab)} = HAND_{\gamma'}(a, b).
Exercise 4.4 Consider two fuzzy relations R and G, where R is interpreted linguistically
as "x is approximately equal to y" and the linguistic interpretation of G is "y is very close
to z". Assume R and G have the following membership functions

  R:        y1    y2    y3
     x1     1     0.1   0.1
     x2     0     1     0
     x3     0.9   1     1

  G:        z1    z2    z3
     y1     0.4   0.9   0.3
     y2     0     0.4   0
     y3     0.9   0.5   0.8

Compute the sup-min composition R ◦ G.
Solution 4.4

  R ◦ G:    z1    z2    z3
     x1     0.4   0.9   0.3
     x2     0     0.4   0
     x3     0.9   0.9   0.8
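The result can be checked with a few lines of code; the sketch below (the name maxmin_composition is an assumption) computes the sup-min (max-min) composition of the two relations and reproduces the matrix above.

    def maxmin_composition(R, G):
        # (R o G)(x, z) = max_y min(R(x, y), G(y, z))
        return [[max(min(R[i][k], G[k][j]) for k in range(len(G)))
                 for j in range(len(G[0]))]
                for i in range(len(R))]

    R = [[1.0, 0.1, 0.1],
         [0.0, 1.0, 0.0],
         [0.9, 1.0, 1.0]]
    G = [[0.4, 0.9, 0.3],
         [0.0, 0.4, 0.0],
         [0.9, 0.5, 0.8]]
    print(maxmin_composition(R, G))
    # [[0.4, 0.9, 0.3], [0.0, 0.4, 0.0], [0.9, 0.9, 0.8]]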
Exercise 4.5 Assume the membership function of the fuzzy set A, big pressure, is

  A(u) = \begin{cases} 1 & \text{if } u \ge 5 \\ 1 - (5 - u)/4 & \text{if } 1 \le u \le 5 \\ 0 & \text{otherwise} \end{cases}

where → is the Łukasiewicz implication.
Exercise 4.6 Let A, A0 , B ∈ F. Show that the Generalized Modus Ponens inference
rule with Gödel implication satisfies
• Basic property: A ◦ (A → B) = B
  premise:      if x is A then y is B
  fact:         x is A′
  consequence:  y is B′
where the consequence B 0 is determined as a composition of the fact and the fuzzy impli-
cation operator
B 0 = A0 ◦ (A → B)
that is,

  B'(v) = \sup_{u \in U} \min\{A'(u), (A \to B)(u, v)\}, \quad v \in V.
Let us choose the Gödel implication operator

  A(x) \to B(y) := \begin{cases} 1 & \text{if } A(x) \le B(y) \\ B(y) & \text{otherwise} \end{cases}
Proof.
• Basic property.
Let A0 = A and let x, y ∈ IR be arbitrarily fixed. On the one hand from the definition
of Gödel implication operator we obtain
  \min\{A(x), A(x) \to B(y)\} = \begin{cases} A(x) & \text{if } A(x) \le B(y) \\ B(y) & \text{if } A(x) > B(y) \end{cases}

That is,

  B'(y) = \sup_x \min\{A(x), A(x) \to B(y)\} \le B(y).

On the other hand, from the continuity and normality of A it follows that there exists
an x_0 \in \mathbb{R} such that A(x_0) = B(y). So

  B'(y) = \sup_x \min\{A(x), A(x) \to B(y)\} \ge \min\{A(x_0), A(x_0) \to B(y)\} = B(y).
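The basic property can also be verified numerically on a discretized universe. The triangular fuzzy sets A and B, the grids U and V and the helper names below are chosen only for this illustration.

    def goedel_implication(a, b):
        return 1.0 if a <= b else b

    def gmp(a_prime, A, B, U, V):
        # B'(v) = sup_u min{ A'(u), A(u) -> B(v) }
        return [max(min(a_prime(u), goedel_implication(A(u), B(v))) for u in U) for v in V]

    A = lambda u: max(0.0, 1.0 - abs(u - 5.0) / 2.0)     # triangular fuzzy set on U
    B = lambda v: max(0.0, 1.0 - abs(v - 10.0) / 3.0)    # triangular fuzzy set on V
    U = [u / 10.0 for u in range(0, 101)]
    V = [v / 10.0 for v in range(0, 201)]

    B_prime = gmp(A, A, B, U, V)                          # fact: x is A
    print(all(abs(bp - B(v)) < 1e-9 for bp, v in zip(B_prime, V)))   # True: B' = B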
Exercise 4.7 Construct a single-neuron network, which computes the material implica-
tion function. The training set is
x1 x2 o(x1 , x2 )
1. 1 1 1
2. 1 0 0
3. 0 1 1
4. 0 0 1
Figure 4.8 A single-neuron network for the material implication (weights −1 and 1, threshold −0.5).
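A quick check that a single linear threshold unit with the weights read off the figure reproduces the material implication; interpreting −0.5 as the threshold of the unit is an assumption of this sketch.

    def neuron(x1, x2, w1=-1.0, w2=1.0, theta=-0.5):
        # linear threshold unit: fires iff the weighted sum reaches the threshold
        return 1 if w1 * x1 + w2 * x2 >= theta else 0

    for x1, x2, target in [(1, 1, 1), (1, 0, 0), (0, 1, 1), (0, 0, 1)]:
        assert neuron(x1, x2) == target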
z1 = x0 − y0 = 3 − 3 = 0
The firing level of the second rule is
α2 = min{µBIG (3), µSMALL (3)} = min{0.5, 0.5} = 0.5
the individual output of the second rule is
z2 = x0 + y0 = 3 + 3 = 6
The firing level of the third rule is
α3 = min{µBIG (3), µBIG (3)} = min{0.5, 0.5} = 0.5
the individual output of the third rule is
z3 = x0 + 2y0 = 3 + 6 = 9
and the system output, z0 , is computed from the equation
z0 = (0 × 0.5 + 6 × 0.5 + 9 × 0.5)/(0.5 + 0.5 + 0.5) = 5.0
Exercise 4.10 What is the meaning of the error correction learning procedure?
Solution 4.10 The error correction learning procedure is simple enough in conception.
The procedure is as follows: During training an input is put into the network and flows
through the network generating a set of values on the output units. Then, the actual output
is compared with the desired target, and a match is computed. If the output and target
match, no change is made to the net. However, if the output differs from the target a
change must be made to some of the connections.
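As a minimal sketch of this procedure, assuming a single linear output unit trained with the usual delta rule (the name train_step, the learning rate and the training pair are illustrative):

    def train_step(w, x, target, eta=0.1):
        # forward pass, compare with the target, correct the weights proportionally to the error
        output = sum(wi * xi for wi, xi in zip(w, x))
        error = target - output
        if error == 0.0:
            return w                      # output and target match: no change is made
        return [wi + eta * error * xi for wi, xi in zip(w, x)]

    w = [0.0, 0.0]
    for _ in range(100):
        w = train_step(w, [1.0, 2.0], target=1.0)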
Exercise 4.11 Let A = (a, α, β) be a triangular fuzzy number. Calculate [A]γ as a func-
tion of a, α and β.
Solution 4.11 The γ-level set of A is [A]^γ = [a − (1 − γ)α, a + (1 − γ)β].

Figure 4.8 γ-cut of a triangular fuzzy number.
Exercise 4.12 Consider some alternative with the following scores on five criteria
Criteria: C1 C2 C3 C4 C5
Importance: VH VH M L VL
Score: M L OU VH OU
Exercise 4.13 Let A = (a, α) and B = (b, β) be fuzzy numbers of symmetric triangular
form. Calculate their Hausdorff distance, D(A, B), as a function of a, b, α and β.

Solution 4.13 Computing the distance of the γ-level sets we get

  D(A, B) = \max\{|a - b + \alpha - \beta|, |a - b + \beta - \alpha|\} = |a - b| + |\alpha - \beta|.
Exercise 4.14 Consider the error function

  E(w_1, w_2) = \frac{1}{2}\big[(w_2 - w_1)^2 + (1 - w_1)^2\big].

Find analytically the gradient vector

  E'(w) = \begin{pmatrix} \partial_1 E(w) \\ \partial_2 E(w) \end{pmatrix}

and find analytically the weight vector w* that minimizes the error function, i.e. such that
E'(w*) = 0.
Solution 4.14 The gradient vector of E is

  E'(w) = \begin{pmatrix} (w_1 - w_2) + (w_1 - 1) \\ (w_2 - w_1) \end{pmatrix} = \begin{pmatrix} 2w_1 - w_2 - 1 \\ w_2 - w_1 \end{pmatrix},

and E'(w*) = 0 gives w* = (1, 1).
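The minimum can also be found iteratively. The following sketch applies plain gradient descent to E and converges to w* = (1, 1); the step size and number of iterations are arbitrary choices made for the illustration.

    def grad(w1, w2):
        # gradient of E(w1, w2) = 1/2 [(w2 - w1)^2 + (1 - w1)^2]
        return (2 * w1 - w2 - 1, w2 - w1)

    w1, w2, eta = 0.0, 0.0, 0.1
    for _ in range(2000):
        g1, g2 = grad(w1, w2)
        w1, w2 = w1 - eta * g1, w2 - eta * g2
    print(round(w1, 4), round(w2, 4))   # approximately 1.0 1.0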
Solution 4.16 By using the chain rule for derivatives of composed functions we get

  f'(t) = \frac{\exp(-t)}{[1 + \exp(-t)]^2}.
Exercise 4.17 Construct a hybrid neural net implementing Tsukamoto's reasoning mech-
anism with two input variables, two linguistic values for each input variable and two
fuzzy IF-THEN rules.

Solution 4.17 In Tsukamoto's reasoning scheme the firing levels of the rules are computed by
α1 = A1 (x0 ) × B1 (y0 )
α2 = A2 (x0 ) × B2 (y0 ),
where the logical and can be modelled by any continuous t-norm, e.g
α1 = A1 (x0 ) ∧ B1 (y0 )
α2 = A2 (x0 ) ∧ B2 (y0 ),
In this mode of reasoning the individual crisp control actions z1 and z2 are computed from
the equations

  z_1 = C_1^{-1}(\alpha_1), \qquad z_2 = C_2^{-1}(\alpha_2),

and the overall control action is

  z_0 = \frac{\alpha_1 z_1 + \alpha_2 z_2}{\alpha_1 + \alpha_2} = \beta_1 z_1 + \beta_2 z_2,

where β1 and β2 are the normalized values of α1 and α2 with respect to the sum (α1 + α2),
i.e.

  \beta_1 = \frac{\alpha_1}{\alpha_1 + \alpha_2}, \qquad \beta_2 = \frac{\alpha_2}{\alpha_1 + \alpha_2}.
Figure 4.10 Tsukamoto's inference mechanism.
A hybrid neural net computationally identical to this type of reasoning is shown in
Figure 4.11.

[Figure 4.11: the hybrid net — Layer 1 computes A1(x0), A2(x0), B1(y0), B2(y0); the T nodes compute α1 and α2; the N nodes compute β1 and β2; and the output is z0 = β1 z1 + β2 z2.]
• Layer 1 The output of the node is the degree to which the given input satisfies
the linguistic label associated to this node.
• Layer 2 Each node computes the firing strength of the associated rule. The output
of top neuron is
α1 = A1 (x0 ) × B1 (y0 ) = A1 (x0 ) ∧ B1 (y0 ),
and the output of the bottom neuron is
α2 = A2 (x0 ) × B2 (y0 ) = A2 (x0 ) ∧ B2 (y0 )
Both nodes in this layer are labeled by T, because we can choose other t-norms for
modeling the logical and operator. The nodes of this layer are called rule nodes.
• Layer 3 Every node in this layer (labeled by N) computes the normalized firing level
of the corresponding rule, i.e. β1 and β2.
• Layer 4 The output of the top neuron is the product of the normalized firing level and
the individual rule output of the first rule,

  \beta_1 z_1 = \beta_1 C_1^{-1}(\alpha_1),

and the output of the bottom neuron is the product of the normalized firing level and
the individual rule output of the second rule,

  \beta_2 z_2 = \beta_2 C_2^{-1}(\alpha_2).
• Layer 5 The single node in this layer computes the overall system output as the
sum of all incoming signals, i.e.
z0 = β1 z1 + β2 z2 .
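A sketch of this hybrid net in code is given below. It assumes Gaussian premise membership functions and monotone sigmoid-shaped consequents C1, C2 (so that their inverses exist); all parameter values and the names C_inv and tsukamoto are illustrative assumptions, not part of the exercise.

    import math

    # premise membership functions (Gaussian, illustrative parameters)
    A1 = lambda x: math.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)
    A2 = lambda x: math.exp(-0.5 * ((x - 3.0) / 0.5) ** 2)
    B1 = lambda y: math.exp(-0.5 * ((y - 2.0) / 1.0) ** 2)
    B2 = lambda y: math.exp(-0.5 * ((y - 4.0) / 1.0) ** 2)

    # monotone consequents Ci(z) = 1/(1 + exp(-ai*(z - bi))) and their inverses
    def C_inv(t, a, b):
        t = min(max(t, 1e-9), 1 - 1e-9)           # clip to keep the logit finite
        return b + math.log(t / (1.0 - t)) / a

    def tsukamoto(x0, y0):
        # Layer 1: membership degrees;  Layer 2: firing strengths (t-norm = min)
        alpha1 = min(A1(x0), B1(y0))
        alpha2 = min(A2(x0), B2(y0))
        # Layer 3: normalization;  Layer 4: beta_i * z_i with z_i = Ci^-1(alpha_i);  Layer 5: sum
        beta1 = alpha1 / (alpha1 + alpha2)
        beta2 = alpha2 / (alpha1 + alpha2)
        z1 = C_inv(alpha1, a=2.0, b=1.0)
        z2 = C_inv(alpha2, a=2.0, b=3.0)
        return beta1 * z1 + beta2 * z2

    print(tsukamoto(1.2, 2.5))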
Exercise 4.18 Show that fuzzy inference systems with simplified fuzzy IF-THEN rules
are universal approximators.

Solution 4.18 Consider a fuzzy inference system with two simplified fuzzy IF-THEN
rules. Suppose that the output of the system ℜ = {ℜ1, ℜ2} for a given input is computed by

  z = \frac{\alpha_1 z_1 + \alpha_2 z_2}{\alpha_1 + \alpha_2}  \tag{4.1}
where α1 and α2 denote the firing strengths of the rules with respect to given input vector.
Let z 0 be the output of the system < for some other input.
We recall the Stone-Weierstrass theorem:

Theorem 4.2.1 Let domain K be a compact space of n dimensions, and let G be a set
of continuous real-valued functions on K, satisfying the following criteria:

1. The constant function f(x) = 1 is in G.

2. For any two points x1 ≠ x2 in K, there is an f in G such that f(x1) ≠ f(x2).

3. If f1 and f2 are two functions in G, then f1 f2 and α1 f1 + α2 f2 are in G for any two real
numbers α1 and α2.

Then G is dense in C(K), the set of continuous real-valued functions on K, i.e. any continuous
function on K can be uniformly approximated by functions from G.
Proof. We show that az + bz 0 , ∀a, b ∈ IR and zz 0 can be written in the form (4.1)
which means that fuzzy inference systems with simplified fuzzy IF-THEN rules satisfy
the conditions of Stone-Weierstrass theorem, i.e. they can approximate all continuous
functions on a compact domain.
For az + bz′ we get

  \frac{\alpha_1\alpha_1'(az_1 + bz_1') + \alpha_1\alpha_2'(az_1 + bz_2') + \alpha_2\alpha_1'(az_2 + bz_1') + \alpha_2\alpha_2'(az_2 + bz_2')}{\alpha_1\alpha_1' + \alpha_1\alpha_2' + \alpha_2\alpha_1' + \alpha_2\alpha_2'} = az + bz'.

So, az + bz′ is the output of a fuzzy inference system with four simplified fuzzy IF-THEN
rules, where the individual rule outputs are az1 + bz1′, az1 + bz2′, az2 + bz1′ and az2 + bz2′, and
the firing strengths of the associated rules are α1α1′, α1α2′, α2α1′ and α2α2′, respectively.
Finally, for zz′ we obtain

  zz' = \frac{\alpha_1 z_1 + \alpha_2 z_2}{\alpha_1 + \alpha_2} \times \frac{\alpha_1' z_1' + \alpha_2' z_2'}{\alpha_1' + \alpha_2'} = \frac{\alpha_1\alpha_1' z_1 z_1' + \alpha_1\alpha_2' z_1 z_2' + \alpha_2\alpha_1' z_2 z_1' + \alpha_2\alpha_2' z_2 z_2'}{\alpha_1\alpha_1' + \alpha_1\alpha_2' + \alpha_2\alpha_1' + \alpha_2\alpha_2'}.

So, zz′ is the output of a fuzzy inference system with four simplified fuzzy IF-THEN rules,
where the individual rule outputs are z1 z1′, z1 z2′, z2 z1′ and z2 z2′, and the firing strengths
of the associated rules are α1α1′, α1α2′, α2α1′ and α2α2′, respectively.
This completes the proof.
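The two identities used in the proof are easy to check numerically; the sketch below verifies them for arbitrarily chosen firing strengths and rule outputs (the name ts_output and all numerical values are assumptions of the illustration).

    def ts_output(alphas, zs):
        # output of a fuzzy inference system with simplified fuzzy IF-THEN rules, eq. (4.1)
        return sum(a * z for a, z in zip(alphas, zs)) / sum(alphas)

    alphas,  zs  = [0.3, 0.7], [2.0, 5.0]       # system R
    alphas_, zs_ = [0.6, 0.2], [1.0, 4.0]       # system R'
    a, b = 1.5, -0.5

    z, z_ = ts_output(alphas, zs), ts_output(alphas_, zs_)
    prod_alphas = [x * y for x in alphas for y in alphas_]

    lin  = ts_output(prod_alphas, [a * zi + b * zj for zi in zs for zj in zs_])
    prod = ts_output(prod_alphas, [zi * zj for zi in zs for zj in zs_])
    assert abs(lin - (a * z + b * z_)) < 1e-12 and abs(prod - z * z_) < 1e-12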
Exercise 4.19 Let A1 = (a1, α) and A2 = (a2, α) be fuzzy numbers of symmetric trian-
gular form. Compute analytically the membership function of their product-sum, A1 ⊕ A2,
defined by

  (A_1 \oplus A_2)(y) = \sup_{x_1 + x_2 = y} PAND(A_1(x_1), A_2(x_2)) = \sup_{x_1 + x_2 = y} A_1(x_1) A_2(x_2).
Solution 4.19 The membership functions of A1 = (a1, α) and A2 = (a2, α) are defined by

  A_1(t) = \begin{cases} 1 - |a_1 - t|/\alpha & \text{if } |a_1 - t| \le \alpha \\ 0 & \text{otherwise} \end{cases}
  \qquad
  A_2(t) = \begin{cases} 1 - |a_2 - t|/\alpha & \text{if } |a_2 - t| \le \alpha \\ 0 & \text{otherwise} \end{cases}
First we show that the support of the product-sum, A1 ⊕ A2, is equal to the sum of the
supports of A1 and A2, i.e. supp(A1 ⊕ A2) = supp(A1) + supp(A2) = [a1 + a2 − 2α, a1 + a2 + 2α].
Using Lagrange's multipliers method for the solution of (4.2) we get that its optimal value is

  \Big(1 - \frac{a_1 + a_2 - y}{2\alpha}\Big)^2
and its unique solution is x = 1/2(a1 − a2 + y) (where the derivative of the objective
function vanishes).
In order to determine (A1 ⊕ A2)(y), y ∈ [a1 + a2, a1 + a2 + 2α], we need to solve the
following mathematical programming problem: maximize

  \Big(1 - \frac{x - a_1}{\alpha}\Big)\Big(1 - \frac{y - x - a_2}{\alpha}\Big)  \tag{4.3}
subject to a1 ≤ x ≤ a1 + α, a2 ≤ y − x ≤ a2 + α.
Using Lagrange's multipliers method for the solution of (4.3) we get that its optimal value is

  \Big(1 - \frac{y - (a_1 + a_2)}{2\alpha}\Big)^2.
Summarizing these findings we obtain that

  (A_1 \oplus A_2)(y) = \begin{cases} \big(1 - |a_1 + a_2 - y|/(2\alpha)\big)^2 & \text{if } |a_1 + a_2 - y| \le 2\alpha \\ 0 & \text{otherwise} \end{cases}  \tag{4.4}
Figure 4.12 Product-sum of fuzzy numbers (1, 3/2) and (2, 3/2).
Exercise 4.20 Let Ai = (ai , α), i ∈ N be fuzzy numbers of symmetric triangular form.
Suppose that
  a := \sum_{i=1}^{\infty} a_i
exists and it is finite. Find the limit distribution of the product-sum
  \bigoplus_{i=1}^{n} A_i
when n → ∞.
Solution 4.20 Let Bn denote the product-sum of Ai, i = 1, . . . , n, i.e.

  B_n = A_1 \oplus \cdots \oplus A_n.

Making an induction argument on n we show that

  B_n(y) = \begin{cases} \big(1 - |a_1 + \cdots + a_n - y|/(n\alpha)\big)^n & \text{if } |a_1 + \cdots + a_n - y| \le n\alpha \\ 0 & \text{otherwise} \end{cases}  \tag{4.5}
From (4.4) it follows that (4.5) holds for n = 2. Let us assume that it holds for some
n ∈ N. Then using the definition of the product-sum we obtain

  B_{n+1}(y) = (B_n \oplus A_{n+1})(y) = \sup_{x_1 + x_2 = y} B_n(x_1) A_{n+1}(x_2) =

  \sup_{x_1 + x_2 = y} \Big(1 - \frac{|a_1 + \cdots + a_n - x_1|}{n\alpha}\Big)^n \Big(1 - \frac{|a_{n+1} - x_2|}{\alpha}\Big) = \Big(1 - \frac{|a_1 + \cdots + a_{n+1} - y|}{(n+1)\alpha}\Big)^{n+1}.
This ends the proof. From (4.5) we obtain the limit distribution of the Bn's as

  \lim_{n\to\infty} B_n(y) = \lim_{n\to\infty} \Big(1 - \frac{|a_1 + \cdots + a_n - y|}{n\alpha}\Big)^n = \exp\Big(-\frac{|a - y|}{\alpha}\Big).
Exercise 4.21 Suppose the unknown nonlinear mapping to be realized by fuzzy systems
can be represented as

  y^k = f(x^k), \quad k = 1, \ldots, K,  \tag{4.6}

i.e. we are given the training set {(x^1, y^1), . . . , (x^K, y^K)}.
For modeling the unknown mapping in (4.6), we employ three simplified fuzzy IF-THEN
rules of the following type
if x is small then y = z1
if x is medium then y = z2
if x is big then y = z3
where the linguistic terms A1 = "small", A2 = "medium" and A3 = "big" are of trian-
gular form with membership functions (see Figure 4.14)

  A_1(u) = \begin{cases} 1 & \text{if } u \le c_1 \\ (c_2 - u)/(c_2 - c_1) & \text{if } c_1 \le u \le c_2 \\ 0 & \text{otherwise} \end{cases}

  A_2(u) = \begin{cases} (u - c_1)/(c_2 - c_1) & \text{if } c_1 \le u \le c_2 \\ (c_3 - u)/(c_3 - c_2) & \text{if } c_2 \le u \le c_3 \\ 0 & \text{otherwise} \end{cases}

  A_3(u) = \begin{cases} 1 & \text{if } u \ge c_3 \\ (u - c_2)/(c_3 - c_2) & \text{if } c_2 \le u \le c_3 \\ 0 & \text{otherwise} \end{cases}
Derive the steepest descent method for tuning the premise parameters {c1, c2, c3} and the
consequent parameters {z1, z2, z3}.
Figure 4.14 Initial fuzzy partition with three linguistic terms.
Solution 4.21 Let x be the input to the fuzzy system. The firing levels of the rules are
computed by
α1 = A1 (x), α2 = A2 (x), α3 = A3 (x),
and the output of the system is computed by

  o = \alpha_1 z_1 + \alpha_2 z_2 + \alpha_3 z_3,

where we have used the identity A1(x) + A2(x) + A3(x) = 1 for all x ∈ [0, 1]. We define
the measure of error for the k-th training pattern as usual,

  E_k = E_k(c_1, c_2, c_3, z_1, z_2, z_3) = \frac{1}{2}\big(o^k(c_1, c_2, c_3, z_1, z_2, z_3) - y^k\big)^2,
where ok is the computed output from the fuzzy system corresponding to the input pattern
xk and y k is the desired output, k = 1, . . . , K.
The steepest descent method is used to learn zi in the consequent part of the i-th fuzzy
rule. That is,
  z_1(t+1) = z_1(t) - \eta\frac{\partial E_k}{\partial z_1} = z_1(t) - \eta(o^k - y^k)A_1(x^k)

  z_2(t+1) = z_2(t) - \eta\frac{\partial E_k}{\partial z_2} = z_2(t) - \eta(o^k - y^k)A_2(x^k)

  z_3(t+1) = z_3(t) - \eta\frac{\partial E_k}{\partial z_3} = z_3(t) - \eta(o^k - y^k)A_3(x^k)
where xk is the input to the system, η > 0 is the learning constant and t indexes the
number of the adjustments of zi .
In a similar manner we can tune the centers of A1 , A2 and A3 .
  \frac{\partial E_k}{\partial c_1} = (o^k - y^k)\frac{\partial o^k}{\partial c_1} = (o^k - y^k)\frac{x - c_1}{(c_2 - c_1)^2}(z_1 - z_2),

and the updated centers must satisfy

  0 \le c_1(t+1) < c_2(t+1) < c_3(t+1) \le 1.
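A compact sketch of the tuning procedure for the consequent parameters z1, z2, z3 is given below. The function names, training pairs and learning rate are illustrative assumptions, and the centers c1, c2, c3 are kept fixed here for brevity (tuning them follows the same pattern with the partial derivatives above).

    def memberships(x, c1, c2, c3):
        # triangular partition A1 ("small"), A2 ("medium"), A3 ("big") on [0, 1]
        if x <= c1:   return 1.0, 0.0, 0.0
        if x <= c2:   a2 = (x - c1) / (c2 - c1); return 1.0 - a2, a2, 0.0
        if x <= c3:   a3 = (x - c2) / (c3 - c2); return 0.0, 1.0 - a3, a3
        return 0.0, 0.0, 1.0

    def tune_consequents(data, c, z, eta=0.2, epochs=200):
        c1, c2, c3 = c
        for _ in range(epochs):
            for xk, yk in data:
                a = memberships(xk, c1, c2, c3)
                ok = sum(ai * zi for ai, zi in zip(a, z))      # A1 + A2 + A3 = 1, so no division
                # z_i(t+1) = z_i(t) - eta * (o^k - y^k) * A_i(x^k)
                z = [zi - eta * (ok - yk) * ai for zi, ai in zip(z, a)]
        return z

    data = [(0.0, 0.1), (0.3, 0.4), (0.5, 0.9), (0.8, 0.6), (1.0, 0.2)]   # invented training pairs
    print(tune_consequents(data, c=(0.0, 0.5, 1.0), z=[0.0, 0.0, 0.0]))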
Index
fuzzy quantity, 9 linguistic quantifiers, 96
fuzzy relation, 25, 235 linguistic variable, 48
fuzzy rule extraction, 202
fuzzy screening system, 100 Mamdani implication, 47
fuzzy set, 9 Mamdani’s inference mechanism, 75
fuzzy subsethood, 15 Mamdani-type FLC, 68
fuzzy training set, 187 maximum t-conorm, 21
measure of andness, 94
Gaines implication, 47 measure of dispersion, 95
Gaussian membership function, 86 measure of orness, 94
generalized p-mean, 92, 209 MICA operator, 90
generalized delta rule, 140 middle-of-maxima method, 72
Generalized Modus Ponens, 52 minimum t-norm, 20
Generalized Modus Tollens, 53 Modus Ponens, 52
genetic algorithms, 194 Modus Tollens, 53
geometric mean, 92
gradient vector, 130 negation rule, 52
Gödel implication, 47 Nguyen’s theorem, 38
normal fuzzy set, 10
Hamacher’s t-conorm, 21
Hamacher’s t-norm, 20, 235 OR fuzzy neuron, 158
Hamming distance, 42 orlike OWA operator, 94
harmonic mean, 92 overall system output, 75, 88, 231
Hausdorff distance, 42 OWA operator, 92
height defuzzification, 73
parity function, 127
hidden layer, 138
partial derivative, 130
hybrid fuzzy neural network, 163
perceptron learning, 124
hybrid neural net, 157, 243
portfolio value, 230
identity quantifier, 99 probabilistic t-conorm, 21
implication-OR fuzzy neuron, 159 probabilistic t-norm, 20
individual rule output, 75, 88, 170, 231 projection of a fuzzy relation, 27
inference mechanism, 75 projection rule, 52
inhibition hyperbox, 202
quasi-arithmetic mean, 92
intersection of fuzzy sets, 18
regular fuzzy neural net, 162
Kleene-Dienes implication, 47
regular neural net, 157
Kleene-Dienes-Łukasiewicz, 47
removal of the threshold, 121
Kohonen’s learning algorithm, 145
Kwan and Cai’s fuzzy neuron, 159 scalar product, 121
simplified fuzzy rules, 195
Larsen implication, 47
single-layer feedforward net, 123
Larsen’s inference mechanism, 79
singleton fuzzifier, 86
learning of membership functions, 195
slope, 198
learning rate, 124, 198
steepest descent method, 196, 197
linear activation function, 131
Stone-Weierstrass theorem, 142
linear combination of fuzzy numbers, 37
subset property of GMP, 54
linear threshold unit, 120
Sugeno’s inference mechanism, 77
linguistic modifiers, 49
sup-T composition, 29
sup-T compositional rule, 53
superset property of GMP, 55
supervised learning, 138
support, 10
t-conorm-based union, 22
t-norm implication, 47
t-norm-based intersection, 22
threshold-level, 120
total indeterminance, 54
trade-offs , 91
training set, 122, 197
trapezoidal fuzzy number, 13
triangular conorm, 20
triangular fuzzy number, 12
triangular norm, 19
Tsukamoto’s inference mechanism, 76
weak t-norm, 20
weight vector, 120
window type OWA operator, 94
Yager’s t-conorm, 21
Yager’s t-norm, 20, 209