
191AIC403T - ARTIFICIAL INTELLIGENCE

UNIT - III
REASONING AND INFERENCE
Reasoning Systems for Categories - Reasoning with Default Information - Non monotonic
reasoning - Fuzzy Logic-Fuzzy rules-fuzzy Inference-Neural Networks-Neuro-fuzzy
Inference- Bayes Rule and its Applications - Bayesian Networks – Hidden Markov Models-
Dempster – Shafer theory.

Reasoning Systems for Categories


Reasoning:
Reasoning is the mental process of deriving logical conclusions and making predictions
from available knowledge, facts, and beliefs. In other words, "Reasoning is a way to infer
facts from existing data." It is a general process of thinking rationally to find valid
conclusions.
In artificial intelligence, reasoning is essential so that a machine can think rationally
like a human brain and perform like a human.

Reasoning in Artificial Intelligence:

To help machines perform human-level functions, artificial intelligence and its various
subfields like machine learning, deep learning, natural language processing, and more rely
on reasoning and knowledge representation. Reasoning in artificial intelligence helps
machines think rationally and perform functions like humans.

It is an important area of research in AI that helps machines in problem-solving, deriving
logical solutions, and making predictions based on the available information, knowledge,
facts, and data. Reasoning can be performed in a formal or informal manner, and with
top-down or bottom-up approaches, depending on the way machines handle uncertainties and
partial truths. For example, probabilistic reasoning in artificial intelligence allows
machines to represent and deal with uncertain knowledge and information.

Reasoning, therefore, is one of the most fundamental capabilities associated with general
intelligence, whether human or artificial; it enables both humans and machines to
generate knowledge not available prior to the act of generation.

Example Of Reasoning In Artificial Intelligence

To better understand how reasoning works in artificial intelligence applications and systems,
let us consider two such examples:
 Siri: The cognitive virtual assistant uses reasoning to offer suggestions and
recommendations based on commands, for example the nearest location, tomorrow's date, or
AM and PM, among other things.
 WolframAlpha: This computational knowledge engine relies on reasoning to perform
mathematical computations, such as calculating nutritional values based on portions of food.

In short, like humans, machines use reasoning, along with knowledge representation, logic,
and learning, for analysis, problem-solving, drawing conclusions, and more.

Types of Reasoning
In artificial intelligence, reasoning can be divided into the following categories:
1. Deductive reasoning
2. Inductive reasoning
3. Abductive reasoning
4. Common Sense Reasoning
5. Monotonic Reasoning
6. Non-monotonic Reasoning

1. Deductive reasoning:

Deductive reasoning is deducing new information from logically related known information.
It is the form of valid reasoning, which means the argument's conclusion must be true when
the premises are true.

Deductive reasoning is a type of propositional logic in AI, and it requires various rules and
facts. It is sometimes referred to as top-down reasoning, and contradictory to inductive
reasoning.

In deductive reasoning, the truth of the premises guarantees the truth of the conclusion.

Deductive reasoning mostly starts from the general premises to the specific conclusion,
which can be explained as below example.

Example:

Premise-1: All humans eat veggies.

Premise-2: Suresh is human.

Conclusion: Suresh eats veggies.


2. Inductive Reasoning:

Inductive reasoning is a form of reasoning that arrives at a conclusion using a limited set
of facts through the process of generalization. It starts with a series of specific facts or
data and reaches a general statement or conclusion.

Inductive reasoning is a type of propositional logic, which is also known as cause-effect
reasoning or bottom-up reasoning.

In inductive reasoning, we use historical data or various premises to generate a generic rule,
for which premises support the conclusion.

In inductive reasoning, premises provide probable supports to the conclusion, so the truth of
premises does not guarantee the truth of the conclusion.

Example:

Premise: All of the pigeons we have seen in the zoo are white.

Conclusion: Therefore, we can expect all the pigeons to be white.

3. Abductive reasoning:

Abductive reasoning is a form of logical reasoning which starts with one or more
observations and then seeks the most likely explanation or conclusion for those observations.

Abductive reasoning is an extension of deductive reasoning, but in abductive reasoning the
premises do not guarantee the conclusion.

Example:

Implication: The cricket ground is wet if it is raining.

Axiom: The cricket ground is wet.

Conclusion: It is raining.

4. Common Sense Reasoning

Common sense reasoning is an informal form of reasoning, which can be gained through
experiences.

Common sense reasoning simulates the human ability to make presumptions about events
that occur every day.

It relies on good judgment rather than exact logic and operates on heuristic
knowledge and heuristic rules.

Example:

1. One person can be at one place at a time.


2. If I put my hand in a fire, then it will burn.

The above two statements are examples of common sense reasoning, which a human mind
can easily understand and assume.

5. Monotonic Reasoning:

In monotonic reasoning, once a conclusion is drawn, it remains the same even if we
add further information to the existing information in our knowledge base. In monotonic
reasoning, adding knowledge does not decrease the set of propositions that can be derived.

To solve monotonic problems, we can derive the valid conclusion from the available facts
only, and it will not be affected by new facts.

Monotonic reasoning is not useful for real-time systems because, in real time, facts
change, so we cannot use monotonic reasoning.

Monotonic reasoning is used in conventional reasoning systems, and a logic-based system is
monotonic.

Any theorem proving is an example of monotonic reasoning.

Example:

o Earth revolves around the Sun.

It is a true fact, and it cannot be changed even if we add other sentences to the knowledge
base, such as "The moon revolves around the earth" or "The earth is not round."

Advantages of Monotonic Reasoning:

o In monotonic reasoning, each old proof always remains valid.

o If we deduce a fact from the available facts, it will remain valid forever.

Disadvantages of Monotonic Reasoning:

o We cannot represent real-world scenarios using monotonic reasoning.

o Hypothetical knowledge cannot be expressed with monotonic reasoning, which means
facts must be true.
o Since we can only derive conclusions from the old proofs, new knowledge from
the real world cannot be added.

6. Non-monotonic Reasoning

In non-monotonic reasoning, some conclusions may be invalidated if we add more
information to our knowledge base.

A logic is said to be non-monotonic if some conclusions can be invalidated by adding more
knowledge to the knowledge base.

Non-monotonic reasoning deals with incomplete and uncertain models.

Human perception of various things in daily life is a general example of non-monotonic
reasoning.

Example: Suppose the knowledge base contains the following knowledge:

o Birds can fly


o Penguins cannot fly
o Pitty is a bird

So from the above sentences, we can conclude that Pitty can fly.

However, if we add one more sentence to the knowledge base, "Pitty is a penguin", which
concludes "Pitty cannot fly", it invalidates the above conclusion.
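This retraction behaviour can be sketched in a few lines of code. The following is a toy illustration under assumed conventions, not a full non-monotonic logic; the rule "birds fly unless known to be penguins" and all names are invented for the example.

```python
# A toy sketch (an assumption, not from the text) of the retraction above:
# the default rule "birds fly unless known to be penguins".

def can_fly(kb, name):
    """Apply the default: conclude flight unless an exception is known."""
    if name in kb["penguins"]:          # exception defeats the default
        return False
    return name in kb["birds"]          # default: birds fly

kb = {"birds": {"Pitty"}, "penguins": set()}
print(can_fly(kb, "Pitty"))             # True  -- default conclusion holds

kb["penguins"].add("Pitty")             # new knowledge: "Pitty is a penguin"
print(can_fly(kb, "Pitty"))             # False -- old conclusion is retracted
```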

Advantages of Non-monotonic reasoning:

o For real-world systems such as robot navigation, we can use non-monotonic reasoning.
o In non-monotonic reasoning, we can choose probabilistic facts or make assumptions.

Disadvantages of Non-monotonic Reasoning:

o In non-monotonic reasoning, old facts may be invalidated by adding new sentences.
o It cannot be used for theorem proving.

INDUCTIVE REASONING VS. DEDUCTIVE REASONING:


Inductive and deductive reasoning are the two most important and commonly used reasoning
techniques. However, the two are frequently confused by novices. To highlight the difference:
deductive reasoning moves from general premises to a specific, guaranteed conclusion, whereas
inductive reasoning generalizes from specific observations to a probable conclusion.

ROLE OF REASONING IN KNOWLEDGE-BASED SYSTEMS:


Knowledge-based systems, a prominent form of artificial intelligence, use knowledge from
human experts to perform decision-making and problem-solving. This is made possible by the
knowledge base and the inference engine. Moreover, such a system uses various reasoning
techniques to deal with uncertainties in the data and information present in the knowledge base.
Among the various types of reasoning, deductive reasoning and non-monotonic reasoning are
used by knowledge-based systems to solve problems. Moreover, reasoning helps with the
implementation of knowledge-based systems and artificial intelligence in machines, enabling
them to perform tasks that require human-level intelligence and mental processes.
Reasoning with Default Information

Default reasoning

This is a very common form of non-monotonic reasoning. Here we want to draw
conclusions based on what is most likely to be true.

We have already seen examples of this and possible ways to represent this
knowledge.

We will discuss two approaches to do this:

 Non-Monotonic logic.
 Default logic.

DO NOT get confused by the labels Non-Monotonic and Default being applied both to
reasoning and to a particular logic. Non-monotonic reasoning is a generic description of
a class of reasoning; non-monotonic logic is a specific theory. The same goes for default
reasoning and default logic.

Non-Monotonic Logic

This is basically an extension of first-order predicate logic to include a modal operator, M.
The purpose of this operator is to allow for consistency.

For example,

∀x: plays_instrument(x) ∧ M improvises(x) → jazz_musician(x)

states that, for all x, if x plays an instrument and the fact that x can improvise is
consistent with all other knowledge, then we can conclude that x is a jazz musician.

How do we define consistency?

One common solution (consistent with PROLOG notation) is: to show that fact P is
consistent, attempt to prove ¬P. If we fail, we may say that P is consistent (since ¬P
cannot be established).

However, consider the famous set of assertions relating to President Nixon:

∀x: Republican(x) ∧ M ¬Pacifist(x) → ¬Pacifist(x)

∀x: Quaker(x) ∧ M Pacifist(x) → Pacifist(x)

This states that Quakers tend to be pacifists and Republicans tend not to be.

BUT Nixon was both a Quaker and a Republican, so we can assert:

Quaker(Nixon)

Republican(Nixon)

This now leads to our total knowledge becoming inconsistent.

Default Logic

Default logic introduces a new inference rule:

(A : B) / C

which states that if A is deducible and it is consistent to assume B, then conclude C.

Now this is similar to Non-monotonic logic but there are some distinctions:

 New inference rules are used for computing the set of plausible extensions.
So in the Nixon example above Default logic can support both assertions
since is does not say anything about how choose between them -- it will
depend on the inference being made.
 In Default logic any nonmonotonic expressions are rules of inference rather
than expressions.

Non-Monotonic Reasoning

Non-monotonic reasoning is a process whose conclusions change as the knowledge base
increases.

 It is also known as NMR in Artificial Intelligence.
 The conclusions of non-monotonic reasoning may increase or decrease based on conditions.
 Since non-monotonic reasoning depends on knowledge and facts, it changes as knowledge
and facts improve.
 Example: Consider a bowl of water. If we put it on the stove and turn the flame on,
it will boil; when we turn off the flame, it will gradually cool down.

Monotonic Reasoning vs Non-monotonic Reasoning

1. Monotonic reasoning is a process that does not change its direction, i.e., it moves in
one direction only. Non-monotonic reasoning is a process that changes its direction or
values as the knowledge base increases.

2. Monotonic reasoning deals with very specific types of models, which have valid proofs.
Non-monotonic reasoning deals with incomplete or unknown facts.

3. In monotonic reasoning, additions to the knowledge base will not change the result. In
non-monotonic reasoning, additions to the knowledge base may invalidate previous
conclusions and change the result.

4. In monotonic reasoning, results are always true, so the set of derivable propositions
can only increase. In non-monotonic reasoning, the set of derivable propositions may
increase or decrease based on the condition of the added knowledge.

5. Monotonic reasoning is based on true facts. Non-monotonic reasoning is based on
assumptions.

6. Deductive reasoning is a monotonic type of reasoning. Abductive reasoning and human
reasoning are non-monotonic types of reasoning.

Non-monotonic reasoning deals with incomplete and uncertain models. Here we discuss a
general example of non-monotonic reasoning.

Suppose the knowledge base contains the following knowledge:

– Birds can fly.
– Dogs cannot fly.
– A crow is a bird.

From the above sentences, we can conclude that a crow can fly.

However, if we add one more sentence to the knowledge base, "A crow is a dog", which
concludes "A crow cannot fly", it invalidates the above conclusion.


Types of non-monotonic reasoning:

1. Non-monotonic logic and default logic
2. Circumscription
3. Truth maintenance systems

1. Non-monotonic Logic and Default Logic:
Default reasoning is a very common form of non-monotonic reasoning.
There are two approaches: (A) non-monotonic logic and (B) default logic.

(A) Non-Monotonic Logic:

Non-monotonic logic is predicate logic with one extension, the modal operator M, which
means "consistent with everything we know." The purpose of M is to allow for consistency.

(B) Default Logic:

Default logic introduces a new inference rule, (A : B) / C,
where A is known as the prerequisite, B as the justification and C as the consequent.

2. Circumscription

Circumscription is a non-monotonic logic used to formalize common sense assumptions.
Circumscription is a formalized rule of conjecture (guess) that can be used along with the
rules of inference of first-order logic.

3. Truth Maintenance System

A reason maintenance system (RMS) is a critical part of a reasoning system. Its purpose is
to ensure that the inferences made by the reasoning system (RS) are valid.

The reasoning system provides the RMS with information about each inference it performs,
and in return the RMS provides the RS with information about the whole set of inferences.

Fuzzy Logic

Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The approach of
FL imitates the way of decision making in humans that involves all intermediate possibilities
between digital values YES and NO. In fuzzy logic, the degree of truth is between 0 and 1.
Example: William is smart (0.8 truth)
Fuzzy logic works on the levels of possibility of the input to achieve a definite output.
Fuzzy logic is useful for commercial and practical purposes.
 It can control machines and consumer products.
 It may not give accurate reasoning, but acceptable reasoning.
 Fuzzy logic helps to deal with the uncertainty in engineering.

Fuzzy Logic Systems Architecture


It has four main parts as shown below −
 Fuzzification Module − It transforms the system inputs, which are crisp numbers, into fuzzy
sets. It splits the input signal into five levels such as −
LP x is Large Positive

MP x is Medium Positive

S x is Small

MN x is Medium Negative

LN x is Large Negative

 Knowledge Base − It stores IF-THEN rules provided by experts.


 Inference Engine − It simulates the human reasoning process by making fuzzy inference on
the inputs and IF-THEN rules.
 Defuzzification Module − It transforms the fuzzy set obtained by the inference engine into a
crisp value.

A membership function for a fuzzy set A on the universe of discourse X is defined as
µA : X → [0,1]. Here, each element of X is mapped to a value between 0 and 1, called the
membership value or degree of membership.
All membership functions for LP, MP, S, MN, and LN are shown as below −

Here, the input to 5-level fuzzifier varies from -10 volts to +10 volts. Hence the corresponding
output also changes.

Example of a Fuzzy Logic System


Let us consider an air conditioning system with a 5-level fuzzy logic system. This system adjusts
the temperature of the air conditioner by comparing the room temperature and the target temperature value.
Algorithm
 Define linguistic variables and terms.
 Construct membership functions for them.
 Construct knowledge base of rules.
 Convert crisp data into fuzzy data sets using membership functions. (fuzzification)
 Evaluate rules in the rule base. (inference engine)
 Combine results from each rule. (inference engine)
 Convert output data into non-fuzzy values. (defuzzification)
Logic Development
Step 1: Define linguistic variables and terms
Linguistic variables are input and output variables in the form of simple words or sentences. For
room temperature, cold, warm, hot, etc., are linguistic terms.

Temperature (t) = {very-cold, cold, warm, very-warm, hot}


Step 2: Construct membership functions for them
The membership functions of temperature variable are as shown −

Step3: Construct knowledge base rules


Create a matrix of room temperature values versus target temperature values that an air conditioning
system is expected to provide.
Room Temp./Target Very_Cold Cold Warm Hot Very_Hot

Very_Cold No_Change Heat Heat Heat Heat

Cold Cool No_Change Heat Heat Heat

Warm Cool Cool No_Change Heat Heat

Hot Cool Cool Cool No_Change Heat

Very_Hot Cool Cool Cool Cool No_Change


Build a set of rules into the knowledge base in the form of IF-THEN-ELSE structures.
Sr. No. Condition Action

1 IF temperature=(Cold OR Very_Cold) AND target=Warm THEN Heat

2 IF temperature=(Hot OR Very_Hot) AND target=Warm THEN Cool

3 IF (temperature=Warm) AND (target=Warm) THEN No_Change


Step 4: Obtain fuzzy value
Fuzzy set operations perform the evaluation of rules. The operations used for OR and AND are Max and
Min respectively. Combine all results of the evaluation to form a final result; this result is a fuzzy value.
Step 5: Perform defuzzification
Defuzzification is then performed according to the membership function for the output variable.
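The steps above can be sketched in miniature. The Python fragment below is an illustrative assumption, not the text's exact controller: the membership breakpoints, the three rules, and the weighted-average defuzzification are all invented for the example.

```python
# A minimal sketch of the fuzzy air-conditioner pipeline: fuzzify a
# temperature error, evaluate three illustrative rules, and defuzzify
# with a weighted average. All numbers here are assumptions.

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def ac_controller(room_temp, target_temp):
    e = room_temp - target_temp          # crisp input
    # Fuzzification of the error
    colder = tri(e, -20, -10, 0)         # room colder than target
    near   = tri(e, -5, 0, 5)
    hotter = tri(e, 0, 10, 20)           # room hotter than target
    # Rules: IF colder THEN Heat; IF near THEN No_Change; IF hotter THEN Cool
    # Defuzzification: weighted average over crisp actions (+1 heat, -1 cool)
    strength = {1.0: colder, 0.0: near, -1.0: hotter}
    total = sum(strength.values()) or 1.0
    return sum(action * w for action, w in strength.items()) / total

print(ac_controller(15, 22))   # > 0 -> heat
print(ac_controller(30, 22))   # < 0 -> cool
```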

Application Areas of Fuzzy Logic


Automotive Systems
 Automatic Gearboxes
 Four-Wheel Steering
 Vehicle environment control
Consumer Electronic Goods
 Hi-Fi Systems
 Photocopiers
 Still and Video Cameras
 Television
Domestic Goods
 Microwave Ovens
 Refrigerators
 Toasters
 Vacuum Cleaners
 Washing Machines
Environment Control
 Air Conditioners/Dryers/Heaters
 Humidifiers

Advantages of Fuzzy Logic System


 Mathematical concepts within fuzzy reasoning are very simple.
 The Fuzzy Logic System can be modified by just adding or deleting rules, owing to the
flexibility of fuzzy logic.
 Fuzzy logic Systems can take imprecise, distorted, noisy input information.
 FLSs are easy to construct and understand.
 Fuzzy logic is a solution to complex problems in all fields of life, including medicine, as it
resembles human reasoning and decision making.
Disadvantages of Fuzzy Logic System
 There is no systematic approach to fuzzy system designing.
 They are understandable only when simple.
 They are suitable for the problems which do not need high accuracy.

Fuzzy Inference

Fuzzy Inference System is the key unit of a fuzzy logic system having decision making as its
primary work. It uses the “IF…THEN” rules along with connectors “OR” or “AND” for
drawing essential decision rules.
Characteristics of Fuzzy Inference System

Following are some characteristics of FIS −


 The output from FIS is always a fuzzy set irrespective of its input which can be fuzzy
or crisp.
 It is necessary to have crisp output when the FIS is used as a controller.
 A defuzzification unit would be there with FIS to convert fuzzy variables into crisp
variables.
Functional Blocks of FIS
The following five functional blocks will help you understand the construction of FIS −
 Rule Base − It contains fuzzy IF-THEN rules.
 Database − It defines the membership functions of fuzzy sets used in fuzzy rules.
 Decision-making Unit − It performs operation on rules.
 Fuzzification Interface Unit − It converts the crisp quantities into fuzzy quantities.
 Defuzzification Interface Unit − It converts the fuzzy quantities into crisp
quantities. Following is a block diagram of the fuzzy inference system.

Working of FIS
The working of the FIS consists of the following steps −
 A fuzzification unit supports the application of numerous fuzzification methods and
converts the crisp input into fuzzy input.
 A knowledge base − the collection of the rule base and the database − is formed upon the
conversion of crisp input into fuzzy input.
 Finally, the defuzzification unit converts the fuzzy output into crisp output.
Methods of FIS
Let us now discuss the different methods of FIS. Following are the two important methods
of FIS, having different consequent of fuzzy rules −

 Mamdani Fuzzy Inference System
 Takagi-Sugeno Fuzzy Model (TS Method)

Mamdani Fuzzy Inference System

This system was proposed in 1975 by Ebrahim Mamdani. Basically, it was designed to
control a steam engine and boiler combination by synthesizing a set of fuzzy rules obtained
from people working on the system.

Steps for Computing the Output

Following steps need to be followed to compute the output from this FIS −
 Step 1 − Set of fuzzy rules need to be determined in this step.
 Step 2 − In this step, by using input membership function, the input would be made
fuzzy.
 Step 3 − Now establish the rule strength by combining the fuzzified inputs according
to fuzzy rules.
 Step 4 − In this step, determine the consequent of rule by combining the rule strength
and the output membership function.
 Step 5 − For getting output distribution combine all the consequents.
 Step 6 − Finally, a defuzzified output distribution is obtained.
Following is a block diagram of the Mamdani Fuzzy Inference System.

Takagi-Sugeno Fuzzy Model (TS Method)


This model was proposed by Takagi, Sugeno and Kang in 1985. The format of the rule is given
as −
IF x is A and y is B THEN z = f(x, y)
Here, A and B are fuzzy sets in the antecedent and z = f(x, y) is a crisp function in the consequent.

Fuzzy Inference Process

The fuzzy inference process under Takagi-Sugeno Fuzzy Model (TS Method) works in the
following way −
 Step 1: Fuzzifying the inputs − Here, the inputs of the system are made fuzzy.

 Step 2: Applying the fuzzy operator − In this step, the fuzzy operators must be
applied to get the output.

Rule Format of the Sugeno Form

The rule format of the Sugeno form is given by −

IF x is A and y is B THEN z = ax + by + c
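A Sugeno system's output is a firing-strength-weighted average of the rule outputs z = ax + by + c. The sketch below assumes invented memberships and rule coefficients purely for illustration.

```python
# Hedged sketch of first-order Sugeno rule evaluation. Each rule pairs a
# firing-strength function (the fuzzified antecedent) with coefficients
# (a, b, c); the memberships and coefficients below are assumptions.

def sugeno(x, y, rules):
    """rules: list of (firing_strength_fn, (a, b, c)) pairs."""
    ws, zs = [], []
    for strength, (a, b, c) in rules:
        w = strength(x, y)              # AND of antecedent memberships
        ws.append(w)
        zs.append(a * x + b * y + c)    # crisp consequent z = ax + by + c
    # Weighted average of the rule consequents
    return sum(w * z for w, z in zip(ws, zs)) / (sum(ws) or 1.0)

rules = [
    (lambda x, y: min(x / 10, y / 10), (1.0, 0.5, 0.0)),
    (lambda x, y: min(1 - x / 10, 1 - y / 10), (0.2, 0.1, 1.0)),
]
print(sugeno(4.0, 6.0, rules))
```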
Comparison between the two methods
Let us now understand the comparison between the Mamdani System and the Sugeno
Model.
 Output Membership Function − The main difference between them is on the basis
of output membership function. The Sugeno output membership functions are either
linear or constant.
 Aggregation and Defuzzification Procedure − The difference between them also
lies in the consequence of fuzzy rules and due to the same their aggregation and
defuzzification procedure also differs.
 Mathematical Rules − More mathematical rules exist for the Sugeno rule than the
Mamdani rule.
 Adjustable Parameters − The Sugeno controller has more adjustable parameters
than the Mamdani controller.

Neural Networks

The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are known as
nodes.

The given figure illustrates the typical diagram of Biological Neural Network.

The typical Artificial Neural Network looks something like the given figure.

Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.

Relationship between Biological neural network and artificial neural network:

Biological Neural Network Artificial Neural Network

Dendrites Inputs

Cell nucleus Nodes

Synapse Weights

Axon Output

An artificial neural network attempts to mimic the network of neurons that makes up the
human brain, so that computers have an option to understand things and make decisions in a
human-like manner. The artificial neural network is designed by programming computers to
behave simply like interconnected brain cells.

There are on the order of 100 billion neurons in the human brain. Each neuron has somewhere
between 1,000 and 100,000 connection points. In the human brain, data is stored in a
distributed manner, and we can extract more than one piece of this data from memory in
parallel when necessary. We can say that the human brain is made up of incredibly amazing
parallel processors.

We can understand the artificial neural network with an example. Consider a digital logic
gate that takes an input and gives an output, say an "OR" gate, which takes two inputs.
If one or both inputs are "On," the output is "On." If both inputs are "Off," the output is
"Off." Here the output depends upon the input. Our brain does not perform the same task:
the output-to-input relationship keeps changing because the neurons in our brain are
"learning."

The architecture of an artificial neural network:

To understand the architecture of an artificial neural network, we have to understand what a
neural network consists of. A neural network consists of a large number of artificial
neurons, termed units, arranged in a sequence of layers. Let us look at the various types of
layers available in an artificial neural network.

Artificial Neural Network primarily consists of three layers:

Input Layer:

As the name suggests, it accepts inputs in several different formats provided by the
programmer.

Hidden Layer:

The hidden layer is present in between the input and output layers. It performs all the
calculations to find hidden features and patterns.

Output Layer:

The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.

The artificial neural network takes the inputs, computes their weighted sum, and includes a
bias. This computation is represented in the form of a transfer function.

The weighted total is passed as an input to an activation function to produce the output.
Activation functions decide whether a node should fire or not. Only the nodes that fire
reach the output layer. There are distinct activation functions available, applied depending
on the sort of task being performed.

How do artificial neural networks work?



An artificial neural network can be best represented as a weighted directed graph, where the
artificial neurons form the nodes. The associations between neuron outputs and neuron inputs
can be viewed as directed edges with weights. The artificial neural network receives the
input signal from an external source in the form of a pattern or image as a vector. These
inputs are then mathematically denoted x(n) for each of the n inputs.

Afterward, each input is multiplied by its corresponding weight (these weights are the
details used by the artificial neural network to solve a specific problem). In general
terms, these weights represent the strength of the interconnections between neurons inside
the artificial neural network. All the weighted inputs are summed inside the computing unit.

If the weighted sum is equal to zero, a bias is added to make the output non-zero, or to
otherwise scale up the system's response. The bias has an input of 1 and a weight equal to 1.
The total of the weighted inputs can lie in the range 0 to positive infinity. To keep the
response within the limits of the desired value, a certain maximum value is benchmarked, and
the total of the weighted inputs is passed through the activation function.

The activation function refers to the set of transfer functions used to achieve the desired
output. There are different kinds of activation functions, primarily either linear or
non-linear sets of functions. Some of the commonly used activation functions are the binary,
linear and tan hyperbolic sigmoidal activation functions. Let us take a look at them in
detail:

Binary:

In the binary activation function, the output is either a 1 or a 0. To accomplish this, a
threshold value is set up. If the net weighted input of the neuron is more than 1, the final
output of the activation function is returned as 1; otherwise the output is 0.

Sigmoidal Hyperbolic:

The sigmoidal hyperbola function is generally seen as an "S"-shaped curve. Here the tan
hyperbolic function is used to approximate output from the actual net input. The function is
defined as:

F(x) = 1 / (1 + exp(−λx))

where λ is considered the steepness parameter.
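A single neuron as described above (weighted sum plus bias, then an activation) can be sketched as follows; the weights, inputs, threshold and steepness value are illustrative assumptions, not values from the text.

```python
import math

# Sketch of one artificial neuron: weighted sum of inputs plus a bias,
# then a binary or sigmoidal activation. All numbers are assumptions.

def neuron(inputs, weights, bias=1.0):
    """Weighted sum of inputs, plus the bias term."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def binary(net, threshold=1.0):
    """Binary activation: fire (1) only above the threshold."""
    return 1 if net > threshold else 0

def sigmoid(net, steepness=1.0):
    """Sigmoidal activation F(x) = 1 / (1 + exp(-lambda * x))."""
    return 1.0 / (1.0 + math.exp(-steepness * net))

net = neuron([0.5, 0.9], [0.8, 0.4])
print(net, binary(net), sigmoid(net))
```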

Types of Artificial Neural Network:

There are various types of artificial neural networks (ANNs) which, depending on the neuron
and network functions being modeled, perform tasks in a way similar to the human brain. The
majority of artificial neural networks have some similarities with their more complex
biological counterparts and are very effective at their expected tasks, for example
segmentation or classification.

Feedback ANN:

In this type of ANN, the output returns into the network to achieve the best-evolved
results internally. As per the University of Massachusetts Lowell Centre for Atmospheric
Research, feedback networks feed information back into themselves and are well suited to
solving optimization problems. Internal system error corrections utilize feedback ANNs.

Feed-Forward ANN:

A feed-forward network is a basic neural network comprising an input layer, an output
layer, and at least one layer of neurons. Through assessment of its output by reviewing its
input, the intensity of the network can be noticed based on the group behavior of the
associated neurons, and the output is decided. The primary advantage of this network is that
it learns to evaluate and recognize input patterns.

Advantages of Artificial Neural Network (ANN)

Parallel processing capability:

Owing to their parallel structure, artificial neural networks can perform more than one task
simultaneously.

Storing data on the entire network:

Unlike in traditional programming, data is stored on the whole network rather than in a
database. The disappearance of a few pieces of data in one place does not prevent the
network from working.

Capability to work with incomplete knowledge:



After training, an ANN may produce output even with incomplete data. The loss of
performance here depends on the significance of the missing data.

Having a memory distribution:

For an ANN to be able to adapt, it is important to determine the examples and to train the
network according to the desired output by demonstrating these examples to it. The success
of the network is directly proportional to the chosen instances; if the event cannot be
shown to the network in all its aspects, the network can produce false output.

Having fault tolerance:

Extortion of one or more cells of ANN does not prohibit it from generating output, and this
feature makes the network fault-tolerance.

Disadvantages of Artificial Neural Network:

Assurance of proper network structure:

There is no particular guideline for determining the structure of artificial neural networks.
The appropriate network structure is accomplished through experience, trial, and error.

Unrecognized behavior of the network:

This is the most significant issue with ANNs. When an ANN produces a solution, it does not
provide insight into why and how it arrived at it, which decreases trust in the network.

Hardware dependence:

Artificial neural networks need processors with parallel processing power, in accordance
with their structure. Therefore, their realization depends on suitable hardware.

Difficulty of showing the issue to the network:

ANNs can work only with numerical data. Problems must be converted into numerical values
before being introduced to the ANN. The representation mechanism chosen here directly
impacts the performance of the network; this depends on the user's abilities.

The duration of the network is unknown:

The network error is reduced to a specific value, and this value does not guarantee optimum
results.

Applications of Neural Networks


They can perform tasks that are easy for a human but difficult for a machine −
 Aerospace − Aircraft autopilot, aircraft fault detection.
 Automotive − Automobile guidance systems.

 Military − Weapon orientation and steering, target tracking, object discrimination,


facial recognition, signal/image identification.
 Electronics − Code sequence prediction, IC chip layout, chip failure analysis,
machine vision, voice synthesis.
 Financial − Real estate appraisal, loan advisor, mortgage screening, corporate bond
rating, portfolio trading program, corporate financial analysis, currency value
prediction, document readers, credit application evaluators.
 Industrial − Manufacturing process control, product design and analysis, quality
inspection systems, welding quality analysis, paper quality prediction, chemical
product design analysis, dynamic modeling of chemical process systems, machine
maintenance analysis, project bidding, planning, and management.
 Medical − Cancer cell analysis, EEG and ECG analysis, prosthetic design, transplant
time optimizer.
 Speech − Speech recognition, speech classification, text to speech conversion.
 Telecommunications − Image and data compression, automated information
services, real-time spoken language translation.
 Transportation − Truck Brake system diagnosis, vehicle scheduling, routing
systems.
 Software − Pattern Recognition in facial recognition, optical character recognition,
etc.
 Time Series Prediction − ANNs are used to make predictions on stocks and natural
calamities.
 Signal Processing − Neural networks can be trained to process an audio signal and
filter it appropriately in the hearing aids.
 Control − ANNs are often used to make steering decisions of physical vehicles.
 Anomaly Detection − As ANNs are expert at recognizing patterns, they can also be
trained to generate an output when something unusual occurs that misfits the pattern.

Neuro-fuzzy Inference

An adaptive neuro-fuzzy inference system (ANFIS) is an application of adaptive neuro-fuzzy
logic that uses a framework of Artificial Intelligence (AI). Its inference system
corresponds to a set of fuzzy IF–THEN rules in fuzzy neural networks. It has the learning
capability to approximate nonlinear functions.

Hence, an adaptive neuro-fuzzy inference system (ANFIS) can be considered a universal
estimator.
Application of Adaptive Neuro-Fuzzy Inference System in Artificial Intelligence

Principle: flowchart for speech recognition to establish a fuzzy inference system through
ANFIS using a wavelet transform.

Adaptive Neuro-Fuzzy Inference System architecture

There are two parts to the ANFIS network structure, namely the premise and the consequence
parts. The architecture basically has five layers: the fuzzy layer, the product layer (π),
the normalized layer (N), the defuzzification layer and the total output layer.

The first layer is the fuzzification layer. It takes the input values and determines the
membership functions belonging to them. The membership degree of each function is computed
using a specific parameter set.

The second layer is the "rule layer". It is responsible for generating the firing strengths
for the rules.

The third layer normalizes the computed firing strengths by dividing each value by the total
firing strength.

The fourth layer takes the normalized values and the consequence parameter set as input. It
returns the defuzzified values, and those values are passed to the last layer to return the
final output.

The proposed algorithm performs speech recognition using a wavelet transform, subtractive
clustering and ANFIS.

ANFIS layer - Fuzzification layer

The first layer of an ANFIS architecture shows the difference from a neural network. Neural
networks operate with a data pre-processing step in which the features are converted into
normalized values between 0 and 1; an ANFIS network doesn't need a sigmoid function for this.

Let's take an example. Say the network gets the distance between two points in 2D space as
input. The distance is measured in pixels and has values from 0 up to 500 pixels. With the
help of membership functions, the numerical values are converted into fuzzy numbers, which
consist of semantic descriptions like near, middle and far. An individual neuron represents
each possible value.
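A possible sketch of this fuzzification step is shown below; the breakpoints for near, middle and far are assumptions chosen for illustration, not values from the text.

```python
# Sketch of the distance example: crisp pixel distances (0..500) mapped to
# fuzzy degrees for "near", "middle" and "far". Breakpoints are assumed.

def tri(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify_distance(d):
    return {
        "near":   max(0.0, 1.0 - d / 150.0),          # full membership at 0 px
        "middle": tri(d, 100, 250, 400),
        "far":    min(1.0, max(0.0, (d - 300) / 200.0)),  # saturates at 500 px
    }

print(fuzzify_distance(60))    # mostly "near"
print(fuzzify_distance(250))   # fully "middle"
```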

Adaptive Neuro-Fuzzy Inference System Analysis (Pros and Cons)-

Bayes Rule and its Applications

Bayes' theorem:

Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.

In probability theory, it relates the conditional probability and marginal probabilities of two
random events.

Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian
inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics.

It is a way to calculate the value of P(B|A) with the knowledge of P(A|B).

Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.

Example: If cancer is related to one's age, then by using Bayes' theorem we can determine
the probability of cancer more accurately with the help of age.

Bayes' theorem can be derived using product rule and conditional probability of event A with
known event B:

From the product rule we can write:

P(A ⋀ B) = P(A|B) P(B)

Similarly, the probability of event B with known event A:

P(A ⋀ B) = P(B|A) P(A)

Equating the right-hand sides of both equations, we get:

P(A|B) = P(B|A) P(A) / P(B)    ...(a)

The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis
of most modern AI systems for probabilistic inference.

It shows the simple relationship between joint and conditional probabilities. Here,

P(A|B) is known as the posterior, which we need to calculate. It is read as the probability
of hypothesis A given that evidence B has occurred.

P(B|A) is called the likelihood: assuming the hypothesis is true, we calculate the
probability of the evidence.

P(A) is called the prior probability: the probability of the hypothesis before considering
the evidence.

P(B) is called the marginal probability: the pure probability of the evidence.

In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be
written as:

P(Ai|B) = P(B|Ai) P(Ai) / Σk P(B|Ak) P(Ak)

where A1, A2, A3, ..., An is a set of mutually exclusive and exhaustive events.

Applying Bayes' rule:

Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
This is very useful when we have good estimates of these three terms and want to determine
the fourth. Suppose we want to perceive the effect of some unknown cause and compute that
cause; then Bayes' rule becomes:

P(cause|effect) = P(effect|cause) P(cause) / P(effect)

Example-1:

Question: What is the probability that a patient has meningitis given that they have a
stiff neck?

Given Data:

A doctor is aware that the disease meningitis causes a patient to have a stiff neck 80% of
the time. He is also aware of some more facts, which are given as follows:

o The known probability that a patient has the meningitis disease is 1/30,000.
o The known probability that a patient has a stiff neck is 2%.

Let a be the proposition that the patient has a stiff neck and b the proposition that the
patient has meningitis. Then we can calculate the following:

P(a|b) = 0.8

P(b) = 1/30000

P(a) = 0.02

Applying Bayes' rule: P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133.

Hence, we can assume that about 1 patient out of 750 patients with a stiff neck has
meningitis.
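As a quick check, the same calculation can be done in a few lines of Python using the numbers given above.

```python
# Bayes' rule on the meningitis example: P(b|a) = P(a|b) * P(b) / P(a).

p_a_given_b = 0.8        # P(stiff neck | meningitis)
p_b = 1 / 30000          # P(meningitis)
p_a = 0.02               # P(stiff neck)

p_b_given_a = p_a_given_b * p_b / p_a
print(p_b_given_a)       # ~0.00133, i.e. about 1 in 750
```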

Example-2:

Question: From a standard deck of playing cards, a single card is drawn. The
probability that the card is king is 4/52, then calculate posterior probability
P(King|Face), which means the drawn face card is a king card.

Solution:

P(King): probability that the card is a king = 4/52 = 1/13

P(Face): probability that a card is a face card = 12/52 = 3/13

P(Face|King): probability of a face card given that it is a king = 1

Putting all values into Bayes' rule, we get:

P(King|Face) = P(Face|King) P(King) / P(Face) = 1 × (1/13) / (3/13) = 1/3

Application of Bayes' theorem in Artificial intelligence:

Following are some applications of Bayes' theorem:

o It is used to calculate the next step of the robot when the already executed step is
given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.

BAYESIAN NETWORK
• A Bayesian network is a probabilistic graphical model that represents a set of variables and
their probabilistic dependencies. It is otherwise known as a Bayes net, Bayesian belief
network or simply belief network. A Bayesian network specifies a joint distribution in a
structured form. It represents dependencies and independence via a directed graph: a network
of concepts linked with conditional probabilities.

• Bayesian network consists of


– Nodes = random variables
– Edges = direct dependence
• Directed edges => direct dependence
• Absence of an edge => conditional independence
• Requires that graph is acyclic (no directed cycles)
• 2 components to a Bayesian network
– The graph structure (conditional independence assumptions)
– The numerical probabilities (for each variable given its parents)

For example, suppose evidence says that a lab produces 98% accurate results. It means that a
person X has a 98% chance of having malaria and a 2% chance of not having it. This factor is
called the uncertainty factor, and it is the reason we go for Bayesian theory. Bayesian
theory is also known as probability learning.

The probabilities are numeric values between 0 and 1 that represent uncertainties.
i) Simple Bayesian network

p(A,B,C) = p(C|A,B)p(A)p(B)
ii) 3-way Bayesian network (Marginal Independence)

p(A,B,C) = p(A) p(B) p(C)


iii) 3-way Bayesian network (Conditionally independent effects)

p(A,B,C) = p(B|A)p(C|A)p(A)
B and C are conditionally independent Given A
iv) 3-way Bayesian network (Markov dependence)

p(A,B,C) = p(C|B) p(B|A)p(A)

Problem 1
You have a new burglar alarm installed. It reliably detects burglary, but also responds to
minor earthquakes. Two neighbors (John and Mary) promise to call you at work when they hear
the alarm. John always calls when he hears the alarm, but sometimes confuses it with the
phone ringing. Mary likes loud music and sometimes misses the alarm. Find the probability of
the event that the alarm has sounded but neither a burglary nor an earthquake has occurred,
and both Mary and John call.
Consider 5 binary variables:
B = Burglary occurs at your house
E = Earthquake occurs at your home
A = Alarm goes off
J = John calls to report the alarm
M = Mary calls to report the alarm

Probability of the event that the alarm has sounded but neither a burglary nor an earthquake
has occurred, and both Mary and John call:
P(J, M, A, ¬E, ¬B) = P(J|A) · P(M|A) · P(A|¬E, ¬B) · P(¬E) · P(¬B)
= 0.90 × 0.70 × 0.001 × 0.998 × 0.999
≈ 0.00063
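The same joint probability can be computed directly from the network's conditional probability tables. The priors P(B) = 0.001 and P(E) = 0.002 below are the standard textbook values assumed for this example; only the factors actually used in the product are shown.

```python
# Joint probability P(J, M, A, ~E, ~B) from the alarm network's CPT entries.
# Priors for B and E are the usual textbook assumptions for this problem.

p_b, p_e = 0.001, 0.002                    # P(Burglary), P(Earthquake)
p_a_given_not_e_not_b = 0.001              # P(A | ~E, ~B); other rows unused
p_j_given_a, p_m_given_a = 0.90, 0.70      # P(J|A), P(M|A)

p = (p_j_given_a * p_m_given_a
     * p_a_given_not_e_not_b
     * (1 - p_e) * (1 - p_b))              # P(~E) * P(~B)
print(p)                                   # ~0.00063
```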
Problem 2
Rain influences sprinkler usage, and rain and sprinkler together influence whether the grass
is wet. What is the probability that it is raining, the sprinkler is on, and the grass is wet?

Solution
Let S= Sprinkler
R=Rain
G=Grass wet
P(G,S,R)=P(G|S,R).P(S|R).P(R)
=0.99*0.01*0.2
=0.00198
Problem 3
Bayesian Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data sample
X = (age <= 30, Income = medium, Student = yes, Credit_rating = Fair)

age income student credit_rating buys_computer


<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Solution
• P(Ci):
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14= 0.357
• Compute P(X|Ci) for each class
P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222
P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6
P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444
P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4
P(student = “yes” | buys_computer = “yes) = 6/9 = 0.667
P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2
P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667
P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4
• X = (age <= 30 , income = medium, student = yes, credit_rating = fair)

P(X|Ci) :
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) :
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
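The same naive Bayes computation can be scripted; the sketch below just multiplies the probabilities estimated from the 14-row table above.

```python
# Naive Bayes for the buys_computer example: unnormalized posterior is
# P(class) times the product of P(feature | class) for each feature of X.

def score(prior, likelihoods):
    """Unnormalized posterior: prior * product of per-feature likelihoods."""
    p = prior
    for l in likelihoods:
        p *= l
    return p

# Features of X: age <= 30, income = medium, student = yes, credit = fair
p_yes = score(9 / 14, [2 / 9, 4 / 9, 6 / 9, 6 / 9])
p_no  = score(5 / 14, [3 / 5, 2 / 5, 1 / 5, 2 / 5])

print(round(p_yes, 3), round(p_no, 3))            # ~0.028 vs ~0.007
print("buys_computer =", "yes" if p_yes > p_no else "no")
```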

Problem 4
Did the patient have malignant tumour or not?
A patient takes a lab test and the result comes back positive. The test returns a correct
positive result in only 98% of the cases in which a malignant tumour is actually present, and
a correct negative result in only 97% of the cases in which it is not present. Furthermore,
0.008 of the entire population have this tumour.
Solution:
P(tumour) = 0.008, P(¬tumour) = 0.992
P(+|tumour) = 0.98, P(−|tumour) = 0.02
P(+|¬tumour) = 0.03, P(−|¬tumour) = 0.97
For a positive test result:
P(+|tumour) P(tumour) = 0.98 × 0.008 = 0.0078
P(+|¬tumour) P(¬tumour) = 0.03 × 0.992 = 0.0298

The (unnormalized) probability of not having the tumour is higher, so the person most likely
does not have a malignant tumour.

Hidden Markov Model (HMM)

We use an HMM when we cannot observe the states themselves but only the result of some
probability function (observation) of the states. An HMM is a statistical Markov model in
which the system being modeled is assumed to be a Markov process with unobserved (hidden)
states.

A Markov model is a stochastic method for randomly changing systems where it is assumed
that future states depend only on the current state and not on past states. These models
show all possible states as well as the transitions, rates of transition and probabilities
between them.

Markov models are often used to model the probabilities of different states and the rates of
transitions among them. The method is generally used to model systems. Markov models can
also be used to recognize patterns, make predictions and to learn the statistics of sequential
data.

There are four types of Markov models that are used situationally:

 Markov chain - used by systems that are autonomous and have fully observable states

 Hidden Markov model - used by systems that are autonomous where the state is partially
observable.

 Markov decision processes - used by controlled systems with a fully observable state.

 Partially observable Markov decision processes - used by controlled systems where the
state is partially observable.

Markov Model: a series of (hidden) states z = {z_1, z_2, ...} drawn from a state alphabet
S = {s_1, s_2, ..., s_|S|}, where z_i belongs to S.

Hidden Markov Model: a series of observed outputs x = {x_1, x_2, ...} drawn from an output
alphabet V = {v_1, v_2, ..., v_|V|}, where x_i belongs to V.

Assumptions of HMM

HMM too is built upon several assumptions, and the following are vital.

 Output independence assumption: the output observation is conditionally independent of
all other hidden states and all other observations given the current hidden state.

 Emission probability matrix: the probability of the hidden state generating output v_i
given that the state at the corresponding time was s_j.

Hidden Markov Model as a finite state machine

Consider the example given below in Fig.3, which elaborates how a person feels under
different climates.

Fig.3. Markov model as a finite state machine

Set of states (S) = {Happy, Grumpy}

Set of hidden states (Q) = {Sunny, Rainy}

State series over time: z ∈ S^T

Observed states for four days = {z1 = Happy, z2 = Grumpy, z3 = Grumpy, z4 = Happy}

 The feeling that you perceive from a person's emoting is called the observation, since
you observe it.

 The weather that influences the feeling of a person is called the hidden state, since you
can't observe it.

Emission probabilities

In the above example, only the feelings (Happy or Grumpy) can be observed. One can observe
that a person has an 80% chance of being Happy given that the climate at the particular
point of observation (or rather, day in this case) is Sunny. Similarly, there is a 60%
chance of a person being Grumpy given that the climate is Rainy. The 80% and 60% mentioned
here are emission probabilities, since they deal with observations.

Transition probabilities

When we consider the climates (hidden states) that influence the observations, there are
correlations between consecutive days being Sunny or consecutive days being Rainy. There is
an 80% chance that a Sunny day is followed by another Sunny day, and a 60% chance that a
Rainy day is followed by another Rainy day. The probabilities that explain the transitions
to/from hidden states are transition probabilities.

Three important questions in HMM are

1. What is the probability of an observed sequence?

2. What is the most likely series of states to generate an observed sequence?

3. How can we learn the values for the HMM's parameters A and B given some data?

1. Probability of Observed Sequence

We have to add up the likelihood of the data x given every possible series of hidden states.
This leads to a complexity of O(|S|^T). Hence, two alternative procedures were introduced to
find the probability of an observed sequence more efficiently.

 Forward Procedure

Calculate the total probability of all the observations from t = 1 up to time t:

α_i(t) = P(x_1, x_2, ..., x_t, z_t = s_i; A, B)

 Backward Procedure

Similarly, calculate the total probability of all the observations from the final time T
back to t + 1:

β_i(t) = P(x_{t+1}, x_{t+2}, ..., x_T | z_t = s_i; A, B)

Example using forward procedure

S = {hot, cold}

V = {v1 = 1 ice cream, v2 = 2 ice creams, v3 = 3 ice creams}, where V is the number of ice
creams consumed on a day.

Example sequence = {x1 = v2, x2 = v3, x3 = v1, x4 = v2}

Fig.4. Given data as matrices



Fig.5. Generated finite state machines for the HMM

We first need to calculate the prior probabilities (that is, the probability of being hot or
cold prior to any actual observation). These can be obtained from S_0 or π. From Fig.4, S_0
is given as 0.6 and 0.4, which are the prior probabilities. Then, based on the Markov and
HMM assumptions, we follow the steps in Fig.6, Fig.7 and Fig.8 below to calculate the
probability of the given sequence.

1. For the first observed output x1 = v2

Fig.6. Step 1

2. For the observed output x2 = v3

Fig.7. Step 2

3. For the observed outputs x3 and x4

Similarly, for x3 = v1 and x4 = v2, we simply multiply along the paths that lead to v1 and v2.

Fig.8. Steps 3 and 4
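A compact implementation of the forward procedure is sketched below. The priors (0.6, 0.4) come from the text; the transition and emission matrices stand in for the figures that are not reproduced here and are illustrative assumptions only.

```python
# Forward procedure for the ice-cream HMM. Priors are from the text; the
# transition/emission numbers are assumptions replacing the missing Fig.4.

states = ("hot", "cold")
prior = {"hot": 0.6, "cold": 0.4}
trans = {"hot": {"hot": 0.7, "cold": 0.3},
         "cold": {"hot": 0.4, "cold": 0.6}}
emit = {"hot": {1: 0.2, 2: 0.4, 3: 0.4},
        "cold": {1: 0.5, 2: 0.4, 3: 0.1}}

def forward(obs):
    """alpha_i(t) = P(x_1..x_t, z_t = s_i); returns P(obs) = sum_i alpha_i(T)."""
    alpha = {s: prior[s] * emit[s][obs[0]] for s in states}
    for x in obs[1:]:
        # Sum over all previous states, then weight by the emission of x
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][x]
                 for s in states}
    return sum(alpha.values())

print(forward([2, 3, 1, 2]))   # probability of the observed sequence
```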

2. Maximum likelihood assignment

For a given observed sequence of outputs _ , we intend to find the most likely series
of states _ . We can understand this with an example found below.

Fig.9. Data for example 2

Fig.10. Markov model as a finite state machine from the Fig.9 data

The Viterbi algorithm is a dynamic programming algorithm similar to the forward procedure,
often used to find the maximum likelihood path. Instead of tracking the total probability of
generating the observations, it tracks the maximum probability and the corresponding state
sequence.

Consider the sequence of emotions H, H, G, G, G, H for 6 consecutive days. Using the Viterbi
algorithm, we will find the most likely series of hidden states.

Fig.11. The Viterbi algorithm requires choosing the best path

There will be several paths that lead to Sunny on Saturday and many paths that lead to Rainy
on Saturday. Here we intend to identify the best path up to Sunny or Rainy Saturday and
multiply it by the emission probability of Happy (since on Saturday the person feels Happy).

Let's consider a Sunny Saturday. The previous day (Friday) can be Sunny or Rainy. Then we
need to know the best path up to Friday and multiply by the emission probabilities that lead
to the Grumpy feeling. Iteratively, we need to figure out the best path at each day, ending
up with the most likely series of days.

Fig.12. Step 1

Fig.13. Step 2

Fig.14. Iterate the algorithm to choose the best path

The algorithm leaves you with maximum likelihood values, and we can now produce the state
sequence with maximum likelihood for a given output sequence.
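A compact Viterbi sketch for the weather/mood example follows. The 80%/60% transition and emission figures come from the text; the uniform prior and the remaining matrix entries are assumptions made to complete the example.

```python
# Viterbi for the weather/mood example: per day and per state, keep the
# max probability and a backpointer, then trace the best path backwards.

states = ("Sunny", "Rainy")
prior = {"Sunny": 0.5, "Rainy": 0.5}                 # assumed uniform prior
trans = {"Sunny": {"Sunny": 0.8, "Rainy": 0.2},      # 80% Sunny -> Sunny
         "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}      # 60% Rainy -> Rainy
emit = {"Sunny": {"H": 0.8, "G": 0.2},               # 80% Happy if Sunny
        "Rainy": {"H": 0.4, "G": 0.6}}               # 60% Grumpy if Rainy

def viterbi(obs):
    v = [{s: prior[s] * emit[s][obs[0]] for s in states}]
    back = []
    for x in obs[1:]:
        col, ptr = {}, {}
        for s in states:
            prev, p = max(((q, v[-1][q] * trans[q][s]) for q in states),
                          key=lambda t: t[1])
            col[s], ptr[s] = p * emit[s][x], prev
        v.append(col)
        back.append(ptr)
    last = max(v[-1], key=v[-1].get)                 # best final state
    path = [last]
    for ptr in reversed(back):                       # follow backpointers
        path.append(ptr[path[-1]])
    return list(reversed(path)), v[-1][last]

print(viterbi(["H", "H", "G", "G", "G", "H"]))
```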

3. Learn the values for the HMM's parameters A and B

Learning in HMMs involves estimating the state transition probabilities A and the output
emission probabilities B that make an observed sequence most likely.
Expectation-Maximization algorithms are used for this purpose. The Baum-Welch algorithm,
which falls under this category and uses the forward-backward algorithm, is widely used.

DEMPSTER - SHAFER THEORY

Dempster-Shafer theory is an approach to combining evidence. Dempster (1967) developed
means for combining degrees of belief derived from independent items of evidence. In this
theory, it is possible to believe that something could be both true and false to some degree.
Each fact has a degree of support between 0 and 1:
– 0: no support for the fact
– 1: full support for the fact
• It differs from the Bayesian approach in that:
– Belief in a fact and its negation need not sum to 1.
– Both values can be 0 (meaning there is no evidence for or against the fact).

Set of possible conclusions:

Θ = {θ1, θ2, ..., θn}
– Θ is the set of possible conclusions to be drawn.
– The θi are mutually exclusive: at most one can be true.
– Θ is exhaustive: at least one θi has to be true.

Mass function m(A) (where A is a member of the power set): the proportion of all evidence
that supports this element of the power set.
• Each m(A) is between 0 and 1.
• All m(A) sum to 1.
• m(Ø) is 0, since at least one conclusion must be true.
Belief in A:
The belief in an element A of the power set is the sum of the masses of the elements which
are subsets of A (including A itself). For example, if A = {q1, q2, q3}:
Bel(A) = m({q1}) + m({q2}) + m({q3}) + m({q1,q2}) + m({q2,q3}) + m({q1,q3}) + m({q1,q2,q3})
Plausibility of A, pl(A):
The plausibility of an element A is the sum of all the masses of the sets that intersect the
set A:
pl({B,J}) = m({B}) + m({J}) + m({B,J}) + m({B,S}) + m({J,S}) + m({B,J,S})

Problem 1
• 4 people (B, J, S and K) are locked in a room when the lights go out.
• When the lights come on, K is dead, stabbed with a knife.
• Not suicide (stabbed in the back)
• No-one entered the room.
• Assume only one killer.
• Θ = {B, J, S}
• P(Θ) = {Ø, {B}, {J}, {S}, {B,J}, {B,S}, {J,S}, {B,J,S}}
Mass function m(A):
• The detectives, after reviewing the crime scene, assign mass probabilities to the various
elements of the power set.

The certainty associated with a given subset A is defined by the belief interval
[bel(A), pl(A)]. From this we observe that J is the killer.
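Belief and plausibility over the power set can be computed mechanically. Since the detectives' mass table is not reproduced in these notes, the mass values below are invented for illustration; they merely sum to 1 and happen to favor J.

```python
# Belief and plausibility over the frame {B, J, S}. The masses here are
# assumptions standing in for the detectives' (missing) table.

m = {frozenset("B"): 0.1, frozenset("J"): 0.2, frozenset("S"): 0.1,
     frozenset("BJ"): 0.1, frozenset("JS"): 0.3, frozenset("BS"): 0.1,
     frozenset("BJS"): 0.1}

def bel(a):
    """Sum of the masses of all subsets of A (including A itself)."""
    return sum(v for s, v in m.items() if s <= a)

def pl(a):
    """Sum of the masses of all sets that intersect A."""
    return sum(v for s, v in m.items() if s & a)

for h in ("B", "J", "S"):
    a = frozenset(h)
    print(h, "belief interval:", (bel(a), pl(a)))   # J gets the widest support
```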
