AI Chap 3
UNIT - III
REASONING AND INFERENCE
Reasoning Systems for Categories - Reasoning with Default Information - Non-monotonic Reasoning - Fuzzy Logic - Fuzzy Rules - Fuzzy Inference - Neural Networks - Neuro-fuzzy Inference - Bayes' Rule and its Applications - Bayesian Networks - Hidden Markov Models - Dempster-Shafer Theory.
To help machines perform human-level functions, artificial intelligence and its various
subfields like machine learning, deep learning, natural language processing, and more rely
on reasoning and knowledge representation. Reasoning in artificial intelligence helps
machines think rationally and perform functions like humans.
To better understand how reasoning works in artificial intelligence applications and systems,
let us consider two such examples:
Siri: The cognitive virtual assistant uses reasoning to offer suggestions and
recommendations based on commands. For example, the nearest location, the date for
tomorrow, AM and PM, among other things.
WolframAlpha: This computational knowledge engine relies on reasoning to perform mathematical computations, such as calculating nutritional values from portions of food.
191AIC403T - ARTIFICIAL INTELLIGENCE Unit-III
In short, like humans, machines use reasoning, along with knowledge representation, logic, and learning, for analysis, problem-solving, drawing conclusions, and more.
Types of Reasoning
In artificial intelligence, reasoning can be divided into the following categories:
1. Deductive reasoning
2. Inductive reasoning
3. Abductive reasoning
4. Common Sense Reasoning
5. Monotonic Reasoning
6. Non-monotonic Reasoning
1. Deductive reasoning:
Deductive reasoning is deducing new information from logically related known information.
It is a form of valid reasoning, which means the argument's conclusion must be true when the premises are true.
Deductive reasoning is a type of propositional logic in AI, and it requires various rules and facts. It is sometimes referred to as top-down reasoning, and it is the opposite of inductive reasoning.
In deductive reasoning, the truth of the premises guarantees the truth of the conclusion.
Deductive reasoning mostly moves from general premises to a specific conclusion, as in the example below.
Example:
Premise 1: All humans are mortal.
Premise 2: Socrates is a human.
Conclusion: Therefore, Socrates is mortal.
2. Inductive Reasoning:
Inductive reasoning is a form of reasoning used to arrive at a conclusion from a limited set of facts by the process of generalization. It starts with a series of specific facts or data and reaches a general statement or conclusion.
In inductive reasoning, we use historical data or various premises to generate a generic rule, in which the premises support the conclusion.
In inductive reasoning, premises provide probable supports to the conclusion, so the truth of
premises does not guarantee the truth of the conclusion.
Example:
Premise: All of the pigeons we have seen in the zoo are white.
Conclusion: Therefore, we can expect all pigeons to be white.
3. Abductive reasoning:
Abductive reasoning is a form of logical reasoning that starts with single or multiple observations and then seeks the most likely explanation or conclusion for the observation.
Example:
Premise: The grass outside is wet.
Conclusion: It is raining (the most likely explanation).
4. Common Sense Reasoning:
Common sense reasoning is an informal form of reasoning, which can be gained through experience.
Common sense reasoning simulates the human ability to make presumptions about events that occur every day.
It relies on good judgment rather than exact logic and operates on heuristic knowledge and heuristic rules.
Example:
1. One person can be at one place at a time.
2. If I put my hand in a fire, then it will burn.
The above two statements are examples of common sense reasoning, which a human mind can easily understand and assume.
5. Monotonic Reasoning:
In monotonic reasoning, once a conclusion is drawn, it will remain the same even if we add other information to the existing information in our knowledge base. In monotonic reasoning, adding knowledge does not decrease the set of propositions that can be derived.
To solve monotonic problems, we can derive a valid conclusion from the available facts only, and it will not be affected by new facts.
Monotonic reasoning is not useful for real-time systems, because in real time, facts change, so we cannot use monotonic reasoning there.
Example: "The Earth revolves around the Sun."
It is a true fact, and it cannot be changed even if we add another sentence to the knowledge base, such as "The moon revolves around the earth" or "The Earth is not round."
6. Non-monotonic Reasoning
A logic is said to be non-monotonic if some conclusions can be invalidated by adding more knowledge to the knowledge base.
Human perception of various things in daily life is a general example of non-monotonic reasoning.
Example: Let us suppose the knowledge base contains the following knowledge:
• Birds can fly.
• Penguins cannot fly.
• Pitty is a bird.
From the above sentences, we can conclude that Pitty can fly.
However, if we add another sentence to the knowledge base, "Pitty is a penguin", which concludes "Pitty cannot fly", it invalidates the above conclusion.
Default reasoning
We have already seen examples of this and possible ways to represent this
knowledge.
Non-Monotonic logic.
Default logic.
DO NOT get confused about the labels Non-Monotonic and Default being applied to both reasoning and a particular logic. Non-monotonic reasoning is a generic description of a class of reasoning; Non-Monotonic logic is a specific theory. The same goes for default reasoning and Default logic.
Non-Monotonic Logic
Non-monotonic logic extends first-order logic with a modal operator M, read as "is consistent". For example,
∀x : plays_instrument(x) ∧ M improvises(x) → jazz_musician(x)
states that, for all x, if x plays an instrument and the fact that x can improvise is consistent with all other knowledge, then we can conclude that x is a jazz musician.
To show that a fact P is consistent, attempt to prove ¬P. If we fail, we may say that P is consistent (since ¬P is false).
Now consider the rules
∀x : Quaker(x) ∧ M Pacifist(x) → Pacifist(x)
∀x : Republican(x) ∧ M ¬Pacifist(x) → ¬Pacifist(x)
This states that Quakers tend to be pacifists and Republicans tend not to be. What happens if we learn that both facts hold of the same individual?
Quaker(Nixon)
Republican(Nixon)
Two contradictory conclusions, Pacifist(Nixon) and ¬Pacifist(Nixon), can each be supported.
Default Logic
Now this is similar to non-monotonic logic, but there are some distinctions:
New inference rules, of the form A : B / C (if A is provable and B is consistent, then conclude C), are used for computing the set of plausible extensions.
So in the Nixon example above, Default logic can support both assertions, since it does not say anything about how to choose between them -- it will depend on the inference being made.
In Default logic, any non-monotonic expressions are rules of inference rather than expressions in the language.
Non-Monotonic Reasoning
Non-monotonic reasoning is a process in which conclusions may change direction or value as the knowledge base grows.
Example: Suppose the knowledge base contains "Birds can fly" and "A crow is a bird."
From the above sentences, we can conclude that a crow can fly.
However, if we add another sentence to the knowledge base, "The crow cannot fly" (new contradicting information), it invalidates the above conclusion.
2. Circumscription
The reasoning system (RS) provides the RMS with information about each inference it performs, and in return the RMS provides the RS with information about the whole set of inferences.
Fuzzy Logic
Fuzzy Logic (FL) is a method of reasoning that resembles human reasoning. The approach of
FL imitates the way of decision making in humans that involves all intermediate possibilities
between digital values YES and NO. In fuzzy logic, the degree of truth is between 0 and 1.
Example: William is smart (0.8 truth)
The fuzzy logic works on the levels of possibilities of input to achieve the definite output.
Fuzzy logic is useful for commercial and practical purposes.
It can control machines and consumer products.
It may not give accurate reasoning, but acceptable reasoning.
Fuzzy logic helps to deal with the uncertainty in engineering.
LP x is Large Positive
MP x is Medium Positive
S x is Small
MN x is Medium Negative
LN x is Large Negative
Here, the input to 5-level fuzzifier varies from -10 volts to +10 volts. Hence the corresponding
output also changes.
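The fuzzification step above can be sketched in code. This is a minimal illustration, not from the text: the triangular membership shapes and their breakpoints over the -10 V to +10 V range are assumed for demonstration only.

```python
# Sketch of a 5-level fuzzifier mapping a crisp voltage in [-10, +10] V
# to degrees of membership in the five fuzzy levels LP, MP, S, MN, LN.

def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def fuzzify(volts):
    """Return the membership degree of a crisp voltage in each level."""
    return {
        "LN": tri(volts, -15, -10, -5),   # Large Negative
        "MN": tri(volts, -10, -5, 0),     # Medium Negative
        "S":  tri(volts, -5, 0, 5),       # Small
        "MP": tri(volts, 0, 5, 10),       # Medium Positive
        "LP": tri(volts, 5, 10, 15),      # Large Positive
    }

print(fuzzify(-2.5))  # partly Medium Negative, partly Small
```

Note how a single crisp input activates neighbouring levels to partial degrees; that overlap is what distinguishes fuzzification from a hard threshold.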
Fuzzy Inference
Fuzzy Inference System is the key unit of a fuzzy logic system having decision making as its
primary work. It uses the “IF…THEN” rules along with connectors “OR” or “AND” for
drawing essential decision rules.
Characteristics of Fuzzy Inference System
Working of FIS
The working of the FIS consists of the following steps −
A fuzzification unit supports the application of numerous fuzzification methods and converts the crisp input into fuzzy input.
A knowledge base - a collection of the rule base and the database - is formed upon the conversion of crisp input into fuzzy input.
The inference unit then applies the IF-THEN rules from the knowledge base to the fuzzy input.
In the defuzzification unit, the fuzzy output is finally converted into crisp output.
Methods of FIS
Let us now discuss the different methods of FIS. Following are the two important methods of FIS, having different consequents of fuzzy rules −
Mamdani Fuzzy Inference System
Takagi-Sugeno Fuzzy Model (TS Method)
Mamdani Fuzzy Inference System
This system was proposed in 1975 by Ebrahim Mamdani. Basically, it was intended to control a steam engine and boiler combination by synthesizing a set of fuzzy rules obtained from people working on the system.
Following steps need to be followed to compute the output from this FIS −
Step 1 − Set of fuzzy rules need to be determined in this step.
Step 2 − In this step, by using input membership function, the input would be made
fuzzy.
Step 3 − Now establish the rule strength by combining the fuzzified inputs according
to fuzzy rules.
Step 4 − In this step, determine the consequent of rule by combining the rule strength
and the output membership function.
Step 5 − For getting output distribution combine all the consequents.
Step 6 − Finally, a defuzzified output distribution is obtained.
Following is a block diagram of the Mamdani Fuzzy Inference System.
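The six steps above can be sketched as a toy Mamdani controller. The "temperature"/"fan speed" variables, membership functions, and the two rules below are invented for illustration; a real system would use a library such as scikit-fuzzy.

```python
# Toy Mamdani inference: IF temp is cool THEN fan is slow,
#                        IF temp is hot  THEN fan is fast.   (Step 1)

def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def mamdani(temp):
    # Step 2: fuzzify the crisp input.
    mu_cool = tri(temp, 0, 10, 25)
    mu_hot  = tri(temp, 15, 30, 40)
    # Step 3: rule strengths (single antecedent, so strength = membership).
    # Steps 4-5: clip each output set by its rule strength, aggregate by max.
    xs = [i * 0.5 for i in range(201)]            # fan-speed universe 0..100
    agg = [max(min(mu_cool, tri(x, 0, 25, 50)),   # "slow" output set
               min(mu_hot,  tri(x, 50, 75, 100))) # "fast" output set
           for x in xs]
    # Step 6: defuzzify the aggregated set with the centroid method.
    den = sum(agg)
    return sum(x * m for x, m in zip(xs, agg)) / den if den else 0.0

print(mamdani(28))  # a hot temperature drives the fan toward "fast"
```

The min/max operators implement the AND of rule antecedents and the OR-style aggregation of consequents, which is the defining choice of the Mamdani method.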
Takagi-Sugeno Fuzzy Model (TS Method)
The fuzzy inference process under the Takagi-Sugeno fuzzy model (TS method) works in the following way −
Step 1: Fuzzifying the inputs − Here, the inputs of the system are made fuzzy.
Step 2: Applying the fuzzy operator − In this step, the fuzzy operators must be
applied to get the output.
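Unlike Mamdani, a Sugeno rule's consequent is a crisp function of the inputs, and the final output is the firing-strength-weighted average, so no defuzzification step is needed. A toy sketch with assumed membership functions and rule coefficients:

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a, c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def sugeno(temp):
    # Rule 1: IF temp is cool THEN fan = 1*temp + 10
    # Rule 2: IF temp is hot  THEN fan = 2*temp + 20
    w1 = tri(temp, 0, 10, 25)        # firing strength of rule 1
    w2 = tri(temp, 15, 30, 40)       # firing strength of rule 2
    if w1 + w2 == 0:
        return 0.0
    # Weighted average of the crisp rule outputs.
    return (w1 * (temp + 10) + w2 * (2 * temp + 20)) / (w1 + w2)

print(sugeno(20))
```

Because the output is already crisp, Sugeno models are computationally cheaper and pair naturally with adaptive techniques such as ANFIS, discussed later in this unit.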
Neural Networks
The term "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain. Similar to the human brain that has neurons
interconnected to one another, artificial neural networks also have neurons that are
interconnected to one another in various layers of the networks. These neurons are known as
nodes.
The given figure illustrates the typical diagram of Biological Neural Network.
The typical Artificial Neural Network looks something like the given figure.
Dendrites from Biological Neural Network represent inputs in Artificial Neural Networks,
cell nucleus represents Nodes, synapse represents Weights, and Axon represents Output.
Dendrites → Inputs
Cell nucleus → Nodes
Synapse → Weights
Axon → Output
There are on the order of 100 billion neurons in the human brain. Each neuron forms somewhere between 1,000 and 100,000 connections. In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from memory in parallel when necessary. We can say that the human brain is made up of incredibly powerful parallel processors.
We can understand the artificial neural network with an example: consider a digital logic gate that takes an input and gives an output, say an "OR" gate, which takes two inputs. If one or both of the inputs are "On," then we get "On" as output. If both inputs are "Off," then we get "Off" as output. Here the output depends directly on the input. Our brain does not perform the same task: the relationship between outputs and inputs keeps changing because the neurons in our brain are "learning."
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the
programmer.
Hidden Layer:
The hidden layer lies between the input and output layers. It performs all the calculations to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations using the hidden layer, which finally
results in output that is conveyed using this layer.
The artificial neural network takes input and computes the weighted sum of the inputs and
includes a bias. This computation is represented in the form of a transfer function.
An Artificial Neural Network can best be represented as a weighted directed graph, where the artificial neurons form the nodes. The associations between neuron outputs and neuron inputs can be viewed as directed edges with weights. The Artificial Neural Network receives the input signal from the external source in the form of a pattern or image, represented as a vector. These inputs are then mathematically denoted by the notation x(n) for each of the n inputs.
Afterward, each input is multiplied by its corresponding weight (these weights are the details utilized by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.
If the weighted sum is equal to zero, then a bias is added to make the output non-zero, or otherwise to scale up the system's response. The bias acts as an extra input of 1 with its own weight. Here the total of the weighted inputs can be in the range of 0 to positive infinity.
Here, to keep the response in the limits of the desired value, a certain maximum value is
benchmarked, and the total of weighted inputs is passed through the activation function.
The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, primarily either linear or non-linear sets of functions. Some of the commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal activation functions. Let us take a look at each of them in detail:
Binary:
In the binary activation function, the output is either a one or a 0. To accomplish this, a threshold value is set up. If the net weighted input of the neuron exceeds the threshold, then the final output of the activation function is returned as 1; otherwise the output is returned as 0.
Sigmoidal Hyperbolic:
The Sigmoidal Hyperbola function is generally seen as an "S"-shaped curve. Here the tan hyperbolic function is used to approximate output from the actual net input. The function is defined as:
f(x) = tanh(x) = (e^x − e^−x) / (e^x + e^−x)
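A single artificial neuron, as described above, is just a weighted sum plus bias passed through an activation function. This is a minimal sketch with illustrative weights; the threshold value is an assumption.

```python
import math

def neuron(inputs, weights, bias, activation):
    """Weighted sum of inputs plus bias, passed through an activation."""
    net = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(net)

def binary(net, threshold=0.0):
    # Binary activation: 1 if the net input exceeds the threshold, else 0.
    return 1 if net > threshold else 0

def tanh(net):
    # Sigmoidal hyperbolic: f(net) = (e^net - e^-net) / (e^net + e^-net).
    return math.tanh(net)

print(neuron([1.0, 0.0], [0.6, 0.4], 0.1, binary))  # fires: net = 0.7 > 0
print(neuron([1.0, 0.0], [0.6, 0.4], 0.1, tanh))    # smooth value in (-1, 1)
```

The choice of activation is what makes the neuron's response hard (binary) or smooth (tanh); the smooth version is what gradient-based training relies on.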
There are various types of Artificial Neural Networks (ANN). Depending on how the neurons and networks of the human brain function, an artificial neural network performs tasks in a similar way. The majority of artificial neural networks bear some resemblance to their more complex biological counterparts and are very effective at their intended tasks, for example, segmentation or classification.
Feedback ANN:
In this type of ANN, the output returns into the network to achieve the best-evolved results internally. As per the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one hidden layer of neurons; information flows in one direction only, from input to output.
Advantages of Artificial Neural Network (ANN)
Parallel processing capability: Artificial neural networks can perform more than one task simultaneously.
Storing data on the entire network: Unlike in traditional programming, data is stored on the whole network, not in a database. The disappearance of a couple of pieces of data in one place does not prevent the network from working.
Capability to work with incomplete knowledge: After training, an ANN may produce output even with inadequate data. The loss of performance here depends on the significance of the missing data.
Having a memory distribution: For an ANN to be able to adapt, it is important to determine the examples and to train the network according to the desired output by demonstrating these examples to it. The success of the network is directly proportional to the chosen instances, and if the event cannot be shown to the network in all its aspects, the network can produce false output.
Having fault tolerance: Corruption of one or more cells of the ANN does not prevent it from generating output, and this feature makes the network fault-tolerant.
Disadvantages of Artificial Neural Network (ANN)
Assurance of proper network structure: There is no particular guideline for determining the structure of artificial neural networks. The appropriate network structure is achieved through experience and trial and error.
Unrecognized behavior of the network: This is the most significant issue of ANN. When an ANN produces a solution, it does not provide insight into why and how, which decreases trust in the network.
Hardware dependence:
Artificial neural networks need processors with parallel processing power, in accordance with their structure. Therefore, the realization of the network depends on suitable hardware.
Difficulty of showing the problem to the network: ANNs can work only with numerical data. Problems must be converted into numerical values before being introduced to the ANN. The representation mechanism chosen here will directly impact the performance of the network, and it relies on the user's abilities.
The duration of the network is unknown: The network is reduced to a certain value of the error, and this value does not guarantee optimum results.
Neuro-fuzzy Inference
Principle-
Flowchart for Speech recognition to establish fuzzy inference system through ANFIS
There are two parts in the ANFIS network structure, namely premise and consequence parts.
This architecture basically has five layers. Fuzzy layer, product layer (π),
normalized layer (N), defuzzy layer and total output layer are the 5 different ANFIS layers.
So, the first layer is a fuzzification layer. It takes the input values and determines
the membership functions belonging to them.
The membership degrees of each function are computed using a specific parameter set. The second layer is the "rule layer".
It is responsible for generating the firing strengths for the rules.
The third layer normalizes the computed firing strengths by dividing each value by the total firing strength.
The fourth layer takes the normalized values and the consequent parameter set as input.
It returns the defuzzification values, and those values pass to the last layer to return the final output.
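The five-layer data flow just described can be sketched as a forward pass for one input and two rules. The Gaussian premise parameters and linear consequent coefficients are made up for illustration; real ANFIS training would tune them.

```python
import math

def gauss(x, c, s):
    # Layer 1 (fuzzification): Gaussian membership function.
    return math.exp(-((x - c) ** 2) / (2 * s ** 2))

def anfis_forward(x):
    # Layer 1: membership degrees for two fuzzy sets A1, A2.
    mu1, mu2 = gauss(x, 0.0, 1.0), gauss(x, 2.0, 1.0)
    # Layer 2 (product/rule layer): firing strengths.
    w1, w2 = mu1, mu2            # one input, so strength = membership
    # Layer 3 (normalization): divide by the total firing strength.
    wsum = w1 + w2
    n1, n2 = w1 / wsum, w2 / wsum
    # Layer 4 (defuzzy layer): weighted linear consequents f_i = p_i*x + r_i.
    f1 = n1 * (1.0 * x + 0.5)
    f2 = n2 * (2.0 * x - 1.0)
    # Layer 5 (total output): sum of the rule contributions.
    return f1 + f2

print(anfis_forward(1.0))  # lies between the two rule consequents
```

Note that layers 4-5 are exactly a Sugeno-style weighted average, which is why ANFIS is described as a Sugeno fuzzy system embedded in an adaptive network.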
Figure: the proposed algorithm of speech recognition using a wavelet transform, subtractive clustering, and ANFIS.
Adaptive Neuro-Fuzzy Inference in Artificial Intelligence
The first layer of an ANFIS architecture shows a difference from a neural network. Neural networks operate with a data pre-processing step in which the features are converted into normalized values between 0 and 1. An ANFIS network does not need a sigmoid function for this.
Let's say the network gets the distance between two points in 2D space as input. The distance, measured in pixels, can have values from 0 up to 500 pixels. With the help of membership functions, the numerical values are converted into fuzzy numbers, which consist of semantic descriptions like near, middle, and far. Each possible value is represented by an individual neuron.
Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which
determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two
random events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian
inference is an application of Bayes' theorem, which is fundamental to Bayesian statistics.
Bayes' theorem allows updating the probability prediction of an event by observing new
information of the real world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine
the probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known event B:
From the product rule, P(A ∧ B) = P(A|B) P(B), and similarly P(A ∧ B) = P(B|A) P(A). Equating the two and dividing by P(B), we get:
P(A|B) = P(B|A) P(A) / P(B)    ...(a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, we calculate the probability of the evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability: the probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence, Bayes' rule can be written as:
P(Ai|B) = P(B|Ai) P(Ai) / Σk P(Ak) P(B|Ak)
Where A1, A2, A3,........, An is a set of mutually exclusive and exhaustive events.
Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A).
This is very useful in cases where we have a good probability of these three terms and want to determine the fourth one. Suppose we want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:
P(cause|effect) = P(effect|cause) P(cause) / P(effect)
Example-1:
Question: what is the probability that a patient has diseases meningitis with a stiff
neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to have a stiff neck, and it occurs
80% of the time. He is also aware of some more facts, which are given as follows:
Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis, so we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = P(a|b) P(b) / P(a) = (0.8 × 1/30000) / 0.02 = 0.00133 ≈ 1/750
Hence, we can assume that 1 patient out of 750 patients has the meningitis disease with a stiff neck.
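The arithmetic above is a direct application of equation (a); a one-line sketch makes the division explicit:

```python
def bayes(p_a_given_b, p_b, p_a):
    """Bayes' rule: P(b|a) = P(a|b) * P(b) / P(a)."""
    return p_a_given_b * p_b / p_a

# Meningitis example: P(a|b) = 0.8, P(b) = 1/30000, P(a) = 0.02.
p_m_given_stiff = bayes(0.8, 1 / 30000, 0.02)
print(p_m_given_stiff)  # about 0.00133, i.e. roughly 1 in 750
```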
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The
probability that the card is king is 4/52, then calculate posterior probability
P(King|Face), which means the drawn face card is a king card.
Solution:
P(King|Face) = P(Face|King) P(King) / P(Face)
P(Face|King) = 1 (every king is a face card), P(King) = 4/52, P(Face) = 12/52
P(King|Face) = (1 × 4/52) / (12/52) = 4/12 = 1/3
Applications of Bayes' theorem in Artificial Intelligence:
o It is used to calculate the next step of the robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
BAYESIAN NETWORK
• A Bayesian network is a probabilistic graphical model that represents a set of variables and their probabilistic dependencies and independencies. It is otherwise known as a Bayes net, Bayesian belief network, or simply a belief network. A Bayesian network specifies a joint distribution in a structured form. It represents dependencies and independence via a directed acyclic graph: a network of concepts linked with conditional probabilities.
For example, suppose evidence says that a lab produces 98% accurate results. It means that if the lab reports that a person X has malaria, there is a 98% chance the report is correct and a 2% chance it is wrong. This is called the uncertainty factor, and it is the reason we go for Bayesian theory. Bayesian theory is also known as probability learning.
The probabilities are numeric values between 0 and 1 that represent uncertainties.
i) Simple Bayesian network (A and B are independent parents of C)
p(A,B,C) = p(C|A,B) p(A) p(B)
ii) 3-way Bayesian network (Marginal Independence)
p(A,B,C) = p(A) p(B) p(C)
iii) 3-way Bayesian network (Conditional Independence)
p(A,B,C) = p(B|A) p(C|A) p(A)
B and C are conditionally independent given A
iv) 3-way Bayesian network (Markov dependence)
p(A,B,C) = p(C|B) p(B|A) p(A)
Problem 1
You have a new burglar alarm installed. It is reliable at detecting burglary, but it also responds to minor earthquakes. Two neighbors (John and Mary) promise to call you at work when they hear the alarm. John always calls when he hears the alarm, but sometimes confuses the telephone ringing with the alarm. Mary likes loud music and sometimes misses the alarm. Find the probability of the event that the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both Mary and John call.
Consider 5 binary variables
B=Burglary occurs at your house
E=Earthquake occurs at your home
A=Alarm goes off
J=John calls to report alarm
M=Mary calls to report the alarm
Probability of the event that the alarm has sounded but neither a burglary nor an earthquake has occurred, and both Mary and John call:
P(J, M, A, ¬E, ¬B) = P(J|A) · P(M|A) · P(A|¬E,¬B) · P(¬E) · P(¬B)
= 0.90 × 0.70 × 0.001 × 0.998 × 0.999
= 0.00062
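Because the network factorizes the joint distribution, the query is just a product of conditional-probability-table entries, as this sketch shows (values as in the standard alarm example):

```python
# P(J, M, A, no-E, no-B) = P(J|A) * P(M|A) * P(A|~E,~B) * P(~E) * P(~B)

p_j_given_a   = 0.90    # John calls when the alarm sounds
p_m_given_a   = 0.70    # Mary calls when the alarm sounds
p_a_given_nen = 0.001   # alarm sounds with no earthquake, no burglary
p_not_e       = 0.998   # no earthquake
p_not_b       = 0.999   # no burglary

joint = p_j_given_a * p_m_given_a * p_a_given_nen * p_not_e * p_not_b
print(joint)  # roughly 0.0006
```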
Problem 2
Rain influences sprinkler usage. Rain and the sprinkler together influence whether the grass is wet or not. What is the joint probability that it is raining, the sprinkler is on, and the grass is wet?
Solution
Let S= Sprinkler
R=Rain
G=Grass wet
P(G,S,R)=P(G|S,R).P(S|R).P(R)
=0.99*0.01*0.2
=0.00198
Problem 3
Bayesian Classifier: Training Dataset
Class:
C1:buys_computer = ‘yes’
C2:buys_computer = ‘no’
Data sample
X = (age <= 30, Income = medium, Student = yes, Credit_rating = Fair)
P(Ci) :
P(buys_computer = “yes”) = 9/14 = 0.643
P(buys_computer = “no”) = 5/14 = 0.357
P(X|Ci) :
P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044
P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019
P(X|Ci)*P(Ci) :
P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.044 x 0.643 = 0.028
P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.019 x 0.357 = 0.007
Therefore, X belongs to class (“buys_computer = yes”)
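The naive Bayes decision above multiplies the per-attribute likelihoods by the class prior and picks the larger score. The priors 9/14 and 5/14 are an assumption here, taken from the standard 14-row buys_computer dataset this example usually accompanies:

```python
# Likelihoods P(X|Ci) as products of per-attribute probabilities.
likelihood_yes = 0.222 * 0.444 * 0.667 * 0.667   # P(X|yes) ~ 0.044
likelihood_no  = 0.600 * 0.400 * 0.200 * 0.400   # P(X|no)  ~ 0.019
prior_yes, prior_no = 9 / 14, 5 / 14             # assumed class priors

score_yes = likelihood_yes * prior_yes           # ~ 0.028
score_no  = likelihood_no * prior_no             # ~ 0.007
print("yes" if score_yes > score_no else "no")
```

Note that the scores need not be normalized: only their ratio matters for the classification decision.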
Problem 4
Did the patient have malignant tumour or not?
A patient takes a lab test, and the result comes back positive. The test returns a correct positive result in only 98% of the cases in which a malignant tumour is actually present, and a correct negative result in only 97% of the cases in which it is not present. Furthermore, 0.008 of the entire population have this tumour.
Solution:
P(tumour) = 0.008        P(¬tumour) = 0.992
P(+|tumour) = 0.98       P(−|tumour) = 0.02
P(+|¬tumour) = 0.03      P(−|¬tumour) = 0.97
P(+|tumour) P(tumour) = 0.98 × 0.008 = 0.0078
P(+|¬tumour) P(¬tumour) = 0.03 × 0.992 = 0.0298
The probability of not having the tumour is higher, so the person most likely does not have a malignant tumour.
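Normalizing the two products P(+|h) P(h) over both hypotheses gives the actual posterior after a positive test, a small sketch of the computation:

```python
p_tumour, p_no_tumour = 0.008, 0.992
p_pos_given_tumour, p_pos_given_no = 0.98, 0.03

num_t  = p_pos_given_tumour * p_tumour     # 0.98 * 0.008
num_nt = p_pos_given_no * p_no_tumour      # 0.03 * 0.992

# Posterior P(tumour | +) by normalizing over both hypotheses.
posterior_tumour = num_t / (num_t + num_nt)
print(posterior_tumour)  # about 0.21: still most likely no tumour
```

Even with a 98%-sensitive test, the low prior keeps the posterior around 21%, which is exactly why the conclusion above favors "no tumour".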
HIDDEN MARKOV MODELS (HMM)
When we cannot observe the states themselves but only the result of some probability function (observation) of the states, we utilize an HMM. An HMM is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states.
A Markov model is a stochastic method for randomly changing systems where it is assumed that future states depend only on the current state, not on the sequence of past states. These models show all possible states as well as the transitions, rates of transition, and probabilities between them.
Markov models are often used to model the probabilities of different states and the rates of
transitions among them. The method is generally used to model systems. Markov models can
also be used to recognize patterns, make predictions and to learn the statistics of sequential
data.
There are four types of Markov models that are used situationally:
Markov chain - used by systems that are autonomous and have fully observable states
Hidden Markov model - used by systems that are autonomous where the state is partially
observable.
Markov decision processes - used by controlled systems with a fully observable state.
Partially observable Markov decision processes - used by controlled systems where the
state is partially observable.
Markov Model: a series of (hidden) states z = {z_1, z_2, ...} drawn from a state alphabet S.
Hidden Markov Model: additionally, a series of observed outputs x = {x_1, x_2, ...} drawn from an output alphabet V = {v_1, v_2, ..., v_|V|}, where each x_i belongs to V.
Assumptions of HMM
An HMM is built upon several assumptions, and the following are vital: the Markov assumption, P(z_t | z_1, ..., z_{t−1}) = P(z_t | z_{t−1}), and the output independence assumption, that each observation depends only on the current hidden state.
Emission Probability Matrix: the probability of a hidden state generating output v_i, given that the state at the corresponding time was s_j.
Consider the example given below in Fig. 3, which elaborates how a person feels in different climates.
Observed states for four days = {z1 = Happy, z2 = Grumpy, z3 = Grumpy, z4 = Happy}
The feeling that you understand from a person emoting is called the observations since
you observe them.
The weather that influences the feeling of a person is called the hidden state since you
can’t observe it.
Emission probabilities
In the above example, only the feelings (Happy or Grumpy) can be observed. One can observe that a person has an 80% chance of being Happy given that the climate at the particular point of observation (or rather, day in this case) is Sunny. Similarly, there is a 60% chance of a person being Grumpy given that the climate is Rainy. The 80% and 60% mentioned here are emission probabilities, since they deal with observations.
Transition probabilities
When we consider the climates (hidden states) that influence the observations, there are correlations between consecutive days being Sunny or alternate days being Rainy. There is an 80% chance of the Sunny climate continuing on successive days, whereas there is a 60% chance of consecutive days being Rainy. The probabilities that describe the transitions to/from hidden states are transition probabilities.
3. How can we learn the values for the HMM's parameters A and B given some data?
To evaluate the probability of an observed sequence naively, we would have to add up the likelihood of the data x over every possible series of hidden states. This leads to a complexity of O(|S|^T). Hence, two alternative procedures were introduced to find the probability of an observed sequence efficiently.
Forward Procedure
Calculate the total probability of all the observations (from t_1 ) up to time t.
Backward Procedure
Similarly calculate total probability of all the observations from final time (T) to t.
S = {hot, cold}
V = {v1 = 1 ice cream, v2 = 2 ice creams, v3 = 3 ice creams}, where v_i is the number of ice creams consumed on a day.
We first need to calculate the prior probabilities (that is, the probability of being hot or
cold previous to any actual observation). This can be obtained from S_0 or π. From Fig.4. S_0
is provided as 0.6 and 0.4 which are the prior probabilities. Then based on Markov and
HMM assumptions we follow the steps in figures Fig.6, Fig.7. and Fig.8. below to calculate
the probability of a given sequence.
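The forward procedure for this ice-cream model can be sketched as follows. The priors 0.6/0.4 follow the text; the transition and emission numbers below are assumed for illustration, since the figures carrying the actual values are not reproduced here.

```python
# Forward procedure: alpha_t(s) is the total probability of the
# observations up to time t, ending in hidden state s.

states = ["hot", "cold"]
prior = {"hot": 0.6, "cold": 0.4}
trans = {"hot": {"hot": 0.7, "cold": 0.3},
         "cold": {"hot": 0.4, "cold": 0.6}}
emit = {"hot": {1: 0.1, 2: 0.4, 3: 0.5},     # P(ice creams | state)
        "cold": {1: 0.6, 2: 0.3, 3: 0.1}}

def forward(obs):
    # Initialize with prior * emission for the first observation.
    alpha = {s: prior[s] * emit[s][obs[0]] for s in states}
    # Recurse: sum over all predecessor states, then emit.
    for o in obs[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][o]
                 for s in states}
    return sum(alpha.values())   # total probability of the sequence

print(forward([3, 1, 2]))
```

Summing over predecessors at each step is what reduces the naive O(|S|^T) enumeration to O(|S|^2 T).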
Similarly for x3=v1 and x4=v2, we have to simply multiply the paths that lead to v1 and v2.
For a given observed sequence of outputs x, we intend to find the most likely series of hidden states z. We can understand this with the example found below.
Fig.10. Markov Model as a Finite State Machine from Fig.9. data
The Viterbi algorithm is a dynamic programming algorithm similar to the forward procedure
which is often used to find maximum likelihood. Instead of tracking the total probability of
generating the observations, it tracks the maximum probability and the corresponding state
sequence.
Consider the sequence of emotions H, H, G, G, G, H for 6 consecutive days. Using the Viterbi algorithm, we will find the most likely series of hidden states.
Fig.11. The Viterbi algorithm requires choosing the best path
There will be several paths that lead to a Sunny Saturday and several paths that lead to a Rainy Saturday. Here we intend to identify the best path up to Sunny or Rainy Saturday and multiply it by the transition and emission probabilities for Happy (since on Saturday the person feels Happy).
Let's consider a Sunny Saturday. The previous day (Friday) can be sunny or rainy. Then we need to know the best path up to Friday and multiply it by the emission probabilities that lead to the Grumpy feeling. Iteratively, we need to figure out the best path at each day, ending up with the most likely series of days.
Fig.14. Iterate the algorithm to choose the best path
The algorithm leaves you with maximum likelihood values and we now can produce the
sequence with a maximum likelihood for a given output sequence.
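A Viterbi sketch over the weather/mood example follows. It tracks the maximum probability of reaching each hidden state and backtracks the best path. All probability values are assumed for illustration, since the figures carrying the actual numbers are not reproduced in the text.

```python
states = ["Sunny", "Rainy"]
prior = {"Sunny": 0.6, "Rainy": 0.4}
trans = {"Sunny": {"Sunny": 0.8, "Rainy": 0.2},
         "Rainy": {"Sunny": 0.4, "Rainy": 0.6}}
emit = {"Sunny": {"H": 0.8, "G": 0.2},   # H = Happy, G = Grumpy
        "Rainy": {"H": 0.4, "G": 0.6}}

def viterbi(obs):
    # v[s]: max probability of any state path ending in s so far.
    v = {s: prior[s] * emit[s][obs[0]] for s in states}
    back = []                            # best predecessor per step
    for o in obs[1:]:
        nv, choice = {}, {}
        for s in states:
            prev = max(states, key=lambda p: v[p] * trans[p][s])
            choice[s] = prev
            nv[s] = v[prev] * trans[prev][s] * emit[s][o]
        back.append(choice)
        v = nv
    # Backtrack from the most probable final state.
    best = max(states, key=lambda s: v[s])
    path = [best]
    for choice in reversed(back):
        path.append(choice[path[-1]])
    return list(reversed(path))

print(viterbi(list("HHGGGH")))
```

The only change from the forward procedure is replacing the sum over predecessors with a max (plus the backpointers), which is why the two algorithms have the same complexity.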
Learning in HMMs involves estimating the state transition probabilities A and the output emission probabilities B that make an observed sequence most likely. Expectation-Maximization algorithms are used for this purpose. The Baum-Welch algorithm, which falls under this category and uses the forward-backward procedure, is widely used.
DEMPSTER-SHAFER THEORY
Problem 1
• 4 people (B, J, S and K) are locked in a room when the lights go out.
• When the lights come on, K is dead, stabbed with a knife.
• Not suicide (stabbed in the back)
• No-one entered the room.
• Assume only one killer.
• Θ = {B, J, S}
• The power set P(Θ) = { Ø, {B}, {J}, {S}, {B,J}, {B,S}, {J,S}, {B,J,S} }
Mass function m(A):
• Detectives, after reviewing the crime-scene, assign mass probabilities to various elements of the
power set:
The certainty associated with a given subset A is defined by the belief interval [bel(A), pl(A)].
From this, we conclude that J is the most likely killer.
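The belief and plausibility computation over the frame Θ = {B, J, S} can be sketched as follows. The mass values below are assumed for illustration, since the detectives' table is not reproduced in the text; the masses must sum to 1.

```python
# Mass function over subsets of the frame of discernment (assumed values).
masses = {
    frozenset("B"): 0.1,
    frozenset("J"): 0.2,
    frozenset("S"): 0.1,
    frozenset("BJ"): 0.1,
    frozenset("BS"): 0.1,
    frozenset("JS"): 0.3,
    frozenset("BJS"): 0.1,
}

def bel(a):
    # Belief: total mass committed to subsets of A.
    return sum(m for s, m in masses.items() if s <= a)

def pl(a):
    # Plausibility: total mass of all sets intersecting A.
    return sum(m for s, m in masses.items() if s & a)

for suspect in "BJS":
    a = frozenset(suspect)
    print(suspect, [bel(a), pl(a)])  # the belief interval [bel(A), pl(A)]
```

With these assumed masses, {J} gets the widest support (bel = 0.2, pl = 0.7), matching the conclusion that J is the most likely killer.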