
Unit-4

Uncertain Knowledge

Uncertain data: Data that is missing, unreliable, inconsistent, or noisy.

Uncertain knowledge: Knowledge in which multiple causes lead to multiple effects, or in which knowledge of causality in the domain is incomplete.

Uncertain knowledge representation: A representation that provides only a restricted model of the real system, or that has limited expressiveness.

Inference: With incomplete or default reasoning methods, the conclusions drawn might not be completely accurate.

Reasoning

Reasoning is the mental process of deriving logical conclusions and making predictions from available knowledge, facts, and beliefs. In other words, "Reasoning is a way to infer facts from existing data." It is the general process of thinking rationally to reach valid conclusions.
In artificial intelligence, reasoning is essential so that a machine can think rationally, as a human brain does, and act accordingly.
Types of Reasoning

In artificial intelligence, reasoning can be divided into the following categories:
o Deductive reasoning
o Inductive reasoning
o Abductive reasoning
o Common Sense Reasoning
o Monotonic Reasoning
o Non-monotonic Reasoning

1. Deductive reasoning:

Deductive reasoning is deducing new information from logically related known information. It is the form of valid reasoning, which means the argument's conclusion must be true when the premises are true.
Deductive reasoning is a type of propositional logic in AI, and it requires various rules and facts. It is sometimes referred to as top-down reasoning, in contrast to inductive reasoning.
In deductive reasoning, the truth of the premises guarantees
the truth of the conclusion.
Deductive reasoning mostly proceeds from general premises to a specific conclusion, as in the example below.
Example:
Premise-1: All humans eat veggies.
Premise-2: Suresh is human.
Conclusion: Suresh eats veggies.
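As a rough illustration (a minimal sketch with an assumed toy encoding, not part of the original text), a program can mimic this kind of deduction by applying modus ponens over a small fact base:

```python
# Minimal sketch of deductive reasoning via modus ponens.
# The fact and rule encodings are illustrative assumptions.

facts = {"human(Suresh)"}
rules = [({"human(Suresh)"}, "eats_veggies(Suresh)")]  # (premises, conclusion)

def deduce(facts, rules):
    """Apply every rule whose premises are all known facts."""
    derived = set(facts)
    for premises, conclusion in rules:
        if premises <= derived:        # the premises are true ...
            derived.add(conclusion)    # ... so the conclusion must be true
    return derived

print(deduce(facts, rules))  # {'human(Suresh)', 'eats_veggies(Suresh)'}
```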

2. Inductive Reasoning:
Inductive reasoning is a form of reasoning used to arrive at a conclusion from a limited set of facts by the process of generalization. It starts with a series of specific facts or data and reaches a general statement or conclusion.
Inductive reasoning is a type of propositional logic, which is
also known as cause-effect reasoning or bottom-up
reasoning.
In inductive reasoning, we use historical data or various
premises to generate a generic rule, for which premises
support the conclusion.
In inductive reasoning, premises provide probable supports
to the conclusion, so the truth of premises does not
guarantee the truth of the conclusion.
Example:
Premise: All of the pigeons we have seen in the zoo are
white.
Conclusion: Therefore, we can expect all the pigeons to be
white.
3. Abductive reasoning:
Abductive reasoning is a form of logical reasoning which
starts with single or multiple observations then seeks to find
the most likely explanation or conclusion for the observation.
Abductive reasoning is an extension of deductive reasoning,
but in abductive reasoning, the premises do not guarantee
the conclusion.
Example:
Implication: The cricket ground is wet if it is raining.
Axiom: The cricket ground is wet.
Conclusion: It is raining.
4. Common Sense Reasoning
Common sense reasoning is an informal form of reasoning,
which can be gained through experiences.
Common sense reasoning simulates the human ability to make presumptions about events that occur every day.
It relies on good judgment rather than exact logic and
operates on heuristic knowledge and heuristic rules.
Example:
1. One person can be at one place at a time.
2. If I put my hand in a fire, then it will burn.
The above two statements are examples of common sense reasoning, which a human mind can easily understand and assume.
5. Monotonic Reasoning:
In monotonic reasoning, once a conclusion is drawn, it remains the same even if we add other information to the existing information in our knowledge base. In monotonic reasoning, adding knowledge does not decrease the set of propositions that can be derived.
To solve monotonic problems, we can derive the valid
conclusion from the available facts only, and it will not be
affected by new facts.
Monotonic reasoning is not useful for the real-time systems,
as in real time, facts get changed, so we cannot use
monotonic reasoning.
Monotonic reasoning is used in conventional reasoning
systems, and a logic-based system is monotonic.
Theorem proving is an example of monotonic reasoning.
Example:
o Earth revolves around the Sun.

This is a true fact, and it cannot be changed even if we add another sentence to the knowledge base, such as "The moon revolves around the earth" or "The earth is not round," etc.
Advantages of Monotonic Reasoning:
o In monotonic reasoning, each old proof will always
remain valid.
o Facts deduced from the available facts will always remain valid.

Disadvantages of Monotonic Reasoning:


o We cannot represent real-world scenarios using monotonic reasoning.
o Hypothetical knowledge cannot be expressed with monotonic reasoning, which means facts must be true.
o Since we can only derive conclusions from old proofs, new knowledge from the real world cannot be added.

6. Non-monotonic Reasoning
In Non-monotonic reasoning, some conclusions may be
invalidated if we add some more information to our
knowledge base.
A logic is said to be non-monotonic if some conclusions can be invalidated by adding more knowledge to the knowledge base.
Non-monotonic reasoning deals with incomplete and
uncertain models.
"Human perceptions for various things in daily life, "is a
general example of non-monotonic reasoning.
Example: Let suppose the knowledge base contains the
following knowledge:
o Birds can fly
o Penguins cannot fly
o Pitty is a bird
So from the above sentences, we can conclude that Pitty can
fly.
However, if we add another sentence to the knowledge base, "Pitty is a penguin", we conclude that "Pitty cannot fly", which invalidates the above conclusion.
Advantages of Non-monotonic reasoning:
o For real-world systems such as Robot navigation, we can
use non-monotonic reasoning.
o In Non-monotonic reasoning, we can choose
probabilistic facts or can make assumptions.

Disadvantages of Non-monotonic Reasoning:


o In non-monotonic reasoning, the old facts may be
invalidated by adding new sentences.
o It cannot be used for theorem proving.

Difference between Inductive and Deductive reasoning


Definition: Deductive reasoning is the form of valid reasoning, used to deduce new information or a conclusion from known related facts and information. Inductive reasoning arrives at a conclusion by the process of generalization using specific facts or data.

Approach: Deductive reasoning follows a top-down approach. Inductive reasoning follows a bottom-up approach.

Starts from: Deductive reasoning starts from premises. Inductive reasoning starts from the conclusion.

Validity: In deductive reasoning, the conclusion must be true if the premises are true. In inductive reasoning, the truth of the premises does not guarantee the truth of the conclusion.

Usage: Deductive reasoning is harder to use, as we need facts that must be true. Inductive reasoning is fast and easy, as we need evidence instead of true facts; we often use it in daily life.

Process: Deductive reasoning: theory → hypothesis → patterns → confirmation. Inductive reasoning: observations → patterns → hypothesis → theory.

Argument: In deductive reasoning, arguments may be valid or invalid. In inductive reasoning, arguments may be weak or strong.

Structure: Deductive reasoning reaches from general facts to specific facts. Inductive reasoning reaches from specific facts to general facts.

Monotonic Reasoning vs Non-monotonic Reasoning

1. Monotonic reasoning is the process which does not change its direction; it moves in only one direction. Non-monotonic reasoning is the process which changes its direction or values as the knowledge base increases.

2. Monotonic reasoning deals with very specific types of models, which have valid proofs. Non-monotonic reasoning deals with incomplete or unknown facts.

3. In monotonic reasoning, additions to the knowledge base do not change the result. In non-monotonic reasoning, additions to the knowledge base may invalidate previous conclusions and change the result.

4. In monotonic reasoning, results are always true, so the set of propositions only increases. In non-monotonic reasoning, the results and the set of propositions may increase or decrease depending on the added knowledge.

5. Monotonic reasoning is based on true facts. Non-monotonic reasoning is based on assumptions.

6. Deductive reasoning is a type of monotonic reasoning. Abductive reasoning and human reasoning are non-monotonic types of reasoning.

Forward Chaining and backward chaining


In artificial intelligence, forward and backward chaining are important topics, but before understanding them, let us first see where these two terms come from.
Inference engine:
The inference engine is the component of the intelligent
system in artificial intelligence, which applies logical rules to
the knowledge base to infer new information from known
facts. The first inference engine was part of the expert
system. Inference engine commonly proceeds in two modes,
which are:
a. Forward chaining
b. Backward chaining

Horn Clause and Definite clause:


Horn clauses and definite clauses are forms of sentences that enable a knowledge base to use a more restricted and efficient inference algorithm. Logical inference algorithms use forward and backward chaining approaches, which require the KB to be in the form of first-order definite clauses.
Definite clause: A clause which is a disjunction of literals with exactly one positive literal is known as a definite clause or strict Horn clause.
Horn clause: A clause which is a disjunction of literals with at most one positive literal is known as a Horn clause. Hence all definite clauses are Horn clauses.
Example: (¬p ∨ ¬q ∨ k). It has only one positive literal, k.
It is equivalent to p ∧ q → k.
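As a quick sanity check (an illustrative sketch, not from the original text), this equivalence can be verified by enumerating all truth assignments:

```python
from itertools import product

# Verify that (¬p ∨ ¬q ∨ k) and (p ∧ q → k) agree for every assignment.
for p, q, k in product([False, True], repeat=3):
    clause = (not p) or (not q) or k      # disjunction with one positive literal k
    implication = (not (p and q)) or k    # p ∧ q → k
    assert clause == implication
print("(¬p ∨ ¬q ∨ k) is equivalent to (p ∧ q → k)")
```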

A. Forward Chaining
Forward chaining is also known as forward deduction or the forward reasoning method when using an inference engine. Forward chaining is a form of reasoning which starts with the atomic sentences in the knowledge base and applies inference rules (Modus Ponens) in the forward direction to extract more data until a goal is reached.
The forward-chaining algorithm starts from known facts, triggers all rules whose premises are satisfied, and adds their conclusions to the known facts. This process repeats until the problem is solved.
Properties of Forward-Chaining:
o It is a bottom-up approach, as it moves from the bottom to the top.
o It is a process of reaching a conclusion from known facts or data, starting from the initial state and moving toward the goal state.
o The forward-chaining approach is also called data-driven, as we reach the goal using the available data.
o The forward-chaining approach is commonly used in expert systems, such as CLIPS, and in business and production rule systems.

Consider the following famous example, which we will use in both approaches:
Example:
"As per the law, it is a crime for an American to sell
weapons to hostile nations. Country A, an enemy of
America, has some missiles, and all the missiles were sold to
it by Robert, who is an American citizen."
Prove that "Robert is criminal."
To solve the above problem, first, we will convert all the
above facts into first-order definite clauses, and then we will
use a forward-chaining algorithm to reach the goal.
Facts Conversion into FOL:
o It is a crime for an American to sell weapons to hostile
nations. (Let's say p, q, and r are variables)
American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) →
Criminal(p) ...(1)
o Country A has some missiles: ∃p Owns(A, p) ∧ Missile(p). It can be written as two definite clauses by using Existential Instantiation, introducing a new constant T1.
Owns(A, T1) ......(2)
Missile(T1) .......(3)
o All of the missiles were sold to country A by Robert.
∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
o Missiles are weapons.
Missile(p) → Weapons (p) .......(5)
o Enemy of America is known as hostile.
Enemy(p, America) →Hostile(p) ........(6)
o Country A is an enemy of America.
Enemy (A, America) .........(7)
o Robert is American
American(Robert). ..........(8)

Forward chaining proof:


Step-1:
In the first step we will start with the known facts and will
choose the sentences which do not have implications, such
as: American(Robert), Enemy(A, America), Owns(A, T1), and
Missile(T1). All these facts will be represented as below.

Step-2:
At the second step, we will look at the facts that can be inferred from the available facts where the premises are satisfied.
Rule-(1) does not have its premises satisfied, so it will not be added in the first iteration.
Rule-(2) and Rule-(3) are already added.
Rule-(4) is satisfied with the substitution {p/T1}, so Sells(Robert, T1, A) is added, which is inferred from the conjunction of Rule-(2) and Rule-(3).
Rule-(6) is satisfied with the substitution {p/A}, so Hostile(A) is added, which is inferred from Rule-(7).

Step-3:
At step-3, we can check that Rule-(1) is satisfied with the substitution {p/Robert, q/T1, r/A}, so we can add Criminal(Robert), which follows from all the available facts. Hence we have reached our goal statement.
Hence it is proved that Robert is a criminal using the forward-chaining approach.
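A minimal sketch of the forward-chaining loop for this example, assuming the rules have already been instantiated with the substitutions found above (p=Robert, q=T1, r=A); the encoding is illustrative, not a full first-order implementation:

```python
# Propositionalised sketch of the forward-chaining proof above.
facts = {"American(Robert)", "Enemy(A, America)", "Owns(A, T1)", "Missile(T1)"}

rules = [
    ({"Missile(T1)"}, "Weapon(T1)"),                                  # rule (5)
    ({"Missile(T1)", "Owns(A, T1)"}, "Sells(Robert, T1, A)"),         # rule (4)
    ({"Enemy(A, America)"}, "Hostile(A)"),                            # rule (6)
    ({"American(Robert)", "Weapon(T1)",
      "Sells(Robert, T1, A)", "Hostile(A)"}, "Criminal(Robert)"),     # rule (1)
]

def forward_chain(facts, rules, goal):
    known = set(facts)
    changed = True
    while changed:                       # keep firing rules until nothing new is added
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)    # add the rule's conclusion to the known facts
                changed = True
    return goal in known

print(forward_chain(facts, rules, "Criminal(Robert)"))  # True
```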
B. Backward Chaining:
Backward-chaining is also known as a backward deduction or
backward reasoning method when using an inference engine.
A backward chaining algorithm is a form of reasoning, which
starts with the goal and works backward, chaining through
rules to find known facts that support the goal.
Properties of backward chaining:
o It is known as a top-down approach.
o Backward-chaining is based on modus ponens inference
rule.
o In backward chaining, the goal is broken down into sub-goals to prove the facts true.
o It is called a goal-driven approach, as a list of goals
decides which rules are selected and used.
o Backward -chaining algorithm is used in game theory,
automated theorem proving tools, inference engines,
proof assistants, and various AI applications.
o The backward-chaining method mostly uses a depth-first search strategy for proofs.

Example:
In backward-chaining, we will use the same above example,
and will rewrite all the rules.
o American (p) ∧ weapon(q) ∧ sells (p, q, r) ∧ hostile(r) →
Criminal(p) ...(1)
o Owns(A, T1) ........(2)
o Missile(T1) .......(3)
o ∀p Missile(p) ∧ Owns(A, p) → Sells(Robert, p, A) ......(4)
o Missile(p) → Weapons (p) .......(5)
o Enemy(p, America) →Hostile(p) ........(6)
o Enemy (A, America) .........(7)
o American(Robert). ..........(8)

Backward-Chaining proof:
In backward chaining, we will start with our goal predicate, which is Criminal(Robert), and then work backward through the rules.
Step-1:
At the first step, we will take the goal fact. From the goal fact, we will infer other facts, and finally we will prove those facts true. So our goal fact is "Robert is criminal," and Criminal(Robert) is its predicate.
Step-2:
At the second step, we will infer other facts from the goal fact that satisfy the rules. As we can see in Rule-(1), the goal predicate Criminal(Robert) is present with the substitution {p/Robert}. So we will add all the conjunctive facts below the first level and replace p with Robert.
Here we can see that American(Robert) is a known fact, so it is proved here.

Step-3: At step-3, we will extract the further fact Missile(q), which is inferred from Weapon(q), as it satisfies Rule-(5). Weapon(q) is also true with the substitution of the constant T1 for q.
Step-4:
At step-4, we can infer the facts Missile(T1) and Owns(A, T1) from Sells(Robert, T1, r), which satisfies Rule-(4) with the substitution of A in place of r. So these two statements are proved here.
Step-5:
At step-5, we can infer the fact Enemy(A, America) from Hostile(A), which satisfies Rule-(6). And hence all the statements are proved true using backward chaining.
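A matching sketch of backward chaining for the same propositionalised example (again an illustrative encoding, not a full first-order prover):

```python
# rules maps each conclusion to the list(s) of premises that would establish it.
facts = {"American(Robert)", "Enemy(A, America)", "Owns(A, T1)", "Missile(T1)"}

rules = {
    "Criminal(Robert)": [["American(Robert)", "Weapon(T1)",
                          "Sells(Robert, T1, A)", "Hostile(A)"]],
    "Weapon(T1)": [["Missile(T1)"]],
    "Sells(Robert, T1, A)": [["Missile(T1)", "Owns(A, T1)"]],
    "Hostile(A)": [["Enemy(A, America)"]],
}

def backward_chain(goal, facts, rules):
    if goal in facts:                      # the goal is already a known fact
        return True
    for premises in rules.get(goal, []):   # otherwise try every rule that concludes it
        if all(backward_chain(p, facts, rules) for p in premises):
            return True
    return False

print(backward_chain("Criminal(Robert)", facts, rules))  # True
```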

Difference between backward chaining and forward chaining

1. Forward chaining starts from known facts and applies inference rules to extract more data until it reaches the goal. Backward chaining starts from the goal and works backward through inference rules to find the facts that support the goal.
2. Forward chaining is a bottom-up approach. Backward chaining is a top-down approach.
3. Forward chaining is known as a data-driven inference technique, as we reach the goal using the available data. Backward chaining is known as a goal-driven technique, as we start from the goal and divide it into sub-goals to extract the facts.
4. Forward chaining applies a breadth-first search strategy. Backward chaining applies a depth-first search strategy.
5. Forward chaining tests all the available rules. Backward chaining tests only the few rules that are required.
6. Forward chaining is suitable for planning, monitoring, control, and interpretation applications. Backward chaining is suitable for diagnostic, prescription, and debugging applications.
7. Forward chaining can generate an infinite number of possible conclusions. Backward chaining generates a finite number of possible conclusions.
8. Forward chaining operates in the forward direction. Backward chaining operates in the backward direction.
9. Forward chaining is aimed at any conclusion. Backward chaining is aimed only at the required data.

Uncertainty:
So far, we have learned knowledge representation using first-order logic and propositional logic with certainty, which means we were sure about the predicates. With this knowledge representation we might write A→B, which means that if A is true then B is true. But consider a situation where we are not sure whether A is true or not; then we cannot express this statement. This situation is called uncertainty.
So to represent uncertain knowledge, where we are not sure
about the predicates, we need uncertain reasoning or
probabilistic reasoning.
Causes of uncertainty:
Following are some leading causes of uncertainty to occur in
the real world.
1. Information obtained from unreliable sources.
2. Experimental Errors
3. Equipment fault
4. Temperature variation
5. Climate change.

Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation
where we apply the concept of probability to indicate the
uncertainty in knowledge. In probabilistic reasoning, we
combine probability theory with logic to handle the
uncertainty.
We use probability in probabilistic reasoning because it
provides a way to handle the uncertainty that is the result of
someone's laziness and ignorance.
In the real world, there are lots of scenarios, where the
certainty of something is not confirmed, such as "It will rain
today," "behavior of someone for some situations," "A match
between two teams or two players." These are probable
sentences for which we can assume that it will happen but
not sure about it, so here we use probabilistic reasoning.
Need of probabilistic reasoning in AI:
o When there are unpredictable outcomes.
o When the specifications or possibilities of predicates become too large to handle.
o When an unknown error occurs during an experiment.

In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics

As probabilistic reasoning uses probability and related terms, let's first understand some common terms:
Probability: Probability can be defined as the chance that an uncertain event will occur. It is the numerical measure of the likelihood that an event will occur. The value of a probability always lies between 0 and 1, which represent total uncertainty and total certainty, respectively.
1. 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
2. P(A) = 0 indicates total uncertainty in an event A.
3. P(A) = 1 indicates total certainty in an event A.
We can find the probability of an uncertain event by using the formula below:
Probability of occurrence = (Number of desired outcomes) / (Total number of outcomes)
o P(¬A) = probability of event A not happening.
o P(¬A) + P(A) = 1.

Event: Each possible outcome of a variable is called an event.


Sample space: The collection of all possible events is called
sample space.
Random variables: Random variables are used to represent
the events and objects in the real world.
Prior probability: The prior probability of an event is
probability computed before observing new information.
Posterior probability: The probability that is calculated after all evidence or information has been taken into account. It is a combination of the prior probability and the new information.
Conditional probability:
Conditional probability is the probability of an event occurring given that another event has already happened.
Suppose we want to calculate the probability of event A when event B has already occurred, "the probability of A under the condition of B". It can be written as:
P(A|B) = P(A⋀B) / P(B)
where P(A⋀B) = joint probability of A and B, and
P(B) = marginal probability of B.
If the probability of A is given and we need to find the probability of B given A, then it is given as:
P(B|A) = P(A⋀B) / P(A)

This can be explained using a Venn diagram: once B has occurred, the sample space is reduced to the set B, and we can calculate the probability of event A given that B has occurred by dividing P(A⋀B) by P(B).

Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics. What percentage of the students who like English also like mathematics?
Solution:
Let A be the event that a student likes mathematics, and B be the event that a student likes English. Then
P(A|B) = P(A⋀B) / P(B) = 0.40 / 0.70 ≈ 0.57,
so about 57% of the students who like English also like mathematics.
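The same arithmetic as a quick, illustrative check:

```python
# Quick check of the conditional-probability example above.
p_english = 0.70            # P(B): student likes English
p_english_and_math = 0.40   # P(A ⋀ B): student likes both subjects

p_math_given_english = p_english_and_math / p_english
print(f"P(A|B) = {p_math_given_english:.2%}")  # about 57.14%
```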

Bayes' theorem:

Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and
marginal probabilities of two random events.
Bayes' theorem was named after the British
mathematician Thomas Bayes. The Bayesian inference is an
application of Bayes' theorem, which is fundamental to
Bayesian statistics.
It is a way to calculate the value of P(B|A) with the
knowledge of P(A|B).
Bayes' theorem allows updating the probability prediction of
an event by observing new information of the real world.
Example: If cancer corresponds to one's age then by using
Bayes' theorem, we can determine the probability of cancer
more accurately with the help of age.
Bayes' theorem can be derived using product rule and
conditional probability of event A with known event B:
From the product rule we can write:
P(A ⋀ B) = P(A|B) P(B)
Similarly, the probability of event B with known event A is:
P(A ⋀ B) = P(B|A) P(A)
Equating the right-hand sides of both equations, we get:
P(A|B) = [P(B|A) P(A)] / P(B) ........(a)

The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most modern AI systems for probabilistic inference.
It shows the simple relationship between joint and
conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of hypothesis A given that we have observed evidence B.
P(B|A) is called the likelihood: assuming that the hypothesis is true, we calculate the probability of the evidence.
P(A) is called the prior probability, the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability, the probability of the evidence alone.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can also be written as:
P(Ai|B) = [P(B|Ai) P(Ai)] / Σk P(Ak) P(B|Ak)
where A1, A2, A3, ......, An is a set of mutually exclusive and exhaustive events.
Applying Bayes' rule:

Bayes' rule allows us to compute the single term P(B|A) in terms of P(A|B), P(B), and P(A). This is very useful when we have good estimates of these three terms and want to determine the fourth one. Suppose we want to perceive the effect of some unknown cause and want to compute that cause; then Bayes' rule becomes:
P(cause|effect) = [P(effect|cause) P(cause)] / P(effect)

Example-1:
Question: what is the probability that a patient has diseases
meningitis with a stiff neck?
Given Data:
A doctor is aware that disease meningitis causes a patient to
have a stiff neck, and it occurs 80% of the time. He is also
aware of some more facts, which are given as follows:
o The Known probability that a patient has meningitis
disease is 1/30,000.
o The Known probability that a patient has a stiff neck is
2%.

Let a be the proposition that the patient has a stiff neck and b be the proposition that the patient has meningitis. Then we can calculate the following:
P(a|b) = 0.8
P(b) = 1/30000
P(a) = 0.02
Applying Bayes' rule:
P(b|a) = [P(a|b) P(b)] / P(a) = (0.8 × 1/30000) / 0.02 ≈ 0.00133 ≈ 1/750

Hence, we can assume that 1 patient out of 750 patients has meningitis disease with a stiff neck.
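A quick, illustrative check of this calculation:

```python
# Verify the meningitis example using Bayes' rule.
p_stiff_given_men = 0.8        # P(a|b)
p_meningitis = 1 / 30000       # P(b)
p_stiff_neck = 0.02            # P(a)

p_men_given_stiff = p_stiff_given_men * p_meningitis / p_stiff_neck
print(p_men_given_stiff)       # ≈ 0.001333
print(1 / p_men_given_stiff)   # ≈ 750, i.e. about 1 patient in 750
```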
Example-2:
Question: From a standard deck of playing cards, a single
card is drawn. The probability that the card is king is 4/52,
then calculate posterior probability P(King|Face), which
means the drawn face card is a king card.
Solution:

P(King): probability that the card is a king = 4/52 = 1/13
P(Face): probability that the card is a face card = 12/52 = 3/13
P(Face|King): probability that the card is a face card given that it is a king = 1
Putting all the values into Bayes' rule, we get:
P(King|Face) = [P(Face|King) P(King)] / P(Face) = (1 × 1/13) / (3/13) = 1/3

Application of Bayes' theorem in Artificial intelligence:

Following are some applications of Bayes' theorem:


o It is used to calculate the next step of the robot when
the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.

Bayesian Belief Network in artificial intelligence


A Bayesian belief network is a key technology for dealing with probabilistic events and for solving problems that involve uncertainty. We can define a Bayesian network as follows:
"A Bayesian network is a probabilistic graphical model which
represents a set of variables and their conditional
dependencies using a directed acyclic graph."
It is also called a Bayes network, belief network, decision
network, or Bayesian model.
Bayesian networks are probabilistic, because these networks
are built from a probability distribution, and also use
probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to
represent the relationship between multiple events, we need
a Bayesian network. It can also be used in various tasks
including prediction, anomaly detection, diagnostics,
automated insight, reasoning, time series prediction,
and decision making under uncertainty.
Bayesian Network can be used for building models from data
and experts opinions, and it consists of two parts:
o Directed Acyclic Graph
o Table of conditional probabilities.

The generalized form of a Bayesian network that represents and solves decision problems under uncertain knowledge is known as an influence diagram.
A Bayesian network graph is made up of nodes and Arcs
(directed links), where:
o Each node corresponds to a random variable, and a variable can be continuous or discrete.
o Arcs (directed arrows) represent the causal relationships or conditional probabilities between random variables. These directed links connect pairs of nodes in the graph. A link indicates that one node directly influences the other node; if there is no directed link, the nodes are independent of each other.
o For example, suppose A, B, C, and D are random variables represented by the nodes of a network graph.
o If node B is connected to node A by a directed arrow, then node A is called the parent of node B.
o Node C is independent of node A.

The Bayesian network has mainly two components:


o Causal Component
o Actual numbers

Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which determines the effect of the parents on that node.
Bayesian network is based on Joint probability distribution
and conditional probability. So let's first understand the joint
probability distribution:
Joint probability distribution:
If we have variables x1, x2, x3, ....., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
P[x1, x2, x3, ....., xn] can be written as follows in terms of conditional probabilities:
= P[x1 | x2, x3, ....., xn] P[x2, x3, ....., xn]
= P[x1 | x2, x3, ....., xn] P[x2 | x3, ....., xn] .... P[xn-1 | xn] P[xn]
In general, for each variable Xi we can write:
P(Xi | Xi-1, ........., X1) = P(Xi | Parents(Xi))
Explanation of Bayesian network:
Let's understand the Bayesian network through an example
by creating a directed acyclic graph:
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds to a burglary, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who have taken responsibility for informing Harry at work when they hear the alarm. David always calls Harry when he hears the alarm, but sometimes he confuses the phone ringing with the alarm and calls then too. On the other hand, Sophia likes to listen to loud music, so she sometimes misses the alarm. Here we would like to compute the probability of the following event.
Problem:
Calculate the probability that alarm has sounded, but there
is neither a burglary, nor an earthquake occurred, and David
and Sophia both called the Harry.
Solution:
o The Bayesian network for the above problem is given below. The network structure shows that Burglary and Earthquake are the parent nodes of the Alarm node and directly affect the probability of the alarm going off, while David's and Sophia's calls depend on the alarm's probability.
o The network represents the assumptions that David and Sophia do not directly perceive the burglary, do not notice the minor earthquake, and do not confer with each other before calling.
o The conditional distribution for each node is given as a conditional probability table, or CPT.
o Each row in a CPT must sum to 1, because the entries in a row represent an exhaustive set of cases for the variable.
o In a CPT, a Boolean variable with k Boolean parents has 2^k rows of probabilities. Hence, if there are two parents, the CPT will contain 4 probability values.

List of all events occurring in this network:


o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)

We can write the events of the problem statement in the form of probability as P[D, S, A, B, E], and we can rewrite this probability statement using the joint probability distribution:
P[D, S, A, B, E] = P[D | S, A, B, E] · P[S, A, B, E]
= P[D | S, A, B, E] · P[S | A, B, E] · P[A, B, E]
= P[D | A] · P[S | A, B, E] · P[A, B, E]
= P[D | A] · P[S | A] · P[A | B, E] · P[B, E]
= P[D | A] · P[S | A] · P[A | B, E] · P[B | E] · P[E]
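A small sketch of this factorisation in code. The CPT values below are assumed purely for illustration (the original table and diagram are not reproduced here); the point is how the joint probability decomposes into a product of local conditional probabilities:

```python
# Illustrative CPT values (assumed for demonstration; not from the original table).
P_B = 0.002                                   # P(Burglary)
P_E = 0.001                                   # P(Earthquake)
P_A = {(True, True): 0.94, (True, False): 0.95,
       (False, True): 0.31, (False, False): 0.001}   # P(Alarm | B, E)
P_D = {True: 0.91, False: 0.05}               # P(David calls | Alarm)
P_S = {True: 0.75, False: 0.02}               # P(Sophia calls | Alarm)

# P[D, S, A, ¬B, ¬E] = P(D|A) · P(S|A) · P(A|¬B,¬E) · P(¬B) · P(¬E)
joint = (P_D[True] * P_S[True] *
         P_A[(False, False)] * (1 - P_B) * (1 - P_E))
print(joint)   # ≈ 0.00068 with these illustrative numbers
```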

Decision tree
o Decision Tree is a Supervised learning technique that
can be used for both classification and Regression
problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier,
where internal nodes represent the features of a
dataset, branches represent the decision rules and each
leaf node represents the outcome.
o In a decision tree, there are two types of nodes: the decision node and the leaf node. Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
o The decisions or the test are performed on the basis of
features of the given dataset.
o It is a graphical representation for getting all the
possible solutions to a problem/decision based on
given conditions.
o It is called a decision tree because, similar to a tree, it
starts with the root node, which expands on further
branches and constructs a tree-like structure.
o In order to build a tree, we use the CART
algorithm, which stands for Classification and
Regression Tree algorithm.
o A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into subtrees.
o Below diagram explains the general structure of a
decision tree
Why use Decision Trees?
There are various algorithms in Machine learning, so
choosing the best algorithm for the given dataset and
problem is the main point to remember while creating a
machine learning model. Below are the two reasons for using
the Decision tree:
o Decision Trees usually mimic human thinking ability
while making a decision, so it is easy to understand.
o The logic behind the decision tree can be easily
understood because it shows a tree-like structure.

Decision Tree Terminologies


 Root Node: The root node is where the decision tree starts. It represents the entire dataset, which further gets divided into two or more homogeneous sets.
 Leaf Node: Leaf nodes are the final output node, and the
tree cannot be segregated further after getting a leaf node.
 Splitting: Splitting is the process of dividing the decision
node/root node into sub-nodes according to the given
conditions.
 Branch/Sub Tree: A tree formed by splitting the tree.
 Pruning: Pruning is the process of removing the
unwanted branches from the tree.
 Parent/Child node: The root node of the tree is called the
parent node, and other nodes are called the child nodes.
How does the Decision Tree algorithm Work?
In a decision tree, for predicting the class of the given
dataset, the algorithm starts from the root node of the tree.
This algorithm compares the values of root attribute with the
record (real dataset) attribute and, based on the comparison,
follows the branch and jumps to the next node.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
o Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain the possible values of the best attribute.
o Step-4: Generate the decision tree node that contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where the nodes cannot be classified further; these final nodes are called leaf nodes.
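As a concrete illustration (a minimal sketch using scikit-learn's CART-based decision tree; the built-in Iris data set is used here only as assumed toy data):

```python
# Minimal decision-tree sketch with scikit-learn (toy data for illustration).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="gini" is the default; "entropy" would use information gain instead.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
tree.fit(X, y)

print(export_text(tree))        # the learned splits, from the root node down
print(tree.predict(X[:5]))      # class predictions for the first five samples
```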

Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept the offer or not. So, to solve this problem, the decision tree starts with
the root node (Salary attribute by ASM). The root node splits
further into the next decision node (distance from the office)
and one leaf node based on the corresponding labels. The
next decision node further gets split into one decision node
(Cab facility) and one leaf node. Finally, the decision node
splits into two leaf nodes (Accepted offers and Declined
offer). Consider the below diagram:
Attribute Selection Measures
While implementing a decision tree, the main issue that arises is how to select the best attribute for the root node and for the sub-nodes. To solve such problems there is a technique called the Attribute Selection Measure, or ASM. With this measure, we can easily select the best attribute for the nodes of the tree. There are two popular ASM techniques:
o Information Gain
o Gini Index

1. Information Gain:
o Information gain is the measurement of changes in
entropy after the segmentation of a dataset based on an
attribute.
o It calculates how much information a feature provides us
about a class.
o According to the value of information gain, we split the
node and build the decision tree.
o A decision tree algorithm always tries to maximize the
value of information gain, and a node/attribute having
the highest information gain is split first. It can be
calculated using the below formula:

Information Gain = Entropy(S) − [(Weighted Avg) × Entropy(each feature)]
Entropy: Entropy is a metric to measure the impurity in a
given attribute. It specifies randomness in data. Entropy can
be calculated as:
Entropy(S) = −P(yes) log2 P(yes) − P(no) log2 P(no)
Where:
o S = the set of all samples
o P(yes) = probability of yes
o P(no) = probability of no
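A small sketch of these two measures as helper functions (the toy 9-yes/5-no split is an assumed example, not from the text):

```python
import math

def entropy(labels):
    """Entropy of a list of yes/no labels: -P(yes)log2 P(yes) - P(no)log2 P(no)."""
    total = len(labels)
    result = 0.0
    for value in set(labels):
        p = labels.count(value) / total
        result -= p * math.log2(p)
    return result

def information_gain(parent_labels, child_splits):
    """Entropy(S) minus the weighted average entropy of each child split."""
    total = len(parent_labels)
    weighted = sum(len(child) / total * entropy(child) for child in child_splits)
    return entropy(parent_labels) - weighted

labels = ["yes"] * 9 + ["no"] * 5                     # toy 9/5 sample
split = [["yes"] * 6 + ["no"] * 2, ["yes"] * 3 + ["no"] * 3]
print(round(entropy(labels), 3))                      # ≈ 0.940
print(round(information_gain(labels, split), 3))      # gain obtained by this split
```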

2. Gini Index:
o Gini index is a measure of impurity or purity used while
creating a decision tree in the CART(Classification and
Regression Tree) algorithm.
o An attribute with a low Gini index should be preferred over one with a high Gini index.
o It only creates binary splits, and the CART algorithm uses
the Gini index to create binary splits.
o Gini index can be calculated using the below formula:

Gini Index = 1 − ∑j Pj²
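The Gini index can be sketched in the same style (illustrative helper with assumed toy data):

```python
def gini_index(labels):
    """Gini index: 1 minus the sum of squared class probabilities."""
    total = len(labels)
    return 1.0 - sum((labels.count(value) / total) ** 2 for value in set(labels))

print(round(gini_index(["yes"] * 9 + ["no"] * 5), 3))   # ≈ 0.459
```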


Pruning: Getting an Optimal Decision tree
Pruning is a process of deleting the unnecessary nodes from a
tree in order to get the optimal decision tree.
A too-large tree increases the risk of overfitting, and a small
tree may not capture all the important features of the
dataset. Therefore, a technique that decreases the size of the
learning tree without reducing accuracy is known as Pruning.
There are mainly two types of tree pruning techniques used:
o Cost Complexity Pruning
o Reduced Error Pruning.
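In scikit-learn, cost-complexity pruning is exposed through the ccp_alpha parameter of the tree estimators; a brief sketch (toy data assumed):

```python
# Cost-complexity pruning sketch with scikit-learn (toy data for illustration).
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.02, random_state=0).fit(X, y)

# A larger ccp_alpha removes more branches, trading training accuracy for simplicity.
print("unpruned leaves:", full_tree.get_n_leaves())
print("pruned leaves:  ", pruned_tree.get_n_leaves())
```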

Advantages of the Decision Tree


o It is simple to understand, as it follows the same process that a human follows while making a decision in real life.
o It can be very useful for solving decision-related problems.
o It helps us think about all the possible outcomes for a problem.
o It requires less data cleaning compared to other algorithms.

Disadvantages of the Decision Tree


o The decision tree contains lots of layers, which makes it
complex.
o It may have an overfitting issue, which can be resolved
using the Random Forest algorithm.
o For more class labels, the computational complexity of
the decision tree may increase.

Common sense
In artificial intelligence (AI), commonsense reasoning is a
human-like ability to make presumptions about the type and
essence of ordinary situations humans encounter every day.
These assumptions include judgments about the nature of
physical objects, taxonomic properties, and peoples'
intentions.
Some definitions and characterizations of common sense
from different authors include:
● "Commonsense knowledge includes the basic facts
about events (including actions) and their effects,
facts about knowledge and how it is obtained, facts
about beliefs and desires. It also includes the basic
facts about material objects and their properties."
● "Commonsense knowledge differs from encyclopedic
knowledge in that it deals with general knowledge
rather than the details of specific entities."
● Commonsense knowledge is "real world knowledge
that can provide a basis for additional knowledge to
be gathered and interpreted automatically".
● The commonsense world consists of "time, space,
physical interactions, people, and so on".
● Common sense is "all the knowledge about the world
that we take for granted but rarely state out loud".
● Common sense is "broadly reusable background
knowledge that's not specific to a particular subject
area... knowledge that you ought to have."

What is a Plan?
We require a domain description, a task specification, and a goal description for any planning system. A plan is a sequence of actions; each action has preconditions that must be satisfied before it can be applied and effects that can be positive or negative.
So, we have Forward State Space Planning
(FSSP) and Backward State Space Planning (BSSP) at the
basic level.

1. Forward State Space Planning (FSSP)


FSSP behaves in the same way as forward state-space search. Given an initial state S in a domain, we perform the applicable actions and obtain a new state S' (which may contain new terms); this is called progression. The process continues until we reach the goal state. The actions taken must be applicable in the current state.
o Disadvantage: Large branching factor
o Advantage: The algorithm is Sound

2. Backward State Space Planning (BSSP)


BSSP behaves similarly to backward state-space search. Here, we move from the goal state g toward sub-goals, tracing back through the actions needed to achieve the goal. This process is called regression (going back to the previous goal or sub-goal). These sub-goals should also be checked for consistency. The actions chosen must be relevant in this case.
o Disadvantage: The algorithm is not sound (inconsistencies can sometimes arise)
o Advantage: Small branching factor (much smaller than FSSP)

So, for an efficient planning system, we need to combine the features of FSSP and BSSP, which gives rise to target (goal) stack planning, discussed below.
What is planning in AI?
Planning in artificial intelligence is about the decision-making tasks performed by robots or computer programs to achieve a specific goal.
Executing a plan is about choosing a sequence of actions with a high probability of accomplishing a specific task.
Block-world planning problem
o This block-world problem is known as the Sussman anomaly.
o The non-interleaved planners of the early 1970s were unable to solve this problem; therefore, it is considered an anomaly.
o When two sub-goals, G1 and G2, are given, a non-interleaved planner either produces a plan for G1 that is combined with a plan for G2, or vice versa.
o In the block-world problem, three blocks labeled 'A', 'B',
and 'C' are allowed to rest on a flat surface. The given
condition is that only one block can be moved at a time
to achieve the target.

The start position and target position are shown in the following diagram.

Components of the planning system


The plan includes the following important steps:
o Choose the best rule to apply next, based on the best available guess.
o Apply the chosen rule to compute the new problem state.
o Detect when a solution has been found.
o Detect dead ends so that they can be discarded and the system's effort can be directed in more useful directions.
o Detect when an almost-correct solution has been found.

Target stack plan


o It is one of the most important planning algorithms and is used by STRIPS.
o A stack is used in the algorithm to hold the actions and goals still to be achieved. A knowledge base is used to hold the current state and the actions.
o A goal stack is similar to a node in a search tree, where branches are created by the choice of action.

The important steps of the algorithm are mentioned below:


1. Start by pushing the original goal onto the stack. Repeat the following until the stack is empty.
2. If the stack top is a compound goal, push its unsatisfied sub-goals onto the stack.
3. If the stack top is a single unsatisfied goal, replace it with an action that achieves it and push the action's preconditions onto the stack.
4. If the stack top is an action, pop it off the stack, execute it, and update the knowledge base with the action's effects.
5. If the stack top is a satisfied goal, pop it off the stack.
Non-linear Planning
Non-linear planning uses a goal set instead of a goal stack, and its search space includes all possible sub-goal orderings. It handles goal interactions by interleaving.
Advantages of non-Linear Planning
Non-linear Planning may be an optimal solution concerning
planning length (depending on the search strategy used).
Disadvantages of Nonlinear Planning
It requires a larger search space, since all possible goal orderings are considered.
It is a more complex algorithm to understand.
Algorithm
1. Choose a goal 'g' from the goal set.
2. If 'g' does not match the state, then:
o Choose an operator 'o' whose add-list matches goal 'g'
o Push 'o' onto the OpStack
o Add the preconditions of 'o' to the goal set
3. While all the preconditions of the operator on top of the OpStack are met in the state:
o Pop operator 'o' from the top of the OpStack
o state = apply(o, state)
o plan = [plan; o]
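A simplified Python sketch of this loop (the operator format and the tiny two-operator domain are assumptions for illustration; a real planner would also handle goal-ordering choices and backtracking):

```python
# Each operator: name, preconditions, add-list, delete-list (illustrative encoding).
operators = [
    {"name": "pick_up(A)", "pre": {"clear(A)", "hand_empty"},
     "add": {"holding(A)"}, "del": {"clear(A)", "hand_empty"}},
    {"name": "stack(A, B)", "pre": {"holding(A)", "clear(B)"},
     "add": {"on(A, B)", "hand_empty"}, "del": {"holding(A)", "clear(B)"}},
]

state = {"clear(A)", "clear(B)", "hand_empty"}
goals = {"on(A, B)"}
op_stack, plan = [], []

while goals:
    g = goals.pop()                                     # 1. choose a goal g from the goal set
    if g in state:                                      # 2. skip goals already satisfied
        continue
    o = next(op for op in operators if g in op["add"])  #    operator whose add-list matches g
    op_stack.append(o)                                  #    push o on the OpStack
    goals |= {p for p in o["pre"] if p not in state}    #    add unmet preconditions to the goal set
    while op_stack and op_stack[-1]["pre"] <= state:    # 3. top operator's preconditions all hold
        ready = op_stack.pop()
        state = (state - ready["del"]) | ready["add"]   #    state = apply(o, state)
        plan.append(ready["name"])                      #    plan = [plan; o]

print(plan)  # expected: ['pick_up(A)', 'stack(A, B)']
```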
