
Unit – II

Acting Under Uncertainty:


Uncertainty:
• The knowledge representation A → B means that if A is true then B is true; in a situation where we are not sure whether A is true, we cannot express this statement. Such a situation is called uncertainty.
• Agents must act under Uncertainty.

Causes for uncertainty:

• Information obtained from unreliable sources


• Experimental Errors
• Equipment Fault
• Temperature variation
• Climate change

Probabilistic Reasoning:

• It is a way of knowledge representation in which the concept of probability is applied to indicate the uncertainty in knowledge.
Need for Probabilistic Reasoning in AI:
✓ When there are unpredictable outcomes
✓ When an unknown error occurs during an experiment

Ways to solve problems with uncertain knowledge:

✓ Bayes' rule
✓ Bayesian Statistics

Probability:

• It can be defined as the chance that an uncertain event will occur.


• The value of probability always remains between 0 and 1.
• 0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
• P(A) = 0, indicates total uncertainty in an event A
• P(A) =1, indicates total certainty in an event A.

Event: Each possible outcome of a variable is called an event.


Sample Space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent events and objects in the real world.
Prior Probability: It is the probability computed before observing new information.
Posterior Probability: It is the probability calculated after all information has been taken into account.

Conditional Probability:

• It is the probability of an event occurring when another event has already happened.
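• In symbols (assuming P(B) > 0): P(A | B) = P(A ∧ B) / P(B).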

Bayesian Inference:

• Bayesian inference is a probabilistic approach to machine learning that provides estimates of the probability of specific events.
• Bayesian inference is a statistical method for understanding the uncertainty
inherent in prediction problems.
• A Bayesian inference algorithm can be implemented as a Markov Chain Monte Carlo (MCMC) algorithm that uses prior probability distributions together with the likelihood function to approximate the posterior.
• The basis of Bayesian inference is the notion of a priori and a posteriori
probabilities.
• The a priori probability is the probability of an event before any evidence is considered.
• The a posteriori probability is the probability of an event after taking into account all available evidence.

Bayes' Theorem / Bayes' Rule:

• Bayes' theorem determines the probability of an event when knowledge is uncertain.
• It can be derived from the product rule and the conditional probability of event A given a known event B.
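• Bayes' rule: P(A | B) = P(B | A) · P(A) / P(B).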

Here P(A|B) is known as the posterior, P(B|A) is called the likelihood, P(A) is called the prior probability, and P(B) is called the marginal probability.

Applications of Bayes' Theorem:

• It is used to calculate the next step of a robot when the already executed step is given.
• It is helpful in weather forecasting.
• It is used to solve the Monty Hall problem.
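As a small worked illustration of the rule, the sketch below uses hypothetical numbers (a 1% defect rate and an imperfect test, values chosen only for illustration) to compute a posterior from a prior and a likelihood:

```python
# Bayes' rule with hypothetical numbers: P(defect | positive test).
p_defect = 0.01            # prior P(A): 1% of items are defective (assumed)
p_pos_given_defect = 0.95  # likelihood P(B | A): test flags a defective item (assumed)
p_pos_given_ok = 0.10      # P(B | not A): false-positive rate (assumed)

# Marginal P(B) by the law of total probability, then the posterior P(A | B).
p_pos = p_pos_given_defect * p_defect + p_pos_given_ok * (1 - p_defect)
p_defect_given_pos = p_pos_given_defect * p_defect / p_pos
print(p_defect_given_pos)  # roughly 0.088
```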

Naïve Bayes Classifier:

• It is a classification technique based on Bayes' theorem with an independence assumption: the features are assumed to be conditionally independent given the class.
• With this assumption, the full joint distribution can be written as
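  P(Cause, Effect1, ..., Effectn) = P(Cause) × P(Effect1 | Cause) × ... × P(Effectn | Cause).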

Bayesian Networks:

• "A Bayesian network is a probabilistic graphical model which represents a set of


variables and their conditional dependencies using a directed acyclic graph."
• It is also called a Bayes network, belief network, decision network, or Bayesian
model.
• A Bayesian network can be used for building models from data and expert opinions, and it consists of two parts: a directed acyclic graph and a table of conditional probabilities.
• It is used to represent conditional dependencies.
• It can also be used in various tasks including prediction, anomaly detection,
diagnostics, automated insight, reasoning, time series prediction.
• A Bayesian network graph is made up of nodes and Arcs.

• Each node corresponds to a random variable, and a variable can be continuous or discrete.
• Arc or directed arrows represent the causal relationship or conditional
probabilities between random variables.
• These directed links or arrows connect the pair of nodes in the graph.
• These links represent that one node directly influences the other node.
• The Bayesian network graph does not contain any cycles. Hence, it is known as a directed acyclic graph or DAG.
• The Bayesian network has mainly two components: 1. Causal Component 2.
Actual numbers
• Bayesian network is based on Joint probability distribution and conditional
probability.

Joint probability distribution:

• If the variables are x1, x2, x3, ..., xn, then the probabilities of the different combinations of x1, x2, x3, ..., xn are known as the joint probability distribution.
• P(x1, x2, x3, ..., xn) can be written in terms of conditional probabilities as follows:
  P(x1, x2, ..., xn) = P(x1 | x2, x3, ..., xn) · P(x2, x3, ..., xn)
                     = P(x1 | x2, ..., xn) · P(x2 | x3, ..., xn) ··· P(xn-1 | xn) · P(xn).
• Global semantics defines the full joint distribution as the product of the local conditional distributions.
• Local semantics states that each node is conditionally independent of its nondescendants given its parents.
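• In symbols: P(x1, ..., xn) = P(x1 | parents(X1)) × ... × P(xn | parents(Xn)), i.e. the product over all nodes of the node's probability given its parents.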

Example:

The network structure shows that Burglary and Earthquake are the parent nodes of Alarm and directly affect the probability of the alarm going off.

Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

Conditional Probability table for Alarm A:


Conditional Probability table for JohnCalls:

Conditional Probability table for MaryCalls:
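Typical textbook values for these tables, assumed here so that the worked inference examples later in these notes have concrete numbers, are:

P(Burglary = true) = 0.001, P(Earthquake = true) = 0.002

B E | P(Alarm = true | B, E)
T T | 0.95
T F | 0.94
F T | 0.29
F F | 0.001

P(JohnCalls = true | Alarm = true) = 0.90, P(JohnCalls = true | Alarm = false) = 0.05
P(MaryCalls = true | Alarm = true) = 0.70, P(MaryCalls = true | Alarm = false) = 0.01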

Applications of Bayesian Networks:


• Spam Filtering
• Biomonitoring
• Image processing
• Turbo code
• Document Classification

Exact Inference in BN:

• In exact inference, we analytically compute the conditional probability distribution over the variables of interest.
• The basic task for any probabilistic inference system is to compute the
posterior probability distribution for a set of variables.
• The notation X denotes the query variable, E denotes the set of evidence variables E1, ..., Em, and Y denotes the nonevidence (hidden) variables.
• Conditional probability can be computed by summing terms from the full joint
distribution.
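• In symbols: P(X | e) = α P(X, e) = α Σy P(X, e, y), where α is a normalizing constant and y ranges over the hidden (nonevidence) variables.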

• Now, a Bayesian network gives a complete representation of the full joint distribution.
• More specifically, the equations above show that the terms P(x, e, y) in the joint distribution can be written as products of conditional probabilities from the network.
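• For the burglary example, the query P(Burglary | JohnCalls = true, MaryCalls = true) becomes P(B | j, m) = α Σe Σa P(B) P(e) P(a | B, e) P(j | a) P(m | a), where the sums range over the values of Earthquake and Alarm.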
• To compute this expression, we have to add four terms, each computed by
multiplying five numbers.
• In the worst case, where we have to sum out almost all the variables, the complexity of the algorithm for a network with n Boolean variables is O(n·2^n).

• This expression can be evaluated by looping through the variables in order.
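The sketch below evaluates this sum for the burglary query by brute-force enumeration; the CPT values are the assumed textbook numbers given earlier, and the variable and function names are illustrative:

```python
# Exact inference by enumeration on the burglary network (illustrative sketch;
# CPT values follow the assumed textbook tables given earlier in these notes).
from itertools import product

P_B = {True: 0.001, False: 0.999}                      # P(Burglary)
P_E = {True: 0.002, False: 0.998}                      # P(Earthquake)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}     # P(Alarm=true | B, E)
P_J = {True: 0.90, False: 0.05}                        # P(JohnCalls=true | Alarm)
P_M = {True: 0.70, False: 0.01}                        # P(MaryCalls=true | Alarm)

def joint(b, e, a, j, m):
    """Full joint probability as a product of the local CPT entries."""
    p = P_B[b] * P_E[e]
    p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
    p *= P_J[a] if j else 1 - P_J[a]
    p *= P_M[a] if m else 1 - P_M[a]
    return p

# P(Burglary | JohnCalls=true, MaryCalls=true): sum out the hidden variables E and A,
# then normalize.
unnormalized = {b: sum(joint(b, e, a, True, True)
                       for e, a in product((True, False), repeat=2))
                for b in (True, False)}
alpha = 1 / sum(unnormalized.values())
print({b: alpha * p for b, p in unnormalized.items()})  # ~{True: 0.284, False: 0.716}
```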


Variable Elimination Algorithm:

• The enumeration algorithm can be improved substantially by eliminating repeated calculations.
• Do the calculation once and save the results for later use.
• This is a form of dynamic programming.
• It works by evaluating expressions such as the one above in right-to-left order.
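As a hedged sketch, the same burglary query can be answered with the pgmpy library's variable elimination implementation; the class names and CPT column ordering below follow pgmpy's documented API but should be checked against the installed version, and the probabilities are the assumed textbook values:

```python
# Variable elimination on the burglary network using pgmpy (a sketch; verify the
# API against your pgmpy version). State 0 = false, state 1 = true.
from pgmpy.models import BayesianNetwork
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

model = BayesianNetwork([('Burglary', 'Alarm'), ('Earthquake', 'Alarm'),
                         ('Alarm', 'JohnCalls'), ('Alarm', 'MaryCalls')])

cpd_b = TabularCPD('Burglary', 2, [[0.999], [0.001]])
cpd_e = TabularCPD('Earthquake', 2, [[0.998], [0.002]])
cpd_a = TabularCPD('Alarm', 2,
                   [[0.999, 0.71, 0.06, 0.05],    # P(Alarm=false | B, E)
                    [0.001, 0.29, 0.94, 0.95]],   # P(Alarm=true  | B, E)
                   evidence=['Burglary', 'Earthquake'], evidence_card=[2, 2])
cpd_j = TabularCPD('JohnCalls', 2, [[0.95, 0.10], [0.05, 0.90]],
                   evidence=['Alarm'], evidence_card=[2])
cpd_m = TabularCPD('MaryCalls', 2, [[0.99, 0.30], [0.01, 0.70]],
                   evidence=['Alarm'], evidence_card=[2])
model.add_cpds(cpd_b, cpd_e, cpd_a, cpd_j, cpd_m)

infer = VariableElimination(model)
print(infer.query(['Burglary'], evidence={'JohnCalls': 1, 'MaryCalls': 1}))
```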

Approximate Inference in BN:

• Given the intractability of exact inference in large networks, we will consider approximate inference
methods.

• This section describes randomized sampling algorithms, also called Monte Carlo algorithms.

• They work by generating random events based on the probabilities in the Bayes net and counting
up the different answers found in those random events.

Direct Sampling methods:

• The primitive element in any sampling algorithm is the generation of samples from a known probability distribution.
• For example, an unbiased coin can be thought of as a random variable Coin with
values (heads, tails) and a prior distribution P(Coin) = (0.5,0.5).
• Sampling from this distribution is exactly like flipping the coin: with probability
0.5 it will return heads, and with probability 0.5 it will return tails.
• Given a source of random numbers r uniformly distributed in the range [0,1], it
is a simple matter to sample any distribution on a single variable, whether
discrete or continuous.
• The idea is to sample each variable in turn, in topological order.
• The probability distribution from which the value is sampled is conditioned on
the values already assigned to the variable’s parents.
• Applying it to the network with the ordering Cloudy, Sprinkler, Rain.
• Sample from P(Cloudy) = {0.5,0.5}, value is true.
• Sample from P(Sprinkler | Cloudy = true) = {0.1, 0.9}, value is false.
• Sample from P(Rain | Cloudy = true) = {0.8,0.2}, value is true.
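A minimal sketch of prior sampling on this network, assuming the usual textbook CPT values (including a WetGrass node with parents Sprinkler and Rain):

```python
# Prior (direct) sampling from the sprinkler network; CPT values are the usual
# textbook numbers, assumed here for illustration.
import random

def bernoulli(p_true):
    return random.random() < p_true

def prior_sample():
    """Sample each variable in topological order, conditioning on its sampled parents."""
    cloudy = bernoulli(0.5)
    sprinkler = bernoulli(0.1 if cloudy else 0.5)
    rain = bernoulli(0.8 if cloudy else 0.2)
    wet_grass = bernoulli({(True, True): 0.99, (True, False): 0.90,
                           (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)])
    return cloudy, sprinkler, rain, wet_grass

samples = [prior_sample() for _ in range(10000)]
# Estimate P(Rain = true) by counting; with these numbers it approaches 0.5.
print(sum(s[2] for s in samples) / len(samples))
```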

Rejection Sampling in Bayesian Networks:

• Rejection sampling is a general method for producing samples from a hard-to-sample distribution given an easy-to-sample distribution.
• It can be used to compute conditional probabilities, that is, to determine P(X | e).
• First it generates samples from the prior distribution specified by the network.
• Then it rejects all samples that do not match the evidence.
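Reusing prior_sample() from the sketch above, rejection sampling for P(Rain = true | Sprinkler = true) keeps only the samples that agree with the evidence:

```python
# Rejection sampling: generate prior samples, discard those inconsistent with the
# evidence Sprinkler = true, then count Rain = true among the survivors.
accepted = [s for s in (prior_sample() for _ in range(50000)) if s[1]]
print(sum(s[2] for s in accepted) / len(accepted))   # approaches 0.3 with these CPTs
```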

Markov Chain Monte Carlo (MCMC) Algorithm:

• MCMC generates each event by making a random change to the preceding event.
• It is therefore helpful to think of the network as being in a particular current state
specifying a value for every variable.
• The next state is generated by randomly sampling a value for one of the
nonevidence variables Xi, conditioned on the current values of the variables in
the Markov blanket of Xi.
• MCMC therefore wanders randomly around the state space (the space of possible complete assignments), flipping one variable at a time but keeping the evidence variables fixed.
• Consider the query P(Rain | Sprinkler = true, WetGrass = true) applied to the network.
• The evidence variables Sprinkler and WetGrass are fixed to their observed values
and the hidden variables Cloudy and Rain are initialized randomly.
• Thus, the initial state is [true, true, false, true]. Now the following steps are
executed repeatedly:
• Cloudy is sampled, given the current values of its Markov blanket variables: in this case, we sample from P(Cloudy | Sprinkler = true, Rain = false). Suppose the result is Cloudy = false. Then the new current state is [false, true, false, true].
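A minimal Gibbs-sampling sketch of this procedure, again assuming the usual textbook CPT values; each step resamples one hidden variable conditioned on its Markov blanket while the evidence stays fixed:

```python
# MCMC (Gibbs sampling) estimate of P(Rain | Sprinkler=true, WetGrass=true);
# CPT values are the usual textbook numbers, assumed for illustration.
import random

P_S = {True: 0.1, False: 0.5}                      # P(Sprinkler=true | Cloudy)
P_R = {True: 0.8, False: 0.2}                      # P(Rain=true | Cloudy)
P_W = {(True, True): 0.99, (True, False): 0.90,
       (False, True): 0.90, (False, False): 0.0}   # P(WetGrass=true | Sprinkler, Rain)

def bernoulli(p):
    return random.random() < p

def sample_cloudy(rain, sprinkler):
    # P(Cloudy | Markov blanket) is proportional to P(Cloudy) * P(Sprinkler | Cloudy) * P(Rain | Cloudy)
    w = {c: 0.5 * (P_S[c] if sprinkler else 1 - P_S[c])
              * (P_R[c] if rain else 1 - P_R[c])
         for c in (True, False)}
    return bernoulli(w[True] / (w[True] + w[False]))

def sample_rain(cloudy, sprinkler, wet_grass):
    # P(Rain | Markov blanket) is proportional to P(Rain | Cloudy) * P(WetGrass | Sprinkler, Rain)
    w = {r: (P_R[cloudy] if r else 1 - P_R[cloudy])
            * (P_W[(sprinkler, r)] if wet_grass else 1 - P_W[(sprinkler, r)])
         for r in (True, False)}
    return bernoulli(w[True] / (w[True] + w[False]))

sprinkler, wet_grass = True, True                  # evidence variables stay fixed
cloudy, rain = bernoulli(0.5), bernoulli(0.5)      # hidden variables start at random values
rain_count, steps = 0, 100_000
for _ in range(steps):
    cloudy = sample_cloudy(rain, sprinkler)
    rain = sample_rain(cloudy, sprinkler, wet_grass)
    rain_count += rain
print(rain_count / steps)                          # approaches about 0.32
```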

Causal Networks:

• A causal network is an acyclic digraph arising from an evolution of a substitution system.
• Each substitution event is a vertex in a causal network.
• Two events which are related by causal dependence, meaning one occurs just before the other, have an edge between the corresponding vertices in the causal network.
• The edge is a directed edge leading from the past event to the future event.
• A CBN is a graph formed by nodes representing random variables, connected by
links denoting causal influence.
• Some causal networks are independent of the choice of evolution; these are called causally invariant.

Structural Causal Models (SCMs):

• SCMs consist of two parts: a graph which visualizes the causal connections, and equations which express the details of those connections. A graph is a mathematical construction that consists of vertices (nodes) and edges (links).
• SCMs use a special kind of graph called a Directed Acyclic Graph (DAG), for which all edges are directed and no cycles exist.
• DAGs are common starting place for causal inference.
• For simple structures, a Bayesian network and a causal network can look identical; the difference lies in whether the edges are interpreted as causal influences.

• Consider a network with 2 nodes and 1 edge.
• This network can be read as either a Bayesian network or a causal network.

Implementing Causal Inference:


1) The do-operator:

• The do-operator is a mathematical representation of a physical intervention.


• If the model starts with Z → X → Y, applying do(X = x) removes every arrow pointing into X (here the arrow from Z), fixing X to the value x while leaving X → Y intact.
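• This lets us distinguish intervention from observation: P(Y | do(X = x)) is computed in the intervened model and in general differs from the observational conditional P(Y | X = x).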
2) Confounding:

• In this example, age is a confounder of education and wealth.
• Adjusting for age means that when looking at the age, education and wealth data, one would compare data points within age groups, not between age groups.

3) Estimating Causal Effects:

• Treatment Effect = (Outcome under E) minus (Outcome under C).


• It is the difference between the outcome a child would receive if assigned to treatment E and the outcome that same child would receive if assigned to treatment C.
• These are called Potential Outcomes.
