AI Finals Summary Version 2
ꓯ for all, ꓱ for some/exists, ꓦ or, ꓥ and, → implies, ↔ if and only if, ⌐ not
Sentence = a symbol or a combination of sentences, e.g. (S ꓦ T), ⌐S, (S)
Possible worlds = all possible outcomes (truth table)
Model = a possible world in which the sentence is true
Valid sentence/tautology = always true
Inconsistent sentence/contradiction = always false
• A term is a constant symbol, a variable symbol, or an n-place function of n terms: x and f(x1, ..., xn) are terms, where each xi is a term
• A term with no variables is a ground term
• An atomic sentence (T/F) is an n-place predicate of n terms
• A complex sentence is atomic sentences combined with logical connectives
• A quantified sentence adds the quantifiers ꓯ and ꓱ
• A well-formed formula (wff) contains no "free" variables; that is, all variables are "bound" by universal or existential quantifiers
Universal quantification is equivalent to:
– Conjunction of all sentences obtained by substituting an object for the quantified variable
– Used with "implies" to form "rules" and make blanket statements about every individual in the world
Existential quantification is equivalent to:
– Disjunction of all sentences obtained by substituting an object for the quantified variable
– Used with "and" to specify a list of properties about an individual
A common mistake is to represent an existential English sentence as the FOL sentence
(ꓱx) MUICT-student(x) → smart(x)
– But what happens when there is a person who is not an MUICT student? The implication is then vacuously true, so the sentence is satisfied even in unintended worlds; use ꓥ rather than → with ꓱ
Switching the order of the same quantifier is fine, but switching the order of different quantifiers changes the meaning

Converting sentences to clause form:
1. Eliminate all ↔ connectives
2. Eliminate all → connectives
3. Reduce the scope of each negation symbol to a single predicate
4. Standardize variables: rename all variables
5. Eliminate existential quantification by introducing Skolem constants/functions
– (ꓱx)P(x) becomes P(c), where c is a Skolem constant (not used in any other sentence)
– (ꓯx)(ꓱy)P(x,y) becomes (ꓯx)P(x, f(x)); since the ꓱ is within the scope of a universally quantified variable, use a Skolem function f to construct a new value that depends on the universally quantified variable
– f must be a new function name not occurring anywhere else in the KB
6. Remove universal quantifiers by moving them all to the left end, making the scope of each the entire sentence, and dropping the "prefix" part
7. Put into conjunctive normal form using the distributive and associative laws
8. Split conjuncts into separate clauses
9. Standardize variables so each clause contains only variable names that do not occur in any other clause

Unification
• A "pattern-matching" procedure
– Takes two atomic sentences, called literals, as input
– Returns "Failure" if they do not match, or a substitution list θ if they do
• unify(p,q) = θ means subst(θ, p) = subst(θ, q) for p and q
• θ is called the most general unifier (mgu)
• All variables in the given two literals are implicitly universally quantified
• To make literals match, replace (ꓯx) variables by terms
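A minimal Python sketch of unify (my own illustration, not the course's exact algorithm): variables are assumed to be single lowercase letters, compound terms and literals are tuples, and the occurs check is omitted for brevity.

```python
def is_var(t):
    # assumption: variables are single lowercase letters, e.g. "x", "y"
    return isinstance(t, str) and len(t) == 1 and t.islower()

def unify(x, y, theta):
    """Return the mgu as a dict of bindings, or None for 'Failure'."""
    if theta is None:
        return None                      # propagate failure
    if x == y:
        return theta
    if is_var(x):
        return unify_var(x, y, theta)
    if is_var(y):
        return unify_var(y, x, theta)
    if isinstance(x, tuple) and isinstance(y, tuple) and len(x) == len(y):
        for xi, yi in zip(x, y):         # match predicates element by element
            theta = unify(xi, yi, theta)
        return theta
    return None

def unify_var(v, t, theta):
    if v in theta:
        return unify(theta[v], t, theta)
    if is_var(t) and t in theta:
        return unify(v, theta[t], theta)
    new = dict(theta)                    # (occurs check omitted)
    new[v] = t
    return new

# unify(p, q) = θ such that subst(θ, p) == subst(θ, q):
print(unify(("Knows", "John", "x"), ("Knows", "y", "Leo"), {}))
# {'y': 'John', 'x': 'Leo'}
```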
Resolution theorem proving (TP) as search
• Resolution as the bottom-up construction of a search tree, where the
leaves are the clauses produced by KB and the negation of the goal
• When a pair of clauses generates a new resolvent clause, add a new node to the tree with arcs directed from the resolvent to its 2 parent clauses
• Resolution succeeds when a node containing the False clause is
produced, becoming the root node of the tree
(note: KB = knowledge base, the already existing sentences)
Strategies
• A strategy is complete if its use guarantees that the empty clause (i.e., false) can be derived whenever it is entailed
• Several general (domain-independent) strategies are useful in controlling a resolution theorem prover
– Breadth-first
• Level 0 clauses are the original axioms and negation of the goal
• Level k clauses are the resolvents computed from two clauses, one of
which must be from level k-1 and the other from any earlier level
• Compute all possible level 1 clauses, then level 2 clauses, etc.
• Complete, but very inefficient
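A minimal Python sketch (my own illustration, not from the slides) of this breadth-first strategy for propositional clauses; the encoding of a clause as a frozenset of string literals, with a leading "~" for negation, is an assumption.

```python
def resolve(c1, c2):
    """Return all resolvents of two clauses (complementary literals cancel)."""
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return out

def bfs_resolution(kb, negated_goal):
    """Level-by-level saturation: a level-k resolvent takes one parent from
    level k-1 and one from any earlier level. True = empty clause derived."""
    level = set(kb) | set(negated_goal)    # level 0: axioms + negated goal
    seen = set(level)
    while level:
        new = set()
        for c1 in level:                   # one parent from the newest level
            for c2 in seen:                # other parent from any level
                for r in resolve(c1, c2):
                    if not r:              # empty clause, i.e. False
                        return True
                    if r not in seen:
                        new.add(r)
        seen |= new
        level = new
    return False                           # saturated: goal not entailed

kb = [frozenset({"P"}), frozenset({"~P", "Q"})]    # P, P -> Q
print(bfs_resolution(kb, [frozenset({"~Q"})]))     # True: KB |= Q
```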
Sentences -> clauses (propositional logic):
1. Eliminate ↔
2. Eliminate →
3. Reduce the scope of ⌐
4. Distribute ˅ over ˄
Then represent the result as a set of clauses
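A compact Python sketch of these four steps; the encoding (a sentence is a string symbol or a binary tuple such as ("and", a, b)) is my own illustrative choice. Splitting the resulting conjunction into separate clauses is then mechanical.

```python
def to_cnf(s):
    s = elim_iff(s); s = elim_imp(s); s = push_not(s)
    return distribute(s)

def elim_iff(s):            # 1. A <-> B  ==>  (A -> B) and (B -> A)
    if isinstance(s, str): return s
    if s[0] == "<->":
        a, b = elim_iff(s[1]), elim_iff(s[2])
        return ("and", ("->", a, b), ("->", b, a))
    return (s[0],) + tuple(elim_iff(x) for x in s[1:])

def elim_imp(s):            # 2. A -> B  ==>  (not A) or B
    if isinstance(s, str): return s
    if s[0] == "->":
        return ("or", ("not", elim_imp(s[1])), elim_imp(s[2]))
    return (s[0],) + tuple(elim_imp(x) for x in s[1:])

def push_not(s):            # 3. De Morgan + double negation elimination
    if isinstance(s, str): return s
    if s[0] == "not":
        a = s[1]
        if isinstance(a, str): return s
        if a[0] == "not": return push_not(a[1])
        if a[0] == "and": return ("or", push_not(("not", a[1])), push_not(("not", a[2])))
        if a[0] == "or":  return ("and", push_not(("not", a[1])), push_not(("not", a[2])))
    return (s[0],) + tuple(push_not(x) for x in s[1:])

def distribute(s):          # 4. A or (B and C)  ==>  (A or B) and (A or C)
    if isinstance(s, str) or s[0] == "not": return s
    a, b = distribute(s[1]), distribute(s[2])
    if s[0] == "or":
        if not isinstance(a, str) and a[0] == "and":
            return ("and", distribute(("or", a[1], b)), distribute(("or", a[2], b)))
        if not isinstance(b, str) and b[0] == "and":
            return ("and", distribute(("or", a, b[1])), distribute(("or", a, b[2])))
    return (s[0], a, b)

print(to_cnf(("<->", "P", "Q")))
# ('and', ('or', ('not', 'P'), 'Q'), ('or', ('not', 'Q'), 'P'))
```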
• A literal is either an atomic sentence or its negation, e.g. P or ⌐P
• Two literals are complementary if one is the negation of the other, e.g. P and ⌐P
Proof: derive with the above rules to see whether the whole sentence is always true or not
However, propositional logic makes it hard to identify individuals or to talk about the properties and relationships of an individual; generalizations, patterns, and regularities also can't easily be represented.

First Order Logic
– Objects, individual identities
– Properties of objects that distinguish them from other objects
– Relations that hold among sets of objects
– Functions, a subset of relations where there is only one "value" for a given "input"
User provides
• Constant symbols, which represent individuals in the world
– Mary – 3 – Green
• Function symbols, which map individuals to individuals
– father-of(Mary) = John – color-of(Sky) = Blue
• Predicate symbols, which map individuals to truth values
– Greater(5,3) – Green(Grass) – Color(Grass, Green)
FOL Provides
• Variable symbols
• Connectives (PL)
• Quantifiers
– Universal ꓯx "for all"
– Existential ꓱx "there exists"

Resolution refutation
• Given a set of axioms KB and a goal sentence Q, show that KB |= Q
• Proof by contradiction: add ⌐Q to KB and try to prove False, i.e., (KB |- Q) ↔ (KB ∧ ⌐Q |- False)
• Resolution is refutation complete: it can establish that a given sentence Q is entailed by KB, but it can't be used to generate all logical consequences of a set of sentences
• Also, it cannot be used to prove that Q is not entailed by KB
• Resolution won't always give an answer, since entailment is only semi-decidable
– And you can't just run two proofs in parallel, one trying to prove Q and the other trying to prove ⌐Q, since KB might not entail either one

We need answers to the following questions
• How to convert FOL sentences to conjunctive normal form (a.k.a. CNF, clause form): normalization and skolemization
• How to unify two argument lists, i.e., find the most general unifier θ: unification
• How to determine which two clauses in KB should be resolved next (among all resolvable pairs of clauses): resolution (search) strategy

Further strategies (a combined sketch follows this list):
– Length heuristics
• Shortest-clause heuristic: generate a clause with the fewest literals first
• Unit resolution: prefer resolution steps in which at least one parent clause is a "unit clause," i.e., a clause containing a single literal
– Not complete in general, but complete for Horn clause KBs
– Set of support
• At least one parent clause must be the negation of the goal or a "descendant" of such a goal clause (i.e., derived from a goal clause)
• (When there's a choice, take the most recent descendant)
• Gives a goal-directed character to the search
• Complete (assuming all possible set-of-support clauses are derived)
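A minimal Python sketch of the last two strategies combined: the set of support holds the negated goal and its descendants, and a priority queue applies the shortest-clause heuristic. The clause encoding matches the breadth-first sketch above; this is my own illustration, not the course's code.

```python
import heapq

def resolve(c1, c2):               # same resolvent rule as in the sketch above
    out = []
    for lit in c1:
        comp = lit[1:] if lit.startswith("~") else "~" + lit
        if comp in c2:
            out.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return out

def sos_resolution(kb, negated_goal):
    seen = set(kb) | set(negated_goal)
    sos = []                       # priority queue keyed by clause length
    for c in negated_goal:
        heapq.heappush(sos, (len(c), sorted(c), c))
    while sos:
        _, _, c1 = heapq.heappop(sos)      # shortest support clause first
        for c2 in list(seen):              # other parent: any known clause
            for r in resolve(c1, c2):
                if not r:                  # empty clause: KB entails the goal
                    return True
                if r not in seen:          # resolvent descends from the goal,
                    seen.add(r)            # so it stays in the set of support
                    heapq.heappush(sos, (len(r), sorted(r), r))
    return False

kb = [frozenset({"P"}), frozenset({"~P", "Q"})]    # P, P -> Q
print(sos_resolution(kb, [frozenset({"~Q"})]))     # True: KB |= Q
```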
The Joint Probability Distribution
• If we have a probabilistic knowledge base, we would like to be able to query the probability of any combination of variables
• To compute any query, we need a specification of the full joint distribution
• Once you have the joint probability distribution, you can calculate any probability involving A, B, and C; the entries of the table sum up to 1 (a query sketch follows the list below)
▪ Independence lemma: a variable is independent of the variables that are not its successors, given its direct predecessors
▪ Theorem: for a Bayesian network G with variables V1 to Vn, there is only one probability distribution consistent with the conditional probability tables; the distribution is the product of the CPTs, P(V1, …, Vn) = ∏i P(Vi | Parents(Vi))
Inferences
▪ Computing posterior probabilities
▪ Diagnostic inference: find the cause
▪ Predictive inference: predict the result
▪ Determining the most informative test
▪ Determining the configuration of highest probability
▪ Determining optimal courses of action
▪ Explanation
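A small sketch (toy uniform numbers of my own) of answering queries from a full joint distribution over three Boolean variables A, B, C; the eight entries sum to 1.

```python
from itertools import product

joint = {}                               # P(A=a, B=b, C=c) for all 8 worlds
for a, b, c in product([True, False], repeat=3):
    joint[(a, b, c)] = 0.125             # toy values: uniform, sums to 1

def prob(pred):
    """P(event) = sum of the possible worlds where the event holds."""
    return sum(p for world, p in joint.items() if pred(world))

# Any query involving A, B, C falls out of the table, e.g. P(A | B):
p_ab = prob(lambda w: w[0] and w[1])     # P(A=true, B=true)
p_b  = prob(lambda w: w[1])              # P(B=true)
print(p_ab / p_b)                        # P(A=true | B=true) = P(A,B)/P(B)
```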
Updating Beliefs
• Beliefs are not static; they change with new information
• If the initial belief in H is P(H) and we observe E, then the new belief in H = P(H|E)
Model of an intelligent agent:
– An agent has some current beliefs about the possible states of the world (prior probabilities)
– Some observation then updates the beliefs by conditioning on that observation
– It continues updating as it makes more observations
Normalizing beliefs: ∑ target belief / ∑ remaining possible worlds

Bayesian Networks
• Bayesian networks help us reason with uncertainty
• They are used in many systems and applications
– Robotics – Diagnostic systems – Disease surveillance – Intelligent tutoring
• Inferences are explainable

The Problem with Joint Distributions
• Lots of entries in the table to fill up
• For k Boolean random variables, you need a table of size 2^k

Divorcing: too many parents? Introduce an intermediate node
Noisy OR: if the causes act independently and in a regular way, we may use a generic combination function
ROC curve: two overlapping bell curves (the score distributions of the two classes); each threshold yields a confusion matrix (see the sketch below)
True Positive Rate (TPR) = TP/P = TP/(TP + FN)
False Positive Rate (FPR) = FP/N = FP/(FP + TN)
Sensitivity = TPR
Specificity = 1 – FPR
Bayes net learning problem
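A small sketch (invented scores) of the ROC quantities above: sweeping a threshold over the two overlapping score distributions yields one confusion matrix (TP, FN, FP, TN), hence one (FPR, TPR) point, per threshold.

```python
def roc_point(scores_pos, scores_neg, threshold):
    tp = sum(s >= threshold for s in scores_pos)   # positives called positive
    fn = len(scores_pos) - tp
    fp = sum(s >= threshold for s in scores_neg)   # negatives called positive
    tn = len(scores_neg) - fp
    tpr = tp / (tp + fn)        # sensitivity = TP/P
    fpr = fp / (fp + tn)        # 1 - specificity = FP/N
    return fpr, tpr

pos = [0.9, 0.8, 0.7, 0.4]      # scores of truly positive cases
neg = [0.6, 0.3, 0.2, 0.1]      # scores of truly negative cases
for t in (0.25, 0.5, 0.75):
    print(t, roc_point(pos, neg, t))   # one ROC point per threshold
```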
Types of Classification Algorithms
• Memory based
– Define a distance between samples
– K Nearest Neighbor
• Decision surface
– Find the best partition of the space
– Decision trees
– SVM
• Generative models
– Induce a model and impose a decision rule
– Bayesian Networks

To use fewer numbers -> independence
Variables A and B are independent if any of the following hold:
• P(A,B) = P(A) P(B)
• P(A | B) = P(A)
• P(B | A) = P(B) -> knowing the outcome of A doesn't tell you the outcome of B
How is independence useful? (see the sketch below)
• Suppose you have n coin flips and you want to calculate the joint distribution P(C1, …, Cn)
• If the coin flips are not independent, you need 2^n values in the table
• If the coin flips are independent, then P(C1, …, Cn) = P(C1) P(C2) … P(Cn), i.e., only n numbers
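A tiny sketch of that saving: with independence, n numbers replace a 2^n-entry table.

```python
n = 10
p_heads = [0.5] * n                 # n numbers suffice if flips are independent

def p_sequence(outcome):            # P(C1, ..., Cn) = product of the P(Ci)
    prob = 1.0
    for heads, p in zip(outcome, p_heads):
        prob *= p if heads else 1 - p
    return prob

print(p_sequence([True] * n))       # 0.5 ** 10
print(2 ** n, "entries would be needed without independence")
```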
Explainable AI (XAI)
• An AI system for which humans can understand the decisions or predictions made
• We need to understand the decisions to build trust in the algorithm
• This is also related to liability
– medicine, defense, finance, law
• In 2018 the EU introduced a right to explanation in the General Data Protection Regulation (GDPR)

Decision Making: single-stage and multiple-stage decision problems (figures omitted)

Conditional Independence
Variables A and B are conditionally independent given C if any of the following hold (checked numerically in the sketch below):
• P(A, B | C) = P(A | C) P(B | C)
• P(A | B, C) = P(A | C) -> C tells us everything about A; nothing more is gained from B
• P(B | A, C) = P(B | C)
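A small sketch using a toy joint distribution that is constructed so that A and B are conditionally independent given C, then verifying the first definition numerically.

```python
from itertools import product

# toy joint P(A,B,C) built as P(C) * P(A|C) * P(B|C), so A ⊥ B | C by design
joint = {}
for a, b, c in product([True, False], repeat=3):
    pc = 0.6 if c else 0.4
    pa = (0.9 if a else 0.1) if c else (0.2 if a else 0.8)
    pb = (0.7 if b else 0.3) if c else (0.5 if b else 0.5)
    joint[(a, b, c)] = pc * pa * pb

def p(pred):
    return sum(v for w, v in joint.items() if pred(w))

p_c    = p(lambda w: w[2])                              # P(C=true)
p_ab_c = p(lambda w: w[0] and w[1] and w[2]) / p_c      # P(A,B|C)
p_a_c  = p(lambda w: w[0] and w[2]) / p_c               # P(A|C)
p_b_c  = p(lambda w: w[1] and w[2]) / p_c               # P(B|C)
print(abs(p_ab_c - p_a_c * p_b_c) < 1e-12)              # True
```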
Types of Models
• Non-interpretable
– Neural nets
– Ensemble techniques (e.g. Random Forest)
– But tend to be the most powerful
• Interpretable (Explainable)
– Decision trees
– KNN
– Association rules
– Bayesian Networks

Bayesian Network
• Consists of: 1. a directed acyclic graph 2. a table for each node in the graph

Utility Theory
– Best outcome = 1, worst = 0
– Ask questions about the person's equivalent for a number of 50/50 lotteries
– Plot the points

Probability Basics: Random Variables
• A random variable is the basic element of probability
• An event for which there is some degree of uncertainty as to its outcome

Probability Basics
• Describe the state of the world with a set of value assignments to a set of random variables
• A random variable is a variable that can take on exactly one value from a set of mutually exclusive and exhaustive values
• We will use capital letters (A, B, …) for random variables and lower-case letters (a, b, …) for their values, e.g. A = a1
– A,B means A and B, so P(A,B) means the probability of A and B
• We will write A when we mean any value of the random variable A
– P(Temperature) means the probability of any value of Temperature
– P(Temperature = <38) means the probability of a particular value

Bayes' rule allows us to express the quantity P(H|E), which people often find hard to assess, in terms of quantities that can be drawn from experience; it lets us think from cause (H) to effect (E).
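A one-screen sketch of Bayes' rule with invented numbers (a rare condition H and a test result E): P(H|E) = P(E|H) P(H) / P(E), where P(E) comes from the total probability rule.

```python
p_h    = 0.01           # prior P(H), e.g. prevalence of the condition
p_e_h  = 0.95           # P(E|H): effect given cause, drawn from experience
p_e_nh = 0.05           # P(E|~H): false alarm rate

p_e = p_e_h * p_h + p_e_nh * (1 - p_h)   # total probability of the evidence
print(p_e_h * p_h / p_e)                 # P(H|E) ≈ 0.161: updated belief
```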
Axioms of probability
– P(A) ≥ 0, where A is any proposition
– P(True) = 1
– P(A or B) = P(A) + P(B) if A and B are mutually exclusive

Conditional Probability
• P(H = true | E = true) = out of all the outcomes in which E is true, how many also have H equal to true
• "Probability of H conditioned on E" or "Probability of H given E"
• A measure of how relevant E is to H
• P(H = true | F = true) = P(H = true, F = true) / P(F = true); in general, P(X|Y) = P(X,Y) / P(Y)

Parent = a node with an arrow to another node; an arrow means the starting node has a direct influence on the end node
A set of tables, one for each node: each node Xi has a conditional probability distribution P(Xi | Parents(Xi)) that quantifies the effect of the parents on the node; the parameters are the probabilities in these conditional probability tables (CPTs)
For a given combination of values of the parents (B in this example), the entries for P(C=true | B) and P(C=false | B) must add up to 1, e.g. P(C=true | B=false) + P(C=false | B=false) = 1; with k Boolean parents there are 2^(k+1) probabilities (2^k entries in the table), as in the sketch below
If a node has multiple parents, those causes interact in various ways, so we need the probability of the child given all combinations of the parents
Two important properties: 1. Encodes the conditional independence relationships between the variables in the graph structure 2. Is a compact representation of the joint probability distribution over the variables
Influence Diagram (figure omitted)
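A minimal sketch (invented numbers) of a CPT for a node C with one Boolean parent B: one row per parent value, and the two entries of each row sum to 1; k Boolean parents would mean 2^k rows.

```python
cpt_c = {                      # P(C=true | B=b), one row per value of B
    True: 0.8,                 # P(C=true | B=true);  P(C=false | B=true)  = 0.2
    False: 0.3,                # P(C=true | B=false); P(C=false | B=false) = 0.7
}

def p_c_given_b(c, b):
    return cpt_c[b] if c else 1 - cpt_c[b]

print(p_c_given_b(True, False) + p_c_given_b(False, False))   # 1.0
```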
Types of Connections
• Serial connections: evidence may be transmitted through a serial connection unless the state of the variable in the connection is known
• Diverging connections: evidence can be transmitted through a diverging connection unless it is instantiated
• Converging connections: evidence can be transmitted through a converging connection only if the variable in the connection, or one of its descendants, has received evidence
Building the structure of a Bayesian network: 1. Represent the hypotheses as random variables. 2. Identify observable information which can tell us about the states of the hypotheses (observable/information variables). 3. Determine the causal structure between the variables, i.e., which events have a direct causal impact on other events. 4. Quantify the links with CPTs.
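A small end-to-end sketch tying these steps together with an invented 3-node network A -> B, A -> C: once the structure and CPTs are fixed, the theorem above gives the unique joint distribution as the product of the CPTs, P(A,B,C) = P(A) P(B|A) P(C|A).

```python
from itertools import product

p_a   = 0.2                         # P(A=true): a root node's prior
p_b_a = {True: 0.9, False: 0.4}     # P(B=true | A=a)
p_c_a = {True: 0.7, False: 0.1}     # P(C=true | A=a)

def joint(a, b, c):                 # product of the three CPT entries
    pa = p_a if a else 1 - p_a
    pb = p_b_a[a] if b else 1 - p_b_a[a]
    pc = p_c_a[a] if c else 1 - p_c_a[a]
    return pa * pb * pc

total = sum(joint(a, b, c) for a, b, c in product([True, False], repeat=3))
print(total)                        # ≈ 1.0: a consistent joint distribution
```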