Midterm sp09 Solution
Midterm sp09 Solution
Midterm sp09 Solution
For the following questions, a correct answer is worth 2 points, no answer is worth 1 point, and an incorrect
answer is worth 0 points. Circle true or false to indicate your answer.
a) (true or false) If g(s) and h(s) are two admissible A∗ heuristics, then their average f (s) = 12 g(s) + 12 h(s)
must also be admissible.
True. Let h∗ (s) be the true distance from s. We know that g(s) ≤ h∗ (s) and h(s) ≤ h∗ (s), thus f (s) =
1 1 1 ∗ 1 ∗ ∗
2 g(s) + 2 h(s) ≤ 2 h (s) + 2 h (s) = h (s)
b) (true or false) For a search problem, the path returned by uniform cost search may change if we add a
positive constant C to every step cost.
True. Consider that there are two paths from the start state (S) to the goal (G), S → A → G and S → G.
cost(S, A) = 1, cost(A, G) = 1, and cost(S, G) = 3. So the optimal path is through A. Now, if we add 2
to each of the costs, the optimal path is directly from S to G. Since uniform cost search finds the optimal
path, its path will change.
c) (true or false) The running-time of an efficient solver for tree-structured constraint satisfaction problems is
linear in the number of variables.
True. The running time of the algorithm for tree-structured CSPs is O(n · d2 ), where n is the number of
variables and d is the maximum size of any variable’s domain.
d) (true or false) If h1 (s) is a consistent heuristic and h2 (s) is an admissible heuristic, then min(h1 (s), h2 (s))
must be consistent.
False. For instance, if h2 (s) be admissible but inconsistent, and h1 (s) dominate h2 (s), then min(h1 (s), h2 (s)) =
h1 (s), which is inconsistent.
e) (true or false) The amount of memory required to run minimax with alpha-beta pruning is O(bd ) for
branching factor b and depth limit d.
True and False (everyone wins). The memory required is only O(bd), so we accepted False. However, by
definition an algorithm that is O(bd) is also O(bd ), because O denotes upper bounds that may or may not
be tight, so technically this statement is True (but not very useful).
f) (true or false) In a Markov decision process with discount γ = 1, the difference in values for two adjacent
states is bounded by the reward between them: |V (s) − V (s0 )| ≤ maxa R(s, a, s0 ).
False. Let V (s0 ) = 0, and R(s, a, s0 ) = 0 ∀a, but there is an action a0 which takes s to the terminal state T
and gives a reward 100. Thus V (s) ≥ 100, but the inequality above says that |V (s) − 0| ≤ 0.
g) (true or false) Value iteration and policy iteration must always converge to the same policy.
True and False (everyone wins). Both algorithms are guaranteed to converge to the optimal policy, so we
accepted True. If there are multiple policies that are optimal (meaning they yield the same maximal values
for every state), then the algorithms might diverge. Both value iteration and policy iteration will always
lead to the same optimal values.
h) (true or false) In a Bayes’ net, if A ⊥
⊥ B, then A ⊥⊥ B | C for some variable C other than A or B.
False. Consider the Bayes’ net A → C ← B. Clearly, A ⊥⊥ B, but A 6⊥⊥ B | C.
EA!
C
F! G
A C G A C G
2. (15 points) Search: Crawler’s Escape
Whilst Pacman was Q-learning, Crawler snuck into mediumClassic and stole all the dots. Now, it’s trying to
A At each
escape as quickly as possible. C time step,
G Crawler can either
A move C its shoulder
G or its elbow one position
up or down. Both joints s and e have five total positions each (1 through 5) and both begin in position 3.
Upon changing arm positions from (s, e) to (s0 , e0 ), Crawler’s body moves some distance r(s, e, s0 , e0 ), where
C |r(s, e, s0 , e0 )| ≤ 2 meters (negative distance means the crawler moves backwards). Crawler must travel 10
meters to reach the exit.
N Action: increase
elbow position
exit
r(3, 3, 3, 4)
10 meters
In this problem, you will design a search problem for which the optimal solution will allow Crawler to escape
in as few time steps as possible.
(a) (3 pt) Define a state space, start state and goal test to represent this problem.
A state is a 3-tuple consisting of distance to exit, shoulder position, and elbow position.
The start state is (10, 3, 3).
Goal test: distance to exit is less than or equal to zero.
(b) (3 pt) Define the successor function for the start state by listing its (successor state, action, step cost)
triples. Use the actions s+ and s− for increasing and decreasing the shoulder position and e+ and e− for
the elbow.
(c) (2 pt) Would depth-first graph search be complete for the search problem you defined? Explain.
No, depth-first search would not be complete because the state space is infinite. Depth-first search could
forever keep expanding states with increasing distance to the goal.
Distance to exit divided by 2 is admissible because each step can move Crawler at most 2 meters closer
to the goal. A slightly better heuristic would be to round this heuristic up to the nearest integer.
(e) (4 pt) Crawler’s shoulder overheats if it switches direction more than once in any three consecutive time
steps, making progress impossible. Describe how you would change your search problem so that an opti-
mal search algorithm would only return solutions that avoid overheating.
Hired as a consultant, you built a Bayes’ net model of Cheeseboard Pizza customers.
…! Last night, a corporate
X1! all yourXarcs.
spy from Zachary’s Pizza deleted the directions from 2! You still have the
Xn! graph of your
undirected
E1! E2! En-1!
model and a list of dependencies. You now have to recover the direction of the arrows to build a graph that
allows for these dependencies. Note: X 6⊥⊥ Y means X is not independent of Y .
A B
X1!
Distribution
X …! properties
X
(dependencies):
2! n!
E1! E2! En-1!
EA!
C A 6⊥⊥ G
A B
B 6⊥⊥ F
F! E A!
C
G
B 6⊥⊥ G | C
F! G
a) (1 pt) Given the first constraint only, circle all the topologies that are allowed for the triple (A, C, G).
A
AC G
C A
GC G
A C G
A A C G A C G
P!
C
A A C
b) (3 pt) Formulate direction-recovery for this graph as a CSP with explicit binary constraints A G C
only. TheG
D
P! variables and values are provided for you. The variable EA stands for the direction of the arc between A
Action: increase
N
and the center C, where out means the arrow points outward (toward A), and in means the arrow points
elbow position exit
T! C
inward (toward C).
E!
Variables: EA , EB , EF , EG r(3, 3, 3, 4)
10 meters
D Values: out, in
Constraints:
N
An explicit constraint lists all legal tuples of values are listed for a tuple of variables.
exit
T! (EA , EG ) ∈ {(in, out), (out, out), (out, in)}
E! (EB , EF ) ∈ {(in, out), (out, out), (out, in)}
(EB , EG ) r(s
∈ 0, {(in,
e0, s0in)}
, e0+1)
10 meters
c) (1 pt) Draw the constraint graph for this CSP.
EA − EG − EB − EF
d) (2 pt) After selecting EA = in, cross out all values eliminated from EB , EF and EG by forward checking.
EA EB EF EG
in out in out in out in
Forward checking removes the values of variables that directly contradict the assignment EA = in.
e) (2 pt) Cross out all values eliminated by arc consistency applied before any backtracking search.
EA EB EF EG
out in out in out in out in
We first remove EB = out because it is incompatible with any value for EG . Likewise, we remove EG = out.
Now, we can remove EA = in based on EG and EF = in based on EB .
f ) (1 pt) Solve this CSP, then add the correct directions to the arcs of the graph at the top of the page.
EA = out, EB = in, EF = out, and EG = in.
NAME: 5
Now consider a chain structured graph over variables X1 , . . . , Xn . Edges E1 , . . . , En−1 connect adjacent
variables, but their directions are again unknown.
g) (4 pt) Using only binary constraints and two-valued variables, formulate a CSP that is satisfied by only
and all networks that can represent a distribution where X1 6⊥⊥ Xn . Describe what your variables mean in
terms of direction of the arrows in the network.
A B
Variables:
C
Variables E1 , . . . , En−1 correspond to the directions of these edges and take values {lef t, right}.
D E!
Constraints:
Variables:
Variables E1 , . . . , En−1 correspond to the directions of these edges and take values {lef t, right}.
I1 , . . . , In−2 can take values {T, F }.
O1 , . . . , On−2 can take values {T, F }.
Constraints:
Intuitively, we want Ii to be T if Xi → Xi+1 ← Xi+2 , i.e. the triple Xi , Xi+1 , Xi+2 is inactive. Also, we
want Oi to be T if either Ii or Oi−1 is T, i.e. if the current triple is inactive or if any of the prior triples
was inactive. Finally, we want On−2 to be T which would imply that there is some inactive triple.
(Ii , Ei , Ei+1 ) ∈ {(T, right, lef t), (F, right, right), (F, lef t, right), (F, lef t, lef t)} for i ∈ {1, . . . , n − 2}.
(I1 , O1 ) ∈ {(T, T ), (F, F )}
(Oi−1 , Ii , Oi ) ∈ {(F, F, F ), (F, T, T ), (T, F, T ), (T, T, T )} for i ∈ {2, . . . , n − 2}
On−2 = true
6
In the game of Connect-3, players X and O alternate moves, dropping their symbols into columns 1, 2, 3, or 4.
Three-in-a-row wins the game, horizontally, vertically or diagonally. X plays first.
X O
X O
X O X O
O X
O O
X X
O O O
O X O
X X X
O O
X O
X X
O X O X
1 1
2 2
1
3 3
2
4 4
3 14 2 3 4
X X O X
1 factor
(a) (1 pt) What is the maximum branching 2 of3minimax
4 search for Connect-3?
4.
(b) (1 pt) What is the maximum tree depth in plies?
6. According to our terminology, a search ply in a multi-agent search problem includes one move for every
agent. Because the textbook differs in its definition, we also accepted 12.
(c) (1 pt) Give a reasonably tight upper bound on the number of terminal states. Several answers were
accepted:
412 is a reasonable bound on the number of leaves of the search tree.
312 is a reasonable bound on the number of total states in the game, because each square can be empty,
contain an X or contain an O.
A tighter bound is 154 , which follows from the observation that there are only 15 ways to legally fill a
column from the bottom up with X’s and O’s.
Bounds based on the assumption that the entire board would be filled with X’s and O’s were not accepted,
even if they in fact overestimated the true number of terminal states.
(d) (2 pt) Draw the game tree starting from the board shown above right (with O to play next). You may
abbreviate states by drawing only the upper-right region circled. The root node is drawn for you.
X O
!"#
O X O
X X O X
1 2 3 4
X O X O X O
!"# $#
O X O O
X O O X
X O O
X X O
X X O X X X
O O
X X O
1 2 3
1 4
2 3 4 1 2 3 4
X XO X O X X O
X O X O X O X O
!"# $# 1 2 !%#
3 4
O X X
O O
X
O O X
X
O O
O X
O O XO
X O O
O X O
X X O
X X X
O O
X
X X
X
O X
X XO
O OX
X O
X X
X X
O O X
1 2 3
1 4
2 13 2
4 13 24 1 3X 24 3 41 2 3 4
X X O X X O X X O X
X O
X O
X O X O
X O X O
!"# 1 2 3 4
1 2 3 4 !%#
1 2 3 4
O X
O X
O
X X
O O O O X
X
O O
X O X
O O
X X
X O
X
X O
X X
O X
O X O
X
X X
O X
X
O O
X
X O X
1 12 23
1 34
2 43 4 4 1 2X
1 3X
2 4O
13 2
4
X 3
(e) (2 pt) X is the maximizer, while O is the minimizer. X’s utility for terminal states
X X O X is k when X wins and
1 2 3 4 1 2 3 4
−k when O wins, where k is 1 for a horizontal 3-in-a-row win (as in above left), 2 for a vertical win, and
3 for a diagonal win. A tie has value 0. Label each node of your tree with its minimax value.
(f ) (3 pt) Circle all nodes of your tree that will not be explored when using alpha-beta pruning and a move
ordering that maximizes the number of nodes pruned.
X
NAME: O
X O
X O X O 7
5. (21Opoints)O
X MDPs:
O
X Robot
X Soccer
O O O
O X O
X robot
A soccer X A is on X a fast O
O X O
X the X
break toward O X in position
goal, starting O X From positions 1 through 3, it can
1.
either shoot (S) or dribble the ball forward (D); from 4 it can only shoot. If it shoots, it either scores a goal
1 G) or 1
(state 2
1 M ).3
2misses (state
3 2If it dribbles,
4 4
3 4
it 1
either 2
advances a3square or
4 loses the ball, ending up in M .
X X O X
1 2 3 4
X O A D Goal
O X O
1 2 3 4
X X O X
In this MDP, the states are 1, 2, 3, 4, G and M , where G and M are terminal states. The transition model
depends on the parameter y, which is the probability of dribbling success. Assume a discount of γ = 1.
1 2 3 4
T (k, S, G) = k6 T (k, S, M ) = 1 − k6 for k ∈ {1, 2, 3, 4}
T (k, D, k + 1) = y T (k, D, M ) = 1 − y for k ∈ {1, 2, 3}
R(k, S, G) = 1 for k ∈ {1, 2, 3, 4}, and rewards are 0 for all other transitions
(a) (2 pt) What is V π (1) for the policy π that always shoots?
V π (1) = T (1, S, G)R(1, S, G) + T (1, S, M )R(1, S, M ) = 61
(b) (2 pt) What is Q∗ (3, D) in terms of y?
Q∗ (3, S) ≥ Q∗ (3, D)
T (3, S, G) · 1 ≥ T (3, D, 4) · T (4, S, G) · 1
1 2
≥y·
2 3
3
≥y≥0
4
8
The dribble success probability y in fact depend on the presence or absence of a defending robot, D. A has
no way of detecting whether D is present, but does know some statistical properties of its environment.
D is present 23 of the time. When D is absent, y = 34 . When D is present, y = 14 .
(f ) (4 pt) What is the posterior probability that D is present, given that A Dribbles twice successfully from
1 to 3, then Shoots from state 3 and scores.
We can use Bayes’ rule, where D is a random variable denoting the presence of D, and e is the evidence
that A dribbled twice and scored.
P (e|d) · P (d)
P (d|e) =
P (e)
(g) (3 pt) What transition model should A use in order to correctly compute its maximum expected reward
when it doesn’t know whether or not D is present?
To maximize expected total reward, the agent should model the situation as accurately as possible.
We accepted two answers for T (k, D, k + 1). One answer is to claim that the agent should use its marginal
belief that each dribble will be successful, summing over the two possibilities of D’s presence or absense.
In this case, T (k, D, k + 1) = 23 · 14 + 13 · 34 = 12
5 7
, and T (k, D, M ) = 12 .
Another acceptable answer would be to update the beliefs of the agent after every successful dribble.
5
Hence, while T (1, D, 2) = 12 as above, reaching state 2 implies a successful previous dribble, and so
another successful dribble is more likely. In particular, let X1 , X2 and X3 indicate successful first, second
and third dribbles, while D is the presence of the defender. Since X1 ⊥⊥ X2 | D, we have P (x2 |x1 ) =
2 2 1 3 3 11
P
d P (d|x1 )P (x2 |d). Computing similarly to part (f), P (d|x1 ) = 5 , so T (2, D, 3) = 5 · 4 + 5 · 4 = 20 .
We can compute PT (3, D, 4) similarly. X3 is conditionally independent of both X21 and X2 given D, so
P (x3 |x1 , x2 ) = d P (d|x1 , x2 )P (x3 |d). Using our result from (f), P (d|x1 , x2 ) = 11 , and so T (3, D, 4) =
2 1 9 3 29
·
11 4 + ·
11 4 = 44 .
Since a dribble either succeeds or fails, T (k, D, M ) = 1 − T (k, D, k + 1) for all k. Shooting probabilities
are unchanged by the presence of the defender.
T (k, S, G) = k6 T (k, S, M ) = 1 − k
6 for k ∈ {1, 2, 3, 4}
5 7
T (k, D, k + 1) = 12 T (k, D, M ) = 12 for k ∈ {1, 2, 3}
OR
T (k, S, G) = k6 T (k, S, M ) = 1 − k
6 for k ∈ {1, 2, 3, 4}
5 11 29
T (1, D, 2) = 12 T (2, D, 3) = 20 T (3, D, 4) = 44
7 9 15
T (1, D, M ) = 12 T (2, D, M ) = 20 T (3, D, M ) = 44
NAME: 9
(h) (4 pt) What is the optimal policy π ∗ when A doesn’t know whether or not D is present?
5
Using T (k, D, k + 1) = 12 , we find that π ∗ = {1 : S, 2 : S, 3 : S, 4 : S}. We showed from part (e) that for
any y < 4 , shooting is preferable to dribbling from state 3. Therefore, we know that V ∗ (3) = Q∗ (3, S).
3
1
V ∗ (3) = Q∗ (3, S) =
2
π ∗ (3) = S
1
Q∗ (2, S) =
3
∗ 5 7 5
Q (2, D) = · V ∗ (3) + ·0=
12 12 24
1
V ∗ (2) = max(Q∗ (2, S), Q∗ (2, D)) =
3
π ∗ (2) = S
1
Q∗ (1, S) =
6
5 7 5
Q∗ (1, D) = · V ∗ (2) + ·0=
12 12 36
1
V ∗ (1) = max(Q∗ (1, S), Q∗ (1, D)) =
6
π ∗ (1) = S
Under the second answer for part (g), similar computations give π ∗ = {1 : S, 2 : S, 3 : S, 4 : S}.
2
V ∗ (4) =
3
∗ 1
Q (3, S) =
2
∗ 29 15 58
Q (3, D) = · V ∗ (4) + ·0= ≈ 0.44
44 44 132
1
V ∗ (3) = max(Q∗ (3, S), Q∗ (3, D)) =
2
π ∗ (3) = S
1
Q∗ (2, S) =
3
∗ 11 9 11
Q (2, D) = · V ∗ (3) + ·0= ≈ 0.28
20 20 40
1
V ∗ (2) = max(Q∗ (2, S), Q∗ (2, D)) =
3
π ∗ (2) = S
1
Q∗ (1, S) =
6
∗ 5 7 5
Q (1, D) = · V ∗ (2) + ·0=
12 12 36
∗ ∗ ∗ 1
V (1) = max(Q (1, S), Q (1, D)) =
6
π ∗ (1) = S
A B
10 D E!
Pacman doesn’t just eat dots. He also enjoys pears, apples, carrots, nuts, eggs and toast.AEach morning,
C heE!
chooses to eat some subset of these foods. His preferences are modeled by a Bayes’ net with this structure.
A A C E
P!
C
N
T!
E!
a) (2 pt) Factor the probability that Pacman chooses to eat only apples and nuts in terms of conditional
probabilities from this Bayes’ net.
P (¬p, a, ¬c, n, ¬e, ¬t) = P (a) · P (¬p|a) · P (¬c|a) · P (n|¬p, ¬c) · P (¬t|¬c) · P (¬e|n, ¬t)
b) (4 pt) For each of the following properties, circle whether they are true, false or unknown of the distribution
P (P, A, C, N, E, T ) for this Bayes’ net.
Without the conditional probability tables, we do not know if any two variables are dependent. From the
network topology, we can only conclude true or unknown for the questions below.
N⊥ ⊥T (true , false , unknown) unknown: T − C − N is an active path.
P ⊥⊥E |N (true , false , unknown) unknown: P − A − C − T − E is an active path.
P ⊥⊥ N | A, C (true , false , unknown) unknown: P and N are directly connected.
E⊥ ⊥ A | C, N (true , false , unknown) true: All three paths from E to A are inactive.
c) (2 pt) If the arrow from T to E were reversed, list two conditional independence properties that would be
true under the new graph that were not guaranteed under the original graph.
e) (4 pt) You now want to compute the distribution P (A) using variable elimination. List the factors that
remain before and after eliminating the variable N .
Before: The initial factors are just the conditional probability tables of the Bayes’ net.
P (A), P (P |A), P (C|A), P (T |C), P (N |P, C), P (E|T, N )
After: First, all factors that include the variable N are joined together, yielding P (E, N |P, C, T )
Next, the variable N is summed out of this new factor, yielding P (E|P, C, T )
The remaining factors include this new factor and the unused original factors.
P (A), P (P |A), P (C|A), P (T |C), P (E|P, C, T )
Referring to the final factor as m(E, P, C, T ) (like in the textbook) was also accepted.
f ) (2 pt) Pacman’s new diet allows only fruit (P and A) to be eaten, but Pacman only follows the diet
occasionally. Add the new variable D (for whether he follows the diet) to the network below by adding arcs.
Briefly justify your answer.
A A
P! P!
C C
D D
N N
T! T!
E! E!
The best answer is the one on the left, which allows us to express changes to all food preferences based on
the diet. Other answers were accepted with appropriate justification, such as the network on the right, which
specifies that the diet specifically disallows C, N, E and T. Connecting D only to P and/or A is problematic:
for instance, observing P and A would make D independent of C, but the diet should still affect whether
carrots (C) are allowed even when A and P are known.
g) (2 pt) Given the Bayes’ net and the factors below, fill in the table for P (D|¬a) or state that there is not
enough information.
From the tables P (D) and P (A|D), the entries of the factor P (D|A = ¬a) can be computed from Bayes’
rule (which always holds): no additional information about the properties of the distributions are required.
The trick is to observe that while P (¬a) is not explicitly given, it can be computed from P (D) and P (A|D)
using the product rule: ∀a,d : P (a, d) = P (d)P (a|d). Computing P (¬a) in this way is equivalent to applying
the normalization trick.
D A P (A|D)
D P (D) d a 0.8 D A P (D|A = ¬a)
P (¬a|d)·P (d) 0.2·0.4 4
d 0.4 d ¬a 0.2 d ¬a P (¬a|¬d)·P (¬d)+P (¬a|d)·P (d) = 0.5·0.6+0.2·0.4 = 19
¬d 0.6 ¬d a 0.5 P (¬a|¬d)·P (¬d) 0.5·0.6 15
¬d ¬a P (¬a|¬d)·P (¬d)+P (¬a|d)·P (d) = 0.5·0.6+0.2·0.4 = 19
¬d ¬a 0.5