Unit 2c: Game Playing

The document discusses games and adversarial search as a domain for exploring machine intelligence. It provides two key reasons why games were initially seen as a good domain: 1) success or failure can be easily measured, and 2) they did not require large amounts of knowledge and were thought to be solvable through search. However, the document notes that for all but the simplest games, large amounts of knowledge and sophisticated search techniques are actually required due to huge branching factors and search spaces. It introduces the concept of heuristic functions, minimax search, alpha-beta pruning, and discusses how these techniques allow computers to evaluate positions and select optimal moves in games like chess despite immense search spaces.


Games and adversarial search

World Champion chess player Garry Kasparov is defeated by IBM's Deep Blue chess-playing computer in a six-game match in May 1997. (Photo © Telegraph Group Unlimited 1997)
There were two reasons that games appeared to be a
good domain in which to explore machine intelligence:
1. They provide a structured task in which it is very
easy to measure success or failure.
2. They did not obviously require large amounts of
knowledge. They were thought to be solvable by
straightforward search from the starting state to a
winning position.

The first of these reasons remains valid and accounts for the continued interest in game playing by machines. The second is not true for any but the simplest games.

For example, consider chess.

• The average branching factor is around 35.
• In an average game, each player might make 50 moves.
• So to examine the complete game tree we would have to look at about 35^100 positions.
Games vs. single-agent search
• We don’t know how the opponent will act
– The solution is not a fixed sequence of actions from
start state to goal state, but a strategy or policy
(a mapping from state to best move in that state)
• Efficiency is critical to playing well
– The time to make a move is limited
– The branching factor, search depth, and number of
terminal configurations are huge
• In chess, branching factor ≈ 35 and depth ≈ 100, giving a search tree of about 10^154 nodes
– Number of atoms in the observable universe ≈ 10^80

– This rules out searching all the way to the end of the game
Types of game environments
• Perfect information (fully observable), deterministic: chess, checkers, go
• Perfect information (fully observable), stochastic: backgammon, Monopoly
• Imperfect information (partially observable), deterministic: Battleships
• Imperfect information (partially observable), stochastic: Scrabble, poker, bridge
Game tree
• A game of tic-tac-toe between two players, “max” and “min”
http://xkcd.com/832/
Heuristic function
For chess, Turing proposed a very simple static evaluation function based on piece advantage: add the values of white's pieces (W), add the values of black's pieces (B), and then compute the quotient W/B.
• A more sophisticated approach was that taken in
Samuel's checkers program, in which the static
evaluation function was a linear combination of several
simple functions, each of which appeared as though it
might be significant.
• In addition to the obvious one, piece advantage, Samuel's functions included such things as capability for advancement, control of the center, threat of a fork, and mobility.
These factors were then combined by
attaching to each an appropriate weight
and then adding the terms together. The
complete evaluation function had the
form:

c1 * pieceadvantage + c2 * advancement + c3 * centercontrol + …
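A rough Python sketch of such a weighted linear combination (illustrative only — the state format, feature functions, and weights below are invented, not Samuel's):

# Sketch of a Samuel-style linear evaluation function (illustrative only;
# the features, weights, and state format are made up, not Samuel's).
def evaluate(state, features, weights):
    """Weighted sum c1*f1(s) + c2*f2(s) + ... over the given features."""
    return sum(w * f(state) for w, f in zip(weights, features))

def piece_advantage(state):
    return len(state["us"]) - len(state["them"])

def center_control(state):
    center = {(3, 3), (3, 4), (4, 3), (4, 4)}
    return (sum(p in center for p in state["us"])
            - sum(p in center for p in state["them"]))

state = {"us": [(3, 3), (5, 1), (2, 6)], "them": [(4, 4), (0, 2)]}
print(evaluate(state, [piece_advantage, center_control], [1.0, 0.5]))  # 1.0*1 + 0.5*0 = 1.0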
THE MINIMAX SEARCH
PROCEDURE
• The minimax search procedure is a depth-first,
depth-limited search procedure.
• The idea is to start at the current position and
use the plausible-move generator to generate
the set of possible successor positions.
• Apply the static evaluation function to those positions
and simply choose the best one.
• After doing so, we could back that value up to the
starting position to represent our evaluation of it.
• The starting position is exactly as good for us as the position
generated by the best move we can make next.
• Here we assume that the static evaluation function returns large
values to indicate good situations for us, so our goal is to
maximize the value of the static evaluation function of the next
board position.
This example assumes a static evaluation function that returns
values ranging from -10 to 10, with 10 indicating a win for us,
-10 a win for the opponent, and 0 an even match.
• At the level representing the opponent's choice, the
minimum value was chosen and backed up. At the level
representing our choice, the maximum value was
chosen.
• Once the values from the second ply are backed up, it becomes
clear that the correct move for us to make at the first level,
given the information we have available, is C, since there is
nothing the opponent can do from there to produce a value
worse than -2.
• This process can be repeated for as many ply as time allows,
and the more accurate evaluations that are produced can be
used to choose the correct move at the top level.
• The alternation of maximizing and minimizing at alternate ply
when evaluations are being pushed back up corresponds to the
opposing strategies of the two players and gives this method
the name minimax.
A more abstract game tree
[Figure: a two-ply game tree with terminal utilities (for MAX).]

Game tree search
[Figure: the same two-ply tree with backed-up values; the MIN nodes take the values 3, 2, and 2, and the MAX root takes the value 3.]

• Minimax value of a node: the utility (for MAX) of being in the corresponding state, assuming perfect play on both sides
• Minimax strategy: Choose the move that gives the best worst-case payoff
Computing the minimax value of a node
• Minimax(node) =
– Utility(node) if node is terminal
– max over actions of Minimax(Succ(node, action)) if player = MAX
– min over actions of Minimax(Succ(node, action)) if player = MIN
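A minimal runnable sketch of this recursion, assuming the game tree is given explicitly as nested lists with terminal utilities (for MAX) at the leaves; the leaf values are chosen to be consistent with the backed-up values 3, 2, 2 in the figures above:

# Minimax over an explicit game tree (illustrative sketch).
# A node is either a number (terminal utility for MAX) or a list of child nodes.
def minimax(node, is_max):
    if isinstance(node, (int, float)):
        return node                         # terminal: return its utility
    values = [minimax(child, not is_max) for child in node]
    return max(values) if is_max else min(values)

# A two-ply tree: the MIN nodes back up 3, 2, 2, so the MAX root's value is 3.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, is_max=True))  # -> 3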
Optimality of minimax

• The minimax strategy is optimal against an optimal opponent
• What if your opponent is
suboptimal?
– Your utility can only be higher than if
you were playing an optimal opponent!
– A different strategy may work better for a sub-optimal opponent, but it will necessarily be worse against an optimal opponent

Example from D. Klein and P. Abbeel


More general games
[Figure: a three-player game tree; the leaf-level utility tuples shown are (4,3,2), (7,4,1), (1,5,2), (7,7,1), the next level backs up (4,3,2) and (1,5,2), and the root receives (4,3,2).]

• More than two players, non-zero-sum


• Utilities are now tuples
• Each player maximizes their own utility at their node
• Utilities get propagated (backed up) from children to parents
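A small sketch of how such tuple utilities can be backed up (sometimes called the max^n generalization of minimax); the tree, player numbering, and utilities below are invented for illustration and do not reproduce the figure above:

# Minimax generalized to n players: utilities are tuples, and the player to
# move at each node picks the child whose backed-up tuple is best for them.
def maxn(node, player, num_players):
    if isinstance(node, tuple):             # terminal: utility tuple
        return node
    nxt = (player + 1) % num_players
    children = [maxn(child, nxt, num_players) for child in node]
    return max(children, key=lambda u: u[player])

# Player 0 moves at the root, player 1 at the next level; terminals are tuples.
tree = [[(1, 2, 6), (4, 3, 3)], [(6, 1, 2), (7, 4, 1)]]
print(maxn(tree, player=0, num_players=3))  # -> (7, 4, 1)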
ADDING ALPHA-BETA CUTOFFS
The minimax procedure is a depth-first process.

One path is explored as far as time allows, the static evaluation function is applied to the game positions at the last step of the path, and the value can then be passed up the path one level at a time.
One of the good things about depth-first procedures is that their
efficiency can often be improved by using branch-and-bound
techniques in which partial solutions that are clearly worse
than known solutions can be abandoned early.
Applying this idea to minimax requires the maintenance of two threshold values: one representing a lower bound on the value that a maximizing node may ultimately be assigned (we call this alpha), and another representing an upper bound on the value that a minimizing node may be assigned (this we call beta). This modified strategy is called alpha-beta pruning.
• At maximizing levels, we can rule out a move early if it
becomes clear that its value will be less than the current
threshold, while at minimizing levels, search will be
terminated if values that are greater than the current
threshold are discovered.

• Equivalently, search at a minimizing level can be terminated when a value less than alpha is discovered, while search at a maximizing level can be terminated when a value greater than beta has been found.

http://web.cs.ucla.edu/~rosen/161/notes/alphabeta.html
http://people.cs.pitt.edu/~litman/courses/cs2710/lectures/pruningReview.pdf
Alpha-beta pruning
• It is possible to compute the exact minimax decision without expanding every node in the game tree

[Figure, shown step by step over several slides: alpha-beta search of the two-ply example tree; the backed-up values displayed at successive steps are 3; then 3, 2; then 3, 2, 14; then 3, 2, 5; and finally 3, 2, 2.]
Alpha-beta pruning

• α is the value of the best choice for the MAX player found so far at any choice point above node n
• We want to compute the MIN-value at n
• As we loop over n’s children, the MIN-value decreases
• If it drops below α, MAX will never choose n, so we can ignore n’s remaining children
• Analogously, β is the value of the lowest-utility choice found so far for the MIN player
Alpha-beta pruning

Function action = Alpha-Beta-Search(node)
   v = Max-Value(node, −∞, +∞)
   return the action from node with value v

α: best alternative available to the Max player
β: best alternative available to the Min player

Function v = Min-Value(node, α, β)
   if Terminal(node) return Utility(node)
   v = +∞
   for each action from node
      v = Min(v, Max-Value(Succ(node, action), α, β))
      if v ≤ α return v
      β = Min(β, v)
   end for
   return v
Alpha-beta pruning

Function action = Alpha-Beta-Search(node)
   v = Max-Value(node, −∞, +∞)
   return the action from node with value v

α: best alternative available to the Max player
β: best alternative available to the Min player

Function v = Max-Value(node, α, β)
   if Terminal(node) return Utility(node)
   v = −∞
   for each action from node
      v = Max(v, Min-Value(Succ(node, action), α, β))
      if v ≥ β return v
      α = Max(α, v)
   end for
   return v
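The same procedure as a runnable Python sketch (illustrative; it operates on the explicit nested-list game tree used in the earlier minimax sketch instead of Succ and Utility functions):

import math

# Alpha-beta search over an explicit game tree (illustrative sketch).
def max_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node
    v = -math.inf
    for child in node:
        v = max(v, min_value(child, alpha, beta))
        if v >= beta:                # MIN already has a better alternative: prune
            return v
        alpha = max(alpha, v)
    return v

def min_value(node, alpha, beta):
    if isinstance(node, (int, float)):
        return node
    v = math.inf
    for child in node:
        v = min(v, max_value(child, alpha, beta))
        if v <= alpha:               # MAX already has a better alternative: prune
            return v
        beta = min(beta, v)
    return v

# Same two-ply tree as before: once the leaf 2 is seen in the middle branch,
# its remaining leaves (4 and 6) are never examined.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(max_value(tree, -math.inf, math.inf))  # -> 3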
Alpha-beta pruning
• Pruning does not affect final result
• Amount of pruning depends on move ordering
– Should start with the “best” moves (highest-value for
MAX or lowest-value for MIN)
– For chess, can try captures first, then threats, then
forward moves, then backward moves
– Can also try to remember “killer moves” from other
branches of the tree
• With perfect ordering, the time to find the best move is reduced from O(b^m) to O(b^(m/2))
– Depth of search is effectively doubled
Evaluation function
• Cut off search at a certain depth and compute the value of an
evaluation function for a state instead of its minimax value
– The evaluation function may be thought of as the probability of winning
from a given state or the expected value of that state
• A common evaluation function is a weighted sum of features (sketched after this list):
Eval(s) = w1 f1(s) + w2 f2(s) + … + wn fn(s)
– For chess, wk may be the material value of a piece (pawn = 1,
knight = 3, rook = 5, queen = 9) and fk(s) may be the advantage in
terms of that piece
• Evaluation functions may be learned from game databases or
by having the program play many games against itself
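For instance, a bare material-count evaluation in this weighted-sum form might look like the following sketch (the state format is invented; piece values follow the slide, with the conventional value 3 for a bishop added):

# Sketch of Eval(s) = w1*f1(s) + ... using material features only (illustrative).
PIECE_VALUES = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}

def material_eval(state):
    """Sum of piece value times (white count minus black count)."""
    return sum(
        w * (state["white"].get(piece, 0) - state["black"].get(piece, 0))
        for piece, w in PIECE_VALUES.items()
    )

state = {"white": {"pawn": 6, "rook": 2, "queen": 1},
         "black": {"pawn": 7, "rook": 1, "knight": 1}}
print(material_eval(state))  # (6+10+9) - (7+5+3) = 10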
Cutting off search
• Horizon effect: you may incorrectly estimate the
value of a state by overlooking an event that is
just beyond the depth limit
– For example, a damaging move by the opponent that
can be delayed but not avoided
• Possible remedies
– Quiescence search: do not cut off search at positions that are unstable – for example, are you about to lose an important piece? (sketched after this list)
– Singular extension: a strong move that should be
tried when the normal depth limit is reached
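A rough sketch of the quiescence idea on a toy tree where each node carries a static evaluation and a "quiet" flag (the node format and all names are invented for this sketch):

# Depth-limited minimax that refuses to cut off at "noisy" positions (illustrative).
# A node is a tuple (static_eval, quiet, children); children is empty for terminals.
def search(node, depth, maximizing):
    static_eval, quiet, children = node
    if not children:                     # terminal position
        return static_eval
    if depth <= 0 and quiet:             # cut off only at quiet positions
        return static_eval
    values = [search(c, depth - 1, not maximizing) for c in children]
    return max(values) if maximizing else min(values)

# A position that looks fine statically (eval 5) but is not quiet: the opponent
# has a capture just beyond the depth limit that drops its true value to -4.
noisy = (5, False, [(-4, True, []), (5, True, [])])
root = (0, True, [noisy, (3, True, [])])
print(search(root, depth=1, maximizing=True))  # -> 3 (a blind cutoff would pick 5)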
Advanced techniques
• Transposition table to store previously expanded states (see the sketch below)
• Forward pruning to avoid considering all possible
moves
• Lookup tables for opening moves and endgames
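A minimal sketch of the transposition-table idea: memoize the minimax values of already-expanded positions so that a position reached by different move orders is searched only once (the explicit-tree representation below is invented; real chess programs key the table on a hash of the board position):

# Minimax with a transposition table (illustrative sketch).
# Internal nodes and terminals are tuples/numbers so positions can be dict keys.
def minimax_tt(node, is_max, table):
    if isinstance(node, (int, float)):
        return node
    key = (node, is_max)
    if key in table:                     # position already expanded elsewhere
        return table[key]
    values = [minimax_tt(child, not is_max, table) for child in node]
    value = max(values) if is_max else min(values)
    table[key] = value
    return value

# The subtree `shared` is reachable through two different lines of play,
# but its value is computed only once and then reused from the table.
shared = ((2, 4), (6, 1))
tree = ((shared, (3, 5)), ((7, 0), shared))
print(minimax_tt(tree, True, {}))  # -> 2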
Chess playing systems
• Baseline system: 200 million node evaluations per move (3 min), minimax with a decent evaluation function and quiescence search
– 5-ply ≈ human novice
• Add alpha-beta pruning
– 10-ply ≈ typical PC, experienced player
• Deep Blue: 30 billion evaluations per move, singular
extensions, evaluation function with 8000 features,
large databases of opening and endgame moves
– 14-ply ≈ Garry Kasparov
• More recent state of the art (Hydra, ca. 2006): 36 billion
evaluations per second, advanced pruning techniques
– 18-ply ≈ better than any human alive?
