04 Games PDF
• Search – no adversary
– Solution is a path from start to goal, or a series of actions from start to goal
– Heuristics and search techniques can find optimal solution
– Evaluation function: estimate of cost from start to goal through given node
– Actions have cost
– Examples: path planning, scheduling activities
• Games – adversary
– Solution is strategy
• strategy specifies move for every possible opponent reply.
– Time limits force an approximate solution
– Evaluation function: evaluate “goodness” of game position
– Board configurations have utility
– Examples: chess, checkers, Othello, backgammon
Solving 2-player Games
Brute-force:
– 1. Generate the whole game tree to leaves
– 2. Apply utility (payoff) function to leaves
– 3. Back-up values from leaves toward the root:
• a Max node computes the max of its child values
• a Min node computes the min of its child values
– 4. When value reaches the root: choose max value and the
corresponding move.
Minimax:
Search the game-tree in a DFS manner to find the value of the root.
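The four brute-force steps above can be sketched in a few lines. This is a minimal, generic minimax; the game is supplied as three functions (`children`, `utility`, `is_leaf`), which are illustrative names, not from the slides:

```python
def minimax(state, children, utility, is_leaf, maximizing=True):
    """Depth-first backup: a Max node takes the max of its child
    values, a Min node takes the min."""
    if is_leaf(state):
        return utility(state)            # step 2: payoff at a leaf
    values = [minimax(c, children, utility, is_leaf, not maximizing)
              for c in children(state)]  # step 1: generate the tree
    return max(values) if maximizing else min(values)  # step 3: back up

# Toy two-ply tree encoded as nested lists (leaves are integers):
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
value = minimax(tree, children=lambda s: s, utility=lambda s: s,
                is_leaf=lambda s: isinstance(s, int))
print(value)  # Min values are 3, 2, 2, so Max chooses 3
```

At the root (step 4), Max would pick the move leading to the child with the maximum backed-up value.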
Game Trees
Two-Ply Game Tree
Minimax maximizes the utility for the worst-case outcome for max
• Complete?
– Yes (if tree is finite).
• Optimal?
– Yes (against an optimal opponent).
– Can it be beaten by an opponent playing sub-optimally?
• No. (Why not?)
• Time complexity?
– O(b^m)
• Space complexity?
– O(bm) (depth-first search, generate all actions at once)
– O(m) (backtracking search, generate actions one at a time)
Game Tree Size
• Tic-Tac-Toe
– b ≈ 5 legal actions per state on average, total of 9 plies in game.
• “ply” = one action by one player, “move” = two plies.
– 5^9 = 1,953,125
– 9! = 362,880 (Computer goes first)
– 8! = 40,320 (Computer goes second)
exact solution quite reasonable
• Chess
– b ≈ 35 (approximate average branching factor)
– d ≈ 100 (depth of game tree for “typical” game)
– b^d ≈ 35^100 ≈ 10^154 nodes!!
exact solution completely infeasible
• An Evaluation Function:
– Estimates how good the current board configuration is for a player
– Typically, one estimates how good the position is for the player and
how good it is for the opponent, and subtracts the opponent's score
from the player's
– Othello: Number of white pieces - Number of black pieces
– Chess: Value of all white pieces - Value of all black pieces
• Typical values from -infinity (loss) to +infinity (win) or [-1, +1].
• If the board evaluation is X for a player, it’s -X for the opponent
• Example:
– Evaluating chess boards
– Checkers
– Tic-tac-toe
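As a concrete instance of the piece-count idea, here is a sketch of an Othello-style material evaluation; the 'W'/'B'/'.' board encoding is an assumption for illustration, not from the slides:

```python
def evaluate(board, player="W"):
    """Material evaluation: player's piece count minus the opponent's.
    By construction, a board worth X to one player is worth -X to the other."""
    opponent = "B" if player == "W" else "W"
    mine = sum(row.count(player) for row in board)
    theirs = sum(row.count(opponent) for row in board)
    return mine - theirs

board = ["WWB.",
         "W.B.",
         "....",
         "...."]
print(evaluate(board, "W"))  # 3 white - 2 black = 1
print(evaluate(board, "B"))  # -1: the same board, negated for the opponent
```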
Applying MiniMax to tic-tac-toe
• Idea:
– Do depth first search to generate partial game tree,
– Give static evaluation function to leaves,
– Compute bound on internal nodes.
• α, β bounds:
– An α value for a Max node means that Max's real value is at least α.
– A β value for a Min node means that Min can guarantee a value of no more than β.
• Computation:
– Pass the current α / β down to children when expanding a node
– Update α (Max) / β (Min) when node values are updated
• α of a MAX node is the max of children seen.
• β of a MIN node is the min of children seen.
Alpha-Beta Example
[Figure walkthrough: do depth-first search until the first leaf. Every node
starts with the bounds [-∞, +∞]. The first MIN node's interval narrows from
[-∞, 3] to [3, 3], so the root becomes [3, +∞]. The second MIN node reaches
[-∞, 2]; this node is worse for MAX, so search below it is cut off. The third
MIN node narrows from [3, 14] to [3, 5], and the root's final interval is
[3, 3]. Values are backed up from the leaves toward the root.]
Alpha-beta Algorithm
• Prune whenever α ≥ β.
– Prune below a Max node whose alpha value becomes greater than
or equal to the beta value of its ancestors.
• Max nodes update alpha based on children’s returned values.
– Prune below a Min node whose beta value becomes less than or
equal to the alpha value of its ancestors.
• Min nodes update beta based on children’s returned values.
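The pruning rule above translates directly into code. Here is a sketch over a nested-list tree encoding (an illustrative representation, not from the slides), where leaves are integers:

```python
import math

def alphabeta(state, maximizing=True, alpha=-math.inf, beta=math.inf):
    if isinstance(state, int):          # leaf: return its utility
        return state
    if maximizing:
        value = -math.inf
        for child in state:
            value = max(value, alphabeta(child, False, alpha, beta))
            alpha = max(alpha, value)   # Max updates alpha from returned values
            if alpha >= beta:           # prune: the Min above will never allow this
                break
        return value
    else:
        value = math.inf
        for child in state:
            value = min(value, alphabeta(child, True, alpha, beta))
            beta = min(beta, value)     # Min updates beta from returned values
            if alpha >= beta:           # prune: the Max above already has better
                break
        return value

print(alphabeta([[3, 12, 8], [2, 4, 6], [14, 5, 2]]))  # same minimax value: 3
```

Pruning never changes the value returned at the root; it only skips branches that cannot influence it.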
Alpha-Beta Example Revisited
[Figure walkthrough, tracking α and β explicitly:
– The root MAX node starts with α = −∞, β = +∞; α and β are passed down to
children as they are expanded.
– The first MIN node updates β based on its children: β becomes 3, and the
remaining children leave it unchanged. 3 is returned as the node value, so
the root updates α to 3.
– α = 3, β = +∞ are passed to the second MIN node, which updates β to 2
based on its first child. Now α = 3 ≥ β = 2, so the rest of this node's
children are pruned, and 2 is returned as the node value.
– α = 3, β = +∞ are passed to the third MIN node, which updates β to 14,
then to 5, and finally returns 2 as the node value.]
Alpha Beta Practical Implementation
• Idea:
– Do depth first search to generate partial game tree
– Cutoff test:
• Depth limit
• Iterative deepening
• Cutoff when no big changes (quiescent search)
– When cutoff, apply static evaluation function to leaves
– Compute bound on internal nodes
– Run α-β pruning using estimated values
– IMPORTANT: use node values from the previous iteration to order
children during the next iteration
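A sketch of that loop, assuming the nested-list tree encoding and a hypothetical `evaluate` heuristic applied at cutoff nodes; root moves are re-ordered by the values found in the previous iteration:

```python
import math

def bounded_ab(state, depth, maximizing, alpha, beta, evaluate):
    if isinstance(state, int):
        return state                    # true terminal
    if depth == 0:
        return evaluate(state)          # cutoff: static evaluation
    value = -math.inf if maximizing else math.inf
    for child in state:
        v = bounded_ab(child, depth - 1, not maximizing, alpha, beta, evaluate)
        if maximizing:
            value, alpha = max(value, v), max(alpha, v)
        else:
            value, beta = min(value, v), min(beta, v)
        if alpha >= beta:
            break
    return value

def iterative_deepening(root_children, max_depth, evaluate):
    order = list(range(len(root_children)))  # current root-move ordering
    for depth in range(1, max_depth + 1):
        scores = [bounded_ab(root_children[i], depth - 1, False,
                             -math.inf, math.inf, evaluate)
                  for i in order]
        # re-order root moves by this iteration's values, best first
        order = [i for _, i in sorted(zip(scores, order), reverse=True)]
    return order[0]                     # index of the best root move

tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
best = iterative_deepening(tree, 2, evaluate=lambda s: sum(s) / len(s))
print(best)  # move 0 (its subtree has minimax value 3)
```

For simplicity this sketch searches each root move with fresh (−∞, +∞) bounds; a production search would also carry α across root moves and order interior nodes, not just the root.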
Example
[Figure: a game tree with Min and Max levels, internal values 5 and 6,
and leaf values 3 4 1 2 7 8, searched left to right. Which branches does
α-β prune?]
Answer to Example
[Figure: the same tree, annotated with the pruned branches.]
Answer: NONE! Because the most favorable nodes for both players are
explored last (i.e., in the diagram, they are on the right-hand side).
Second Example
(the exact mirror image of the first example)
[Figure: the mirrored tree, with internal values 3 and 4 and leaf
values 6 5 8 7 2 1.]
Answer to Second Example
[Figure: the same tree, annotated with the pruned branches.]
Answer: LOTS! Because the most favorable nodes for both players are
explored first (i.e., in the diagram, they are on the left-hand side).
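The effect of move ordering can be made concrete with an instrumented alpha-beta that counts leaf evaluations. The two small trees below are illustrative, not the exact trees from the slides:

```python
import math

def ab_counted(state, maximizing, alpha, beta, stats):
    """Alpha-beta that counts how many leaves it actually evaluates."""
    if isinstance(state, int):
        stats["leaves"] += 1
        return state
    value = -math.inf if maximizing else math.inf
    for child in state:
        v = ab_counted(child, not maximizing, alpha, beta, stats)
        if maximizing:
            value, alpha = max(value, v), max(alpha, v)
        else:
            value, beta = min(value, v), min(beta, v)
        if alpha >= beta:
            break
    return value

def leaves_searched(tree):
    stats = {"leaves": 0}
    ab_counted(tree, True, -math.inf, math.inf, stats)
    return stats["leaves"]

good = [[14, 5], [2, 4]]   # best Min subtree explored first -> pruning
bad  = [[2, 4], [14, 5]]   # best Min subtree explored last  -> no pruning
print(leaves_searched(good))  # 3 (the leaf 4 is never examined)
print(leaves_searched(bad))   # 4 (every leaf examined)
```

Both orderings return the same value (5); only the amount of work differs.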
Effectiveness of Alpha-Beta Search
• Worst-Case
– Branches are ordered so that no pruning takes place. In this case alpha-beta
gives no improvement over exhaustive search
• Best-Case
– Each player’s best move is the left-most alternative (i.e., evaluated first)
– In the best case, alpha-beta examines only O(b^(m/2)) nodes, roughly doubling
the search depth reachable with the same effort
– In practice, performance is closer to the best case than the worst case
• E.g., sort moves by the remembered move values found last time.
• E.g., expand captures first, then threats, then forward moves, etc.
• E.g., run Iterative Deepening search, sort by value last iteration.
• Multiplayer games often involve alliances: if A and B are in a weak position, they can
collaborate and act against C
• If the game is not zero-sum, collaboration can also occur in two-player games: if (1000, 1000)
is the best payoff for both players, then they will cooperate toward reaching it rather than toward the minimax value.
In real life there are
many unpredictable
external events
[Figure: two MAX trees over moves a1 and a2, where each move leads to a
chance node with outcomes of probability .9 and .1 over MIN values.
Left tree leaves: 2, 3 and 1, 4; right tree leaves: 20, 30 and 1, 400.
The expected values are .9·2 + .1·3 = 2.1 vs. .9·1 + .1·4 = 1.3 on the left,
and .9·20 + .1·30 = 21 vs. .9·1 + .1·400 = 40.9 on the right, so MAX's
preferred move flips from a1 to a2 even though the ordering of the leaf
values is unchanged.]
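The expected values in the figure can be checked with a small expectiminimax-style helper: a chance node's value is the probability-weighted average of its outcomes (the .9/.1 probabilities and leaf values are taken from the slide):

```python
def chance_value(outcomes):
    """Value of a chance node: sum of probability * child value."""
    return sum(p * v for p, v in outcomes)

# Left tree: Max prefers a1 (2.1 > 1.3)
left  = {"a1": chance_value([(0.9, 2), (0.1, 3)]),     # 2.1
         "a2": chance_value([(0.9, 1), (0.1, 4)])}     # 1.3
# Right tree: same leaf ordering, rescaled; Max now prefers a2
right = {"a1": chance_value([(0.9, 20), (0.1, 30)]),   # 21.0
         "a2": chance_value([(0.9, 1), (0.1, 400)])}   # 40.9

print(max(left, key=left.get))    # a1
print(max(right, key=right.get))  # a2
```

This is why, with chance nodes, the evaluation function must be a meaningful measure of expected utility, not merely order-preserving.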