ai4
ai4
ai4
Dr Sean Holden
How might an agent a t when the out omes of its a tions are not
known be ause an adversary is trying to hinder it ?
Say we have two players. Traditionally, they are alled Max and Min
for reasons that will be ome lear.
This is exa tly the same game format as hess, Go, draughts and so
on.
Perfe t de isions in a two-person game
Max to move
. . .
. . .
. . .
And so on...
This an be ontinued to represent all possibilities for the game.
Perfe t de isions in a two-person game
. . .
. . .
−1
+1
0
At the leaves a player has won or there are no spa es. Leaves are
labelled using the utility fun tion.
Perfe t de isions in a two-person game
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
There are two moves: Max then Min. Game theorists would all this
one move, or two ply deep.
The minimax algorithm allows us to infer the best move that the
urrent player an make, given the utility fun tion, by working ba k-
ward from the leaves.
2 6 1 4
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
As Min plays the last move, she minimises the utility available to
Max.
The minimax algorithm
2 6 1 4
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
In general:
Generate the omplete tree and label the leaves a ording to the
utility fun tion.
Working from the leaves of the tree upward, label the nodes de-
pending on whether Max or Min is to move.
If Min is to move label the urrent node with the minimum
utility of any des endant.
If Max is to move label the urrent node with the maximum
utility of any des endant.
We need to avoid sear hing all the way to the end of the tree. So:
The evaluation fun tion attempts to measure the expe ted utility of
the urrent game position.
Making imperfe t de isions
Let's say we want to design one for hess by giving ea h pie e its
material value: pawn = 1, knight/bishop = 3, rook = 5 and so
on.
De ne the evaluation of a position to be the di eren e between
the material value of bla k's and white's pie es
X X
eval(position) = value of pi − value of qi
bla k's pie es pi white's pie es qi
where the wi are weights and the fi represent features of the position.
In this example
fi = value of the ith pie e
wi = number of ith pie es on the board
where bla k and white pie es are regarded as di erent and the fi are
positive for one and negative for the other.
The evaluation fun tion
Evaluation fun tions of this type are very ommon in game playing.
There is no systemati method for their design.
Weights an be hosen by allowing the game to play itself and using
learning te hniques to adjust the weights to improve performan e.
By using more arefully rafted features we an give di erent eval-
uations to individual positions .
α − β pruning
Even with a good evaluation fun tion and ut-o test, the time om-
plexity of the minimax algorithm makes it impossible to write a good
hess program without some further improvement.
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
2 6 ≤1
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
Then we note: if Max plays move 3 then Min an rea h a leaf with
utility at most 1.
So: we don't need to sear h any further under Max's opening
move 3. This is be ause the sear h has already established that
Max an do better by making opening move 2.
α − β pruning in general
Tree
= Player
m′ n
Tree
= Player
m′ n
The sear h is depth- rst, so we're only ever looking at one path
through the tree .
We need to keep tra k of the values α and β where
highest utility seen so far on the path for Max
α = the
β = the lowest utility seen so far on the path for Min
Assume Max begins . Initial values for α and β are
α = −∞
and
β = +∞.
α − β pruning in general
max(alpha,beta,node)
{
if (node is at cut-off)
return evaluation(node);
else
{
for (each successor n’ of node)
{
alpha = maximum(alpha,min(alpha,beta,n’));
if (alpha >= beta)
return beta; // pruning happens here.
}
return alpha;
}
}
α − β pruning in general
min(alpha,beta,node)
{
if (node is at cut-off)
return evaluation(node);
else
{
for (each successor n’ of node)
{
beta = minimum(beta,max(alpha,beta,n’));
if (beta <= alpha)
return alpha; // pruning happens here.
}
return beta;
}
}
α − β pruning in general
Applying this to the earlier example and keeping tra k of the values
for α and β you should obtain:
α = −∞ = 2 = 6
Return 2 β = +∞
2 6 Return 6
Return 6
4 5 2 20 20 15 6 7 1
α = −∞ α=2 α=6
β = +∞ = 2 β = +∞ = 6 β = +∞ = 1
How e e tive is α − β pruning?