ai4

Arti ial Intelligen e I
Dr Sean Holden
Notes on games (adversarial sear h)
Copyright Sean Holden 2002-2010.

Solving problems by sear h: playing games
How might an agent a t when the out omes of its a tions are not
known be ause an adversary is trying to hinder it ?
This is essentially a more realisti kind of sear h problem be ause

we do not know the exa t out ome of an a tion.
This is a ommon situation when playing games : in hess, draughts,
and so on an opponent responds to our moves.
We don't know what their response will be, and so the out ome
of our moves is not lear.
Game playing has been of interest in AI be ause it provides an ide-

alisation of a world in whi h two agents a t to redu e ea h other's
well-being.
Playing games: sear h against an adversary
Despite the fa t that games are an idealisation, game playing an be

an ex ellent sour e of hard problems. For instan e with hess:
The average bran hing fa tor is roughly 35.

Games an rea h 50 moves per player.
So a rough al ulation gives the sear h tree 35100 nodes.
Even if only di erent, legal positions are onsidered it's about
1040.
So: in addition to the un ertainty due to the opponent:

We an't make a omplete sear h to nd the best move...
... so we have to a t even though we're not sure about the best
thing to do.
And hess isn't even very hard:
Go is mu h harder than hess.

The bran hing fa tor is about 360.
Until very re ently it has resisted all attempts to produ e a good AI

player.
See:
senseis.xmp.net/?MoGo
and others.
It seems that games are a step loser to the omplexities inherent in

the world around us than are the standard sear h problems onsidered
so far.
The study of games has led to some of the most elebrated appli a-
tions and te hniques in AI.
We now look at:
How game-playing an be modelled as sear h .

The minimax algorithm for game-playing.
Some problems inherent in the use of minimax.
The on ept of α − β pruning .
Reading: Russell and Norvig hapter 6.

Perfe t de isions in a two-person game
Say we have two players. Traditionally, they are alled Max and Min
for reasons that will be ome lear.
We'll use noughts and rosses as an initial example.

Max moves rst.
The players alternate until the game ends.
At the end of the game, prizes are awarded. (Or punishments
administered|EVIL ROBOT is starting up his favourite hain-
saw...)
This is exa tly the same game format as hess, Go, draughts and so
on.
Games like this an be modelled as sear h problems as follows:
There is an initial state .
Max to move
There is a set of operators . Here, Max an pla e a ross in any

empty square, or Min a nought.
There is a terminal test . Here, the game ends when three noughts
or three rosses are in a row, or there are no unused spa es.
There is a utility or payo fun tion. This tells us, numeri ally,
what the out ome of the game is.
This is enough to model the entire game.

We an onstru t a tree to represent a game. From the initial state

Max an make nine possible moves:
. . .
Then it's Min's turn...

For ea h of Max's opening moves Min has eight replies:
. . .
. . .
And so on...
This an be ontinued to represent all possibilities for the game.
. . .
. . .
−1
+1
0
At the leaves a player has won or there are no spa es. Leaves are
labelled using the utility fun tion.
How an Max use this tree to de ide on a move? Consider a mu h

simpler tree:
Labels on the leaves denote utility.
High values are preferred by Max.
Low values are preferred by Min.
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
If Max is rational he will play to rea h a position with the biggest

utility possible
But if Min is rational she will play to minimise the utility available
to Max.
The minimax algorithm
There are two moves: Max then Min. Game theorists would all this
one move, or two ply deep.
The minimax algorithm allows us to infer the best move that the
urrent player an make, given the utility fun tion, by working ba k-
ward from the leaves.
2 6 1 4
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
As Min plays the last move, she minimises the utility available to
Max.
Min takes the nal move:
If Min is in game position 1, her best hoi e is move 3. So from

Max's point of view this node has a utility of 2.
Moving one further step up the tree:
2 6 1 4
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
We an see that Max's best opening move is move 2, as this leads to

the node with highest utility.
In general:
Generate the omplete tree and label the leaves a ording to the
utility fun tion.
Working from the leaves of the tree upward, label the nodes de-
pending on whether Max or Min is to move.
If Min is to move label the urrent node with the minimum
utility of any des endant.
If Max is to move label the urrent node with the maximum
utility of any des endant.
If the game is p ply and at ea h point there are q available moves

then this pro ess has (surprise, surprise) O(qp) time omplexity and
spa e omplexity linear in p and q.
Making imperfe t de isions
We need to avoid sear hing all the way to the end of the tree. So:
We generate only part of the tree: instead of testing whether a

node is a leaf we introdu e a ut-o test telling us when to stop.
Instead of a utility fun tion we introdu e an evaluation fun tion
for the evaluation of positions for an in omplete game.
The evaluation fun tion attempts to measure the expe ted utility of
the urrent game position.
Making imperfe t de isions
How an this be justi ed?
This is a strategy that humans learly sometimes make use of.

For example, when using the on ept of material value in hess.
The e e tiveness of the evaluation fun tion is riti al ...
... but it must be omputable in a reasonable time.
(In prin iple it ould just be done using minimax.)
The importan e of the evaluation fun tion an not be understated|it

is probably the most important part of the design.
The evaluation fun tion
Designing a good evaluation fun tion an be extremely tri ky:
Let's say we want to design one for hess by giving ea h pie e its
material value: pawn = 1, knight/bishop = 3, rook = 5 and so
on.
De ne the evaluation of a position to be the di eren e between
the material value of bla k's and white's pie es
X X
eval(position) = value of pi − value of qi
bla k's pie es pi white's pie es qi
This seems like a reasonable rst attempt. Why might it go wrong?

Consider what happens at the start of a game:
Until the rst apture the evaluation fun tion gives 0, so in fa t

we have a ategory ontaining many di erent game positions with
equal estimated utility.
For example, all positions where white is one pawn ahead.
The evaluation fun tion for su h a ategory should perhaps rep-
resent the probability that a position hosen at random from it
leads to a win.
So in fa t this seems highly naive...

Ideally, we should onsider individual positions .

If on the basis of past experien e a position has 50% han e of win-
ning, 10% han e of losing and 40% han e of rea hing a draw, we
might give it an evaluation of
eval(position) = (0.5 × 1) + (0.1 × −1) + (0.4 × 0) = 0.4.
Extending this to the evaluation of ategories, we should then weight
the positions in the ategory a ording to their likelihood of o ur-
ring.
Of ourse, we don't know what any of these likelihoods are...
Using material value an be thought of as giving us a weighted linear

evaluation fun tion
n
X
eval(position) = wifi
i=1
where the wi are weights and the fi represent features of the position.
In this example
fi = value of the ith pie e
wi = number of ith pie es on the board
where bla k and white pie es are regarded as di erent and the fi are
positive for one and negative for the other.
Evaluation fun tions of this type are very ommon in game playing.
There is no systemati method for their design.
Weights an be hosen by allowing the game to play itself and using
learning te hniques to adjust the weights to improve performan e.
By using more arefully rafted features we an give di erent eval-
uations to individual positions .
α − β pruning
Even with a good evaluation fun tion and ut-o test, the time om-
plexity of the minimax algorithm makes it impossible to write a good
hess program without some further improvement.
Assuming we have 150 se onds to make ea h move, for hess we

would be limited to a sear h of about 3 to 4 ply whereas...
...even an average human player an manage 6 to 8.
Lu kily, it is possible to prune the sear h tree without a e ting the

out ome and without having to examine all of it .
α − β pruning
Returning for a moment to the earlier, simpli ed example:
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
The sear h is depth- rst and left to right.

α − β pruning
The sear h ontinues as previously for the rst 8 leaves.
2 6 ≤1
4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4
Then we note: if Max plays move 3 then Min an rea h a leaf with
utility at most 1.
So: we don't need to sear h any further under Max's opening
move 3. This is be ause the sear h has already established that
Max an do better by making opening move 2.
α − β pruning in general
Tree
= Player
If n<m or n < m′ here

= Opponent
then this node will never be rea hed.
m′ n
So: on e you've established that n is suÆ iently small, you don't

need to explore any more of the orresponding node's hildren.
Tree
= Player
If n>m or n > m′ here

= Opponent
then this node will never be rea hed.
m′ n
So: on e you've established that n is suÆ iently large, you don't

need to explore any more of the orresponding node's hildren.
The sear h is depth- rst, so we're only ever looking at one path
through the tree .
We need to keep tra k of the values α and β where
highest utility seen so far on the path for Max
α = the
β = the lowest utility seen so far on the path for Min
Assume Max begins . Initial values for α and β are
α = −∞
and
β = +∞.
So: we start with the fun tion all

max(−∞, +∞, root)
where max is the fun tion
max(alpha,beta,node)
{
if (node is at cut-off)
return evaluation(node);
else
{
for (each successor n’ of node)
{
alpha = maximum(alpha,min(alpha,beta,n’));
if (alpha >= beta)
return beta; // pruning happens here.
}
return alpha;
}
}
The fun tion min is
min(alpha,beta,node)
{
if (node is at cut-off)
return evaluation(node);
else
{
for (each successor n’ of node)
{
beta = minimum(beta,max(alpha,beta,n’));
if (beta <= alpha)
return alpha; // pruning happens here.
}
return beta;
}
}
Applying this to the earlier example and keeping tra k of the values
for α and β you should obtain:
α = −∞ = 2 = 6
Return 2 β = +∞
2 6 Return 6
Return 6
4 5 2 20 20 15 6 7 1
α = −∞ α=2 α=6
β = +∞ = 2 β = +∞ = 6 β = +∞ = 1
How e e tive is α − β pruning?
(Warning: the theoreti al results that follow are somewhat idealised.)

A qui k inspe tion should onvin e you that the order in whi h
moves are arranged in the tree is riti al.
So, it seems sensible to try good moves rst:
If you were to have a perfe t move-ordering te hnique then α − β

pruning would be O(qp/2) as opposed to O(qp).
√
so the bran hing fa tor would e e tively be q instead of q.
We would therefore expe t to be able to sear h ahead twi e as
many moves as before .
However, this is not realisti : if you had su h an ordering te hnique

you'd be able to play perfe t games!
How e e tive is α − β pruning?
If moves are arranged at random then α − β pruning is:
O((q/ log q)p) asymptoti ally when q > 1000 or...

...about O(q3p/4) for reasonable values of q.
In pra ti e simple ordering te hniques an get lose to the best ase.

For example, if we try aptures, then threats, then moves forward
et .
Alternatively, we an implement an iterative deepening approa h and
use the order obtained at one iteration to drive the next.
A further optimisation: the transposition table
Finally, note that many games orrespond to graphs rather than

trees be ause the same state an be arrived at in di erent ways.
This is essentially the same e e t we saw in heuristi sear h: re all

graph sear h versus tree sear h .
It an be addressed in a similar way: store a state with its evalua-
tion in a hash table|generally alled a transposition table |the
34
rst time it is seen.
The transposition table is essentially equivalent to the losed list

introdu ed as part of graph sear h.
This an vastly in rease the e e tiveness of the sear h pro ess, be-
ause we don't have to evaluate a single state multiple times.

ai4

Uploaded by

Copyright:

Available Formats

ai4

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

ai4

Uploaded by

Copyright:

Available Formats

Arti ial Intelligen e I

Notes on games (adversarial sear h)

Copyright Sean Holden 2002-2010.

 This is essentially a more realisti kind of sear h problem be ause

Game playing has been of interest in AI be ause it provides an ide-

Despite the fa t that games are an idealisation, game playing an be

 The average bran hing fa tor is roughly 35.

So: in addition to the un ertainty due to the opponent:

And hess isn't even very hard:

 Go is mu h harder than hess.

Until very re ently it has resisted all attempts to produ e a good AI

It seems that games are a step loser to the omplexities inherent in

 How game-playing an be modelled as sear h .

Reading: Russell and Norvig hapter 6.

 We'll use noughts and rosses as an initial example.

Games like this an be modelled as sear h problems as follows:

 There is an initial state .

 There is a set of operators . Here, Max an pla e a ross in any

This is enough to model the entire game.

We an onstru t a tree to represent a game. From the initial state

Then it's Min's turn...

For ea h of Max's opening moves Min has eight replies:

How an Max use this tree to de ide on a move? Consider a mu h

Labels on the leaves denote utility.

High values are preferred by Max.

Low values are preferred by Min.

If Max is rational he will play to rea h a position with the biggest

Min takes the nal move:

 If Min is in game position 1, her best hoi e is move 3. So from

Moving one further step up the tree:

We an see that Max's best opening move is move 2, as this leads to

If the game is p ply and at ea h point there are q available moves

 We generate only part of the tree: instead of testing whether a

How an this be justi ed?

 This is a strategy that humans learly sometimes make use of.

The importan e of the evaluation fun tion an not be understated|it

Designing a good evaluation fun tion an be extremely tri ky:

This seems like a reasonable rst attempt. Why might it go wrong?

Consider what happens at the start of a game:

 Until the rst apture the evaluation fun tion gives 0, so in fa t

So in fa t this seems highly naive...

Ideally, we should onsider individual positions .

Using material value an be thought of as giving us a weighted linear

 Assuming we have 150 se onds to make ea h move, for hess we

Lu kily, it is possible to prune the sear h tree without a e ting the

Returning for a moment to the earlier, simpli ed example:

The sear h is depth- rst and left to right.

The sear h ontinues as previously for the rst 8 leaves.

If n<m or n < m′ here

So: on e you've established that n is suÆ iently small, you don't

If n>m or n > m′ here

So: on e you've established that n is suÆ iently large, you don't

So: we start with the fun tion all

The fun tion min is

(Warning: the theoreti al results that follow are somewhat idealised.)

 If you were to have a perfe t move-ordering te hnique then α − β

However, this is not realisti : if you had su h an ordering te hnique

If moves are arranged at random then α − β pruning is:

 O((q/ log q)p) asymptoti ally when q > 1000 or...

In pra ti e simple ordering te hniques an get lose to the best ase.

This is essentially a more realisti kind of sear h problem be ause

The average bran hing fa tor is roughly 35.

Go is mu h harder than hess.

How game-playing an be modelled as sear h .

We'll use noughts and rosses as an initial example.

There is an initial state .

There is a set of operators . Here, Max an pla e a ross in any

If Min is in game position 1, her best hoi e is move 3. So from

We generate only part of the tree: instead of testing whether a

This is a strategy that humans learly sometimes make use of.

Until the rst apture the evaluation fun tion gives 0, so in fa t

Assuming we have 150 se onds to make ea h move, for hess we

If you were to have a perfe t move-ordering te hnique then α − β

O((q/ log q)p) asymptoti ally when q > 1000 or...

This is essentially the same e e t we saw in heuristi sear h: re all