ai4

Download as pdf or txt
Download as pdf or txt
You are on page 1of 34

Arti ial Intelligen e I

Dr Sean Holden

Notes on games (adversarial sear h)

Copyright Sean Holden 2002-2010.


Solving problems by sear h: playing games

How might an agent a t when the out omes of its a tions are not
known be ause an adversary is trying to hinder it ?

 This is essentially a more realisti kind of sear h problem be ause


we do not know the exa t out ome of an a tion.
 This is a ommon situation when playing games : in hess, draughts,
and so on an opponent responds to our moves.
 We don't know what their response will be, and so the out ome
of our moves is not lear.

Game playing has been of interest in AI be ause it provides an ide-


alisation of a world in whi h two agents a t to redu e ea h other's
well-being.
Playing games: sear h against an adversary

Despite the fa t that games are an idealisation, game playing an be


an ex ellent sour e of hard problems. For instan e with hess:

 The average bran hing fa tor is roughly 35.


 Games an rea h 50 moves per player.
 So a rough al ulation gives the sear h tree 35100 nodes.
 Even if only di erent, legal positions are onsidered it's about
1040.

So: in addition to the un ertainty due to the opponent:


 We an't make a omplete sear h to nd the best move...
 ... so we have to a t even though we're not sure about the best
thing to do.
Playing games: sear h against an adversary

And hess isn't even very hard:

 Go is mu h harder than hess.


 The bran hing fa tor is about 360.

Until very re ently it has resisted all attempts to produ e a good AI


player.
See:
senseis.xmp.net/?MoGo
and others.
Playing games: sear h against an adversary

It seems that games are a step loser to the omplexities inherent in


the world around us than are the standard sear h problems onsidered
so far.
The study of games has led to some of the most elebrated appli a-
tions and te hniques in AI.
We now look at:

 How game-playing an be modelled as sear h .


 The minimax algorithm for game-playing.
 Some problems inherent in the use of minimax.
 The on ept of α − β pruning .

Reading: Russell and Norvig hapter 6.


Perfe t de isions in a two-person game

Say we have two players. Traditionally, they are alled Max and Min
for reasons that will be ome lear.

 We'll use noughts and rosses as an initial example.


 Max moves rst.
 The players alternate until the game ends.
 At the end of the game, prizes are awarded. (Or punishments
administered|EVIL ROBOT is starting up his favourite hain-
saw...)

This is exa tly the same game format as hess, Go, draughts and so
on.
Perfe t de isions in a two-person game

Games like this an be modelled as sear h problems as follows:

 There is an initial state .

Max to move

 There is a set of operators . Here, Max an pla e a ross in any


empty square, or Min a nought.
 There is a terminal test . Here, the game ends when three noughts
or three rosses are in a row, or there are no unused spa es.
 There is a utility or payo fun tion. This tells us, numeri ally,
what the out ome of the game is.

This is enough to model the entire game.


Perfe t de isions in a two-person game

We an onstru t a tree to represent a game. From the initial state


Max an make nine possible moves:

. . .

Then it's Min's turn...


Perfe t de isions in a two-person game

For ea h of Max's opening moves Min has eight replies:

. . .

. . .

And so on...
This an be ontinued to represent all possibilities for the game.
Perfe t de isions in a two-person game

. . .

. . .

−1

+1
0

At the leaves a player has won or there are no spa es. Leaves are
labelled using the utility fun tion.
Perfe t de isions in a two-person game

How an Max use this tree to de ide on a move? Consider a mu h


simpler tree:

Labels on the leaves denote utility.

High values are preferred by Max.

Low values are preferred by Min.

4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4

If Max is rational he will play to rea h a position with the biggest


utility possible
But if Min is rational she will play to minimise the utility available
to Max.
The minimax algorithm

There are two moves: Max then Min. Game theorists would all this
one move, or two ply deep.
The minimax algorithm allows us to infer the best move that the
urrent player an make, given the utility fun tion, by working ba k-
ward from the leaves.

2 6 1 4

4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4

As Min plays the last move, she minimises the utility available to
Max.
The minimax algorithm

Min takes the nal move:

 If Min is in game position 1, her best hoi e is move 3. So from


Max's point of view this node has a utility of 2.
 If Min is in game position 2, her best hoi e is move 3. So from
Max's point of view this node has a utility of 6.
 If Min is in game position 3, her best hoi e is move 1. So from
Max's point of view this node has a utility of 1.
 If Min is in game position 4, her best hoi e is move 4. So from
Max's point of view this node has a utility of 4.
The minimax algorithm

Moving one further step up the tree:

2 6 1 4

4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4

We an see that Max's best opening move is move 2, as this leads to


the node with highest utility.
The minimax algorithm

In general:

 Generate the omplete tree and label the leaves a ording to the
utility fun tion.
 Working from the leaves of the tree upward, label the nodes de-
pending on whether Max or Min is to move.
 If Min is to move label the urrent node with the minimum
utility of any des endant.
 If Max is to move label the urrent node with the maximum
utility of any des endant.

If the game is p ply and at ea h point there are q available moves


then this pro ess has (surprise, surprise) O(qp) time omplexity and
spa e omplexity linear in p and q.
Making imperfe t de isions

We need to avoid sear hing all the way to the end of the tree. So:

 We generate only part of the tree: instead of testing whether a


node is a leaf we introdu e a ut-o test telling us when to stop.
 Instead of a utility fun tion we introdu e an evaluation fun tion
for the evaluation of positions for an in omplete game.

The evaluation fun tion attempts to measure the expe ted utility of
the urrent game position.
Making imperfe t de isions

How an this be justi ed?

 This is a strategy that humans learly sometimes make use of.


 For example, when using the on ept of material value in hess.
 The e e tiveness of the evaluation fun tion is riti al ...
 ... but it must be omputable in a reasonable time.
 (In prin iple it ould just be done using minimax.)

The importan e of the evaluation fun tion an not be understated|it


is probably the most important part of the design.
The evaluation fun tion

Designing a good evaluation fun tion an be extremely tri ky:

 Let's say we want to design one for hess by giving ea h pie e its
material value: pawn = 1, knight/bishop = 3, rook = 5 and so
on.
 De ne the evaluation of a position to be the di eren e between
the material value of bla k's and white's pie es
X X
eval(position) = value of pi − value of qi
bla k's pie es pi white's pie es qi

This seems like a reasonable rst attempt. Why might it go wrong?


The evaluation fun tion

Consider what happens at the start of a game:

 Until the rst apture the evaluation fun tion gives 0, so in fa t


we have a ategory ontaining many di erent game positions with
equal estimated utility.
 For example, all positions where white is one pawn ahead.
 The evaluation fun tion for su h a ategory should perhaps rep-
resent the probability that a position hosen at random from it
leads to a win.

So in fa t this seems highly naive...


The evaluation fun tion

Ideally, we should onsider individual positions .


If on the basis of past experien e a position has 50% han e of win-
ning, 10% han e of losing and 40% han e of rea hing a draw, we
might give it an evaluation of
eval(position) = (0.5 × 1) + (0.1 × −1) + (0.4 × 0) = 0.4.
Extending this to the evaluation of ategories, we should then weight
the positions in the ategory a ording to their likelihood of o ur-
ring.
Of ourse, we don't know what any of these likelihoods are...
The evaluation fun tion

Using material value an be thought of as giving us a weighted linear


evaluation fun tion
n
X
eval(position) = wifi
i=1

where the wi are weights and the fi represent features of the position.
In this example
fi = value of the ith pie e
wi = number of ith pie es on the board
where bla k and white pie es are regarded as di erent and the fi are
positive for one and negative for the other.
The evaluation fun tion

Evaluation fun tions of this type are very ommon in game playing.
There is no systemati method for their design.
Weights an be hosen by allowing the game to play itself and using
learning te hniques to adjust the weights to improve performan e.
By using more arefully rafted features we an give di erent eval-
uations to individual positions .
α − β pruning

Even with a good evaluation fun tion and ut-o test, the time om-
plexity of the minimax algorithm makes it impossible to write a good
hess program without some further improvement.

 Assuming we have 150 se onds to make ea h move, for hess we


would be limited to a sear h of about 3 to 4 ply whereas...
 ...even an average human player an manage 6 to 8.

Lu kily, it is possible to prune the sear h tree without a e ting the


out ome and without having to examine all of it .
α − β pruning

Returning for a moment to the earlier, simpli ed example:

4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4

The sear h is depth- rst and left to right.


α − β pruning

The sear h ontinues as previously for the rst 8 leaves.

2 6 ≤1

4 5 2 20 20 15 6 7 1 4 10 9 5 8 5 4

Then we note: if Max plays move 3 then Min an rea h a leaf with
utility at most 1.
So: we don't need to sear h any further under Max's opening
move 3. This is be ause the sear h has already established that
Max an do better by making opening move 2.
α − β pruning in general

Tree

= Player

If n<m or n < m′ here


= Opponent
then this node will never be rea hed.

m′ n

So: on e you've established that n is suÆ iently small, you don't


need to explore any more of the orresponding node's hildren.
α − β pruning in general

Tree

= Player

If n>m or n > m′ here


= Opponent
then this node will never be rea hed.

m′ n

So: on e you've established that n is suÆ iently large, you don't


need to explore any more of the orresponding node's hildren.
α − β pruning in general

The sear h is depth- rst, so we're only ever looking at one path
through the tree .
We need to keep tra k of the values α and β where
highest utility seen so far on the path for Max
α = the
β = the lowest utility seen so far on the path for Min
Assume Max begins . Initial values for α and β are
α = −∞
and
β = +∞.
α − β pruning in general

So: we start with the fun tion all


max(−∞, +∞, root)
where max is the fun tion

max(alpha,beta,node)
{
if (node is at cut-off)
return evaluation(node);
else
{
for (each successor n’ of node)
{
alpha = maximum(alpha,min(alpha,beta,n’));
if (alpha >= beta)
return beta; // pruning happens here.
}
return alpha;
}
}
α − β pruning in general

The fun tion min is

min(alpha,beta,node)
{
if (node is at cut-off)
return evaluation(node);
else
{
for (each successor n’ of node)
{
beta = minimum(beta,max(alpha,beta,n’));
if (beta <= alpha)
return alpha; // pruning happens here.
}
return beta;
}
}
α − β pruning in general

Applying this to the earlier example and keeping tra k of the values
for α and β you should obtain:

α = −∞ = 2 = 6

Return 2 β = +∞

2 6 Return 6
Return 6

4 5 2 20 20 15 6 7 1

α = −∞ α=2 α=6

β = +∞ = 2 β = +∞ = 6 β = +∞ = 1
How e e tive is α − β pruning?

(Warning: the theoreti al results that follow are somewhat idealised.)


A qui k inspe tion should onvin e you that the order in whi h
moves are arranged in the tree is riti al.
So, it seems sensible to try good moves rst:

 If you were to have a perfe t move-ordering te hnique then α − β


pruning would be O(qp/2) as opposed to O(qp).

 so the bran hing fa tor would e e tively be q instead of q.
 We would therefore expe t to be able to sear h ahead twi e as
many moves as before .

However, this is not realisti : if you had su h an ordering te hnique


you'd be able to play perfe t games!
How e e tive is α − β pruning?

If moves are arranged at random then α − β pruning is:

 O((q/ log q)p) asymptoti ally when q > 1000 or...


 ...about O(q3p/4) for reasonable values of q.

In pra ti e simple ordering te hniques an get lose to the best ase.


For example, if we try aptures, then threats, then moves forward
et .
Alternatively, we an implement an iterative deepening approa h and
use the order obtained at one iteration to drive the next.
A further optimisation: the transposition table

Finally, note that many games orrespond to graphs rather than


trees be ause the same state an be arrived at in di erent ways.

 This is essentially the same e e t we saw in heuristi sear h: re all


graph sear h versus tree sear h .
 It an be addressed in a similar way: store a state with its evalua-
tion in a hash table|generally alled a transposition table |the
34

rst time it is seen.

The transposition table is essentially equivalent to the losed list


introdu ed as part of graph sear h.
This an vastly in rease the e e tiveness of the sear h pro ess, be-
ause we don't have to evaluate a single state multiple times.

You might also like