AI-Lecture 6 (Adversarial Search)


Artificial Intelligence

Lecture 6

Bicol University College of Science


1st Semester, 2021-2022
Adversarial Search
What are games, and why study them?

Games are a form of multi-agent environment



What do other agents do and how do they affect our success?

Cooperative vs. competitive multi-agent environments.

Competitive multi-agent environments give rise to adversarial
problems a.k.a. games

Why study games?



Fun; historically entertaining

Interesting subject of study because they are hard

Easy to represent, and agents are restricted to a small number of actions
Adversarial Search
Adversarial search problems == games
They occur in multiagent competitive environments
There is an opponent we can’t control planning
against us!
Game vs. search: the optimal solution is not a sequence
of actions but a strategy (policy): if the opponent does a,
the agent does b; if the opponent does c, the agent does d;
etc.
Tedious and fragile if hard-coded (i.e., implemented
with rules)
Good news: Games are modeled as search
problems and use heuristic evaluation functions.
Games: hard topic
Games are a big deal in AI
Games are interesting to AI because they are too
hard to solve exactly
Chess has a branching factor of about 35, and games
often run to about 100 plies, so the game tree has
roughly 35^100 ≈ 10^154 nodes.
Need to make some decision even when the
optimal decision is infeasible
Adversarial Search
Checkers:
Chinook ended the 40-year reign of human world
champion Marion Tinsley in 1994.
Used an endgame database defining perfect play
for all positions involving 8 or fewer pieces on the
board, a total of 443,748,401,247 positions.
Adversarial Search
Chess:
In 1949, Claude E. Shannon, in his paper “Programming a Computer for
Playing Chess”, suggested chess as an AI problem for the community.
Deep Blue defeated human world champion Garry Kasparov in a six-
game match in 1997.
In 2006, Vladimir Kramnik, the undisputed world champion, was
defeated 4-2 by Deep Fritz.
Adversarial Search
Go: b > 300! Google DeepMind's AlphaGo project. In 2016,
AlphaGo beat both Fan Hui, the European Go champion, and Lee
Sedol, one of the world's best players.

Othello: several computer Othello programs exist, and human champions
refuse to compete against computers, which are simply too good.
Relation of Games to Search

Search – no adversary

Solution is a (heuristic) method for finding a goal

Heuristic techniques can find an optimal solution

Evaluation function: estimate of cost from start to goal through given node

Examples: path planning, scheduling activities
Games – adversary

Solution is a strategy (a strategy specifies a move for every possible opponent
reply).

Time limits force an approximate solution

Evaluation function: evaluate “goodness” of game position

Examples: chess, checkers, Othello, backgammon
Types of Games

We are mostly interested in deterministic, fully observable,
zero-sum games in which two agents act alternately.
 Games with perfect information. No randomness is involved.

 Games with imperfect information. Random factors are part


of the game.
Zero-sum Games

Adversarial: Pure competition.


Agents have opposite values on the outcomes.
One agent maximizes one single value, while the
other minimizes it.
Each move by one of the players is called a “ply.”
One function: one agent maximizes it and the other
minimizes it!
Embedded thinking...

One agent is trying to figure out what to do.


How does it decide? It thinks about the consequences of the
possible actions.
It needs to think about its opponent as well...
The opponent is also thinking about what to do, etc.
Each imagines what the opponent's response to its
actions would be.
This entails embedded (recursive) thinking.
Game setup

Two players: MAX and MIN


MAX moves first, and they take turns until the game
is over. The winner gets a reward, the loser a penalty.
Formulate game as search problem
MAX uses search tree to determine next move.
Searching in a two player game

Problem Formulation:
Initial state: board configurations and the player to
move.
Successor function: list of pairs (move, state)
specifying legal moves and their resulting states.
(moves + initial state = game tree)
A terminal test: decides whether the game has finished.
A utility function: produces a numerical value for
(only) the terminal states. Example: In chess, outcome
= win/loss/draw, with values +1, -1, 0 respectively.

Players need search tree to determine next move.
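
To make this formulation concrete, here is a minimal Python sketch for Tic-Tac-Toe. The names (State, successors, terminal_test, utility) mirror the components above; they are illustrative, not from any particular library. Later sketches in this lecture reuse these pieces.

from dataclasses import dataclass

# The eight 3-lengths: rows, columns, and diagonals of the 3x3 board.
LINES = [(0,1,2), (3,4,5), (6,7,8), (0,3,6), (1,4,7), (2,5,8), (0,4,8), (2,4,6)]

@dataclass(frozen=True)
class State:
    board: tuple = (' ',) * 9   # initial state: empty board
    to_move: str = 'X'          # player to move; MAX plays 'X'

def successors(state):
    """Yield (move, state) pairs for every legal move."""
    nxt = 'O' if state.to_move == 'X' else 'X'
    for i, cell in enumerate(state.board):
        if cell == ' ':
            b = list(state.board)
            b[i] = state.to_move
            yield i, State(tuple(b), nxt)

def winner(state):
    """Return 'X' or 'O' if a player has completed a 3-length, else None."""
    for a, b, c in LINES:
        if state.board[a] != ' ' and state.board[a] == state.board[b] == state.board[c]:
            return state.board[a]
    return None

def terminal_test(state):
    """Terminal test: someone has won, or the board is full."""
    return winner(state) is not None or ' ' not in state.board

def utility(state):
    """Utility of a terminal state, from MAX's view: win +1, loss -1, draw 0."""
    return {'X': +1, 'O': -1, None: 0}[winner(state)]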


Partial Game Tree for Tic-Tac-Toe
Minimax

Find the optimal strategy for Max:


– Depth-first search of the game tree
– An optimal leaf node could appear at any depth of
the tree
– Minimax principle: compute the utility of being in a
state assuming both players play optimally from
there until the end of the game
– Propagate minimax values up the tree once
terminal nodes are discovered
Optimal strategies

Find the contingent strategy for MAX assuming an


infallible MIN opponent.
Assumption: both players play optimally!
Given a game tree, the optimal strategy can be
determined by using the minimax value of each node:
MINIMAX-VALUE(n) =
UTILITY(n), if n is a terminal node
max over s in SUCCESSORS(n) of MINIMAX-VALUE(s), if n is a MAX node
min over s in SUCCESSORS(n) of MINIMAX-VALUE(s), if n is a MIN node
Two-Ply Game Tree

The minimax decision

Minimax maximizes the worst-case outcome for max.


Partial Game Tree for Tic-Tac-Toe
What if MIN does not play optimally?

Definition of optimal play for MAX assumes


MIN plays optimally: maximizes worst-case
outcome for MAX.

But if MIN does not play optimally, MAX will


do even better.
Minimax Algorithm
function MINIMAX-DECISION(state) returns an action
inputs: state, current state in game
v ← MAX-VALUE(state)
return the action in SUCCESSORS(state) with value v

function MAX-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← −∞
for a, s in SUCCESSORS(state) do
v ← MAX(v, MIN-VALUE(s))
return v

function MIN-VALUE(state) returns a utility value
if TERMINAL-TEST(state) then return UTILITY(state)
v ← +∞
for a, s in SUCCESSORS(state) do
v ← MIN(v, MAX-VALUE(s))
return v
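
The pseudocode above translates directly into Python. This minimal sketch assumes the successors, terminal_test, and utility helpers from the Tic-Tac-Toe formulation earlier; any game exposing those three functions would work the same way.

import math

def minimax_decision(state):
    """Return the action in SUCCESSORS(state) with the highest minimax value."""
    return max(successors(state), key=lambda ms: min_value(ms[1]))[0]

def max_value(state):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _, s in successors(state):
        v = max(v, min_value(s))
    return v

def min_value(state):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _, s in successors(state):
        v = min(v, max_value(s))
    return v

For example, minimax_decision(State()) returns an opening move for X on the empty Tic-Tac-Toe board; under optimal play every opening has minimax value 0, since the game is a draw.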
Properties of Minimax

Criterion: Minimax

Complete? Yes (if the game tree is finite)

Time: O(b^m)

Space: O(bm) (depth-first exploration)

Optimal? Yes (against an optimal opponent)

Minimax Algorithm

Minimax algorithm
Perfect for deterministic, two-player games
One player tries to maximize the score (MAX)
The other tries to minimize the score (MIN)
Goal: move to the position of highest minimax
value
Identify the best achievable payoff against best
play
Multiplayer games

Games allow more than two players


Single minimax values become vectors of values, one per player
Problem of minimax search

The number of game states is exponential in
the number of moves.

Solution: Do not examine every node

==> Alpha-beta pruning

Alpha = value of best choice found so far at any
choice point along the MAX path

Beta = value of best choice found so far at any
choice point along the MIN path
Alpha-beta Game Playing
Basic idea:
“If you have an idea that is surely bad, don't take the
time to see how truly awful it is.” -- Pat Winston

Some branches will never be played by rational players since


they include sub-optimal decisions (for either player).

[Figure: the root (MAX node) is >= 2 once its left child (MIN node) evaluates
to = 2. In the right child, the first leaf gives <= 1, so we don't need to
compute the value at this node: no matter what it is, it can't affect the
value of the root node. Leaf values shown: 2, 7, 1, ?]
Alpha-Beta Example

[Figures: a step-by-step depth-first alpha-beta trace on a two-ply tree,
with a MAX root above three MIN nodes.]

Do DF-search until the first leaf. The range of possible values for each
node starts at [-∞, +∞].
After the first MIN node's first leaf, its range narrows to [-∞, 3]; once
all of its leaves are seen, it is fixed at [3, 3] and the root's range
becomes [3, +∞].
The second MIN node's first leaf narrows it to [-∞, 2]. This node is worse
for MAX (at most 2 against the 3 already guaranteed), so its remaining
leaves are pruned.
The third MIN node narrows from [-∞, 14] to [-∞, 5] and finally to [2, 2];
the root's range tightens from [3, 14] to [3, 5] to the final value [3, 3].
Alpha-Beta Pruning Algorithm
function ALPHA-BETA-SEARCH(state) returns an action
inputs: state, current state in game
v ← MAX-VALUE(state, −∞, +∞)
return the action in SUCCESSORS(state) with value v

function MAX-VALUE(n, alpha, beta) returns a utility value
if n is a leaf node then return f(n)
for each child n' of n do
alpha ← max{alpha, MIN-VALUE(n', alpha, beta)}
if alpha >= beta then return beta /* pruning */
end
return alpha

function MIN-VALUE(n, alpha, beta) returns a utility value
if n is a leaf node then return f(n)
for each child n' of n do
beta ← min{beta, MAX-VALUE(n', alpha, beta)}
if beta <= alpha then return alpha /* pruning */
end
return beta
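
In Python, the same algorithm looks like this. It is a sketch under the same assumptions as the earlier minimax code: the successors, terminal_test, and utility helpers from the Tic-Tac-Toe formulation.

import math

def alpha_beta_search(state):
    """Return the best action for MAX, pruning as we go."""
    best_action, best_value = None, -math.inf
    for a, s in successors(state):
        v = min_value(s, best_value, math.inf)
        if v > best_value:
            best_action, best_value = a, v
    return best_action

def max_value(state, alpha, beta):
    if terminal_test(state):
        return utility(state)
    v = -math.inf
    for _, s in successors(state):
        v = max(v, min_value(s, alpha, beta))
        if v >= beta:
            return v              # beta cutoff: MIN would never allow this branch
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if terminal_test(state):
        return utility(state)
    v = math.inf
    for _, s in successors(state):
        v = min(v, max_value(s, alpha, beta))
        if v <= alpha:
            return v              # alpha cutoff: MAX would never allow this branch
        beta = min(beta, v)
    return v

It returns the same decision as plain minimax while examining far fewer nodes.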
General alpha-beta pruning

Consider a node n somewhere in the tree.
If the player has a better choice at
the parent node of n,
or at any choice point further up,
then n will never be reached in actual play.
Hence, when enough is known about n, it can be
pruned.
Final Comments about Alpha-Beta
Pruning

Pruning does not affect the final result.

Entire subtrees can be pruned.
Good move ordering improves the effectiveness of pruning.
With “perfect ordering,” time complexity is O(b^(m/2))

Effective branching factor of sqrt(b)!

Alpha-beta pruning can look ahead twice as far as minimax in the same amount of
time.

Repeated states are again possible.

Store them in memory: a transposition table
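
A minimal sketch of a transposition table, assuming the hashable State from the earlier Tic-Tac-Toe formulation: memoize the exact minimax value of each state so repeated states are evaluated only once. (Real engines store more per entry, such as search depth and whether the value is exact or only a bound, because alpha-beta values are valid only relative to their search window.)

from functools import lru_cache

@lru_cache(maxsize=None)   # the cache acts as the transposition table
def minimax_value(state):
    if terminal_test(state):
        return utility(state)
    values = [minimax_value(s) for _, s in successors(state)]
    # MAX ('X') takes the maximum, MIN ('O') the minimum.
    return max(values) if state.to_move == 'X' else min(values)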
Games of imperfect information

Minimax and alpha-beta pruning require too
many leaf-node evaluations.
This may be impractical within a reasonable
amount of time.
SHANNON (1950):

Cut off search earlier (replace TERMINAL-TEST by CUTOFF-
TEST)

Apply heuristic evaluation function EVAL (replacing utility
function of alpha-beta)
Cutting off search

Change:
if TERMINAL-TEST(state) then return UTILITY(state)
into
if CUTOFF-TEST(state,depth) then return EVAL(state)

Introduces a fixed depth limit.

The limit is selected so that the amount of time used will not exceed what
the rules of the game allow.
When the cutoff occurs, the heuristic evaluation is performed.
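
As a sketch, the change amounts to threading a depth counter through the alpha-beta functions and calling a heuristic evaluation at the cutoff. It assumes the successors and terminal_test helpers from earlier and some evaluation function eval_fn (for Tic-Tac-Toe, the eval_ttt sketch later in this lecture would do); the depth limit of 4 is an arbitrary placeholder.

import math

def cutoff_test(state, depth, limit=4):
    # Cut off at a fixed depth limit (or at a genuine terminal state).
    return depth >= limit or terminal_test(state)

def h_max_value(state, depth, alpha, beta):
    if cutoff_test(state, depth):
        return eval_fn(state)      # heuristic estimate replaces UTILITY
    v = -math.inf
    for _, s in successors(state):
        v = max(v, h_min_value(s, depth + 1, alpha, beta))
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return v

def h_min_value(state, depth, alpha, beta):
    if cutoff_test(state, depth):
        return eval_fn(state)
    v = math.inf
    for _, s in successors(state):
        v = min(v, h_max_value(s, depth + 1, alpha, beta))
        if v <= alpha:
            return v
        beta = min(beta, v)
    return v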
Evaluation Function
The evaluation function is a heuristic function, and
it is where the domain experts' knowledge
resides.
Performed at the search cutoff point.
Must agree with the utility function on
terminal/goal states.
Tradeoff between accuracy and time →
reasonable complexity.
Accuracy:
 Performance of a game-playing system depends on the
accuracy/goodness of its evaluation.
 Evaluation of nonterminal states should be strongly correlated
with the actual chances of winning.
Heuristic EVAL(uation) Function

Idea: produce an estimate of the expected utility of


the game from a given position.
Performance depends on quality of EVAL.
Requirements:
EVAL should order terminal-nodes in the same way as UTILITY.
Computation must not take too long.
For non-terminal states the EVAL should be strongly correlated with the
actual chance of winning.
Only useful for quiescent (no wild swings in value in
near future) states
Heuristic EVAL example

For Tic-Tac-Toe:
f(s) = [# of 3-lengths open for me] - [# of 3-lengths open for you]
where a 3-length is a complete row, column, or diagonal.

For chess, typically a linear weighted sum of features:

Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
e.g., Alan Turing's function for chess:
 f(s) = w(s)/b(s), where w(s) is the sum of the point values of
White's pieces and b(s) is the sum for Black.

Example features for chess are piece count, piece placement,
squares controlled, etc.

Deep Blue has about 6,000 features in its evaluation function.
Heuristic EVAL example

The weighted addition assumes the features are independent:
Eval(s) = w1·f1(s) + w2·f2(s) + … + wn·fn(s)
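
A minimal sketch of both examples in Python, reusing the Tic-Tac-Toe State and LINES from the earlier formulation sketch; eval_ttt is the f(s) above, and linear_eval is the generic weighted sum (the feature functions and weights are whatever the domain expert supplies).

def open_lines(board, player):
    """Count 3-lengths (rows, columns, diagonals) with no opponent piece."""
    opp = 'O' if player == 'X' else 'X'
    return sum(all(board[i] != opp for i in line) for line in LINES)

def eval_ttt(state):
    # f(s) = [# of 3-lengths open for me] - [# of 3-lengths open for you]
    return open_lines(state.board, 'X') - open_lines(state.board, 'O')

def linear_eval(state, features, weights):
    # Eval(s) = w1*f1(s) + w2*f2(s) + ... + wn*fn(s)
    return sum(w * f(state) for f, w in zip(features, weights))

For instance, eval_ttt could serve as the eval_fn in the cutoff-search sketch above.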
End
