Real Time Heuristic Search
Richard E. Korf, UCLA
Siddhartha Dutta
Shivam Garg
Aditya Nambiar
January 29, 2015
Introduction
Heuristic search is a fundamental problem-solving method in AI.
The sequence of steps to reach the goal isn't known a priori.
Systematic trial-and-error exploration of alternatives is required.
The performance of search algorithms is greatly improved by heuristic evaluation functions.
A heuristic evaluator is a function that is inexpensive to compute and estimates the merit of different states relative to the goal.
Heuristic Search
Implemented in two types of domains:
1. Single-agent problems
2. Two-player games
Single-agent example: autonomous navigation in a network of roads or arbitrary terrain.
Heuristic: Euclidean (airline) distance.
Two-player games
Classical algorithm: full-width, fixed-depth minimax search with alpha-beta pruning.
Search tree of chess: about 10^43 positions.
Can't search all the way to checkmate positions.
Limited search horizon, so suboptimal decisions are made (e.g., in chess tournaments moves must be made within certain time limits and can't be revoked).
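
A minimal sketch of this classical algorithm, assuming a hypothetical game interface (moves(state), apply(state, move), evaluate(state), and is_terminal(state) are placeholders, not from the slides):

def alphabeta(state, depth, alpha, beta, maximizing,
              moves, apply, evaluate, is_terminal):
    # Fixed-depth minimax with alpha-beta pruning; evaluate() is the static
    # heuristic applied at the search horizon (higher is better for Max).
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for m in moves(state):
            value = max(value, alphabeta(apply(state, m), depth - 1,
                                         alpha, beta, False,
                                         moves, apply, evaluate, is_terminal))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: Min will never allow this line
        return value
    value = float("inf")
    for m in moves(state):
        value = min(value, alphabeta(apply(state, m), depth - 1,
                                     alpha, beta, True,
                                     moves, apply, evaluate, is_terminal))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cutoff: Max already has a better option elsewhere
    return value

A root call for Max at depth d would be alphabeta(root, d, float("-inf"), float("inf"), True, ...).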
Two-player games
Research in single-agent problems: focused on increasing the size of problems for which optimal solutions can be found.
Research in two-player games: focused on making the best possible decisions given the limited amount of computation available between moves.
Alpha Pruning
An obvious question is whether every frontier node must be examined to find one with minimum cost.
If we allow heuristic evaluations of interior nodes, then substantial pruning is possible if the cost function is monotonic (f never decreases along any path away from the root).
Suppose an interior node has an f value that equals or exceeds α, the lowest frontier f value found so far. Since f is monotonic, the f values of the frontier nodes below that node can only be greater than or equal to the f value of that node, and hence cannot be lower than the value of the frontier node responsible for the current value of α.
Algorithm
Maintain in a variable α the lowest f value of any node encountered on the search horizon so far.
For each interior node, compute its f value and terminate search of the corresponding branch when its f value equals or exceeds α.
As each frontier node is generated, compute its f value as well, and if it is less than α, replace α with this value.
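
A minimal sketch of the above inside a depth-limited min-min lookahead, assuming a hypothetical problem interface where successors(state) yields (child, edge_cost) pairs and h(state) is a heuristic making f = g + h monotonic:

def min_min_alpha(state, depth, successors, h):
    # Return the minimum frontier f value within the search horizon.
    alpha = [float("inf")]  # lowest frontier f value seen so far

    def search(node, g, d):
        f = g + h(node)
        # Prune: since f is monotonic, no frontier descendant of this node
        # can have an f value below the current alpha.
        if f >= alpha[0]:
            return
        if d == 0:          # frontier node reached
            alpha[0] = f    # f < alpha here, so this is a new minimum
            return
        for child, cost in successors(node):
            search(child, g + cost, d - 1)

    search(state, 0, depth)
    return alpha[0]

The f >= alpha test is exactly the interior-node termination rule; the d == 0 branch implements the frontier update of α.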
Node Ordering
As in alpha-beta pruning, the efficiency of alpha pruning ought to be improved by node ordering.
The idea is that we may reach a lower alpha value earlier and hence prune more branches.
This can be done by visiting the children of a node in increasing order of their f values.
However, it was experimentally observed that this led to only a 5% reduction in the number of nodes generated, which is much less than the overhead per node required to do the ordering.
Time Limited A*
The idea is to run A* until time runs out, and then make a single move in the direction of the best node on the OPEN list, breaking ties in favor of smaller h values.
The main drawback of this algorithm is the same as that of A*: its memory requirement is exponential in the search depth, and thus in practice it will exhaust the available memory unless the amount of time allotted per move is quite small.
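
A minimal sketch of Time Limited A*, under the same hypothetical successors/h interface; the (f, h, counter) heap keys break f-ties in favor of smaller h, as the slide specifies:

import heapq, itertools, time

def time_limited_astar_move(start, successors, h, seconds):
    deadline = time.monotonic() + seconds
    counter = itertools.count()        # tie-breaker so states are never compared
    parent = {start: None}             # child -> parent, for extracting the move
    g = {start: 0}
    open_list = [(h(start), h(start), next(counter), start)]  # (f, h, id, state)
    while open_list and time.monotonic() < deadline:
        _, _, _, node = heapq.heappop(open_list)
        for child, cost in successors(node):
            new_g = g[node] + cost
            if new_g < g.get(child, float("inf")):
                g[child] = new_g
                parent[child] = node
                heapq.heappush(open_list,
                               (new_g + h(child), h(child), next(counter), child))
    if not open_list:
        return start  # search space exhausted within the time limit
    # Best node remaining on OPEN: lowest f, ties broken on smaller h.
    _, _, _, best = min(open_list)
    # Walk back to the single move adjacent to the start state.
    while parent[best] is not None and parent[best] is not start:
        best = parent[best]
    return best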
Real Time A*
So far we have only addressed the issue of how to make individual move decisions, but not how to make a sequence of move decisions to arrive at a solution.
Repeating the min-min algorithm for each decision won't work, since we may visit a previously visited state and loop forever.
Simply excluding previously visited states won't work either, since all successors of the current state may have already been visited.
Decisions are made on limited information.
Real Time A*
In RTA*, the merit of a node n is f(n) = g(n) + h(n), as in A*. However, g(n) in RTA* is the actual distance of node n from the current state of the problem solver, rather than from the initial state.
RTA*, in other words, is best-first search given this slightly different cost function.
In principle, it could be implemented by storing on an OPEN list the h values of all previously visited states and, every time a move is made, updating the g values of all states on OPEN to accurately reflect their actual distance from the new current state.
Real Time A*
The algorithm maintains in a hash table a list of those nodes that have been visited by an actual move of the problem solver, together with an h value for them.
At each cycle of the algorithm, the current state is expanded, generating its neighbors, and the heuristic function, possibly augmented by lookahead search (this could be any heuristic function), is applied to each neighbor not in the hash table; for those in the hash table, the stored h value is used instead.
Adding the cost of the edge to each neighboring state to its h value gives the f value of each neighbor of the current state.
Real Time A*
The node with the minimum f value is chosen as the new current state, and a move to that state is executed.
At the same time, the previous current state is stored in the hash table, and associated with it is the second best f value.
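
Putting the last three slides together, a minimal sketch of RTA*, again assuming the hypothetical interface successors(state) -> (child, edge_cost) pairs, h(state), and is_goal(state):

def rta_star(start, successors, h, is_goal):
    table = {}                       # state -> stored h value (hash table)
    current, path = start, [start]
    while not is_goal(current):
        # f of each neighbor: edge cost + stored h (if visited) or h(n).
        scored = [(cost + table.get(n, h(n)), n)
                  for n, cost in successors(current)]
        scored.sort(key=lambda t: t[0])
        best_f, best_state = scored[0]
        # Store the SECOND-best f with the state we are leaving, so that
        # returning here later charges the cost of the best alternative.
        table[current] = scored[1][0] if len(scored) > 1 else float("inf")
        current = best_state
        path.append(current)
    return path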
Theorems of RTA*
Theorem 1. In a finite problem space with positive edge costs and finite heuristic values, in which a goal state is reachable from every state, RTA* will find a solution.
Theorem 2. Each move made by RTA* on a tree is along a path whose estimated cost of reaching a goal is a minimum, based on the cumulative search frontier at the time.
Learning RTA*
Successively solve multiple problem instances in the same problem space, with the same goal but different start states.
Goal of learning: after the learning phase, the heuristic values should converge to h*.
Information saved from one trial to the next: the hash table of values recorded for previously visited states (h values).
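
The modification itself is not shown in this text, but in Korf's paper LRTA* stores with the previous state the best f value rather than the second best. A minimal sketch of one trial under that assumption, reusing the hypothetical interface from the RTA* sketch:

def lrta_star_trial(start, successors, h, is_goal, table):
    current = start
    while not is_goal(current):
        scored = [(cost + table.get(n, h(n)), n)
                  for n, cost in successors(current)]
        best_f, best_state = min(scored, key=lambda t: t[0])
        table[current] = best_f   # store the BEST f, not the second best
        current = best_state
    return table                  # learned h values persist across trials

With an admissible initial heuristic, this update never overshoots h*, which is why the table can safely be carried from trial to trial.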
Consequences of modification
LRTA* is no longer guaranteed to make locally optimal decisions.
Consider the example:
Consequences of modification (contd.)
LRTA* retains the completeness property of RTA* (a similar proof works).
What do we gain from this modification? Convergence!
Theorem: In a finite problem space with positive edge costs and admissible initial heuristic values, in which a goal state is reachable from every state, over repeated trials of LRTA* the heuristic values will eventually converge to their exact values along every optimal path.
Efficiency of LRTA*
How long does it take to converge?
The answer depends on the structure of the problem space, the distribution of initial and goal states, and the initial heuristic values.
Let us take an example (a small simulation is sketched below):
Problem space: square grid with N units on a side, with all edges having unit cost.
Start state: top left corner.
Goal state: bottom right corner.
Initial heuristic value: 0 for all states.
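
A small self-contained simulation of this exact setup (hypothetical code, not from the slides); printing moves per trial shows the trial lengths settling as the h values converge:

def grid_successors(state, n):
    # 4-connected N x N grid; all edges have unit cost.
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < n and 0 <= ny < n:
            yield (nx, ny), 1

def run_trials(n=8, trials=50):
    table = {}                 # learned h values persist across trials
    goal = (n - 1, n - 1)
    for t in range(trials):
        moves, current = 0, (0, 0)
        while current != goal:
            # LRTA* step: initial h is 0 for every unvisited state.
            scored = [(c + table.get(s, 0), s)
                      for s, c in grid_successors(current, n)]
            best_f, best_state = min(scored)
            table[current] = best_f
            current, moves = best_state, moves + 1
        print(f"trial {t + 1}: {moves} moves")

run_trials()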
THANKS!