Real Time Heuristic Search
Richard E. Korf, UCLA
Siddhartha Dutta
Shivam Garg
Aditya Nambiar
January 29, 2015
Introduction
Heuristic search is a fundamental problem-solving method in AI.
The sequence of steps to reach the goal isn't known a priori.
Systematic trial-and-error exploration of alternatives is required.
The performance of search algorithms is greatly improved by heuristic evaluation functions.
A heuristic evaluator is a function that is inexpensive to compute and estimates the merit of different states relative to the goal.
Heuristic Search
Implemented in two types of domains:
1. Single-agent problems
2. Two-player games
Single-agent example: autonomous navigation in a network of roads or arbitrary terrain.
Heuristic: Euclidean (airline) distance.
Two-player games
Classical algorithm: full-width, fixed-depth minimax search with alpha-beta pruning.
Search tree of chess: about 10^43 positions.
Can't search all the way to checkmate positions.
Limited search horizon, so suboptimal decisions are made (e.g., in chess tournaments moves must be made within certain time limits and can't be revoked).
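
A minimal sketch of this classical algorithm, assuming a hypothetical game interface (moves(state), apply(state, move), evaluate(state), and is_terminal(state) are placeholders, not from the slides):

def alphabeta(state, depth, alpha, beta, maximizing,
              moves, apply, evaluate, is_terminal):
    # Fixed-depth minimax with alpha-beta pruning; evaluate() is the static
    # heuristic applied at the search horizon (higher is better for Max).
    if depth == 0 or is_terminal(state):
        return evaluate(state)
    if maximizing:
        value = float("-inf")
        for m in moves(state):
            value = max(value, alphabeta(apply(state, m), depth - 1,
                                         alpha, beta, False,
                                         moves, apply, evaluate, is_terminal))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # beta cutoff: Min will never allow this line
        return value
    value = float("inf")
    for m in moves(state):
        value = min(value, alphabeta(apply(state, m), depth - 1,
                                     alpha, beta, True,
                                     moves, apply, evaluate, is_terminal))
        beta = min(beta, value)
        if alpha >= beta:
            break  # alpha cutoff: Max already has a better option elsewhere
    return value

A root call for Max at depth d would be alphabeta(root, d, float("-inf"), float("inf"), True, ...).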
Two-player games
Research in single-agent problems: focused on increasing the size of problems for which optimal solutions can be found.
Research in two-player games: focused on making the best possible decisions given the limited amount of computation available between moves.
Alpha Pruning
An obvious question is whether every frontier node must be examined to find one with minimum cost.
If we allow heuristic evaluations of interior nodes, then substantial pruning is possible if the cost function is monotonic (f never decreases along any path away from the root).
Suppose an interior node has an f value that equals or exceeds α, the lowest frontier f value found so far. Since f is monotonic, the f values of the frontier nodes below that node can only be greater than or equal to the f value of that node, and hence cannot be lower than the value of the frontier node responsible for the current value of α.
Algorithm
Maintain in a variable α the lowest f value of any node encountered on the search horizon so far.
For each interior node, compute its f value and terminate search of the corresponding branch when its f value equals or exceeds α.
As each frontier node is generated, compute its f value as well, and if it is less than α, replace α with this value.
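
A minimal sketch of the above inside a depth-limited min-min lookahead, assuming a hypothetical problem interface where successors(state) yields (child, edge_cost) pairs and h(state) is a heuristic making f = g + h monotonic:

def min_min_alpha(state, depth, successors, h):
    # Return the minimum frontier f value within the search horizon.
    alpha = [float("inf")]  # lowest frontier f value seen so far

    def search(node, g, d):
        f = g + h(node)
        # Prune: since f is monotonic, no frontier descendant of this node
        # can have an f value below the current alpha.
        if f >= alpha[0]:
            return
        if d == 0:          # frontier node reached
            alpha[0] = f    # f < alpha here, so this is a new minimum
            return
        for child, cost in successors(node):
            search(child, g + cost, d - 1)

    search(state, 0, depth)
    return alpha[0]

The f >= alpha test is exactly the interior-node termination rule; the d == 0 branch implements the frontier update of α.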
Node Ordering
As in alpha-beta pruning, the efficiency of alpha pruning ought to be improved by node ordering.
The idea is that we may reach a lower alpha value earlier and hence prune more branches.
This can be done by visiting the children of a node in increasing order of their f values.
However, it was experimentally observed that this led to only a 5% reduction in the number of nodes generated, which is much less than the overhead per node required to do the ordering.
Time Limited A*
The idea is to run A* until time runs out, and then make a single move in the direction of the best node on the OPEN list, breaking ties in favor of smaller h values.
The main drawback of this algorithm is the same as that of A*: its memory requirement is exponential in the search depth, and thus in practice it will exhaust the available memory unless the amount of time allotted per move is quite small.
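
A minimal sketch of Time Limited A*, under the same hypothetical successors/h interface; the (f, h, counter) heap keys break f-ties in favor of smaller h, as the slide specifies:

import heapq, itertools, time

def time_limited_astar_move(start, successors, h, seconds):
    deadline = time.monotonic() + seconds
    counter = itertools.count()        # tie-breaker so states are never compared
    parent = {start: None}             # child -> parent, for extracting the move
    g = {start: 0}
    open_list = [(h(start), h(start), next(counter), start)]  # (f, h, id, state)
    while open_list and time.monotonic() < deadline:
        _, _, _, node = heapq.heappop(open_list)
        for child, cost in successors(node):
            new_g = g[node] + cost
            if new_g < g.get(child, float("inf")):
                g[child] = new_g
                parent[child] = node
                heapq.heappush(open_list,
                               (new_g + h(child), h(child), next(counter), child))
    if not open_list:
        return start  # search space exhausted within the time limit
    # Best node remaining on OPEN: lowest f, ties broken on smaller h.
    _, _, _, best = min(open_list)
    # Walk back to the single move adjacent to the start state.
    while parent[best] is not None and parent[best] is not start:
        best = parent[best]
    return best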
Real Time A*
So far we have only addressed the issue of how to make individual move decisions, but not how to make a sequence of move decisions to arrive at a solution.
Repeating the min-min algorithm for each decision won't work, since we may visit a previously visited state and loop forever.
Simply excluding previously visited states won't work either, since all successors of the current state may have already been visited.
Decisions are made on limited information.
Real Time A*
In RTA*, the merit of a node n is f(n) = g(n) + h(n), as in A*. However, g(n) in RTA* is the actual distance of node n from the current state of the problem solver, rather than from the initial state.
RTA*, in other words, is best-first search given this slightly different cost function.
In principle, it could be implemented by storing on an OPEN list the h values of all previously visited states and, every time a move is made, updating the g values of all states on OPEN to accurately reflect their actual distance from the new current state.
Real Time A*
The algorithm maintains in a hash table a list of those nodes that have been visited by an actual move of the problem solver, together with an h value for them.
At each cycle of the algorithm, the current state is expanded, generating its neighbors, and the heuristic function, possibly augmented by lookahead search (this could be any heuristic function), is applied to each neighbor not in the hash table; for those in the hash table, the stored h value is used instead.
Adding the cost of the edge to each neighboring state to its h value gives the f value of each neighbor of the current state.
Real Time A*
The node with the minimum f value is chosen as the new current state, and a move to that state is executed.
At the same time, the previous current state is stored in the hash table, and associated with it is the second best f value.
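
Putting the last three slides together, a minimal sketch of RTA*, again assuming the hypothetical interface successors(state) -> (child, edge_cost) pairs, h(state), and is_goal(state):

def rta_star(start, successors, h, is_goal):
    table = {}                       # state -> stored h value (hash table)
    current, path = start, [start]
    while not is_goal(current):
        # f of each neighbor: edge cost + stored h (if visited) or h(n).
        scored = [(cost + table.get(n, h(n)), n)
                  for n, cost in successors(current)]
        scored.sort(key=lambda t: t[0])
        best_f, best_state = scored[0]
        # Store the SECOND-best f with the state we are leaving, so that
        # returning here later charges the cost of the best alternative.
        table[current] = scored[1][0] if len(scored) > 1 else float("inf")
        current = best_state
        path.append(current)
    return path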
Theorems of RTA*
Theorem 1. In a finite problem space with positive edge costs and finite heuristic values, in which a goal state is reachable from every state, RTA* will find a solution.
Theorem 2. Each move made by RTA* on a tree is along a path whose estimated cost of reaching a goal is a minimum, based on the cumulative search frontier at the time.
Learning RTA*
Successively solve multiple problem instances in the same problem space, with the same goal but different start states.
Goal of learning: after the learning phase, the heuristic values should converge to h*.
Information saved from one trial to the next: the hash table of values recorded for previously visited states (h values).
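
The modification itself is not shown in this text, but in Korf's paper LRTA* stores with the previous state the best f value rather than the second best. A minimal sketch of one trial under that assumption, reusing the hypothetical interface from the RTA* sketch:

def lrta_star_trial(start, successors, h, is_goal, table):
    current = start
    while not is_goal(current):
        scored = [(cost + table.get(n, h(n)), n)
                  for n, cost in successors(current)]
        best_f, best_state = min(scored, key=lambda t: t[0])
        table[current] = best_f   # store the BEST f, not the second best
        current = best_state
    return table                  # learned h values persist across trials

With an admissible initial heuristic, this update never overshoots h*, which is why the table can safely be carried from trial to trial.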
Consequences of modification
LRTA* is no longer guaranteed to make locally optimal decisions.
Consider the example:
Consequences of modification (contd.)
LRTA* retains the completeness property of RTA* (a similar proof works).
What do we gain from this modification? Convergence!
Theorem: In a finite problem space with positive edge costs and admissible initial heuristic values, in which a goal state is reachable from every state, over repeated trials of LRTA* the heuristic values will eventually converge to their exact values along every optimal path.
Efficiency of LRTA*
How long does it take to converge?
The answer depends on the structure of the problem space, the distribution of initial and goal states, and the initial heuristic values.
Let us take an example (a small simulation is sketched below):
Problem space: square grid with N units on a side, with all edges having unit cost.
Start state: top left corner.
Goal state: bottom right corner.
Initial heuristic value: 0 for all states.
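
A small self-contained simulation of this exact setup (hypothetical code, not from the slides); printing moves per trial shows the trial lengths settling as the h values converge:

def grid_successors(state, n):
    # 4-connected N x N grid; all edges have unit cost.
    x, y = state
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < n and 0 <= ny < n:
            yield (nx, ny), 1

def run_trials(n=8, trials=50):
    table = {}                 # learned h values persist across trials
    goal = (n - 1, n - 1)
    for t in range(trials):
        moves, current = 0, (0, 0)
        while current != goal:
            # LRTA* step: initial h is 0 for every unvisited state.
            scored = [(c + table.get(s, 0), s)
                      for s, c in grid_successors(current, n)]
            best_f, best_state = min(scored)
            table[current] = best_f
            current, moves = best_state, moves + 1
        print(f"trial {t + 1}: {moves} moves")

run_trials()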
THANKS!