Markov Chain–Based Stochastic Strategies for Robotic Surveillance
Xiaoming Duan and Francesco Bullo
Department of Mechanical Engineering and Center for Control, Dynamical Systems, and
Computation, University of California, Santa Barbara, California 93106-5070, USA;
email: xiaomingduan@ucsb.edu, bullo@ucsb.edu
1. INTRODUCTION
This article presents a stochastic approach for the design of robotic surveillance algorithms. We
adopt Markov chains as the main algorithmic building block and discuss various properties rel-
evant in surveillance applications. This approach leads to easily implementable and intrinsically
unpredictable surveillance algorithms in which the robots’ visit frequency at different locations
can be conveniently specified in a probabilistic fashion. In particular, the unpredictability of a ran-
dom walk over a graph is a highly desirable characteristic for surveillance strategies in adversarial
settings, because it makes it difficult for potential intruders to take advantage of the robots’ motion
patterns.
In persistent surveillance tasks, mobile robots patrol and visit sites of interest in the environ-
ment to collect sensing data, detect anomalies, or intercept intrusions. Persistent surveillance is
a key component in numerous applications, such as environmental monitoring (1, 2), infrastruc-
ture inspection (3, 4), and urban security (5). In contrast to traditional coverage or search-and-
exploration problems, surveillance tasks require robots to repeatedly visit different locations and
achieve continuous monitoring. Surveillance strategies specify in which order and how often the
robots should visit different locations so that desirable performance is maintained. In this survey,
we adopt a model where the physical environment is discretized and modeled by a connected
graph: Each node of the graph represents a location of interest in the environment, and the edges
in the graph represent the paths between locations. Numerous studies have examined scenarios
where mobile robots move in continuous space (e.g., 6–9) or considered (stochastic) dynamics at
the locations of interest (e.g., 10–13).
1.2.1. Metric-based design. Commonly used design metrics for stochastic strategies in robotic
surveillance include visit frequency, speed, and unpredictability. Cannata & Sgorbissa (23) pro-
posed a Markov chain–based algorithm for a team of robots to achieve continuous coverage of
an environment with a desirable visit frequency. Smart nodes deployed at different locations are
responsible for recording the visits by robots and directing robots to the next site to be visited. To
obtain fast strategies for the surveillance agent, Patel et al. (24) studied Markov chains with the
minimum weighted mean hitting time. Here, travel times between different locations are given as
edge weights on the graph. The minimization of the weighted mean hitting time is transcribed into
a convex optimization problem for the special case where candidate Markov chains are assumed
to be reversible. The problem also has a semidefinite reformulation and can be solved efficiently.
Later, Patel et al. (25) proposed and studied a similar notion of mean hitting time for a multirobot
system.
1.2.2. Intruder model–based design. In contrast to the metric-based designs, which do not
assume any explicit intruder behaviors, intruder model–based designs leverage available informa-
tion on the intruder to achieve improved or guaranteed performance. Agmon et al. (30) proposed
a multirobot perimeter patrol problem in which a randomized strategy is used to defend against a
strategic adversary. The adversary knows both the surveillance strategy and the robots’ locations
on the perimeter, and it aims to maximize its probability of success by attacking the weakest spot
along the perimeter. Sless Lin et al. (31) considered coordinated sequential attacks in the mul-
tirobot perimeter patrol problem. Basilico et al. (32) defined a patrolling security game where a
Markovian surveillance robot tries to capture an intruder that has perfect knowledge of the loca-
tion and the strategy of the surveillance agent. The intruder has the freedom to choose where and
when to attack in order to minimize the probability of being captured. Asghar & Smith (33) dis-
cussed various intruder models that characterize different manners in which the intruders attack
the locations. They designed pattern search algorithms to solve for a Markov chain that minimizes
the expected reward for the intruders. Variations of intruder models where intruders have limited
observation time and varying attack duration were studied by Asghar & Smith (34) and Yang et al.
(35), respectively.
Many of the above-mentioned models can be formalized as Stackelberg games. The Stackel-
berg game framework has been successfully applied in practical security domain applications (36).
However, only limited progress has been made on efficient computational methods with optimal-
ity guarantees for surveillance problems with graph constraints.
In this article, we focus on the metric-based design of stochastic surveillance strategies. The
concepts discussed here also appear in other settings in the literature.
1.3. Organization
First, we review the basics of Markov chains and relevant design metrics in Section 2. We then
discuss fast and unpredictable surveillance strategies in various scenarios in Sections 3 and 4, re-
spectively. We conclude the article in Section 5.
1.4. Notation
Let $\mathbb{R}$, $\mathbb{R}^n$, and $\mathbb{R}^{m\times n}$ be the set of real numbers, real vectors of dimension $n$, and real $m\times n$ matrices, respectively. The sets of elementwise positive vectors and elementwise nonnegative matrices are denoted by $\mathbb{R}^n_{>0}$ and $\mathbb{R}^{m\times n}_{\geq 0}$. We denote the vector of 1s in $\mathbb{R}^n$ by $\mathbf{1}_n$ and the $i$th standard unit vector by $e_i$, whose dimension will be clear from the context. For a vector $v\in\mathbb{R}^n$, $\operatorname{diag}(v)\in\mathbb{R}^{n\times n}$ is the diagonal matrix with diagonal elements given by $v$; for a matrix $S\in\mathbb{R}^{n\times n}$, $\operatorname{diag}(S)\in\mathbb{R}^{n\times n}$ is the diagonal matrix whose diagonal elements are the same as those of $S$. For a matrix $S\in\mathbb{R}^{m\times n}$, the vectorization $\operatorname{vec}(S)\in\mathbb{R}^{mn}$ is constructed by stacking the columns of $S$ on top of one another. The indicator function is denoted by $\mathbf{1}_{\{\cdot\}}$.
For an irreducible Markov chain with state sequence $\{X_k\}_{k\geq 0}$ and stationary distribution $\pi$, the ergodic theorem gives
$$\frac{1}{t+1}\sum_{k=0}^{t}\mathbf{1}_{\{X_k=i\}} \to \pi_i \quad \text{almost surely, as } t\to\infty. \tag{1}$$
The physical meaning of Equation 1 is that, for an irreducible Markov chain, the stationary distribution encodes the visit frequency to different states in the long run. A Markov chain is reversible if $\pi_i p_{ij} = \pi_j p_{ji}$ for all $i,j\in\{1,\ldots,n\}$. One can verify that if a positive vector $\pi\in\mathbb{R}^n_{>0}$ satisfies the reversibility condition for a transition matrix $P$, then it must be a stationary distribution of $P$. We refer readers to References 38 and 39 for more about Markov chains. In this article, we focus on discrete-time homogeneous irreducible Markov chains with finite states.
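As a concrete illustration of these definitions, the following minimal MATLAB sketch (our own, with an arbitrary example chain, not from the companion software) computes the stationary distribution of a transition matrix, checks the reversibility condition, and verifies the visit-frequency interpretation of Equation 1 by simulation:

```matlab
% Minimal sketch: stationary distribution, reversibility check, and
% empirical visit frequencies (Equation 1) for an assumed example chain.
P = [0.5  0.5  0;
     0.25 0.5  0.25;
     0    0.5  0.5];                  % an irreducible chain on 3 states

[V, D] = eig(P');                     % left eigenvectors of P
[~, k] = max(real(diag(D)));          % eigenvalue 1 (Perron root)
piv    = real(V(:, k)) / sum(real(V(:, k)));

% Reversibility: pi_i * p_ij == pi_j * p_ji for all i, j?
rev = norm(diag(piv)*P - P'*diag(piv), 'fro') < 1e-10;

% Simulate the chain and compare empirical visit frequencies with piv.
T = 1e5;  x = 1;  visits = zeros(3, 1);
for t = 1:T
    visits(x) = visits(x) + 1;
    x = find(rand < cumsum(P(x, :)), 1);   % sample the next state
end
disp([visits / T, piv]);              % the two columns should nearly match
```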
In robotic surveillance applications, the environment is sometimes modeled by a weighted directed graph $G=(V,E,W)$, where the elements of the weight matrix $W=[w_{ij}]$ represent the travel distances or times between locations. A Markov chain defined over a weighted directed graph $G=(V,E,W)$ has a transition diagram that is consistent with $G$, i.e., for all $i,j\in\{1,\ldots,n\}$, the transition probability $p_{ij}$ is allowed to be positive if $(i,j)\in E$ and is equal to zero otherwise. Moreover, in this case, the weights specify the amount of time it takes for the Markov chain to transition from one state to another.
The hitting times measure how fast a Markov chain transitions from one state to another. If we
require the Markov chains to be as fast as possible, then the speed can be conveniently quantified
by hitting times and their expectations.
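It is a standard fact that the mean hitting times $m_{ij}$ satisfy the recursion $m_{ij} = 1 + \sum_{k\neq j} p_{ik} m_{kj}$, which in matrix form reads $M = \mathbf{1}_n\mathbf{1}_n^\top + P(M - \operatorname{diag}(M))$, the single-robot analog of the linear systems that reappear in Equations 14 and 17 below. A minimal sketch (our own) that solves this system follows:

```matlab
% Mean hitting times via M = 1*1' + P*(M - diag(M)), solved as a linear
% system in vec(M). Sketch assuming P is irreducible and row stochastic.
n    = size(P, 1);
Dsel = diag(reshape(eye(n), n^2, 1));        % selects diagonal entries of M
A    = eye(n^2) - kron(eye(n), P) * (eye(n^2) - Dsel);
M    = reshape(A \ ones(n^2, 1), n, n);      % m_ij = mean hitting time i -> j
% Sanity check: the diagonal holds the mean return times 1/pi_i.
```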
To quantify the randomness in Markovian strategies, two different notions of entropy have been
used in the literature.
2.3.1. Entropy rate. A classic notion that measures the randomness of a Markov chain is the
entropy rate. For a Markov chain with the initial distribution being its stationary distribution, its
entropy rate is defined by (40, theorem 4.2.4)
$$H_{\text{rate}}(P) = -\sum_{i=1}^{n}\sum_{j=1}^{n} \pi_i p_{ij} \log p_{ij}. \tag{4}$$
The entropy rate measures the randomness in the sequence of states visited by a Markov chain
(26, 27).
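Equation 4 is a one-liner to evaluate numerically; the sketch below (our own, assuming the natural-logarithm convention) guards the $0\cdot\log 0$ terms:

```matlab
% Entropy rate of Equation 4; piv is the stationary distribution of P.
% Sketch assuming the natural-logarithm convention.
L         = P .* log(P);
L(P == 0) = 0;                         % convention: 0 * log 0 := 0
Hrate     = -piv' * sum(L, 2);
```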
2.3.2. Return-time entropy. The entropy of Markov trajectories was first proposed and studied
by Ekroot & Cover (41); in their work, a trajectory Ti j from a state i to a state j is a path with initial
state i, final state j, and no other states being j. They showed that the entropy of the random return
trajectory H(Tii ) is equal to πi Hrate (P) for all i ࢠ {1, . . . , n}. In robotic surveillance applications,
we are more concerned with the amount of time it takes for the surveillance agent to come back
than with the exact trajectory followed by the surveillance agent. This is because intruders often
may have access to the local observations of the intervisit times to a location, and they could attack
accordingly based on this information. We therefore define the return-time entropy of a Markov
chain as a weighted sum of the entropy of return times to locations, with the weights being the
stationary distribution, i.e.,
$$H_{\text{ret-time}}(P) = \sum_{i=1}^{n} \pi_i H(T_{ii}), \tag{5}$$
where Tii is the return time (a discrete random variable) defined by Equation 2. We will see how
this definition can be generalized to the cases when there are travel times on the surveillance
graph.
Figure 1
Two running examples: (a) a 3 × 3 grid graph with self-loops at all locations and (b) a region map of San Francisco [created with uMap (42)] modeled by a fully connected graph where the number of crimes during a time period is recorded at 12 locations.
We now introduce two running examples that will be used in later sections: a 3 × 3 grid graph and a region map of San Francisco (18, section 6.2), as shown in Figure 1. In the grid graph, we assume unit travel times (unweighted graph), and the surveillance agent is expected to visit all locations with the same frequency, i.e., $\pi = \frac{1}{9}\mathbf{1}_9$. In the map, there are 12 locations forming a fully connected weighted graph. The weights on the edges, as documented in Figure 2, are the quantized by-car travel times (in minutes) between locations. Moreover, each location is associated with a number that indicates the number of crimes recorded at that location during a certain period of time. The surveillance agent is expected to have a visit frequency to different locations proportional to the crime rates, i.e.,
$$\pi = \tfrac{1}{866}\,(133,\, 90,\, 89,\, 87,\, 83,\, 83,\, 74,\, 64,\, 48,\, 43,\, 38,\, 34)^\top.$$
Location A B C D E F G H I J K L
A 1 3 3 5 4 6 3 5 7 4 6 6
B 3 1 5 4 2 4 4 5 5 3 5 5
C 3 5 1 7 6 8 3 4 9 4 8 7
D 6 4 7 1 5 6 4 7 5 6 6 7
E 4 3 6 5 1 3 5 5 6 3 4 4
F 6 4 8 5 3 1 6 7 3 6 2 3
G 2 5 3 5 6 7 1 5 7 5 7 8
H 3 5 2 7 6 7 3 1 9 3 7 5
I 8 6 9 4 6 4 6 9 1 8 5 7
J 4 3 4 6 3 5 5 3 7 1 5 3
K 6 4 8 6 4 2 6 6 4 5 1 3
L 6 4 6 6 3 3 6 4 5 3 2 1
Figure 2
Quantized pairwise by-car travel times (in minutes).
Due to its independence of the initial state $i$, the mean hitting time $M(P)$ is also known as Kemeny's constant, and it has been extensively studied in the literature (44, 45). We can also write the mean hitting time as
$$M(P) = \sum_{i=1}^{n}\pi_i\sum_{j=1}^{n}\pi_j m_{ij} = \sum_{i=1}^{n}\sum_{j=1}^{n} \pi_i \pi_j m_{ij},$$
which clearly shows that $M(P)$ measures the expected time it takes for the surveillance agent to transition between two locations randomly selected according to the stationary distribution. The mean hitting time directly relates to the eigenvalues of the transition matrix as follows (equation 9 in Reference 44): Let $\lambda_1 = 1$ and $|\lambda_i| \leq 1$ for $i\in\{2,\ldots,n\}$ be the eigenvalues of an irreducible transition matrix $P$; then,
$$M(P) = 1 + \sum_{j=2}^{n} \frac{1}{1-\lambda_j}. \tag{8}$$
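Equation 8 gives an immediate numerical recipe. The sketch below (our own) evaluates Kemeny's constant from the spectrum of $P$ and, as a cross-check, via the fundamental-matrix identity $M(P) = \operatorname{Tr}\big((I_n - P + \mathbf{1}_n\pi^\top)^{-1}\big)$, which matches Equation 8 because the eigenvalues of $I_n - P + \mathbf{1}_n\pi^\top$ are $1$ and $1-\lambda_j$ for $j\geq 2$:

```matlab
% Kemeny's constant via Equation 8; sketch assuming P is irreducible with
% stationary distribution piv.
lam    = eig(P);
[~, k] = min(abs(lam - 1));              % remove the Perron eigenvalue 1
lam(k) = [];
MP     = real(1 + sum(1 ./ (1 - lam)));  % imaginary parts cancel in pairs

% Equivalent evaluation through the fundamental matrix:
n   = size(P, 1);
MP2 = real(trace(inv(eye(n) - P + ones(n, 1) * piv')));
```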
In robotic surveillance applications, if we require the surveillance robot to visit different lo-
cations as fast as possible, then we could adopt a surveillance strategy that minimizes the mean
hitting time, which we obtain by solving the following optimization problem.
Problem 1 (mean-hitting-time minimization). Given a strongly connected directed graph $G=(V,E)$ and a stationary distribution $\pi$, find a Markov chain with the stationary distribution $\pi$ that has the minimum mean hitting time, i.e., solve the following optimization problem:
$$\begin{aligned}
\underset{P\in\mathbb{R}^{n\times n}}{\text{minimize}} \quad & M(P)\\
\text{subject to} \quad & P\mathbf{1}_n = \mathbf{1}_n,\\
& \pi^\top P = \pi^\top,\\
& p_{ij} \geq 0 \ \text{ if } (i,j)\in E,\\
& p_{ij} = 0 \ \text{ if } (i,j)\notin E.
\end{aligned}$$

Figure 3
Optimal strategy with minimum mean hitting time $M(P) = 6.78$, corresponding to Problem 1 in a 3 × 3 grid, showing (a) a pixel image of $P$ and (b) a graphical image of $P$.
Problem 1 is a difficult nonconvex optimization problem, and the optimal solution is available
only in special cases (45). As an illustration, we solve Problem 1 for the grid graph using the
fmincon function in MATLAB. Since the problem is nonconvex, we start from 100 different initial
conditions and report the best solution found. A pixel image of the obtained solution
is shown in Figure 3, where each row has the transition probabilities from the corresponding
location to other locations and the darkness of a pixel is proportional to the magnitude of the
transition probability. One prominent feature of the optimal strategy is sparsity. Intuitively, it can
be thought of approximately as a Hamiltonian tour that also satisfies the visit frequency.
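For readers who want to reproduce this experiment, a hedged MATLAB sketch of one possible fmincon setup is given below; it is our own reconstruction, not the authors' script, and the multistart loop simply keeps the best local minimizer found:

```matlab
% Hedged multistart fmincon sketch for Problem 1. A is the 0/1 adjacency
% matrix (self-loops included) and piv the target stationary distribution;
% both are assumed given.
n   = size(A, 1);
kem = @(P) real(trace(inv(eye(n) - P + ones(n, 1) * piv')));  % M(P), Eq. 8
obj = @(p) kem(reshape(p, n, n));
Aeq = [kron(ones(1, n), eye(n));          % row sums:      P*1    = 1
       kron(eye(n), piv')];               % stationarity:  piv'*P = piv'
beq = [ones(n, 1); piv];
lb  = zeros(n^2, 1);
ub  = double(A(:) > 0);                   % forces p_ij = 0 off the edge set
opts = optimoptions('fmincon', 'Display', 'off');
best = inf;
for trial = 1:100                         % multistart over random row-stochastic points
    P0 = rand(n) .* reshape(ub, n, n);
    P0 = P0 ./ sum(P0, 2);                % row-normalize the initial guess
    [p, f] = fmincon(obj, P0(:), [], [], Aeq, beq, lb, ub, [], opts);
    if f < best, best = f; Pbest = reshape(p, n, n); end
end
```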
Problem 1 becomes tractable if we restrict the set of candidate Markov chains to be reversible,
in which case the mean hitting time is a convex function of the transition matrix in the domain.
In fact, by using Equation 8, we can write the mean hitting time as (24, theorem 2)
$$M(P) = \operatorname{Tr}\Big(\big(I_n - \operatorname{diag}(\pi)^{\frac{1}{2}}\, P\, \operatorname{diag}(\pi)^{-\frac{1}{2}} + \sqrt{\pi}\sqrt{\pi}^\top\big)^{-1}\Big),$$
where Tr(·) is the trace of a matrix and the square root is elementwise. Based on this, we formulate
a convex optimization problem and solve for the Markov strategy with the minimum mean hitting
time.
Problem 2 (mean-hitting-time minimization for reversible Markov chains). Given a strongly connected directed graph $G=(V,E)$ and a stationary distribution $\pi$, find a reversible Markov chain with the stationary distribution $\pi$ that has the minimum mean hitting time, i.e., solve the following optimization problem:
$$\begin{aligned}
\underset{P\in\mathbb{R}^{n\times n}}{\text{minimize}} \quad & \operatorname{Tr}\Big(\big(I_n - \operatorname{diag}(\pi)^{\frac{1}{2}} P \operatorname{diag}(\pi)^{-\frac{1}{2}} + \sqrt{\pi}\sqrt{\pi}^\top\big)^{-1}\Big)\\
\text{subject to} \quad & P\mathbf{1}_n = \mathbf{1}_n,\\
& \pi_i p_{ij} = \pi_j p_{ji} \ \text{ for all } (i,j)\in E,\\
& p_{ij} \geq 0 \ \text{ if } (i,j)\in E,\\
& p_{ij} = 0 \ \text{ if } (i,j)\notin E.
\end{aligned}$$

Figure 4
Optimal reversible strategy with minimum mean hitting time $M(P) = 12.43$, corresponding to Problem 2 in a 3 × 3 grid, showing (a) a pixel image of $P$ and (b) a graphical image of $P$.
The convexity of the objective function in Problem 2 follows from the fact that the trace of
the inverse of a positive definite matrix is convex (46). Moreover, Problem 2 has a semidefinite
reformulation and can be solved efficiently (24, problem 2). Note that in both Problem 1 and
Problem 2, although we do not impose combinatorial irreducibility constraints on the Markov
chain, the resulting solution must be irreducible as long as the constraint set is feasible and the
given graph is strongly connected. This is because a reducible solution leads to infinite mean
hitting time. In Reference 47, the Markov chain with minimum mean hitting time is applied
to the quickest detection of environmental anomalies. As a comparison, we solved Problem 2
using CVX (48) and plotted the optimal solution in Figure 4. The optimal strategy given in
Figure 4 is denser than that given in Figure 3. Moreover, Kemeny’s constant is almost doubled
in the reversible case.
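A minimal sketch of one way to pose Problem 2 in CVX is given below; it is our own posing, assuming CVX's trace_inv atom, and it symmetrizes the argument, which is legitimate because reversibility makes the similarity transform of $P$ symmetric:

```matlab
% Hedged CVX sketch for Problem 2. A is the adjacency matrix (with
% self-loops) and piv the target stationary distribution; assumed given.
n = size(A, 1);
d = sqrt(piv);
cvx_begin
    variable P(n, n) nonnegative
    S = diag(d) * P * diag(1 ./ d);          % symmetric when P is reversible
    minimize( trace_inv(eye(n) - (S + S')/2 + d * d') )
    subject to
        P * ones(n, 1) == ones(n, 1);        % row stochastic
        diag(piv) * P == P' * diag(piv);     % reversibility
        P(A == 0) == 0;                      % respect the graph
cvx_end
```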
In the weighted setting, the expected weighted hitting times $m^w_{ij}$ satisfy the recursion
$$m^w_{ij} = p_{ij}w_{ij} + \sum_{k\neq j} p_{ik}\big(w_{ik} + m^w_{kj}\big) = \sum_{k=1}^{n} p_{ik}w_{ik} + \sum_{k=1}^{n} p_{ik}m^w_{kj} - p_{ij}m^w_{jj}. \tag{9}$$
It can be shown that the following relationship between the weighted mean hitting time and the mean hitting time holds true (24, theorem 8):
$$M_w(P) = \big(\pi^\top (P \circ W)\mathbf{1}_n\big) \cdot M(P),$$
where the impact of the travel times is factored out. To efficiently obtain a fast Markov chain over a
weighted graph, we still restrict the set of candidate Markov chains to be reversible and formulate
the following optimization problem.
Problem 3 (weighted mean-hitting-time minimization for reversible Markov chains). Given a strongly connected weighted directed graph $G=(V,E,W)$ and a stationary distribution $\pi$, find a reversible Markov chain with the stationary distribution $\pi$ that has the minimum weighted mean hitting time, i.e., solve the following optimization problem:
$$\begin{aligned}
\underset{P\in\mathbb{R}^{n\times n}}{\text{minimize}} \quad & \big(\pi^\top (P\circ W)\mathbf{1}_n\big)\cdot \operatorname{Tr}\Big(\big(I_n - \operatorname{diag}(\pi)^{\frac{1}{2}} P \operatorname{diag}(\pi)^{-\frac{1}{2}} + \sqrt{\pi}\sqrt{\pi}^\top\big)^{-1}\Big)\\
\text{subject to} \quad & P\mathbf{1}_n = \mathbf{1}_n,\\
& \pi_i p_{ij} = \pi_j p_{ji} \ \text{ for all } (i,j)\in E,\\
& p_{ij} \geq 0 \ \text{ if } (i,j)\in E,\\
& p_{ij} = 0 \ \text{ if } (i,j)\notin E.
\end{aligned}$$
It turns out that, by a change of variables, Problem 3 can also be recast as an equivalent semidefinite program and thus solved efficiently. The solutions to Problem 3 and its nonreversible
version for the San Francisco map data set are illustrated in Figure 5. Similar to the unweighted
cases discussed in Section 3.1, the optimal solution to Problem 3 is denser than its nonreversible
counterpart and has a higher (worse) optimal value.
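As a quick numerical companion to the factorization above, the weighted mean hitting time of any feasible chain can be evaluated directly; a small sketch (our own, reusing the eigenvalue formula in Equation 8) follows:

```matlab
% Weighted mean hitting time via M_w(P) = (piv'*(P o W)*1) * M(P).
% Sketch assuming P has stationary distribution piv and W holds the
% integer travel times.
n   = size(P, 1);
kem = real(trace(inv(eye(n) - P + ones(n, 1) * piv')));   % M(P)
Mw  = (piv' * (P .* W) * ones(n, 1)) * kem;               % M_w(P)
```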
It is worth mentioning that the computational tractability of Problems 2 and 3 relies critically
on the reversibility condition on the set of candidate Markov chains. However, reversible Markov
chains are known to be slow (49), which is also illustrated by our numerical examples. Therefore,
how to efficiently solve Problem 1 and its weighted version with optimality guarantees remains
an open and interesting problem.
Figure 5
Optimal strategies with minimum mean hitting time in the San Francisco map, showing (a) a nonreversible solution with $M_w(P) = 22.19$ and (b) a reversible solution with $M_w(P) = 44.77$, corresponding to Problem 3.
Consider now a team of $N$ robots, where the patrolling graphs have a common set of nodes but possibly different sets of edges. The hitting time for this team of robots from their current locations $i_1,\ldots,i_N$ to a location $j$ is defined as the first time at least one of the robots arrives at location $j$, i.e.,
$$T_{i_1\ldots i_N,j} = \min\{k \geq 1 \mid X^h_k = j \text{ for at least one } h\in\{1,\ldots,N\},\ X^h_0 = i_h\},$$
where $X^h_k$ is the location of robot $h$ at time $k$. Let $m_{i_1\ldots i_N,j}$ be the expectation of $T_{i_1\ldots i_N,j}$. Then, similar to the single-robot case before, we have that
$$\begin{aligned}
m_{i_1\ldots i_N,j} &= \Big(1 - \prod_{h=1}^{N}\big(1 - p^h_{i_h j}\big)\Big) + \sum_{k_1\neq j}\cdots\sum_{k_N\neq j} p^1_{i_1 k_1}\cdots p^N_{i_N k_N}\big(1 + m_{k_1\ldots k_N,j}\big)\\
&= 1 + \sum_{k_1\neq j}\cdots\sum_{k_N\neq j} p^1_{i_1 k_1}\cdots p^N_{i_N k_N}\, m_{k_1\ldots k_N,j}.
\end{aligned} \tag{13}$$
To write Equation 13 in matrix form, we collect the $m_{i_1\ldots i_N,j}$ into a matrix $M \in \mathbb{R}^{n^N\times n}$, where each row of $M$ contains the mean hitting times from a specific robot team configuration (a set of locations currently occupied by the team of robots) to different locations. The rows of $M$ are arranged by cycling through the possible locations of the robots. Concretely, the mean-hitting-time matrix $M$ has the following structure:
$$M = \begin{bmatrix}
m_{1\cdots 11,1} & m_{1\cdots 11,2} & \cdots & m_{1\cdots 11,n}\\
m_{1\cdots 12,1} & m_{1\cdots 12,2} & \cdots & m_{1\cdots 12,n}\\
\vdots & \vdots & \ddots & \vdots\\
m_{1\cdots 1n,1} & m_{1\cdots 1n,2} & \cdots & m_{1\cdots 1n,n}\\
m_{1\cdots 21,1} & m_{1\cdots 21,2} & \cdots & m_{1\cdots 21,n}\\
\vdots & \vdots & \ddots & \vdots\\
m_{n\cdots nn,1} & m_{n\cdots nn,2} & \cdots & m_{n\cdots nn,n}
\end{bmatrix}.$$
In this notation, Equation 13 reads
$$M = \mathbf{1}_{n^N}\mathbf{1}_n^\top + \big(P^1 \otimes \cdots \otimes P^N\big)(M - M_d), \tag{14}$$
where $\otimes$ is the Kronecker product and $M_d \in \mathbb{R}^{n^N\times n}$ has nonzero elements $m_{i_1\ldots i_N,j}$ only at locations $(i_1\ldots i_N, j)$ that satisfy $i_h = j$ for at least one $h\in\{1,\ldots,N\}$. To calculate the mean hitting times, we vectorize both sides of Equation 14 and solve the resulting system of linear equations.
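For a team of $N = 2$ robots, the vectorized version of Equation 14 can be assembled and solved directly; the sketch below (our own) enumerates the configurations in the row order described above:

```matlab
% Hedged sketch: team mean hitting times from Equation 14 for N = 2 robots
% with transition matrices P1 and P2 (assumed given).
n   = size(P1, 1);
PP  = kron(P1, P2);                     % configs (i1,i2), i2 cycling fastest
i1  = kron((1:n)', ones(n, 1));         % first robot's location per row
i2  = repmat((1:n)', n, 1);             % second robot's location per row
sel = zeros(n^2, n);                    % sel(c,j) = 1 iff j is in config c
for j = 1:n
    sel(:, j) = (i1 == j) | (i2 == j);
end
Dsel = diag(sel(:));                    % acts on vec(M), column-major
A14  = eye(n^3) - kron(eye(n), PP) * (eye(n^3) - Dsel);
M    = reshape(A14 \ ones(n^3, 1), n^2, n);  % row = config, column = target
```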
Once again, collecting the mean meeting times $m_{ij}$ (for the surveillance agent at location $i$ and the evader at location $j$) into a matrix $M\in\mathbb{R}^{n\times n}$, we obtain
$$M = \mathbf{1}_n\mathbf{1}_n^\top + P^p\big(M - \operatorname{diag}(M)\big)\big(P^e\big)^\top. \tag{17}$$
Unlike previous cases, where the irreducibility of individual strategies is sufficient to ensure the
finiteness of the mean hitting times, the mean meeting time may be infinite even if both $P^p$ and $P^e$ are irreducible. In fact, for two Markovian agents to meet in finite time from any initial positions,
there must exist paths of equal length that lead to a common position for the agents from their
respective initial positions; the simple example shown in Figure 6 provides an illustration. The
finiteness of meeting times can be studied systematically by looking at the Kronecker product of
graphs (50); for more details, we refer interested readers to Reference 43.
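Numerically, Equation 17 is again a linear system in $\operatorname{vec}(M)$; the sketch below (our own) solves it for given pursuer and evader chains, and a singular system signals the infinite meeting times discussed above:

```matlab
% Mean meeting times from Equation 17, solved as a linear system in vec(M).
% Sketch assuming Pp (pursuer) and Pe (evader) are given transition matrices.
n    = size(Pp, 1);
Dsel = diag(reshape(eye(n), n^2, 1));     % selects diagonal entries of M
A17  = eye(n^2) - kron(Pe, Pp) * (eye(n^2) - Dsel);
M    = reshape(A17 \ ones(n^2, 1), n, n);
% If meeting times are infinite (compare Figure 6), A17 is singular and the
% solve fails or returns meaningless values.
```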
With the mean-meeting-time matrix $M$ and the stationary distributions $\pi_p$ and $\pi_e$ for the surveillance agent and the evader, we can define the expected meeting time as
$$\pi_p^\top M \pi_e. \tag{18}$$
Figure 6
A simple example of (a) finite and (b) infinite meeting times for a pursuer and evader. In panel a, the pursuer
and evader have finite meeting times, because starting from any positions for the agents, there exist paths to
the common locations (1, 1) and (2, 2). In panel b, the pursuer and evader have infinite meeting times,
because there exist no paths to common locations from locations (1, 2) and (2, 1).
Since we do not require the strategies for the surveillance agent and the evader to be irreducible, the stationary distributions $\pi_p$ and $\pi_e$ may not be unique. However, the expected meeting time in Equation 18 can still be defined as long as all entries of $M$ are finite and one set of stationary distributions is specified. Given the evader's strategy $P^e$ and a stationary distribution $\pi_e$, we can solve the following optimization problem to find a surveillance strategy $P^p$ that captures (meets) the evader as fast as possible.
Problem 4 (expected-meeting-time minimization). Given a strongly connected directed graph $G=(V,E)$, an evader's strategy $P^e$ with a stationary distribution $\pi_e$, and a stationary distribution $\pi_p$ for the surveillance agent, find a Markov chain with the stationary distribution $\pi_p$ that has the minimum expected meeting time, i.e., solve the following optimization problem:
$$\begin{aligned}
\underset{P^p\in\mathbb{R}^{n\times n}}{\text{minimize}} \quad & \pi_p^\top M \pi_e\\
\text{subject to} \quad & M = \mathbf{1}_n\mathbf{1}_n^\top + P^p\big(M - \operatorname{diag}(M)\big)\big(P^e\big)^\top,\\
& P^p\mathbf{1}_n = \mathbf{1}_n, \quad \pi_p^\top P^p = \pi_p^\top,\\
& p^p_{ij} \geq 0 \ \text{ if } (i,j)\in E, \quad p^p_{ij} = 0 \ \text{ if } (i,j)\notin E.
\end{aligned}$$
Figure 7
Optimal strategy with minimum expected meeting time M(P) = 9.86 against a randomly walking evader,
corresponding to Problem 4 in a 3 × 3 grid, showing (a) a pixel image of P and (b) a graphical image of P.
We now seek strategies that are maximally unpredictable and thus hard for potential intruders to exploit. We utilize two different notions of entropy to quantify the unpredictability of Markov chains. The optimization of the entropy rate in Section 4.1 is studied in Reference 26, and the treatment of the return-time entropy in Section 4.2 follows Reference 27.
Problem 5 (entropy-rate maximization). Given a strongly connected directed graph $G=(V,E)$ and a stationary distribution $\pi$, find a Markov chain with the stationary distribution $\pi$ that has the maximum entropy rate, i.e., solve the following optimization problem:
$$\begin{aligned}
\underset{P\in\mathbb{R}^{n\times n}}{\text{maximize}} \quad & H_{\text{rate}}(P)\\
\text{subject to} \quad & P\mathbf{1}_n = \mathbf{1}_n,\\
& \pi^\top P = \pi^\top,\\
& p_{ij} \geq 0 \ \text{ if } (i,j)\in E,\\
& p_{ij} = 0 \ \text{ if } (i,j)\notin E.
\end{aligned}$$
Figure 8
Optimal strategy with maximum entropy rate Hrate (P) = 1.27, corresponding to Problem 5 in a 3 × 3 grid,
showing (a) a pixel image of P and (b) a graphical image of P.
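With the stationary distribution fixed, the objective in Equation 4 is a nonnegative weighted sum of concave terms $-p_{ij}\log p_{ij}$, so Problem 5 is a convex program; a minimal CVX sketch (our own posing) follows:

```matlab
% Hedged CVX sketch for Problem 5 (entropy-rate maximization). A is the
% adjacency matrix and piv the stationary distribution; assumed given.
n = size(A, 1);
cvx_begin
    variable P(n, n) nonnegative
    maximize( sum(piv' * entr(P)) )       % Eq. 4: entr(x) = -x.*log(x)
    subject to
        P * ones(n, 1) == ones(n, 1);     % row stochastic
        piv' * P == piv';                 % prescribed stationary distribution
        P(A == 0) == 0;                   % respect the graph
cvx_end
```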
The return times depend not only on the Markov strategy but also on the travel times between locations. In this section, we consider strongly connected directed graphs with integer-valued edge weights. Assuming integer-valued travel times is not restrictive, because we can always quantize and approximate real-valued travel times reasonably well by choosing appropriate quantization units. Moreover, it is much more convenient to characterize the return-time distributions when the travel times are integer valued.
To ease the notation, we introduce the shorthand $F_k \in \mathbb{R}^{n\times n}$, where the $(i,j)$th element of $F_k$ represents the probability that the hitting time $T^w_{ij}$ takes the value $k$, i.e., $F_k(i,j) = \mathbb{P}(T^w_{ij} = k)$. We then have the following recursion:
$$F_k(i,j) = p_{ij}\mathbf{1}_{\{k = w_{ij}\}} + \sum_{h\neq j} p_{ih} F_{k - w_{ih}}(h,j), \tag{19}$$
with the initial conditions $F_k = 0_{n\times n}$ for $k \leq 0$. Equation 19 can also be arranged in a vectorized form as
$$\operatorname{vec}(F_k) = \operatorname{vec}\big(P \circ \mathbf{1}_{\{k\mathbf{1}_n\mathbf{1}_n^\top = W\}}\big) + \sum_{i=1}^{n}\sum_{j=1}^{n} p_{ij}\big(E_j \otimes e_i e_j^\top\big)\operatorname{vec}\big(F_{k-w_{ij}}\big), \tag{20}$$
where $\mathbf{1}_{\{k\mathbf{1}_n\mathbf{1}_n^\top = W\}}$ is a binary matrix whose $(i,j)$th element is 1 if $w_{ij} = k$ and 0 otherwise, and $E_j = \operatorname{diag}(\mathbf{1}_n - e_j)$.
Despite its seemingly complicated form, the hitting-time probabilities in Equation 20 actually
satisfy a linear system with time delays and transition probabilities as inputs. Moreover, the inputs
to this system are directly related to the system matrix, and the hitting-time probabilities are
polynomial functions of the transition probabilities.
Using the notation introduced above, we can explicitly rewrite the return-time entropy of a
Markov chain with travel times as
$$H_{\text{ret-time}}(P) = -\sum_{i=1}^{n}\sum_{k=1}^{\infty} \pi_i F_k(i,i)\log F_k(i,i). \tag{21}$$
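To make the recursion concrete, the sketch below (our own) computes the return-time probabilities $F_k(i,i)$ directly from Equation 19 and evaluates a truncated version of Equation 21 up to an assumed horizon K:

```matlab
% Return-time probabilities from Equation 19 and a truncated evaluation of
% the return-time entropy in Equation 21. Sketch assuming P, integer travel
% times W, stationary distribution piv, and a horizon K are given.
n = size(P, 1);
F = zeros(n, n, K);                        % F(:,:,k) = F_k; F_k = 0 for k <= 0
for k = 1:K
    Fk = P .* (W == k);                    % direct-transition term
    for i = 1:n
        for j = 1:n
            for h = 1:n
                if h ~= j && k - W(i, h) >= 1
                    Fk(i, j) = Fk(i, j) + P(i, h) * F(h, j, k - W(i, h));
                end
            end
        end
    end
    F(:, :, k) = Fk;
end
Hret = 0;                                  % truncated Equation 21
for i = 1:n
    fi = squeeze(F(i, i, :));
    fi = fi(fi > 0);                       % skip 0*log 0 terms
    Hret = Hret - piv(i) * sum(fi .* log(fi));
end
```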
Our goal is to find a Markov chain that has the maximum return-time entropy by solving the
following optimization problem.
Problem 6 (return-time entropy maximization). Given a strongly connected weighted directed graph $G=(V,E,W)$ and a stationary distribution $\pi$, pick a minimum transition probability $\varepsilon > 0$ and find a Markov chain with the stationary distribution $\pi$ that has the maximum return-time entropy, i.e., solve the following optimization problem:
$$\begin{aligned}
\underset{P\in\mathbb{R}^{n\times n}}{\text{maximize}} \quad & H_{\text{ret-time}}(P)\\
\text{subject to} \quad & P\mathbf{1}_n = \mathbf{1}_n,\\
& \pi^\top P = \pi^\top,\\
& p_{ij} \geq \varepsilon \ \text{ if } (i,j)\in E,\\
& p_{ij} = 0 \ \text{ if } (i,j)\notin E.
\end{aligned}$$
We should clarify several technical details in Problem 6. First, the minimum transition prob-
ability ensures the irreducibility of Markov chains; moreover, it also ensures that the constraint
set of Problem 6 is a compact set of irreducible Markov chains. Second, the objective function
Hret-time (P) as given by Equation 21 is an infinite series; fortunately, by analyzing the linear dy-
namical system Equation 20, we can show that the return-time distributions Fk (i, i) decay expo-
nentially fast if P is irreducible. Therefore, the infinite series is summable and well defined. In fact,
the objective function continuously depends on the transition probabilities.
The above clarifications imply that in Problem 6, a continuous function is optimized over a
compact set. Therefore, there exists at least one optimal solution, and Problem 6 is well posed.
We should also point out that, due to the stationary distribution constraint and by Equation 11,
the return times have bounded expected values, and the surveillance robot will come back to each
location within some time on average.
The difficulty of Problem 6 comes partly from the fact that a closed form of $H_{\text{ret-time}}(P)$ as a function of the transition probabilities is not available. However, in the particular case of a complete graph with unit travel times, the return times can be made to follow geometric distributions, and their entropy has a closed form. Moreover, by the maximum entropy principle for discrete random variables (52), the optimal solution is given by $P = \mathbf{1}_n\pi^\top$.
As in Reference 41, it is also possible to establish a connection between the return-time entropy (in unweighted graphs) and the entropy rate as follows:
$$H_{\text{rate}}(P) \leq H_{\text{ret-time}}(P) \leq n H_{\text{rate}}(P), \tag{22}$$
where the inequalities are tight in the sense that they become equalities in certain cases. The relationship in Equation 22 implies that $H_{\text{ret-time}}(P)$ and $H_{\text{rate}}(P)$ are two distinct metrics and are not interchangeable. In fact, they may differ by a factor of $n$, the number of locations in the graph.
Since the closed form of the return-time entropy Hret-time (P) is not available, we truncate the
infinite series and obtain a truncated objective function for computation and optimization pur-
poses. To control the difference between the original and the truncated objective function, for
a given accuracy level η, we use a Markov inequality to select the truncation position Nη such
that the discarded tail return-time probabilities for any location are under η. We then define the
truncated objective function as
$$H^{\text{trunc}}_{\text{ret-time}}(P) = -\sum_{i=1}^{n}\pi_i\sum_{k=1}^{N_\eta} F_k(i,i)\log F_k(i,i).$$
We apply a gradient projection algorithm to compute a locally optimal solution. The solution found for the San Francisco map is plotted in Figure 9, where $N_\eta$ is taken to be 2,292 so that the discarded tail probability $\eta$ is 0.1. Reference 27 provides a performance comparison with other Markov chains in capturing intruders with certain behaviors.

Figure 9
Optimal strategy with maximum return-time entropy $H^{\text{trunc}}_{\text{ret-time}}(P) = 5.00$, corresponding to Problem 6 in the San Francisco map.
5. SUMMARY
In this article, we have presented a stochastic approach to the design of robotic surveillance algorithms over graphs. This approach exhibits numerous promising properties that are relevant in surveillance applications. We formulated optimization problems for various settings, each targeting a particular design metric. These formulations accommodate scenarios where different characteristics of surveillance strategies are desirable. To facilitate the computation of many of the strategies discussed in this article, we have developed software packages available in both MATLAB and Julia (53).
Stochastic robotic surveillance is still an active area of research, and many problems remain
open. First, regarding the modeling, it is becoming increasingly popular to incorporate adversary
models in the problem formulation. By doing so, we may expect improved surveillance perfor-
mance due to the tailored design against the adversarial behaviors. Game theory is a common
tool that will fit well in the picture. Second, there is still a lack of efficient algorithms to com-
pute surveillance strategies with optimality guarantees. Although effective heuristics have been
proposed in different scenarios, progress in advancing customized computational algorithms is
critical. Finally, most of the problems discussed in this article involve a single surveillance robot;
studies on how to exploit robot teams and design cooperative surveillance systems are of tremen-
dous value in both research and practice.
DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that
might be perceived as affecting the objectivity of this review.
ACKNOWLEDGMENTS
This work was supported in part by Air Force Office of Scientific Research award FA9550-15-1-
0138. The authors thank Pushkarini Agharkar, Andrea Carron, Mishel George, Saber Jafarpour,
and Rushabh Patel for fruitful past collaborations on the themes of this article.