
Annual Review of Control, Robotics, and Autonomous Systems

Markov Chain–Based Stochastic Strategies for Robotic Surveillance

Xiaoming Duan and Francesco Bullo

Department of Mechanical Engineering and Center for Control, Dynamical Systems, and Computation, University of California, Santa Barbara, California 93106-5070, USA; email: xiaomingduan@ucsb.edu, bullo@ucsb.edu

Annu. Rev. Control Robot. Auton. Syst. 2021. 4:243–64

First published as a Review in Advance on October 28, 2020

The Annual Review of Control, Robotics, and Autonomous Systems is online at control.annualreviews.org

https://doi.org/10.1146/annurev-control-071520-120123

Copyright © 2021 by Annual Reviews. All rights reserved

Keywords

robotic surveillance, Markov chains, hitting times, entropy, optimization

Abstract

This article surveys recent advancements in strategy designs for persistent robotic surveillance tasks, with a focus on stochastic approaches. The problem describes how mobile robots stochastically patrol a graph in an efficient way, where the efficiency is defined with respect to relevant underlying performance metrics. We start by reviewing the basics of Markov chains, which are the primary motion models for stochastic robotic surveillance. We then discuss the two main criteria regarding the speed and unpredictability of surveillance strategies. The central objects that appear throughout the treatment are the hitting times of Markov chains, their distributions, and their expectations. We formulate various optimization problems based on the relevant metrics in different scenarios and establish their respective properties.
1. INTRODUCTION
This article presents a stochastic approach for the design of robotic surveillance algorithms. We
adopt Markov chains as the main algorithmic building block and discuss various properties rel-
evant in surveillance applications. This approach leads to easily implementable and intrinsically
unpredictable surveillance algorithms in which the robots’ visit frequency at different locations
can be conveniently specified in a probabilistic fashion. In particular, the unpredictability of a ran-
dom walk over a graph is a highly desirable characteristic for surveillance strategies in adversarial
settings, because it makes it difficult for potential intruders to take advantage of the robots’ motion
patterns.
In persistent surveillance tasks, mobile robots patrol and visit sites of interest in the environ-
ment to collect sensing data, detect anomalies, or intercept intrusions. Persistent surveillance is
a key component in numerous applications, such as environmental monitoring (1, 2), infrastruc-
ture inspection (3, 4), and urban security (5). In contrast to traditional coverage or search-and-
exploration problems, surveillance tasks require robots to repeatedly visit different locations and
achieve continuous monitoring. Surveillance strategies specify in which order and how often the
robots should visit different locations so that desirable performance is maintained. In this survey,
we adopt a model where the physical environment is discretized and modeled by a connected
graph: Each node of the graph represents a location of interest in the environment, and the edges
in the graph represent the paths between locations. Numerous studies have examined scenarios
where mobile robots move in continuous space (e.g., 6–9) or considered (stochastic) dynamics at
the locations of interest (e.g., 10–13).

1.1. Deterministic Strategy


In the design of deterministic surveillance strategies, the performance criterion of typical interest
is idleness, also known as refresh time or latency. At any given time instant, the idleness of a
location is the time elapsed since the last time the location was visited by a surveillance robot. The
worst idleness at that time instant is the largest idleness over all locations, and the average
idleness at that time instant is the average idleness over all locations. One then seeks a
strategy that minimizes the maximum worst (or average) idleness over the entire (possibly infinite)
surveillance period (14).
In an early work, Chevaleyre (15) showed that a cyclic strategy where a robot travels with the
maximum speed on a shortest closed route through all locations is optimal in the sense that it
minimizes the maximum worst idleness. The author also proposed a partition-based approach
for the multirobot case where the environment is partitioned into disjoint regions so that each
robot patrols a single region. Elmaliach et al. (16) placed a multirobot system composed of ho-
mogeneous robots on a cyclic route in the environment and achieved a uniform visit frequency
to all locations by carefully designing the displacements between robots. Pasqualetti et al. (17)
studied team strategies with a minimum refresh time for the line, tree, and cyclic graphs over a
finite surveillance period and obtained optimal and constant factor suboptimal strategies. When
locations have different priorities, one can define the natural notion of weighted worst idleness,
whereby idleness is penalized proportionally to the location priority. Alamdari et al. (18) stud-
ied single-robot surveillance strategies that minimize the maximum weighted idleness. The au-
thors designed two approximate algorithms to compute strategies whose costs are within factors
of the optimal cost; these factors depend on the distribution of weights or the dimension of the
graph. Given constraints on the weighted idleness at all locations, Asghar et al. (19) proposed
an approximate algorithm to compute the minimum number of robots required to satisfy the
idleness requirements. Recently, Afshani et al. (20) proposed surveillance plans for a fixed num-
ber of robots that minimize the maximum weighted idleness. As these studies show, optimization
problems concerning the idleness metrics and deterministic surveillance strategies are usually of a
combinatorial nature, and in most cases, approximation algorithms and suboptimal solutions are
sought.
While leaving any location of interest in the environment unattended for an extended period of
time should clearly be avoided, there are two main challenges for deterministic surveillance strate-
gies. First, as pointed out by Srivastava et al. (21), it is not always possible to assign an arbitrary visit
frequency to different locations in the graph (for recent related work, see Reference 22). Second,
deterministic strategies are fully predictable and can be easily exploited by potential intruders in
adversarial settings. In this case, stochastic strategies become particularly appealing.

1.2. Stochastic Strategy


In adversarial settings, potential intruders may be able to observe and learn the strategies of the
surveillance robot and devise intrusion plans accordingly. For example, when a surveillance robot
adopts a deterministic strategy, the intruder can confidently attack a location immediately after
the surveillance robot leaves that location, because it knows for certain that the robot will not
return to that location for a deterministic known duration. In such scenarios, robotic surveillance
problems must be tackled by resorting to randomized approaches.
One popular technique in such approaches is to describe the surveillance strategy as a Markov
chain. There are several advantages to using Markov chains as stochastic strategies—for exam-
ple, the intrinsic stochasticity of Markov chains leads to more effective strategies against rational
intruders, the visit frequency to different locations can be easily assigned through the station-
ary distribution, and Markov chains as routing algorithms are lightweight and require a minimal
amount of resources to implement and execute. Although high-order Markov chains are more
versatile, their state space grows exponentially fast with the order (memory length), which results
in high computational complexity or intractability. Therefore, first-order Markov chains are the
default choices in the design.
Depending on whether an intruder model is explicitly specified or not, there are two different
common formulations of Markov chain–based surveillance problems in the literature. When no
intruder model is assumed, stochastic strategies are designed based on performance metrics, such
as speed and unpredictability. When a specific intruder model is adopted, tailored strategies to the
intruder behaviors are carefully constructed.

1.2.1. Metric-based design. Commonly used design metrics for stochastic strategies in robotic
surveillance include visit frequency, speed, and unpredictability. Cannata & Sgorbissa (23) pro-
posed a Markov chain–based algorithm for a team of robots to achieve continuous coverage of
an environment with a desirable visit frequency. Smart nodes deployed at different locations are
responsible for recording the visits by robots and directing robots to the next site to be visited. To
obtain fast strategies for the surveillance agent, Patel et al. (24) studied Markov chains with the
minimum weighted mean hitting time. Here, travel times between different locations are given as
edge weights on the graph. The minimization of the weighted mean hitting time is transcribed into
a convex optimization problem for the special case where candidate Markov chains are assumed
to be reversible. The problem also has a semidefinite reformulation and can be solved efficiently.
Later, Patel et al. (25) proposed and studied a similar notion of mean hitting time for a multirobot
system.

To obtain maximally unpredictable surveillance strategies, George et al. (26) designed Markov
chains with a maximum entropy rate, establishing numerous properties of the maxentropic Markov
chain and proposing a fast algorithm to compute the optimal strategy. Duan et al. (27) recently in-
troduced and characterized a new notion that quantifies the unpredictability of surveillance strate-
gies through the entropy in return-time distributions. This new concept is particularly relevant
and useful in cases when only local observations are available to potential intruders. Carpin and
colleagues (28, 29) conducted similar studies that proposed inserting random delays into the return
times.

1.2.2. Intruder model–based design. In contrast to the metric-based designs, which do not
assume any explicit intruder behaviors, intruder model–based designs leverage available informa-
tion on the intruder to achieve improved or guaranteed performance. Agmon et al. (30) proposed
a multirobot perimeter patrol problem in which a randomized strategy is used to defend against a
strategic adversary. The adversary knows both the surveillance strategy and the robots’ locations
on the perimeter, and it aims to maximize its probability of success by attacking the weakest spot
along the perimeter. Sless Lin et al. (31) considered coordinated sequential attacks in the mul-
tirobot perimeter patrol problem. Basilico et al. (32) defined a patrolling security game where a
Markovian surveillance robot tries to capture an intruder that has perfect knowledge of the loca-
tion and the strategy of the surveillance agent. The intruder has the freedom to choose where and
when to attack in order to minimize the probability of being captured. Asghar & Smith (33) dis-
cussed various intruder models that characterize different manners in which the intruders attack
the locations. They designed pattern search algorithms to solve for a Markov chain that minimizes
the expected reward for the intruders. Variations of intruder models where intruders have limited
observation time and varying attack duration were studied by Asghar & Smith (34) and Yang et al.
(35), respectively.
Many of the above-mentioned models can be formalized as Stackelberg games. The Stackel-
berg game framework has been successfully applied in practical security domain applications (36).
However, only limited progress has been made on efficient computational methods with optimal-
ity guarantees for surveillance problems with graph constraints.
In this article, we focus on the metric-based design of stochastic surveillance strategies. The
concepts discussed here also appear in other settings in the literature.

1.3. Organization
First, we review the basics of Markov chains and relevant design metrics in Section 2. We then
discuss fast and unpredictable surveillance strategies in various scenarios in Sections 3 and 4, re-
spectively. We conclude the article in Section 5.

1.4. Notation
Let R, R^n, and R^{m×n} be the set of real numbers, real vectors of dimension n, and m × n real matrices, respectively. The sets of elementwise positive vectors and nonnegative matrices are denoted by R^n_{>0} and R^{m×n}_{≥0}, respectively. We denote the vector of 1s in R^n by 1_n and the ith standard unit vector by e_i, whose dimension will be clear from the context. For a vector v ∈ R^n, diag(v) ∈ R^{n×n} is the diagonal matrix with diagonal elements given by v; for a matrix S ∈ R^{n×n}, diag(S) ∈ R^{n×n} is the diagonal matrix whose diagonal elements are the same as those of S. For a matrix S ∈ R^{m×n}, the vectorization vec(S) ∈ R^{mn} is constructed by stacking the columns of S on top of one another. The indicator function is denoted by 1_{{·}}.

2. MARKOV CHAINS AND RELEVANT METRICS
In this section, we review the basics of discrete-time homogeneous Markov chains and the relevant
metrics in robotic surveillance applications.

2.1. Markov Chains


A first-order discrete-time homogeneous Markov chain defined over the state space {1, . . . , n} is a sequence of random variables X_k for k ≥ 0 that satisfies the Markov property: P(X_{k+1} = i_{k+1} | X_k = i_k, . . . , X_0 = i_0) = P(X_{k+1} = i_{k+1} | X_k = i_k) for all k ≥ 0 and i_0, . . . , i_{k+1} ∈ {1, . . . , n}. The Markov chain {X_k}_{k≥0} has an associated row-stochastic transition matrix P ∈ R^{n×n} whose (i, j)-th element is p_ij = P(X_{k+1} = j | X_k = i) for i, j ∈ {1, . . . , n}. The transition diagram of a Markov chain is a directed graph G = (V, E), where V = {1, . . . , n} is the set of nodes and E ⊂ V × V is the set of edges, and (i, j) ∈ E if and only if p_ij > 0. A Markov chain is irreducible if its transition diagram is strongly connected. A finite-state irreducible Markov chain P has a unique stationary distribution π ∈ R^n that satisfies π^⊤P = π^⊤ and π_i > 0 for i ∈ {1, . . . , n}. Moreover, the stationary distribution π has the interpretation that regardless of the initial condition (37, theorem 2.1),

$$\frac{1}{t+1}\sum_{k=0}^{t} \mathbb{1}_{\{X_k = i\}} \longrightarrow \pi_i \quad \text{almost surely as } t \to \infty. \tag{1}$$

The physical meaning of Equation 1 is that for an irreducible Markov chain, the stationary distri-
bution encodes the visit frequency to different states in the long run. A Markov chain is reversible
if πi pi j = π j p ji for all i, j ࢠ {1, . . . , n}. One can verify that if a positive vector π ∈ Rn>0 satisfies the
reversibility condition for a transition matrix P, then it must be a stationary distribution of P. We
refer readers to References 38 and 39 for more about Markov chains. In this article, we focus on
discrete-time homogeneous irreducible Markov chains with finite states.
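As a quick numerical illustration of Equation 1 and of the stationary distribution, the following is a minimal Python sketch (assuming numpy is available; the three-state chain, seed, and horizon are arbitrary choices for illustration, not taken from the article):

```python
import numpy as np

# An irreducible row-stochastic transition matrix (illustrative choice).
P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.6, 0.4, 0.0]])

# Stationary distribution: left eigenvector of P for eigenvalue 1, normalized.
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Simulate the chain and record empirical visit frequencies (Equation 1).
rng = np.random.default_rng(0)
T, x, counts = 200_000, 0, np.zeros(3)
for _ in range(T):
    counts[x] += 1
    x = rng.choice(3, p=P[x])

print("stationary pi:", np.round(pi, 4))
print("empirical    :", np.round(counts / T, 4))  # close to pi for large T
```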
In robotic surveillance applications, the environment is sometimes modeled by a weighted directed graph G = (V, E, W), where the elements of the weight matrix W = [w_ij] represent the travel distances or times between locations. A Markov chain defined over a weighted directed graph G = (V, E, W) has a transition diagram that is consistent with G—i.e., for all i, j ∈ {1, . . . , n}, the transition probability p_ij is greater than or equal to zero if (i, j) ∈ E and equal to zero otherwise. Moreover, in this case, the weights specify the amount of time it takes for the Markov chain to transition from one state to another.

2.2. Hitting Times


For a finite-state discrete-time Markov chain, the first hitting time from state i to state j is a random variable defined by

$$T_{ij} = \min\{k \mid X_0 = i,\ X_k = j,\ k \ge 1\}. \tag{2}$$

This notion can be extended to Markov chains defined over a weighted graph G = (V, E, W), in which case the hitting time T^w_{ij} satisfies

$$T^w_{ij} = \min\left\{\,\sum_{\ell=0}^{k-1} w_{X_\ell, X_{\ell+1}} \;\middle|\; X_0 = i,\ X_k = j,\ k \ge 1 \right\}. \tag{3}$$

The hitting times measure how fast a Markov chain transitions from one state to another. If we
require the Markov chains to be as fast as possible, then the speed can be conveniently quantified
by hitting times and their expectations.
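To make Equation 2 concrete, a short Monte Carlo sketch in the same vein (numpy assumed; the chain and sample size are again illustrative) estimates the expected hitting time by direct simulation:

```python
import numpy as np

def sample_hitting_time(P, i, j, rng):
    """Sample T_ij = min{k >= 1 : X_0 = i, X_k = j} from Equation 2."""
    x, k = i, 0
    while True:
        x = rng.choice(P.shape[0], p=P[x])
        k += 1
        if x == j:
            return k

P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.6, 0.4, 0.0]])
rng = np.random.default_rng(1)
samples = [sample_hitting_time(P, 0, 2, rng) for _ in range(20_000)]
print("Monte Carlo estimate of E[T_02]:", np.mean(samples))
```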

2.3. Entropy
Although Markov chains usually come with unpredictability, different Markov chains have differ-
ent levels of randomness, as measured by some notion of entropy. For example, a Hamiltonian
tour in a graph (if one exists) can be represented by a Markov chain, with the transition matrix
being an irreducible permutation matrix. However, a Hamiltonian tour is clearly deterministic
and thus fully predictable. We choose to use entropy to quantify the unpredictability of Markov
chains because (a) entropy is a well-established fundamental concept that characterizes the ran-
domness of probability distributions, and (b) when the surveillance agent is highly entropic, its
strategy (motion pattern) is potentially hard for intruders to learn.
The entropy H(X) of a discrete random variable X taking values in {1, . . . , n} with distribution P(X = i) = p_i for i ∈ {1, . . . , n} is defined by

$$H(X) = -\sum_{i=1}^{n} p_i \log p_i.$$

To quantify the randomness in Markovian strategies, two different notions of entropy have been
used in the literature.

2.3.1. Entropy rate. A classic notion that measures the randomness of a Markov chain is the entropy rate. For a Markov chain with the initial distribution being its stationary distribution, the entropy rate is defined by (40, theorem 4.2.4)

$$H_{\text{rate}}(P) = -\sum_{i=1}^{n}\sum_{j=1}^{n} \pi_i\, p_{ij} \log p_{ij}. \tag{4}$$

The entropy rate measures the randomness in the sequence of states visited by a Markov chain (26, 27).
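Equation 4 is straightforward to evaluate numerically. A minimal sketch (numpy assumed; the doubly stochastic example chain is an arbitrary choice), using the usual convention 0 log 0 = 0:

```python
import numpy as np

def entropy_rate(P, pi):
    """Entropy rate of a stationary Markov chain (Equation 4),
    with the convention 0 * log(0) = 0."""
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(P > 0, P * np.log(P), 0.0)  # p_ij log p_ij
    return -float(pi @ terms.sum(axis=1))

# A doubly stochastic chain on 3 nodes has the uniform stationary distribution.
P = np.array([[0.2, 0.4, 0.4],
              [0.4, 0.2, 0.4],
              [0.4, 0.4, 0.2]])
pi = np.ones(3) / 3
print(entropy_rate(P, pi))  # below the maximum possible value log(3) ~ 1.0986
```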

2.3.2. Return-time entropy. The entropy of Markov trajectories was first proposed and studied by Ekroot & Cover (41); in their work, a trajectory T_ij from a state i to a state j is a path with initial state i, final state j, and no other state equal to j. They showed that the entropy of the random return trajectory satisfies H(T_ii) = H_rate(P)/π_i for all i ∈ {1, . . . , n}. In robotic surveillance applications, we are more concerned with the amount of time it takes for the surveillance agent to come back than with the exact trajectory followed by the surveillance agent. This is because intruders often may have access to local observations of the intervisit times at a location, and they could attack accordingly based on this information. We therefore define the return-time entropy of a Markov chain as a weighted sum of the entropies of the return times to the locations, with the weights being the stationary distribution, i.e.,

$$H_{\text{ret-time}}(P) = \sum_{i=1}^{n} \pi_i H(T_{ii}), \tag{5}$$

where T_ii is the return time (a discrete random variable) defined by Equation 2. We will see how this definition can be generalized to the cases when there are travel times on the surveillance graph.

2.4. Running Examples


Before we dive into the details of the design of Markov strategies, we introduce two running
examples with which we will illustrate and visualize the optimal strategies to be discussed in later sections. We consider a 3 × 3 grid graph and a region map of San Francisco (18, sec. 6.2), as shown in Figure 1. In the grid graph, we assume unit travel times (unweighted), and the surveillance agent is expected to visit all locations with the same frequency, i.e., π = (1/9)·1_9. In the map, there are 12 locations forming a fully connected weighted graph. The weights on the edges, as documented in Figure 2, are the quantized by-car travel times (in minutes) between locations. Moreover, each location is associated with a number that indicates the number of crimes recorded at that location during a certain period of time. The surveillance agent is expected to have a visit frequency to different locations proportional to the crime rates, i.e.,

$$\pi = \tfrac{1}{866}\,(133, 90, 89, 87, 83, 83, 74, 64, 48, 43, 38, 34)^\top.$$

Figure 1
Two running examples: (a) a 3 × 3 grid graph with self-loops at all locations and (b) a region map of San Francisco [created with uMap (42)] modeled by a fully connected graph where the number of crimes during a time period is recorded at 12 locations.

Location A B C D E F G H I J K L
A 1 3 3 5 4 6 3 5 7 4 6 6
B 3 1 5 4 2 4 4 5 5 3 5 5
C 3 5 1 7 6 8 3 4 9 4 8 7
D 6 4 7 1 5 6 4 7 5 6 6 7
E 4 3 6 5 1 3 5 5 6 3 4 4
F 6 4 8 5 3 1 6 7 3 6 2 3
G 2 5 3 5 6 7 1 5 7 5 7 8
H 3 5 2 7 6 7 3 1 9 3 7 5
I 8 6 9 4 6 4 6 9 1 8 5 7
J 4 3 4 6 3 5 5 3 7 1 5 3
K 6 4 8 6 4 2 6 6 4 5 1 3
L 6 4 6 6 3 3 6 4 5 3 2 1

Figure 2
Quantized pairwise by-car travel times (in minutes).

3. FAST SURVEILLANCE: MINIMIZATION OF MEAN HITTING TIME
Because hitting times measure how fast a Markov chain travels on a graph, it is natural to formulate
a hitting-time minimization problem for robotic surveillance scenarios where the surveillance
agent is expected to visit different locations in the graph as fast as possible. We first consider
the case when the travel times are unitary and introduce the well-known Kemeny’s constant. We
then investigate how the formulation changes when arbitrary positive real-valued travel times are
present. The development in Sections 3.1 and 3.2 follows from Reference 24, the multi-vehicle
case in Section 3.3 is studied in Reference 25, and the meeting times of Markovian agents in
Section 3.4 are discussed in Reference 43.

3.1. Unit Travel Times


Let m_ij be the expectation of the hitting time T_ij defined in Equation 2 for an irreducible Markov chain P, i.e., m_ij = E[T_ij]. Then the Markov property implies the following recursive relation:

$$m_{ij} = p_{ij} + \sum_{k\neq j} p_{ik}\,(1 + m_{kj}) = 1 + \sum_{k=1}^{n} p_{ik}\, m_{kj} - p_{ij}\, m_{jj}. \tag{6}$$

We can write Equation 6 in matrix form as

$$M = \mathbb{1}_n \mathbb{1}_n^\top + P\,(M - \operatorname{diag}(M)). \tag{7}$$

Two observations are immediate from Equation 7. First, by multiplying both sides of Equation 7 from the left by the stationary distribution π^⊤ of P and using π^⊤P = π^⊤, we have m_ii = 1/π_i for all i ∈ {1, . . . , n}, or diag(M) = diag(π)^{-1}. Second, by multiplying both sides of Equation 7 from the right by π and using diag(M) = diag(π)^{-1}, we have (I − P)Mπ = 0, which implies that Mπ is a constant multiple of 1_n given that P is irreducible. The mean hitting time of a Markov chain is then defined for any i ∈ {1, . . . , n} by

$$\mathcal{M}(P) = \sum_{j=1}^{n} \pi_j\, m_{ij}.$$

Due to its independence of the initial state i, the mean hitting time M(P) is also known as Kemeny's constant, and it has been extensively studied in the literature (44, 45). We can also write the mean hitting time as

$$\mathcal{M}(P) = \sum_{i=1}^{n}\sum_{j=1}^{n} \pi_i \pi_j\, m_{ij},$$

which clearly shows that M(P) measures the expected time it takes for the surveillance agent to transition between two locations randomly selected according to the stationary distribution. The mean hitting time directly relates to the eigenvalues of the transition matrix as follows (equation 9 in Reference 44): Let λ_1 = 1 and |λ_i| ≤ 1 for i ∈ {2, . . . , n} be the eigenvalues of an irreducible transition matrix P; then,

$$\mathcal{M}(P) = 1 + \sum_{j=2}^{n} \frac{1}{1 - \lambda_j}. \tag{8}$$
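As a numerical check of these identities (a sketch, not the authors' code; numpy assumed), the snippet below computes M column by column from Equation 6 and compares π^⊤Mπ with the eigenvalue formula in Equation 8:

```python
import numpy as np

def mean_hitting_times(P):
    """Solve Equation 6 for each target j: (I - P_j) m_j = 1_n,
    where P_j is P with its j-th column zeroed out."""
    n = P.shape[0]
    M = np.zeros((n, n))
    for j in range(n):
        Pj = P.copy()
        Pj[:, j] = 0.0
        M[:, j] = np.linalg.solve(np.eye(n) - Pj, np.ones(n))
    return M

P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.6, 0.4, 0.0]])
w, V = np.linalg.eig(P.T)                    # stationary distribution of P
pi = np.real(V[:, np.argmax(np.real(w))]); pi = pi / pi.sum()

M = mean_hitting_times(P)                    # note diag(M) = diag(pi)^(-1)
kemeny_from_M = float(pi @ M @ pi)           # sum_ij pi_i pi_j m_ij
lam = np.linalg.eigvals(P)
lam = lam[np.argsort(-np.real(lam))]         # put lambda_1 = 1 first
kemeny_from_eigs = float(np.real(1 + np.sum(1.0 / (1.0 - lam[1:]))))
print(kemeny_from_M, kemeny_from_eigs)       # the two values coincide
```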

In robotic surveillance applications, if we require the surveillance robot to visit different lo-
cations as fast as possible, then we could adopt a surveillance strategy that minimizes the mean
hitting time, which we obtain by solving the following optimization problem.

Problem 1 (mean-hitting-time minimization for general Markov chains). Given a strongly connected directed graph G = (V, E) and a stationary distribution π, find a Markov chain with the stationary distribution π that has the minimum mean hitting time—i.e., solve the following optimization problem:

$$\begin{aligned}
& \underset{P \in \mathbb{R}^{n\times n}}{\text{minimize}} && \mathcal{M}(P)\\
& \text{subject to} && P\mathbb{1}_n = \mathbb{1}_n,\\
& && \pi^\top P = \pi^\top,\\
& && p_{ij} \ge 0 \ \text{ if } (i, j) \in E,\\
& && p_{ij} = 0 \ \text{ if } (i, j) \notin E.
\end{aligned}$$

Figure 3
Optimal strategy with minimum mean hitting time M(P) = 6.78, corresponding to Problem 1 in a 3 × 3 grid, showing (a) a pixel image of P and (b) a graphical image of P.
Problem 1 is a difficult nonconvex optimization problem, and the optimal solution is available
only in special cases (45). As an illustration, we solve Problem 1 for the grid graph using the
fmincon function in MATLAB. Since the problem is nonconvex, we start from 100 different initial
conditions and pick the best to be the optimal solution. A pixel image of the obtained solution
is shown in Figure 3, where each row has the transition probabilities from the corresponding
location to other locations and the darkness of a pixel is proportional to the magnitude of the
transition probability. One prominent feature of the optimal strategy is sparsity. Intuitively, it can
be thought of approximately as a Hamiltonian tour that also satisfies the visit frequency.
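For readers without MATLAB, an analogous multistart local search can be sketched in Python with scipy's SLSQP solver (the ring graph, restart count, and solver choice below are illustrative assumptions; like fmincon, this only returns the best locally optimal solution found):

```python
import numpy as np
from scipy.optimize import minimize

n = 4
# Ring graph with self-loops (an illustrative choice, not the article's grid).
edges = [(i, j) for i in range(n) for j in range(n)
         if abs(i - j) <= 1 or {i, j} == {0, n - 1}]
pi = np.ones(n) / n

def unpack(x):
    """Place the free variables (one per allowed edge) into P."""
    P = np.zeros((n, n))
    for val, (i, j) in zip(x, edges):
        P[i, j] = val
    return P

def kemeny(x):
    """Mean hitting time via Equation 8, dropping the eigenvalue at 1."""
    lam = np.linalg.eigvals(unpack(x))
    lam = lam[np.argsort(np.abs(1.0 - lam))]
    return float(np.real(1.0 + np.sum(1.0 / (1.0 - lam[1:]))))

cons = [{"type": "eq", "fun": lambda x: unpack(x).sum(axis=1) - 1.0},
        {"type": "eq", "fun": lambda x: pi @ unpack(x) - pi}]
rng = np.random.default_rng(0)
best = None
for _ in range(20):   # crude multistart, mirroring the restarts described above
    res = minimize(kemeny, rng.random(len(edges)), method="SLSQP",
                   bounds=[(0.0, 1.0)] * len(edges), constraints=cons)
    if res.success and (best is None or res.fun < best.fun):
        best = res
if best is not None:
    print(best.fun)
    print(np.round(unpack(best.x), 3))
```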
Problem 1 becomes tractable if we restrict the set of candidate Markov chains to be reversible,
in which case the mean hitting time is a convex function of the transition matrix in the domain.
In fact, by using Equation 8, we can write the mean hitting time as (24, theorem 2)
$$\mathcal{M}(P) = \operatorname{Tr}\Big(\big(I_n - \operatorname{diag}(\pi)^{\frac12}\, P\, \operatorname{diag}(\pi)^{-\frac12} + \sqrt{\pi}\sqrt{\pi}^\top\big)^{-1}\Big),$$
where Tr(·) is the trace of a matrix and the square root is elementwise. Based on this, we formulate
a convex optimization problem and solve for the Markov strategy with the minimum mean hitting
time.
Problem 2 (mean-hitting-time minimization for reversible Markov chains). Given a strongly connected directed graph G = (V, E) and a stationary distribution π, find a reversible Markov chain with the stationary distribution π that has the minimum mean hitting time—i.e., solve the following optimization problem:

$$\begin{aligned}
& \underset{P \in \mathbb{R}^{n\times n}}{\text{minimize}} && \operatorname{Tr}\Big(\big(I_n - \operatorname{diag}(\pi)^{\frac12}\, P\, \operatorname{diag}(\pi)^{-\frac12} + \sqrt{\pi}\sqrt{\pi}^\top\big)^{-1}\Big)\\
& \text{subject to} && P\mathbb{1}_n = \mathbb{1}_n,\\
& && \pi_i\, p_{ij} = \pi_j\, p_{ji} \ \text{ for all } (i, j) \in E,\\
& && p_{ij} \ge 0 \ \text{ if } (i, j) \in E,\\
& && p_{ij} = 0 \ \text{ if } (i, j) \notin E.
\end{aligned}$$

Figure 4
Optimal reversible strategy with minimum mean hitting time M(P) = 12.43, corresponding to Problem 2 in a 3 × 3 grid, showing (a) a pixel image of P and (b) a graphical image of P.

The convexity of the objective function in Problem 2 follows from the fact that the trace of
the inverse of a positive definite matrix is convex (46). Moreover, Problem 2 has a semidefinite
reformulation and can be solved efficiently (24, problem 2). Note that in both Problem 1 and
Problem 2, although we do not impose combinatorial irreducibility constraints on the Markov
chain, the resulting solution must be irreducible as long as the constraint set is feasible and the
given graph is strongly connected. This is because a reducible solution leads to infinite mean
hitting time. In Reference 47, the Markov chain with minimum mean hitting time is applied
to the quickest detection of environmental anomalies. As a comparison, we solved Problem 2
using CVX (48) and plotted the optimal solution in Figure 4. The optimal strategy given in
Figure 4 is denser than that given in Figure 3. Moreover, Kemeny’s constant is almost doubled
in the reversible case.

3.2. Arbitrary Travel Times


When there are travel times between neighboring locations in the graph, we can define the weighted mean hitting time similarly to the unweighted case in Section 3.1. Let m^w_ij be the expectation of the hitting time T^w_ij defined in Equation 3; then,

$$m^w_{ij} = p_{ij}\, w_{ij} + \sum_{k\neq j} p_{ik}\bigl(w_{ik} + m^w_{kj}\bigr) = \sum_{k=1}^{n} p_{ik}\, w_{ik} + \sum_{k=1}^{n} p_{ik}\, m^w_{kj} - p_{ij}\, m^w_{jj}. \tag{9}$$

We can write Equation 9 in matrix form as

$$M^w = (P \circ W)\,\mathbb{1}_n \mathbb{1}_n^\top + P\,(M^w - \operatorname{diag}(M^w)), \tag{10}$$

where ∘ is the elementwise product. By left multiplying both sides of Equation 10 by the stationary distribution π^⊤, we have that

$$\operatorname{diag}(M^w) = \bigl(\pi^\top (P \circ W)\,\mathbb{1}_n\bigr)\cdot \operatorname{diag}(\pi)^{-1}, \tag{11}$$

which implies that the mean return time to location i is still inversely proportional to π_i, as in the unweighted case. In contrast to the case of unit travel times, the weighted sum $\sum_{j=1}^{n} \pi_j m^w_{ij}$ depends on the initial state i, and the weighted mean hitting time is then defined as

$$\mathcal{M}_w(P) = \sum_{i=1}^{n}\sum_{j=1}^{n} \pi_i \pi_j\, m^w_{ij}.$$

It can be shown that the following relationship between the weighted mean hitting time and the mean hitting time holds true (24, theorem 8):

$$\mathcal{M}_w(P) = \bigl(\pi^\top (P \circ W)\,\mathbb{1}_n\bigr)\cdot \mathcal{M}(P),$$
where the impact of the travel times is factored out. To efficiently obtain a fast Markov chain over a
weighted graph, we still restrict the set of candidate Markov chains to be reversible and formulate
the following optimization problem.
Problem 3 (weighted mean-hitting-time minimization for Markov chains). Given a strongly connected weighted directed graph G = (V, E, W) and a stationary distribution π, find a reversible Markov chain with the stationary distribution π that has the minimum weighted mean hitting time—i.e., solve the following optimization problem:

$$\begin{aligned}
& \underset{P \in \mathbb{R}^{n\times n}}{\text{minimize}} && \bigl(\pi^\top (P \circ W)\,\mathbb{1}_n\bigr)\cdot \operatorname{Tr}\Big(\big(I_n - \operatorname{diag}(\pi)^{\frac12}\, P\, \operatorname{diag}(\pi)^{-\frac12} + \sqrt{\pi}\sqrt{\pi}^\top\big)^{-1}\Big)\\
& \text{subject to} && P\mathbb{1}_n = \mathbb{1}_n,\\
& && \pi_i\, p_{ij} = \pi_j\, p_{ji} \ \text{ for all } (i, j) \in E,\\
& && p_{ij} \ge 0 \ \text{ if } (i, j) \in E,\\
& && p_{ij} = 0 \ \text{ if } (i, j) \notin E.
\end{aligned}$$
It turns out that by a change of variables, Problem 3 can also be recast as an equivalent semidefinite program and thus solved efficiently. The solutions to Problem 3 and its nonreversible
version for the San Francisco map data set are illustrated in Figure 5. Similar to the unweighted
cases discussed in Section 3.1, the optimal solution to Problem 3 is denser than its nonreversible
counterpart and has a higher (worse) optimal value.
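Both Equation 10 and the factorization above are easy to verify numerically. A sketch (numpy assumed; the small weighted chain is an arbitrary example) solves Equation 9 column by column and compares the two sides of the identity:

```python
import numpy as np

def weighted_mean_hitting_times(P, W):
    """Solve Equation 9 for each target j: (I - P_j) m_j = (P o W) 1_n,
    where P_j is P with its j-th column zeroed out."""
    n = P.shape[0]
    c = (P * W).sum(axis=1)        # expected one-step travel time per state
    M = np.zeros((n, n))
    for j in range(n):
        Pj = P.copy(); Pj[:, j] = 0.0
        M[:, j] = np.linalg.solve(np.eye(n) - Pj, c)
    return M

P = np.array([[0.0, 0.5, 0.5],
              [0.3, 0.0, 0.7],
              [0.6, 0.4, 0.0]])
W = np.array([[1.0, 2.0, 3.0],
              [2.0, 1.0, 4.0],
              [3.0, 4.0, 1.0]])
w, V = np.linalg.eig(P.T)
pi = np.real(V[:, np.argmax(np.real(w))]); pi = pi / pi.sum()

Mw = weighted_mean_hitting_times(P, W)
M1 = weighted_mean_hitting_times(P, np.ones((3, 3)))   # unit travel times
lhs = float(pi @ Mw @ pi)                              # weighted mean hitting time
rhs = float(pi @ (P * W).sum(axis=1)) * float(pi @ M1 @ pi)
print(lhs, rhs)    # equal, per (24, theorem 8)
```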
It is worth mentioning that the computational tractability of Problems 2 and 3 relies critically
on the reversibility condition on the set of candidate Markov chains. However, reversible Markov
chains are known to be slow (49), which is also illustrated by our numerical examples. Therefore,
how to efficiently solve Problem 1 and its weighted version with optimality guarantees remains
an open and interesting problem.

3.3. Hitting Times for Multiple Surveillance Robots


In this section, we consider a team of robots indexed by {1, . . . , N}, where each robot i patrols a
strongly connected graph G^i = (V, E^i) according to an irreducible Markov strategy P^i.

Figure 5
Optimal strategies with minimum mean hitting time in the San Francisco map, showing (a) a nonreversible solution with M_w(P) = 22.19 and (b) a reversible solution with M_w(P) = 44.77, corresponding to Problem 3.

Note that
the patrolling graphs have a common set of nodes but possibly different sets of edges. The hitting
time for this team of robots from their current locations i_1, . . . , i_N to a location j is defined as the first time at least one of the robots arrives at location j, i.e.,

$$T_{i_1\cdots i_N, j} = \min\bigl\{k \ge 1 \,\big|\, X^1_k = j \text{ or } X^2_k = j\ \cdots\ \text{or } X^N_k = j,\ X^h_0 = i_h \text{ for } h \in \{1, \dots, N\}\bigr\}, \tag{12}$$

where X^h_k is the location of robot h at time k. Let m_{i_1 \cdots i_N, j} be the expectation of T_{i_1 \cdots i_N, j}. Then, similar to the single-robot case before, we have that

$$\begin{aligned}
m_{i_1\cdots i_N, j} &= 1 - \prod_{h=1}^{N}\bigl(1 - p^h_{i_h j}\bigr) + \sum_{k_1\neq j}\cdots\sum_{k_N\neq j} p^1_{i_1 k_1}\cdots p^N_{i_N k_N}\bigl(1 + m_{k_1\cdots k_N, j}\bigr)\\
&= 1 + \sum_{k_1\neq j}\cdots\sum_{k_N\neq j} p^1_{i_1 k_1}\cdots p^N_{i_N k_N}\, m_{k_1\cdots k_N, j}.
\end{aligned} \tag{13}$$

To write Equation 13 in matrix form, we collect the m_{i_1 \cdots i_N, j} into M ∈ R^{n^N × n}, where each row of M contains the mean hitting times from a specific robot team configuration (a set of locations currently occupied by the team of robots) to different locations. The rows of M are arranged by cycling through the possible locations of the robots. Concretely, the mean-hitting-time matrix M has the following structure:

$$M = \begin{bmatrix}
m_{1\cdots 11,1} & m_{1\cdots 11,2} & \cdots & m_{1\cdots 11,n}\\
m_{1\cdots 12,1} & m_{1\cdots 12,2} & \cdots & m_{1\cdots 12,n}\\
\vdots & \vdots & \ddots & \vdots\\
m_{1\cdots 1n,1} & m_{1\cdots 1n,2} & \cdots & m_{1\cdots 1n,n}\\
m_{1\cdots 21,1} & m_{1\cdots 21,2} & \cdots & m_{1\cdots 21,n}\\
\vdots & \vdots & \ddots & \vdots\\
m_{n\cdots nn,1} & m_{n\cdots nn,2} & \cdots & m_{n\cdots nn,n}
\end{bmatrix}.$$
Given the structure of M, we can write Equation 13 in matrix form as (25, lemma 3.1)

$$M = \mathbb{1}_{n^N} \mathbb{1}_n^\top + (P^1 \otimes \cdots \otimes P^N)(M - M_d), \tag{14}$$

where ⊗ is the Kronecker product and M_d ∈ R^{n^N × n} has nonzero elements m_{i_1 \cdots i_N, j} only at locations (i_1 \cdots i_N, j) that satisfy i_h = j for at least one h ∈ {1, . . . , N}. To calculate the mean hitting times, we vectorize both sides of Equation 14 and obtain

$$\operatorname{vec}(M) = \bigl(I_{n^{N+1}} - (I_n \otimes P^1 \otimes \cdots \otimes P^N)(I_{n^{N+1}} - E)\bigr)^{-1} \mathbb{1}_{n^{N+1}}, \tag{15}$$

where E ∈ R^{n^{N+1} × n^{N+1}} is a binary matrix that satisfies E vec(M) = vec(M_d). The matrix inversion in Equation 15 is well defined since the robots' surveillance strategies are all irreducible.
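For concreteness, here is a Python sketch of Equation 15 for N = 2 robots (numpy assumed; the lazy walk on a fully connected 3-node graph is an illustrative choice), building the binary selection matrix E directly from the condition that i_h = j for some h, with the row ordering described above (the last robot's index cycling fastest):

```python
import numpy as np

def team_mean_hitting_times(P1, P2):
    """Mean hitting times for N = 2 robots via Equation 15.
    Row (i1, i2) of M is stored at index i1*n + i2; vec stacks columns of M,
    so entry ((i1, i2), j) sits at vec index j*n*n + i1*n + i2."""
    n = P1.shape[0]
    N1 = n ** 3                      # dimension n^(N+1) with N = 2
    # E is diagonal: it keeps the vec(M) entries with i1 = j or i2 = j (M_d).
    E = np.zeros((N1, N1))
    for j in range(n):
        for i1 in range(n):
            for i2 in range(n):
                if i1 == j or i2 == j:
                    idx = j * n * n + i1 * n + i2
                    E[idx, idx] = 1.0
    K = np.kron(np.eye(n), np.kron(P1, P2))
    vecM = np.linalg.solve(np.eye(N1) - K @ (np.eye(N1) - E), np.ones(N1))
    return vecM.reshape((n, n * n)).T        # back to the n^2-by-n matrix M

# Two robots doing lazy walks on a fully connected 3-node graph.
P = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
M = team_mean_hitting_times(P, P)
print(np.round(M, 2))   # row (i1, i2), column j
```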
In the multirobot case, the problem dimension grows exponentially fast when we scale up the
number of surveillance robots. Therefore, it is not obvious how the strategies for a team of robots
could be jointly optimized in an efficient manner. One can certainly define a mean hitting time for
the team of robots analogously to the single-robot case. However, the interpretation is not as clean,
and the optimization simply becomes intractable. Designing multirobot surveillance strategies as
a whole is an ongoing problem.

3.4. Meeting Times of Two Markovian Agents


In this section, we study the meeting times of two Markovian agents. The development still has
the hitting times as the core elements and is also similar to the case of multiple surveillance agents.
We consider a surveillance robot with a Markov strategy P^p and an evader with a Markov strategy P^e, both defined over a common strongly connected directed graph G = (V, E). Starting from two initial positions i, j ∈ V, the meeting time of the two agents is defined by

$$T_{ij} = \min\bigl\{k \ge 1 \mid X^p_k = X^e_k,\ X^p_0 = i,\ X^e_0 = j\bigr\},$$

where X^p_k and X^e_k are the locations of the surveillance robot and the evader at time k ≥ 0, respectively. It should be emphasized that here, the subscripts i and j of T_ij represent the initial locations of two different agents. If the evader is stationary and its strategy is given by P^e = I_n, then the meeting time T_ij corresponds exactly to the hitting time of the surveillance agent with the strategy P^p. Let m_ij be the expectation of T_ij; it then follows that

$$m_{ij} = \sum_{k=1}^{n} p^p_{ik}\, p^e_{jk} + \sum_{k_1 \neq h_1} p^p_{ik_1}\, p^e_{jh_1}\,(1 + m_{k_1 h_1}) = 1 + \sum_{k_1 \neq h_1} p^p_{ik_1}\, p^e_{jh_1}\, m_{k_1 h_1}. \tag{16}$$

Once again, we can organize the m_ij into a matrix M ∈ R^{n×n} and obtain

$$M = \mathbb{1}_n \mathbb{1}_n^\top + P^p\,(M - \operatorname{diag}(M))\,(P^e)^\top. \tag{17}$$
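Since Equation 16 characterizes the mean meeting times as the smallest solution of a monotone recursion, Equation 17 can be solved by a simple fixed-point iteration started from zero (one possible approach, sketched below in Python with numpy; entries grow without bound when the corresponding meeting times are infinite):

```python
import numpy as np

def mean_meeting_times(Pp, Pe, iters=20_000):
    """Fixed-point iteration on Equation 17:
    M <- 1 1^T + P^p (M - diag(M)) (P^e)^T, starting from M = 0."""
    n = Pp.shape[0]
    M = np.zeros((n, n))
    for _ in range(iters):
        M = np.ones((n, n)) + Pp @ (M - np.diag(np.diag(M))) @ Pe.T
    return M

# Pursuer: lazy walk on 3 fully connected nodes; evader: stationary (P^e = I),
# in which case the meeting times reduce to the pursuer's hitting times.
Pp = np.array([[0.5, 0.25, 0.25],
               [0.25, 0.5, 0.25],
               [0.25, 0.25, 0.5]])
Pe = np.eye(3)
print(np.round(mean_meeting_times(Pp, Pe), 2))
```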

Unlike previous cases, where the irreducibility of individual strategies is sufficient to ensure the finiteness of the mean hitting times, the mean meeting time may be infinite even if both P^p and P^e are irreducible. In fact, for two Markovian agents to meet in finite time from any initial positions, there must exist paths of equal length that lead to a common position for the agents from their respective initial positions; the simple example shown in Figure 6 provides an illustration. The finiteness of meeting times can be studied systematically by looking at the Kronecker product of graphs (50); for more details, we refer interested readers to Reference 43.
With the mean-meeting-time matrix M and the stationary distributions π_p and π_e for the surveillance agent and the evader, we can define the expected meeting time as

$$\mathcal{M} = \pi_p^\top M \pi_e. \tag{18}$$


Figure 6
A simple example of (a) finite and (b) infinite meeting times for a pursuer and evader. In panel a, the pursuer
and evader have finite meeting times, because starting from any positions for the agents, there exist paths to
the common locations (1, 1) and (2, 2). In panel b, the pursuer and evader have infinite meeting times,
because there exist no paths to common locations from locations (1, 2) and (2, 1).

Since we do not require the strategies of the surveillance agent and the evader to be irreducible, the stationary distributions π_p and π_e may not be unique. However, the expected meeting time in Equation 18 can still be defined as long as all entries of M are finite and one set of stationary distributions is specified. Given the evader's strategy P^e and a stationary distribution π_e, we can solve the following optimization problem to find a surveillance strategy P^p that captures (meets) the evader as fast as possible.
Problem 4 (expected-meeting-time minimization). Given a strongly connected directed graph G = (V, E), an evader's strategy P^e with a stationary distribution π_e, and a stationary distribution π_p for the surveillance agent, find a Markov chain with the stationary distribution π_p that has the minimum expected meeting time—i.e., solve the following optimization problem:

$$\begin{aligned}
& \underset{P^p \in \mathbb{R}^{n\times n}}{\text{minimize}} && \pi_p^\top M \pi_e\\
& \text{subject to} && \pi_p^\top P^p = \pi_p^\top,\\
& && P^p \mathbb{1}_n = \mathbb{1}_n,\\
& && p^p_{ij} \ge 0 \ \text{ if } (i, j) \in E,\\
& && p^p_{ij} = 0 \ \text{ if } (i, j) \notin E.
\end{aligned}$$

Problem 4 (expected-meeting-time minimization) has Problem 1 (mean-hitting-time minimization) as a special case and thus is also difficult to solve. In fact, we can interpret the mean-hitting-time minimization problem as an expected-meeting-time minimization problem: A Markovian surveillance agent is chasing a stationary evader whose strategy is given by P^e = I_n and π_e = π_p. With the exception of the special cases discussed in Reference 43, obtaining a locally optimal solution requires the use of numerical solvers. We plotted the optimal strategy for a pursuer chasing a randomly walking evader in Figure 7, where the evader moves from its current location to neighboring locations (including a self-loop) with equal probabilities. The pursuer's strategy has a strong sparsity pattern similar to that shown in Figure 3. Note that in this case, the pursuer has a uniform stationary distribution, while the evader's stationary distribution is proportional to the node degrees.

4. UNPREDICTABLE SURVEILLANCE: MAXIMIZATION OF ENTROPY


One of the motivations for adopting Markov chains as surveillance strategies is to introduce
stochasticity into the motion plans of the surveillance robot.

Figure 7
Optimal strategy with minimum expected meeting time M(P) = 9.86 against a randomly walking evader, corresponding to Problem 4 in a 3 × 3 grid, showing (a) a pixel image of P and (b) a graphical image of P.

In this section, we design surveillance strategies that are maximally unpredictable and thus hard for potential intruders to exploit. We
utilize two different notions of entropy to quantify the unpredictability of Markov chains. The
optimization of the entropy rate in Section 4.1 is studied in Reference 26, and the treatment of
the return-time entropy in Section 4.2 follows from Reference 27.

4.1. Maximization of Entropy Rate


As introduced in Section 2.3.1, the entropy rate is a classic metric that measures the randomness
of the random sequence generated by Markov chains. In robotic surveillance, this sequence is
composed of locations visited by the surveillance robot. Since the travel times on the graph do
not affect the entropy rate of the strategy, we consider unweighted graphs in this section. More-
over, we allow self-loops at all locations and assume that the graph is symmetric, as it turns out
that the optimal strategy in this setting can be computed very efficiently. To find a Markov chain
that maximizes the entropy rate defined in Equation 4, we formulate the following optimization
problem.
Problem 5 (entropy-rate maximization). Given a strongly connected directed graph G = (V, E) with self-loops and a symmetric adjacency matrix, and a stationary distribution π, find a Markov chain with the stationary distribution π that has the maximum entropy rate—i.e., solve the following optimization problem:

$$\begin{aligned}
& \underset{P \in \mathbb{R}^{n\times n}}{\text{maximize}} && H_{\text{rate}}(P)\\
& \text{subject to} && P\mathbb{1}_n = \mathbb{1}_n,\\
& && \pi^\top P = \pi^\top,\\
& && p_{ij} \ge 0 \ \text{ if } (i, j) \in E,\\
& && p_{ij} = 0 \ \text{ if } (i, j) \notin E.
\end{aligned}$$

As in previous problems, a stationary distribution is specified in Problem 5. A similar problem where only the graph structure is given was considered in Reference 51, where the optimal solution
is carefully constructed. One may recognize that Problem 5 is a convex problem even without
requiring the graph to be symmetric and have self-loops, and it can be solved by generic convex
programming solvers. However, Problem 5 exhibits several intriguing properties and allows for
faster iterative algorithms. Let A ∈ Rn×n be the unit-diagonal symmetric adjacency matrix of the
strongly connected graph G = (V, E ). We define the maxentropic vector map φ : Rn>0 → Rn>0 by
φ(x) = diag(x)Ax.
Moreover, we define the maxentropic matrix map  : Rn>0 → R≥0
n×n
by
(x) = diag(Ax)−1 Adiag(x).
These two maps are essential for constructing the optimal solution to Problem 5, and many nice properties of them were summarized by George et al. (26, theorems 2 and 3). To solve Problem 5, we take the following steps:

1. Find x* ∈ R^n_{>0} such that φ(x*) = π—i.e., solve the inverse maxentropic vector map x* = φ^{-1}(π).
2. Construct the optimal solution to Problem 5 as P* = Φ(x*) = diag(Ax*)^{-1} A diag(x*).
3. Compute the optimal value as H_rate(P*) = −2 x*^⊤A diag(x*) log x* + π^⊤ log π, where the logarithm is taken elementwise.
In the procedure outlined above, once the inverse maxentropic vector map x* = φ^{-1}(π) in step 1 is solved, the optimal solution and its value can be obtained straightforwardly in steps 2 and 3. First of all, as ensured by one of the properties of the inverse maxentropic vector map, step 1 is well defined in the sense that for any π ∈ R^n_{>0}, there exists a unique x* that satisfies φ(x*) = π. Moreover, let

$$\eta = \max_i \left\{\sum_{j=1}^{n} a_{ij}\sqrt{\pi_j}\right\}, \qquad x_0 = \frac{1}{\max_i\left\{\sum_{j=1}^{n} a_{ij}\sqrt{\pi_j}\right\}}\,\sqrt{\pi};$$

then, the following linear iteration converges to x* (26, theorem 4):

$$x_{k+1} = x_k - \frac{1}{\eta}\bigl(\operatorname{diag}(x_k)\,A\,x_k - \pi\bigr), \quad \text{for } k \ge 0.$$
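A compact Python sketch of steps 1 and 2 (numpy assumed; the graph is the 3 × 3 grid with self-loops from the running example, and the fixed iteration count is a crude stand-in for a proper convergence test):

```python
import numpy as np

def max_entropy_rate_chain(A, pi, iters=5_000):
    """Invert the maxentropic vector map by the linear iteration of
    (26, theorem 4), then build P* = diag(A x*)^(-1) A diag(x*)."""
    eta = np.max(A @ np.sqrt(pi))          # max_i sum_j a_ij sqrt(pi_j)
    x = np.sqrt(pi) / eta                  # initial condition x_0
    for _ in range(iters):
        x = x - (x * (A @ x) - pi) / eta   # x <- x - (phi(x) - pi)/eta
    return np.diag(1.0 / (A @ x)) @ A @ np.diag(x)

# 3 x 3 grid with self-loops (running example) and uniform pi = (1/9) 1_9.
n = 9
A = np.eye(n)
for i in range(n):
    r, c = divmod(i, 3)
    for rr, cc in [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]:
        if 0 <= rr < 3 and 0 <= cc < 3:
            A[i, 3 * rr + cc] = 1.0
pi = np.ones(n) / n
P = max_entropy_rate_chain(A, pi)
print(np.round(P, 3))   # row stochastic, with stationary distribution pi
```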

George et al. (26) provided numerical comparisons and a computational complexity analysis
that clearly showed the efficiency of the proposed linear iteration. It is also worth mentioning
that the optimal solution to Problem 5 is automatically reversible (26, theorem 6). Here, we solve
Problem 5 using the linear iteration and plot the solution in Figure 8. The solution has a similar pattern to that of the reversible chain with minimum Kemeny constant. However, the transition
probabilities are shaped differently, and the strategy with maximum entropy rate has self-loops at
all locations.

4.2. Maximization of Return-Time Entropy


The entropy rate is a global measure of unpredictability of Markov chains—i.e., it is concerned
with the entire sequence of locations visited by the surveillance robot. However, from a local
observer’s point of view, the intervisit times at a location of interest may be the only available
information. For Markov chains, these intervisit times correspond exactly to the first return times
T_ii for locations i ∈ {1, . . . , n}. We propose that, to make it difficult for the observers (potential
intruders) to predict when the surveillance robot will come back after it leaves the current location, one should maximize the unpredictability in the return times as measured by the entropy.

Figure 8
Optimal strategy with maximum entropy rate H_rate(P) = 1.27, corresponding to Problem 5 in a 3 × 3 grid, showing (a) a pixel image of P and (b) a graphical image of P.

Note that
the return times depend not only on the Markov strategy but also on the travel times between
locations. In this section, we consider strongly connected directed graphs with integer-valued edge
weights. Assuming integer-valued travel times is not restrictive because we can always quantize and
approximate real-valued travel times reasonably well by choosing appropriate quantization units.
Moreover, it is much more convenient to characterize the return-time distributions if the travel
times are integer valued.
To ease the notation, we introduce the shorthand F_k ∈ R^{n×n}, where the (i, j)-th element of F_k represents the probability that the hitting time T^w_{ij} takes value k, i.e., F_k(i, j) = P(T^w_{ij} = k). We then have the following recursion:

$$F_k(i, j) = p_{ij}\,\mathbb{1}_{\{k = w_{ij}\}} + \sum_{h\neq j} p_{ih}\, F_{k - w_{ih}}(h, j), \tag{19}$$

with the initial conditions F_k = 0_{n×n} for k ≤ 0. Equation 19 can also be arranged in a vectorized form as

$$\operatorname{vec}(F_k) = \operatorname{vec}\bigl(P \circ \mathbb{1}_{\{k \mathbb{1}_n \mathbb{1}_n^\top = W\}}\bigr) + \sum_{i=1}^{n}\sum_{j=1}^{n} p_{ij}\,\bigl(E_j \otimes e_i e_j^\top\bigr)\operatorname{vec}(F_{k - w_{ij}}), \tag{20}$$

where $\mathbb{1}_{\{k \mathbb{1}_n \mathbb{1}_n^\top = W\}}$ is a binary matrix whose (i, j)-th element is 1 if w_ij = k, and E_i = diag(1_n − e_i). Despite its seemingly complicated form, the hitting-time probabilities in Equation 20 actually satisfy a linear system with time delays and with the transition probabilities as inputs. Moreover, the inputs to this system are directly related to the system matrix, and the hitting-time probabilities are polynomial functions of the transition probabilities.
Using the notation introduced above, we can explicitly rewrite the return-time entropy of a Markov chain with travel times as

$$H_{\text{ret-time}}(P) = -\sum_{i=1}^{n} \pi_i \sum_{k=1}^{\infty} F_k(i, i)\log F_k(i, i). \tag{21}$$
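For unit travel times (all w_ij = 1), the recursion in Equation 19 reduces to a one-step delay and is simple to implement. The sketch below (numpy assumed; the truncation length K is an arbitrary cutoff rather than the principled choice N_η discussed later in this section) evaluates a truncated version of Equation 21:

```python
import numpy as np

def return_time_entropy(P, pi, K=2_000):
    """Truncated Equation 21 for unit travel times, where Equation 19 reads
    F_k(i, j) = p_ij 1{k=1} + sum_{h != j} p_ih F_{k-1}(h, j)."""
    n = P.shape[0]
    Fp = np.zeros((n, n))                      # F_{k-1}, with F_0 = 0
    H = 0.0
    for k in range(1, K + 1):
        F = P @ Fp - P * np.diag(Fp)[None, :]  # excludes the h = j terms
        if k == 1:
            F = F + P                          # the p_ij 1{k = w_ij} term
        d = np.diag(F)                         # return-time probabilities F_k(i, i)
        with np.errstate(divide="ignore", invalid="ignore"):
            H -= float(pi @ np.where(d > 0, d * np.log(d), 0.0))
        Fp = F
    return H

# Complete graph with unit travel times: P = 1_n pi^T makes the return times
# geometric, the optimal case discussed later in this section.
pi = np.array([0.5, 0.3, 0.2])
P = np.ones((3, 1)) @ pi[None, :]
print(return_time_entropy(P, pi))
```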

Our goal is to find a Markov chain that has the maximum return-time entropy by solving the
following optimization problem.
Problem 6 (return-time entropy maximization). Given a strongly connected directed weighted graph G = (V, E, W) and a stationary distribution π, pick a minimum transition probability ε > 0 and find a Markov chain with the stationary distribution π that has the maximum return-time entropy—i.e., solve the following optimization problem:

$$\begin{aligned}
& \underset{P \in \mathbb{R}^{n\times n}}{\text{maximize}} && H_{\text{ret-time}}(P)\\
& \text{subject to} && P\mathbb{1}_n = \mathbb{1}_n,\\
& && \pi^\top P = \pi^\top,\\
& && p_{ij} \ge \varepsilon \ \text{ if } (i, j) \in E,\\
& && p_{ij} = 0 \ \text{ if } (i, j) \notin E.
\end{aligned}$$
We should clarify several technical details in Problem 6. First, the minimum transition probability ε ensures the irreducibility of the Markov chains; moreover, it also ensures that the constraint set of Problem 6 is a compact set of irreducible Markov chains. Second, the objective function H_ret-time(P) as given by Equation 21 is an infinite series; fortunately, by analyzing the linear dynamical system in Equation 20, we can show that the return-time probabilities F_k(i, i) decay exponentially fast if P is irreducible. Therefore, the infinite series is summable and well defined. In fact, the objective function depends continuously on the transition probabilities.

The above clarifications imply that in Problem 6, a continuous function is optimized over a compact set. Therefore, there exists at least one optimal solution, and Problem 6 is well posed. We should also point out that, due to the stationary distribution constraint and by Equation 11, the return times have bounded expected values, and the surveillance robot will come back to each location within some time on average.
The difficulty of Problem 6 comes partly from the fact that a closed form of H_ret-time(P) as a function of the transition probabilities is not available. However, in the particular case of a complete graph with unit travel times, the return times can be made to follow geometric distributions, and their entropy has a closed form. Moreover, by the maximum entropy principle for discrete random variables (52), the optimal solution is given by P = 1_n π^⊤.
As in Reference 41, it is also possible to establish a connection between the return-time entropy (in unweighted graphs) and the entropy rate as follows:

$$H_{\text{rate}}(P) \le H_{\text{ret-time}}(P) \le n\, H_{\text{rate}}(P), \tag{22}$$

where the inequalities are tight in the sense that they become equalities in certain cases. The relationship in Equation 22 implies that H_ret-time(P) and H_rate(P) are two distinct metrics and are not interchangeable. In fact, they may differ by a factor of n, which is the dimension of the graph.
Since a closed form of the return-time entropy H_ret-time(P) is not available, we truncate the infinite series and obtain a truncated objective function for computation and optimization purposes. To control the difference between the original and the truncated objective functions, for a given accuracy level η, we use a Markov inequality to select the truncation position N_η such that the discarded tail return-time probabilities for any location are under η. We then define the truncated objective function as

$$H^{\text{trunc}}_{\text{ret-time}}(P) = -\sum_{i=1}^{n} \pi_i \sum_{k=1}^{N_\eta} F_k(i, i)\log F_k(i, i).$$

The objective function in Problem 6 is replaced with H^trunc_ret-time(P) when solving the optimization problem. Based on the gradient formulas for H^trunc_ret-time(P) (27, lemma 18), we utilize the gradient projection algorithm to compute a locally optimal solution. We plot the optimal solution found for the San Francisco map in Figure 9, where N_η is taken to be 2,292 so that the discarded probability η is 0.1. Reference 27 provides a performance comparison with other Markov chains in capturing intruders with certain behaviors.

Figure 9
Optimal strategy with maximum return-time entropy H^trunc_ret-time(P) = 5.00, corresponding to Problem 6 in the San Francisco map.

5. SUMMARY
In this article, we have presented a stochastic approach to the design of robotic surveillance algorithms over graphs. This approach exhibits numerous promising properties that are relevant in surveillance applications. We formulated various optimization problems in different settings where particular metrics are of concern. These formulations accommodate scenarios in which different characteristics of surveillance strategies are desirable. To facilitate the computation of many of the strategies discussed in this article, we have developed software packages available in both MATLAB and Julia (53).
Stochastic robotic surveillance is still an active area of research, and many problems remain
open. First, regarding the modeling, it is becoming increasingly popular to incorporate adversary
models in the problem formulation. By doing so, we may expect improved surveillance perfor-
mance due to the tailored design against the adversarial behaviors. Game theory is a common
tool that will fit well in the picture. Second, there is still a lack of efficient algorithms to com-
pute surveillance strategies with optimality guarantees. Although effective heuristics have been
proposed in different scenarios, progress in advancing customized computational algorithms is
critical. Finally, most of the problems discussed in this article involve a single surveillance robot;
studies on how to exploit robot teams and design cooperative surveillance systems are of tremen-
dous value in both research and practice.

DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that
might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS
This work was supported in part by Air Force Office of Scientific Research award FA9550-15-1-
0138. The authors thank Pushkarini Agharkar, Andrea Carron, Mishel George, Saber Jafarpour,
and Rushabh Patel for fruitful past collaborations on the themes of this article.

LITERATURE CITED
1. Yuan C, Zhang Y, Liu Z. 2015. A survey on technologies for automatic forest fire monitoring, detection,
and fighting using unmanned aerial vehicles and remote sensing techniques. Can. J. For. Res. 45:783–92.
https://doi.org/10.1139/cjfr-2014-0347
2. Shukla A, Karki H. 2016. Application of robotics in offshore oil and gas industry—a review part II. Robot.
Auton. Syst. 75:508–24. https://doi.org/10.1016/j.robot.2015.09.013
3. Di Paola D, Milella A, Cicirelli G, Distante A. 2010. An autonomous mobile robotic system for surveil-
lance of indoor environments. Int. J. Robot. Res. 7:19–26. https://doi.org/10.5772/7254
4. Witwicki S, Castillo JC, Messias J, Capitán J, Melo FS, et al. 2017. Autonomous surveillance robots: a
decision-making framework for networked multiagent systems. IEEE Robot. Autom. Mag. 24(3):52–64.
https://doi.org/10.1109/MRA.2017.2662222
5. Räty TD. 2010. Survey on contemporary remote surveillance systems for public safety. IEEE Trans. Syst.
Man Cybernet. C 40:493–515. https://doi.org/10.1109/TSMCC.2010.2042446
6. Pasqualetti F, Durham JW, Bullo F. 2012. Cooperative patrolling via weighted tours: performance anal-
ysis and distributed algorithms. IEEE Trans. Robot. 28:1181–88. https://doi.org/10.1109/TRO.2012.
2201293
7. Cassandras CG, Lin X, Ding X. 2013. An optimal control approach to the multi-agent persistent moni-
toring problem. IEEE Trans. Autom. Control 58:947–61. https://doi.org/10.1109/TAC.2012.2225539
8. Zhou N, Yu X, Anderson SB, Cassandras CG. 2018. Optimal event-driven multiagent persistent moni-
toring of a finite set of data sources. IEEE Trans. Autom. Control 63:4204–17. https://doi.org/10.1109/
TAC.2018.2829469
9. Wang Y, Wei Y, Liu X, Zhou N, Cassandras CG. 2019. Optimal persistent monitoring using second-order
agents with physical constraints. IEEE Trans. Autom. Control 64:3239–52. https://doi.org/10.1109/TAC.
2018.2879946
10. Smith SL, Schwager M, Rus D. 2012. Persistent robotic tasks: monitoring and sweeping in changing
environments. IEEE Trans. Robot. 28:410–26. https://doi.org/10.1109/TRO.2011.2174493
11. Yu J, Karaman S, Rus D. 2015. Persistent monitoring of events with stochastic arrivals at multiple stations.
IEEE Trans. Robot. 31:521–35. https://doi.org/10.1109/TRO.2015.2409453
12. Rezazadeh N, Kia SS. 2019. A sub-modular receding horizon approach to persistent monitoring for a
group of mobile agents over an urban area. In 8th IFAC Workshop on Distributed Estimation and Control
in Networked Systems, ed. D Gayme, pp. 217–22. IFAC-PapersOnLine Vol. 52(20). Amsterdam: Elsevier.
https://doi.org/10.1016/j.ifacol.2019.12.161
13. Yu X, Andersson SB, Zhou N, Cassandras CG. 2020. Scheduling multiple agents in a persistent monitor-
ing task using reachability analysis. IEEE Trans. Autom. Control 65:1499–513. https://doi.org/10.1109/
TAC.2019.2922506
14. Machado A, Ramalho G, Zucker JD, Drogoul A. 2003. Multi-agent patrolling: an empirical analysis
of alternative architectures. In Multi-Agent-Based Simulation II, ed. J Simão Sichman, F Bousquet, P
Davidsson, pp. 155–70. Berlin: Springer. https://doi.org/10.1007/3-540-36483-8_11
15. Chevaleyre Y. 2004. Theoretical analysis of the multi-agent patrolling problem. In IEEE/WIC/ACM In-
ternational Conference on Intelligent Agent Technology, 2004, pp. 302–8. Piscataway, NJ: IEEE. https://doi.
org/10.1109/IAT.2004.1342959
16. Elmaliach Y, Agmon N, Kaminka GA. 2009. Multi-robot area patrol under frequency constraints. Ann.
Math. Artif. Intell. 57:293–320. https://doi.org/10.1007/s10472-010-9193-y
17. Pasqualetti F, Franchi A, Bullo F. 2012. On cooperative patrolling: optimal trajectories, complexity analy-
sis and approximation algorithms. IEEE Trans. Robot. 28:592–606. https://doi.org/10.1109/TRO.2011.
2179580
18. Alamdari S, Fata E, Smith SL. 2014. Persistent monitoring in discrete environments: minimizing the
maximum weighted latency between observations. Int. J. Robot. Res. 33:138–54. https://doi.org/10.1177/
0278364913504011
19. Asghar AB, Smith SL, Sundaram S. 2019. Multi-robot routing for persistent monitoring with latency
constraints. In 2019 American Control Conference, pp. 2620–25. Piscataway, NJ: IEEE. https://doi.org/
10.23919/ACC.2019.8814485

20. Afshani P, Berg MD, Buchin K, Gao J, Loffler M, et al. 2020. Approximation algorithms for multi-robot
patrol-scheduling with min-max latency. arXiv:2005.02530 [cs.DS]. https://arxiv.org/pdf/2005.02530
21. Srivastava K, Stipanovi DM, Spong MW. 2009. On a stochastic robotic surveillance problem. In Pro-
ceedings of the 48th IEEE Conference on Decision and Control Held Jointly with 2009 28th Chinese Control
Conference, pp. 8567–74. Piscataway, NJ: IEEE. https://doi.org/10.1109/CDC.2009.5400569
22. Kempe D, Schulman LJ, Tamuz O. 2018. Quasi-regular sequences and optimal schedules for security
games. In ACM-SIAM Symposium on Discrete Algorithms, pp. 1625–44. Philadelphia: Soc. Ind. Appl. Math.
https://doi.org/10.1137/1.9781611975031.106
23. Cannata G, Sgorbissa A. 2011. A minimalist algorithm for multirobot continuous coverage. IEEE Trans.
Robot. 27:297–312. https://doi.org/10.1109/TRO.2011.2104510
24. Patel R, Agharkar P, Bullo F. 2015. Robotic surveillance and Markov chains with minimal weighted
Kemeny constant. IEEE Trans. Autom. Control 60:3156–67. https://doi.org/10.1109/TAC.2015.
2426317
25. Patel R, Carron A, Bullo F. 2016. The hitting time of multiple random walks. SIAM J. Matrix Anal. Appl.
37:933–54. https://doi.org/10.1137/15M1010737
26. George M, Jafarpour S, Bullo F. 2019. Markov chains with maximum entropy for robotic surveillance.
IEEE Trans. Autom. Control 64:1566–80. https://doi.org/10.1109/TAC.2018.2844120
27. Duan X, George M, Bullo F. 2020. Markov chains with maximum return time entropy for robotic surveil-
lance. IEEE Trans. Autom. Control 65:72–86. https://doi.org/10.1109/TAC.2019.2906473
28. Alvarenga CD, Basilico N, Carpin S. 2019. Time-varying graph patrolling against attackers with locally
limited and imperfect observation models. In 2019 IEEE/RSJ International Conference on Intelligent Robots
and Systems, pp. 4869–76. Piscataway, NJ: IEEE. https://doi.org/10.1109/IROS40897.2019.8967770
29. Basilico N, Carpin S. 2020. Balancing unpredictability and coverage in adversarial patrolling settings.
In Algorithmic Foundations of Robotics XIII, ed. M Morales, L Tapia, G Sánchez-Ante, S Hutchinson,
pp. 762–77. Cham, Switz.: Springer. https://doi.org/10.1007/978-3-030-44051-0_44
30. Agmon N, Kaminka GA, Kraus S. 2011. Multi-robot adversarial patrolling: facing a full-knowledge op-
ponent. J. Artif. Intell. Res. 42:887–916. https://www.jair.org/index.php/jair/article/view/10742
31. Sless Lin E, Agmon N, Kraus S. 2019. Multi-robot adversarial patrolling: handling sequential attacks.
Artif. Intell. 274:1–25. https://doi.org/10.1016/j.artint.2019.02.004
32. Basilico N, Gatti N, Amigoni F. 2012. Patrolling security games: definition and algorithms for solving
large instances with single patroller and single intruder. Artif. Intell. 184–85:78–123. https://doi.org/10.
1016/j.artint.2012.03.003
33. Asghar AB, Smith SL. 2016. Stochastic patrolling in adversarial settings. In 2016 American Control Con-
ference, pp. 6435–40. Piscataway, NJ: IEEE. https://doi.org/10.1109/ACC.2016.7526682
34. Asghar AB, Smith SL. 2018. A patrolling game for adversaries with limited observation time. In 2018
IEEE Conference on Decision and Control, pp. 3305–10. Piscataway, NJ: IEEE. https://doi.org/10.1109/
CDC.2018.8619136
35. Yang H, Tsai S, Liu KS, Lin S, Gao J. 2019. Patrol scheduling against adversaries with varying attack
durations. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems,
pp. 1179–88. Richland, SC: Int. Found. Auton. Agents Multiagent Syst. https://dl.acm.org/doi/abs/10.
5555/3306127.3331819
36. Sinha A, Fang F, An B, Kiekintveld C, Tambe M. 2018. Stackelberg Security Games: looking beyond a
decade of success. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence,
pp. 5494–501. Calif.: IJCAI. https://doi.org/10.24963/ijcai.2018/775
37. Aldous D, Fill JA. 2002. Reversible Markov chains and random walks on graphs. Unfinished monograph,
recompiled 2014. https://www.stat.berkeley.edu/~aldous/RWG/book.html
38. Kemeny JG, Snell JL. 1976. Finite Markov Chains. New York: Springer
39. Norris JR. 1997. Markov Chains. Cambridge, UK: Cambridge Univ. Press. https://doi.org/10.1017/
CBO9780511810633
40. Cover TM, Thomas JA. 2012. Elements of Information Theory. New York: Wiley & Sons
41. Ekroot L, Cover TM. 1993. The entropy of Markov trajectories. IEEE Trans. Inf. Theory 39:1418–21.
https://doi.org/10.1109/18.243461

42. OpenStreetMap. 2020. uMap. OpenStreetMap Wiki. https://wiki.openstreetmap.org/wiki/UMap
43. Duan X, George M, Patel R, Bullo F. 2020. Robotic surveillance based on the meeting time of random
walks. IEEE Trans. Robot. 36:1356–62. https://doi.org/10.1109/TRO.2020.2990362
44. Hunter JJ. 2014. The role of Kemeny’s constant in properties of Markov chains. Commun. Stat. Theory
Methods 43:1309–21. https://doi.org/10.1080/03610926.2012.741742
45. Kirkland S. 2014. On the Kemeny constant and stationary distribution vector for a Markov chain. Electron.
J. Linear Algebra 27:354–72. https://doi.org/10.13001/1081-3810.1624
46. Ghosh A, Boyd S, Saberi A. 2008. Minimizing effective resistance of a graph. SIAM Rev. 50:37–66. https://
doi.org/10.1137/050645452
47. Agharkar P, Bullo F. 2016. Quickest detection over robotic roadmaps. IEEE Trans. Robot. 32:252–59.
https://doi.org/10.1109/TRO.2015.2506165
48. Grant M, Boyd S. 2020. CVX: Matlab software for disciplined convex programming. CVX Research. http://
cvxr.com/cvx
49. Jung K, Shah D, Shin J. 2010. Distributed averaging via lifted Markov chains. IEEE Trans. Inf. Theory
56:634–47. https://doi.org/10.1109/TIT.2009.2034777
50. Weichsel PM. 1962. The Kronecker product of graphs. Proc. Am. Math. Soc. 13:47–52. https://doi.org/
10.2307/2033769
51. Burda Z, Duda J, Luck JM, Waclaw B. 2009. Localization of the maximal entropy random walk. Phys. Rev.
Lett. 102:160602. https://doi.org/10.1103/PhysRevLett.102.160602
52. Guiasu S, Shenitzer A. 1985. The principle of maximum entropy. Math. Intell. 7:42–48. https://doi.org/
10.1007/BF03023004
53. Wang H. 2019. RoboSurv. GitHub. https://github.com/HanWang99/RoboSurv
