Factor Graphs For Robot Perception
1 Introduction 2
1.1 Inference Problems in Robotics . . . . . . . . . . . . . . . 3
1.2 Probabilistic Modeling . . . . . . . . . . . . . . . . . . . . 4
1.3 Bayesian Networks for Generative Modeling . . . . . . . . 5
1.4 Specifying Probability Densities . . . . . . . . . . . . . . . 7
1.5 Simulating from a Bayes Net Model . . . . . . . . . . . . 8
1.6 Maximum a Posteriori Inference . . . . . . . . . . . . . . 9
1.7 Factor Graphs for Inference . . . . . . . . . . . . . . . . . 11
1.8 Computations Supported by Factor Graphs . . . . . . . . . 13
1.9 Roadmap . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.10 Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . 15
3 Exploiting Sparsity 30
3.1 On Sparsity . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1.1 Motivating Example . . . . . . . . . . . . . . . . . 30
3.1.2 The Sparse Jacobian and its Factor Graph . . . . . 31
3.1.3 The Sparse Information Matrix and its Graph . . . 32
3.2 The Elimination Algorithm . . . . . . . . . . . . . . . . . 34
3.3 Sparse Matrix Factorization as Variable Elimination . . . . 37
3.3.1 Sparse Gaussian Factors . . . . . . . . . . . . . . . 37
3.3.2 Forming the Product Factor . . . . . . . . . . . . 38
3.3.3 Eliminating a Variable using Partial QR . . . . . . 39
3.3.4 Multifrontal QR Factorization . . . . . . . . . . . . 39
3.4 The Sparse Cholesky Factor as a Bayes Net . . . . . . . . 42
3.4.1 Linear-Gaussian Conditionals . . . . . . . . . . . . 42
3.4.2 Solving a Bayes Net is Back-substitution . . . . . . 43
3.5 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.6 Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . 44
4 Elimination Ordering 47
4.1 Complexity of Elimination . . . . . . . . . . . . . . . . . . 47
4.2 Variable Ordering Matters . . . . . . . . . . . . . . . . . . 49
4.3 The Concept of Fill-in . . . . . . . . . . . . . . . . . . . . 51
4.4 Ordering Heuristics . . . . . . . . . . . . . . . . . . . . . 52
4.4.1 Minimum Degree Orderings . . . . . . . . . . . . . 52
4.4.2 Nested Dissection Orderings . . . . . . . . . . . . 53
4.5 Ordering Heuristics in Robotics . . . . . . . . . . . . . . . 54
4.6 Nested Dissection and SLAM . . . . . . . . . . . . . . . . 58
4.7 Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . 60
6 Optimization on Manifolds 82
6.1 Attitude and Heading Estimation . . . . . . . . . . . . . . 82
6.1.1 Incremental Rotations . . . . . . . . . . . . . . . . 84
6.1.2 The Exponential Map . . . . . . . . . . . . . . . . 84
6.1.3 Local Coordinates . . . . . . . . . . . . . . . . . . 85
6.1.4 Incorporating Heading Information . . . . . . . . . 86
6.1.5 Planar Rotations . . . . . . . . . . . . . . . . . . . 87
6.2 PoseSLAM . . . . . . . . . . . . . . . . . . . . . . . . . . 88
6.2.1 Representing Poses . . . . . . . . . . . . . . . . . 89
6.2.2 Local Pose Coordinates . . . . . . . . . . . . . . . 89
6.2.3 Optimizing over Poses . . . . . . . . . . . . . . . . 90
6.2.4 PoseSLAM . . . . . . . . . . . . . . . . . . . . . . 91
6.3 Optimization over Lie Groups and Arbitrary Manifolds . . . 92
6.3.1 Matrix Lie Groups . . . . . . . . . . . . . . . . . . 93
6.3.2 General Manifolds and Retractions . . . . . . . . . 93
6.3.3 Retractions and Lie Groups . . . . . . . . . . . . . 95
6.4 Bibliographic Remarks . . . . . . . . . . . . . . . . . . . . 95
7 Applications 96
7.1 Inertial Navigation . . . . . . . . . . . . . . . . . . . . . . 96
7.2 Dense 3D Mapping . . . . . . . . . . . . . . . . . . . . . 98
7.3 Field Robotics . . . . . . . . . . . . . . . . . . . . . . . . 100
7.4 Robust Estimation and Non-Gaussian Inference . . . . . . 104
7.5 Long-term Operation and Sparsification . . . . . . . . . . 106
7.6 Large-scale and Distributed SLAM . . . . . . . . . . . . . 108
7.7 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Bibliography 114
Appendices 131
We review the use of factor graphs for the modeling and solving of
large-scale inference problems in robotics. Factor graphs are a family of
probabilistic graphical models, other examples of which are Bayesian
networks and Markov random fields, well known from the statistical
modeling and machine learning literature. They provide a powerful ab-
straction that gives insight into particular inference problems, making
it easier to think about and design solutions, and write modular soft-
ware to perform the actual inference. We illustrate their use in the
simultaneous localization and mapping problem and other important
problems associated with deploying robots in the real world. We in-
troduce factor graphs as an economical representation within which
to formulate the different inference problems, setting the stage for the
subsequent sections on practical methods to solve them. We explain the
nonlinear optimization techniques for solving arbitrary nonlinear factor
graphs, which requires repeatedly solving large sparse linear systems.
The sparse structure of the factor graph is the key to understand-
ing this more general algorithm, and hence also understanding (and
improving) sparse factorization methods. We provide insight into the
graphs underlying robotics inference, and how their sparsity is affected
by the implementation choices we make, crucial for achieving highly
performant algorithms. As many inference problems in robotics are in-
cremental, we also discuss the iSAM class of algorithms that can reuse
previous computations, re-interpreting incremental matrix factoriza-
tion methods as operations on graphical models, introducing the Bayes
tree in the process. Because in most practical situations we will have
to deal with 3D rotations and other nonlinear manifolds, we also in-
troduce the more sophisticated machinery to perform optimization on
nonlinear manifolds. Finally, we provide an overview of applications of
factor graphs in robot perception, showing the broad impact factor
graphs have had on the field.
F. Dellaert and M. Kaess. Factor Graphs for Robot Perception. Foundations and
Trends® in Robotics, vol. 6, no. 1-2, pp. 1–139, 2017.
DOI: 10.1561/2300000043.
1 Introduction
This article reviews the use of factor graphs for the modeling and solv-
ing of large-scale inference problems in robotics, including the simulta-
neous localization and mapping (SLAM) problem. Factor graphs are a
family of probabilistic graphical models, other examples of which are
Bayesian networks and Markov random fields, which are well known
from the statistical modeling and machine learning literature. They
provide a powerful abstraction to give insight into particular inference
problems, making it easier to think about and design solutions, and
write modular, flexible software to perform the actual inference. Below
we illustrate their use in SLAM, one of the key problems in mobile
robotics. Other important problems associated with deploying robots
in the real world are localization, tracking, and calibration, all of which
can be phrased in terms of factor graphs, as well.
In this first section we introduce Bayesian networks and factor
graphs in the context of robotics problems. We start with Bayesian
networks as they are probably the most familiar to the reader, and
show how they are useful to model problems in robotics. However,
since sensor data is typically given to us, we introduce factor graphs
as a more relevant and economical representation. We show how a
Bayes net, after conditioning on the observed measurements, is
converted into the corresponding factor graph.
1.1 Inference Problems in Robotics
Figure 1.1: A toy SLAM (simultaneous localization and mapping) example with three robot poses and two landmarks. Above we schematically indicate the robot motion with arrows, while the dotted lines indicate bearing measurements.
To act sensibly in the world, robots need to infer knowledge about the
world from their sensors, while drawing on a priori knowledge. There
are many different such inference problems in robotics, but none of
them have received as much attention as simultaneous localization and
mapping (SLAM). We discuss SLAM in detail and use it as a moti-
vating example below. Other inference problems include localization in
a known environment, tracking other actors in the environment, and
multi-robot versions of all of the above. More specialized problems are
also of interest, e.g., calibration or long-term inertial navigation.
In the SLAM problem the goal is to localize a robot using the infor-
mation coming from the robot’s sensors. In a simple case this could be
a set of bearing measurements to a set of landmarks. If the landmarks’
positions are known, this comes down to a triangulation problem remi-
niscent of how ships navigate at sea. However, the additional wrinkle in
SLAM is that we do not know the landmark map a priori, and hence we
have to infer the unknown map simultaneously with localization with
respect to the evolving map.
Figure 1.2: Bayes net for the toy SLAM example from Figure 1.1. Above we showed measurements with square nodes, as these variables are typically observed.
In the figure, the measurements $z$ are shown as boxes, as they are observed. Per the general definition of Bayes nets, the joint density $p(X, Z) = p(x_1, x_2, x_3, l_1, l_2, z_1, z_2, z_3, z_4)$ is obtained as a product of conditional densities:
$$p(X, Z) = p(x_1)\,p(x_2|x_1)\,p(x_3|x_2)\;p(l_1)\,p(l_2)\;p(z_1|x_1)\,p(z_2|x_1, l_1)\,p(z_3|x_2, l_1)\,p(z_4|x_3, l_2).$$
One can see that the joint density in this case consists of four qualita-
tively different sets of factors:
• A “Markov chain” $p(x_1)\,p(x_2|x_1)\,p(x_3|x_2)$ on the poses $x_1$, $x_2$, and $x_3$, corresponding to the first three terms in the factorization above. The conditional densities $p(x_{t+1}|x_t)$ might represent prior knowledge or can be derived from known control inputs.
Note that the graph structure makes an explicit statement about data
association, i.e., for every measurement zk we know which landmark
it is a measurement of. While it is possible to model unknown data
association in a graphical model context, in this text we assume that
data association is given to us as the result of a pre-processing step.
1.4 Specifying Probability Densities
The exact form of the densities above depends very much on the appli-
cation and the sensors used. The most often-used densities involve the
multivariate Gaussian distribution, with probability density
$$\mathcal{N}(\theta; \mu, \Sigma) = \frac{1}{\sqrt{|2\pi\Sigma|}} \exp\left( -\frac{1}{2} \left\| \theta - \mu \right\|^2_{\Sigma} \right), \quad (1.8)$$
where µ ∈ Rn is the mean, Σ is an n × n covariance matrix, and
$$\left\| \theta - \mu \right\|^2_{\Sigma} \stackrel{\Delta}{=} (\theta - \mu)^\top \Sigma^{-1} (\theta - \mu) \quad (1.9)$$
denotes the squared Mahalanobis distance. For example, priors on un-
known quantities are often specified using a Gaussian density.
In many cases it is both justified and convenient to model mea-
surements as corrupted by zero-mean Gaussian noise. For example, a
bearing measurement from a given pose x to a given landmark l would
be modeled as
z = h(x, l) + η, (1.10)
where h(.) is a measurement prediction function, and the noise η
is drawn from a zero-mean Gaussian density with measurement covari-
ance R. This yields the following conditional density p(z|x, l) on the
measurement z:
$$p(z|x, l) = \mathcal{N}(z; h(x, l), R) = \frac{1}{\sqrt{|2\pi R|}} \exp\left( -\frac{1}{2} \left\| h(x, l) - z \right\|^2_{R} \right). \quad (1.11)$$
The measurement functions h(.) are often nonlinear in practical
robotics applications. Still, while they depend on the actual sensor
used, they are typically not difficult to reason about or write down.
The measurement function for a 2D bearing measurement is simply
$$h(x, l) = \mathrm{atan2}(l_y - x_y,\; l_x - x_x), \quad (1.12)$$
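As a sketch, simulating a noisy bearing measurement per Equations (1.10)-(1.12) takes only a few lines of numpy; the specific pose, landmark, and noise level below are illustrative assumptions, not values from the text:

```python
import numpy as np

def h_bearing(x, l):
    """Predicted bearing from position x to landmark l, Equation (1.12)."""
    return np.arctan2(l[1] - x[1], l[0] - x[0])

# Simulate z = h(x, l) + eta, Equation (1.10), with scalar bearing noise.
rng = np.random.default_rng(42)
x, l = np.array([0.0, 0.0]), np.array([2.0, 2.0])
sigma = 0.05  # bearing noise standard deviation [rad], hypothetical
z = h_bearing(x, l) + rng.normal(0.0, sigma)
print(z)
```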
$$p(x_{t+1}|x_t, u_t) = \frac{1}{\sqrt{|2\pi Q|}} \exp\left( -\frac{1}{2} \left\| g(x_t, u_t) - x_{t+1} \right\|^2_{Q} \right), \quad (1.14)$$
where g(.) is a motion model, and Q a covariance matrix of the appro-
priate dimensionality, e.g., 3 × 3 in the case of robots operating in the
plane. Note that for robots operating in three-dimensional space, we
will need slightly more sophisticated machinery to specify densities on
nonlinear manifolds such as SE(3), as discussed in Section 6.
1.5 Simulating from a Bayes Net Model
Given all the densities, we can simulate from the joint density by sampling each variable in topological order, e.g., by:
1. sampling the poses x1 , x2 , and x3 from p(x1 )p(x2 |x1 )p(x3 |x2 ),
i.e., simulate a robot trajectory;
2. sampling l1 and l2 from p(l1 ) and p(l2 ), i.e., generate some plau-
sible landmarks;
3. sampling the measurements $z_1, \ldots, z_4$ from their conditional densities, given the sampled poses and landmarks.
Many other topological orderings are possible. For example, steps 1 and
2 above can be switched without consequence. Also, we can generate
the pose measurement z1 at any time after x1 is generated, etc.
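A minimal ancestral-sampling sketch of this procedure follows; the motion model g(x, u) = x + u, the landmark priors, and the noise levels are all hypothetical choices for illustration, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

def motion(x, u, Q_sqrt):
    """Sample x_{t+1} ~ p(x_{t+1} | x_t, u_t), here with g(x, u) = x + u."""
    return x + u + Q_sqrt @ rng.standard_normal(2)

def bearing(x, l, sigma):
    """Sample z ~ p(z | x, l) per Equations (1.10)-(1.12)."""
    return np.arctan2(l[1] - x[1], l[0] - x[0]) + rng.normal(0.0, sigma)

Q_sqrt = 0.1 * np.eye(2)
# Step 1: simulate the robot trajectory from the Markov chain.
x1 = np.zeros(2)
x2 = motion(x1, np.array([1.0, 0.0]), Q_sqrt)
x3 = motion(x2, np.array([1.0, 0.0]), Q_sqrt)
# Step 2: generate plausible landmarks from their priors.
l1 = rng.normal([1.0, 1.0], 0.5)
l2 = rng.normal([2.0, 1.0], 0.5)
# Step 3: sample the bearing measurements given poses and landmarks.
z2, z3, z4 = bearing(x1, l1, 0.05), bearing(x2, l1, 0.05), bearing(x3, l2, 0.05)
```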
1.6 Maximum a Posteriori Inference
Now that we have the means to model the world, we can infer knowledge
about the world when given information about it. Above we saw how
to fully specify a joint density P (Θ) in terms of a Bayes net: its factor-
ization is given by its graphical structure, and its exact computational
form by specifying the associated priors and conditional densities.
In robotics we are typically interested in the unknown state vari-
ables X, such as poses and/or landmarks, given the measurements Z.
The most often used estimator for these unknown state variables X
is the maximum a posteriori or MAP estimate, so named because
it maximizes the posterior density p(X|Z) of the states X given the
measurements Z:
$$X^{MAP} = \mathop{\mathrm{argmax}}_{X}\; p(X|Z) \quad (1.15)$$
$$= \mathop{\mathrm{argmax}}_{X}\; \frac{p(Z|X)\, p(X)}{p(Z)}. \quad (1.16)$$
The second equation above is Bayes’ law, and expresses the posterior
as the product of the measurement density p(Z|X) and the prior p(X)
over the states, appropriately normalized by the factor p(Z).
However, a different expression of Bayes’ law is the key to under-
standing the true computation underlying MAP inference. Indeed, all
of the quantities in Bayes’ law as stated in (1.16) can in theory be
computed from the Bayes net. However, as the measurements Z are
given, the normalization factor p(Z) is irrelevant to the maximization
and can be dropped. In addition, while the conditional density p(Z|X)
is a properly normalized Gaussian density in Z, we are only concerned
with it as a function in the unknown states X. Hence the second and
more important form of Bayes’ law:
$$X^{MAP} = \mathop{\mathrm{argmax}}_{X}\; l(X; Z)\, p(X), \quad (1.17)$$
where the likelihood $l(X; Z)$ of the states $X$ given the measurements $Z$ is any function proportional to $p(Z|X)$.
1.7 Factor Graphs for Inference
While Bayes nets are a great language for modeling, factor graphs are
better suited to perform inference. Like Bayes nets, factor graphs allow
us to specify a joint density as a product of factors. However, they are
more general in that they can be used to specify any factored function
φ(X) over a set of variables X, not just probability densities.
To motivate this, consider performing MAP inference for the toy
SLAM example. After conditioning on the observed measurements Z,
the posterior $p(X|Z)$ can be re-written using Bayes’ law (1.16), up to a normalization constant, as
$$p(X|Z) \propto p(x_1)\,p(x_2|x_1)\,p(x_3|x_2)\;p(l_1)\,p(l_2)\;l(x_1; z_1)\,l(x_1, l_1; z_2)\,l(x_2, l_1; z_3)\,l(x_3, l_2; z_4).$$
Figure 1.3: Factor graph resulting from the Bayes net in Figure 1.2 on page 6 after conditioning on the measurements Z.
where the correspondence between the factors and the original proba-
bility densities and/or likelihood factors in Equations 1.20-1.23 should
be obvious, e.g., φ7 (x1 , l1 ) = l(x1 , l1 ; z2 ) ∝ p(z2 |x1 , l1 ).
1.10 Bibliographic Remarks
The SLAM problem [174, 129, 186] has received considerable attention
in mobile robotics as it is one way to enable a robot to explore and nav-
igate previously unknown environments. In addition, in many applica-
tions the map of the environment itself is the artifact of interest, e.g., in
urban reconstruction, search-and-rescue operations, and battlefield re-
connaissance. As such, it is one of the core competencies of autonomous
robots [187]. A comprehensive review was done by Durrant-Whyte and
Bailey in 2006 [59, 6] and more recently by Cadena et al. [19], but the
field is still generating a steady stream of contributions at the top-tier
robotics conferences.
The foundational book by Pearl [163] is still one of the best places
to read about Bayesian probability and Bayesian networks, as is the
tome by Koller and Friedman [121], and the book by Darwiche [38].
Although in these works the emphasis is (mostly) on problems with
discrete-valued unknowns, they can just as easily be applied to contin-
uous estimation problems like SLAM.
2 Smoothing and Mapping
The factor graph for a more realistic SLAM problem than the toy ex-
ample from the previous section could look something like Figure 2.1.
This graph was created by simulating a 2D robot, moving in the plane
for about 100 time steps, as it observes landmarks. For visualization
purposes each robot pose and landmark is rendered at its ground truth
position in 2D. With this, we see that the odometry factors form a
prominent, chain-like backbone, whereas off to the sides binary likeli-
hood factors are connected to the 20 or so landmarks. All factors in
such SLAM problems are typically nonlinear, except for priors.
We now show that MAP inference for SLAM problems with Gaussian
noise models is equivalent to solving a nonlinear least-squares problem.
Indeed, for an arbitrary factor graph, MAP inference comes down to
maximizing the product (1.24) of all factor graph potentials:
$$X^{MAP} = \mathop{\mathrm{argmax}}_{X} \prod_i \phi_i(X_i). \quad (2.2)$$
Let us for now assume that all factors are of the form
$$\phi_i(X_i) \propto \exp\left( -\frac{1}{2} \left\| h_i(X_i) - z_i \right\|^2_{\Sigma_i} \right), \quad (2.3)$$
which includes both simple Gaussian priors and likelihood factors derived from measurements corrupted by zero-mean, normally distributed noise. Taking the negative log of (2.2) and dropping the factor 1/2 allows us to instead minimize a sum of nonlinear least-squares terms:
$$X^{MAP} = \mathop{\mathrm{argmin}}_{X} \sum_i \left\| h_i(X_i) - z_i \right\|^2_{\Sigma_i}. \quad (2.4)$$
2.3 Linearization
Nonlinear least-squares problems are solved by repeated linearization. Around a linearization point $X_i^0$, the measurement functions are approximated by the first-order Taylor expansion
$$h_i(X_i) \approx h_i(X_i^0) + H_i \Delta_i, \quad (2.5)$$
where $H_i$ is the measurement Jacobian of $h_i$ at $X_i^0$, and $\Delta_i \stackrel{\Delta}{=} X_i - X_i^0$ is the state update vector. Note that we make
an assumption that Xi lives in a vector-space or, equivalently, can be
represented by a vector. This is not always the case, e.g., when some of
the unknown states in X represent 3D rotations or other more complex
manifold types. We will revisit this issue in Section 6.
Substituting the Taylor expansion (2.5) into the nonlinear least-squares expression (2.4), we obtain a linear least-squares problem in the state update vector $\Delta$:
$$\Delta^* = \mathop{\mathrm{argmin}}_{\Delta} \sum_i \left\| h_i(X_i^0) + H_i \Delta_i - z_i \right\|^2_{\Sigma_i} \quad (2.7)$$
$$= \mathop{\mathrm{argmin}}_{\Delta} \sum_i \left\| H_i \Delta_i - \left\{ z_i - h_i(X_i^0) \right\} \right\|^2_{\Sigma_i}, \quad (2.8)$$
where we made use of the equalities from Equation 2.18. Clearly, $\|e\|_2^2$ will be the least-squares sum of squared residuals, and the least-squares solution $\Delta^*$ can be obtained by solving the triangular system
$$R\, \Delta^* = d \quad (2.21)$$
via back-substitution. The cost of QR is dominated by the cost of the Householder reflections, which is $2(m - n/3)n^2$. Comparing this with Cholesky, we see that both algorithms require $O(mn^2)$ operations when $m \gg n$, but that QR factorization is slower by a factor of 2.
Note that the upper-triangular factor R obtained using QR factor-
ization is the same (up to possible sign changes on the diagonal) as
would be obtained by Cholesky factorization, as
$$A^\top A = \left( Q \begin{bmatrix} R \\ 0 \end{bmatrix} \right)^{\!\top} \left( Q \begin{bmatrix} R \\ 0 \end{bmatrix} \right) = R^\top R, \quad (2.22)$$
where we again made use of the fact that Q is orthogonal.
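The equality (2.22) is easy to verify numerically; the following numpy sketch checks that the R factor from QR matches the Cholesky factor of $A^\top A$ up to the signs of its rows:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((9, 5))          # a small dense "Jacobian"

R_qr = np.linalg.qr(A, mode='r')         # upper-triangular R from QR
R_chol = np.linalg.cholesky(A.T @ A).T   # upper-triangular Cholesky factor of A^T A

# Per Equation (2.22) the two agree up to the signs of the rows of R:
S = np.diag(np.sign(np.diag(R_qr)))
assert np.allclose(S @ R_qr, R_chol)
```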
There are efficient algorithms for factorizing large sparse matrices,
for both QR and Cholesky variants. Depending on the amount of non-
zeros and on the sparsity structure, the cost of a sparse factorization
can be far lower than its dense equivalent. Efficient software implemen-
tations are available, e.g., CHOLMOD [28] and SuiteSparseQR [39],
which are also used under the hood by MATLAB. In practice sparse
Cholesky and LDL factorization outperform QR factorization on sparse
problems as well, and not just by a constant factor.
In summary, the optimization problem associated with SLAM can
be concisely stated in terms of sparse linear algebra. It comes down to
factorizing either the information matrix Λ or the measurement Jaco-
bian A into square root form. Because they are based on matrix square
roots derived from the smoothing and mapping (SAM) problem, we have referred to this family of approaches as square root SAM, or $\sqrt{\text{SAM}}$ for short [46, 48].
2.5 Nonlinear Optimization for MAP Inference
The MAP estimate is obtained by minimizing the nonlinear least-squares objective $g(X) = \frac{1}{2}\sum_i \|h_i(X_i) - z_i\|^2_{\Sigma_i}$, which corresponds to a nonlinear factor graph derived from the measurements along with prior densities on some or all unknowns.
All of the algorithms share the following basic structure: They start
from an initial estimate $X^0$. In each iteration, an update step $\Delta$ is calculated and applied to obtain the next estimate $X^{t+1} = X^t + \Delta$.
This process ends when certain convergence criteria are reached, such
as the change ∆ falling below a small threshold.
2.5.2 Gauss-Newton
Gauss-Newton (GN) provides faster convergence by using a second-order update. GN exploits the special structure of the nonlinear least-squares problem: the Hessian is approximated from the Jacobian as $A^\top A$, avoiding the computation of second derivatives.
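A minimal Gauss-Newton loop might look as follows; this sketch assumes the residuals have already been whitened by their covariances so that plain least-squares applies, and the function names are ours:

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, iters=10, tol=1e-9):
    """Minimize ||residual(x)||^2 by repeatedly solving the linearized
    least-squares problem (2.7)-(2.8) for the update Delta."""
    x = x0.copy()
    for _ in range(iters):
        A, b = jacobian(x), -residual(x)       # linearize: A @ delta ~ b
        delta, *_ = np.linalg.lstsq(A, b, rcond=None)
        x = x + delta                          # X^{t+1} = X^t + Delta
        if np.linalg.norm(delta) < tol:
            break
    return x
```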
2.5.3 Levenberg-Marquardt
The Levenberg-Marquardt (LM) algorithm allows for iterating multiple
times to convergence while controlling in which region one is willing to
trust the quadratic approximation made by Gauss-Newton. Hence, such
a method is often called a trust region method.
To combine the advantages of both the SD and GN methods, Lev-
enberg [133] proposed to modify the normal equations (2.14) by adding
a non-negative constant $\lambda \in \mathbb{R}^+ \cup \{0\}$ to the diagonal:
$$\left( A^\top A + \lambda I \right) \Delta_{lm} = A^\top b. \quad (2.26)$$
Note that for $\lambda = 0$ we obtain GN, and for large $\lambda$ we approximately obtain $\Delta^* \approx \frac{1}{\lambda} A^\top b$, an update in the negative gradient direction of the cost function $g$ (2.23). Hence, LM can be seen to blend naturally between the Gauss-Newton and Steepest Descent methods.
Marquardt [144] later proposed to take into account the scaling of
the diagonal entries to provide faster convergence:
$$\left( A^\top A + \lambda\, \mathrm{diag}(A^\top A) \right) \Delta_{lm} = A^\top b. \quad (2.27)$$
This modification causes larger steps in the steepest descent direction
if the gradient is small (nearly flat directions of the objective func-
tion) because there the inverse of the diagonal entries will be large.
Conversely, in steep directions of the objective function the algorithm
becomes more cautious and takes smaller steps. Both modifications of
the normal equations can be interpreted in Bayesian terms as adding
a zero-mean prior to the system.
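A single LM step per (2.27) is then a one-liner on the damped normal equations; as above, this sketch assumes whitened A and b:

```python
import numpy as np

def lm_step(A, b, lam):
    """Solve the damped normal equations (2.27):
    (A^T A + lam * diag(A^T A)) Delta = A^T b."""
    H = A.T @ A
    return np.linalg.solve(H + lam * np.diag(np.diag(H)), A.T @ b)
```

In a full implementation, λ is decreased when a step reduces the cost, and increased (with the step rejected) when it does not.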
Figure 2.2: Powell’s dogleg algorithm combines the separately computed Gauss-Newton and gradient descent update steps.
The quality of an update $\Delta$ is measured by the gain ratio
$$\rho = \frac{g(X^t) - g(X^t + \Delta)}{L(0) - L(\Delta)}, \quad (2.28)$$
which compares the actual decrease in cost with the decrease predicted by the linearized model $L$.
3 Exploiting Sparsity
3.1 On Sparsity
Dense methods will not scale to realistic problem sizes in SLAM. In the
introduction we looked at a small toy problem to explain Bayes nets and
factor graph formulations, for which a dense method will work fine. The
larger simulation example, with its factor graph shown in Figure 2.1 on
page 18, is more representative of real-world problems. However, it is
still relatively small as real SLAM problems go, where problems with tens of thousands of unknowns are not uncommon.
Figure 3.1: Factor graph (again) for the toy SLAM example.
Figure 3.2: Block structure of the sparse Jacobian A for the toy SLAM example, with $\Delta = \left[\, \delta l_1^\top \; \delta l_2^\top \; \delta x_1^\top \; \delta x_2^\top \; \delta x_3^\top \,\right]^\top$.
After linearization we obtain a sparse system $[A|b]$ with the block structure in Figure 3.2. Com-
paring this with the factor graph, it is obvious that every factor cor-
responds to a block row, and every variable corresponds to a block
column of A. In total there are nine block-rows, one for every factor in
the factorization of φ(l1 , l2 , x1 , x2 , x3 ).
Figure 3.3: The Hessian matrix $\Lambda \stackrel{\Delta}{=} A^\top A$ for the toy SLAM problem.
Figure 3.4: The Hessian matrix Λ can be interpreted as the matrix associated with the Markov random field representation for the problem.
We will not use the MRF graphical model much beyond that. Note that one can
develop an equivalent theory of how MRFs represent a different family
of factored probability densities, see e.g. Koller and Friedman [121]. In
the linear-Gaussian case, for instance, the least-squares error can be re-written as a sum of terms in the individual variables and terms in pairs of variables, involving the blocks of $\Lambda$ and of $g \stackrel{\Delta}{=} A^\top b$. After exponentiating, we see that the induced Gaussian density has the form
$$p(\Delta) \propto \exp\left( -\frac{1}{2} \left\| A\Delta - b \right\|_2^2 \right) \propto \prod_j \phi_j(\Delta_j) \prod_{ij} \psi_{ij}(\Delta_i, \Delta_j), \quad (3.4)$$
which is the general form for densities induced by binary MRFs [204].
In what follows, however, factor graphs are better suited to our
needs. They are able to express a finer-grained factorization, and are
more closely related to the original problem formulation. For example,
if there exist ternary (or higher arity) factors in the factor graph, the
graph G of the equivalent MRF connects those nodes in an undirected
clique (a fully connected subgraph), but the origin of the corresponding
clique potential is lost. In linear algebra, this reflects the fact that many
matrices A can yield the same Λ = A> A matrix: important information
on the sparsity is lost.
3.2 The Elimination Algorithm
The elimination algorithm converts a factor graph into a Bayes net on the unknown variables X. This then allows for easy MAP inference, and even other
operations such as sampling (as we saw before) and/or marginalization.
In particular, the variable elimination algorithm is a way to fac-
torize any factor graph of the form
$$\phi(X) = \phi(x_1, \ldots, x_n) \quad (3.5)$$
into a factored Bayes net probability density of the form
$$p(X) = p(x_1|S_1)\, p(x_2|S_2) \cdots p(x_n) = \prod_j p(x_j|S_j), \quad (3.6)$$
Figure 3.5: Variable elimination for the toy SLAM example, transforming the factor graph from Figure 3.1 into a Bayes net, using the ordering $l_1, l_2, x_1, x_2, x_3$ (steps (a)-(e)).
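The symbolic part of this process is compact enough to sketch in a few lines of Python; the representation of factors as variable sets and the helper name are ours, not from the text:

```python
# A minimal symbolic sketch of variable elimination: at each step, collect
# the factors adjacent to the variable, form the separator, emit a
# conditional p(x_j | S_j), and add a new factor on the separator.
def eliminate(factors, ordering):
    factors = [set(f) for f in factors]
    conditionals = []
    for var in ordering:
        adjacent = [f for f in factors if var in f]
        separator = set().union(*adjacent) - {var} if adjacent else set()
        conditionals.append((var, sorted(separator)))   # p(var | separator)
        factors = [f for f in factors if var not in f]
        if separator:
            factors.append(separator)                   # new factor on S_j
    return conditionals

# Toy SLAM example with ordering l1, l2, x1, x2, x3 (cf. Figure 3.5):
graph = [{'x1'}, {'x1', 'x2'}, {'x2', 'x3'}, {'l1'}, {'l2'},
         {'x1'}, {'l1', 'x1'}, {'l1', 'x2'}, {'l2', 'x3'}]
for var, sep in eliminate(graph, ['l1', 'l2', 'x1', 'x2', 'x3']):
    print(f"p({var} | {', '.join(sep) or '-'})")
```

Running this on the toy graph reproduces the conditionals of Figure 3.5: p(l1 | x1, x2), p(l2 | x3), p(x1 | x2), p(x2 | x3), and p(x3).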
The product of the factors adjacent to the variable $x_j$ being eliminated can be written as
$$\psi(x_j, S_j) = \exp\left\{ -\frac{1}{2} \sum_i \left\| A_i X_i - b_i \right\|_2^2 \right\} \quad (3.14)$$
$$= \exp\left( -\frac{1}{2} \left\| \bar{A}_j\, [x_j; S_j] - \bar{b}_j \right\|_2^2 \right), \quad (3.15)$$
with
$$\bar{A}_1 \stackrel{\Delta}{=} \begin{bmatrix} A_{41} & & \\ A_{71} & A_{73} & \\ A_{81} & & A_{84} \end{bmatrix}, \qquad \bar{b}_1 \stackrel{\Delta}{=} \begin{bmatrix} b_4 \\ b_7 \\ b_8 \end{bmatrix}. \quad (3.17)$$
Looking at the sparse Jacobian in Figure 3.2 on page 32, this simply
boils down to taking out the block rows with non-zero blocks in the
first column, corresponding to the three factors adjacent to l1 .
where we used the fact that the orthogonal matrix Q does not alter the
value of the norms involved.
Example. In Figure 3.6 we show the result of eliminating the first
variable in the example, the landmark l1 with separator {x1 , x2 }. We
show the operation on the factor graph and the corresponding effect on
the sparse Jacobian from Figure 3.2, omitting the RHS. The partition
above the line corresponds to a sparse, upper-triangular matrix R that
is being formed. New contributions to the matrix are shown in boldface:
blue for the contributions to R, and red for newly created factors.
Figure 3.6: The result of eliminating the first variable, the landmark l1 with separator {x1 , x2 }, shown on the factor graph and on the corresponding sparse Jacobian from Figure 3.2.
Figure 3.7: The remaining elimination steps for the toy example, completing a full
multifrontal QR factorization.
Back-substitution proceeds in reverse elimination order, and is straightforward since by construction the MAP estimate $S_j^*$ for the separator is fully known by this point.
3.5 Discussion
Insight into the graphs underlying robotics inference, and how their
sparsity is affected by the implementation choices we make, is crucial
for achieving highly performant algorithms. In this section we discuss
elimination ordering strategies and their effect on performance. This
will also allow us, in Section 5, to understand the effects of marginal-
izing out variables, and its possibly deleterious effect on sparsity, espe-
cially in the SLAM case. Other inference problems in robotics benefit
from only keeping track of the most recent state estimate; this leads to
filtering and/or fixed-lag smoothing algorithms.
4 Elimination Ordering
4.1 Complexity of Elimination
The total cost of the elimination algorithm is the sum of the costs of eliminating the individual variables,
$$f(\Phi_{1:n}) = \sum_{j=1}^{n} g(\Phi_{j:n}, x_j), \quad (4.1)$$
and this cost heavily depends on the application. However, for the case of
sparse matrix factorization, the cost of Algorithm 3.2 is dominated by
the partial QR factorization step (Section 3.3.3). The main cost in QR
is incurred by the Householder reflections, which—without going into
detail—take $4 m_k n_k$ flops when applied to an $m_k \times n_k$ matrix. We need one Householder reflection for every dimension $k$ in $x_j$, hence
$$g(\Phi_{j:n}, x_j) \approx \sum_{k=1}^{n_j} 4 m_k n_k = \sum_{k=1}^{n_j} 4(m_j - k)(n_j + s_j + 1 - k). \quad (4.2)$$
For a dense matrix, the total cost adds up to
$$f(\Phi_{1:n}) = \sum_{k=1}^{n-1} 4(m - k)(n + 1 - k) = 2(m - n/3)n^2 + O(mn). \quad (4.3)$$
However, for the sparse, multifrontal algorithm the flop count will be
much lower than this number. In addition, the multifrontal method
can make use of Level-3 BLAS methods that exploit cache coherence
or even thread parallelism for larger elimination steps [39].
Example. For our running SLAM example, the multifrontal QR algorithm is shown in Figures 3.6 and 3.7.
Figure 4.1: Bayes net resulting from using the ordering x3 , x2 , x1 , l2 , l1 for the
robotics example (the reverse ordering from Figure 3.5 on page 36).
The flop count for sparse factorization will be much lower than for a
dense matrix, but can vary dramatically for different elimination or-
derings. While any order will ultimately produce an identical MAP
estimate, the order in which variables are eliminated matters, as differ-
ent orderings lead to Bayes nets with different topologies. This will in
turn affect the computational complexity of the elimination algorithm,
as the sizes sj of the separators at each step are affected. To illustrate,
let us look at two examples below.
Example 1. In our toy example, the result of using the reverse
ordering from before is shown in Figure 4.1. Note that the resulting
Bayes net is almost fully dense: only the first variable eliminated, x3 ,
is not fully connected to the variables later in the ordering. This figure
should be compared with the final elimination stage of Figure 3.7 on
page 41.
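The effect is easy to reproduce numerically. The sketch below builds a small matrix with the sparsity pattern of Figure 3.3 (the 0.2 off-diagonal values are arbitrary, chosen only to keep the matrix positive definite) and counts the non-zeros of the Cholesky factor under both orderings:

```python
import numpy as np

def fill_in(Lam, order):
    """Count non-zeros in the Cholesky factor of Lam under a given ordering."""
    P = np.eye(len(order))[order]            # permutation matrix
    R = np.linalg.cholesky(P @ Lam @ P.T).T  # upper-triangular factor
    return np.count_nonzero(np.abs(R) > 1e-12)

# Toy information matrix with the sparsity of Figure 3.3
# (variable order l1, l2, x1, x2, x3):
Lam = np.eye(5)
for i, j in [(0, 2), (0, 3), (1, 4), (2, 3), (3, 4)]:
    Lam[i, j] = Lam[j, i] = 0.2

print(fill_in(Lam, [0, 1, 2, 3, 4]))  # ordering l1, l2, x1, x2, x3: no fill-in
print(fill_in(Lam, [4, 3, 2, 1, 0]))  # reverse ordering: more fill-in
```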
Example 2. A more realistic example shows what is really at stake.
To this end, recall the larger simulation example, with its factor graph
shown in Figure 2.1 on page 18. The sparsity patterns for the corre-
Figure 4.2: On the left, the measurement Jacobian A associated with the problem in Figure 2.1, which has 3 × 95 + 2 × 24 = 333 unknowns. The number of rows, 1126, is equal to the number of (scalar) measurements. Also given is the number of non-zero entries “nnz”. On the right: (top) the information matrix $\Lambda \stackrel{\Delta}{=} A^\top A$; (middle) its upper triangular Cholesky triangle R; (bottom) an alternative factor amdR obtained with a better variable ordering (COLAMD).
Figure 4.3: Fill-in occurs when the DAG resulting from elimination into R has
extra edges with respect to the Markov random field G corresponding to Λ.
• For planar graphs we can recursively find separators of size less than $2\sqrt{2n}$, and the resulting factorization will cost $O(n^{1.5})$ flops.

1. Partition $G$ into subgraphs $A$, $B$ and $C$, with $|A|, |B| \le \frac{2}{3}n$ and $|C| \le 2\sqrt{2n}$.
Figure 4.4: The so-called “Schur complement trick” eliminates all landmarks first,
but this leads to a very connected graph on the robot poses. Doing the reverse (poses
first) would similarly lead to a fully connected clique of landmarks.
Figure 4.6: Comparing the information matrix and the Cholesky factor for the two
different orderings from Figures 4.4 and 4.5; nz = number of non-zeros.
Figure 4.7: By reordering while taking into account the special block structure of
the SLAM problem, the non-zero count can be reduced even further, to about 130K,
a reduction by a factor of 20 with respect to the original R.
Figure 4.9: Simulated block world example that has a global planar graph structure. The table shows the level of the ND recursion, the number of nodes n in each partition, $\sqrt{n}$, the size of the next partitions, and the separator size.

level        n      √n    |A|, |B|    |C|
(a)     25,000   158.1       12446    108
(b)      12446   111.5        6196     54
(c)       6196    78.7        3071     54
(d)       3071    55.4        1518     36
(e)       1518    38.9         741     36
(f)        741    27.2         362     18
To illustrate the $\sqrt{n}$ separator theorem at work in these types of
environments, the table in Figure 4.9 shows an example of the relevant
partition and separator sizes for a block-world with 25,000 nodes.
5 Incremental Smoothing and Mapping
Figure 5.1: (top left) A map of ten floors of the MIT Stata Center created using a
reduced pose graph [104] in combination with a real-time visual SLAM system. The
data used was collected in 14 sessions spanning a six-month period. The total oper-
ation time was nine hours and the distance traveled was 11km. Elevator transitions
are shown as vertical blue lines. The view is orthographic and the vertical axis has
been exaggerated to make it easier to see each floor. The 2nd floor is approximately
90m across. (top right) Floor plans for each of the floors that were mapped. (bot-
tom) Map for one of the sessions covering the second floor of the MIT Stata Center
with ground truth floor plan added for reference. The dense point cloud consists of
RGB-D frames rendered at the solution of the pose graph. The pose graph itself is
derived from sparse stereo point features.
where $\Delta \stackrel{\Delta}{=} X - X^0$ is the state update vector, and $A$ and $b$ represent the measurement Jacobian and right-hand side of the linearized system.
5.2 Updating a Matrix Factorization
Figure 5.2: Using a Givens rotation as a step in transforming a (sparse) matrix
into upper triangular form. The entry marked ’x’ is eliminated, changing some of
the entries marked in red (dark), depending on sparsity.
A series of Givens rotations are applied to zero out the new row, starting from the left-most nonzero entry $(i, j_1)$ and resulting in an upper triangular matrix that contains the updated factor $R'$:
$$G_{j_k} \cdots G_{j_2} G_{j_1} R_a = \begin{bmatrix} R' \\ 0 \end{bmatrix}. \quad (5.6)$$
Figure 5.3: Incrementally updating the square-root information matrix for an ex-
ploration task: new measurement rows are added to the upper triangular factor R
and the right-hand side (RHS). The left column shows the updates for the first three
steps, the right column shows the update after 50 steps. The update operation is
symbolically denoted by ⊕. Entries that remain unchanged are shown in light blue.
The Kalman filter and smoother [145] are special cases of incrementally
updating a linear system. For simplicity, let us consider the simplest
possible factor graph structure—a simple chain—which is most often
encountered in localization tasks, or for example in pose graph SLAM
in the absence of loop-closure detection. To make the connection with
the Kalman smoother we first describe the concept of marginalization
in factor graphs, and then discuss two popular methods in robotics that
are based on marginalization: fixed-lag smoothing and filtering.
5.3.1 Marginalization
Even with incremental updating, memory usage and computation are
still unbounded in time; one solution is to remove older variables with-
out removing information, a process that is called marginalization.
Stated in terms of probability densities, if we have a joint density p(x, y)
over two variables x and y, then marginalizing out the variable x cor-
responds to integrating over x, i.e.,
$$p(y) = \int_x p(x, y), \quad (5.8)$$
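For the linear-Gaussian case this integral has a closed form: in information form, marginalization is a Schur complement. A sketch, with hypothetical helper and argument names:

```python
import numpy as np

def marginalize_info(Lam, eta, keep, drop):
    """Marginalize out the variables indexed by `drop` from a Gaussian in
    information form (Lam, eta): the result has information matrix
    Lam_yy - Lam_yx Lam_xx^{-1} Lam_xy (the Schur complement)."""
    Lxx = Lam[np.ix_(drop, drop)]
    Lyx = Lam[np.ix_(keep, drop)]
    Lyy = Lam[np.ix_(keep, keep)]
    W = Lyx @ np.linalg.inv(Lxx)
    return Lyy - W @ Lam[np.ix_(drop, keep)], eta[keep] - W @ eta[drop]
```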
Figure 5.4: Fixed-lag smoothing for a localization scenario, which illustrates incremental inference with the language of graphical models. (a) Fully eliminated Bayes net after having seen four measurements. In a fixed-lag smoothing scenario with n = 3, we “forget” about the pose x1 , hence it is grayed out. This is mathematically equivalent to marginalization. (b) At the next time step, the new pose x5 is predicted using a motion model factor, and measured via a new unary factor, both colored red. We also convert back the density on pose x4 to a unary factor, in black. (c) After eliminating x4 and then x5 , we arrive back at a fully eliminated Bayes net. In the linear case we do not have to re-eliminate x3 , as it would yield the same result.
because the variable x1 is a leaf, i.e., it does not point to any other
variable, it can be marginalized without any further computation. This
is indicated in the figure by graying x1 out.
In Figure 5.4b, a new pose x5 is added to the system, along with a
new relative constraint between x4 and x5 and a unary measurement
constraint on x5 . These are shown in the figure as a red binary and
unary factor, respectively. We now do something unexpected: the root
density p(x4 ) is converted back to a unary factor f (x4 ), indicated in
black in the figure. In other words, we build a small “mini factor graph”
on the two variables x4 and x5 that takes into account the previous
information on x4 , as well as the measurements associated with the new
pose x5 . Since we added a new variable, we marginalize out the now
oldest variable in the lag, the pose x2 , leaving only the three variables
x3 . . . x5 in play (the marginalization step).
The elimination algorithm yields a Bayes net with joint density $p(X) = \prod_j p(x_j|\pi_j)$, where $\pi_j$ are the parent nodes of $x_j$. However, although the Bayes net
is chordal, at this variable level it is still a non-trivial graph: neither
chain-like nor tree-structured. The chordal Bayes net for our running
toy SLAM example is shown in Figure 3.5e on page 36, and it is clear
that there is an undirected cycle x1 − x2 − l1 .
Figure 5.5: The Bayes tree (b) and the associated square root information matrix
R (c) describing the clique structure in the chordal Bayes net (a) based on our
canonical example from Figure 1.3. A Bayes tree is similar to a clique tree, but
is better at capturing the formal equivalence between sparse linear algebra and
inference in graphical models. The association of cliques with rows in the R factor
is indicated by color.
More formally, a Bayes tree is a directed tree where the nodes rep-
resent cliques Ck of the underlying chordal Bayes net. In particular,
we define one conditional density p(Fk |Sk ) per node, with the sepa-
rator Sk as the intersection Ck ∩ Πk of the clique Ck and its parent
clique Πk . The frontal variables Fk are the remaining variables, i.e.
$F_k \stackrel{\Delta}{=} C_k \setminus S_k$. We write $C_k = F_k : S_k$. The following expression gives the joint density $p(X)$ on the variables $X$ defined by a Bayes tree:
$$p(X) = \prod_k p(F_k | S_k).$$
For the root Fr the separator is empty, i.e., it is a simple prior p(Fr )
on the root variables. The way Bayes trees are defined, the separator
Figure 5.6: Updating a Bayes tree with a new factor, based on the example in
Figure 5.5. The affected part of the Bayes tree is highlighted for the case of adding
a new factor between x1 and x3 . Note that the right branch (green) is not affected
by the change. (top right) The factor graph generated from the affected part of the
Bayes tree with the new factor (dashed blue) inserted. (bottom right) The chordal
Bayes net resulting from eliminating the factor graph. (bottom left) The Bayes tree
created from the chordal Bayes net, with the unmodified right “orphan” sub-tree
from the original Bayes tree added back in.
Two key properties arise from the fact that the Bayes tree encodes the information flow during elimination. First, the Bayes tree is formed from the chordal Bayes net following
the inverse elimination order. In this way, variables in each clique col-
lect information from their child cliques via the elimination of these
children. Thus, information in any clique propagates only upwards to
the root. Second, the information from a factor enters elimination only
when the first variable connected to that factor is eliminated. Combin-
ing these two properties, we see that a new factor cannot influence any
other variables that are not successors of the factor’s variables. How-
ever, a factor on variables having different (i.e., independent) paths to
the root means that these paths must now be re-eliminated to express
the new dependency between them.
Algorithm 5.1 shows the pseudo-code for the Bayes tree updat-
ing scheme, and Figure 5.6 shows how these incremental factoriza-
tion/inference steps are applied to our canonical SLAM example. In
this example, we add a new factor between x1 and x3 , affecting only
the left branch of the tree, marked by the red dashed line in the top
left figure. We then create the factor graph shown in the top right fig-
ure by creating a factor for each of the clique densities, p(x2 , x3 ) and
p(l1 , x1 |x2 ), and add the new factor f (x1 , x3 ). The bottom right figure
shows the eliminated graph using the ordering l1 , x1 , x2 , x3 . And finally,
in the bottom left figure, the reassembled Bayes tree is shown consist-
ing of two parts: the Bayes tree derived from the eliminated graph, and
the unaffected clique from the original Bayes tree (shown in green).
Figure 5.7 shows an example of the Bayes tree for a small SLAM
sequence. Shown is the tree for step 400 of the well-known Manhattan
world simulated sequence by Olson et al. [157]. As a robot explores the
environment, new measurements only affect parts of the tree, and only
those parts are re-calculated.
Putting all of the above together and addressing some practical considerations about re-linearization yields the state-of-the-art incremental nonlinear smoothing algorithm iSAM2.
Figure 5.7: An example of the Bayes tree data structure for a small SLAM se-
quence. The incremental nonlinear least-squares estimation algorithm iSAM2 [109]
is based on viewing incremental factorization as editing the graphical model cor-
responding to the posterior probability of the solution, the Bayes tree. As a robot
explores the environment, new measurements often only affect small parts of the
tree, and only those parts are re-calculated (shown in red).
For re-linearization it is necessary to go back to the original factors, instead of directly turning the cliques into a fac-
tor graph. And that requires caching certain quantities during elimina-
tion. The overall incremental nonlinear algorithm, iSAM2, is described
in much more detail in [109].
Just as we discussed the linear Kalman filter and fixed-lag smoother
in terms of graphical models, the well-known Extended Kalman Filter
and Smoother [145] can be seen as a special case of the iSAM2 algo-
rithm, where the Bayes tree reduces to a chain rather than a proper
tree. Or, another way to state this is that the Bayes tree representa-
tion enabled us to generalize the earlier non-linear estimators to general
graphical models. Indeed, iSAM and iSAM2 have been applied success-
fully to many different robotics estimation problems with non-trivial
constraints between variables that number into the millions, as will be
discussed in depth in Section 7.
5.5 Bibliographic Remarks
The Bayes tree data structure is similar to the clique tree [165, 17, 121], also known as
the junction tree in the AI literature [32], which has also been exploited
for distributed inference in SLAM by Paskin [160], Dellaert et al. [49],
and Pinies et al. [164].
To combine the advantages of the graphical model and sparse linear
algebra perspectives, we propose a novel data structure, the Bayes tree,
first presented in Kaess et al. [107]. Our approach is based on viewing
matrix factorization as eliminating a factor graph into a Bayes net,
which is the graphical model equivalent of the square root information
matrix. However, the Bayes tree is directed and corresponds more nat-
urally to the result of QR and Cholesky factorization in linear algebra,
allowing us to analyze it in terms of conditional probability densities in
the tree. As trees of cliques in chordal graphs, Bayes trees are similar
to clique trees [17] and junction trees [121]. However, a Bayes tree is
directed and is semantically closer to a Bayes net in the way it encodes
a factored probability density.
As defined in [121], a cluster tree is a directed tree of clusters in
which the running intersection property holds, and each factor in the
original graph is associated with a cluster. The cluster tree is more
general than the junction tree or the clique tree. Ni and Dellaert [152]
proposed TSAM 2, a multi-level SLAM approach that combines nested
dissection [78, 135] with a cluster tree representation.
Exploiting the Bayes tree and the insights gained, we proposed
iSAM2 [108, 109], a novel incremental exact inference method that
allows for incremental reordering and just-in-time re-linearization.
iSAM2 extends our original iSAM algorithm by leveraging these in-
sights about the connections between graphical model and sparse lin-
ear algebra perspectives. To the best of our knowledge this is a com-
pletely novel approach to providing an efficient and exact solution to a
sparse nonlinear optimization problem in an incremental setting, with
general applications beyond SLAM. While standard nonlinear opti-
mization methods repeatedly solve a linear batch problem to update
the linearization point, our Bayes tree-based algorithm allows fluid re-
linearization of a reduced set of variables, which translates into higher
efficiency, while retaining sparseness and full accuracy.
6 Optimization on Manifolds
6.1 Attitude and Heading Estimation
Some rotation representations use more than 3 numbers, but they obey constraints that make them re-
fer to the same underlying three-dimensional rotation manifold. Ap-
pendix B.3 defines the 3D rotation manifold and describes the different
representations in more detail.
Now, as an example, let us assume that we have access to an ac-
celerometer that is subject to no forces except gravity, and outputs a
three-dimensional measurement za ∈ R3 . Furthermore, we assume that
we have access to the following measurement function that predicts the
accelerometer reading za from the rotation R:
$$h^a : SO(3) \to \mathbb{R}^3 : R \mapsto z^a. \quad (6.1)$$
Given what we have seen so far, it is natural to try and estimate the
unknown rotation R by minimizing the least-squares error criterion
$$R^* = \mathop{\mathrm{argmin}}_{R} \left\| h^a(R) - z^a \right\|^2. \quad (6.2)$$
However, this does not quite work for 3D rotations, even if we prop-
erly vectorize the initial rotation estimate R0 to add the 9-dimensional
update vector ξ. This is because adding an arbitrary 3 × 3 matrix Ξ
obtained from a 9-dimensional vector ξ almost inevitably moves away
from the SO(3) manifold. In particular, the matrix R + Ξ is almost
surely no longer orthogonal. In addition, we know that rotations have
only three degrees of freedom, not nine, so it seems counter-intuitive
to work with 9-dimensional increments.
6.1.1 Incremental Rotations
A better approach starts from incremental rotations: to first order, a small rotation with coordinates $\xi = (\xi_x, \xi_y, \xi_z)$ is given by
$$R(\xi) \approx \begin{bmatrix} 1 & -\xi_z & \xi_y \\ \xi_z & 1 & -\xi_x \\ -\xi_y & \xi_x & 1 \end{bmatrix}. \quad (6.4)$$
The 3×3 matrix above is not on the SO(3) manifold, strictly speaking,
but it is at least close to the manifold for small ξ. We can write the
above more concisely as
$$R(\xi) \approx I + \hat{\xi}. \quad (6.5)$$
Above, the hat operator creates a skew-symmetric matrix $\hat{\xi}$ from $\xi$, and is defined as
$$\hat{\xi} \stackrel{\Delta}{=} \begin{bmatrix} 0 & -\xi_z & \xi_y \\ \xi_z & 0 & -\xi_x \\ -\xi_y & \xi_x & 0 \end{bmatrix}. \quad (6.6)$$
6.1.2 The Exponential Map
The approximation (6.5) is the truncation of the exponential map, given by the matrix exponential series
$$R(\xi) = \exp\hat{\xi} = \sum_{k=0}^{\infty} \frac{\hat{\xi}^k}{k!} = I + \hat{\xi} + \frac{\hat{\xi}^2}{2!} + \frac{\hat{\xi}^3}{3!} + \cdots, \quad (6.7)$$
where the powers are to be interpreted as matrix powers. Note that the
first two terms are identical to the approximation in Equation 6.5; the
other terms can be seen as increasingly smaller corrections to bring the
matrix R(ξ) back to the SO(3) manifold.
While not always fully appreciated, the exponential map is exact for
arbitrarily large vectors ξ. In addition, the exponential map for SO(3)
is available in closed form, through Rodrigues’ formula:
$$\exp\hat{\xi} = I + \frac{\sin\theta}{\theta}\,\hat{\xi} + \frac{1 - \cos\theta}{\theta^2}\,\hat{\xi}^2, \quad (6.8)$$
with $\theta \stackrel{\Delta}{=} \|\xi\|$ the rotation angle.
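A direct numpy implementation of the hat operator (6.6) and Rodrigues' formula (6.8) might look as follows:

```python
import numpy as np

def hat(xi):
    """Skew-symmetric matrix xi^ from Equation (6.6)."""
    return np.array([[0.0, -xi[2], xi[1]],
                     [xi[2], 0.0, -xi[0]],
                     [-xi[1], xi[0], 0.0]])

def expmap_so3(xi):
    """Exponential map for SO(3) via Rodrigues' formula (6.8)."""
    theta = np.linalg.norm(xi)
    Xi = hat(xi)
    if theta < 1e-10:                        # first-order approximation (6.5)
        return np.eye(3) + Xi
    return (np.eye(3) + np.sin(theta) / theta * Xi
            + (1.0 - np.cos(theta)) / theta**2 * Xi @ Xi)

R = expmap_so3(np.array([0.0, 0.0, np.pi / 2]))  # 90-degree yaw
assert np.allclose(R @ R.T, np.eye(3))           # exactly on SO(3)
```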
6.1.3 Local Coordinates
This allows us to define local coordinates $\xi$ around a base rotation $R_0$:
$$R_0 \oplus \xi \stackrel{\Delta}{=} R_0 \cdot \exp\hat{\xi}, \quad (6.9)$$
with ξ ∈ R3 . In other words, we exponentiate the coordinates ξ around
the identity to create an incremental rotation that is composed with
the base rotation R0 . Local coordinates equal to zero correspond to
R0 itself, and non-zero local coordinates ξ around zero are smoothly
mapped to a neighborhood of R0 on the rotation manifold. Because R3
is a vector space under addition, this will allow us to use R0 as the
linearization point in a nonlinear optimization scheme.
Indeed, after this re-parameterization, we can minimize for ξ in-
stead, starting from zero. We obtain the following estimator:
$$\xi^* = \mathop{\mathrm{argmin}}_{\xi} \left\| g^a(\xi; R_0) - z^a \right\|^2, \quad (6.10)$$
where now $g^a(\xi; R_0) \stackrel{\Delta}{=} h^a(R_0\, e^{\hat{\xi}})$ is a new measurement function defined over the local coordinates $\xi$.
To minimize (6.10) we need a notion of how this new prediction
function g a (ξ; R0 ) behaves in the neighborhood of zero. Loosely speak-
ing, we need to find the $3 \times 3$ Jacobian matrix $G^a_{R_0}$ such that
$$g^a(\xi; R_0) \approx g^a(0; R_0) + G^a_{R_0}\, \xi$$
for $\xi$ near zero.
Figure 6.1: Trivial factor graph for estimating attitude and heading by fusing both
accelerometer and magnetometer readings za and zm .
If we add time into this equation and introduce unknown rotation ma-
trices for successive time steps we can then estimate the changing 3D
orientation of a robotic platform over time.
While outside the scope of the current document, if in addition we
measure the difference between successive orientations through use of a
gyroscope, add variables to model the slowly varying accelerometer and
gyroscope biases, and integrate all these in a factor graph, we obtain a
fully-fledged attitude and heading reference system (AHRS) [63].
For completeness, and because this will be useful when we talk about
PoseSLAM in the next section, we now show that planar rotations can
be treated in a similar way, respecting their manifold nature. While it
is tempting to simply represent a planar rotation with an angle θ ∈
R, this does not accurately reflect the wrapping by 2π that so often
introduces bugs in code. Instead, as reviewed in Appendix B.1, the
nonlinear manifold SO(2) containing all orthonormal 2 × 2 matrices
accurately reflects this circular topology, using the mapping
$$\theta \mapsto \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \quad (6.16)$$
$$R_0 \oplus \xi \stackrel{\Delta}{=} R_0 \cdot \exp\hat{\xi} \quad (6.19)$$
$$= \begin{bmatrix} \cos\theta_0 & -\sin\theta_0 \\ \sin\theta_0 & \cos\theta_0 \end{bmatrix} \begin{bmatrix} \cos\xi & -\sin\xi \\ \sin\xi & \cos\xi \end{bmatrix} \quad (6.20)$$
$$= \begin{bmatrix} \cos(\theta_0 + \xi) & -\sin(\theta_0 + \xi) \\ \sin(\theta_0 + \xi) & \cos(\theta_0 + \xi) \end{bmatrix}, \quad (6.21)$$
i.e., the effect is exactly the same as adding ξ to θ0 , but with the
wrapping handled correctly.
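The following small sketch illustrates the point: composing SO(2) matrices per (6.16) and (6.19)-(6.21) returns an angle that is already correctly wrapped:

```python
import numpy as np

def rot2(theta):
    """Map an angle to an SO(2) matrix, Equation (6.16)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# R0 (+) xi: composing rotation matrices handles 2*pi wrapping for free.
R = rot2(3.0) @ rot2(0.5)                 # theta0 + xi = 3.5 > pi
theta = np.arctan2(R[1, 0], R[0, 0])      # recovered angle, wrapped
print(theta)                              # ~ 3.5 - 2*pi
```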
6.2 PoseSLAM
Local pose coordinates $\xi$ consist of translational and rotational components $(v, \omega)$, with $\hat{\omega}$ the result of applying the hat operator of the corresponding rotation group to $\omega$. In detail, for SE(2) and SE(3), we have respectively
$$\hat{\,} : \mathbb{R}^3 \to \mathfrak{se}(2) : \xi \mapsto \begin{bmatrix} 0 & -\omega_z & v_x \\ \omega_z & 0 & v_y \\ 0 & 0 & 0 \end{bmatrix} \Delta\tau \quad (6.24)$$
$$\hat{\,} : \mathbb{R}^6 \to \mathfrak{se}(3) : \xi \mapsto \begin{bmatrix} 0 & -\omega_z & \omega_y & v_x \\ \omega_z & 0 & -\omega_x & v_y \\ -\omega_y & \omega_x & 0 & v_z \\ 0 & 0 & 0 & 0 \end{bmatrix} \Delta\tau, \quad (6.25)$$
where one recognizes the skew-symmetric matrices corresponding to
an incremental rotation in the top-left. Closed-form solutions of the
associated exponential maps exist as well, see for example [150].
In either case, we can again use the exponential map to define a
mapping from local pose coordinates ξ back to a neighborhood
around an initial pose estimate x0 :
$$x_0 \oplus \xi \stackrel{\Delta}{=} x_0 \cdot \exp\hat{\xi}, \quad (6.26)$$
6.2.4 PoseSLAM
In PoseSLAM we do not optimize over a single pose, of course, but over all poses
in a robot trajectory that we want to reconstruct. Typically, two types
of factors will be involved: unary factors such as pose priors and/or
absolute pose measurements (e.g., from GPS) and binary factors, such
as relative pose constraints derived from LIDAR.
The factor graph for a simple PoseSLAM example is shown in Fig-
ure 6.2. To anchor the graph we add the unary factor f0 (x1 ), and as the
robot travels through the world, binary factors ft (xt , xt+1 ) correspond-
ing to odometry are created. The red factor models a different event:
92 Optimization on Manifolds
Figure 6.2: Factor graph for PoseSLAM with a loop closure constraint in red.
a loop closure. For example, the robot might recognize the same lo-
cation using vision or a laser range finder, and calculate the geometric
pose constraint to when it first visited this location. This is illustrated
for poses x5 and x2 , which generates the loop closing factor f5 (x5 , x2 ).
Deriving the detailed expressions for the corresponding measure-
ment functions and their Jacobians is beyond the scope of this doc-
ument, but the final optimization will—in each iteration—minimize
over the local coordinates of all poses by summing over the corresponding linearized measurement factors.
6.3 Optimization over Lie Groups and Arbitrary Manifolds
While the rotation and rigid transformation groups are by far the most
important in robotics, the local parameterization technique above can
be generalized to any matrix Lie group and even to non-group nonlinear
manifolds. In fact, sometimes it is computationally advantageous to
forget about the Lie group structure and simply use local mappings
other than the exponential map. We discuss all of this below.
Figure 6.3: For the sphere manifold, the local tangent plane with a local basis
provides the notion of local coordinates.
A retraction $\mathcal{R}_a$ at an element $a$ agrees with the exponential map to first order:
$$\lim_{\xi \to 0} \frac{a \cdot \exp\hat{\xi} - \mathcal{R}_a(\xi)}{\|\xi\|} = 0. \quad (6.39)$$
Example. For SE(3), instead of using the true exponential map it
is computationally more efficient to define a retraction which uses the
simpler expression $v\Delta\tau$ for the translation update:
$$\mathcal{R}_T(\xi) = \begin{bmatrix} R & t \\ 0 & 1 \end{bmatrix} \begin{bmatrix} e^{\hat{\omega}\Delta\tau} & v\Delta\tau \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} R\, e^{\hat{\omega}\Delta\tau} & t + R\, v\Delta\tau \\ 0 & 1 \end{bmatrix}. \quad (6.40)$$
In effect, both rotation and translation are updated separately, al-
though still in the original coordinate frame. In contrast to the above,
the exponential map takes into account the rotating coordinate frame
when updating the translation, and the corresponding trajectory is a
screw motion when plotted as a function of $\Delta\tau$. This corresponds to a
rigid body with constant angular and translational velocity in the body
frame. However, for the purposes of optimization the exact mapping
away from zero does not matter all that much.
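As a sketch, the retraction (6.40) can be implemented in a few lines, reusing the expmap_so3 helper from the SO(3) sketch above (the function and argument names are ours):

```python
import numpy as np

def retract_se3(T, xi, dtau=1.0):
    """Retraction (6.40): update rotation and translation separately,
    both expressed in the original coordinate frame."""
    v, omega = xi[:3], xi[3:]
    R, t = T[:3, :3], T[:3, 3]
    Tnew = np.eye(4)
    Tnew[:3, :3] = R @ expmap_so3(omega * dtau)   # R * exp(omega^ dtau)
    Tnew[:3, 3] = t + R @ (v * dtau)              # t + R v dtau
    return Tnew
```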
6.4 Bibliographic Remarks
The book by Murray, Li, and Sastry [150] is an excellent text on Lie
groups in robotics, albeit focused mainly on manipulation tasks. Op-
timization on manifolds using retractions is discussed in great detail
in [1]. Our exposition on local function behavior and the definition of
Jacobian matrices is taken from [177].
7 Applications
7.1 Inertial Navigation
Figure 7.1: Aided inertial navigation [103]. (a) Factor graph for visual odometry
integration with inertial bias estimation. (b) GPS integration (red factors) and/or
stereo integration (blue). (c) Evaluation on KITTI sequence: Incremental smoothing
performs almost as good as full batch optimization. Fixed-lag smoothing solutions
are also shown for various lags.
In [102, 103] the plug and play capability of factor graphs is ex-
ploited in a navigation context (see Figure 7.1), with incremental
smoothing and mapping (iSAM) as the inference engine (see Section
5). To deal with the typically large update rates of IMUs, one can use
the idea by Lupton and Sukkarieh [143] to pre-integrate IMU mea-
surements between lower-rate measurements from other sensors such
as cameras and LIDARs. The use of IMU pre-integration is an excel-
lent way to balance computational efficiency with fixed-lag smoothing
in factor graphs.
These ideas were also exploited by Forster et al. [67, 68] in the con-
text of visual-inertial odometry or VIO. In these papers, a more
sophisticated integration scheme is coupled with fixed lag smoothing
to yield state-of-the-art performance. Leutenegger et al. [131, 132] also
present VIO and SLAM algorithms that are visualized as factor graphs.
And Usenko et al. [193] combine a semi-direct odometry algorithm with
inertial information, proposing a factor graph approach that essentially
follows the approach in [103]. However, instead of incremental smooth-
ing, they apply marginalization to only estimate the most recent state,
including inertial biases.
Finally, Mur-Artal and Tardos [149] added inertial sensing to their
prior ORB-SLAM system, a visual SLAM approach based on ORB
features detected in a video sequence. Here, inertial sensing is incorpo-
rated during tracking including bias estimation, while a local bundle
adjustment is used when adding new keyframes to the map.
7.2 Dense 3D Mapping
While sparse point clouds are sufficient for keeping a robot localized,
interaction with the environment requires a more complete map repre-
sentation. Common choices for dense 3D map representations include
point clouds, surfels (points with orientation), triangle meshes, and
planar surfaces.
Whelan et al. [197, 196] present a dense SLAM solution, Kintinu-
ous, that extends the local dense method KinectFusion to large envi-
ronments, see Figure 7.2. Loop closures are applied to a factor graph
that represents the poses of the trajectory, and the corresponding corrections are transferred to the map, represented as a triangle mesh, by applying a mesh deformation.
Figure 7.2: Real-time dense 3D mapping with Kintinuous [196]. (a) A dense volumetric representation (TSDF) is used locally and shifted along with the camera. The pose graph is drawn in pink, with small cuboids marking camera poses that have cloud slices associated with them. Loop closure on the pose graph is transferred to the point cloud via mesh deformation. (b) A sequence covering two floors of an apartment, reconstructed with online loop closure into a model with over six million vertices. Small details such as bathroom fixtures and objects around the environment are clearly reconstructed.
Dense mapping can also be performed at the object level. Salas-Moreno et al. [170] present SLAM++, an object-based SLAM system where landmarks correspond to objects in the environment, such as chairs, for which a prior dense model may be available. The entire inference problem is posed in terms of factor graph optimization.
Planar surfaces are predominant in indoor environments, and this information can be exploited in SLAM. Trevor et al. [190] formulate this problem in the context of factor graph optimization. In [106] and [99], we use infinite planes as landmarks in the factor graph and provide a suitable retraction for optimization. The formulation is equivalent to structure-from-motion with infinite planes instead of point features. And, since the number of planar features per frame is small, the optimization can be done in real time even for large environments. Figure 7.3 provides examples of maps based on infinite planes.
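As a minimal sketch of what such a retraction can look like, assume the plane is stored as a unit homogeneous 4-vector; the update below moves in the tangent space and re-normalizes. This is an illustrative stand-in in the spirit of [106], not the exact parameterization used there:

```python
import numpy as np

def retract_plane(pi, delta):
    """Move an infinite-plane landmark, stored as a unit homogeneous 4-vector
    pi = (n_x, n_y, n_z, d) / ||.||, by a 3-vector delta in its tangent space."""
    B = np.linalg.svd(pi.reshape(1, 4))[2][1:].T   # 4x3 basis of the tangent space
    step = pi + B @ delta                          # linear step in the local chart
    return step / np.linalg.norm(step)             # re-normalize onto the sphere

pi0 = np.array([0.0, 0.0, 1.0, -2.0])
pi0 = pi0 / np.linalg.norm(pi0)
pi1 = retract_plane(pi0, np.array([0.01, -0.02, 0.005]))
assert np.isclose(np.linalg.norm(pi1), 1.0)        # stays on the manifold
```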
Figure 7.3: Dense mapping system based on planar surfaces [99]. The factor graph
is similar to structure from motion, but with infinite planes instead of point features.
7.3 Field Robotics
[Figure 7.4 panels: the vehicle's sensor suite (thrusters, camera, DVL, DIDSON sonar, LED light), and a 3D plot comparing dead reckoning (DR) against the SLAM estimate, with depth and Cartesian x/y axes in meters.]
Figure 7.4: Ship hull and harbor infrastructure inspection with an underwater
robot [98]. A factor graph formulation is used to fuse navigation and sonar data into
3D models. Camera images can be added when water turbidity allows.
Figure 7.5: Spatio-temporal reconstruction for crop monitoring [53]. (a) Recon-
structed 4D model of a peanut field. (b) Factor graph of multi-sensor SLAM. (c)
Factor graph connecting two rows with shared landmarks.
Figure 7.6: Non-Gaussian inference on the Bayes tree using kernel density estimates [69]. (a) Intermediate steps and the posterior are multimodal. (b) Data association can be transformed into a multimodal estimation problem by marginalizing out the discrete variable from the factor graph.
The feasibility of this nonparametric approach has already been demonstrated for a SLAM problem with thousands of variables and high ambiguity, with a theoretical number of modes larger than $2^{400}$. Here, in contrast to the other approaches described above, data association is converted into a continuous but multimodal inference problem by integrating out any discrete variables, such as those related to uncertainty in data association.
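The mechanism can be illustrated with a hypothetical toy problem: a 1D robot measures its distance to one of two known landmarks, but the association is unknown. Summing the Gaussian likelihood over both hypotheses yields a single continuous factor whose negative log-likelihood is multimodal:

```python
import numpy as np

def mixture_nll(x, z, landmarks, sigma=0.1):
    """Negative log-likelihood of a range measurement z to an *unknown* landmark:
    the discrete association k is summed out, leaving one continuous factor."""
    r = np.array([z - abs(x - l) for l in landmarks])     # residual per hypothesis
    return -np.log(np.mean(np.exp(-0.5 * (r / sigma) ** 2)) + 1e-300)

xs = np.linspace(-2.0, 4.0, 601)
nll = np.array([mixture_nll(x, z=1.0, landmarks=[0.0, 2.0]) for x in xs])
# The curve has minima near x = -1, 1, 3: one mode per plausible association.
```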
Figure 7.7: Generic linear constraints (GLC) node removal with optional Chow-Liu tree sparsification [23]. (a) Overview of the algorithm. (b) Full graph (top view). (c) GLC reduced graph (top view). (d) Full graph (time scaled). (e) GLC reduced graph (time scaled). The example shows extensive outdoor data from 27 mapping sessions over a period of 15 months. The oblique time-scaled views visualize the more than ten-fold reduction in both the number of nodes and factors.
Figure 7.8: Three robots and a synthetic landmark used in the multi-robot mapping
experiment from [35] (left), and an aerial view of the parking lot in which the
experiment was performed (right).
Figure 7.9: Left: data association between two of the three trajectories in the middle of an experiment with the three robots from Figure 7.8. Measurement factors are shown as translucent dark lines, trajectories as blue and green paths, and optimized landmarks as black circles. Right: full final results for all three robots at the end of the experiment, this time omitting factors for clarity.
Figure 7.10: Large city navigation scenario (3.3 kilometers) from the DARPA ASPN project: ground truth (blue), the solution using only a 1-step short-term smoother (red, 3D RMS error: 19.19 meters), the solution using the CSM scheme (green, 3D RMS error: 5.85 meters), and the 3D landmarks constructed during the first loop (red points inside the highlighted yellow box).
7.7 Summary
Bibliography
[1] Absil, P.-A., Mahony, R., and Sepulchre, R. (2007). Optimization Algo-
rithms on Matrix Manifolds. Princeton University Press, Princeton, NJ,
USA.
[2] Agarwal, P. and Olson, E. (2012). Variable reordering strategies for SLAM.
In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS).
[3] Amestoy, P., Davis, T., and Duff, I. (1996). An approximate minimum
degree ordering algorithm. SIAM Journal on Matrix Analysis and Appli-
cations, 17(4):886–905.
[4] Anderson, S., Barfoot, T., Tong, C., and Särkkä, S. (2015). Batch nonlin-
ear continuous-time trajectory estimation as exactly sparse Gaussian pro-
cess regression. Autonomous Robots, 39(3):221–238.
[5] Anderson, S., Dellaert, F., and Barfoot, T. (2014). A hierarchical wavelet
decomposition for continuous-time SLAM. In IEEE Intl. Conf. on Robotics
and Automation (ICRA).
[6] Bailey, T. and Durrant-Whyte, H. (2006). Simultaneous localisation and
mapping (SLAM): Part II state of the art. Robotics & Automation Maga-
zine.
[7] Beall, C., Dellaert, F., Mahon, I., and Williams, S. (2011). Bundle ad-
justment in large-scale 3D reconstructions based on underwater robotic
surveys. In OCEANS, 2011 IEEE-Spain, pages 1–6. IEEE.
[8] Beeri, C., Fagin, R., Maier, D., Mendelzon, A., Ullman, J., and Yan-
nakakis, M. (1981). Properties of acyclic database schemes. In ACM Symp.
on Theory of Computing (STOC), pages 355–362, New York, NY, USA.
ACM Press.
[9] Bellman, R. and Dreyfus, S. (1962). Applied Dynamic Programming.
Princeton University Press.
[10] Bertele, U. and Brioschi, F. (1972a). Nonserial Dynamic Programming.
Academic Press.
[11] Bertele, U. and Brioschi, F. (1972b). On the theory of the elimination
process. J. Math. Anal. Appl., 35(1):48–57.
[12] Bertele, U. and Brioschi, F. (1973). On nonserial dynamic programming.
J. Combinatorial Theory, 14:137–148.
[13] Bichucher, V., Walls, J., Ozog, P., Skinner, K., and Eustice, R. (2015).
Bathymetric factor graph SLAM with sparse point cloud alignment. In
OCEANS, 2015. MTS/IEEE Conference and Exhibition.
[14] Bierman, G. (1977). Factorization methods for discrete sequential esti-
mation, volume 128 of Mathematics in Science and Engineering. Academic
Press, New York.
[15] Bierman, G. (1978). An application of the square-root information filter
to large scale linear interconnected systems. IEEE Trans. Automat. Contr.,
23(1):91–93.
[16] Bishop, C. (2006). Pattern Recognition and Machine Learning. Informa-
tion Science and Statistics. Springer-Verlag, Secaucus, NJ, USA.
[17] Blair, J. and Peyton, B. (1993). An introduction to chordal graphs and
clique trees. In [80], pages 1–27.
[18] Brown, D. C. (1976). The bundle adjustment - progress and prospects.
Int. Archives Photogrammetry, 21(3).
[19] Cadena, C., Carlone, L., Carrillo, H., Latif, Y., Scaramuzza, D., Neira, J.,
Reid, I., and Leonard, J. J. (2016). Past, present, and future of simultaneous
localization and mapping: Toward the robust-perception age. IEEE Trans.
Robotics, 32(6):1309–1332.
[20] Cannings, C., Thompson, E., and Skolnick, M. (1978). Probability func-
tions on complex pedigrees. Advances in Applied Probability, 10:26–61.
[21] Carlevaris-Bianco, N. and Eustice, R. M. (2013a). Generic factor-based
node marginalization and edge sparsification for pose-graph SLAM. In
IEEE Intl. Conf. on Robotics and Automation (ICRA), pages 5728–5735.
[34] Cunningham, A., Paluri, M., and Dellaert, F. (2010). DDF-SAM: Fully
distributed SLAM using constrained factor graphs. In IEEE/RSJ Intl.
Conf. on Intelligent Robots and Systems (IROS).
[35] Cunningham, A., Wurm, K., Burgard, W., and Dellaert, F. (2012). Fully
distributed scalable smoothing and mapping with robust multi-robot data
association. In IEEE Intl. Conf. on Robotics and Automation (ICRA), St.
Paul, MN.
[36] Cuthill, E. and McKee, J. (1969). Reducing the bandwidth of sparse
symmetric matrices. In Proc. of the 1969 24th ACM national conference,
pages 157–172, New York, NY, USA. ACM Press.
[37] D’Ambrosio, B. (1994). Symbolic probabilistic inference in large BN2O
networks. In Proc. 10th Conf. on Uncertainty in AI (UAI), pages 128–135,
Seattle, WA.
[38] Darwiche, A. (2009). Modeling and Reasoning with Bayesian Networks.
Cambridge University Press.
[39] Davis, T. (2011). Algorithm 915: SuiteSparseQR, a multifrontal mul-
tithreaded sparse QR factorization package. ACM Trans. Math. Softw.,
38(1):8:1–8:22.
[40] Davis, T., Gilbert, J., Larimore, S., and Ng, E. (2004). A column ap-
proximate minimum degree ordering algorithm. ACM Trans. Math. Softw.,
30(3):353–376.
[41] Davis, T. and Hager, W. (1996). Modifying a sparse Cholesky factoriza-
tion. SIAM Journal on Matrix Analysis and Applications, 20(3):606–627.
[42] Dechter, R. (1996). Bucket elimination: A unifying framework for several
probabilistic inference algorithms. In Proc. 12th Conf. on Uncertainty in
AI (UAI), Portland, OR.
[43] Dechter, R. (1998). Bucket Elimination: A unifying framework for rea-
soning. In Jordan, M., editor, Learning in Graphical Models, pages 75–104.
Kluwer Academic Press. Also published by MIT Press, 1999.
[44] Dechter, R. (1999). Bucket Elimination: A unifying framework for rea-
soning. Artificial Intelligence, 113(1-2):41–85.
[45] Dechter, R. and Pearl, J. (1987). Network-based heuristics for constraint-
satisfaction problems. Artificial Intelligence, 34(1):1–38.
[46] Dellaert, F. (2005). Square Root SAM: Simultaneous location and map-
ping via square root information smoothing. In Robotics: Science and Sys-
tems (RSS).
[60] Estrada, C., Neira, J., and Tardós, J. (2005). Hierarchical SLAM: Real-
time accurate mapping of large environments. IEEE Trans. Robotics,
21(4):588–596.
[61] Eustice, R., Singh, H., and Leonard, J. (2005). Exactly sparse delayed-
state filters. In IEEE Intl. Conf. on Robotics and Automation (ICRA),
pages 2417–2424.
[62] Fagin, R., Mendelzon, A., and Ullman, J. (1982). A simplified univer-
sal relation assumption and its properties. ACM Trans. Database Syst.,
7(3):343–360.
[63] Farrell, J. (2008). Aided Navigation: GPS with High Rate Sensors.
McGraw-Hill.
[64] Faugeras, O. (1993). Three-dimensional computer vision: A geometric
viewpoint. The MIT press, Cambridge, MA.
[65] Fiduccia, C. and Mattheyses, R. (1982). A linear time heuristic for im-
proving network partitions. In Proc. 19th IEEE Design Automation and
Conference, pages 175–181.
[66] Forney, Jr, G. (2001). Codes on graphs: Normal realizations. IEEE
Trans. Inform. Theory, 47(2):520–548.
[67] Forster, C., Carlone, L., Dellaert, F., and Scaramuzza, D. (2015).
IMU preintegration on manifold for efficient visual-inertial maximum-a-
posteriori estimation. In Robotics: Science and Systems (RSS).
[68] Forster, C., Carlone, L., Dellaert, F., and Scaramuzza, D. (2016). On-
manifold preintegration for real-time visual-inertial odometry. IEEE Trans.
Robotics.
[69] Fourie, D., Leonard, J., and Kaess, M. (2016). A nonparametric belief
solution to the Bayes tree. In IEEE/RSJ Intl. Conf. on Intelligent Robots
and Systems (IROS), Daejeon, Korea.
[70] Frese, U. (2005). Treemap: An O(log n) algorithm for simultaneous lo-
calization and mapping. In Spatial Cognition IV, pages 455–476. Springer
Verlag.
[71] Frese, U., Larsson, P., and Duckett, T. (2005). A multilevel relaxation al-
gorithm for simultaneous localisation and mapping. IEEE Trans. Robotics,
21(2):196–207.
[72] Frese, U. and Schröder, L. (2006). Closing a million-landmarks loop. In
IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS), pages
5032–5039.
[100] Huang, T. and Kaess, M. (2015). Towards acoustic structure from mo-
tion for imaging sonar. In IEEE/RSJ Intl. Conf. on Intelligent Robots and
Systems (IROS), pages 758–765, Hamburg, Germany.
[101] Indelman, V., Nelson, E., Dong, J., Michael, N., and Dellaert, F. (2016).
Incremental distributed inference from arbitrary poses and unknown data
association: Using collaborating robots to establish a common reference.
IEEE Control Systems Magazine (CSM), Special Issue on Distributed Con-
trol and Estimation for Robotic Vehicle Networks, 36(2):41–74.
[102] Indelman, V., Williams, S., Kaess, M., and Dellaert, F. (2012). Factor
graph based incremental smoothing in inertial navigation systems. In Intl.
Conf. on Information Fusion, FUSION.
[103] Indelman, V., Williams, S., Kaess, M., and Dellaert, F. (2013). Infor-
mation fusion in navigation systems via factor graph based incremental
smoothing. Robotics and Autonomous Systems, 61(8):721–738.
[104] Johannsson, H., Kaess, M., Fallon, M., and Leonard, J. (2013). Tempo-
rally scalable visual SLAM using a reduced pose graph. In IEEE Intl. Conf.
on Robotics and Automation (ICRA), pages 54–61, Karlsruhe, Germany.
[105] Julier, S. and Uhlmann, J. (2001). A counter example to the theory
of simultaneous localization and map building. In IEEE Intl. Conf. on
Robotics and Automation (ICRA), volume 4, pages 4238–4243.
[106] Kaess, M. (2015). Simultaneous localization and mapping with infinite
planes. In IEEE Intl. Conf. on Robotics and Automation (ICRA), pages
4605–4611, Seattle, WA.
[107] Kaess, M., Ila, V., Roberts, R., and Dellaert, F. (2010). The Bayes
tree: An algorithmic foundation for probabilistic robot mapping. In Intl.
Workshop on the Algorithmic Foundations of Robotics.
[108] Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J., and Del-
laert, F. (2011). iSAM2: Incremental smoothing and mapping with fluid
relinearization and incremental variable reordering. In IEEE Intl. Conf. on
Robotics and Automation (ICRA), Shanghai, China.
[109] Kaess, M., Johannsson, H., Roberts, R., Ila, V., Leonard, J., and Del-
laert, F. (2012a). iSAM2: Incremental smoothing and mapping using the
Bayes tree. Intl. J. of Robotics Research, 31:217–236.
[110] Kaess, M., Ranganathan, A., and Dellaert, F. (2007). Fast incremental
square root information smoothing. In Intl. Joint Conf. on AI (IJCAI),
pages 2129–2134, Hyderabad, India.
[111] Kaess, M., Ranganathan, A., and Dellaert, F. (2008). iSAM: Incremen-
tal smoothing and mapping. IEEE Trans. Robotics, 24(6):1365–1378.
[112] Kaess, M., Williams, S., Indelman, V., Roberts, R., Leonard, J., and
Dellaert, F. (2012b). Concurrent filtering and smoothing. In Intl. Conf. on
Information Fusion, FUSION.
[113] Karczmarczuk, J. (1998). Functional differentiation of computer pro-
grams. In Intl. Conf. on Functional Programming (ICFP), pages 195–203.
[114] Karypis, G. and Kumar, V. (1998). Multilevel algorithms for multi-
constraint graph partitioning. In Supercomputing ’98: Proceedings of the
1998 ACM/IEEE conference on Supercomputing (CDROM), pages 1–13,
Washington, DC, USA. IEEE Computer Society.
[115] Kelly, III, C. and Barclay, S. (1973). A general Bayesian model for
hierarchical inference. Organizational Behavior and Human Performance,
10:388–403.
[116] Kernighan, B. and Lin, S. (1970). An efficient heuristic procedure for
partitioning graphs. The Bell System Technical Journal, 49(2):291–307.
[117] Khan, Z., Balch, T., and Dellaert, F. (2006). MCMC data association
and sparse factorization updating for real time multitarget tracking with
merged and multiple measurements. IEEE Trans. Pattern Anal. Machine
Intell., 28(12):1960–1972.
[118] Kim, A. and Eustice, R. M. (2015). Active visual SLAM for robotic
area coverage. Intl. J. of Robotics Research, 34(4-5):457–475.
[119] Kim, B., Kaess, M., Fletcher, L., Leonard, J., Bachrach, A., Roy, N.,
and Teller, S. (2010). Multiple relative pose graphs for robust cooperative
mapping. In IEEE Intl. Conf. on Robotics and Automation (ICRA), pages
3185–3192, Anchorage, Alaska.
[120] Knight, J., Davison, A., and Reid, I. (2001). Towards constant time
SLAM using postponement. In IEEE/RSJ Intl. Conf. on Intelligent Robots
and Systems (IROS), pages 405–413.
[121] Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models:
Principles and Techniques. The MIT Press.
[122] Konolige, K. (2004). Large-scale map-making. In Proc. 21th AAAI
National Conference on AI, San Jose, CA.
[123] Krauthausen, P., Dellaert, F., and Kipp, A. (2006). Exploiting locality
by nested dissection for square root smoothing and mapping. In Robotics:
Science and Systems (RSS).
[124] Kschischang, F. (2003). Codes defined on graphs. IEEE Signal Proc.
Mag., 41:118–125.
[125] Kschischang, F., Frey, B., and Loeliger, H.-A. (2001). Factor graphs
and the sum-product algorithm. IEEE Trans. Inform. Theory, 47(2).
[126] Kümmerle, R., Grisetti, G., and Burgard, W. (2012). Simultaneous cali-
bration, localization, and mapping. In IEEE/RSJ Intl. Conf. on Intelligent
Robots and Systems (IROS), pages 3716–3721.
[127] Kümmerle, R., Grisetti, G., Strasdat, H., Konolige, K., and Burgard, W. (2011). g2o: A general framework for graph optimization. In IEEE Intl. Conf. on Robotics and Automation (ICRA), Shanghai, China.
[128] Latif, Y., Cadena, C., and Neira, J. (2013). Robust loop closing over
time for pose graph SLAM. Intl. J. of Robotics Research, 32(14):1611–1626.
[129] Leonard, J., Durrant-Whyte, H., and Cox, I. (1992). Dynamic map
building for an autonomous mobile robot. Intl. J. of Robotics Research,
11(4):286–289.
[130] Leonard, J. and Feder, H. (2001). Decoupled stochastic mapping. IEEE
Journal of Oceanic Engineering, pages 561–571.
[131] Leutenegger, S., Furgale, P., Rabaud, V., Chli, M., Konolige, K., and
Siegwart, R. (2013). Keyframe-based visual-inertial SLAM using nonlinear
optimization. In Robotics: Science and Systems (RSS).
[132] Leutenegger, S., Lynen, S., Bosse, M., Siegwart, R., and Furgale, P.
(2015). Keyframe-based visual-inertial odometry using nonlinear optimiza-
tion. Intl. J. of Robotics Research, 34(3):314–334.
[133] Levenberg, K. (1944). A method for the solution of certain nonlinear
problems in least squares. Quart. Appl. Math, 2(2):164–168.
[134] Ling, F. (1991). Givens rotation based least squares lattice related
algorithms. IEEE Trans. Signal Processing, 39(7):1541–1551.
[135] Lipton, R., Rose, D., and Tarjan, R. (1979). Generalized nested dissec-
tion. SIAM Journal on Numerical Analysis, 16(2):346–358.
[136] Lipton, R. and Tarjan, R. (1979). A separator theorem for planar
graphs. SIAM Journal on Applied Mathematics, 36(2):177–189.
[137] Liu, J. (1985). Modification of the minimum-degree algorithm by mul-
tiple elimination. ACM Trans. Math. Softw., 11(2):141–153.
[138] Liu, J. W. H. (1989). A graph partitioning algorithm by node separators.
ACM Trans. Math. Softw., 15(3):198–219.
[139] Loeliger, H.-A. (2004). An introduction to factor graphs. IEEE Signal
Processing Magazine, pages 28–41.
[154] Ni, K., Steedly, D., and Dellaert, F. (2007). Tectonic SAM: Exact, out-of-core, submap-based SLAM. In IEEE Intl. Conf. on Robotics and Automation (ICRA), Rome, Italy.
[155] Nocedal, J. and Wright, S. J. (1999). Numerical Optimization. Springer
Series in Operations Research. Springer-Verlag.
[156] Olson, E. and Agarwal, P. (2013). Inference on networks of mixtures
for robust robot mapping. Intl. J. of Robotics Research, 32(7):826–840.
[157] Olson, E., Leonard, J., and Teller, S. (2006). Fast iterative alignment
of pose graphs with poor initial estimates. In IEEE Intl. Conf. on Robotics
and Automation (ICRA), pages 2262–2269.
[158] Ozog, P., Troni, G., Kaess, M., Eustice, R., and Johnson-Roberson,
M. (2015). Building 3D mosaics from an autonomous underwater vehicle
and 2D imaging sonar. In IEEE Intl. Conf. on Robotics and Automation
(ICRA), pages 1137–1143, Seattle, WA.
[159] Parter, S. (1961). The use of linear graphs in Gauss elimination. SIAM
Rev., 3(2):119–130.
[160] Paskin, M. (2003). Thin junction tree filters for simultaneous localiza-
tion and mapping. In Intl. Joint Conf. on AI (IJCAI).
[161] Paz, L. M., Pinies, P., Tardós, J. D., and Neira, J. (2008). Large
scale 6DOF SLAM with stereo-in-hand. IEEE Transactions on Robotics,
24(5):946–957.
[162] Pearl, J. (1982). Reverend Bayes on inference engines: a distributed
hierarchical approach. In Proc. First AAAI National Conference on AI,
pages 133–136, Carnegie Mellon University, Pittsburgh, PA.
[163] Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Net-
works of Plausible Inference. Morgan Kaufmann.
[164] Pinies, P., Paz, P., Haner, S., and Heyden, A. (2012). Decomposable
bundle adjustment using a junction tree. In IEEE Intl. Conf. on Robotics
and Automation (ICRA), pages 1246–1253. IEEE.
[165] Pothen, A. and Sun, C. (1992). Distributed multifrontal factorization
using clique trees. In Proc. of the Fifth SIAM Conf. on Parallel Processing
for Scientific Computing, pages 34–40. Society for Industrial and Applied
Mathematics.
[166] Pothen, A., Simon, H., and Wang, L. (1992). Spectral nested dissection.
Technical Report CS-92-01, Penn. State.
[192] Tweddle, B., Saenz-Otero, A., Leonard, J., and Miller, D. (2015). Fac-
tor graph modeling of rigid-body dynamics for localization, mapping, and
parameter estimation of a spinning object in space. J. of Field Robotics,
32(6):897–933.
[193] Usenko, V., Engel, J., Stückler, J., and Cremers, D. (2016). Direct
visual-inertial odometry with stereo cameras. In IEEE Intl. Conf. on
Robotics and Automation (ICRA), pages 1885–1892, Stockholm, Sweden.
[194] VanMiddlesworth, M., Kaess, M., Hover, F., and Leonard, J. (2013).
Mapping 3D underwater environments with smoothed submaps. In Field
and Service Robotics (FSR), pages 17–30, Brisbane, Australia.
[195] Walter, M., Hemachandra, S., Homberg, B., Tellex, S., and Teller, S.
(2014). A framework for learning semantic maps from grounded natural
language descriptions. Intl. J. of Robotics Research, 33(9):1167–1190.
[196] Whelan, T., Kaess, M., Johannsson, H., Fallon, M., Leonard, J., and
McDonald, J. (2015). Real-time large scale dense RGB-D SLAM with
volumetric fusion. Intl. J. of Robotics Research, 34(4-5):598–626.
[197] Whelan, T., Kaess, M., Leonard, J. J., and McDonald, J. (2013).
Deformation-based loop closure for large scale dense RGB-D SLAM.
In IEEE/RSJ Intl. Conf. on Intelligent Robots and Systems (IROS),
Tokyo, Japan.
[198] Wiberg, N. (1996). Codes and Decoding on General Graphs. PhD thesis,
Linkoping University, Sweden.
[199] Wiberg, N., Loeliger, H.-A., and Kötter, R. (1995). Codes and iterative
decoding on general graphs. In Euro. Trans. Telecomm., volume 6, pages
513–525.
[200] Williams, S., Indelman, V., Kaess, M., Roberts, R., Leonard, J. J., and
Dellaert, F. (2014). Concurrent filtering and smoothing: A parallel archi-
tecture for real-time navigation and full smoothing. Intl. J. of Robotics
Research.
[201] Winkler, G. (1995). Image analysis, random fields and dynamic Monte
Carlo methods. Springer Verlag.
[202] Yan, X., Indelman, V., and Boots, B. (2017). Incremental sparse GP
regression for continuous-time trajectory estimation and mapping. Robotics
and Autonomous Systems, 87(1):120–132.
[203] Yannakakis, M. (1981). Computing the minimum fill-in is NP-complete.
SIAM J. Algebraic Discrete Methods, 2.
[204] Yedidia, J., Freeman, W., and Weiss, Y. (2000). Generalized belief prop-
agation. In Advances in Neural Information Processing Systems (NIPS),
pages 689–695.
[205] Zhang, J., Kaess, M., and Singh, S. (2014). Real-time depth enhanced
monocular odometry. In IEEE/RSJ Intl. Conf. on Intelligent Robots and
Systems (IROS), pages 4973–4980, Chicago, IL.
[206] Zhang, J., Kaess, M., and Singh, S. (2017). A real-time method for depth enhanced monocular odometry. Autonomous Robots, 41(1):31–43.
[207] Zhang, N. and Poole, D. (1994). A simple approach to Bayesian network
computations. In Proc. of the 10th Canadian Conf. on AI, Banff, Alberta,
Canada.
Appendices
A
Multifrontal Cholesky Factorization
where $\hat{\Lambda}_j \stackrel{\Delta}{=} \hat{A}_j^\top \hat{A}_j$ is the augmented Hessian matrix associated with the product factor $\psi(x_j, S_j)$. As an example, eliminating $l_2$ in the toy example yields the product factor
$$
\hat{\Lambda}_2 = \begin{bmatrix} A_{95}^\top A_{95} & A_{95}^\top b_{95} \\ b_{95}^\top A_{95} & b_5^\top b_5 + b_9^\top b_9 \end{bmatrix}, \qquad (A.2)
$$
which one can see to be the sum of two outer products, corresponding to the factors $\phi_5$ and $\phi_9$.
$$S = R_j^{-\top} \hat{\Lambda}_{12} \qquad (A.4)$$
$$L^\top L = \hat{\Lambda}_{22} - S^\top S \qquad (A.5)$$
$$\;\;\; = \hat{\Lambda}_{22} - \hat{\Lambda}_{12}^\top \hat{\Lambda}_{11}^{-1} \hat{\Lambda}_{12}. \qquad (A.6)$$
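These identities are straightforward to verify numerically; the sketch below (plain numpy, with an arbitrary block partition) checks (A.4)–(A.6) on a random positive-definite matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))
Lam = A.T @ A + 0.1 * np.eye(4)                    # symmetric positive definite
L11, L12, L22 = Lam[:2, :2], Lam[:2, 2:], Lam[2:, 2:]

R = np.linalg.cholesky(L11).T                      # upper triangular, R^T R = L11
S = np.linalg.solve(R.T, L12)                      # S = R^{-T} L12, cf. (A.4)
schur = L22 - S.T @ S                              # cf. (A.5)
assert np.allclose(schur, L22 - L12.T @ np.linalg.solve(L11, L12))   # cf. (A.6)
```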
B.1 2D Rotations
One of the simplest Lie groups is the space of 2D rotations with composition as the group operator, also known as the Circle Group. The easiest way to define it is as the subset of all 2 × 2 invertible matrices that are both orthogonal and have determinant one, i.e., the 2 × 2 rotation matrices. Because of this definition, people often refer to this Lie group as the Special Orthogonal Group in dimension 2, written as SO(2). Here “special” refers to the unit determinant property.
The nonlinear orthogonality and unit determinant constraints define a nonlinear, one-dimensional manifold within the larger 4-dimensional space of 2 × 2 invertible matrices. In fact, the manifold has the topology of a circle, but it remains a group: matrix multiplication of any two rotation matrices yields another rotation matrix.
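These properties are easy to verify numerically; the sketch below is purely illustrative:

```python
import numpy as np

def rot2(theta):
    """The exponential map of SO(2): wraps the single manifold coordinate
    theta onto the circle group."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R = rot2(0.3)
assert np.allclose(R.T @ R, np.eye(2))                  # orthogonality
assert np.isclose(np.linalg.det(R), 1.0)                # unit determinant
assert np.allclose(rot2(0.3) @ rot2(0.5), rot2(0.8))    # closed under composition
```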
B.2 2D Rigid Transformations
and
$$
T_1 T_2 = \begin{bmatrix} R_1 & t_1 \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} R_2 & t_2 \\ 0 & 1 \end{bmatrix}
= \begin{bmatrix} R_1 R_2 & R_1 t_2 + t_1 \\ 0 & 1 \end{bmatrix}. \qquad (B.5)
$$
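The block structure of (B.5) can be checked directly; the following sketch (numpy, with illustrative names) composes two homogeneous 2D transforms:

```python
import numpy as np

def se2(theta, t):
    """Homogeneous 3x3 matrix for a 2D rigid transformation (R, t)."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(3)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:2, 2] = t
    return T

T1, T2 = se2(0.3, [1.0, 0.0]), se2(-0.2, [0.0, 2.0])
T12 = T1 @ T2
# Block structure of (B.5): rotation R1 R2 and translation R1 t2 + t1.
assert np.allclose(T12[:2, :2], T1[:2, :2] @ T2[:2, :2])
assert np.allclose(T12[:2, 2], T1[:2, :2] @ T2[:2, 2] + T1[:2, 2])
```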
B.3 3D Rotations
$$R_1 R_2 \neq R_2 R_1 \qquad (B.8)$$
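This non-commutativity is easily demonstrated numerically; the sketch below uses scipy's Rotation class merely for convenience:

```python
import numpy as np
from scipy.spatial.transform import Rotation

R1 = Rotation.from_euler('x', 90, degrees=True).as_matrix()
R2 = Rotation.from_euler('z', 90, degrees=True).as_matrix()
print(np.allclose(R1 @ R2, R2 @ R1))   # False: order of 3D rotations matters
```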
$$
(\bar{\omega}, \theta) \leftrightarrow \cos\frac{\theta}{2} + (\bar{\omega}_x i + \bar{\omega}_y j + \bar{\omega}_z k)\sin\frac{\theta}{2} \Rightarrow R \leftarrow (\phi, \theta, \psi), \qquad (B.11)
$$
where the double arrow represents the double covering property of unit quaternions, and the last arrow indicates the undesirable many-to-one mapping from Euler angles to rotation matrices (even more so now, because of the inherent singularities).
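The double covering is also easy to see numerically: negating a unit quaternion leaves the corresponding rotation matrix unchanged. Again, scipy's Rotation class is used only for convenience:

```python
import numpy as np
from scipy.spatial.transform import Rotation

q = Rotation.from_rotvec([0.0, 0.0, 0.3]).as_quat()   # unit quaternion (x, y, z, w)
R_plus = Rotation.from_quat(q).as_matrix()
R_minus = Rotation.from_quat(-q).as_matrix()
assert np.allclose(R_plus, R_minus)    # q and -q encode the same rotation
```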
B.5 Directions in 3D