Global Localization Using Local Pole Patterns

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

Global Localization of Vehicles Using Local Pole

Patterns

Claus Brenner

Institute of Cartography and Geoinformatics,


Leibniz Universität Hannover, Appelstraße 9a, 30167 Hannover, Germany
claus.brenner@ikg.uni-hannover.de

Abstract. Accurate and reliable localization is an important require-


ment for autonomous driving. This paper investigates an asymmetric
model for global mapping and localization in large outdoor scenes. In
the first stage, a mobile mapping van scans the street environment in
full 3D, using high accuracy and high resolution sensors. From this raw
data, local descriptors are extracted in an offline process and stored in a
global map. In the second stage, vehicles, equipped with simple, inaccu-
rate sensors are assumed to be able to recover part of these descriptors
which allows them to determine their global position. The focus of this
paper is on the investigation of local pole patterns. A descriptor is pro-
posed which is tolerant with regard to missing data, and performance and
scalability are considered. For the experiments, a large, dense outdoor
LiDAR scan with a total length of 21.7 km is used.

1 Introduction
For future driver assistance systems and autonomous driving, reliable position-
ing is a prerequisite. While global navigation satellite systems like GPS are in
widespread use, they lack the required reliability, especially in densely builtup
urban areas. Relative positioning, using video or LiDAR sensors, is an important
alternative, which has been explored in many disciplines, like photogrammetry,
computer vision and robotics. In order to determine one’s position uniquely, a
global map is required, which is usually based on features (or landmarks), rather
than on the originally captured raw data. In robotics, features like line segments
or corners have been extracted from horizontal scans [1]. However, such features
exhibit a low degree of information and, especially for indoor sites, are not very
discriminative. Recently, there is much research in computer vision using high-
dimensional descriptors (such as SIFT), extracted from images, which can be
used for object recognition [2] or localization [3]. Thus, it is interesting if high-
dimensional descriptors can be found which are strictly based on geometry and
work in large outdoor environments.
In this paper, a large 3D scanned outdoor scene is used from which a map of
features is derived. Upright poles are extracted, which can be done quite reliably
using simple geometric reasoning. The pole centers then form a 2D pattern. 2D
point pattern matching is a research topic of its own, with applications for star

J. Denzler, G. Notni, and H. Süße (Eds.): DAGM 2009, LNCS 5748, pp. 61–70, 2009.
c Springer-Verlag Berlin Heidelberg 2009
62 C. Brenner

pattern and fingerprint matching. Van Wamelen et al. [4] present an expected
linear time algorithm, to find a pattern in O(n(log k)3/2 ), where n is the number
of points in the scene and k < n the number of points in the subset which
shall be searched in the scene. Bishnu et al. [5] describe an algorithm which is
O(kn4/3 log n) in the worst case, however is reported to perform much better for
practical cases. It is based on indexing the distance of point pairs. This paper
generalizes this approach, by encoding the local relations of two or more points.

2 Mobile Mapping Setup and Feature Extraction

We obtained a dense LiDAR scan of a part of Hannover, Germany, acquired by


the Steetmapper mobile mapping system, jointly developed by 3D laser mapping
Ltd., UK, and IGI mbH, Germany [6]. The scan was acquired with a configu-
ration of four scanners. Postprocessing yields the trajectory and, through the
calibrated relative orientations of the scanners and the GNSS/IMU system, a
georeferenced point cloud. Absolute point accuracy varies depending on GPS
outages, however a relative accuracy of a few centimeters can be expected due
to the high accuracy fiber optic IMU employed. Since we want to deal with local
point patterns, relative accuracy is actually much more important than absolute
accuracy.
Fig. 1(a) shows an overview. Note that the scanned area contains streets in
densely built up regions as well as highway like roads. The total length of the
scanned roads is 21.7 kilometers, captured in 48 minutes, which is an average
speed of 27 km/h. During that time, 70.7 million points were captured, cor-
responding to an effective measurement rate of 24,500 points per second. On
average, each road meter is covered by more than 3,200 points.
After obtaining the point cloud, the first step is to extract features. The
basic motivation of this is that many of the scanned points do not convey much
information and by reducing the huge cloud to only a few important features,
transmission, storage, and computation requirements are substantially reduced.
Our first choice were poles, which are usually abundant in inner city scenes as
well as geometrically stable over time.
Pole extraction uses a simple geometric model, namely that the basic charac-
teristic of a pole is that it is upright, there is a kernel region of radius r1 where
laser scan points are required to be present, and a hollow cylinder, between r1
and r2 (r1 < r2 ) where no points are allowed to be present (Fig. 1(b)). The
structure is analyzed in stacks of hollow cylinders. A pole is confirmed when a
certain minimum number of stacked cylinders is found. After a pole is identified,
the points in the kernel are used for a least squares estimation of the pole center.
Note that this method owes its reliability to the availability of full 3D data, i.e.,
the processing of the 3D stack of cylinders. Using just one scan plane parallel
to the ground (as is often the case in robotics) or projecting the 3D cloud down
to the ground would not yield the same detection reliability. The method also
extracts some tree trunks of diameter ≤ r1 , which we do not attempt to discard
since they are useful for positioning purposes as well.
Global Localization of Vehicles Using Local Pole Patterns 63

(a) (b) (c)

Fig. 1. (a) Scan path in Hannover (color encodes height of scanned points, using
temperature scale). (b) Illustration of the pole extraction algorithm, using cylindri-
cal stacks. In this example, some parts of the pole were not verified due to missing
scan points (middle), others due to an additional structure, such as a sign mounted to
the pole (top). (c) Extracted poles for the intersection ‘Friederikenplatz’ in Hannover.
(a) and (c) overlaid with a cadastral map (which takes not part in any computation).

For the entire 22 km scene, a total of 2,658 poles was found, which is on average
one pole every 8 meters (Fig. 1(c)). In terms of data reduction, this is one extracted
pole (or 2D point) per 27,000 original scan points. Although the current implemen-
tation is not optimized, processing time is uncritical and yields several poles per
second on a standard PC. There are also some false detections in the scene, e.g.
where a pole-like point pattern is induced by low point density or occlusion. Nev-
ertheless, the pole patterns obtained are considered to be representative.

3 Characteristics of Local Pole Patterns


The idea for global localization of vehicles is to use the local pole pattern and
match this to the global set of poles. As opposed to the general case of point pat-
tern matching, one can assume that the scale is fixed. Also, there are additional
constraints which arise from this special application, which is a combination of
‘horizon diameter’ and measurement accuracy. Since vehicles are not equipped
with high accuracy positioning sensors, one can achieve a good measurement
accuracy only for scenes of small extent. In order to keept the treatment gen-
eral and map-centric, i.e. without necessity to rely on a specific vehicle sensor,
standard parameters are used in this paper. It is assumed that the ‘horizon’ of a
vehicle is a 50 meter radius disk. Within this radius, the vehicle is deemed to be
able to measure poles with an accuracy of  = 0.1 m (or alternatively, 0.2 m). A
pole accuracy of  means that during matching, a pole from the reference map
and one from the vehicle’s horizon are considered to form a matching pair if
they are within a distance of . Note that these assumptions do not imply that
an actual vehicle must be equipped with a 360◦ , 50 m range sensor. Rather,
64 C. Brenner

it is just an assumption regarding the scene extent and accuracy which seems
feasible, either by direct measurement or by merging of multiple scans along the
path of the vehicle.
Using the assumed parameters, one can derive the first pole pattern charac-
teristics. Placing the center of the ‘horizon’ on every pole in turn, and counting
the number of other poles within a radius of r = 50 m, one finds that the average
number of neighbor poles is around 18, with a maximum of 41. The number of
neighbors is not uniformly distributed, but rather, there is a peak around 17, as
can be seen from the histogram in Fig. 2(a). Around 50% of all poles do have
between 12 and 22 neighbors.
To determine uniqueness, the following experiment was performed. Each of
the poles pi , 1 ≤ i ≤ N = 2, 658 is taken as center and its local set of neighbor
poles Pi = {pj |j = i, pi − pj 2 ≤ r} (with r = 50 m) is matched to every pole
neighborhood Pj , j = i in the scene. Taking into account all possible rotations, it
is counted how many poles match within a tolerance of 2 and the maximum over
all this counts is taken, i.e. ni,max = maxj=i (matchcount(Pi , Pj )). The result is
shown as a histogram in Fig. 2(b), where the area of the bubbles reflects the count
and the pole neighborhoods are sorted according to the number of neighbors |Pi |.
For example, the bubble at (17, 2) represents the number of poles with 17 other
poles in the neighborhood, for which ni,max = 2 (in this case, the bubble area
represents a count of 101, which means that 101 out of the 155 poles with 17
neighbors (peak in Fig. 2(a)) fulfill this criterion). An important observation is
that for poles with up to around 20 neighbors, there is a strong peak at two, which
means that in the majority of those cases, if we take the center and three more
points in the neighborhood, this is a pattern which is unique in the entire scene.
This property is used in the next section for the design of the local descriptor.
It can also be seen that ni,max may be as large as 20, for poles with |Pi | ≥ 36.
As it turns out, this is due to alleys in the scene, where trees are planted with
regular spacing. Along those alleys, there is a huge number of neighbors, many
of which fit to other places along the alley. Nevertheless, one can see that even
in this case, ni,max is only around 50% of |Pi |.

4 A Local Pole Pattern Descriptor


4.1 The Curse of Dimensionality
Similar to the case of image retrieval from large databases, we are looking for a
local ‘pole descriptor’ which can be retrieved quickly from a huge scene, contain-
ing millions of poles. One of the key properties of local descriptors used in vision
(such as the SIFT descriptor [2]) is that through their high dimensionality, they
are quite unique, even in large databases. Similarly, in our case, the patterns of
local pole neighbors are high-dimensional and unique. For example, if |Pi | = 17,
we can describe the center point and its 17 neighbors in a unique way using
their local (x, y) or polar coordinates, which will yield 33 = 2 · 18 − 3 param-
eters (in general, k points in 2D will require 2k − 3 parameters when rotation
and translation are not fixed). Different from the case of the SIFT descriptor,
Global Localization of Vehicles Using Local Pole Patterns 65

Ϯϰ

ϮϮ

ϮϬ

ϭϴ

ϭϲϬ ϭϲ

ϭϰϬ ϭϰ

ϭϮϬ ϭϮ

ϭϬϬ
ϭϬ

ϴϬ
ϴ

ϲϬ
ϲ

ϰϬ
ϰ

ϮϬ
Ϯ

Ϭ
Ϭ
Ϭ Ϯ ϰ ϲ ϴ ϭϬ ϭϮ ϭϰ ϭϲ ϭϴ ϮϬ ϮϮ Ϯϰ Ϯϲ Ϯϴ ϯϬ ϯϮ ϯϰ ϯϲ ϯϴ ϰϬ
Ϭ Ϯ ϰ ϲ ϴ ϭϬ ϭϮ ϭϰ ϭϲ ϭϴ ϮϬ ϮϮ Ϯϰ Ϯϲ Ϯϴ ϯϬ ϯϮ ϯϰ ϯϲ ϯϴ ϰϬ ϰϮ

(a) (b)

Fig. 2. (a) Histogram of the number of poles in a local neighborhood (r = 50 m disk).


(b) Histogram of the maximum number of poles in the neighborhood which match to
another pole neighborhood in the scene. x-axis is number of neighbors |Pi |, y-axis is
maximum matches ni,max , and the area of the bubbles represent the number of cases.

however, the number of dimensions would not be fixed but rather depend on
the local scene content. Moreover, since one has to take into account missing
or extra poles resulting from scene interpretation errors, descriptors of different
dimensions would have to be compared.
As is well-known, common efficient indexing methods are not successful in
databases of high dimensions. For example, the popular kd-tree has a time com-
plexity for retrieval of O(n1−1/d + l) where n is the number of elements in the
database, d is the dimension, and l is the number of retrieved neighbors. This
means that for high dimensions, it is as efficient as a brute force O(n) search over
all elements. Therefore, one has to rely on approximations, for example searching
for a near neighbor only (as done by Lowe [2]) or by using quantization (as done
by Nistér and Stewénius [7], where quantization into cells is given by a hierarchic
clustering tree).
In our case, quantization is staightforward, since the parameters required to
express the pole pattern are geometric in nature, with given maximum range
and measurement accuracy. For example, using a quantization of 2 = 0.2 m
within a total range of ±50 m would yield 500 discrete values, which requires
approximately α ≈ 9 bits to encode. Thus, let us assume for the moment that
the number of poles in all neighborhoods of the database (and the query) is
the same, say k, in which case the dimension is d = 2k − 3, and each local
neighborhood can be encoded into a single integer number using αd bits. Since
each descriptor is just an integer with bounded range, one can search for an exact
match instead of a nearest neighbor, using perfect hashing on a grid, with the
remarkable time complexity of only O(1), however this would require a (perhaps
unrealistic) O(αd · n3 ) preprocessing time [8]. Alternatively, using a search tree
or sorting all database entries would still yield a time complexity of O(log n) and
would require only O(n log n) preprocessing time.
66 C. Brenner

However, there are two caveats. First, since there is noise in the measured
query vector, quantization may lead to an error of ±1 (if the cell size is chosen
as outlined above). That is, one needs to search for the neighboring cell as
well. Using a tree, this can be done in O(1), however for just one dimension,
which will not help, since we ‘folded’ d dimensions into one integer. Overmars
[9] proposed an√ algorithm for range search on a grid which has retrieval time
O(l + logd−2 n α), where l is again the number of returned results, requiring
O(n logd−1 n) storage. However, since we are not performing a general range
search, but rather are interested in the two direct neighbors only (i.e., if the
value in one dimension is i, we have to look at i − 1, i, i + 1), we can simply
search several times, which requires 3 searches for each dimension, i.e. a total of
3d searches. This will grow by a factor of 3 for each added dimension instead of
log n and allow us to use O(n) storage. Still, for large d (remember 17 neighbors
yield d = 33) this is not practical. The second caveat is that we cannot assume a
fixed dimension and have to allow for missing and additional poles in the query.

4.2 Design of the Local Pattern Descriptor

While we can’t defeat the curse of dimensionality, the following observation is


the key to a practical solution. As we have seen in section 3 (Fig. 2(b)), the
maximum overlap of a pole with any other pole, ni,max , is usually quite small.
Therefore, it suffices to take a subset of k points of a local neighborhood in order
to perform a query. This suggests the following approach:

– Database construction. For every pole pi with neighborhood Pi , select


all possible combinations of k − 1 poles from Pi (for a total of k). Compute
a unique descriptor D for those points, which has a (fixed) dimension of
d = 2k − 3. Store the value i under the key D in the database.
– Query. For a given scene, draw a random selection of k points and retrieve
the set of possible solutions. Repeat this until there is only one solution re-
maining (see algorithm 1). (This could also be replaced by a voting scheme.)

The unique descriptor D of a point set with k ≥ 2 points is formed as follows.


First, the diameter of the points is determined (the largest distance between
any two points in the set), which can be done in O(k log k). The diameter yields
the first value of the descriptor. Then, the x-axis is defined along the diameter
and the y-axis perpendicular to it. The orientation is selected in such a way
that when the remaining k − 2 points are expressed in local coordinates, the
extension in +y is larger than in −y. Then, all the (xi , yi ) values are sorted in
lexicographic order and are added one after the other to the descriptor, yielding
a total of 1 + 2(k − 2) = 2k − 3 values. In order to prevent that a structural
change of the descriptor is induced by a small error in coordinates, building the
descriptor fails if during its construction any decision is closer than .
Using a fixed k solves the problem that varying pole neighbor counts lead to
different dimensions d. Also, selecting k as small as possible leads to a small d
and thus, to a small number of queries, 3d (see next section for scalability).
Global Localization of Vehicles Using Local Pole Patterns 67

Algorithm 1. Database query.


1: Q is a local set of poles to be searched in the database
2: S ← {1, . . . , N } [the set of all indices in the database]
3: Select a point q from Q, which acts as the ‘center pole’
4: while |S| > 1 do
5: Randomly select k − 1 points from Q \ {q}: {q1 , q2 , . . . , qk−1 }
6: Compute the unique descriptor, D, for the k points {q, q1 , . . . , qk−1 }
7: Query the database for the key D, which yields a set of indices S1
8: S ← S ∩ S1 [narrow the set of possible solutions]
9: return the single element in S

The random draws in the query part of the algorithm also solve the problem of
erroneous extra poles in the scene. For example, considering the average number
of 18 poles in a pole neighborhood, if only 50% of them (9) are captured by a
vehicle, with an additional
 3 captured
  in error, and k = 4, then the probability
9 12
of a good draw is still / = 25% and the expected number of draws
4 4
required to get at least one correct draw is 4.

4.3 Scalability
It remains to be determined how k should be selected. If it is small, this keeps the
database size and the number of required queries (3d ) small, and gives better
chances to pick a correct pole subset when erroneous extra poles are present.
However, if it is too small, the returned set of keys S1 in algorithm 1, line 7,
gets large. In fact, one would like to select k in such a way that |S1 | = O(1).
Otherwise, if d is too small, |S1 | will be linear in n.
For a concrete example, consider k = 2, then pairs of points are in the
database. If the average number of neighbors is 18, and N = 2,658, then n =
18·2,658 = 47,844. If  = 0.1 m, the error in distance (which is a difference)
is 0.2 m. If the 47,844 entries are distributed uniformly in the [0, 50 m] range,
383 will be in any interval of length 0.4 m (±0.2 m). Thus, in the uniformly
distributed case, we would expect that a random draw of a pair yields about 400
entries in the database and it will need several draws to reduce this to a single
solution, according to algorithm 1.
We will have a closer look at the case k = 3. In this case, we would expect
about N ∗ 18 ∗ 17/2 = 406,674 different descriptors (indeed there are 503,024).
How are those descriptors distributed? Since for k = 3 it follows d = 3, we
can plot them in 3D space. From Fig. 3, one sees that the distribution is quite
uniform. There is a certain point pattern evident, especially on the ground plane,
which occurs with a spacing of 6 m and can indeed be traced back to a row of
alley trees in the scene, planted at 6 m spacing. Fig. 3 supports the assumption
that, in contrast to indoor scenes (typically occuring in robotics), the descriptors
expose only little regularity.
68 C. Brenner

Fig. 3. All descriptors D of the scene, plotted in 3D space (N = 2,658 poles, k = 3).
The coordinates (x, y, z) correspond to the descriptor’s (d, x1 , y1 ).

If the distribution is about uniform, the question is how large the space
spanned by all possible descriptors
√ D is? The volume of the (oddly shaped)
descriptor space for k = 3 is (2 3/3 − 2π/9)r3 ≈ 0.46r3 . To give an estimation,
it is computed how many voxels of edge length 0.4 ( = 0.1 m, cf. to the reason-
ing in the case k = 2 above) fit to this space, which is 891,736. Therefore, for
k = 3, if the 503,024 descriptors are ‘almost’ uniformly placed in the ‘891,736
cell’ descriptor space, one can expect that a query will lead to a single result, as
desired.
In order to verify this, the following experiment was carried out. After filling
the database, 10 queries are performed for any pole in the database, according
to algorithm 1. A descriptor from the database was considered to match the
query descriptor if all elements were within a distance of 2. The number of
iterations (draws) required to narrow down the resulting set to a single element
(while loop in line 4 of algorithm 1) was recorded into a histogram. At most 20
iterations were allowed, i.e. the histogram entries at ‘20’ mark failures. Again,
the histogram entries are sorted according to the number of neighbors. Fig. 4(a)
shows the case for k = 2 and  = 0.1 m. It can be seen that in most of the cases,
5 iterations were required, with up to 10 or even more for poles with a large
number of neighbors. For poles with only a few neighbors, there is a substantial
number of failures. For  = 0.2 m, the situation gets worse (Fig. 4(b)). There
are more failures, and also, more iterations required in general. Of course, this
is the result of a too small descriptor space.
Moving on to k = 3, we see that most poles are found within 1 or 2 iterations
(Fig. 4(c)). (Note that point triplets will vote for all three of their endpoints if all
sides are ≤ r, for which reason often 3 solutions are returned for the first query
and another iteration is required.) When trying  = 0.2 m (Fig. 4(d)), there is
almost no change, which means that the descriptor space is large in relation to
the number of descriptors.
Global Localization of Vehicles Using Local Pole Patterns 69

ϮϬ ϮϬ

ϭϱ ϭϱ

ϭϬ ϭϬ

ϱ ϱ

Ϭ Ϭ
Ϭ ϱ ϭϬ ϭϱ ϮϬ Ϯϱ ϯϬ ϯϱ ϰϬ Ϭ ϱ ϭϬ ϭϱ ϮϬ Ϯϱ ϯϬ ϯϱ ϰϬ

(a) (b)
ϮϬ ϮϬ

ϭϱ ϭϱ

ϭϬ ϭϬ

ϱ ϱ

Ϭ Ϭ
Ϭ ϱ ϭϬ ϭϱ ϮϬ Ϯϱ ϯϬ ϯϱ ϰϬ Ϭ ϱ ϭϬ ϭϱ ϮϬ Ϯϱ ϯϬ ϯϱ ϰϬ

(c) (d)

Fig. 4. Histograms of the number of draws required to retrieve a pole uniquely from
the database. x-axis is the number of poles in the neighborhood, y-axis is number of
draws required (with 20 being failures). The area of the bubbles represent the number
of cases. All experiments for r = 50 m and (a) k = 2,  = 0.1 m, (b) k = 2,  = 0.2 m,
(c) k = 3,  = 0.1 m, (d) k = 3,  = 0.2 m.

Finally, to give an estimation on the order of N for different k, we use the


above reasoning (for r = 50 m,  = 0.1 m, 18 poles neighborhood). For k=2, there
are 18N descriptors and 50/0.4 = 125 cells, so that N = 7. Similarly, for k = 3,
there are 18 ∗ 17/2 ·N descriptors and 891,736 cells, so that N = 5,828. For k = 4
and k = 5 it follows N = 1.3·107 (1010 cells) and N = 4.8·1010 (1014 cells). Note
that although 1010 cells (for a a database size of thirteen million poles) sounds
large, this is in the order of the main memory of a modern desktop computer.

5 Conclusions
In this paper, the use of local pole patterns for global localization was inves-
tigated. First, the characteristics of local pole patterns are determined, using
a large scene captured by LiDAR and assumptions on the measurement range
70 C. Brenner

and accuracy. Second, a local descriptor is proposed which has a constant di-
mension and allows for an efficient retrieval. Third, the structure and size of the
descriptor space, the retrieval performance and the scalability were analyzed.
There are numerous enhancements possible. When constructing the database,
not all descriptors should be required and especially, clusters in descriptor space
can probably be removed (similar to stop lists). Also, additional features like
planar patches or dihedral edges can (and should) be used. Finally, experiments
with real vehicle sensors are required to verify the assumptions regarding range
and accuracy, and larger scenes would be needed to verify scalability.

Acknowledgements
This work has been supported by the VolkswagenStiftung, Germany.

References
1. Arras, K.O., Siegwart, R.Y.: Feature extraction and scene interpretation for map-
based navigation and map building. In: Proc. SPIE. Mobile Robots XII, vol. 3210,
pp. 42–53 (1997)
2. Lowe, D.: Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision 60(2), 91–110 (2004)
3. Fraundorfer, F., Wu, C., Frahm, J.M., Pollefeys, M.: Visual word based location
recognition in 3d models using distance augmented weighting. In: Fourth Interna-
tional Symposium on 3D Data Processing, Visualization and Transmission (2008)
4. Wamelen, P.B.V., Li, Z., Iyengar, S.S.: A fast expected time algorithm for the 2-D
point pattern matching problem. Pattern Recognition 37(8), 1699–1711 (2004)
5. Bishnu, A., Das, S., Nandy, S.C., Bhattacharya, B.B.: Simple algorithms for partial
point set pattern matching under rigid motion. Pattern Recognition 39(9), 1662–
1671 (2006)
6. Kremer, J., Hunter, G.: Performance of the streetmapper mobile lidar mapping
system in ‘real world’ projects. In: Photogrammetric Week, Wichmann, pp. 215–
225 (2007)
7. Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: Proc.
IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2161–2168 (2006)
8. Fredman, M.L., Komlos, J., Szemeredi, E.: Storing a sparse table with O(1) worst
case access time. Journal of the ACM 31(3), 538–544 (1984)
9. Overmars, M.H.: Efficient data structures for range searching on a grid. Techni-
cal Report RUU-CS-87-2, Department of Computer Science, University of Utrecht
(1987)

You might also like