Global Localization Using Local Pole Patterns
Global Localization Using Local Pole Patterns
Global Localization Using Local Pole Patterns
Patterns
Claus Brenner
1 Introduction
For future driver assistance systems and autonomous driving, reliable position-
ing is a prerequisite. While global navigation satellite systems like GPS are in
widespread use, they lack the required reliability, especially in densely builtup
urban areas. Relative positioning, using video or LiDAR sensors, is an important
alternative, which has been explored in many disciplines, like photogrammetry,
computer vision and robotics. In order to determine one’s position uniquely, a
global map is required, which is usually based on features (or landmarks), rather
than on the originally captured raw data. In robotics, features like line segments
or corners have been extracted from horizontal scans [1]. However, such features
exhibit a low degree of information and, especially for indoor sites, are not very
discriminative. Recently, there is much research in computer vision using high-
dimensional descriptors (such as SIFT), extracted from images, which can be
used for object recognition [2] or localization [3]. Thus, it is interesting if high-
dimensional descriptors can be found which are strictly based on geometry and
work in large outdoor environments.
In this paper, a large 3D scanned outdoor scene is used from which a map of
features is derived. Upright poles are extracted, which can be done quite reliably
using simple geometric reasoning. The pole centers then form a 2D pattern. 2D
point pattern matching is a research topic of its own, with applications for star
J. Denzler, G. Notni, and H. Süße (Eds.): DAGM 2009, LNCS 5748, pp. 61–70, 2009.
c Springer-Verlag Berlin Heidelberg 2009
62 C. Brenner
pattern and fingerprint matching. Van Wamelen et al. [4] present an expected
linear time algorithm, to find a pattern in O(n(log k)3/2 ), where n is the number
of points in the scene and k < n the number of points in the subset which
shall be searched in the scene. Bishnu et al. [5] describe an algorithm which is
O(kn4/3 log n) in the worst case, however is reported to perform much better for
practical cases. It is based on indexing the distance of point pairs. This paper
generalizes this approach, by encoding the local relations of two or more points.
Fig. 1. (a) Scan path in Hannover (color encodes height of scanned points, using
temperature scale). (b) Illustration of the pole extraction algorithm, using cylindri-
cal stacks. In this example, some parts of the pole were not verified due to missing
scan points (middle), others due to an additional structure, such as a sign mounted to
the pole (top). (c) Extracted poles for the intersection ‘Friederikenplatz’ in Hannover.
(a) and (c) overlaid with a cadastral map (which takes not part in any computation).
For the entire 22 km scene, a total of 2,658 poles was found, which is on average
one pole every 8 meters (Fig. 1(c)). In terms of data reduction, this is one extracted
pole (or 2D point) per 27,000 original scan points. Although the current implemen-
tation is not optimized, processing time is uncritical and yields several poles per
second on a standard PC. There are also some false detections in the scene, e.g.
where a pole-like point pattern is induced by low point density or occlusion. Nev-
ertheless, the pole patterns obtained are considered to be representative.
it is just an assumption regarding the scene extent and accuracy which seems
feasible, either by direct measurement or by merging of multiple scans along the
path of the vehicle.
Using the assumed parameters, one can derive the first pole pattern charac-
teristics. Placing the center of the ‘horizon’ on every pole in turn, and counting
the number of other poles within a radius of r = 50 m, one finds that the average
number of neighbor poles is around 18, with a maximum of 41. The number of
neighbors is not uniformly distributed, but rather, there is a peak around 17, as
can be seen from the histogram in Fig. 2(a). Around 50% of all poles do have
between 12 and 22 neighbors.
To determine uniqueness, the following experiment was performed. Each of
the poles pi , 1 ≤ i ≤ N = 2, 658 is taken as center and its local set of neighbor
poles Pi = {pj |j = i, pi − pj 2 ≤ r} (with r = 50 m) is matched to every pole
neighborhood Pj , j = i in the scene. Taking into account all possible rotations, it
is counted how many poles match within a tolerance of 2 and the maximum over
all this counts is taken, i.e. ni,max = maxj=i (matchcount(Pi , Pj )). The result is
shown as a histogram in Fig. 2(b), where the area of the bubbles reflects the count
and the pole neighborhoods are sorted according to the number of neighbors |Pi |.
For example, the bubble at (17, 2) represents the number of poles with 17 other
poles in the neighborhood, for which ni,max = 2 (in this case, the bubble area
represents a count of 101, which means that 101 out of the 155 poles with 17
neighbors (peak in Fig. 2(a)) fulfill this criterion). An important observation is
that for poles with up to around 20 neighbors, there is a strong peak at two, which
means that in the majority of those cases, if we take the center and three more
points in the neighborhood, this is a pattern which is unique in the entire scene.
This property is used in the next section for the design of the local descriptor.
It can also be seen that ni,max may be as large as 20, for poles with |Pi | ≥ 36.
As it turns out, this is due to alleys in the scene, where trees are planted with
regular spacing. Along those alleys, there is a huge number of neighbors, many
of which fit to other places along the alley. Nevertheless, one can see that even
in this case, ni,max is only around 50% of |Pi |.
Ϯϰ
ϮϮ
ϮϬ
ϭϴ
ϭϲϬ ϭϲ
ϭϰϬ ϭϰ
ϭϮϬ ϭϮ
ϭϬϬ
ϭϬ
ϴϬ
ϴ
ϲϬ
ϲ
ϰϬ
ϰ
ϮϬ
Ϯ
Ϭ
Ϭ
Ϭ Ϯ ϰ ϲ ϴ ϭϬ ϭϮ ϭϰ ϭϲ ϭϴ ϮϬ ϮϮ Ϯϰ Ϯϲ Ϯϴ ϯϬ ϯϮ ϯϰ ϯϲ ϯϴ ϰϬ
Ϭ Ϯ ϰ ϲ ϴ ϭϬ ϭϮ ϭϰ ϭϲ ϭϴ ϮϬ ϮϮ Ϯϰ Ϯϲ Ϯϴ ϯϬ ϯϮ ϯϰ ϯϲ ϯϴ ϰϬ ϰϮ
(a) (b)
however, the number of dimensions would not be fixed but rather depend on
the local scene content. Moreover, since one has to take into account missing
or extra poles resulting from scene interpretation errors, descriptors of different
dimensions would have to be compared.
As is well-known, common efficient indexing methods are not successful in
databases of high dimensions. For example, the popular kd-tree has a time com-
plexity for retrieval of O(n1−1/d + l) where n is the number of elements in the
database, d is the dimension, and l is the number of retrieved neighbors. This
means that for high dimensions, it is as efficient as a brute force O(n) search over
all elements. Therefore, one has to rely on approximations, for example searching
for a near neighbor only (as done by Lowe [2]) or by using quantization (as done
by Nistér and Stewénius [7], where quantization into cells is given by a hierarchic
clustering tree).
In our case, quantization is staightforward, since the parameters required to
express the pole pattern are geometric in nature, with given maximum range
and measurement accuracy. For example, using a quantization of 2 = 0.2 m
within a total range of ±50 m would yield 500 discrete values, which requires
approximately α ≈ 9 bits to encode. Thus, let us assume for the moment that
the number of poles in all neighborhoods of the database (and the query) is
the same, say k, in which case the dimension is d = 2k − 3, and each local
neighborhood can be encoded into a single integer number using αd bits. Since
each descriptor is just an integer with bounded range, one can search for an exact
match instead of a nearest neighbor, using perfect hashing on a grid, with the
remarkable time complexity of only O(1), however this would require a (perhaps
unrealistic) O(αd · n3 ) preprocessing time [8]. Alternatively, using a search tree
or sorting all database entries would still yield a time complexity of O(log n) and
would require only O(n log n) preprocessing time.
66 C. Brenner
However, there are two caveats. First, since there is noise in the measured
query vector, quantization may lead to an error of ±1 (if the cell size is chosen
as outlined above). That is, one needs to search for the neighboring cell as
well. Using a tree, this can be done in O(1), however for just one dimension,
which will not help, since we ‘folded’ d dimensions into one integer. Overmars
[9] proposed an√ algorithm for range search on a grid which has retrieval time
O(l + logd−2 n α), where l is again the number of returned results, requiring
O(n logd−1 n) storage. However, since we are not performing a general range
search, but rather are interested in the two direct neighbors only (i.e., if the
value in one dimension is i, we have to look at i − 1, i, i + 1), we can simply
search several times, which requires 3 searches for each dimension, i.e. a total of
3d searches. This will grow by a factor of 3 for each added dimension instead of
log n and allow us to use O(n) storage. Still, for large d (remember 17 neighbors
yield d = 33) this is not practical. The second caveat is that we cannot assume a
fixed dimension and have to allow for missing and additional poles in the query.
The random draws in the query part of the algorithm also solve the problem of
erroneous extra poles in the scene. For example, considering the average number
of 18 poles in a pole neighborhood, if only 50% of them (9) are captured by a
vehicle, with an additional
3 captured
in error, and k = 4, then the probability
9 12
of a good draw is still / = 25% and the expected number of draws
4 4
required to get at least one correct draw is 4.
4.3 Scalability
It remains to be determined how k should be selected. If it is small, this keeps the
database size and the number of required queries (3d ) small, and gives better
chances to pick a correct pole subset when erroneous extra poles are present.
However, if it is too small, the returned set of keys S1 in algorithm 1, line 7,
gets large. In fact, one would like to select k in such a way that |S1 | = O(1).
Otherwise, if d is too small, |S1 | will be linear in n.
For a concrete example, consider k = 2, then pairs of points are in the
database. If the average number of neighbors is 18, and N = 2,658, then n =
18·2,658 = 47,844. If = 0.1 m, the error in distance (which is a difference)
is 0.2 m. If the 47,844 entries are distributed uniformly in the [0, 50 m] range,
383 will be in any interval of length 0.4 m (±0.2 m). Thus, in the uniformly
distributed case, we would expect that a random draw of a pair yields about 400
entries in the database and it will need several draws to reduce this to a single
solution, according to algorithm 1.
We will have a closer look at the case k = 3. In this case, we would expect
about N ∗ 18 ∗ 17/2 = 406,674 different descriptors (indeed there are 503,024).
How are those descriptors distributed? Since for k = 3 it follows d = 3, we
can plot them in 3D space. From Fig. 3, one sees that the distribution is quite
uniform. There is a certain point pattern evident, especially on the ground plane,
which occurs with a spacing of 6 m and can indeed be traced back to a row of
alley trees in the scene, planted at 6 m spacing. Fig. 3 supports the assumption
that, in contrast to indoor scenes (typically occuring in robotics), the descriptors
expose only little regularity.
68 C. Brenner
Fig. 3. All descriptors D of the scene, plotted in 3D space (N = 2,658 poles, k = 3).
The coordinates (x, y, z) correspond to the descriptor’s (d, x1 , y1 ).
If the distribution is about uniform, the question is how large the space
spanned by all possible descriptors
√ D is? The volume of the (oddly shaped)
descriptor space for k = 3 is (2 3/3 − 2π/9)r3 ≈ 0.46r3 . To give an estimation,
it is computed how many voxels of edge length 0.4 ( = 0.1 m, cf. to the reason-
ing in the case k = 2 above) fit to this space, which is 891,736. Therefore, for
k = 3, if the 503,024 descriptors are ‘almost’ uniformly placed in the ‘891,736
cell’ descriptor space, one can expect that a query will lead to a single result, as
desired.
In order to verify this, the following experiment was carried out. After filling
the database, 10 queries are performed for any pole in the database, according
to algorithm 1. A descriptor from the database was considered to match the
query descriptor if all elements were within a distance of 2. The number of
iterations (draws) required to narrow down the resulting set to a single element
(while loop in line 4 of algorithm 1) was recorded into a histogram. At most 20
iterations were allowed, i.e. the histogram entries at ‘20’ mark failures. Again,
the histogram entries are sorted according to the number of neighbors. Fig. 4(a)
shows the case for k = 2 and = 0.1 m. It can be seen that in most of the cases,
5 iterations were required, with up to 10 or even more for poles with a large
number of neighbors. For poles with only a few neighbors, there is a substantial
number of failures. For = 0.2 m, the situation gets worse (Fig. 4(b)). There
are more failures, and also, more iterations required in general. Of course, this
is the result of a too small descriptor space.
Moving on to k = 3, we see that most poles are found within 1 or 2 iterations
(Fig. 4(c)). (Note that point triplets will vote for all three of their endpoints if all
sides are ≤ r, for which reason often 3 solutions are returned for the first query
and another iteration is required.) When trying = 0.2 m (Fig. 4(d)), there is
almost no change, which means that the descriptor space is large in relation to
the number of descriptors.
Global Localization of Vehicles Using Local Pole Patterns 69
ϮϬ ϮϬ
ϭϱ ϭϱ
ϭϬ ϭϬ
ϱ ϱ
Ϭ Ϭ
Ϭ ϱ ϭϬ ϭϱ ϮϬ Ϯϱ ϯϬ ϯϱ ϰϬ Ϭ ϱ ϭϬ ϭϱ ϮϬ Ϯϱ ϯϬ ϯϱ ϰϬ
(a) (b)
ϮϬ ϮϬ
ϭϱ ϭϱ
ϭϬ ϭϬ
ϱ ϱ
Ϭ Ϭ
Ϭ ϱ ϭϬ ϭϱ ϮϬ Ϯϱ ϯϬ ϯϱ ϰϬ Ϭ ϱ ϭϬ ϭϱ ϮϬ Ϯϱ ϯϬ ϯϱ ϰϬ
(c) (d)
Fig. 4. Histograms of the number of draws required to retrieve a pole uniquely from
the database. x-axis is the number of poles in the neighborhood, y-axis is number of
draws required (with 20 being failures). The area of the bubbles represent the number
of cases. All experiments for r = 50 m and (a) k = 2, = 0.1 m, (b) k = 2, = 0.2 m,
(c) k = 3, = 0.1 m, (d) k = 3, = 0.2 m.
5 Conclusions
In this paper, the use of local pole patterns for global localization was inves-
tigated. First, the characteristics of local pole patterns are determined, using
a large scene captured by LiDAR and assumptions on the measurement range
70 C. Brenner
and accuracy. Second, a local descriptor is proposed which has a constant di-
mension and allows for an efficient retrieval. Third, the structure and size of the
descriptor space, the retrieval performance and the scalability were analyzed.
There are numerous enhancements possible. When constructing the database,
not all descriptors should be required and especially, clusters in descriptor space
can probably be removed (similar to stop lists). Also, additional features like
planar patches or dihedral edges can (and should) be used. Finally, experiments
with real vehicle sensors are required to verify the assumptions regarding range
and accuracy, and larger scenes would be needed to verify scalability.
Acknowledgements
This work has been supported by the VolkswagenStiftung, Germany.
References
1. Arras, K.O., Siegwart, R.Y.: Feature extraction and scene interpretation for map-
based navigation and map building. In: Proc. SPIE. Mobile Robots XII, vol. 3210,
pp. 42–53 (1997)
2. Lowe, D.: Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision 60(2), 91–110 (2004)
3. Fraundorfer, F., Wu, C., Frahm, J.M., Pollefeys, M.: Visual word based location
recognition in 3d models using distance augmented weighting. In: Fourth Interna-
tional Symposium on 3D Data Processing, Visualization and Transmission (2008)
4. Wamelen, P.B.V., Li, Z., Iyengar, S.S.: A fast expected time algorithm for the 2-D
point pattern matching problem. Pattern Recognition 37(8), 1699–1711 (2004)
5. Bishnu, A., Das, S., Nandy, S.C., Bhattacharya, B.B.: Simple algorithms for partial
point set pattern matching under rigid motion. Pattern Recognition 39(9), 1662–
1671 (2006)
6. Kremer, J., Hunter, G.: Performance of the streetmapper mobile lidar mapping
system in ‘real world’ projects. In: Photogrammetric Week, Wichmann, pp. 215–
225 (2007)
7. Nistér, D., Stewénius, H.: Scalable recognition with a vocabulary tree. In: Proc.
IEEE Conf. on Computer Vision and Pattern Recognition, pp. 2161–2168 (2006)
8. Fredman, M.L., Komlos, J., Szemeredi, E.: Storing a sparse table with O(1) worst
case access time. Journal of the ACM 31(3), 538–544 (1984)
9. Overmars, M.H.: Efficient data structures for range searching on a grid. Techni-
cal Report RUU-CS-87-2, Department of Computer Science, University of Utrecht
(1987)