Thesis Chapter 7
Conclusions
In this concluding chapter we summarize the contributions of this thesis and their possible impact as we see it, and discuss important directions for future work.
• It is usually beneficial to learn a model for the similarity relevant to the task, be
it regression, classification or retrieval. It rarely hurts, and usually improves the
performance of the end goal application. Of course, the precise gain of learn-
ing similarity for any given application can be assessed by standard validation
techniques.
• Such learning can be done successfully, directly from examples of similarity judgments specific to the task, with minimal assumptions about the properties of the underlying similarity concept. In many cases, for instance when the task involves regression, the learning procedure, including the labeling of similarity examples, can be completed fully automatically.
7.1.1 Learning algorithms
The basis of our approach is to construct an embedding H such that a low distance between H(x) and H(y) corresponds, with high probability, to a positive label assigned by the similarity S(x, y). The main advantage of this approach, and what distinguishes it from the alternatives known to us, is that it achieves two important goals:
• It provides us with a set of similarity classifiers on pairs of examples. This set is
parametrized by the value of the threshold on distance in the embedding space
H.
• It reduces the problem of similarity search to the problem of searching for neighbors with respect to the L1 distance. As a result, we are able to leverage state-of-the-art search algorithms, such as locality-sensitive hashing (LSH), that have sublinear running time.
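As a minimal sketch of these two goals (with hypothetical dimensions, and random projections standing in for an embedding that would in fact be learned from similarity-labeled pairs), the family of threshold classifiers and the L1 test on binary codes might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned embedding H: k thresholded linear projections.
# In this thesis the projections and thresholds are learned from
# similarity-labeled pairs; here they are random, purely for illustration.
k, d = 64, 10
P = rng.normal(size=(k, d))   # projection directions
T = rng.normal(size=k)        # per-projection thresholds

def H(x):
    """Map an example x to a k-bit binary code."""
    return (P @ x > T).astype(np.uint8)

def hamming(a, b):
    """L1 distance between binary codes (= Hamming distance)."""
    return int(np.count_nonzero(a != b))

def similar(x, y, r):
    """One member of the family of pair classifiers: declare the pair
    similar iff the distance in the embedding space is at most r."""
    return hamming(H(x), H(y)) <= r

x = rng.normal(size=d)
y = rng.normal(size=d)
print(similar(x, x, 0))   # → True: an example is always similar to itself
```

Sweeping the threshold r traces out the set of classifiers described in the first bullet, and the same binary codes feed directly into an L1 nearest-neighbor search as in the second.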
In Chapter 3 we have presented a family of learning algorithms that construct an
embedding of the form described above:
Similarity Sensitive Coding (SSC) The algorithm¹ takes pairs labeled by similarity and produces a binary embedding space H, typically of very high dimension. The embedding is learned by independently collecting thresholded projections of the data. This algorithm improves the performance of example-based methods on some data sets; however, its utility is largely limited to cases in which the underlying similarity is already close, perhaps after some modifications, to the L1 distance. The algorithm has been successful in the articulated pose estimation domain, as described in Chapters 4 and 5.
Boosted SSC This algorithm² addresses the redundancy in SSC by collecting the embedding dimensions greedily, rather than independently. It also introduces a weighting on the dimensions of H. We have applied this algorithm to the tasks of pose and orientation estimation in an articulated tracking application, described in Chapter 5.
BoostPro This algorithm generalizes Boosted SSC in that the dimensions of the embedding are no longer limited to axis-parallel stumps. We have introduced a continuous approximation to the thresholded projection paradigm, under which gradient ascent optimization becomes possible. This algorithm further improves the performance of example-based methods on standard benchmark data. We also show its performance on articulated pose estimation in Chapter 4. Finally, we have used this algorithm to learn visual similarity of image patches, and have shown significant improvement over standard similarity measures used with two patch descriptors.
¹ Published in [105]; joint work with P. Viola.
² Published in [93]; joint work with L. Ren, J. Hodgins, H. Pfister and P. Viola.
Semi-supervised learning For each of these three algorithms we have presented a semi-supervised version, which requires only pairs known to be similar under S, in addition to a set of unlabeled individual examples in X.
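To illustrate the continuous approximation that enables gradient-based optimization in BoostPro (the sigmoid parametrization, sharpness setting, and toy objective below are hypothetical stand-ins; the exact formulation is given in Chapter 3), a single embedding dimension can be relaxed from a hard stump to a differentiable surrogate and improved by gradient ascent:

```python
import numpy as np

rng = np.random.default_rng(1)

def stump(x, p, t):
    """Hard thresholded projection: one bit of an SSC-style embedding."""
    return float(p @ x > t)

def soft_stump(x, p, t, alpha=5.0):
    """Smooth surrogate: a sigmoid of the projection, differentiable in
    p and t. The sharpness alpha is a hypothetical setting."""
    return 1.0 / (1.0 + np.exp(-alpha * (p @ x - t)))

def agreement(x, y, p, t, label):
    """Toy objective: how well one soft bit agrees with a similarity label
    (a similar pair should receive equal bits, a dissimilar pair unequal)."""
    hx, hy = soft_stump(x, p, t), soft_stump(y, p, t)
    same = hx * hy + (1.0 - hx) * (1.0 - hy)
    return same if label == 1 else 1.0 - same

# Numerical gradient ascent on p for a single similar pair.
d = 5
p, t = rng.normal(size=d), 0.0
x, y, label = rng.normal(size=d), rng.normal(size=d), 1

before = agreement(x, y, p, t, label)
eps, lr = 1e-5, 0.1
for _ in range(200):
    grad = np.zeros(d)
    for i in range(d):
        e = np.zeros(d)
        e[i] = eps
        grad[i] = (agreement(x, y, p + e, t, label)
                   - agreement(x, y, p - e, t, label)) / (2.0 * eps)
    p += lr * grad
after = agreement(x, y, p, t, label)
print(before, "->", after)   # the smooth objective typically increases
```

The hard stump is not differentiable in p and t, which is why SSC collects projections by search; the sigmoid relaxation makes the same quantity amenable to gradient ascent, which is the key step that BoostPro exploits.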
• This is, to our knowledge, the first attempt to improve the matching accuracy of standard (and widely used) descriptors by learning a similarity model specific to the invariance properties the matching is intended to capture.
• The fact that the learned similarity is measured by the L1 distance in the embedding space is very significant from a computational point of view, since in a large-scale recognition system we may need to probe databases containing millions of patches for similarity to the input set of patches. Our framework allows us to apply algorithms such as LSH and perform this search in sublinear time.
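As a concrete sketch of such a search, a bit-sampling index over the binary codes (a classical LSH family for the Hamming metric; the database size and table parameters below are hypothetical) only probes a few hash buckets per query instead of scanning the whole database:

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(2)

# Toy database of n binary codes of length k (stand-ins for embedded patches).
n, k = 10_000, 64
db = rng.integers(0, 2, size=(n, k), dtype=np.uint8)

# Bit-sampling LSH for Hamming distance: each table hashes a random subset
# of bit positions, so nearby codes collide with high probability.
L, m = 8, 12                      # number of tables, bits per table (hypothetical)
tables = []
for _ in range(L):
    idx = rng.choice(k, size=m, replace=False)
    buckets = defaultdict(list)
    for i, code in enumerate(db):
        buckets[code[idx].tobytes()].append(i)
    tables.append((idx, buckets))

def query(q):
    """Collect candidates from all tables, then rank by exact Hamming distance.
    Only the colliding buckets are examined, not the full database."""
    cand = set()
    for idx, buckets in tables:
        cand.update(buckets.get(q[idx].tobytes(), []))
    return sorted(cand, key=lambda i: int(np.count_nonzero(db[i] != q)))

# Query with a slightly perturbed copy of a database code; with high
# probability its true neighbor (index 0) is among the ranked candidates.
q = db[0].copy()
flip = rng.choice(k, size=3, replace=False)
q[flip] ^= 1
print(query(q)[:5])
```

Increasing the number of tables L raises the probability of retrieving true neighbors at the cost of more candidates to rank, which is the usual LSH trade-off.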
7.2 Directions for future work
Theoretical investigation An open theoretical question arising from the work presented here pertains to the class of similarity concepts that can be attained by the embedding algorithms presented in Chapter 3. By departing from the LSH framework for the similarity-sensitive framework introduced in Section 2.4.2, we extend the class of similarities from L1 to a more general family. It would be interesting to characterize the properties of this family, and the connections between the geometry of a similarity concept in X² and the extent to which an embedding learned by our algorithms can represent that concept.
idea for integration of the similarity learning approach developed in this thesis in a
multi-category classification architecture.
Evidence from neuroscience [39] suggests that the majority of cells in the visual pathway may be placed within a computational hierarchy. As the level in the hierarchy increases, which roughly corresponds to the retino-cortical direction (away from the retina),
Figure 7-1: A cartoon of the proposed hierarchical representation, showing the sharing
of the features and the two-stage learning architecture. A representation for a given
image patch may include any of the features from the lower (generic) layers and any
of the features from the higher, class-specific layers.