Sparse Concept Coding For Visual Analysis
Deng Cai
Hujun Bao
Xiaofei He
The State Key Lab of CAD&CG, College of Computer Science, Zhejiang University
388 Yu Hang Tang Rd., Hangzhou, Zhejiang, China 310058
{dengcai, bao, xiaofeihe}@cad.zju.edu.cn
Abstract
We consider the problem of image representation for visual analysis. When images are represented as vectors, the feature space is of very high dimensionality, which makes it difficult to apply statistical techniques for visual analysis. To tackle this problem, matrix factorization techniques such as Singular Value Decomposition (SVD) and Non-negative Matrix Factorization (NMF) have received an increasing amount of interest in recent years. Matrix factorization
is an unsupervised learning technique, which finds a basis
set capturing high-level semantics in the data and learns
coordinates in terms of the basis set. However, the representations obtained by these methods are dense and cannot capture the intrinsic geometric structure of the data. In this
paper, we propose a novel method, called Sparse Concept
Coding (SCC), for image representation and analysis. Inspired by recent developments in manifold learning
and sparse coding, SCC provides a sparse representation
which can capture the intrinsic geometric structure of the
image space. Extensive experimental results on image clustering have shown that the proposed approach provides a
better representation with respect to the semantic structure.
1. Introduction
Image representation is the fundamental problem in image processing. Researchers have long sought effective
and efficient representations of images. For a given image database, we may have thousands of distinct features. However, the degrees of freedom of each image could be far fewer. Instead of the original feature space, one might hope
to find a hidden semantic concept space to represent the
images. The dimensionality of this concept space will be
much smaller than the feature space. To achieve this goal,
matrix factorization based approaches have attracted considerable attention in the last decade [1, 5, 10].
Given an image data matrix X ∈ R^{m×n}, each column of X corresponding to an image, matrix factorization methods find two matrices U ∈ R^{m×k} and A ∈ R^{k×n} whose product can best approximate X. A classical instance is Latent Semantic Analysis (LSA), which obtains the optimal rank-k approximation from the Singular Value Decomposition (SVD) [8].
Another popular matrix factorization algorithm is Nonnegative Matrix Factorization (NMF) [10], which focuses
on the analysis of data matrices whose elements are nonnegative. NMF adds the non-negative constraint on both U
and A in the optimization problem (1). The non-negative
constraints on U and A only allow additive combinations
among different basis vectors. For this reason, it is believed
that NMF can learn a parts-based representation [10].
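As a toy illustration of the non-negativity constraint (random data and scikit-learn's NMF solver, not the original multiplicative updates of [10]):

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((100, 60))                 # m=100 features, n=60 "images" (columns)

k = 5
# scikit-learn expects samples in rows, so we factorize X^T = W H with
# W: n x k and H: k x m; then U = H^T (m x k) and A = W^T (k x n).
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(X.T)
U, A = model.components_.T, W.T

# Both factors are non-negative: only additive combinations of basis vectors.
assert (U >= 0).all() and (A >= 0).all()
err = np.linalg.norm(X - U @ A, "fro") / np.linalg.norm(X, "fro")
print(f"relative reconstruction error: {err:.3f}")
```

On real image data (pixels are non-negative), the columns of U tend toward the localized, parts-based features that make NMF popular.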
The representation matrix A learned in the above two
methods is usually dense. Since each basis vector (column
vector of U) can be regarded as a concept, the denseness
of A indicates that each image is a combination of all the concepts. This contradicts common knowledge, since most images involve only a few semantic concepts.
Sparse Coding (SC) [11, 16] is a recently popular matrix factorization method that addresses this issue. Sparse coding adds a sparsity constraint on A, more specifically, on
each column of A, in the optimization problem (1). In this
way, SC can learn a sparse representation. SC has several
advantages for data representation. First, it yields sparse
representations such that each data point is represented as
a linear combination of a small number of basis vectors.
Thus, the data points can be interpreted in a more elegant
way. Second, sparse representations naturally make for an
indexing scheme that would allow quick retrieval. Third,
the sparse representation can be overcomplete, which offers
a wide range of generating elements. Potentially, the wide
range allows more flexibility in signal representation and
more effectiveness at tasks like signal extraction and data
compression. Finally, there is considerable evidence that biological vision adopts sparse representations in early visual
areas [15]. The sparse coding approach is fundamentally
different from those sparse subspace learning methods, e.g.,
SparsePCA [25] and SparseLDA [14]. Instead of learning
a sparse A, the sparse subspace learning methods [14, 25]
learn a sparse U. The low dimensional representation matrix learned by these sparse subspace learning methods is
still dense.
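A minimal sketch of sparse coding with an overcomplete basis, using scikit-learn's `DictionaryLearning` on random data (illustrative only, not the exact formulation of [11, 16]):

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)
X = rng.standard_normal((80, 20))     # 80 signals with 20 features (rows = samples)

# Overcomplete dictionary: more atoms (30) than features (20); the L1
# penalty (alpha) drives most code coefficients to exactly zero.
dl = DictionaryLearning(n_components=30, alpha=1.0,
                        transform_algorithm="lasso_lars",
                        max_iter=200, random_state=0)
codes = dl.fit_transform(X)           # sparse representation (rows = samples)

print(f"fraction of zero coefficients: {np.mean(codes == 0):.2f}")
```

Note the contrast with sparse subspace learning: here the codes (A) are sparse while the dictionary (U) is dense, which is exactly the opposite of SparsePCA/SparseLDA.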
Our proposed method, Sparse Concept Coding (SCC), is a two-step approach. In the first step, SCC learns the basis by exploiting the intrinsic geometric structure of the data: by applying spectral analysis on the nearest-neighbor graph, SCC encodes the semantic structure in the basis vectors. In the second step, SCC uses the LASSO [9] to learn a sparse representation with respect to the learned basis for each image. As a result, SCC can have more discriminating power than traditional matrix factorization approaches, which only consider the Euclidean structure of the data.
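The two steps might be sketched as follows. The graph construction, the choice of spectral solver, and the ridge-regularized regression used to recover U from the embedding are our assumptions for illustration, not the paper's exact algorithm:

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.neighbors import kneighbors_graph
from sklearn.linear_model import Lasso

def scc(X, k, n_neighbors=5, ridge=0.1, lasso_alpha=0.05):
    """Two-step SCC sketch. X is m x n with one image per column."""
    # Step 1: basis learning from the nearest-neighbor graph.
    W = kneighbors_graph(X.T, n_neighbors, mode="connectivity")
    W = 0.5 * (W + W.T)                          # symmetrize the adjacency
    L = laplacian(W.toarray(), normed=True)      # normalized graph Laplacian
    # The k smoothest eigenvectors of L give an embedding Y (n x k)
    # that reflects the graph (semantic) structure.
    _, eigvecs = np.linalg.eigh(L)
    Y = eigvecs[:, :k]
    # Ridge-regularized regression of X onto Y yields the basis U (m x k):
    # min_U ||X - U Y^T||_F^2 + ridge * ||U||_F^2.
    U = X @ Y @ np.linalg.inv(Y.T @ Y + ridge * np.eye(k))
    # Step 2: LASSO sparse code for each image with respect to U.
    lasso = Lasso(alpha=lasso_alpha, fit_intercept=False, max_iter=5000)
    A = np.column_stack([lasso.fit(U, x).coef_ for x in X.T])
    return U, A

rng = np.random.default_rng(0)
X_demo = rng.random((30, 40))                    # 40 toy "images", 30 features
U_demo, A_demo = scc(X_demo, k=3)
```

The point of the sketch is the division of labor: the basis carries the graph structure, while sparsity is enforced only in the second, per-image coding step.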
The rest of the paper is organized as follows: in Section
2, we provide a brief review of matrix factorization. Our
Sparse Concept Coding method is introduced in Section 3.
The experimental results are presented in Section 4. Finally,
we provide the concluding remarks in Section 5.
2. Background
Given a data set with high dimensionality, matrix factorization is a common approach to compress the data
by finding a set of basis vectors and the representation
with respect to the basis for each data point. Let X =
[x_1, …, x_n] ∈ R^{m×n} be the data matrix. Matrix factorization can be mathematically defined as finding two matrices U ∈ R^{m×k} and A ∈ R^{k×n} whose product can best approximate X:

X ≈ UA.
Each column of U can be regarded as a basis vector which
captures the higher-level features in the data and each column of A is the k-dimensional representation of the original
inputs with respect to the new basis. In this sense, matrix factorization can also be regarded as a dimensionality
reduction method since it reduces the dimension from m to
k.
A common way to measure the quality of the approximation is the Frobenius norm of a matrix. Thus, matrix factorization can be defined as the following optimization problem:

min_{U,A} ‖X − UA‖²_F    (1)
Various matrix factorization algorithms add different constraints on the above optimization problem based on different goals.
Latent Semantic Analysis (LSA) is one of the most popular matrix factorization algorithms for image analysis [5]. It
adds a rank constraint (rank(UA) ≤ k) to the optimization problem in Eq. (1). LSA is fundamentally based on Singular Value Decomposition (SVD) [8]. Suppose the rank of X is r; the SVD of X is

X = UΣV^T,

where Σ = diag(σ_1, …, σ_r) and σ_1 ≥ σ_2 ≥ … ≥ σ_r > 0 are the singular values of X, U = [u_1, …, u_r] with the u_i called left singular vectors, and V = [v_1, …, v_r] with the v_i called right singular vectors. We have U^T U = V^T V = I. LSA uses the first k columns of U and V: it can be proven that U_k (the first k columns of U) and A = Σ_k V_k^T give the optimal solution of the optimization problem (1) under the constraint rank(UA) ≤ k [8].
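This rank-k optimality (the Eckart–Young theorem) is easy to check numerically with NumPy's SVD on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 30))

# Full SVD: X = U diag(s) Vt, singular values sorted descending.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 5
Uk = U[:, :k]                      # basis: first k left singular vectors
Ak = np.diag(s[:k]) @ Vt[:k, :]    # k-dimensional representation Sigma_k V_k^T

# The Frobenius error of the rank-k truncation equals the norm of the
# discarded singular values -- no rank-k factorization can do better.
err = np.linalg.norm(X - Uk @ Ak, "fro")
assert np.isclose(err, np.linalg.norm(s[k:]))
```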
3. Sparse Concept Coding

Our Sparse Concept Coding approach first learns the basis U from the nearest-neighbor graph of the data (the basis learning step) and then computes the sparse representation a_i of each image x_i with respect to U by solving

min_{a_i} ‖x_i − Ua_i‖² + λ|a_i|    (7)

where x_i and a_i are the i-th columns of X and A, respectively, and |a_i| denotes the L1-norm of a_i; this L1 regularization term is added to ensure the sparseness of a_i. In statistics, the above L1-regularized regression problem is called LASSO [9].

The optimization problem in Eq. (7) has the following equivalent formulation:

min_{a_i} ‖x_i − Ua_i‖²  s.t.  |a_i| ≤ γ    (8)
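As an illustration of Eq. (7) (not the paper's code), scikit-learn's `Lasso` recovers a sparse code a for one image x given a fixed, randomly generated basis U. Note that scikit-learn scales the squared-error term by 1/(2m), so its `alpha` corresponds to λ only up to that factor:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
m, k = 40, 10
U = rng.standard_normal((m, k))                  # fixed, already-learned basis
a_true = np.zeros(k)
a_true[[1, 4]] = [2.0, -1.5]                     # ground-truth 2-sparse code
x = U @ a_true + 0.01 * rng.standard_normal(m)   # one noisy "image"

# Eq. (7): min_a ||x - U a||^2 + lambda |a|_1, solved by coordinate descent.
lasso = Lasso(alpha=0.05, fit_intercept=False, max_iter=10000)
a = lasso.fit(U, x).coef_

print("nonzero entries:", np.flatnonzero(a))
```

Most entries of a come back exactly zero, which is the sparse per-image representation SCC uses.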
Traditional sparse coding approaches consider the factorization in Euclidean space, whereas SCC learns the basis by exploiting the intrinsic geometry of the data; the basis learned by SCC can therefore have more discriminating power. The overall computational cost of SCC is O(n²s + n²p + k³ + mk²).

4. Experimental Results

Previous studies show that matrix factorization and sparse coding approaches are very powerful for clustering [23], achieving performance similar to or better than most state-of-the-art clustering algorithms. In this section, we evaluate our SCC algorithm on clustering problems. After the matrix factorization, we have a low-dimensional representation of each image; the clustering is then performed in this low-dimensional space. Among the compared methods, Sparse Concept Coding (SCC) is the new approach proposed in this paper.

The COIL20 image library¹ is used in our experiment. COIL20 contains 32×32 gray-scale images of 20 objects viewed from varying angles; each object has 72 images.

The clustering result is evaluated by comparing the obtained label of each sample with the label provided by the data set. Two metrics, the accuracy (AC) and the normalized mutual information (NMI), are used to measure the clustering performance [3]. Given a data point x_i, let r_i and s_i be the obtained cluster label and the label provided by the corpus, respectively. The AC is defined as

AC = (Σ_{i=1}^{N} δ(s_i, map(r_i))) / N

where N is the total number of samples, δ(x, y) equals 1 if x = y and 0 otherwise, and map(r_i) maps each cluster label r_i to the equivalent label in the data set. The mutual information between two sets of clusters C and C′ is defined as

MI(C, C′) = Σ_{c_i ∈ C, c_j ∈ C′} p(c_i, c_j) log (p(c_i, c_j) / (p(c_i) p(c_j)))

and the normalized mutual information is

NMI(C, C′) = MI(C, C′) / max(H(C), H(C′))

where H(C) and H(C′) are the entropies of C and C′, respectively. It is easy to check that NMI(C, C′) ranges from 0 to 1: NMI = 1 if the two sets of clusters are identical, and NMI = 0 if the two sets are independent.

¹ http://www1.cs.columbia.edu/CAVE/software/softlib/coil-20.php
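A sketch of how the two metrics can be computed on toy labels (`clustering_accuracy` is an illustrative helper, not the paper's code; for NMI we use scikit-learn with the max-entropy normalization described above):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import normalized_mutual_info_score

def clustering_accuracy(true_labels, cluster_labels):
    """AC: best one-to-one mapping of cluster labels to true labels (Hungarian)."""
    true_ids = np.unique(true_labels)
    clus_ids = np.unique(cluster_labels)
    # cost[i, j] = -(number of points in cluster i carrying true label j)
    cost = np.zeros((len(clus_ids), len(true_ids)))
    for i, c in enumerate(clus_ids):
        for j, t in enumerate(true_ids):
            cost[i, j] = -np.sum((cluster_labels == c) & (true_labels == t))
    row, col = linear_sum_assignment(cost)      # optimal label permutation
    return -cost[row, col].sum() / len(true_labels)

s = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])       # ground-truth labels
r = np.array([1, 1, 0, 0, 0, 0, 2, 2, 2])       # obtained cluster labels

ac = clustering_accuracy(s, r)
nmi = normalized_mutual_info_score(s, r, average_method="max")
print(f"AC = {ac:.3f}, NMI = {nmi:.3f}")
```

With these toy labels the best mapping matches 8 of the 9 points, so AC = 8/9.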
[Figures and tables omitted: clustering accuracy (%) on COIL20 versus the number of basis vectors and versus the cardinality of the sparse codes, comparing SCC, LapSC, SC, NMF, SVD, and the Baseline over 4 to 20 clusters (with averages); SCC achieves the highest accuracy in most settings.]
5. Conclusions
In this paper, we have presented a novel matrix factorization method for image representation called Sparse Concept Coding (SCC). SCC is a two-step approach including
a basis learning stage and a sparse representation learning
stage. In the first stage, SCC learns the basis by exploiting
the intrinsic geometric structure of the data. By applying
the spectral analysis on the nearest neighbor graph, SCC
encodes the semantic structure in the basis vectors. In the
second stage, SCC uses the LASSO to learn a sparse representation with respect to the learned basis for each image.
As a result, SCC can have more discriminating power than
the traditional matrix factorization approaches (SVD, NMF
and SC) which only consider the Euclidean structure of the
data. Experimental results on image clustering show that
SCC provides a better representation with respect to the semantic structure.
Acknowledgments
This work was supported in part by National Natural
Science Foundation of China under Grants 60905001 and
60875044, National Basic Research Program of China (973
Program) under Grant 2011CB302206 and Fundamental
Research Funds for the Central Universities.
References
[1] S. Agarwal, A. Awan, and D. Roth. Learning to detect objects in images via a sparse, part-based representation. IEEE
Transactions on Pattern Analysis and Machine Intelligence,
pages 1475–1490, 2004.
[2] M. Belkin and P. Niyogi. Laplacian eigenmaps and spectral
techniques for embedding and clustering. In Advances in
Neural Information Processing Systems 14, pages 585–591, 2001.
[3] D. Cai, X. He, and J. Han. Document clustering using locality preserving indexing. IEEE Transactions on Knowledge
and Data Engineering, 17(12):1624–1637, December 2005.
[4] F. R. K. Chung. Spectral Graph Theory, volume 92 of Regional Conference Series in Mathematics. AMS, 1997.
[5] S. C. Deerwester, S. T. Dumais, T. K. Landauer, G. W. Furnas, and R. A. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6):391–407, 1990.
[6] B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani. Least
angle regression. Annals of Statistics, 32(2):407–499, 2004.
[7] S. Gao, I. Tsang, L. Chia, and P. Zhao. Local features are not lonely – Laplacian sparse coding for image classification. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR'10), pages 3555–3561, 2010.