Abstract
Spectral clustering is a flexible clustering algorithm that can produce high-quality clusters on small scale data sets, but it is limited applicable to large scale data sets because it needs O(n 3) computational operations to process a data set of n data points[1]. Based on the minimization of the increment of distortion, we tackle this problem by developing a novel efficient growing vector quantization method to preprocess a large scale data set, which can compress the original data set into a small set of representative data points in one scan of the original data set. Then we apply spectral clustering algorithm to the small set. Experiments on real data sets show that our method provides fast and accurate clustering results.
This work is supported by National Natural Science Foundation of China under Grant No. 61003311, Jiangsu Provincial Key Laboratory of Network and Information Security Grant No. BM2003201-201006, Anhui Provincial Natural Science Research Key Project of China under Grant No. KJ2011A040, Youth Foundation of Anhui University of Technology under Grant No. QZ201316.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Yan, D., Huang, L., Jordan, M.I.: Fast approximate spectral clustering. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 907–916. ACM (2009)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: Dietterich, T.G., Becker, S., Ghahramani, Z. (eds.) Proceedings of the Advances in Neural Information Processing Systems, pp. 849–856. MIT Press, Cambridge (2002)
Equitz, W.H.: A new vector quantization clustering algorithm. IEEE Transactions on Acoustics, Speech and Signal Processing 37(10), 1568–1575 (1989)
Vasuki, A., Vanathi, P.T.: A review of vector quantization techniques. IEEE Potentials 25(4), 39–47 (2006)
Lewis, D.D., Yang, Y., Rose, T.G., et al.: Rcv1: A new benchmark collection for text categorization research. The Journal of Machine Learning Research 5, 361–397 (2004)
Chen, W.Y., Song, Y., Bai, H., et al.: Parallel spectral clustering in distributed systems. IEEE Transactions on Pattern Analysis and Machine Intelligence 33(3), 568–586 (2011)
Zelnik-Manor, L., Perona, P.: Self-tuning spectral clustering. In: Advances in Neural Information Processing Systems, pp. 1601–1608 (2004)
Von Luxburg, U.: A tutorial on spectral clustering. Statistics and Computing 17(4), 395–416 (2007)
Wauthier, F.L., Jojic, N., Jordan, M.I.: Active spectral clustering via iterative uncertainty reduction. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1339–1347. ACM (2012)
Ochs, P., Brox, T.: Higher order motion models and spectral clustering. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 614–621. IEEE (2012)
Pagès, G., Wilbertz, B.: Intrinsic stationarity for vector quantization: Foundation of dual quantization. SIAM Journal on Numerical Analysis 50(2), 747–780 (2012)
Kästner, M., Hammer, B., Biehl, M., et al.: Functional relevance learning in generalized learning vector quantization. Neurocomputing 90, 85–95 (2012)
Dasgupta, S., Freund, Y.: Random projection trees for vector quantization. IEEE Transactions on Information Theory 55(7), 3229–3242 (2009)
Qian, P., Chung, F.L., Wang, S., et al.: Fast Graph-Based Relaxed Clustering for Large Data Sets Using Minimal Enclosing Ball. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 42(3), 672–687 (2012)
Deng, Z., Choi, K.S., Chung, F.L., et al.: Enhanced soft subspace clustering integrating within-cluster and between-cluster information. Pattern Recognition 43(3), 767–781 (2010)
Wang, S., Chung, K.F.-L., Deng, Z., Hu, D., Wu, X.: Robust maximum entropy clustering algorithm with its labeling for outliers. Soft Comput. 10(7), 555–563 (2006)
Wang, S., Chung, K.F.-L., Deng, Z., Hu, D.: Robust fuzzy clustering neural network based on epsilon-insensitive loss function. Appl. Soft Comput. 7(2), 577–584 (2007)
Lee, C.H., Zaïane, O.R., Park, H.H., et al.: Clustering high dimensional data: A graph-based relaxed optimization approach. Information Sciences 178(23), 4501–4511 (2008)
Cao, J., Wu, Z., Wu, J., et al.: SAIL: Summation-bAsed Incremental Learning for Information-Theoretic Text Clustering (2013)
Cao, J., Wu, Z., Wu, J., et al.: Towards information-theoretic K-means clustering for image indexing. Signal Processing (2012)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, X., Zheng, X., Qin, F., Zhao, B. (2013). A Fast Spectral Clustering Method Based on Growing Vector Quantization for Large Data Sets. In: Motoda, H., Wu, Z., Cao, L., Zaiane, O., Yao, M., Wang, W. (eds) Advanced Data Mining and Applications. ADMA 2013. Lecture Notes in Computer Science(), vol 8347. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-53917-6_3
Download citation
DOI: https://doi.org/10.1007/978-3-642-53917-6_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53916-9
Online ISBN: 978-3-642-53917-6
eBook Packages: Computer ScienceComputer Science (R0)