Abstract
This paper analyzes the main steps of spectral co-clustering of documents and words, identifies the cause of its sensitivity to input order, and presents a modified spectral co-clustering method based on fuzzy K-harmonic means. The method consists of two steps. The first step constructs a Laplacian matrix that is insensitive to input order. The second step uses the fuzzy K-harmonic means algorithm, instead of the K-means algorithm, to obtain the clustering results. The fuzzy K-harmonic means algorithm uses a fuzzy weighted distance when calculating the distance between each data point and the cluster centers. Experiments show that the proposed method is not only insensitive to input order but also improves the accuracy and robustness of the clustering results.
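As a rough illustration of the second step, the sketch below implements the standard K-harmonic means update of Zhang et al. in plain Python. It is not the fuzzy-weighted variant proposed in this paper, whose exact weighting is not given in the abstract. The function name `k_harmonic_means`, the exponent `p = 3.5`, the iteration count, and the toy points are all assumptions made for illustration. In the full method the input rows would presumably be the spectral embedding of documents and words rather than raw 2-D points.

```python
import math


def k_harmonic_means(points, centers, p=3.5, iters=100, eps=1e-9):
    """Standard K-harmonic means (illustrative; NOT the paper's fuzzy variant).

    points  : list of equal-length coordinate tuples
    centers : list of initial centre coordinates (assumed to be given)
    p       : distance exponent (assumed value, chosen for illustration)
    Returns (centers, labels), where labels[i] is the index of the nearest centre.
    """
    centers = [list(c) for c in centers]
    k = len(centers)
    for _ in range(iters):
        # Accumulators for the weighted-mean update of each centre.
        num = [[0.0] * len(c) for c in centers]
        den = [0.0] * k
        for x in points:
            # Distances to every centre, floored to avoid division by zero.
            d = [max(math.dist(x, c), eps) for c in centers]
            a = [dj ** (-p - 2) for dj in d]
            sum_a = sum(a)
            sum_b = sum(dj ** (-p) for dj in d)
            # Per-point weight: larger for points far from all centres.
            weight = sum_a / (sum_b * sum_b)
            for j in range(k):
                # Soft membership of x in cluster j, scaled by the point weight.
                coef = (a[j] / sum_a) * weight
                den[j] += coef
                for t, xt in enumerate(x):
                    num[j][t] += coef * xt
        centers = [[n / den[j] for n in num[j]] for j in range(k)]
    # Harden the soft assignment: each point goes to its nearest centre.
    labels = [
        min(range(k), key=lambda j: math.dist(x, centers[j])) for x in points
    ]
    return centers, labels


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of four points each.
    toy = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11)]
    final_centers, final_labels = k_harmonic_means(toy, [(2, 2), (8, 9)])
    print(final_centers, final_labels)
```

Unlike K-means, every point contributes to every centre through a soft membership, which is the usual explanation for why K-harmonic means depends less on initialization.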

Acknowledgment
The authors thank the anonymous reviewers for their insightful comments, which helped improve this paper. This work is supported by the National Natural Science Funds (Nos. 61175053, 61073133) and the Innovative Team and Key Scientific Research Projects of the Ministry of Education (No. 2011ZD010).
Cite this article
Liu, N., Chen, F. & Lu, M. Spectral co-clustering documents and words using fuzzy K-harmonic means. Int. J. Mach. Learn. & Cyber. 4, 75–83 (2013). https://doi.org/10.1007/s13042-012-0077-9
DOI: https://doi.org/10.1007/s13042-012-0077-9