Skip to main content

Advertisement

Log in

Non-negative matrix factorization for semi-supervised data clustering

  • Regular Paper
  • Published:
Knowledge and Information Systems Aims and scope Submit manuscript

Abstract

Traditional clustering algorithms are inapplicable to many real-world problems where limited knowledge from domain experts is available. Incorporating the domain knowledge can guide a clustering algorithm, consequently improving the quality of clustering. In this paper, we propose SS-NMF: a semi-supervised non-negative matrix factorization framework for data clustering. In SS-NMF, users are able to provide supervision for clustering in terms of pairwise constraints on a few data objects specifying whether they “must” or “cannot” be clustered together. Through an iterative algorithm, we perform symmetric tri-factorization of the data similarity matrix to infer the clusters. Theoretically, we show the correctness and convergence of SS-NMF. Moveover, we show that SS-NMF provides a general framework for semi-supervised clustering. Existing approaches can be considered as special cases of it. Through extensive experiments conducted on publicly available datasets, we demonstrate the superior performance of SS-NMF for clustering.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Subscribe and save

Springer+ Basic
€32.70 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Price includes VAT (France)

Instant access to the full article PDF.

Similar content being viewed by others

References

  • Bansal N, Blum A, Chawla S (2002) Correlation clustering. Proceedings of the 43rd symposium on foundations of computer science, pp 238–247

  • Bar-Hillel A, Hertz T, Shental N, Weinshall D (2003) Learning distance functions using equivalence relations. Proceedings of the 20th international conference on machine learning, pp 11–18

  • Basu S, Banerjee A, Mooney RJ (2002) Semi-supervised clustering by seeding. Proceedings of the 19th international conference on machine learning, pp 27–34

  • Basu S, Bilenko M, Mooney RJ (2004) A probabilistic framework for semi-supervised clustering. Proceedings of the 10th ACM SIGKDD international conference on knowledge discovery and data mining, pp 59–68

  • Blum A, Mitchell TM (1998) Combining labeled and unlabeled data with co-training. Annual workshop on computational learning theory, Proceedings of the 11th annual conference on Computational learning theory, pp 92–100

  • Boley D (1998). Principal direction divisive partitioning. Data Mining Knowledge Discovery 2(4): 325–344

    Article  Google Scholar 

  • Charikar M, Guruswami V, Wirth A (2003) Clustering with qualitative information. Proceedings of the 44th IEEE symposium on foundations of computer science, pp 524–533

  • Chung FRK (1997) Spectral Graph Theory, American Mathematical Society

  • Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the em algorithm. J R Stat Soc B pp 1–38

  • Ding C, He X, Simon HD (2005) On the equivalence of nonnegative matrix factorization and spectral clustering. Proceedings of SIAM international conference on data mining, pp 606–610

  • Ding C, Li T, Peng W, Park H (2006) Orthogonal nonnegative matrix tri-factorizations for clustering. Proceedings of the 12th ACM SIGKDD international conference on knowledge discovery and data mining, ACM Press, New York, pp 126–135

  • Dubnov S, EI-Yaniv R, Gdalyahu Y, Schneidman E, Tishby N and Yona G (2002). A new nonparametric pairwise clustering algorithm based on iterative estimation of distance profiles. Machine Learning 47(1): 35–61

    Article  MATH  Google Scholar 

  • Duda RO, Hart PE and Stork DG (2000). Pattern Classification. Wiley, New York

    Google Scholar 

  • Fiedler M (1973). Algebraic connectivity of graphs. Czechoslovak Math J 23: 298–305

    MathSciNet  Google Scholar 

  • Fiedler M (1975a). Eigenvectors of acyclic matrices. Czechoslovak Math J 25: 607–618

    MathSciNet  Google Scholar 

  • Fiedler M (1975b). A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory. Czechoslovak Math J 25: 619–633

    MathSciNet  Google Scholar 

  • Fisher D (1987). Knowledge acquisition via incremental conceptual clustering. Machine Learning 2: 139–172

    Google Scholar 

  • Godbole S, Harpale A, Sarawagi S, Chakrabarti S (2004) Document classification through interactive supervision of document and term labels. Proceedings of the 8th European conference on principles and practice of knowledge discovery in databases, pp 185–196

  • Gondek D and Hofmann T (2007). Non-redundant data clustering. Knowledge Information Systems 12(1): 1–24

    Article  Google Scholar 

  • Hagen L and Kahng AB (1992). New spectral methods for ratio cut partitioning and clustering. IEEE Trans CAD Integrated Circuits Systems 11(9): 1074–1085

    Article  Google Scholar 

  • Han E-H, Karypis G (2000) Centroid-based document classification: Analysis and experimental results. Proceedings of the 4th European conference on principles of data mining and knowledge discovery, pp 424–431

  • Hersh W, Buckley C, Leone T, Hickam D (1994) Ohsumed: an interactive retrival evaluation and new large test collection for research. Proceedings of 17th ACM SIGIR conference on research and development in information retrieval, pp 192–201

  • Hinneburg A and Keim D (2003). A general approach to clustering in large databases with noise. Knowledge Information Systems 5(4): 387–415

    Article  Google Scholar 

  • Hinneburg A, Keim DA (1998) An efficient approach to clustering in large multimedia databases with noise. Proceedings of the 4th international conference on knowledge discovery and data mining, pp 58–65

  • Hotho A, Staab S, Stumme G (2003) Text clustering based on background knowledge, Technical report 425. University of Karlsruhe, Institute AIFB, Karlsruhe

  • Huang Y, Mitchell TM (2006) Text clustering with extended user feedback. Proceedings of the 29th ACM SIGIR conference on research and development in information retrieval, pp 413–420

  • Jain AK, Murty MN and Flynn PJ (1999). Data clustering: a review. ACM computing surveys 31(3): 264–323

    Article  Google Scholar 

  • Ji X, Xu W (2006) Document clustering with prior knowledge. Proceedings of the 29th ACM SIGIR conference on research and development in information retrieval, pp 405–412

  • Joachims T (1999) Transductive inference for text classification using support vector machines. Proceedings of the 16th international conference on machine learning, pp 200–209

  • Jones R, McCallum A, Nigam K, Riloff E (1999) Bootstrapping for text learning tasks. Workshop on text mining: foundations, techniques and applications, proceedings of international joint conference on artifical intelligence, pp 52–63

  • Kamvar SD, Klein D, Manning CD (2003) Spectral learning. Proceedings of the 18th international joint conference on artificial intelligence, pp 561–566

  • Kaufman L and Rousseeuw P (1990). Finding Groups in Data: an introduction to cluster analysis. Wiley, New York

    Google Scholar 

  • Kim H and Lee S (2004). An intelligent information system for organizing online text documents. Knowledge Information Systems 6(2): 125–149

    Google Scholar 

  • Klein D, Kamvar S, Manning C (2002) From instance-level constraints to space-level constraints: Making the most of prior knowledge in data clustering. Proceedings of the 19th international conference on machine learning, pp 307–314

  • Koga H, Ishibashi T and Watanabe T (2007). Fast agglomerative hierarchical clustering algorithm using locality-sensitive hashing. Knowledge Information Systems 12(1): 25–53

    Article  Google Scholar 

  • Kulis B, Basu S, Dhillon I, Mooney R (2005) Semi-supervised graph clustering: a kernel approach. Proceedings of the 22nd international conference on machine learning, pp 457–464

  • Lee DD and Seung HS (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401: 788–791

    Article  Google Scholar 

  • Lee D, Seung H (2001) Algorithms for non-negative matrix factorization. Proceedings of annual conference on neural information processing systems 13, pp 556–562

  • Lewis DD (1999) Reuters-21578 text categorization test collection distribution 1.0, http://www.research.att/lewis

  • Li T, Ding C (2006) The relationships among various nonnegative matrix factorization methods for clustering. Proceedings of the 6th IEEE international conference on data mining, pp 362–371

  • Liu B, Li X, Lee WS, Yu PS (2004) Text classification by labeling words. Proceedings of AAAI conference on artificial intelligence, pp 425–430

  • Long B, Zhang Z, Yu PS (2005) Co-clustering by block value decomposition. Proceedings of the 11th ACM SIGKDD international conference on knowledge discovery and data mining, pp 635–640

  • Long B, Zhang Z, Yu PS (2007) Relational clustering by symmetric convex coding. Proceedings the 24th annual international conference on machine learning, pp 569–576

  • MacQueen J (1967) Some methods for classsification and analysis of multivariate observations. Proceedings of 5th Berkeley symposium on mathematical statistics and probability, pp 281–297

  • Newman C, Hettich S, Merz C (1998) UCI repository of machine learning databases

  • Nigam K, McCallum AK, Thrun S, Mitchell TM (1998) Learning to classify text from labeled and unlabeled documents. Proceedings of AAAI conference on artificial intelligence, pp 792–799

  • Raghavan H, Madani O, Jones R (2005) Interactive feature selection. Proceedings of international joint conference on artificial intelligence, pp 841–846

  • Sheikholesami G, Chatterjee S, Zhang A (1998) Wavecluster: a multi-resolution clustering approach for very large spatial databases. Proceedings of the international conference on very large databases, pp 428–439

  • Shi J and Malik J (2000). Normalized cuts and image segmentation. IEEE Trans on PAMI 22(8): 888–905

    Google Scholar 

  • TREC (n.d.) Text retrieval conference, http://trec.nist.gov

  • Wagstaff K, Cardie C, Rogers S, Schroedl S (2001) Constrained k-means clustering with background knowledge. Proceedings of the 18th international conference on machine learning, pp 577–584

  • Xing EP, Ng AY, Jordan M, Russell S (2002) Distance metric learning, with application to clustering with side-information. Advances in neural information processing systems 15, pp 502–512

  • Xu W, Liu X, Gong Y (2003) Document clustering based on non-negative matrix factorization. Proceedings of the 26th ACM SIGIR conference on research and development in information retrieval, pp 267–273

  • Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. Proceedings of the 14th international conference on machine learning, pp 412–420

  • Zhang T, Ramakrishnan R, Livny M (1996) Birch: An efficient data clustering method for very large databases. Proceedings of the ACM SIGMOD international conference on management of data, pp 103–114

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Ming Dong.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Chen, Y., Rege, M., Dong, M. et al. Non-negative matrix factorization for semi-supervised data clustering. Knowl Inf Syst 17, 355–379 (2008). https://doi.org/10.1007/s10115-008-0134-6

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s10115-008-0134-6

Keywords

Navigation