skip to main content
10.1145/1281192.1281262acmconferencesArticle/Chapter ViewAbstractPublication PageskddConference Proceedingsconference-collections
Article

A spectral clustering approach to optimally combining numericalvectors with a modular network

Published: 12 August 2007 Publication History

Abstract

We address the issue of clustering numerical vectors with a network. The problem setting is basically equivalent to constrained clustering by Wagstaff and Cardie and semi-supervised clustering by Basu et al., but our focus is more on the optimal combination of two heterogeneous data sources. An application of this setting is web pages which can be numerically vectorized by their contents, e.g. term frequencies, and which are hyperlinked to each other, showing a network. Another typical application is genes whose behavior can be numerically measured and a gene network can be given from another data source.We first define a new graph clustering measure which we call normalized network modularity, by balancing the cluster size of the original modularity. We then propose a new clustering method which integrates the cost of clustering numerical vectors with the cost of maximizing the normalized network modularity into a spectral relaxation problem. Our learning algorithm is based on spectral clustering which makes our issue an eigenvalue problem and uses k-means for final cluster assignments. A significant advantage of our method is that we can optimize the weight parameter for balancing the two costs from the given data by choosing the minimum total cost. We evaluated the performance of our proposed method using a variety of datasets including synthetic data as well as real-world data from molecular biology. Experimental results showed that our method is effective enough to have good results for clustering by numerical vectors and a network.

References

[1]
A.-L. Barabási and A. Reka. Emergence of scaling in random networks. Science, 286: 509--512, 1999.
[2]
S. Basu, M. Bilenko, and R. J. Mooney. A probabilistic framework for semi-supervised clustering. In KDD, pages 59--68, August 2004.
[3]
I. S. Dhillon, Y. Guan, and B. Kulis. Kernel k-means, spectral clustering and normalized cuts. In KDD, pages 551--556, 2004.
[4]
I. S. Dhillon and S. Sra. Modeling data using directional distributions. Technical Report TR--06--03, University of Texas, Dept. of Computer Sciences, 2003.
[5]
R. Edgar, M. Domrachev, and A. E. Lash. Gene expression omnibus: {NCBI gene expression and hybridization array data repository. NAR, 30(1): 207--210, 2002.
[6]
R. Guimera and L. A. Nunes Amaral. Functional cartography of complex metabolic networks. Nature, 433(7028): 895--900, 2005.
[7]
R. Guimera, M. Sales-Pardo, and L. A. N. Amaral. Modularity from fluctuations in random graphs and complex networks. Phys. Rev. E, 70: 025101, 2004.
[8]
L. Hagen and A. B. Kahng. New spectral methods for ratio cut partitioning and clustering. IEEE TCAD, 11: 1074--1085, 1992.
[9]
T. R. Hughes et al. Functional discovery via a compendium of expression profiles. Cell, 102(1): 109--126, 2000.
[10]
M. Kanehisa et al. From genomics to chemical genomics: new developments in KEGG. NAR, 34: D354--357, 2006.
[11]
B. Kulis, S. Basu, I. Dhillon, and R. J. Mooney. Semi-supervised graph clustering: A kernel approach. In ICML, pages 457--464, 2005.
[12]
K. V. Mardia and P. E. Jupp. Directional Statistics. John Wiley & Sons, second edition, 2000.
[13]
M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E, 69: 026113, 2004.
[14]
E. Ravasz et al. Hierarchical organization of modularity in metabolic networks. Science, 297(5589): 1551--1555, 2002.
[15]
J. Shi and J. Malik. Normalized cuts and image segmentation. IEEE PAMI, 22(8): 888--905, 2000.
[16]
M. Shiga, I. Takigawa and H. Mamitsuka. Annotating gene function by combining expression data with a modular gene network. To appear in ISMB, 2007.
[17]
C. Song, S. Havlin, and H. A. Makse. Self-similarity of complex networks. Nature, 433: 392--395, 2005.
[18]
A. Strehl and J. Ghosh. Relationship-based clustering and visualization for high-dimensional data mining. INFORMS Journal on Computing, 15(2):208--230, 2003.
[19]
O. Troyanskaya et al. Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6):520--525, 2001.
[20]
K. Wagstaff and C. Cardie. Clustering with instance-level constraints. In ICML, pages 1103--1110, 2000.
[21]
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393: 440--442, 1998.
[22]
S. White and P. Smyth. A spectral clustering approach to finding communities in graphs. In SDM, pages 76--84, 2005.
[23]
L. F. Wu et al. Large-scale prediction of saccharomyces cerevisiae gene function using overlapping transcriptional clusters. Nat. Genet., 31(3):255--265, 2002.
[24]
S. Zhong and J. Ghosh. A unified framework for model--based clustering. JMLR, 4:1001--1037, 2003.
[25]
S. Zhong and J. Ghosh. Generative model-based document clustering: A comparative study. KAIS, 8(3):374--384, 2005.
[26]
X. Zhou, M. C. Kao, and W. H. Wong. Transitive functional annotation by shortest-path analysis of gene expression data. PNAS, 99(20):12783--12788, 2002.

Cited By

View all
  • (2021)Community detection in dynamic networks: a comprehensive and comparative review using external and internal criteriaInternational Journal of System Assurance Engineering and Management10.1007/s13198-020-01048-w12:2(217-230)Online publication date: 8-Jan-2021
  • (2020)A Feasible Community Detection Algorithm for Multilayer NetworksSymmetry10.3390/sym1202022312:2(223)Online publication date: 2-Feb-2020
  • (2020)Finding Best Matching Community for Common Nodes in Mobile Social NetworksWireless Personal Communications10.1007/s11277-020-07508-7Online publication date: 10-Jun-2020
  • Show More Cited By

Index Terms

  1. A spectral clustering approach to optimally combining numericalvectors with a modular network

      Recommendations

      Comments

      Information & Contributors

      Information

      Published In

      cover image ACM Conferences
      KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2007
      1080 pages
      ISBN:9781595936097
      DOI:10.1145/1281192
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Sponsors

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 12 August 2007

      Permissions

      Request permissions for this article.

      Check for updates

      Author Tags

      1. eigenvalue problem
      2. heterogeneous data sources
      3. k-means
      4. network modularity
      5. spectral clustering

      Qualifiers

      • Article

      Conference

      KDD07

      Acceptance Rates

      KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;
      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

      Upcoming Conference

      KDD '25

      Contributors

      Other Metrics

      Bibliometrics & Citations

      Bibliometrics

      Article Metrics

      • Downloads (Last 12 months)17
      • Downloads (Last 6 weeks)1
      Reflects downloads up to 18 Feb 2025

      Other Metrics

      Citations

      Cited By

      View all
      • (2021)Community detection in dynamic networks: a comprehensive and comparative review using external and internal criteriaInternational Journal of System Assurance Engineering and Management10.1007/s13198-020-01048-w12:2(217-230)Online publication date: 8-Jan-2021
      • (2020)A Feasible Community Detection Algorithm for Multilayer NetworksSymmetry10.3390/sym1202022312:2(223)Online publication date: 2-Feb-2020
      • (2020)Finding Best Matching Community for Common Nodes in Mobile Social NetworksWireless Personal Communications10.1007/s11277-020-07508-7Online publication date: 10-Jun-2020
      • (2018)Micro-blog user community discovery using generalized SimRank edge weighting methodPLOS ONE10.1371/journal.pone.019644713:5(e0196447)Online publication date: 7-May-2018
      • (2018)Tibetan Weibo User Group Division Based on User Behaviors for Analyzing Health ProblemsIEEE Access10.1109/ACCESS.2018.28227676(19441-19450)Online publication date: 2018
      • (2017)On Spectral Analysis of Signed and Dispute Graphs: Application to Community StructureIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2017.268480929:7(1480-1493)Online publication date: 1-Jul-2017
      • (2017)AnySCAN: An Efficient Anytime Framework with Active Learning for Large-Scale Network Clustering2017 IEEE International Conference on Data Mining (ICDM)10.1109/ICDM.2017.76(665-674)Online publication date: Nov-2017
      • (2017)MiMAGKnowledge and Information Systems10.1007/s10115-016-0949-550:2(417-446)Online publication date: 1-Feb-2017
      • (2017)Attributed Graph Clustering with Unimodal Normalized CutMachine Learning and Knowledge Discovery in Databases10.1007/978-3-319-71249-9_36(601-616)Online publication date: 30-Dec-2017
      • (2016)A novel probabilistic clustering model for heterogeneous networksMachine Language10.1007/s10994-016-5544-1104:1(1-24)Online publication date: 1-Jul-2016
      • Show More Cited By

      View Options

      Login options

      View options

      PDF

      View or Download as a PDF file.

      PDF

      eReader

      View online with eReader.

      eReader

      Figures

      Tables

      Media

      Share

      Share

      Share this Publication link

      Share on social media