Abstract
Clustering ensemble can be formulated as a mathematical optimization problem, and the genetic algorithm is a widely used tool for solving such problems. However, existing research on clustering ensembles based on the genetic algorithm model has focused mainly on unsupervised approaches and has been limited by parameters such as the crossover probability and the mutation probability. This paper presents a semi-supervised clustering ensemble based on the genetic algorithm model. The approach uses pairwise constraint information to strengthen the crossover and mutation processes, which improves overall algorithm performance. To validate its effectiveness, extensive comparative experiments were conducted on nine diverse datasets. The results demonstrate the superiority of the proposed algorithm in terms of clustering accuracy and robustness, making it a promising solution for real-world clustering problems.
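The abstract does not spell out how the pairwise constraints enter the genetic operators, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes constraints are given as must-link and cannot-link index pairs, scores a candidate partition by the fraction of constraints it satisfies, and applies a simple constraint-guided mutation that repairs violated must-link pairs. All names (`constraint_satisfaction`, `repair_must_links`, the toy data) are hypothetical.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def constraint_satisfaction(
    labels: Sequence[int],
    must_link: Sequence[Pair],
    cannot_link: Sequence[Pair],
) -> float:
    """Fraction of pairwise constraints that a candidate partition satisfies."""
    total = len(must_link) + len(cannot_link)
    if total == 0:
        return 1.0  # no constraints means nothing can be violated
    satisfied = sum(labels[i] == labels[j] for i, j in must_link)
    satisfied += sum(labels[i] != labels[j] for i, j in cannot_link)
    return satisfied / total


def repair_must_links(labels: Sequence[int], must_link: Sequence[Pair]) -> List[int]:
    """Assumed mutation step: for each violated must-link pair (i, j),
    copy the cluster label of point i onto point j."""
    repaired = list(labels)
    for i, j in must_link:
        if repaired[i] != repaired[j]:
            repaired[j] = repaired[i]
    return repaired


# Toy candidate partition of five points (one chromosome in the population).
candidate = [0, 0, 1, 1, 2]
must_link = [(0, 1), (2, 4)]
cannot_link = [(0, 2), (1, 3)]

score_before = constraint_satisfaction(candidate, must_link, cannot_link)
mutated = repair_must_links(candidate, must_link)
score_after = constraint_satisfaction(mutated, must_link, cannot_link)
print(score_before, mutated, score_after)
```

In a full genetic algorithm, a score like this could be combined with an ensemble-consensus objective to form the fitness, and the repair step could replace or supplement random mutation; both choices are assumptions here rather than details taken from the paper.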
Acknowledgements
This work was supported by the National Natural Science Foundation of China (11961010, 61967004).
Ethics declarations
Conflicts of interest
The authors declare no competing interests.
Ethical Approval
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
About this article
Cite this article
Bi, S., Li, X. Semi-supervised clustering ensemble based on genetic algorithm model. Multimed Tools Appl 83, 55851–55865 (2024). https://doi.org/10.1007/s11042-023-17662-2
DOI: https://doi.org/10.1007/s11042-023-17662-2