Abstract
Clustering an evolving data stream requires an algorithm that can adapt the discovered clustering model to changes in the characteristics of the data.
In this paper we propose an algorithm for exclusive and complete clustering of data streams. We explain the concept of completeness of a stream clustering algorithm and show that the proposed algorithm guarantees the detection of a cluster if one exists. The algorithm has an on-line component with constant-order time complexity and hence delivers predictable performance for stream processing. The algorithm is capable of detecting outliers and changes in the data distribution. Clusters are formed by growing dense regions in the data space while honouring a recency constraint. The algorithm delivers a complete description of each cluster, facilitating semantic interpretation.
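The general idea of an on-line summary plus dense-region growing can be illustrated with a small grid-based sketch. This is not the authors' algorithm: it is a generic illustration under stated assumptions. The class `GridStreamClusterer` and its parameters `cell_width`, `density_threshold` and `decay_rate` are hypothetical, and recency is modelled here (by assumption) as exponential decay of each cell's point count. Each arriving point costs O(d) work for fixed dimensionality d, clusters are maximal groups of face-adjacent dense cells (so each dense cell belongs to exactly one cluster), and non-dense cells are reported as outliers.

```python
import math


class GridStreamClusterer:
    """Hypothetical grid/density stream clusterer, for illustration only.

    Each grid cell keeps an exponentially decayed point count (assumed recency
    model); clusters are maximal groups of face-adjacent dense cells, and
    non-dense cells are reported as outliers.
    """

    def __init__(self, cell_width, density_threshold, decay_rate):
        self.cell_width = cell_width                # side length of a grid cell (assumed uniform)
        self.density_threshold = density_threshold  # minimum decayed weight for a "dense" cell
        self.decay_rate = decay_rate                # assumed: weight *= exp(-decay_rate * elapsed_time)
        self.cells = {}                             # cell key -> (weight, time of last update)

    def _cell_of(self, point):
        # Map a d-dimensional point to the integer coordinates of its grid cell.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _weight_at(self, key, now):
        # Decay the stored weight of a cell forward to time `now`.
        weight, last = self.cells[key]
        return weight * math.exp(-self.decay_rate * (now - last))

    def insert(self, point, t):
        """On-line step: O(d) work per point (one average-case dict lookup)."""
        key = self._cell_of(point)
        weight = self._weight_at(key, t) if key in self.cells else 0.0
        self.cells[key] = (weight + 1.0, t)

    def clusters(self, now):
        """Off-line step: grow dense regions by flood fill over adjacent dense cells."""
        dense = {k for k in self.cells if self._weight_at(k, now) >= self.density_threshold}
        seen, found = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            stack, region = [start], []
            while stack:
                cell = stack.pop()
                region.append(cell)
                # Visit the 2*d face-adjacent neighbours of this cell.
                for axis in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            stack.append(nb)
            found.append(sorted(region))
        # Cells that are not dense at time `now` are treated as outliers.
        outliers = sorted(k for k in self.cells if k not in dense)
        return sorted(found), outliers
```

For example, with `cell_width=1.0`, `density_threshold=2.0` and `decay_rate=0.0`, two points in cell `(0, 0)` and two in cell `(1, 0)` form one cluster of two adjacent cells, while a single point in cell `(9, 0)` is reported as an outlier. Because each cluster is returned as its set of cells, its extent along every dimension can be read off directly, which loosely mirrors the "complete description" mentioned above. A practical version would also prune cells whose decayed weight becomes negligible, so that memory stays bounded.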
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bhatnagar, V., Kaur, S. (2007). Exclusive and Complete Clustering of Streams. In: Wagner, R., Revell, N., Pernul, G. (eds) Database and Expert Systems Applications. DEXA 2007. Lecture Notes in Computer Science, vol 4653. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74469-6_61
DOI: https://doi.org/10.1007/978-3-540-74469-6_61
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74467-2
Online ISBN: 978-3-540-74469-6
eBook Packages: Computer Science, Computer Science (R0)