Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences

Manthey, Bodo; Röglin, Heiko

doi:10.1007/978-3-642-10631-6_103

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5878)

Included in the following conference series:

International Symposium on Algorithms and Computation

Abstract

The k-means algorithm is the method of choice for clustering large-scale data sets, and it performs exceedingly well in practice. Most of the theoretical work is restricted to the case in which squared Euclidean distances are used as the similarity measure. In many applications, however, data is to be clustered with respect to other measures such as relative entropy, which is commonly used to cluster web pages. In this paper, we analyze the running time of the k-means method for Bregman divergences, a very general class of similarity measures that includes squared Euclidean distances and relative entropy. We show that the exponential lower bound known for the Euclidean case carries over to almost every Bregman divergence. To narrow the gap between theory and practice, we also study k-means in the semi-random input model of smoothed analysis. For the case that n data points in \(\mathbb{R}^d\) are perturbed by noise with standard deviation σ, we show that for almost arbitrary Bregman divergences the expected running time is bounded by \({\rm poly}(n^{\sqrt k}, 1/\sigma)\) and by \(k^{kd} \cdot {\rm poly}(n, 1/\sigma)\).
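
To make the object of the analysis concrete, the following is a minimal sketch of a Lloyd-style k-means iteration in which the distance measure is a pluggable Bregman divergence. It is an illustration only, not the authors' construction: the function names (`bregman_kmeans`, `generalized_kl`, and so on), the choice of initial centers, and the toy data are all assumptions. It relies on the standard property that, for a Bregman divergence with the data point in the first argument, the arithmetic mean of a cluster minimizes the total divergence to a single center. The returned iteration count is the quantity whose worst-case and smoothed behavior the abstract refers to.

```python
import math


def squared_euclidean(x, c):
    """Bregman divergence generated by phi(x) = ||x||^2."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def generalized_kl(x, c):
    """Bregman divergence generated by phi(x) = sum_i x_i log x_i.

    Assumes all coordinates are strictly positive. For probability
    vectors this coincides with relative entropy.
    """
    return sum(xi * math.log(xi / ci) - xi + ci for xi, ci in zip(x, c))


def mean(points):
    """Coordinate-wise arithmetic mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def bregman_kmeans(points, centers, divergence, max_iter=1000):
    """Lloyd-style iteration with an arbitrary divergence(x, c).

    Returns (centers, assignment, iterations), where `iterations`
    counts the rounds in which the assignment changed.
    """
    centers = [tuple(c) for c in centers]
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to the center of least divergence.
        new_assignment = [
            min(range(len(centers)), key=lambda j: divergence(x, centers[j]))
            for x in points
        ]
        if new_assignment == assignment:
            return centers, assignment, iteration - 1
        assignment = new_assignment
        # Update step: the mean is the optimal center for Bregman divergences.
        for j in range(len(centers)):
            members = [x for x, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = mean(members)
    return centers, assignment, max_iter


# Hypothetical toy data: two well-separated groups in the positive quadrant.
points = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8)]
initial = [points[0], points[2]]

for divergence in (squared_euclidean, generalized_kl):
    centers, assignment, iterations = bregman_kmeans(points, initial, divergence)
    print(divergence.__name__, assignment, iterations)
```

On this toy input both divergences produce the same partition after a single round. The lower bound discussed in the abstract concerns carefully constructed inputs on which the number of rounds grows exponentially, which a small example like this does not exhibit.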

Author information

Authors and Affiliations

Department of Applied Mathematics, University of Twente
Bodo Manthey
Department of Quantitative Economics, Maastricht University
Heiko Röglin

Editor information

Editors and Affiliations

Dept. of Electrical and Computer Engineering, University of Hawaii, Holmes Hall 483, 2540 Dole Street, 96822, Honolulu, HI, USA
Yingfei Dong
Department of Computer Science, University of Texas at Dallas, 75083, Dallas, TX, USA
Ding-Zhu Du
Department of Computer Science, University of California, 93106, Santa Barbara, CA, USA
Oscar Ibarra

Cite this paper

Manthey, B., Röglin, H. (2009). Worst-Case and Smoothed Analysis of k-Means Clustering with Bregman Divergences. In: Dong, Y., Du, DZ., Ibarra, O. (eds) Algorithms and Computation. ISAAC 2009. Lecture Notes in Computer Science, vol 5878. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10631-6_103

DOI: https://doi.org/10.1007/978-3-642-10631-6_103
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10630-9
Online ISBN: 978-3-642-10631-6
eBook Packages: Computer Science, Computer Science (R0)
