Abstract
The problem of learning linear-discriminant concepts can be solved by various mistake-driven update procedures, including the Winnow family of algorithms and the well-known Perceptron algorithm. In this paper we define the general class of “quasi-additive” algorithms, which includes Perceptron and Winnow as special cases. We give a single proof of convergence that covers a broad subset of algorithms in this class, including not only Perceptron and Winnow but also many new algorithms. Our proof hinges on analyzing a generic measure-of-progress construction that gives insight into when and how such algorithms converge.
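To make the quasi-additive template concrete, the following minimal sketch (Python; not taken from the paper) maintains an additive accumulator z and predicts with a weight vector obtained by applying a link function f componentwise. Taking f to be the identity gives a Perceptron-style additive rule, while f = exp gives a Winnow-style multiplicative rule. The function names, learning rate, and toy data are illustrative assumptions, and the normalization and threshold details used by the actual algorithms are omitted.

```python
import numpy as np

def quasi_additive_train(examples, f, eta=1.0, n_epochs=5):
    """Mistake-driven quasi-additive learner (illustrative sketch).

    Keeps an additive state z and predicts with w = f(z), applied
    componentwise.  f = identity behaves like Perceptron; f = exp behaves
    like (unnormalized) Winnow.  `examples` holds (x, y) pairs, y in {-1, +1}.
    """
    dim = len(examples[0][0])
    z = np.zeros(dim)                      # accumulated additive state
    mistakes = 0
    for _ in range(n_epochs):
        for x, y in examples:
            w = f(z)                       # current linear discriminant
            if y * np.dot(w, x) <= 0:      # update only when a mistake occurs
                z += eta * y * x           # the same additive step for every f
                mistakes += 1
    return f(z), mistakes

# Toy data (hypothetical): the label is the sign of the first coordinate.
data = [(np.array([1.0, 1.0]), 1), (np.array([1.0, -1.0]), 1),
        (np.array([-1.0, 1.0]), -1), (np.array([-1.0, -1.0]), -1)]

w_perc, m_perc = quasi_additive_train(data, f=lambda z: z)   # Perceptron-like
w_winn, m_winn = quasi_additive_train(data, f=np.exp)        # Winnow-like
```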
Our measure-of-progress construction also lets us obtain good mistake bounds for individual algorithms. We apply our unified analysis to new algorithms as well as existing ones. When applied to known algorithms, our method “automatically” produces close variants of existing proofs (recovering similar bounds), showing that, in a certain sense, these seemingly diverse results are fundamentally isomorphic. However, we also demonstrate that the unifying principles apply more broadly, and we analyze a new family of algorithms that smoothly interpolates between the additive-update behavior of Perceptron and the multiplicative-update behavior of Winnow.
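One way such an interpolation can be realized, loosely in the spirit of a p-norm-style transfer function, is sketched below; it reuses the quasi_additive_train routine and toy data from the sketch above, with the exponent p acting as the interpolation knob. The normalization terms used in published p-norm algorithms are omitted, so this is only a rough illustration under those simplifying assumptions.

```python
# Hypothetical p-norm-style link function: p = 2 gives the identity
# (Perceptron-like behaviour), while larger p increasingly favours the
# coordinates with the largest |z_i|, moving towards Winnow-like behaviour.
def p_norm_transfer(p):
    return lambda z: np.sign(z) * np.abs(z) ** (p - 1)

w_mid, m_mid = quasi_additive_train(data, f=p_norm_transfer(4.0))  # between the extremes
```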
Cite this article
Grove, A.J., Littlestone, N. & Schuurmans, D. General Convergence Results for Linear Discriminant Updates. Machine Learning 43, 173–210 (2001). https://doi.org/10.1023/A:1010844028087