Abstract
Understanding how biological visual systems recognize objects is one of the ultimate goals in computational neuroscience. From the computational viewpoint of learning, different recognition tasks, such as categorization and identification, are similar, representing different trade-offs between specificity and invariance. Thus, the different tasks do not require different classes of models. We briefly review some recent trends in computational vision and then focus on feedforward, view-based models that are supported by psychophysical and physiological data.
Acknowledgements
Supported by grants from ONR, DARPA, NSF, ATR, Honda, a Merck/MIT Fellowship in Bioinformatics, and a McDonnell-Pew award (M.R.). T.P. is supported by the Uncas and Helen Whitaker Chair at the Whitaker College, MIT. For comments and suggestions, we are grateful to Heinrich Bülthoff, Peter Dayan, Shimon Edelman, David Freedman, Christof Koch, Earl Miller, David Perrett, Pawan Sinha and Francis Crick (also for the picture in Fig. 3).
Cite this article
Riesenhuber, M., Poggio, T. Models of object recognition. Nat Neurosci 3 (Suppl 11), 1199–1204 (2000). https://doi.org/10.1038/81479
DOI: https://doi.org/10.1038/81479