research-article
Open access

ImageNet classification with deep convolutional neural networks

Published: 24 May 2017

Abstract

We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, respectively, which is considerably better than the previous state of the art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully connected layers, we employed a recently developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
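
The abstract reports top-1 and top-5 error rates without defining them. As a minimal sketch, assuming the usual definition (a prediction counts as correct when the true label is among the k highest-scoring classes), the metric can be computed as below. The function name `top_k_error` and the toy scores are illustrative assumptions, not the authors' evaluation code or data.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one example")
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; ties go to the lower index.
        top_k = sorted(range(len(row)), key=lambda c: (-row[c], c))[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy example: 3 images, 4 classes (made-up scores, not results from the paper).
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1: correct at top-1
    [0.5, 0.1, 0.3, 0.1],  # true class 2: wrong at top-1, correct at top-2
    [0.4, 0.3, 0.2, 0.1],  # true class 3: wrong even at top-2
]
labels = [1, 2, 3]

print(top_k_error(scores, labels, k=1))
print(top_k_error(scores, labels, k=2))
```

With 1000 classes, the same function with `k=1` and `k=5` would give the two kinds of error rate quoted above. Top-5 error is never higher than top-1 error, because every top-1 hit is also a top-5 hit.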


Published In

Communications of the ACM  Volume 60, Issue 6
June 2017
93 pages
ISSN: 0001-0782
EISSN: 1557-7317
DOI: 10.1145/3098997
Editor: Moshe Y. Vardi
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article
  • Research
  • Refereed

