Abstract
Because Artificial Neural Networks (ANNs) are inherently parallel and computationally intensive, we implement a generic Multilayer Perceptron (MLP) framework on the GPU and compare its speed against a CPU implementation. The speedup achieved grows with the size of the network, but it also depends on the hardware used. Three GPUs are tested: the Tesla K80, the Tesla T4, and the Tesla P100. For the largest ANNs tested, speedups ranged from 331.14× on the K80 up to 2379.2× on the P100.
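The chapter's code is not reproduced on this page. Purely as an illustration of the approach the abstract describes, the sketch below shows one common way to map a fully connected MLP layer's forward pass onto a GPU in CUDA, with one thread computing one output neuron. The kernel name (denseForward), layer sizes, and sigmoid activation are assumptions for this example, not the authors' implementation.

```cuda
// Minimal sketch (hypothetical, not the chapter's code): one thread per
// output neuron computes y[j] = sigmoid(b[j] + sum_i W[j*nIn + i] * x[i]).
#include <cstdio>
#include <cmath>
#include <cuda_runtime.h>

__global__ void denseForward(const float *W, const float *b,
                             const float *x, float *y,
                             int nIn, int nOut) {
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    if (j >= nOut) return;
    float acc = b[j];
    for (int i = 0; i < nIn; ++i)
        acc += W[j * nIn + i] * x[i];   // row-major weight matrix
    y[j] = 1.0f / (1.0f + expf(-acc));  // sigmoid activation
}

int main() {
    const int nIn = 1024, nOut = 1024;  // assumed layer sizes
    float *W, *b, *x, *y;
    // Unified memory keeps the example short; a tuned version would use
    // explicit device buffers and overlap transfers with compute.
    cudaMallocManaged(&W, nIn * nOut * sizeof(float));
    cudaMallocManaged(&b, nOut * sizeof(float));
    cudaMallocManaged(&x, nIn * sizeof(float));
    cudaMallocManaged(&y, nOut * sizeof(float));
    for (int i = 0; i < nIn * nOut; ++i) W[i] = 0.001f;  // dummy weights
    for (int j = 0; j < nOut; ++j) b[j] = 0.0f;
    for (int i = 0; i < nIn; ++i) x[i] = 1.0f;

    int threads = 256;
    int blocks = (nOut + threads - 1) / threads;
    denseForward<<<blocks, threads>>>(W, b, x, y, nIn, nOut);
    cudaDeviceSynchronize();
    printf("y[0] = %f\n", y[0]);
    cudaFree(W); cudaFree(b); cudaFree(x); cudaFree(y);
    return 0;
}
```

Speedups of the magnitude reported at large network sizes generally come from batching inputs and expressing each layer as a matrix-matrix product (e.g. via a tuned GEMM); this sketch only illustrates the thread-to-neuron mapping.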
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Udby, T., Tian, Y. (2023). A Generic Neural Network Implementation on GPU and Its Performance Benchmark. In: Arai, K. (ed.) Proceedings of the Future Technologies Conference (FTC) 2022, Volume 3. FTC 2022. Lecture Notes in Networks and Systems, vol 561. Springer, Cham. https://doi.org/10.1007/978-3-031-18344-7_9
DOI: https://doi.org/10.1007/978-3-031-18344-7_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-18343-0
Online ISBN: 978-3-031-18344-7
eBook Packages: Intelligent Technologies and Robotics (R0)