Tuning remote GPU virtualization for InfiniBand networks

Reaño, Carlos; Silla, Federico

doi:10.1007/s11227-016-1754-3

Published: 23 May 2016

Volume 72, pages 4520–4545, (2016)

The Journal of Supercomputing

Abstract

In the past few years, high performance computing clusters have increasingly been interconnected with InfiniBand networks. Indeed, most of the supercomputers appearing in the TOP500 list use either Ethernet or InfiniBand interconnects. Regarding the latter, the complexity of the InfiniBand programming API (i.e., InfiniBand Verbs) makes it difficult for applications to obtain the maximum performance from these networks. In this paper we describe how we have tuned a remote GPU virtualization framework whose communications module is implemented using InfiniBand Verbs. The net result is a noticeable increase in the performance of this framework, significantly reducing the gap between remote and local GPUs.

Notes

Although the X-axis is shown in MB/s for clarity, note that the test was run using different transfer sizes, from 1 KB to 60 MB.

Acknowledgments

This work was funded by the Spanish MINECO and FEDER funds under Grant TIN2012-38341-C04-01. The authors are also grateful for the generous support provided by Mellanox Technologies.

Author information

Authors and Affiliations

Universitat Politècnica de Valencia, 46022, Valencia, Spain
Carlos Reaño & Federico Silla

Corresponding author

Correspondence to Carlos Reaño.

About this article

Cite this article

Reaño, C., Silla, F. Tuning remote GPU virtualization for InfiniBand networks. J Supercomput 72, 4520–4545 (2016). https://doi.org/10.1007/s11227-016-1754-3

Published: 23 May 2016
Issue Date: December 2016
DOI: https://doi.org/10.1007/s11227-016-1754-3
