This repository provides open source resources for learning CUDA C/C++ programming, the C/C++ interface to the CUDA parallel computing platform. In CUDA, the host refers to the CPU and its memory, while the device refers to the GPU and its memory. Code running on the host can manage memory on both the host and the device, and it launches kernels: functions executed on the device by many GPU threads in parallel.
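To make the host/device split concrete, here is a minimal sketch of a complete CUDA program (in the spirit of exercise 00, though not necessarily the repository's exact solution): the host launches a kernel, and each device thread prints a message.

```cuda
#include <cstdio>

// Kernel: runs on the device; each thread prints its own thread index.
__global__ void hello() {
    printf("hello, world! from thread %d\n", threadIdx.x);
}

int main() {
    // Host code launches the kernel on 1 block of 4 threads.
    hello<<<1, 4>>>();
    // Block the host until the device has finished.
    cudaDeviceSynchronize();
    return 0;
}
```

The `<<<blocks, threads>>>` launch configuration is the core idea the later exercises build on.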
NOTE: it is assumed that you have access to a computer with a CUDA-enabled NVIDIA GPU.
Here you can find solutions to a set of simple exercises on GPU programming in CUDA C/C++. The source code is well commented and easy to follow, though some basic knowledge of parallel architectures is recommended.
- exercise 00: hello, world!
- exercise 01: print devices properties
- exercise 02: addition
- exercise 03: vector addition using parallel blocks
- exercise 04: vector addition using parallel threads
- exercise 05: vector addition combining blocks and threads
- exercise 06: single-precision A*X Plus Y
- exercise 07: time, bandwidth, and throughput computation (single-precision A*X Plus Y)
- exercise 08: multiplication of square matrices
- exercise 09: transpose of a square matrix
- exercise 10: dot product using shared memory
- exercise 11: prefix sum (exclusive scan) using shared memory
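As an illustration of the pattern behind exercises 03-05, the following sketch combines blocks and threads for vector addition. It is a simplified example, not the repository's solution; in particular it uses `cudaMallocManaged` (unified memory) for brevity, whereas the exercises may use explicit `cudaMemcpy` transfers.

```cuda
#include <cstdio>

// Kernel: each thread computes one element of c = a + b.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n)                                      // guard against overshoot
        c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Unified memory is accessible from both host and device.
    float *a, *b, *c;
    cudaMallocManaged(&a, bytes);
    cudaMallocManaged(&b, bytes);
    cudaMallocManaged(&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    vecAdd<<<blocks, threads>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);  // each element should be 3.0

    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

Mapping one thread to one array element, with a bounds check for the final partially filled block, is the standard CUDA idiom for element-wise operations.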
The CUDA C/C++ compiler `nvcc`, part of the NVIDIA CUDA Toolkit, separates source code into host and device components. You can compile each exercise with `nvcc`.
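For example, a typical compile-and-run cycle looks like the following (the file and binary names are illustrative, not taken from this repository):

```shell
# Compile a .cu source file with nvcc; host code goes to the system
# compiler, device code to the GPU toolchain.
nvcc -o vector_add vector_add.cu

# Run the resulting binary on a machine with a CUDA-enabled GPU.
./vector_add
```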
NOTE: to find out how long a kernel takes to run, or to check for memory errors, you can run `nvprof ./<binary>` or `cuda-memcheck ./<binary>` on the command line, respectively. (On recent CUDA Toolkit versions these tools are deprecated in favor of Nsight Systems/Compute and `compute-sanitizer`.)
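Putting the two tools together on one of the compiled exercises (again using an illustrative binary name):

```shell
# Report per-kernel execution time and memory-transfer statistics.
nvprof ./vector_add

# Check for out-of-bounds accesses and other device memory errors.
cuda-memcheck ./vector_add
```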
This project is licensed under the MIT License - see the LICENSE file for details.