Use static CUDA upstream libs on Unix #2785


Merged — 2 commits merged into arrayfire:master on Apr 1, 2020

Conversation

@9prady9 9prady9 (Member) commented Mar 9, 2020

  • Use static cufft, cublas, cusparse, and cusolver on Unix; dynamic libraries are still used on Windows. (Check commit message for details.)
  • Removed the cuda_thrust_sort_by_key static dependency. (Check commit message for details.)
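As a rough illustration (not the PR's actual CMake, which lives in the diff), switching between the static and dynamic CUDA math libraries per platform can look like the sketch below. It uses the `CUDA::*_static` imported targets from CMake's stock FindCUDAToolkit module (added in CMake 3.17); ArrayFire at the time may have used its own find modules instead.

```cmake
# Sketch: prefer the static CUDA math libraries on Unix, the shared ones
# on Windows. The CUDA::*_static imported targets are provided by CMake's
# FindCUDAToolkit module.
find_package(CUDAToolkit REQUIRED)

if(WIN32)
  target_link_libraries(afcuda PRIVATE
    CUDA::cufft CUDA::cublas CUDA::cusparse CUDA::cusolver)
else()
  target_link_libraries(afcuda PRIVATE
    CUDA::cufft_static CUDA::cublas_static
    CUDA::cusparse_static CUDA::cusolver_static)
endif()
```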

@WilliamTambellini WilliamTambellini (Contributor) left a comment


This is a big change, and it looks like there is a way to split it into 2 PRs (at least):
1- link with static nvidia libs
2- other changes
It is risky not to split a big PR; some of the issues I've suffered from in the last 3 years in AF were due to big PRs.
Would you mind splitting this one in 2?

@9prady9 9prady9 (Member, Author) commented Mar 9, 2020

Except for the following commits:

  • Renamed .cu to .cpp where possible and cleaned up the CUDA fast/orb code
  • Moved CUDA kernels to runtime (NVRTC) compilation wherever possible

the rest are fixes to enable linking with the static CUDA libs. They also fix CUDA separable compilation in ArrayFire (broken right now), which is needed for the former to work successfully.

The reason I kept them together is that the changes are independent enough to be commits by themselves - none of them breaks non-separable compilation or the tests. Each commit compiles and passes tests on my local Linux and Windows boxes. Also, each commit is separated by the type of change so that reviewing is easy. Moving them into different branches, and thereby other PRs, doesn't add any value in this case, I think.

Note: Except for the NVRTC-related commit, the rest are largely CMake changes, barring the LUT commit (a minor change compared to NVRTC).

@umar456 umar456 (Member) left a comment


I made a few comments, but this is a really large PR. I would really appreciate it if you created separate PRs so we can discuss your changes individually.

@9prady9 9prady9 force-pushed the use_static_culibs branch from 153504e to 8411f63 on March 14, 2020 09:29
@9prady9 9prady9 requested a review from umar456 March 14, 2020 09:30
@9prady9 9prady9 changed the title Use static CUDA upstream libs and other changes to reduce global constant memory usage Use static CUDA upstream libs on Unix Mar 14, 2020
@9prady9 9prady9 dismissed umar456’s stale review March 14, 2020 09:34

Addressed feedback

@9prady9 9prady9 force-pushed the use_static_culibs branch from 8411f63 to 2512d56 on March 24, 2020 06:02
@umar456 umar456 previously approved these changes Mar 24, 2020
Instead of creating a static library out of all the separate instantiations
of the thrust_sort_by_key sources, we now embed the sources generated
(using CMake's configure_file command) directly into the afcuda target.

This also fixed separable compilation.
Prior to this change, separable compilation failed (related to CUDA device
linking - undefined references). I tried to fix that problem but
couldn't get a breakthrough. However, I realized that directly compiling
the generated sources into the afcuda target does the job without
any additional static library.
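A minimal sketch of the approach the commit message describes, with hypothetical template and type names (the real file names are in the PR diff):

```cmake
# Sketch: instantiate thrust_sort_by_key once per type via configure_file
# and compile the generated sources directly into afcuda, instead of
# collecting them in an intermediate static library.
foreach(T float double int uint)
  set(TYPE ${T})  # substituted for @TYPE@ in the hypothetical template
  configure_file(thrust_sort_by_key.cu.in
                 ${CMAKE_CURRENT_BINARY_DIR}/thrust_sort_by_key_${T}.cu @ONLY)
  target_sources(afcuda PRIVATE
                 ${CMAKE_CURRENT_BINARY_DIR}/thrust_sort_by_key_${T}.cu)
endforeach()
```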
@9prady9 9prady9 force-pushed the use_static_culibs branch 3 times, most recently from 055130e to 700e420 on March 25, 2020 12:54
@9prady9 9prady9 force-pushed the use_static_culibs branch from 700e420 to a578e64 on March 26, 2020 06:15
@9prady9 9prady9 modified the milestones: 3.7.1, 3.7.2 Mar 26, 2020
thrust::stable_sort_by_key has a known issue with device linking: the code
crashes with cudaInvalidValueError. Otherwise it works as expected without
any changes, with or without separable compilation.

https://github.com/thrust/thrust/wiki/Debugging#known-issues
https://github.com/thrust/thrust/blob/master/doc/changelog.md#known-issues-2

The above documents mention a known issue with device linking and Thrust.
Although they say it happens in debug mode (with the -G flag), I noticed
similar crashes in ArrayFire in the release configuration too.

Due to the above issue, I have separated out the relevant source files
(fft, blas, sparse and solver), which require device linking, into a
separate static library. Once they were separated, sort_by_key and all
the other unit tests that use it ran as expected without any crashes.
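The workaround can be sketched in CMake roughly as follows (the library and file names here are illustrative, not the PR's actual ones):

```cmake
# Sketch: only the sources that need CUDA device linking go into a
# separate static library with separable compilation enabled, so the
# thrust sort code compiled into afcuda is never device-linked.
add_library(afcuda_static_cuda_library STATIC
            fft.cu blas.cu sparse.cu solver.cu)
set_target_properties(afcuda_static_cuda_library PROPERTIES
                      CUDA_SEPARABLE_COMPILATION ON
                      POSITION_INDEPENDENT_CODE ON)
target_link_libraries(afcuda PRIVATE afcuda_static_cuda_library)
```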
@9prady9 9prady9 force-pushed the use_static_culibs branch from a578e64 to fbe662f on March 30, 2020 05:42
@9prady9 9prady9 requested a review from umar456 March 30, 2020 09:01
@9prady9 9prady9 merged commit 08296d6 into arrayfire:master Apr 1, 2020
@daniel-dsouza

Is there still a way to dynamically link to CUDA? The uncompressed libafcuda.so.3.7.2 is almost a gigabyte in size!

@9prady9 9prady9 (Member, Author) commented Jul 23, 2020

@daniel-dsouza The ~1GB binary size is not due to statically linking the CUDA libraries. It is due to compiling the library for multiple compute capabilities of CUDA cards, from 3.0 up to 7.5. For our website binaries to work optimally on all CUDA cards, we build for all compute versions, which increases the binary size; for a single compute version, the CUDA backend ends up at approximately 450 MB. Having said that, we are aware of the issue with such a huge size and are constantly working on reducing it. As part of that effort, we moved most of the kernels to runtime compilation (NVRTC) instead of offline compilation with nvcc, which brought the CUDA backend binary down a good bit. We have more work in the pipeline towards the same goal for future releases.
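For readers who only need their own GPU's architecture, restricting the build to a single compute capability is the usual way to shrink the binary. A sketch of a configure step follows; the ArrayFire CMake option name shown is an assumption and may differ between versions, so check the project's CMake files for the authoritative variable (CMake 3.18+ also offers the generic CMAKE_CUDA_ARCHITECTURES mechanism).

```shell
# Sketch: configure ArrayFire for a single compute capability (7.5 here).
# The option name below is an assumption; verify it against the project's
# CMake files before relying on it.
cmake .. -DCUDA_architecture_build_targets=7.5
cmake --build . --parallel
```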

@daniel-dsouza

@9prady9 Thank you for explaining this to me. Currently I am building for compute capability 5.0 and higher, so I can look into narrowing that down to my specific hardware.
