Merge kernel caching logic for CUDA and OpenCL backends #2873

Merged
merged 1 commit into from
May 12, 2020

Conversation

9prady9
Member

@9prady9 9prady9 commented May 5, 2020

Partially addresses tasks from #2857

@9prady9 9prady9 added this to the 3.7.2 milestone May 5, 2020
@9prady9 9prady9 requested a review from umar456 May 5, 2020 18:31
@9prady9 9prady9 marked this pull request as ready for review May 6, 2020 12:58
@9prady9 9prady9 force-pushed the ocl_kernel_cache_improv branch from cfce703 to b95f71c Compare May 11, 2020 06:58
Member

@umar456 umar456 left a comment

Looks great. Excellent documentation of the new functions. Can't wait to get rid of the old OpenCL kernel lookups.

* Moved common code required by CUDA and OpenCL caching algorithm into
  kernel_cache.[hpp|cpp]

* Added common/compile_kernel.hpp header that defines the signature
  of the function, compileKernel, that each backend has to implement.

* Each backend has to implement/satisfy the following requirements:

  - Provide compile_kernel.cpp source with compileKernel function that is
    used by common::findKernel

  - Provide Kernel.hpp/cpp that implements KernelInterface from
    common/KernelInterface.hpp

  - Kernel.hpp also provides a functor that helps launch backend kernels.

* Moved kernel utility helpers into separate header(s)/source:

  - TemplateArg.hpp/cpp contains the TemplateArg struct and some helper macros
    to convert template arguments to strings.

  - TemplateTypename.hpp contains the templated TemplateTypename struct that
    helps convert backend kernel parameters to a TemplateArg object.

* Refactored all CUDA kernels to use the new caching API

* Refactored only the transpose, morph and canny kernels in the OpenCL backend
  to use the new caching API

* Removed many unnecessary instantiations for morphological functions
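
The split described in the commit message, a shared lookup that falls back to a backend-specific compile step, can be sketched roughly as follows. This is a minimal illustration, not the code from this pull request: only the names `findKernel` and `compileKernel` come from the commit message, while the signatures, the `Kernel` struct, the string-based template arguments, `makeInstanceName`, and the `gCompileCount` counter are assumptions made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend's compiled kernel handle.
struct Kernel {
    std::string instanceName;  // e.g. "transpose<float,true>"
};

// Illustration only: counts compilations so cache hits can be observed.
static int gCompileCount = 0;

// Assumed shape of the per-backend hook. A real backend would invoke its
// runtime compiler here; this stub only records that it was called.
Kernel compileKernel(const std::string& instanceName,
                     const std::vector<std::string>& sources) {
    (void)sources;  // unused in this stub
    ++gCompileCount;
    return Kernel{instanceName};
}

// Builds a cache key from the kernel name and its stringified template
// arguments (hypothetical helper).
std::string makeInstanceName(const std::string& kernelName,
                             const std::vector<std::string>& templateArgs) {
    std::string name = kernelName + "<";
    for (std::size_t i = 0; i < templateArgs.size(); ++i) {
        if (i != 0) { name += ","; }
        name += templateArgs[i];
    }
    return name + ">";
}

// Backend-agnostic lookup: returns the cached kernel when one exists,
// otherwise compiles it once through compileKernel and stores the result.
std::shared_ptr<Kernel> findKernel(
    const std::string& kernelName,
    const std::vector<std::string>& sources,
    const std::vector<std::string>& templateArgs) {
    static std::map<std::string, std::shared_ptr<Kernel>> cache;
    static std::mutex cacheMutex;

    const std::string key = makeInstanceName(kernelName, templateArgs);
    std::lock_guard<std::mutex> lock(cacheMutex);

    auto it = cache.find(key);
    if (it != cache.end()) { return it->second; }

    auto kernel = std::make_shared<Kernel>(compileKernel(key, sources));
    cache.emplace(key, kernel);
    return kernel;
}
```

In this sketch, a second `findKernel` call with the same name and template arguments returns the cached handle without calling `compileKernel` again, which is the behavior the shared caching code is meant to give both backends. A single process-wide map is used here for brevity; a real implementation would likely need to key the cache per device as well.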
@9prady9 9prady9 force-pushed the ocl_kernel_cache_improv branch from b95f71c to dd9d3af Compare May 12, 2020 08:01
@9prady9 9prady9 requested a review from umar456 May 12, 2020 10:26
@9prady9 9prady9 dismissed umar456’s stale review May 12, 2020 10:27

Addressed feedback

@9prady9 9prady9 merged commit 54b7031 into arrayfire:master May 12, 2020
@9prady9 9prady9 deleted the ocl_kernel_cache_improv branch May 12, 2020 17:21
@umar456 umar456 mentioned this pull request Jun 27, 2020
2 tasks