Move Lookup tables in CUDA backend to texture memory to reduce global constant memory usage #2791

9prady9 · 2020-03-12T10:14:19Z

There is negligible to no performance difference between a texture-based lookup table and a constant-memory lookup table, so the lookup tables were moved to texture memory to reduce global constant memory usage.

src/backend/cuda/texture.hpp

umar456 · 2020-03-13T16:43:40Z

+class LookupTable1D {
+   public:
+    LookupTable1D()                          = delete;
+    LookupTable1D(const LookupTable1D& arg)  = delete;


It makes sense to have a copy constructor for this, right? You can copy a texture if you need to.

Then we would also have to take care of how the texture object itself is copied; copying just the handle won't work. A move operation wouldn't have that issue, but we need neither for these use cases.

cuda::kernel::locate_features is the CUDA kernel that uses the FAST lookup table. Below is the performance of the kernel with a constant-memory LUT versus a texture-memory LUT. There is negligible to no difference between the two versions, so the LUT was moved to texture memory to reduce global constant memory usage.

Performance using constant memory LUT
-------------------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
1.48%    101.09us  3      33.696us  32.385us  34.976us  void cuda::kernel::locate_features<float, int=9>
1.34%    91.713us  2      45.856us  45.792us  45.921us  void cuda::kernel::locate_features<double, int=9>
1.02%    69.505us  2      34.752us  34.400us  35.105us  void cuda::kernel::locate_features<unsigned int, int=9>
0.99%    67.456us  2      33.728us  32.768us  34.688us  void cuda::kernel::locate_features<int, int=9>
0.95%    65.186us  2      32.593us  31.201us  33.985us  void cuda::kernel::locate_features<short, int=9>
0.93%    63.874us  2      31.937us  30.817us  33.057us  void cuda::kernel::locate_features<unsigned short, int=9>

Performance using texture LUT
-----------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
1.45%    99.776us  3      33.258us  32.896us  33.504us  void cuda::kernel::locate_features<float, int=9>
1.33%    91.105us  2      45.552us  44.961us  46.144us  void cuda::kernel::locate_features<double, int=9>
1.02%    70.017us  2      35.008us  34.273us  35.744us  void cuda::kernel::locate_features<unsigned int, int=9>
0.97%    66.689us  2      33.344us  32.065us  34.624us  void cuda::kernel::locate_features<int, int=9>
0.95%    65.249us  2      32.624us  31.585us  33.664us  void cuda::kernel::locate_features<short, int=9>
0.95%    65.025us  2      32.512us  30.945us  34.080us  void cuda::kernel::locate_features<unsigned short, int=9>

cuda::kernel::extract_orb is the CUDA kernel that uses the ORB lookup table. Below is the performance of the kernel with a constant-memory LUT versus a texture-memory LUT. There is negligible to no difference between the two versions, so the LUT was moved to texture memory to reduce global constant memory usage.

Performance using constant memory LUT
-------------------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
3.02%    292.26us  24     12.177us  11.360us  14.528us  void cuda::kernel::extract_orb<float>
2.16%    209.00us  16     13.062us  11.616us  16.033us  void cuda::kernel::extract_orb<double>

Performance using texture LUT
-----------------------------
Time(%)  Time      Calls  Avg       Min       Max       Name
2.84%    270.63us  24     11.276us  9.6970us  15.040us  void cuda::kernel::extract_orb<float>
2.20%    209.28us  16     13.080us  10.688us  16.960us  void cuda::kernel::extract_orb<double>

9prady9 added CUDA internal labels Mar 12, 2020

9prady9 added this to the v3.7.1 milestone Mar 12, 2020

9prady9 requested a review from umar456 March 12, 2020 10:14

9prady9 force-pushed the lut_move branch 2 times, most recently from afc0ff8 to e0a0307 March 13, 2020 16:11

umar456 reviewed Mar 13, 2020

9prady9 added 2 commits March 13, 2020 23:30

9prady9 force-pushed the lut_move branch from e0a0307 to bdd3548 March 13, 2020 18:05

9prady9 requested a review from umar456 March 13, 2020 18:07

umar456 approved these changes Mar 14, 2020

9prady9 merged commit 0d61c6f into arrayfire:master Mar 14, 2020

9prady9 deleted the lut_move branch March 14, 2020 04:54
