Dilated convolve2 and backward gradients #2359


Merged
25 commits merged into arrayfire:master on Aug 29, 2019

Conversation

syurkevi
Contributor

Adds dilation to the forward convolve2, with cuDNN integration.
Also adds functions to obtain the convolve2 backward gradients with respect to the filter or the input data.
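For reference, a minimal standalone sketch of what dilation means for a 2-D convolution (plain C++, illustrative only — not the ArrayFire or cuDNN implementation): a dilation factor d spaces the filter taps d elements apart, so a k-tap filter covers an effective extent of (k - 1) * d + 1.

```cpp
#include <cassert>
#include <vector>

// Naive "valid" 2-D convolution with dilation (correlation form, no filter
// flipping). Dilation of 1 in both axes reproduces ordinary convolution.
std::vector<float> dilated_conv2(const std::vector<float>& in, int ih, int iw,
                                 const std::vector<float>& filt, int fh, int fw,
                                 int dy, int dx) {
    const int eh = (fh - 1) * dy + 1;  // effective filter height
    const int ew = (fw - 1) * dx + 1;  // effective filter width
    const int oh = ih - eh + 1;
    const int ow = iw - ew + 1;
    std::vector<float> out(oh * ow, 0.f);
    for (int y = 0; y < oh; ++y)
        for (int x = 0; x < ow; ++x)
            for (int fy = 0; fy < fh; ++fy)
                for (int fx = 0; fx < fw; ++fx)
                    out[y * ow + x] +=
                        in[(y + fy * dy) * iw + (x + fx * dx)] *
                        filt[fy * fw + fx];
    return out;
}
```

For example, a 2x2 filter with dilation 2 has an effective extent of 3, so over a 5x5 input of ones it produces a 3x3 output where each element sums four taps.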

Member

@umar456 umar456 left a comment

Did a quick run of the code. I am suspicious about some of the operations on the CPU backend and the number of times reorder is called. I think we can do something more efficient there. Is there an MKL routine that does this operation?

Member

@9prady9 9prady9 left a comment

I looked at the style and the code flow in general. Haven't looked at the kernels themselves. I will give it another pass soon.

@9prady9
Member

9prady9 commented Nov 29, 2018

Also, rebase your branch.

Member

@umar456 umar456 left a comment

Made another pass. I need to step away from this for now, but I will give it another go soon.

const T* iptr = iptr_ + col * istrides[d];

// Calculate input window index
dim_t winy = (col / nx);
Member

Maybe increment winy after every nx iterations instead of performing a division. Same for winx.

Member

That would bring in a branch statement based on the value of nx. Do you think it would be faster?
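As an aside, the two indexing schemes discussed here are equivalent. A small host-side sketch (hypothetical names, standing in for the actual kernel variables) verifying that the division form and the counter-with-rollover form produce the same window indices:

```cpp
#include <cassert>

// Division form: winy = col / nx, winx = col % nx, recomputed each iteration.
// Counter form: bump winx every iteration and roll it over into winy after
// nx iterations, trading the division for a compare-and-branch.
void compare_forms(int nx, int ncols) {
    int winx = 0, winy = 0;  // counter-form state
    for (int col = 0; col < ncols; ++col) {
        assert(winy == col / nx);  // matches the division form
        assert(winx == col % nx);
        if (++winx == nx) {        // rollover after nx iterations
            winx = 0;
            ++winy;
        }
    }
}
```

Which form is faster on a given backend would still need measuring; the branch is well-predicted but the division is not free either.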


// Compute output index local to column
const int outIdx = IS_COLUMN ?
(i * get_local_size(0) + get_local_id(0)) :
Member

You can just initialize i with the local id and increment it by the local size.
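The suggested transformation is the usual strided-loop pattern: the multiply-per-iteration and increment-by-stride forms visit identical indices. A minimal host-side sketch (hypothetical helper names, with local_id and local_size standing in for get_local_id(0) and get_local_size(0)):

```cpp
#include <cassert>
#include <vector>

// Indices visited by one "thread" when recomputing i * local_size + local_id
// on every iteration.
std::vector<int> visited_multiply(int local_id, int local_size, int iters) {
    std::vector<int> idx;
    for (int i = 0; i < iters; ++i)
        idx.push_back(i * local_size + local_id);  // multiply each iteration
    return idx;
}

// Indices visited when the index starts at local_id and steps by local_size.
std::vector<int> visited_stride(int local_id, int local_size, int iters) {
    std::vector<int> idx;
    for (int k = local_id, i = 0; i < iters; k += local_size, ++i)
        idx.push_back(k);  // add-only update, no multiply
    return idx;
}
```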

@syurkevi syurkevi force-pushed the dilated_conv2 branch 2 times, most recently from 3e4afb4 to ccad15f Compare January 15, 2019 20:42
@syurkevi syurkevi assigned syurkevi and umar456 and unassigned syurkevi Jan 16, 2019
@9prady9 9prady9 force-pushed the dilated_conv2 branch 2 times, most recently from 71783d3 to c24c314 Compare March 4, 2019 13:16
@9prady9 9prady9 requested review from umar456 and 9prady9 March 4, 2019 13:17
@9prady9 9prady9 dismissed stale reviews from umar456 and themself March 4, 2019 13:17

Rebased from master and minor tweaks

umar456
umar456 previously approved these changes Aug 28, 2019
@umar456 umar456 dismissed 9prady9’s stale review August 28, 2019 14:39

Addressed changes

Contributor

@WilliamTambellini WilliamTambellini left a comment

Just questions.

@@ -12,6 +12,7 @@ dependency_check(CUDA_FOUND "CUDA not found.")

find_cuda_helper_libs(nvrtc)
find_cuda_helper_libs(nvrtc-builtins)
find_cuda_helper_libs(cudnn)
Contributor

Any way to specify a minimum version here?

Member

Unfortunately not with this interface. This is only possible if we write a FindCUDNN package. We should think about adding that and making the dependency optional, because there are easy ways to get around this dependency. It would make the build process simpler for those who are not interested in a 300 MB dependency.
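A FindCUDNN module could recover the version by parsing the header, since cuDNN publishes CUDNN_MAJOR / CUDNN_MINOR / CUDNN_PATCHLEVEL macros in cudnn.h. A rough sketch of such a module (paths, hints, and variable names are illustrative, not what this PR implements):

```cmake
# Hypothetical FindCUDNN.cmake fragment: locate header and library, then
# extract the version from the CUDNN_MAJOR / CUDNN_MINOR macros.
find_path(CUDNN_INCLUDE_DIR cudnn.h
  HINTS ${CUDA_TOOLKIT_ROOT_DIR}/include)
find_library(CUDNN_LIBRARY cudnn
  HINTS ${CUDA_TOOLKIT_ROOT_DIR}/lib64)

if(CUDNN_INCLUDE_DIR)
  file(STRINGS ${CUDNN_INCLUDE_DIR}/cudnn.h _cudnn_ver_lines
       REGEX "#define CUDNN_(MAJOR|MINOR)")
  string(REGEX REPLACE ".*CUDNN_MAJOR[ ]+([0-9]+).*" "\\1"
         CUDNN_VERSION_MAJOR "${_cudnn_ver_lines}")
  string(REGEX REPLACE ".*CUDNN_MINOR[ ]+([0-9]+).*" "\\1"
         CUDNN_VERSION_MINOR "${_cudnn_ver_lines}")
  set(CUDNN_VERSION "${CUDNN_VERSION_MAJOR}.${CUDNN_VERSION_MINOR}")
endif()

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(CUDNN
  REQUIRED_VARS CUDNN_LIBRARY CUDNN_INCLUDE_DIR
  VERSION_VAR CUDNN_VERSION)
# VERSION_VAR is what would let callers write find_package(CUDNN 7.1)
```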

@umar456 umar456 merged commit dae7c16 into arrayfire:master Aug 29, 2019
@WilliamTambellini
Contributor

Could you just confirm that after merging this one, we'll just need a local cuDNN SDK (include + lib) in order to compile afcuda?
https://developer.nvidia.com/rdp/cudnn-download

@umar456
Member

umar456 commented Aug 29, 2019

Yes, you will need to have cuDNN installed on the system. Currently we expect it to be installed directly in the CUDA directory. If there are users who wish to specify the location of cuDNN, please make an issue and we can discuss.

@WilliamTambellini
Contributor

On my side, that's OK to install it in /usr/local/cuda.
Does it link with static or dynamic cuDNN?
Have you updated the packaging script(s) to embed libcudnn.so in the af installer?

@umar456
Member

umar456 commented Aug 30, 2019

Not with this commit. It should be done in the next couple of days. We are dynamically linking against cuDNN, although I don't think it is necessary. We should look into static linking.


5 participants