Dilated convolve2 and backward gradients #2359
Conversation
Did a quick run of the code. I am suspicious about some of the operations on the CPU backend and the number of times reorder is called. I think we can do something more efficient there. Is there an MKL routine that does this operation?
I looked at the style and code flow in general. Haven't looked at the kernels themselves; I will give it another pass soon.
Also, rebase your branch.
Force-pushed from 7fafa72 to 27f75f3
Made another pass. Need to step away from this for now, but I will give it another go soon.
const T* iptr = iptr_ + col * istrides[d];

// Calculate input window index
dim_t winy = (col / nx);
Maybe increment winy after nx iterations instead of performing a division. Same for winx.
That would bring in a branch based on the value of nx. Do you think it would be faster?
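For reference, a minimal sketch of the two alternatives under discussion. The loop bounds and the name ncols are assumptions; winx, winy, nx, and dim_t follow the snippet above.

```c
// Alternative 1 (current): derive the window coordinates from the flat
// column index with a division (and modulo) on every iteration.
for (dim_t col = 0; col < ncols; ++col) {
    dim_t winy = col / nx;
    dim_t winx = col % nx;
    // ... body uses winx, winy ...
}

// Alternative 2 (suggested): carry the coordinates across iterations and
// bump winy every nx columns, trading the per-iteration division for the
// compare-and-increment branch discussed above.
dim_t winx = 0, winy = 0;
for (dim_t col = 0; col < ncols; ++col) {
    // ... body uses winx, winy ...
    if (++winx == nx) {
        winx = 0;
        ++winy;
    }
}
```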
// Compute output index local to column
const int outIdx = IS_COLUMN ?
    (i * get_local_size(0) + get_local_id(0)) :
You can just increment i by the local size and initialize it with the local id.
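A sketch of that suggestion, shown for the IS_COLUMN case only and assuming outIdx advances by exactly one work-group stride per iteration; niters is an illustrative bound, while get_local_size and get_local_id are the OpenCL built-ins used above.

```c
// Current form: recompute the local output index from the loop counter.
for (int i = 0; i < niters; ++i) {
    const int outIdx = i * get_local_size(0) + get_local_id(0);
    // ... write output element outIdx ...
}

// Suggested form: initialize the index with the local id and stride by
// the local size, removing the per-iteration multiply.
for (int outIdx = (int)get_local_id(0);
     outIdx < niters * (int)get_local_size(0);
     outIdx += get_local_size(0)) {
    // ... write output element outIdx ...
}
```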
Force-pushed from 3e4afb4 to ccad15f
Force-pushed from 71783d3 to c24c314
Rebased from master and made minor tweaks.
The unique_handle move constructors were failing on Windows because they didn't zero out the moved-from handle.
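For reference, a minimal sketch of the fix described; the names below are illustrative, not ArrayFire's actual unique_handle.

```cpp
// The move constructor must zero the moved-from handle, otherwise the
// source object's destructor releases the resource a second time.
template <typename H>
class unique_handle {
    H handle_{};

   public:
    explicit unique_handle(H h) noexcept : handle_(h) {}

    unique_handle(unique_handle&& other) noexcept : handle_(other.handle_) {
        other.handle_ = H{};  // the step that was missing
    }

    ~unique_handle() {
        if (handle_ != H{}) { /* release(handle_) for the real type */ }
    }
};
```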
Just questions.
@@ -12,6 +12,7 @@ dependency_check(CUDA_FOUND "CUDA not found.")

find_cuda_helper_libs(nvrtc)
find_cuda_helper_libs(nvrtc-builtins)
find_cuda_helper_libs(cudnn)
Any way to specify a minimum version here?
Unfortunately not with this interface. That is only possible if we write a FindCUDNN package. We should think about adding that as an optional path, because there are easy ways to get around this dependency; it would make the build process simpler for those who are not interested in a 300 MB dependency.
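Until such a module exists, one stopgap (not what this PR does) is a compile-time guard in a common header using the version macros that cudnn.h defines; the 7.0 floor below is just an example.

```c
#include <cudnn.h>

// Fail the build early if the installed cuDNN is older than required.
#if CUDNN_MAJOR < 7
#error "cuDNN >= 7.0 is required"
#endif
```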
Could you just confirm that after merging this, we'll just need a local cuDNN SDK (include + lib) in order to compile afcuda?
Yes, you will need to have cuDNN installed on the system. Currently we are installing it directly in the CUDA directory. If there are users who wish to specify the location of cuDNN, please make an issue and we can discuss.
On my side, it's OK to install it in /usr/local/cuda.
Not with this commit. It should be done in the next couple of days. We are dynamically linking against cuDNN, although I don't think it is necessary. We should look into static linking.
Adds dilation to forward convolve2 with cuDNN integration.
Also adds functions to obtain convolve2 backward gradients with respect to the filter or input data.
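A hypothetical usage sketch of the new functionality. The names convolve2NN, convolve2GradientNN, and the AF_CONV_GRADIENT_* enum values are assumptions for illustration; check the diff for the API this PR actually adds.

```cpp
#include <arrayfire.h>

int main() {
    af::array signal = af::randu(32, 32, 3, 1);  // W x H x C x N
    af::array filter = af::randu(3, 3, 3, 8);    // Wf x Hf x C x Cout

    af::dim4 stride(1, 1), padding(2, 2), dilation(2, 2);

    // Forward dilated convolution (names assumed, see above).
    af::array out =
        af::convolve2NN(signal, filter, stride, padding, dilation);

    // Backward gradients w.r.t. the filter and the input data.
    af::array incoming = af::constant(1, out.dims());
    af::array dFilter  = af::convolve2GradientNN(
        incoming, signal, filter, out, stride, padding, dilation,
        AF_CONV_GRADIENT_FILTER);
    af::array dData    = af::convolve2GradientNN(
        incoming, signal, filter, out, stride, padding, dilation,
        AF_CONV_GRADIENT_DATA);
    return 0;
}
```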