Minimal implementation of GpuMatND #19259
Conversation
Changed the behavior of …:
These functions now return …; previously these created a header for ….
Therefore, it seems reasonable to add these two methods for creating a ….
Thank you for working on this!
Please take a look at the comments below.
element of the submatrix, and it can be different from data_->data.
If this is not a submatrix, then data is always equal to data_->data.
*/
uchar* data;
Perhaps size_t offset would be more suitable.
uchar* data is now private, along with a new member size_t offset. I have also added a public member function getDevicePtr() to get a pointer to the first byte.
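For reference, a minimal sketch of how those members and the accessor might fit together; the class framing here is illustrative, not copied from the patch:

#include <opencv2/core.hpp>  // for uchar

// Sketch only: the real GpuMatND has many more members.
class GpuMatND
{
public:
    // Returns a pointer to the first byte of this (sub)matrix on the device.
    uchar* getDevicePtr() const { return data + offset; }

private:
    uchar* data;    // start of the allocated device buffer
    size_t offset;  // byte offset of the first element; non-zero only for submatrices
};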
DevicePtr(DevicePtr&&) = delete;
DevicePtr& operator=(DevicePtr&&) = delete;

uchar* data;
DevicePtr
It makes sense to move this class to an upper level, make it more generic, and change its name:
- GpuData (like UMatData)
- or GpuBuffer
- or GpuDataContainer
In the future, we can reuse/share this "container" with the existing GpuMat.
Please add a size_t size field (size of the allocated buffer in bytes) to perform accurate validation checks.
GpuData is now a global struct with a new member size_t size.
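A rough sketch of what such a shared container could look like; only data and size come from the discussion above, and the rest (allocation in the constructor, deleted copies) is an assumption:

struct GpuData
{
    explicit GpuData(size_t size_);  // allocates size_ bytes of device memory (defined in the .cpp)
    ~GpuData();                      // frees the device memory (defined in the .cpp)

    GpuData(const GpuData&) = delete;             // single owner of the buffer
    GpuData& operator=(const GpuData&) = delete;

    uchar* data;  // start of the allocated device buffer
    size_t size;  // size of the allocated buffer in bytes, for validation checks
};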
Looks good to me in general. I added some comments for the testing code in contrib. A manual test passed on Ubuntu 18.04 with an NVIDIA GeForce 1080 Ti.
Well done!
@param _step Array of _size.size()-1 steps in case of a multi-dimensional array (the last step is always
set to the element size). If not specified, the matrix is assumed to be continuous.
*/
GpuMatND(SizeArray _size, int _type, void* _data, StepArray _step = StepArray());
@asmorkalov Does this ctor have some intersection with the similar cv::Mat ctor?
If so, then it makes sense to "wrap" (or move) cv::Mat instead of void* (perhaps out of the scope of this PR).
_data is GPU memory, isn't it?
The upload call handles the case with a regular cv::Mat.
The main point here is to use a named static function instead of a constructor (to avoid confusion):
static GpuMatND wrapMemoryPtrGPU(...);
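For illustration, a hedged sketch of how such a factory might wrap the pointer-taking constructor shown in the diff; wrapMemoryPtrGPU is only the name suggested in this comment, not an existing OpenCV function:

// Inside class GpuMatND (sketch): the name makes it explicit at the call site
// that pre-allocated GPU memory is being wrapped, not copied or allocated.
static GpuMatND wrapMemoryPtrGPU(SizeArray size, int type, void* gpuData,
                                 StepArray step = StepArray())
{
    // Delegates to the existing constructor; the returned header is assumed
    // not to take ownership of gpuData.
    return GpuMatND(size, type, gpuData, step);
}

// Usage (hypothetical): GpuMatND m = GpuMatND::wrapMemoryPtrGPU({2, 3, 4}, CV_32FC1, devPtr);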
@param _type Array type. Use CV_8UC1, ..., CV_16FC4 to create 1-4 channel matrices, or
CV_8UC(n), ..., CV_64FC(n) to create multi-channel (up to CV_CN_MAX channels) matrices.
*/
GpuMatND(SizeArray _size, int _type);
_size, _type: no need to use underscores in declarations without implementation code. Let's keep the docs and bindings clear.
I'll do that in the next commit.
I have fixed it.
using StepArray = std::vector<size_t>;
using IndexArray = std::vector<int>;

~GpuMatND() = default;
= default: it makes sense to have the (empty) implementation in the .cpp file for that. There are several non-trivial fields.
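A small sketch of the requested change; the file split is assumed:

// In the header: declare the destructor only, keeping the definition out of line.
class GpuMatND
{
public:
    ~GpuMatND();
    // ... other members ...
};

// In the .cpp file: the defaulted definition is emitted in a single translation unit,
// so the non-trivial members are destroyed where their types are fully known.
GpuMatND::~GpuMatND() = default;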
I'll move the definition to the .cpp file to hide the implementation.
I have fixed it.
GpuMatND GpuMatND::clone() const
{
    CV_DbgAssert(!empty());
Why not just return clone(Stream::Null());? (Reduce duplicated code, DRY principle.)
The one with the empty parameter list calls synchronous CUDA APIs, cudaMemcpy and cudaMemcpy2D, whereas the overload with one argument calls asynchronous CUDA APIs, cudaMemcpyAsync and cudaMemcpy2DAsync. GpuMat::upload() and GpuMat::download() also have overloads for asynchronous CUDA API calls. However, I agree that we should reduce duplicated code, so I suggest keeping the two overloads while reducing duplication as much as possible.
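For readers less familiar with the distinction, a standalone sketch (plain CUDA runtime calls, not the PR's code) of what the two overloads encode:

#include <cuda_runtime.h>

// The blocking copy is complete when the call returns.
void copyBlocking(void* dst, const void* src, size_t bytes)
{
    cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice);
}

// The stream-based copy only enqueues work; the caller decides when to wait,
// which is what the Stream-taking clone() overload exposes.
void copyOnStream(void* dst, const void* src, size_t bytes, cudaStream_t stream)
{
    cudaMemcpyAsync(dst, src, bytes, cudaMemcpyDeviceToDevice, stream);
}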
I have fixed it, but please look at this comment below.
/////////////////////////////////////////////////////
/// clone

static bool next(uchar*& d, const uchar*& s, std::vector<int>& idx, const int dims, const GpuMatND& dst, const GpuMatND& src)
This function signature does not look so good, but I'd like to keep it as is for now. Making an internal iterator class that iterates over each 2D plane seems a better design. I'll do that in the next PR, when I implement the convertTo, copyTo, and setTo member functions of GpuMatND. A straightforward implementation would be to iterate over each 2D plane of a GpuMatND and apply the GpuMat counterparts; the iterator class would also fit well for this.
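As a rough illustration of that plane-by-plane iteration, a standalone sketch of an odometer-style index advance; the name and signature are hypothetical, not the PR's next() function:

#include <vector>

// Advance idx over all 2D planes of an array with the given sizes: the last two
// dimensions form one plane, so only the leading dims-2 indices are incremented.
// Returns false once every plane has been visited.
static bool nextPlane(std::vector<int>& idx, const std::vector<int>& size)
{
    const int dims = static_cast<int>(size.size());
    for (int i = dims - 3; i >= 0; --i)
    {
        if (++idx[i] < size[i])
            return true;  // more planes remain
        idx[i] = 0;       // carry into the next, more significant dimension
    }
    return false;
}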
Thank you for the update!
modules/core/src/cuda/gpu_mat_nd.cu (outdated)
do
{
    CV_CUDEV_SAFE_CALL(
        cudaMemcpy2D(
            d, ret.step[dims-2], s, step[dims-2],
            size[dims-1]*step[dims-1], size[dims-2], cudaMemcpyDeviceToDevice)
    );
}
while (next(d, s, idx, dims, ret, *this));
Did you check the performance of this loop?
If this is a synchronous call, then a sequence of several synchronous calls may show bad performance.
I believe this scheme may perform better:
- schedule async tasks
- wait for completion
(the code block still behaves as synchronous)
I have changed cudaMemcpy2D to cudaMemcpy2DAsync and added cudaStreamSynchronize(0) to wait for completion.
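A simplified sketch of that schedule-then-wait pattern (not the PR's exact code; the helper and its parameters are illustrative):

#include <cuda_runtime.h>
#include <cstddef>

// Enqueue one asynchronous 2D copy per plane on the default stream, then block once
// at the end so the whole function still behaves synchronously for the caller.
void copyPlanes(unsigned char* dst, std::size_t dstPitch,
                const unsigned char* src, std::size_t srcPitch,
                std::size_t widthBytes, std::size_t height,
                int planes, std::size_t dstPlaneStep, std::size_t srcPlaneStep)
{
    for (int p = 0; p < planes; ++p)
    {
        cudaMemcpy2DAsync(dst + p * dstPlaneStep, dstPitch,
                          src + p * srcPlaneStep, srcPitch,
                          widthBytes, height, cudaMemcpyDeviceToDevice, 0);
    }
    cudaStreamSynchronize(0);  // wait for all enqueued copies to finish
}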
Minimal implementation of GpuMatND
* GpuMatND - minimal implementation
* GpuMatND - createGpuMatHeader
* GpuMatND - GpuData, offset, getDevicePtr(), license
* reviews
* reviews
As a first step toward resolving #15897 and #16433, this PR tries to implement a minimal set of functions and entities that a GpuMatND class should have. This PR includes:
- clone() for deep copying, with the resulting array always being continuous
- … Mat
- … GpuMat

Tests are in opencv/opencv_contrib#2805.
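A minimal usage sketch of the above; the header, the cv::cuda namespace, and SizeArray being std::vector<int> are assumptions based on where GpuMat lives:

#include <opencv2/core/cuda.hpp>

int main()
{
    // Allocate a continuous 4-dimensional single-channel float array on the GPU.
    cv::cuda::GpuMatND a({2, 3, 4, 5}, CV_32FC1);

    // Deep copy; per this PR, the cloned array is always continuous.
    cv::cuda::GpuMatND b = a.clone();
    return 0;
}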
The following leak check was performed with the test.
I hope this PR could be accepted as a minimal implementation.
Further things that should be added include:
- GpuMatND in the list of classes handled by the proxy classes (InputArray and OutputArray)
- cuda::Stream
- copyTo, convertTo, setTo, as in GpuMat

I'm guessing that these can be made as a separate PR.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.