Skip to content

Fixes sub array (opencl, cuda, cpu, oneapi) for orb #3670

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

willyborn
Copy link
Contributor

@willyborn willyborn commented Jun 22, 2025

Using sub-arrays in the orb function results in undefined behaviour in the opencl, cpu and cuda backend.

Description

Using sub-arrays for input as filters generates random behaviour.

The orb function uses following sub functions, which needed individual sub-array testing (and corrections if needed):

resize(): Works as expected. A final PR will contain all extended test scripts that did not need corrections.

28bd6b0: Adds test helpers for temporary array formats (JIT, SUB, ...)
Adds the the function toTempFormat to the test helpers. This function generates the different temporary array formats (JIT array, SUB-array, Linear array, ...) used by arrayfire.

07b089f: Increased difficulty of sub-array testing
Some random behaviour persisted so the test conditions are harder now for sub-arrays
- When using without the offset, the full array will be random data
- When reallocating a data buffer based on dim[0]*dim[1]... size, while reusing the info structure, the buffer will be accessed outside boundaries (hoping this generates a segmentation fault)

1a4358c: Fixes sub-array (cuda, opencl) support for harris
- Linear temporary buffers copied the info struct from in, while they are linear and in is strided.
- second_order_deriv kernel was called with blocksize based on #elements of parent of in, iso in (dim[3]*strides[3])
- many memory allocations were based on #elements of parent of in, iso in (dim[3] * strides[3])

1fd4bfd: Fixes sub-array (cpu, cuda, opencl) support for fast
cuda & cpu:
- in idx(), test_pixel(), the elements of in were directly (linear) accessed without using strides
opencl:
- in load_shared_image() ,the elements of in were directly (linear) accessed without using strides and offset

2e72a4d: Fixes sub-array (cpu, cuda, opencl, oneapi) support for convolve
- Kernel object is extended with 2D and 3D memory copy functions
cuda:
- in conv2Helper(), the elements of in were directly (linear) accessed without using strides
- In conv2Helper(), the strided 2D memcopy function is added when needed
- in convolve_3d, convolve2(), the linear memcopy is replaced with strided 3D memcopy function
opencl:
- in conv1(), conv2(), conv3() the linear memcopy is grouped into convNHelper<> function (same structure as in cuda, ...)
- in convNHelper<> the linear memcopy is replaced with strided one
- In conv2Helper(), convSep() the strided 2D memcopy function is added when needed
cpu:
- in convolve2_separable(), the elements of in were directly (linear) accessed without using strides
oneapi:
- local defined linear memcopy function (included also conversion) is replaced by the strided memory function defined in kernel/memcopy.hpp. This strided function is also optimized for linear copies and performs only copy without conversions. (improved speed)
- in conv1(), conv2(), conv3(), convSep() linear copy replace with strided one

ad1413f: Fixes sub-array (cpu, cuda, opencl, oneapi) support for orb
- in harris_response(), controid_angle(), get_pixel() the elements of in were directly (linear) accessed without using strides
- Adds the offset to the cl code files
- in orb (opencl) the info struct was reused, while the buffer allocation was linear

Is this a new feature or a bug fix?
BUG
Why these changes are necessary.
PREVIOUSLY THE RESULT IS UNDEFINED FOR ABOVE SITUATION
Potential impact on specific hardware, software or backends.
ALL platforms (ONEAPI is not supporting orb)
New functions and their functionality.
NO
Can this PR be backported to older versions?
NO
Future changes not implemented in this PR.
NONE, although the function toTempFormat will be used in following PRs.

Changes to Users

Quality improvement

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass
  • Functions added to unified API
  • Functions documented

@willyborn
Copy link
Contributor Author

I still get rarely an error, dependent of previous runs.
I suspect that convolve is the culprit, so you can expect shortly an update on this PR.

@willyborn willyborn force-pushed the Fixes-sub-array-(opencl,-cuda,-cpu)-for-orb branch from 27d5b77 to ad1413f Compare June 28, 2025 09:45
@willyborn willyborn changed the title Fixes sub array (opencl, cuda, cpu) for orb Fixes sub array (opencl, cuda, cpu, oneapi) for orb Jun 28, 2025
@willyborn
Copy link
Contributor Author

The random behaviour disappeared in this new update. In order to find the cause, I had to make the testing more rigorous, resulting in a update of commit 28bd6b0. All tests are rerun, and previous PR's don't have any problem with the new situation.
The text of the PR is updated with comments on all the resulting commits.

@willyborn willyborn force-pushed the Fixes-sub-array-(opencl,-cuda,-cpu)-for-orb branch from ad1413f to 73b9dc9 Compare August 5, 2025 15:25
@willyborn
Copy link
Contributor Author

commit 07b089f "Increased difficulty of sub-array testing" is removed from this PR, since it is already merged with PR#3679.
All other commits are unchanged.

@willyborn willyborn closed this Aug 5, 2025
@willyborn willyborn reopened this Aug 5, 2025
@willyborn
Copy link
Contributor Author

Wrongly closed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant