
CL_DEVICE_HALF_FP_CONFIG returns CL_INVALID_VALUE #3068


Merged 1 commit on Feb 18, 2021


willyborn
Contributor

CL_DEVICE_HALF_FP_CONFIG only exists when the extension cl_khr_fp16 is available.

Description

Instead of querying the FP_CONFIG values, it is better to test directly whether the extensions (cl_khr_fp16 and cl_khr_fp64) are available, so that no exceptions are thrown.
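
The extension check described above can be sketched without an OpenCL runtime. This is a minimal illustration, not the code in the PR: `hasExtension`, `supportsHalf` and `supportsDouble` are hypothetical names, and the `extensions` argument is assumed to hold the space-separated string that `dev.getInfo<CL_DEVICE_EXTENSIONS>()` returns.

```cpp
#include <string>

// Hypothetical helper: `extensions` stands in for the string returned by
// dev.getInfo<CL_DEVICE_EXTENSIONS>(); `name` is the extension to look for.
bool hasExtension(const std::string& extensions, const std::string& name) {
    // Same substring test as in the PR: find() returns npos when absent,
    // so a missing extension yields false instead of an error.
    return extensions.find(name) != std::string::npos;
}

// Half precision is only usable when cl_khr_fp16 is reported.
bool supportsHalf(const std::string& extensions) {
    return hasExtension(extensions, "cl_khr_fp16");
}

// Double precision is only usable when cl_khr_fp64 is reported.
bool supportsDouble(const std::string& extensions) {
    return hasExtension(extensions, "cl_khr_fp64");
}
```

For a device that reports only `cl_khr_fp64`, `supportsDouble` returns true and `supportsHalf` returns false, and no FP_CONFIG query is issued for the missing extension.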

While updating the file, I added explicit conversions to eliminate compiler warnings, so that real warnings stand out later.

  • Is this a new feature or a bug fix? bug
  • Why these changes are necessary: Without this fix, Oclgrind halts and reports the error, because the query does not conform to the standard.
  • Potential impact on specific hardware, software or backends: Most older devices that do not support fp16.
  • New functions and their functionality: None
  • Can this PR be backported to older versions? Yes
  • Future changes not implemented in this PR: None

Changes to Users

No changes to end-users.

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass
    - [ ] Functions added to unified API
    - [ ] Functions documented

-    return (dev.getInfo<CL_DEVICE_DOUBLE_FP_CONFIG>() > 0);
+    // 64-bit floating point is an optional extension
+    return (dev.getInfo<CL_DEVICE_EXTENSIONS>().find("cl_khr_fp64") !=
+            string::npos);
Member

👍🏾

Although I wonder whether the vendor implementation keeps the device's type support in an internal cache. If it does, fetching the cached value would be more efficient than doing a string search.

Contributor Author

I will step through it in the debugger and report what I find.

Contributor Author

I have the impression that it is a straight copy into a vector.

For NVIDIA

  • Total length of vector: 453 chars
  • cl_khr_fp16 NA
  • cl_khr_fp64 at pos 138

For AMD

  • Total length of vector: 661 chars
  • cl_khr_fp16 NA
  • cl_khr_fp64 at pos 0

The complete string is cached in the driver. The copy happens in cl2.hpp at line 1427.
Although I cannot see inside the function, I draw this conclusion because the exact length is provided up front when the vector is constructed.
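
The thread above is about the cost of searching the extension string on each call. If that cost ever mattered, one option would be to parse the string once per device and keep the resulting flags. The following is only a sketch under that assumption: `FpSupport` and `parseFpSupport` are hypothetical names that appear neither in this PR nor in cl2.hpp, and the input is assumed to be the string returned for `CL_DEVICE_EXTENSIONS`.

```cpp
#include <sstream>
#include <string>
#include <unordered_set>

// Hypothetical holder for the two flags discussed in this PR.
struct FpSupport {
    bool half = false;  // true when cl_khr_fp16 is listed
    bool dbl  = false;  // true when cl_khr_fp64 is listed
};

// Split the space-separated extension string once and record both flags.
// The caller would store the result per device and reuse it afterwards.
FpSupport parseFpSupport(const std::string& extensions) {
    std::istringstream in(extensions);
    std::unordered_set<std::string> names;
    for (std::string name; in >> name;) {
        names.insert(name);
    }

    FpSupport support;
    support.half = names.count("cl_khr_fp16") > 0;
    support.dbl  = names.count("cl_khr_fp64") > 0;
    return support;
}
```

Splitting on whitespace also means each name is matched as a whole token rather than as a substring. With strings of a few hundred characters, as measured above, a plain `find` is likely cheap enough that this extra step is optional.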

9prady9 requested a review from umar456 on December 18, 2020 11:14
fp16 and fp64 are optional extensions to OpenCL. The corresponding FP_CONFIG values only exist when the extension is available. It is therefore better to check the availability of the extension, so that no errors are thrown (and have to be handled).

+ Cleanup of compiler warnings.
umar456 merged commit 0d0826f into arrayfire:master on Feb 18, 2021
willyborn deleted the HALF_FP_CONFIG branch on September 29, 2022 12:54
3 participants