JIT optimization: Faster generation of an unique funcName #3040

Merged
merged 4 commits into from
Nov 4, 2020

Conversation

willyborn
Contributor

@willyborn willyborn commented Nov 2, 2020

JIT command performance improves by up to 2x, depending on the length of the JIT tree.
The difference is more pronounced for cached JIT kernels.

Description

The unique funcName identifies a JIT tree combination. This name is generated on every call, even when the corresponding kernel is already cached.
Using std::string and eliminating unnecessary formatting is several times faster than using std::stringstream (which is very slow to construct).

Since some of the formatting has changed, the resulting KER number will be different, which causes a one-time recompilation of each kernel and a new cache file.
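
To illustrate the idea described above, here is a minimal sketch of building a name by appending to a single pre-reserved std::string and then deriving a KER-style key from it. Everything in it is an assumption made for illustration: `ToyNode`, `buildFuncName`, `kernelKey`, the name layout, and the use of std::hash are hypothetical and are not ArrayFire's actual classes, format, or hashing scheme.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for a JIT node; not ArrayFire's actual Node class.
struct ToyNode {
    std::string typeName;  // e.g. "float"
    int opId;              // numeric operation ID
};

// Build the tree name by appending to one std::string that was
// reserved up front, instead of constructing a std::stringstream.
inline std::string buildFuncName(const std::vector<ToyNode>& nodes) {
    std::string name;
    name.reserve(512);  // same reserve size as in the PR diff
    for (const ToyNode& n : nodes) {
        assert(n.opId >= 0);
        name += '_';
        name += n.typeName;
        name += '_';
        name += std::to_string(n.opId);
    }
    return name;
}

// Derive a cache key from the name. std::hash is an assumption here;
// the real project may use a different hash function.
inline std::string kernelKey(const std::string& funcName) {
    return "KER" + std::to_string(std::hash<std::string>{}(funcName));
}
```

Because the key is derived from the name text, any change to how the name is formatted produces a different key for the same tree, which is why a formatting change leads to a one-time recompilation and a new cache file.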

  • Is this a new feature or a bug fix? Neither; it is a performance improvement.
  • More detail if necessary to describe all commits in pull request: Each Node type has its own virtual function that generates its part of the name, so all of them are updated.
  • Why these changes are necessary: As GPUs get faster, the bottleneck shifts to the CPU. With this change, GPU utilization increases.
  • Potential impact on specific hardware, software or backends: The faster the connected GPU, the bigger the speed improvement.
  • New functions and their functionality: None
  • Future changes not implemented in this PR: A follow-up PR will improve the caching/hashing mechanism for all kernels.

Changes to Users

No changes, except that the caching directory may need to be cleaned up, since old cache files are no longer used.
Perhaps a note is needed in the documentation?

Checklist

  • Rebased on latest master
  • Code compiles
  • Tests pass

Member

@umar456 umar456 left a comment

I can see why this would be faster. I suspect it's the reserve function that improves the performance of this approach over the string stream approach. You will need to add the formatting of the IDs back, because removing it will cause naming conflicts between some kernels.

  for (int i = 0; i < m_num_children; i++) {
-     kerStream << std::setw(3) << std::setfill('0') << std::dec
-               << ids.child_ids[i];
+     kerString += std::to_string(ids.child_ids[i]);
Member

The zeros here are necessary to avoid naming conflicts.
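
A small sketch of the conflict the reviewer is pointing at: when IDs are appended back to back without padding, different ID sequences can produce the same text. The helper names `joinPlain` and `joinPadded` are hypothetical and only mirror the two variants shown in the diff.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Unpadded: IDs are appended back to back, so the boundaries between
// them are lost (this mirrors the std::to_string-only variant).
inline std::string joinPlain(const std::vector<int>& ids) {
    std::string out;
    for (int id : ids) out += std::to_string(id);
    return out;
}

// Zero-padded to a minimum width of 3, mirroring
// std::setw(3) << std::setfill('0') from the original code.
inline std::string joinPadded(const std::vector<int>& ids) {
    std::string out;
    for (int id : ids) {
        assert(id >= 0);
        const std::string digits = std::to_string(id);
        if (digits.size() < 3) out.append(3 - digits.size(), '0');
        out += digits;
    }
    return out;
}
```

With these helpers, `joinPlain({1, 23})` and `joinPlain({12, 3})` both yield `"123"`, while `joinPadded` yields `"001023"` and `"012003"`. Fixed-width padding only disambiguates while every ID fits in the chosen width; an explicit separator between IDs is another way to keep the boundaries.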

- std::stringstream funcName;
- std::stringstream hashName;
+ std::string funcName;
+ funcName.reserve(512);
Member

I am guessing this is the primary reason for the performance increase. I don't think it's possible to do something similar with string stream.
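
One way to see what the reserve call buys is to count how often the string's capacity grows while pieces are appended. This is only a sketch under assumptions: `countReallocations` is a hypothetical helper, and it measures buffer growth, not the actual timing difference reported in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts how many times the string's capacity changes while appending
// `pieces` copies of `piece`. A `reserveBytes` of 0 means no up-front reserve.
inline int countReallocations(std::size_t reserveBytes, int pieces,
                              const std::string& piece) {
    assert(pieces >= 0);
    std::string s;
    if (reserveBytes > 0) s.reserve(reserveBytes);
    int growths = 0;
    std::size_t cap = s.capacity();
    for (int i = 0; i < pieces; ++i) {
        s += piece;
        if (s.capacity() != cap) {  // the buffer was reallocated
            ++growths;
            cap = s.capacity();
        }
    }
    return growths;
}
```

Appending 50 pieces of 10 characters stays within a 512-byte reserve, so `countReallocations(512, 50, "_float_007")` returns 0, whereas the same call with a reserve of 0 has to grow the buffer at least once. How many times it grows depends on the standard library's growth policy.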

@willyborn
Contributor Author

willyborn commented Nov 2, 2020 via email

@willyborn
Contributor Author

willyborn commented Nov 3, 2020 via email

Add a separator between names of multiple output nodes.
Member

@umar456 umar456 left a comment

That sounds reasonable. I had missed the comma on my first pass through your code. Do you still have your performance benchmarks for this change?

@willyborn
Contributor Author

willyborn commented Nov 3, 2020 via email

@umar456 umar456 merged commit d0645fe into arrayfire:master Nov 4, 2020
@umar456
Member

umar456 commented Nov 4, 2020

This is great! Thanks for your contribution.

@9prady9
Member

9prady9 commented Nov 5, 2020

I am a little late to the party. Simple but efficient improvement, thanks @willyborn

9prady9 pushed a commit to 9prady9/arrayfire that referenced this pull request Aug 2, 2021
…3040)

Use strings instead of stringstream to generate funcNames for JIT kernels.
* JIT optimization: Faster generation of an unique funcName
* Extra separator between returned names and IDs, to be certain that they never concatenate.
* Added separator for output nodes
* For improved performance: Use the operation ID iso operation string.
Add a separator between names of multiple output nodes.

(cherry picked from commit d0645fe)
syurkevi pushed a commit that referenced this pull request Dec 28, 2021
Use strings instead of stringstream to generate funcNames for JIT kernels.
* JIT optimization: Faster generation of an unique funcName
* Extra separator between returned names and IDs, to be certain that they never concatenate.
* Added separator for output nodes
* For improved performance: Use the operation ID iso operation string.
Add a separator between names of multiple output nodes.

(cherry picked from commit d0645fe)
@willyborn willyborn deleted the JIToverhead branch September 29, 2022 13:02