JIT optimization: Faster generation of an unique funcName #3040

willyborn · 2020-11-02T17:47:14Z

JIT commands run up to 2x faster, depending on the length of the JIT tree.
The difference is more pronounced for cached JIT kernels.

Description

The unique funcName identifies a JIT tree combination. This name is generated even when the corresponding kernel is cached.
Building the name with std::string and eliminating unnecessary formatting is a few times faster than using std::stringstream (which is very slow at construction).
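
The difference between the two approaches can be sketched as follows. This is a minimal illustration and not the ArrayFire code: `nameWithStream` and `nameWithString` are hypothetical helpers, the name layout is simplified, and the 512-byte reservation simply mirrors the `reserve(512)` used in this PR.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Old style: a fresh std::stringstream per call, with zero-padded IDs.
std::string nameWithStream(const std::string &type,
                           const std::vector<int> &ids) {
    std::stringstream ss;  // constructed on every call
    ss << "_" << type;
    for (int id : ids) {
        // setw applies to the next insertion only, so it is repeated per ID.
        ss << std::setw(3) << std::setfill('0') << std::dec << id;
    }
    return ss.str();
}

// New style: one pre-sized std::string, IDs separated by a static comma.
std::string nameWithString(const std::string &type,
                           const std::vector<int> &ids) {
    std::string s;
    s.reserve(512);  // assumed upper bound for a typical name
    s += '_';
    s += type;
    for (int id : ids) {
        s += ',';
        s += std::to_string(id);
    }
    return s;
}
```

For the IDs 1, 0, 2 and the type `float`, the first helper yields `_float001000002` and the second `_float,1,0,2`.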

Since some formatting has changed, the resulting KER number will be different, which causes a one-time recompilation of the kernel and a new cache file.
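
Why a formatting change leads to a recompile can be illustrated with a small sketch. It assumes that the cache key is a `KER` prefix followed by a hash of the function name. `kernelKey` is a hypothetical helper, and `std::hash` only stands in for whatever hash function ArrayFire actually uses.

```cpp
#include <functional>
#include <string>

// Hypothetical cache key: "KER" followed by a hash of the function name.
// Any change in the name text gives a different key, so the kernel is
// looked up (and compiled) under a new entry.
std::string kernelKey(const std::string &funcName) {
    return "KER" + std::to_string(std::hash<std::string>{}(funcName));
}
```

With such a scheme, the same JIT tree gets one key under the old name format and another under the new one, so the first run after the change misses the cache once.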

Is this a new feature or a bug fix? Neither; it is a performance improvement.
More detail if necessary to describe all commits in pull request: Each Node type has its own virtual function that generates its part of the name, so they are all updated.
Why these changes are necessary: As GPUs get faster, the bottleneck shifts to the CPU. With less CPU overhead, GPU utilization increases.
Potential impact on specific hardware, software or backends: The faster the connected GPU, the bigger the speed improvement.
New functions and their functionality: None
Future changes not implemented in this PR: A new PR will be launched to improve the caching/hashing mechanism for all kernels.
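
The per-node virtual function mentioned above can be pictured with the following simplified sketch. The class layout is an assumption modeled on the diff snippets in this PR (`Node_ids`, `kerString`, `ids.child_ids`); the classes here are stripped-down stand-ins, and `buildName` is a hypothetical driver, not the real `getFuncName`.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct Node_ids {
    int id;
    std::vector<int> child_ids;
};

class Node {
   public:
    virtual ~Node() = default;
    // Each node type appends its own part of the name to kerString.
    virtual void genKerName(std::string &kerString,
                            const Node_ids &ids) const = 0;
};

class BufferNode : public Node {
   public:
    void genKerName(std::string &kerString,
                    const Node_ids &ids) const final {
        kerString += "_buf,";  // underscore starts a node, comma precedes an ID
        kerString += std::to_string(ids.id);
    }
};

class NaryNode : public Node {
   public:
    explicit NaryNode(std::string op) : m_op(std::move(op)) {}
    void genKerName(std::string &kerString,
                    const Node_ids &ids) const final {
        kerString += '_';
        kerString += m_op;
        for (int child : ids.child_ids) {
            kerString += ',';
            kerString += std::to_string(child);
        }
        kerString += ',';
        kerString += std::to_string(ids.id);
    }

   private:
    std::string m_op;
};

// Hypothetical driver: one pre-sized string is passed through all nodes.
std::string buildName(const std::vector<const Node *> &nodes,
                      const std::vector<Node_ids> &ids) {
    std::string name;
    name.reserve(512);
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        nodes[i]->genKerName(name, ids[i]);
    }
    return name;
}
```

Because every node appends to the same string, no temporary streams or intermediate strings are created per node.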

Changes to Users

No changes, except for the clean-up of the caching directory.
Perhaps a note is necessary in the documentation?

Checklist

Rebased on latest master
Code compiles
Tests pass

umar456

I can see why this would be faster. I suspect it's the reserve function that improves the performance of this approach over the stringstream approach. You will need to add the formatting of the IDs back, because removing it will cause a naming conflict with some kernels.

umar456 · 2020-11-02T18:48:20Z

src/backend/common/jit/NaryNode.hpp

        for (int i = 0; i < m_num_children; i++) {
-            kerStream << std::setw(3) << std::setfill('0') << std::dec
-                      << ids.child_ids[i];
+            kerString += std::to_string(ids.child_ids[i]);


The zeros here are necessary to avoid naming conflicts.

umar456 · 2020-11-02T18:49:59Z

src/backend/common/jit/Node.cpp

-    std::stringstream funcName;
-    std::stringstream hashName;
+    std::string funcName;
+    funcName.reserve(512);


I am guessing this is the primary reason for the performance increase. I don't think it's possible to do something similar with stringstream.
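
The effect of `reserve` on its own can be observed by counting how often the string's capacity changes while IDs are appended. A small sketch; `countReallocations` is a hypothetical helper written for this illustration, and the exact number of reallocations without `reserve` depends on the standard library implementation.

```cpp
#include <cstddef>
#include <string>

// Appends n comma-separated IDs to a string and counts how many times
// the capacity changes, i.e. how often the buffer is reallocated.
std::size_t countReallocations(std::size_t reserveBytes, int n) {
    std::string s;
    s.reserve(reserveBytes);
    std::size_t cap      = s.capacity();
    std::size_t reallocs = 0;
    for (int i = 0; i < n; ++i) {
        s += ',';
        s += std::to_string(i);
        if (s.capacity() != cap) {
            ++reallocs;
            cap = s.capacity();
        }
    }
    return reallocs;
}
```

With 512 bytes reserved, appending 50 short IDs needs no reallocation at all, while an unreserved string has to grow at least once.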

src/backend/common/jit/BufferNodeBase.hpp

                     const common::Node_ids &ids) const final {
-        kerStream << "_" << getNameStr();
-        kerStream << std::setw(3) << std::setfill('0') << std::dec << ids.id
-                  << std::dec;
+        kerString += '_';


The filled-in zeros are required to avoid naming conflicts. They are used to distinguish between 1, 3 and 13.

willyborn · 2020-11-02T22:04:27Z

Umar,

The primary function of this string is to generate a unique name for this JIT combination. Padding with zeros and adding commas serve the same purpose, although the commas are much faster because appending them is a static operation. All IDs (numbers) are separated by a comma (inside a node) or an underscore (start of a node).

- NaryNode has a comma separating all the IDs. (This line is not in your snippet.)
- BufferNodeBase has only one ID.

All the nodes start with an underscore, which also separates any adjacent numbers there. Perhaps getNameStr could start or end with a number. I will add an extra separator there as well when it is concatenated with an ID, just to be sure, though I did not encounter such a case in the current code.

I will also check an alternative: whether making the stringstream static gives the same improvement, since no construction happens then. I have read that the construction of stringstream (due to the locale) is the reason it is slow.

BR, Willy
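
The naming conflict under discussion, and the two ways of avoiding it, can be reproduced in a few lines. This is only an illustration of the argument; `joinPlain`, `joinPadded` and `joinComma` are hypothetical helpers and not functions from the code base.

```cpp
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// No separator and no padding: adjacent IDs run together.
std::string joinPlain(const std::vector<int> &ids) {
    std::string s;
    for (int id : ids) { s += std::to_string(id); }
    return s;
}

// Fixed-width zero padding, as in the stringstream version on master.
std::string joinPadded(const std::vector<int> &ids) {
    std::ostringstream ss;
    for (int id : ids) { ss << std::setw(3) << std::setfill('0') << id; }
    return ss.str();
}

// Static comma separator, as proposed in this PR.
std::string joinComma(const std::vector<int> &ids) {
    std::string s;
    for (int id : ids) {
        s += ',';
        s += std::to_string(id);
    }
    return s;
}
```

`joinPlain` maps both {1, 3} and {13} to `13`, which is the conflict. The padded variant gives `001003` and `013`, and the comma variant gives `,1,3` and `,13`, so both keep the two cases apart.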

willyborn · 2020-11-03T17:09:15Z

Umar,

Please find an overview of my findings in the attached spreadsheet. The string# version corresponds to the JIToverhead PR (3040).

The generated unique function names are the following:

- master: L_s__float000_00100000001_000000000002_002001002003
- string#: L_s_float,0_1,0,0,1_0,0,0,2_2,1,2,3

Both strings carry the same information; the only difference is that IDs from the same node are separated by commas, which avoids the risk that numbers concatenate. The advantage of the latter is that the formatting is static, so no calculation is wasted.

I also included two versions based on stringstream:

1. The stringstream is thread_local, so that the construction happens only once. The stringstream is initialized before each function.
2. Same as 1, plus only static formatting.

Both stringstream versions perform similarly to master.

In conclusion, I propose the string# version (with ID number), because it is the fastest and is as reliable as the current solution.

PS: The PR is updated to correspond to the string# version.

BR, Willy Born
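
For reference, the first stringstream variant mentioned in this comment (a `thread_local` stream that is reset before each use) could look roughly like the sketch below. It is a guess at the shape of that experiment, not the benchmarked code; `nameWithReusedStream` is a hypothetical name.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Reuses one stream per thread instead of constructing a new one per call.
std::string nameWithReusedStream(const std::vector<int> &ids) {
    thread_local std::stringstream ss;  // constructed once per thread
    ss.str(std::string());              // drop the previous contents
    ss.clear();                         // reset any error or EOF flags
    for (int id : ids) { ss << ',' << id; }
    return ss.str();
}
```

Even with construction moved out of the hot path, each call still copies the result out of the stream with `str()`, whereas the std::string version builds the name directly in the string that is returned.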

umar456

That sounds reasonable. I had missed the comma in the first run through your code. Do you still have your performance benchmarks for this change?

willyborn · 2020-11-03T22:47:13Z

Here is the spreadsheet that was attached to the mail: [JIToverhead.xlsx](https://github.com/arrayfire/arrayfire/files/5484525/JIToverhead.xlsx)

umar456 · 2020-11-04T15:15:05Z

This is great! Thanks for your contribution.

9prady9 · 2020-11-05T10:31:25Z

I am a little late to the party. A simple but efficient improvement, thanks @willyborn

Use strings instead of stringstream to generate funcNames for JIT kernels.

* JIT optimization: Faster generation of an unique funcName
* Extra separator between returned names and IDs, to be certain that they never concatenate.
* Added separator for output nodes
* For improved performance: Use the operation ID iso operation string.

  Add a separator between names of multiple output nodes.

(cherry picked from commit d0645fe)

JIT optimization: Faster generation of an unique funcName

ce92a67

willyborn force-pushed the JIToverhead branch from ca3f515 to ce92a67 on November 2, 2020 18:08

umar456 requested changes Nov 2, 2020

Extra separator between returned names and IDs, to be certain that they never concatenate.

5adfbc0

Added separator for output nodes

1dd18ff

For improved performance: Use the operation ID iso operation string.

6709626

Add a separator between names of multiple output nodes.

umar456 approved these changes Nov 3, 2020

umar456 merged commit d0645fe into arrayfire:master Nov 4, 2020

willyborn deleted the JIToverhead branch September 29, 2022 13:02
