-
Notifications
You must be signed in to change notification settings - Fork 548
JIT optimization: Faster generation of an unique funcName #3040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
ca3f515
to
ce92a67
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can see why this would be faster. I suspect its the reserve function that improves the performance of this approach over the string stream approach. You will need to add the formatting of the IDs back because it will cause a naming conflict with some kernels.
for (int i = 0; i < m_num_children; i++) { | ||
kerStream << std::setw(3) << std::setfill('0') << std::dec | ||
<< ids.child_ids[i]; | ||
kerString += std::to_string(ids.child_ids[i]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The zeros here are necessary to avoid naming conflicts.
std::stringstream funcName; | ||
std::stringstream hashName; | ||
std::string funcName; | ||
funcName.reserve(512); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am guessing this is the primary reason for the performance increase. I don't think its possible to do something similar with string stream.
…ey never concatenate.
umar,
The primary function of this string, is to generate an unique name for this
JIT combination. Padding with zeros or adding commas serve the same
purpose, although the comma's are much faster because it is a static
operation.
All IDs (Numbers) are separated by a comma (inside a node) or underscore
(start of node).
- NaryNode has a comma separating all the IDs. (This line is not in your
snippet)
- BufferNodeBase is only 1 ID.
All the Nodes are starting with an underline, also separating possible
numbers there.
Perhaps getNameStr could start or end with a number, I will add an extra
separator here as well when concatenated with an ID, just to be sure
though I did not encounter such a case in the current code.
------
I will check an alternative, if making the stringstream static will give
the same improvement, since we do not have any construction then. I have
read that the construction of stringstream (due to the locale) is the
reason.
BR,
Willy
…On Mon, 2 Nov 2020 at 20:16, Umar Arshad ***@***.***> wrote:
***@***.**** requested changes on this pull request.
I can see why this would be faster. I suspect its the reserve function
that improves the performance of this approach over the string stream
approach. You will need to add the formatting of the IDs back because it
will cause a naming conflict with some kernels.
------------------------------
In src/backend/common/jit/NaryNode.hpp
<#3040 (comment)>:
> for (int i = 0; i < m_num_children; i++) {
- kerStream << std::setw(3) << std::setfill('0') << std::dec
- << ids.child_ids[i];
+ kerString += std::to_string(ids.child_ids[i]);
The zeros here are necessary to avoid naming conflicts.
------------------------------
In src/backend/common/jit/Node.cpp
<#3040 (comment)>:
> @@ -41,26 +41,17 @@ int Node::getNodesMap(Node_map_t &node_map, vector<Node *> &full_nodes,
std::string getFuncName(const vector<Node *> &output_nodes,
const vector<Node *> &full_nodes,
const vector<Node_ids> &full_ids, bool is_linear) {
- std::stringstream funcName;
- std::stringstream hashName;
+ std::string funcName;
+ funcName.reserve(512);
I am guessing this is the primary reason for the performance increase. I
don't think its possible to do something similar with string stream.
------------------------------
In src/backend/common/jit/BufferNodeBase.hpp
<#3040 (comment)>:
> const common::Node_ids &ids) const final {
- kerStream << "_" << getNameStr();
- kerStream << std::setw(3) << std::setfill('0') << std::dec << ids.id
- << std::dec;
+ kerString += '_';
The filled in zeros are required to avoid naming conflicts. It is used to
distinguish between 1, 3 and 13.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#3040 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AQ2WGPGGVV7U2CYSDAQASETSN4AR5ANCNFSM4THYQPFQ>
.
|
Umar,
Please find an overview of my findings attached in the spreadsheet.
The string# version corresponds with the JIToverhead PR (3040)
The generated unique function names are the following:
- master: L_s__float000_00100000001_000000000002_002001002003
- string#: L_s_float,0_1,0,0,1_0,0,0,2_2,1,2,3
The same information is in both strings, only the IDs from the same node
are separated by commas which avoids the risk that numbers concatenate.
The advantage of the latter, is that the formatting is static and no
calculation is wasted.
I also included 2 versions based on stringstream.
1. have the stringstream as thread_local so that the construction happens
only once. The stringstream is initialized before each function.
2. same as 1 + only static formatting.
Both stringstream versions result in similar performance as the master.
As a conclusion, I propose the string# version (with ID number) because
this is the fastest and has the same reliability as the current solution.
PS: The PR is updated to correspond with the string# version.
BR,
Willy Born
On Mon, 2 Nov 2020 at 23:04, Sabine & Willy Born <
sabine.willy.born@gmail.com> wrote:
… umar,
The primary function of this string, is to generate an unique name for
this JIT combination. Padding with zeros or adding commas serve the same
purpose, although the comma's are much faster because it is a static
operation.
All IDs (Numbers) are separated by a comma (inside a node) or underscore
(start of node).
- NaryNode has a comma separating all the IDs. (This line is not in your
snippet)
- BufferNodeBase is only 1 ID.
All the Nodes are starting with an underline, also separating possible
numbers there.
Perhaps getNameStr could start or end with a number, I will add an extra
separator here as well when concatenated with an ID, just to be sure
though I did not encounter such a case in the current code.
------
I will check an alternative, if making the stringstream static will give
the same improvement, since we do not have any construction then. I have
read that the construction of stringstream (due to the locale) is the
reason.
BR,
Willy
On Mon, 2 Nov 2020 at 20:16, Umar Arshad ***@***.***> wrote:
> ***@***.**** requested changes on this pull request.
>
> I can see why this would be faster. I suspect its the reserve function
> that improves the performance of this approach over the string stream
> approach. You will need to add the formatting of the IDs back because it
> will cause a naming conflict with some kernels.
> ------------------------------
>
> In src/backend/common/jit/NaryNode.hpp
> <#3040 (comment)>:
>
> > for (int i = 0; i < m_num_children; i++) {
> - kerStream << std::setw(3) << std::setfill('0') << std::dec
> - << ids.child_ids[i];
> + kerString += std::to_string(ids.child_ids[i]);
>
> The zeros here are necessary to avoid naming conflicts.
> ------------------------------
>
> In src/backend/common/jit/Node.cpp
> <#3040 (comment)>:
>
> > @@ -41,26 +41,17 @@ int Node::getNodesMap(Node_map_t &node_map, vector<Node *> &full_nodes,
> std::string getFuncName(const vector<Node *> &output_nodes,
> const vector<Node *> &full_nodes,
> const vector<Node_ids> &full_ids, bool is_linear) {
> - std::stringstream funcName;
> - std::stringstream hashName;
> + std::string funcName;
> + funcName.reserve(512);
>
> I am guessing this is the primary reason for the performance increase. I
> don't think its possible to do something similar with string stream.
> ------------------------------
>
> In src/backend/common/jit/BufferNodeBase.hpp
> <#3040 (comment)>:
>
> > const common::Node_ids &ids) const final {
> - kerStream << "_" << getNameStr();
> - kerStream << std::setw(3) << std::setfill('0') << std::dec << ids.id
> - << std::dec;
> + kerString += '_';
>
> The filled in zeros are required to avoid naming conflicts. It is used to
> distinguish between 1, 3 and 13.
>
> —
> You are receiving this because you authored the thread.
> Reply to this email directly, view it on GitHub
> <#3040 (review)>,
> or unsubscribe
> <https://github.com/notifications/unsubscribe-auth/AQ2WGPGGVV7U2CYSDAQASETSN4AR5ANCNFSM4THYQPFQ>
> .
>
|
Add a separator between names of multiple output nodes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable. I had missed the comma in the first run through your code. Do you still have your performance benchmarks for this change?
The spreadsheet, which was attached to the mail.
[JIToverhead.xlsx](https://github.com/arrayfire/arrayfire/files/5484525/JIToverhead.xlsx)
…On Tue, 3 Nov 2020 at 21:38, Umar Arshad ***@***.***> wrote:
***@***.**** approved this pull request.
That sounds reasonable. I had missed the comma in the first run through
your code. Do you still have your performance benchmarks for this change?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#3040 (review)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AQ2WGPEQX4NFJKW2NOPODMTSOBS6FANCNFSM4THYQPFQ>
.
|
This is great! Thanks for your contribution. |
I am little late to the party, simple but efficient improvement, thanks @willyborn |
…3040) Use strings instead of stringstream to generate funcNames for JIT kernels. * JIT optimization: Faster generation of an unique funcName * Extra separator between returned names and IDs, to be certain that they never concatenate. * Added separator for output nodes * For improved performance: Use the operation ID iso operation string. Add a separator between names of multiple output nodes. (cherry picked from commit d0645fe)
Use strings instead of stringstream to generate funcNames for JIT kernels. * JIT optimization: Faster generation of an unique funcName * Extra separator between returned names and IDs, to be certain that they never concatenate. * Added separator for output nodes * For improved performance: Use the operation ID iso operation string. Add a separator between names of multiple output nodes. (cherry picked from commit d0645fe)
The performance of JIT commands are improved up to 2x, dependent on the length of the JIT tree.
The difference is more pronounced for cached JIT kernels.
Description
The unique funcName is used to identify a JIT tree combination. This name is generated, even when the corresponding kernel is cached.
The usage of std::string and eliminating of unnecessary formatting is a few times faster than the std::streamstring (which is very slow at construction).
Since some formatting is changed, the resulting KER number will be different resulting in a once recompilation of the kernel and a new cache file.
Changes to Users
No changes, except on the clean-up of the caching directory.
Perhaps a note necessary in the documentation??
Checklist