Skip to content

Adding a "get_corpus" benchmark. #456

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jan 20, 2020
Merged

Adding a "get_corpus" benchmark. #456

merged 2 commits into from
Jan 20, 2020

Conversation

lemire
Copy link
Member

@lemire lemire commented Jan 20, 2020

We have a PR that proposes to streamline the get_corpus function, but we have no benchmark. Let us create one.

Copy link
Contributor

@DBJDBJ DBJDBJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

double bench(std::string filename, simdjson::padded_string& p) 

C++17 is value-based semantics oriented. It very much depends on the compiler and its std lib implementation, but it is better to use

double bench(std::string_view , simdjson::padded_string const & )  ;

Above is fast and can be called with: std::string, const char * and std::string_view.

@lemire
Copy link
Member Author

lemire commented Jan 20, 2020

Copying the filename is trivial in this instance compared to the cost of loading up the file's content.

@lemire lemire merged commit aea7991 into master Jan 20, 2020
@lemire lemire mentioned this pull request Jan 20, 2020
Copy link
Contributor

@DBJDBJ DBJDBJ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simdjson::get_corpus(filename).swap(p);

This is also measuring the performance of the swap(), padded_string method.
This is a bit better

(void)simdjson::get_corpus(filename) ;

Although I expect the fastest version would be C11 one, without exceptions. But not much faster than C++17 without exceptions

padded_string get_corpus (  std::errc &  /* posix error code */, std::string_view /* flename */  ) 

@DBJDBJ
Copy link
Contributor

DBJDBJ commented Jan 20, 2020

Copying the filename is trivial in this instance compared to the cost of loading up the file's content.

That is true. That was a general remark for why simdjson should use std::string_view function arguments, everywhere.

@lemire
Copy link
Member Author

lemire commented Jan 20, 2020

When benchmarking, we have to worry that the compiler might optimize away much of the work. The swap prevents the compiler from getting too clever because the data escapes the function.

@DBJDBJ
Copy link
Contributor

DBJDBJ commented Jan 21, 2020

image

this "new no exception" get_corpus has not made much performance gains

@jkeiser jkeiser deleted the addinggetcorpusbenchmark branch February 7, 2020 18:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants