
Benchmark other methods mentioned in README #97


Merged (1 commit), Jun 2, 2021

@timClicks (Contributor) commented Jun 2, 2021

The word boundary extensions to &str/String (.unicode_words() and .split_word_bounds()) perform in a very similar, but not identical, manner to .graphemes(). For example, Korean is slow(ish) on .graphemes() but fast(ish) on .split_word_bounds(), while Mandarin shows the reverse pattern. Languages with whitespace-delimited words, by contrast, tend to have broadly the same performance characteristics under the word methods as under .graphemes().
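To check the per-language comparison against the numbers, the sketch below divides the graphemes median by the word_bounds median for each language. The medians are copied by hand from the benchmark output in this PR; the `MEDIANS` table and the `word_bounds_speedup` helper are hypothetical names for illustration, not part of unicode-segmentation or its bench suite, and the word_bounds benches are assumed to exercise `.split_word_bounds()`.

```rust
// Median ns/iter copied from the benchmark output in this PR, as
// (language, graphemes, unicode_words, word_bounds).
const MEDIANS: [(&str, u64, u64, u64); 8] = [
    ("arabic", 569_134, 654_021, 526_613),
    ("english", 772_797, 1_123_919, 986_120),
    ("hindi", 557_920, 529_154, 405_190),
    ("japanese", 592_961, 1_225_327, 819_973),
    ("korean", 1_069_377, 620_752, 475_308),
    ("mandarin", 418_928, 1_166_284, 735_141),
    ("russian", 582_695, 700_773, 544_748),
    ("source_code", 837_820, 1_212_000, 1_129_101),
];

/// How many times faster the word_bounds bench ran than the graphemes
/// bench on the same input (> 1.0 means word bounds were faster).
fn word_bounds_speedup(graphemes_ns: u64, word_bounds_ns: u64) -> f64 {
    graphemes_ns as f64 / word_bounds_ns as f64
}

fn main() {
    for &(lang, g, _w, b) in MEDIANS.iter() {
        println!(
            "{:<12} word_bounds vs graphemes: {:.2}x",
            lang,
            word_bounds_speedup(g, b)
        );
    }
}
```

A ratio above 1.0 means word segmentation was faster on that input: Korean comes out around 2.25x and Mandarin around 0.57x, while English, Russian and Arabic stay within roughly 0.8x to 1.1x.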

As the library develops, it would be worthwhile to monitor the speed of the rest of the documented API.
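Monitoring the rest of the API could start from something as small as the std-only timing loop below. It is a hypothetical helper (`bench_text` is an illustrative name, not something from the crate or its benches). It uses `std::time::Instant` and `std::hint::black_box` (stable since Rust 1.66) and reports the median sample, assumed here to mirror what the `ns/iter` column shows. `split_whitespace` stands in for whichever segmentation method is under test, so the sketch compiles without dependencies.

```rust
use std::hint::black_box;
use std::time::Instant;

/// Times `f` over `text` for `iters` iterations and returns
/// (median ns per iteration, throughput in MB/s).
/// Throughput uses bytes * 1000 / median_ns with integer division,
/// assumed to match how libtest-style harnesses compute MB/s.
fn bench_text<F>(text: &str, iters: usize, mut f: F) -> (u64, u64)
where
    F: FnMut(&str) -> usize,
{
    assert!(iters > 0, "need at least one iteration");
    let mut samples: Vec<u64> = Vec::with_capacity(iters);
    for _ in 0..iters {
        let start = Instant::now();
        // black_box keeps the optimizer from discarding the work.
        let _ = black_box(f(black_box(text)));
        samples.push(start.elapsed().as_nanos() as u64);
    }
    samples.sort_unstable();
    // Median sample, clamped to 1 ns to avoid dividing by zero.
    let median = samples[samples.len() / 2].max(1);
    let mb_s = text.len() as u64 * 1000 / median;
    (median, mb_s)
}

fn main() {
    let text = "The quick brown fox jumps over the lazy dog. ".repeat(1000);
    // Stand-in workload; swap in the method you want to monitor.
    let (ns, mb) = bench_text(&text, 101, |t| t.split_whitespace().count());
    println!("split_whitespace: {} ns/iter = {} MB/s", ns, mb);
}
```

With unicode-segmentation as a dependency and its `UnicodeSegmentation` trait in scope, the closure could instead be something like `|t| t.unicode_sentences().count()`. For ongoing monitoring, extending the existing bench files is likely the better choice than hand-rolled timing.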

Out of interest, here are the results of local benchmarking:

     Running unittests (target/release/deps/graphemes-564b84453b2889b6)

running 8 tests
test graphemes_arabic      ... bench:     569,134 ns/iter (+/- 67,321) = 88 MB/s
test graphemes_english     ... bench:     772,797 ns/iter (+/- 72,533) = 64 MB/s
test graphemes_hindi       ... bench:     557,920 ns/iter (+/- 62,334) = 88 MB/s
test graphemes_japanese    ... bench:     592,961 ns/iter (+/- 98,301) = 85 MB/s
test graphemes_korean      ... bench:   1,069,377 ns/iter (+/- 152,167) = 46 MB/s
test graphemes_mandarin    ... bench:     418,928 ns/iter (+/- 47,041) = 120 MB/s
test graphemes_russian     ... bench:     582,695 ns/iter (+/- 64,659) = 87 MB/s
test graphemes_source_code ... bench:     837,820 ns/iter (+/- 103,583) = 59 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

     Running unittests (target/release/deps/unicode_words-ae63859a9debc323)

running 8 tests
test unicode_words_arabic      ... bench:     654,021 ns/iter (+/- 84,369) = 76 MB/s
test unicode_words_english     ... bench:   1,123,919 ns/iter (+/- 126,888) = 44 MB/s
test unicode_words_hindi       ... bench:     529,154 ns/iter (+/- 123,820) = 93 MB/s
test unicode_words_japanese    ... bench:   1,225,327 ns/iter (+/- 97,295) = 41 MB/s
test unicode_words_korean      ... bench:     620,752 ns/iter (+/- 44,833) = 80 MB/s
test unicode_words_mandarin    ... bench:   1,166,284 ns/iter (+/- 81,349) = 43 MB/s
test unicode_words_russian     ... bench:     700,773 ns/iter (+/- 72,376) = 72 MB/s
test unicode_words_source_code ... bench:   1,212,000 ns/iter (+/- 61,977) = 41 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured

     Running unittests (target/release/deps/word_bounds-e8efa40319028a56)

running 8 tests
test word_bounds_arabic      ... bench:     526,613 ns/iter (+/- 167,663) = 95 MB/s
test word_bounds_english     ... bench:     986,120 ns/iter (+/- 150,942) = 50 MB/s
test word_bounds_hindi       ... bench:     405,190 ns/iter (+/- 75,288) = 122 MB/s
test word_bounds_japanese    ... bench:     819,973 ns/iter (+/- 104,399) = 61 MB/s
test word_bounds_korean      ... bench:     475,308 ns/iter (+/- 50,833) = 105 MB/s
test word_bounds_mandarin    ... bench:     735,141 ns/iter (+/- 111,634) = 68 MB/s
test word_bounds_russian     ... bench:     544,748 ns/iter (+/- 113,632) = 93 MB/s
test word_bounds_source_code ... bench:   1,129,101 ns/iter (+/- 361,817) = 44 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 8 measured
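Reading the MB/s column: assuming the harness computes throughput the way libtest does, as `bytes * 1000 / median_ns` with integer division (bytes per microsecond, i.e. decimal MB/s), each reported figure can be inverted to recover the range of input sizes it is consistent with. As far as can be told from libtest's formatting code, the (+/- …) figure is the spread between the largest and smallest samples rather than a standard deviation. The helpers `mb_per_s`, `implied_bytes` and `intersect` are hypothetical names for this sketch.

```rust
use std::ops::Range;

/// Throughput as assumed above: bytes * 1000 / median_ns, integer division.
fn mb_per_s(bytes: u64, median_ns: u64) -> u64 {
    bytes * 1000 / median_ns.max(1)
}

/// Input sizes `lo..hi` (hi exclusive) that would produce the reported
/// MB/s figure for a given median ns/iter under the formula above.
fn implied_bytes(mb_s: u64, median_ns: u64) -> Range<u64> {
    // b * 1000 / ns == m  <=>  m * ns <= 1000 * b < (m + 1) * ns
    let lo = (mb_s * median_ns + 999) / 1000;
    let hi = ((mb_s + 1) * median_ns + 999) / 1000;
    lo..hi
}

/// Intersect several ranges; an empty result would mean the rows could
/// not all have measured an input of the same size.
fn intersect(ranges: &[Range<u64>]) -> Range<u64> {
    let lo = ranges.iter().map(|r| r.start).max().unwrap_or(0);
    let hi = ranges.iter().map(|r| r.end).min().unwrap_or(0);
    lo..hi.max(lo)
}

fn main() {
    // Korean rows from the output above: (MB/s, median ns/iter).
    let korean = [
        implied_bytes(46, 1_069_377), // graphemes
        implied_bytes(80, 620_752),   // unicode_words
        implied_bytes(105, 475_308),  // word_bounds
    ];
    let common = intersect(&korean);
    println!("Korean rows consistent with {:?} bytes of input", common);
    println!("round trip: {} MB/s", mb_per_s(common.start, 475_308));
}
```

Because the three Korean rows yield an overlapping range, they are at least consistent with having measured the same input, which makes comparing their ns/iter medians directly reasonable.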

@Manishearth merged commit 58d73ac into unicode-rs:master on Jun 2, 2021
@timClicks deleted the add-word-benches branch on June 2, 2021 22:35

2 participants