Work around bad codegen for the hottest loop in UTF-16 normalization

rustc generates bad code for the hottest loop in UTF-16 normalization, which makes ICU4X much slower than ICU4C when finding the already-normalized prefix. (Previously, this particular loop formulation was found to be as fast as ICU4C's loop formulation in the case that also writes to an output buffer. I haven't yet checked whether writing makes a difference or whether the compiler has regressed since I last looked into the loop formulation.)

How the pointer advancement for a slice iterator compiles depends on things that logically are not supposed to have an effect on the output after optimization, yet they do. See https://github.com/rust-lang/rust/issues/144684 .
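For orientation, here is a minimal sketch of the kind of loop in question. It is not the actual ICU4X code: the function name `fast_prefix_len` and the constant `PASSTHROUGH_BOUND` are hypothetical, and the real check is more involved than a single comparison. The point is only to show a slice iterator being advanced one UTF-16 code unit at a time, which is the pointer advancement whose compiled form varies.

```rust
/// Hypothetical bound, assumed for illustration only. U+00C0 is the first
/// code point with a canonical decomposition; the bound a real normalizer
/// uses depends on the normalization form.
const PASSTHROUGH_BOUND: u16 = 0xC0;

/// Returns the number of leading code units that are below the bound,
/// i.e. the length of the prefix this simplified check can skip over.
pub fn fast_prefix_len(text: &[u16]) -> usize {
    let mut iter = text.iter();
    loop {
        // Remember where the iterator was before advancing, so that the
        // offending code unit is not counted as part of the prefix.
        let remaining = iter.as_slice();
        match iter.next() {
            // Hot path: advance the iterator and keep going.
            Some(&unit) if unit < PASSTHROUGH_BOUND => continue,
            // A code unit at or above the bound ends the prefix.
            Some(_) => return text.len() - remaining.len(),
            // The whole input passed the check.
            None => return text.len(),
        }
    }
}
```

The hot path is the first match arm. Ideally it compiles to a load, a compare, and a pointer increment held in a register.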

Look at the code on Compiler Explorer. Try changing the things that affect the outcome per the above Rust issue, and also try the suggestion from https://users.rust-lang.org/t/getting-rustc-to-allocate-a-particular-local-in-a-register/132391/6 until the bad codegen is no longer triggered.
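As a starting point for that experiment, something like the following could be pasted into Compiler Explorer to compare the assembly of alternative formulations side by side. This is only a sketch: the function names and `BOUND` are hypothetical, the check is a simplified stand-in for the real one, and neither variant is claimed to avoid the bad codegen. That has to be confirmed by reading the generated assembly.

```rust
// Hypothetical bound, assumed for illustration only.
const BOUND: u16 = 0xC0;

/// Index-based formulation. `#[inline(never)]` keeps the function as a
/// separate symbol so its assembly is easy to locate.
#[inline(never)]
pub fn prefix_len_indexed(text: &[u16]) -> usize {
    let mut i = 0;
    while i < text.len() {
        if text[i] >= BOUND {
            break;
        }
        i += 1;
    }
    i
}

/// Formulation that shrinks a subslice instead of advancing an iterator.
#[inline(never)]
pub fn prefix_len_split_first(text: &[u16]) -> usize {
    let mut rest = text;
    while let Some((&unit, tail)) = rest.split_first() {
        if unit >= BOUND {
            break;
        }
        rest = tail;
    }
    text.len() - rest.len()
}
```

Both functions compute the same result, so any difference in the emitted inner loop comes from the formulation rather than from the logic.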

(Note: While this should fix perf for Latin-script input, it alone won't be enough for Greek, Chinese, etc. We should also make sure that the trie accessors are fully inlined, as they are in ICU4C, and that trie access doesn't branch on trie type multiple times per lookup. See previously-filed issues about trie access perf.)

Work around bad codegen for the hottest loop in UTF-16 normalization #6788

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Work around bad codegen for the hottest loop in UTF-16 normalization #6788

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions