Work around bad codegen for the hottest loop in UTF-16 normalization #6788

@hsivonen

rustc generates bad code for the hottest loop in UTF-16 normalization, which makes ICU4X much slower than ICU4C at finding the already-normalized prefix. (Previously, this particular loop formulation was found to be as fast as ICU4C's loop formulation in the case that also writes to an output buffer. I haven't yet checked whether writing makes a difference or whether the compiler side has regressed since I last looked into the loop formulation.)

How the pointer advancement for a slice iterator compiles depends on things that logically should have no effect on the output after optimization, yet they do. See rust-lang/rust#144684.
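For illustration only, here is a minimal sketch of the kind of loop in question, written in two logically equivalent ways. It is not the actual ICU4X code: `PASSTHROUGH_BOUND`, `prefix_len_iter`, and `prefix_len_position` are hypothetical names, and the bound `0xC0` is an assumed placeholder for "code units below this need no normalization work". Both functions should return the same result, so any difference in the generated code for the pointer advancement is the sort of thing worth comparing on Compiler Explorer.

```rust
/// Hypothetical bound (an assumption for this sketch): UTF-16 code units
/// below this value are treated as already normalized.
pub const PASSTHROUGH_BOUND: u16 = 0xC0;

/// Formulation A: explicit slice iterator. The prefix length is recovered
/// from the slice that remained before the offending unit was consumed.
pub fn prefix_len_iter(text: &[u16]) -> usize {
    let mut iter = text.iter();
    loop {
        // Remember what was left before advancing the iterator.
        let remaining = iter.as_slice();
        match iter.next() {
            Some(&u) if u < PASSTHROUGH_BOUND => continue,
            // `remaining` still includes the unit that failed the check.
            Some(_) => return text.len() - remaining.len(),
            None => return text.len(),
        }
    }
}

/// Formulation B: the same search expressed with `Iterator::position`.
pub fn prefix_len_position(text: &[u16]) -> usize {
    text.iter()
        .position(|&u| u >= PASSTHROUGH_BOUND)
        .unwrap_or(text.len())
}
```

Since the two formulations are semantically identical, a difference in how their inner loops compile would be purely a codegen matter rather than a difference in the algorithm.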

Look at the code on Compiler Explorer, try changing the things that affect the outcome per the above Rust issue, and also try the suggestion from https://users.rust-lang.org/t/getting-rustc-to-allocate-a-particular-local-in-a-register/132391/6, until the bad codegen is no longer triggered.

(Note: While this should fix perf for Latin-script input, it won't alone be enough for Greek, Chinese, etc. We should also make sure that the trie accessors are fully inlined as they are in ICU4C and make sure that trie access doesn't branch on trie type multiple times per lookup. See previously-filed issues about trie access perf.)
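As a rough sketch of what "not branching on trie type multiple times per lookup" could look like, one option is to branch on the type once, outside the loop, and run a loop body specialized through a const generic parameter. Everything below is a hypothetical toy: `ToyTrie`, `TrieKind`, the fast limits `0x10000` and `0x1000`, and the flat `fast_data` table are assumptions made for illustration and do not reflect the real trie layout or API in ICU4X or ICU4C.

```rust
/// Hypothetical trie variants (an assumption for this sketch).
#[derive(Clone, Copy)]
pub enum TrieKind {
    Fast,
    Small,
}

/// Toy stand-in for a trie: a flat table for the "fast" range and a
/// single default value for everything above it.
pub struct ToyTrie {
    pub kind: TrieKind,
    pub fast_data: Vec<u8>,
    pub slow_default: u8,
}

/// Lookup specialized on the fast limit at compile time, so the lookup
/// itself contains no branch on the trie kind.
#[inline(always)]
fn get<const FAST_LIMIT: usize>(trie: &ToyTrie, unit: u16) -> u8 {
    let i = usize::from(unit);
    if i < FAST_LIMIT {
        trie.fast_data.get(i).copied().unwrap_or(trie.slow_default)
    } else {
        trie.slow_default
    }
}

/// Loop body monomorphized per fast limit.
#[inline(always)]
fn count_nonzero_impl<const FAST_LIMIT: usize>(trie: &ToyTrie, text: &[u16]) -> usize {
    text.iter()
        .filter(|&&u| get::<FAST_LIMIT>(trie, u) != 0)
        .count()
}

/// Branches on the trie kind exactly once, before entering the loop.
pub fn count_nonzero(trie: &ToyTrie, text: &[u16]) -> usize {
    match trie.kind {
        TrieKind::Fast => count_nonzero_impl::<0x10000>(trie, text),
        TrieKind::Small => count_nonzero_impl::<0x1000>(trie, text),
    }
}
```

Whether hoisting the branch like this is the right shape for the real accessors is an open question; the sketch only shows that the type check can be paid once per call instead of once or more per code unit.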

Metadata

    Labels

    A-performance (Area: Performance (CPU, Memory)), C-collator (Component: Collation, normalization)