Contributor

@Kmeakin Kmeakin commented Sep 3, 2025

Split off from #145219

`Cased` is a derived property: it is the union of the `Lowercase` property, the `Uppercase` property, and the `Titlecase_Letter` general category. We already have lookup tables for `Lowercase` and `Uppercase`, and `Titlecase_Letter` is very small. So instead of duplicating that data in a separate lookup table for `Cased`, just test each of those properties in turn.

This will probably be slower than the old approach, but `Cased` is not a public API: it is only used in `str::to_lowercase` when deciding whether a Greek capital sigma (`Σ`) should be mapped to `ς` or to `σ`. This is a very rare case, so it should not be performance sensitive.
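As a rough illustration of the idea (not the code in this PR), the union can be sketched with the public `char` methods. The names `is_titlecase_letter` and `is_cased` are hypothetical, and the hard-coded `Titlecase_Letter` ranges are an assumption based on the 31 code points currently in general category `Lt`; the real implementation works on the generated tables in `unicode_data.rs`.

```rust
/// Hypothetical helper: true for code points in general category
/// Titlecase_Letter (Lt). The set is small enough to list by hand.
fn is_titlecase_letter(c: char) -> bool {
    matches!(
        c,
        '\u{01C5}' | '\u{01C8}' | '\u{01CB}' | '\u{01F2}'
            | '\u{1F88}'..='\u{1F8F}'
            | '\u{1F98}'..='\u{1F9F}'
            | '\u{1FA8}'..='\u{1FAF}'
            | '\u{1FBC}' | '\u{1FCC}' | '\u{1FFC}'
    )
}

/// Hypothetical stand-in for the internal `Cased` lookup:
/// Cased = Lowercase ∪ Uppercase ∪ Titlecase_Letter.
fn is_cased(c: char) -> bool {
    // `char::is_lowercase` / `char::is_uppercase` test the Unicode
    // `Lowercase` / `Uppercase` properties.
    c.is_lowercase() || c.is_uppercase() || is_titlecase_letter(c)
}

fn main() {
    assert!(is_cased('a'));
    assert!(is_cased('Σ'));
    assert!(is_cased('\u{01C5}')); // titlecase digraph, category Lt
    assert!(!is_cased('1'));
    assert!(!is_cased(' '));

    // The only consumer: the final-sigma rule in `str::to_lowercase`.
    assert_eq!("Σ".to_lowercase(), "σ"); // isolated sigma
    assert_eq!("ΣΑ".to_lowercase(), "σα"); // followed by a cased letter
    assert_eq!("ΑΣ".to_lowercase(), "ας"); // word-final sigma
}
```

The three checks run in sequence, so a non-cased character pays for two table lookups plus the small match; that only happens when a `Σ` is encountered during lowercasing.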

@rustbot
Collaborator

rustbot commented Sep 3, 2025

r? @scottmcm

rustbot has assigned @scottmcm.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added the S-waiting-on-review (Status: Awaiting review from the assignee but also interested parties.) and T-libs (Relevant to the library team, which will review and decide on the PR/issue.) labels Sep 3, 2025
@rustbot
Collaborator

rustbot commented Sep 3, 2025

library/core/src/unicode/unicode_data.rs is generated by the src/tools/unicode-table-generator tool.

If you want to modify unicode_data.rs, please modify the tool then regenerate the library source file via ./x run src/tools/unicode-table-generator instead of editing unicode_data.rs manually.

@Kmeakin Kmeakin changed the title from "optimization: Eliminate Cased table" to "Remove Cased Unicode table" Sep 3, 2025
@Kobzol
Member

Kobzol commented Sep 4, 2025

@bors try @rust-timer queue

rust-bors bot added a commit that referenced this pull request Sep 4, 2025
@rustbot rustbot added the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Sep 4, 2025
@rust-bors

rust-bors bot commented Sep 4, 2025

☀️ Try build successful (CI)
Build commit: abd6680 (abd6680cc4021a02ca80a20bb45c967f5cb9f056, parent: 033c0a4742794f5608b19eb78458726596f8ec18)

@rust-timer
Collaborator

Finished benchmarking commit (abd6680): comparison URL.

Overall result: ❌✅ regressions and improvements - please read the text below

Benchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @rustbot label: +perf-regression-triaged. If not, please fix the regressions and do another perf run. If its results are neutral or positive, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.2% | [0.2%, 0.2%] | 1 |
| Regressions ❌ (secondary) | 0.4% | [0.4%, 0.4%] | 1 |
| Improvements ✅ (primary) | -0.3% | [-0.5%, -0.2%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -0.1% | [-0.5%, 0.2%] | 3 |

Max RSS (memory usage)

Results (primary 0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 4.3% | [4.3%, 4.3%] | 1 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -1.6% | [-2.0%, -1.2%] | 2 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.4% | [-2.0%, 4.3%] | 3 |

Cycles

Results (secondary 2.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 2.0% | [2.0%, 2.0%] | 1 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | - | - | 0 |

Binary size

Results (primary 0.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
|---|---|---|---|
| Regressions ❌ (primary) | 0.1% | [0.1%, 0.1%] | 7 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.1% | [0.1%, 0.1%] | 7 |

Bootstrap: 466.287s -> 464.864s (-0.31%)
Artifact size: 388.39 MiB -> 388.39 MiB (0.00%)

@rustbot rustbot added the perf-regression (Performance regression.) label and removed the S-waiting-on-perf (Status: Waiting on a perf run to be completed.) label Sep 4, 2025
Labels
perf-regression Performance regression. S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.