test(formatter): check performance #12438


Draft · wants to merge 2 commits into main

Conversation


@Dunqing Dunqing commented Jul 21, 2025

Do some experiments in this PR to see how the performance goes


Dunqing commented Jul 21, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.


codspeed-hq bot commented Jul 21, 2025

CodSpeed Instrumentation Performance Report

Merging #12438 will improve performance by 22.84%

Comparing 07-22-test_formatter_check_performance (d6761fd) with main (d936024)

Summary

⚡ 4 improvements
✅ 30 untouched benchmarks

Benchmarks breakdown

Benchmark                              BASE      HEAD      Change
formatter[RadixUIAdoptionSection.jsx]  307.3 µs  250.2 µs  +22.84%
formatter[binder.ts]                   20.7 ms   18.3 ms   +13.32%
formatter[cal.com.tsx]                 162.2 ms  142.2 ms  +14.11%
formatter[react.development.js]        10.3 ms   8.6 ms    +20.21%

@github-actions github-actions bot added the A-ast-tools Area - AST tools label Jul 22, 2025
@graphite-app graphite-app bot changed the base branch from 07-21-feat_formatter_complete_printing_array-like_ast_nodes to graphite-base/12438 July 22, 2025 01:12
@graphite-app graphite-app bot force-pushed the graphite-base/12438 branch from 2874c76 to d936024 Compare July 22, 2025 01:20
@graphite-app graphite-app bot force-pushed the 07-22-test_formatter_check_performance branch from cb7b1ff to f0af69e Compare July 22, 2025 01:20
@graphite-app graphite-app bot changed the base branch from graphite-base/12438 to main July 22, 2025 01:21
@graphite-app graphite-app bot force-pushed the 07-22-test_formatter_check_performance branch from f0af69e to 94c912d Compare July 22, 2025 01:21
@Dunqing Dunqing force-pushed the 07-22-test_formatter_check_performance branch from 94c912d to d6761fd Compare July 22, 2025 01:36

Dunqing commented Jul 22, 2025


Gains 13%-20% performance improvement by pre-reserving enough space 🤣

@@ -371,7 +371,7 @@ pub fn format<'ast>(
     arguments: Arguments<'_, 'ast>,
 ) -> FormatResult<Formatted<'ast>> {
     let mut state = FormatState::new(program, context);
-    let mut buffer = VecBuffer::with_capacity(arguments.items().len(), &mut state);
+    let mut buffer = VecBuffer::with_capacity(program.source_text.len(), &mut state);
Member

This is way too big? Integrate with monitor-oxc and see what the average is?

Member Author

Yes, this is the worst-case scenario, where every character is an element. I will measure the average once I start working on improving performance in earnest; for now, this is just an experiment.

@overlookmotel
Member

Not to rain on your parade, but... I fear CodSpeed may be lying to us. Aapo taught me some stuff: oxc-project/backlog#11 (comment) (see "Multiple pointers + cost of growth" section of that comment).

From what he's saying, avoiding growing Vecs should not be the massive optimization that we have thought it is, because system allocator's realloc will handle growing a large allocation by altering the page table, instead of copying all the memory to a new location, which is much much cheaper. From reading around, Aapo's argument appears to be correct, but weirdly that's not what we've seen in practice.

To be fair to CodSpeed, I've also contributed to the unrealism of our benchmarks. To reduce variance in the numbers, I added a custom global allocator which specifically disables realloc.

I haven't had time to properly dig into all this, but just letting you know our benchmarks may be misleading in this respect.

If you want to make sure this change has real-world effect, you could run benchmarks locally before/after this PR. BUT comment out the custom global allocator NeverGrowInPlaceAllocator in both cases. If you do that, please let me know - I'd be really interested to hear the result.


One other thing: The realloc via page tables optimization (if it exists) only applies to the heap managed by Rust (i.e. native types like std::vec::Vec). Bumpalo does not do any page table tricks, so growing an oxc_allocator::Vec always copies memory. So there's no doubt that reserving sufficient capacity upfront for an oxc_allocator::Vec is an important optimization. It's just std::vec::Vec that I'm not so sure about.


Dunqing commented Jul 22, 2025


Thank you for letting me know; this is totally new to me!

I've run the benchmark in my local environment with NeverGrowInPlaceAllocator commented out.

The result is still very promising!

cargo bench --bench formatter -- -b main
warning: unused imports: `GlobalAlloc`, `Layout`, and `System`
 --> tasks/benchmark/src/lib.rs:1:18
  |
1 | use std::alloc::{GlobalAlloc, Layout, System};
  |                  ^^^^^^^^^^^  ^^^^^^  ^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: `oxc_benchmark` (lib) generated 1 warning (run `cargo fix --lib -p oxc_benchmark` to apply 1 suggestion)
    Finished `bench` profile [optimized] target(s) in 0.08s
     Running benches/formatter.rs (target/release/deps/formatter-1c8db0825dd6469e)
formatter/RadixUIAdoptionSection.jsx
                        time:   [16.466 µs 16.480 µs 16.494 µs]
                        change: [−12.063% −11.881% −11.703%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
Benchmarking formatter/react.development.js: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 60.
formatter/react.development.js
                        time:   [875.18 µs 877.19 µs 879.28 µs]
                        change: [−5.5710% −5.2475% −4.9185%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
formatter/cal.com.tsx   time:   [13.719 ms 13.742 ms 13.765 ms]
                        change: [−17.482% −17.273% −17.070%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
formatter/binder.ts     time:   [1.9549 ms 1.9570 ms 1.9591 ms]
                        change: [+0.9608% +1.1245% +1.2814%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

I guess binder.ts is slower than before because we pre-reserved too much space.

One other thing: The realloc via page tables optimization (if it exists) only applies to the heap managed by Rust (i.e. native types like std::vec::Vec). Bumpalo does not do any page table tricks, so growing an oxc_allocator::Vec always copies memory. So there's no doubt that reserving sufficient capacity upfront for an oxc_allocator::Vec is an important optimization. It's just std::vec::Vec that I'm not so sure about.

Cool. Let me try this.


Dunqing commented Jul 24, 2025

I'm giving up on replacing the std Vec with our own Vec in this PR for now. It would take some time, and I don't think it's worth doing here.

@overlookmotel
Member

Thanks for testing it out. The results are somewhat odd: why does binder.ts show a small regression in the local benchmark run, but a 13% improvement on CodSpeed? But overall, the local bench run shows that preallocating capacity for the Vec is mostly a real-world perf improvement.

Let me clarify one thing I said earlier, as I think I phrased it badly:

One other thing: The realloc via page tables optimization (if it exists) only applies to the heap managed by Rust (i.e. native types like std::vec::Vec). Bumpalo does not do any page table tricks, so growing an oxc_allocator::Vec always copies memory. So there's no doubt that reserving sufficient capacity upfront for an oxc_allocator::Vec is an important optimization. It's just std::vec::Vec that I'm not so sure about.

I meant:

  • When using std::vec::Vec, the system allocator may be able to grow the Vec without memory copying (realloc alters page tables instead). So preallocating sufficient capacity upfront may or may not be a perf optimization.
  • When using oxc_allocator::Vec, bumpalo will always have to copy memory to grow the Vec. So preallocating sufficient capacity upfront is always a perf optimization (assuming you can guess fairly accurately in advance how much capacity will be required).

So switching from std::vec::Vec to oxc_allocator::Vec is not an optimization in this respect. In fact it may have negative consequences if large Vecs have to grow, because growing oxc_allocator::Vec always involves memory copying.

oxc_allocator::Vec does have other advantages - notably that it's non-Drop. But its high cost of growth is a potential disadvantage.

I don't know if that's any clearer?!

Labels: A-ast-tools (Area - AST tools), A-formatter (Area - Formatter), C-test (Category - Testing)