test(formatter): check performance #12438


Draft · wants to merge 2 commits into main

Conversation


@Dunqing Dunqing commented Jul 21, 2025

Do some experiments in this PR to see how the performance goes


Dunqing commented Jul 21, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • 0-merge - adds this PR to the back of the merge queue
  • hotfix - for urgent hot fixes, skip the queue and merge this PR next

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.


codspeed-hq bot commented Jul 21, 2025

CodSpeed Instrumentation Performance Report

Merging #12438 will improve performance by 22.84%

Comparing 07-22-test_formatter_check_performance (d6761fd) with main (d936024)

Summary

⚡ 4 improvements
✅ 30 untouched benchmarks

Benchmarks breakdown

Benchmark                              BASE      HEAD      Change
formatter[RadixUIAdoptionSection.jsx]  307.3 µs  250.2 µs  +22.84%
formatter[binder.ts]                   20.7 ms   18.3 ms   +13.32%
formatter[cal.com.tsx]                 162.2 ms  142.2 ms  +14.11%
formatter[react.development.js]        10.3 ms   8.6 ms    +20.21%

@github-actions github-actions bot added the A-ast-tools Area - AST tools label Jul 22, 2025
@graphite-app graphite-app bot changed the base branch from 07-21-feat_formatter_complete_printing_array-like_ast_nodes to graphite-base/12438 July 22, 2025 01:12
@graphite-app graphite-app bot force-pushed the graphite-base/12438 branch from 2874c76 to d936024 Compare July 22, 2025 01:20
@graphite-app graphite-app bot force-pushed the 07-22-test_formatter_check_performance branch from cb7b1ff to f0af69e Compare July 22, 2025 01:20
@graphite-app graphite-app bot changed the base branch from graphite-base/12438 to main July 22, 2025 01:21
@graphite-app graphite-app bot force-pushed the 07-22-test_formatter_check_performance branch from f0af69e to 94c912d Compare July 22, 2025 01:21
@Dunqing Dunqing force-pushed the 07-22-test_formatter_check_performance branch from 94c912d to d6761fd Compare July 22, 2025 01:36

Dunqing commented Jul 22, 2025


Gains 13%-20% performance improvement by pre-reserving enough space 🤣

@@ -371,7 +371,7 @@ pub fn format<'ast>(
     arguments: Arguments<'_, 'ast>,
 ) -> FormatResult<Formatted<'ast>> {
     let mut state = FormatState::new(program, context);
-    let mut buffer = VecBuffer::with_capacity(arguments.items().len(), &mut state);
+    let mut buffer = VecBuffer::with_capacity(program.source_text.len(), &mut state);
Member

This is way too big? Integrate with monitor-oxc and see what the average is?

Member Author

Yes, this is the worst-case scenario, where every character is an element. I will measure the average once I start working on improving performance in earnest; for now, this is just an experiment.

@overlookmotel
Member

Not to rain on your parade, but... I fear CodSpeed may be lying to us. Aapo taught me some stuff: oxc-project/backlog#11 (comment) (see "Multiple pointers + cost of growth" section of that comment).

From what he's saying, avoiding growing Vecs should not be the massive optimization that we have thought it is, because system allocator's realloc will handle growing a large allocation by altering the page table, instead of copying all the memory to a new location, which is much much cheaper. From reading around, Aapo's argument appears to be correct, but weirdly that's not what we've seen in practice.

To be fair to CodSpeed, I've also contributed to the unrealism of our benchmarks. To reduce variance in the numbers, I added a custom global allocator which specifically disables realloc.

I haven't had time to properly dig into all this, but just letting you know our benchmarks may be misleading in this respect.

If you want to make sure this change has real-world effect, you could run benchmarks locally before/after this PR. BUT comment out the custom global allocator NeverGrowInPlaceAllocator in both cases. If you do that, please let me know - I'd be really interested to hear the result.


One other thing: The realloc via page tables optimization (if it exists) only applies to the heap managed by Rust (i.e. native types like std::vec::Vec). Bumpalo does not do any page table tricks, so growing an oxc_allocator::Vec always copies memory. So there's no doubt that reserving sufficient capacity upfront for an oxc_allocator::Vec is an important optimization. It's just std::vec::Vec that I'm not so sure about.


Dunqing commented Jul 22, 2025


Thank you for letting me know; this is totally new to me!

I've run the benchmark in my local environment with NeverGrowInPlaceAllocator commented out.

The result is still very promising!

cargo bench --bench formatter -- -b main
warning: unused imports: `GlobalAlloc`, `Layout`, and `System`
 --> tasks/benchmark/src/lib.rs:1:18
  |
1 | use std::alloc::{GlobalAlloc, Layout, System};
  |                  ^^^^^^^^^^^  ^^^^^^  ^^^^^^
  |
  = note: `#[warn(unused_imports)]` on by default

warning: `oxc_benchmark` (lib) generated 1 warning (run `cargo fix --lib -p oxc_benchmark` to apply 1 suggestion)
    Finished `bench` profile [optimized] target(s) in 0.08s
     Running benches/formatter.rs (target/release/deps/formatter-1c8db0825dd6469e)
formatter/RadixUIAdoptionSection.jsx
                        time:   [16.466 µs 16.480 µs 16.494 µs]
                        change: [−12.063% −11.881% −11.703%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 10 outliers among 100 measurements (10.00%)
  3 (3.00%) low mild
  3 (3.00%) high mild
  4 (4.00%) high severe
Benchmarking formatter/react.development.js: Warming up for 3.0000 s
Warning: Unable to complete 100 samples in 5.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 60.
formatter/react.development.js
                        time:   [875.18 µs 877.19 µs 879.28 µs]
                        change: [−5.5710% −5.2475% −4.9185%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe
formatter/cal.com.tsx   time:   [13.719 ms 13.742 ms 13.765 ms]
                        change: [−17.482% −17.273% −17.070%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
  2 (2.00%) high mild
formatter/binder.ts     time:   [1.9549 ms 1.9570 ms 1.9591 ms]
                        change: [+0.9608% +1.1245% +1.2814%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) low mild
  1 (1.00%) high mild

I guess binder.ts is slower than before because we pre-reserved too much space.

One other thing: The realloc via page tables optimization (if it exists) only applies to the heap managed by Rust (i.e. native types like std::vec::Vec). Bumpalo does not do any page table tricks, so growing an oxc_allocator::Vec always copies memory. So there's no doubt that reserving sufficient capacity upfront for an oxc_allocator::Vec is an important optimization. It's just std::vec::Vec that I'm not so sure about.

Cool. Let me try this.


Dunqing commented Jul 24, 2025

I'm giving up on replacing the std Vec with our own Vec in this PR for now. It would take some time, and I don't think it's worth doing here.

@overlookmotel
Member

Thanks for testing it out. The results are somewhat odd: why does binder.ts show a small regression in the local benchmark run, but a 13% improvement on CodSpeed? But overall, the local bench run shows that preallocating capacity for the Vec is mostly a real-world perf improvement.

Let me clarify one thing I said earlier, as I think I phrased it badly:

One other thing: The realloc via page tables optimization (if it exists) only applies to the heap managed by Rust (i.e. native types like std::vec::Vec). Bumpalo does not do any page table tricks, so growing an oxc_allocator::Vec always copies memory. So there's no doubt that reserving sufficient capacity upfront for an oxc_allocator::Vec is an important optimization. It's just std::vec::Vec that I'm not so sure about.

I meant:

  • When using std::vec::Vec, the system allocator may be able to grow the Vec without memory copying (realloc alters page tables instead). So preallocating sufficient capacity upfront may or may not be a perf optimization.
  • When using oxc_allocator::Vec, bumpalo will always have to copy memory to grow the Vec. So preallocating sufficient capacity upfront is always a perf optimization (assuming you can guess fairly accurately in advance how much capacity will be required).

So switching from std::vec::Vec to oxc_allocator::Vec is not an optimization in this respect. In fact it may have negative consequences if large Vecs have to grow, because growing oxc_allocator::Vec always involves memory copying.

oxc_allocator::Vec does have other advantages - notably that it's non-Drop. But its high cost of growth is a potential disadvantage.

I don't know if that's any clearer?!

Labels: A-ast-tools (Area - AST tools), A-formatter (Area - Formatter), C-test (Category - Testing)