Substantially improve uncommon-case memory-system performance #1966

Merged
merged 14 commits into from
Apr 29, 2025

Conversation

Collaborator

@aswaterman aswaterman commented Apr 29, 2025

Several commingled improvements, but no API changes:

  • Allow TLB to be used for MMIO accesses. This speeds up simple MMIO accesses (e.g. reading the boot ROM) by >2x when VM is disabled or 5x when VM is enabled.
  • Skip checking the debug triggers for MMIO accesses and misaligned-but-intrapage memory accesses unless necessary, speeding up these cases by another >2x.
  • Allow TLB to be used even when a memtracer is registered on an address. This speeds up D$ simulation by 6x.
  • Give the load/store/fetch TLBs their own tags, since they already had their own data anyway. This eliminates a common source of conflict misses and simplifies the implementation.
  • Also fix a UBSan error associated with outside-of-object memory accesses when computing host addresses.
  • Also improve common-case performance by moving uncommon-case checks (commit log, matched debug trigger) off the critical code path.
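
To illustrate the per-access-type tags described in the list above, here is a minimal sketch of a direct-mapped TLB in which fetch, load, and store each index their own tag/data array. All names (`SplitTlb`, `TlbEntry`, `AccessType`, the sizes) are hypothetical and chosen for illustration; this is not the code from this PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Illustrative constants; the real simulator's sizes may differ.
constexpr unsigned PAGE_SHIFT = 12;
constexpr std::size_t TLB_ENTRIES = 256;
// A VPN never has its top bits set after the shift, so all-ones is safe as "invalid".
constexpr std::uint64_t INVALID_TAG = ~std::uint64_t{0};

enum AccessType { FETCH = 0, LOAD = 1, STORE = 2, NUM_ACCESS_TYPES = 3 };

struct TlbEntry {
  std::uint64_t tag = INVALID_TAG;  // virtual page number, or INVALID_TAG
  std::uint64_t host_page = 0;      // stand-in for the translation payload
};

class SplitTlb {
 public:
  // Returns true on a hit and writes the payload to *host_page.
  bool lookup(AccessType type, std::uint64_t vaddr, std::uint64_t* host_page) const {
    const std::uint64_t vpn = vaddr >> PAGE_SHIFT;
    const TlbEntry& e = entries_[type][vpn % TLB_ENTRIES];
    if (e.tag != vpn) return false;
    *host_page = e.host_page;
    return true;
  }

  // Refilling one access type never evicts another type's entry.
  void refill(AccessType type, std::uint64_t vaddr, std::uint64_t host_page) {
    const std::uint64_t vpn = vaddr >> PAGE_SHIFT;
    TlbEntry& e = entries_[type][vpn % TLB_ENTRIES];
    e.tag = vpn;
    e.host_page = host_page;
  }

 private:
  // One independent tag+data array per access type.
  std::array<std::array<TlbEntry, TLB_ENTRIES>, NUM_ACCESS_TYPES> entries_{};
};
```

In this sketch, a copy loop whose source and destination pages map to the same index (for example, addresses exactly `TLB_ENTRIES` pages apart) keeps both translations resident, because the load and the store no longer compete for one tag slot.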

Needed for valgrind safety, coupled with next commit.
Avoid accessing pointers to outside of the ultimately referenced object.
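
As a hedged illustration of the note above: one common way to keep host-address computation free of out-of-object pointers is to store the host-minus-guest offset as an unsigned integer and convert to a pointer only at the end. The names below (`HostMapping`, `make_mapping`, `to_host`) are hypothetical, and this sketch is not necessarily how the commit itself addresses the issue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical TLB payload: the difference between a host page address and
// the guest page address, kept as an integer. Unsigned arithmetic wraps
// without undefined behavior, unlike pointer arithmetic that leaves the object.
struct HostMapping {
  std::uintptr_t host_minus_guest;
};

inline HostMapping make_mapping(char* host_page, std::uint64_t guest_page_addr) {
  return HostMapping{reinterpret_cast<std::uintptr_t>(host_page) -
                     static_cast<std::uintptr_t>(guest_page_addr)};
}

inline char* to_host(const HostMapping& m, std::uint64_t guest_addr) {
  // Only the final sum, which lies inside the host page, becomes a pointer.
  return reinterpret_cast<char*>(m.host_minus_guest +
                                 static_cast<std::uintptr_t>(guest_addr));
}
```

The integer-to-pointer round trip is implementation-defined rather than undefined, and behaves as expected on mainstream hosts; the sketch assumes guest addresses in the mapped page are passed to `to_host`.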
This is a performance enhancement, because it prevents some pathological
conflict cases (e.g. aligned memcpy), but it also cleans up some aspects
of the code (e.g. ITLB refills don't interact with the DTLB).
Suppress TLB refill when commit logging is enabled to facilitate this
strategy.
Mostly matters for misaligned loads and stores and MMIO accesses.

Opportunistically skip checking the triggers and some other
less-common checks.
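
One reading of the notes above is that uncommon-case work can be kept out of the hot path by never letting such accesses hit in the TLB: if commit logging is on, the refill is skipped, so every access misses and lands in the slow path, where the extra checks live. The sketch below shows that pattern under stated assumptions; `SketchMmu` and its members are hypothetical names, not Spike's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr unsigned kPageBits = 12;
constexpr std::size_t kEntries = 64;
constexpr std::uint64_t kInvalidTag = ~std::uint64_t{0};

struct SketchMmu {
  std::uint64_t tags[kEntries];
  bool commit_log_enabled = false;  // uncommon-case feature
  unsigned slow_path_calls = 0;     // instrumentation for the example
  unsigned commit_log_entries = 0;  // stand-in for real log output

  SketchMmu() {
    for (auto& t : tags) t = kInvalidTag;
  }

  void load(std::uint64_t vaddr) {
    const std::uint64_t vpn = vaddr >> kPageBits;
    // Fast path: a single tag compare, with no logging or trigger checks.
    if (tags[vpn % kEntries] == vpn) return;
    load_slow_path(vaddr);
  }

  void load_slow_path(std::uint64_t vaddr) {
    ++slow_path_calls;
    if (commit_log_enabled) {
      // Do the uncommon-case work here and suppress the refill, so the
      // next access to this page also takes the slow path.
      ++commit_log_entries;
      return;
    }
    const std::uint64_t vpn = vaddr >> kPageBits;
    tags[vpn % kEntries] = vpn;  // refill only in the common case
  }
};
```

With logging off, two loads to the same page take the slow path once; with logging on, both take it. A real simulator would presumably also need to flush the TLB when such a feature is toggled at run time, which this sketch omits.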
@aswaterman aswaterman requested a review from jerryz123 April 29, 2025 00:10
Collaborator

@jerryz123 jerryz123 left a comment

This is quite a lot of code change. I think I'd need to reread the fetch/load/store path in its entirety to appreciate the new flow, but the individual commits here look good.

How much faster is this now when running in a bus-functional model? I'm interpreting a 4x speedup from your changelog.

@aswaterman
Collaborator Author

Yeah, if I knew what I was getting into beforehand, I would've split this up into multiple PRs over multiple days...

Yeah, aggregate speedup is in the 4x range when half of instructions are loads/stores and use of the functional I$ is allowed. I didn't write down the measurements for the case where use of the I$ is disallowed, but it's at least another 2x on top of that.

@aswaterman aswaterman merged commit 77ea9de into master Apr 29, 2025
3 checks passed
@aswaterman aswaterman deleted the tlb-rework branch April 29, 2025 01:23

2 participants