Substantially improve uncommon-case memory-system performance #1966

aswaterman · 2025-04-29T00:10:02Z

Several commingled improvements, but no API changes:

Allow TLB to be used for MMIO accesses. This speeds up simple MMIO accesses (e.g. reading the boot ROM) by more than 2x when VM is disabled and by 5x when VM is enabled.
Skip checking the debug triggers for MMIO accesses and misaligned-but-intrapage memory accesses unless necessary, speeding up these cases by another >2x.
Allow TLB to be used even when a memtracer is registered on an address. This speeds up D$ simulation by 6x.
Give the load/store/fetch TLBs their own tags, since they already had their own data anyway. This eliminates a common source of conflict misses and simplifies the implementation.
Also fix an ubsan error associated with outside-of-object memory accesses when computing host addresses.
Also improve common-case performance by moving uncommon-case checks (commit log, matched debug trigger) off the critical code path.


jerryz123

This is quite a lot of code change; I think I'd need to reread the fetch/load/store path in its entirety to appreciate the new flow, but the individual commits here look good.

How much faster is this now when running in a bus-functional model? I'm inferring a 4x speedup from your changelog.

aswaterman · 2025-04-29T01:20:22Z

Yeah, if I had known what I was getting into beforehand, I would have split this up into multiple PRs over multiple days...

Yeah, aggregate speedup is in the 4x range when half of instructions are loads/stores and use of the functional I$ is allowed. I didn't write down the measurements for the case where use of the I$ is disallowed, but it's at least another 2x on top of that.

aswaterman added 14 commits April 28, 2025 16:47

Make mmu_t::fetch_temp an entire page in size

597a897

Needed for valgrind safety, coupled with next commit.
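The commit does not spell out why a full page is needed. One plausible reading, sketched here under that assumption: if a scratch buffer can stand in for a page of guest memory, then every in-page offset computed from it must stay inside the buffer, which only holds if the buffer is a whole page. `kPageSize`, `ScratchPage`, and `host_ptr` are hypothetical names, not Spike's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical page size; RISC-V base pages are 4 KiB.
constexpr std::size_t kPageSize = 4096;

// Scratch buffer sized to a whole page so it can stand in for a real page
// of guest memory (a hypothetical analogue of a fetch_temp-style buffer).
struct ScratchPage {
  std::uint8_t bytes[kPageSize];
};

// Host address for guest address `vaddr` when its page is backed by `page`.
// Any offset in [0, kPageSize) lands inside the object, so the pointer is
// always valid; with a smaller buffer, most offsets would point past its end.
inline std::uint8_t* host_ptr(ScratchPage& page, std::uint64_t vaddr) {
  std::size_t offset = static_cast<std::size_t>(vaddr % kPageSize);
  return page.bytes + offset;
}
```

With, say, an 8-byte buffer instead, an offset such as `0xffe` would produce a pointer far outside the object, which is exactly the kind of access valgrind reports.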

Fix UB in TLB, making Spike valgrind-safe

734bd97

Avoid accessing pointers to outside of the ultimately referenced object.
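A common way to hit this kind of UB in a software TLB is to cache a biased pointer such as `host_page - guest_page_base`, so that a hit costs a single add. Merely forming that pointer is undefined because it points outside the host page object, and UBSan's pointer-overflow check can flag it. The sketch below assumes that pattern and shows one well-defined alternative: cache the host page itself and add only the in-page offset. `TlbEntry` and `translate` are hypothetical, not Spike's actual types.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageSize = 4096;

// Hypothetical TLB entry. It caches the start of the host page rather than
// a biased pointer (host_page - guest_page_base), which would lie outside
// the host object and be undefined behavior even to compute.
struct TlbEntry {
  std::uint64_t vpn;        // guest virtual page number
  std::uint8_t* host_page;  // start of the host buffer backing the page
};

// Translates a guest address that hits in `e`; only in-object offsets
// are ever added to host_page.
inline std::uint8_t* translate(const TlbEntry& e, std::uint64_t vaddr) {
  assert(vaddr / kPageSize == e.vpn);  // caller guarantees a hit
  return e.host_page + (vaddr % kPageSize);
}
```

The cost is one extra mask on the hit path compared with the biased-pointer add; keeping the bias as an integer (`uintptr_t`) is another option, at the price of an integer-to-pointer conversion.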

Remove unused code in mmu.h

840e9ba

Separate ITLB/LTLB/STLB into separate structures

52517f7

This is a performance enhancement, because it prevents some pathological conflict cases (e.g. aligned memcpy), but it also cleans up some aspects of the code (e.g. ITLB refills don't interact with the DTLB).
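The aligned-memcpy conflict can be reproduced with a simplified model (not Spike's exact previous layout): a direct-mapped TLB in which one slot per index serves fetches, loads, and stores, compared with three independent structures. When the source and destination pages map to the same index, every load evicts the store's translation and vice versa. All names here are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kEntries = 256;    // hypothetical TLB size
constexpr std::uint64_t kPageShift = 12;

enum class Access { Fetch = 0, Load = 1, Store = 2 };

// One direct-mapped TLB: each index holds at most one translation.
struct DirectMappedTlb {
  std::array<std::uint64_t, kEntries> tag{};
  std::array<bool, kEntries> valid{};
  std::size_t misses = 0;

  // On a miss, refill (evicting whatever occupied the index) and count it.
  void access(std::uint64_t vaddr) {
    std::uint64_t vpn = vaddr >> kPageShift;
    std::size_t idx = vpn % kEntries;
    if (!valid[idx] || tag[idx] != vpn) {
      ++misses;
      tag[idx] = vpn;
      valid[idx] = true;
    }
  }
};

// Model 1: a single structure shared by fetches, loads, and stores.
struct SharedTlb {
  DirectMappedTlb tlb;
  void access(Access, std::uint64_t vaddr) { tlb.access(vaddr); }
  std::size_t misses() const { return tlb.misses; }
};

// Model 2: separate ITLB, LTLB, and STLB structures.
struct SplitTlb {
  std::array<DirectMappedTlb, 3> tlbs;
  void access(Access a, std::uint64_t vaddr) {
    tlbs[static_cast<std::size_t>(a)].access(vaddr);
  }
  std::size_t misses() const {
    return tlbs[0].misses + tlbs[1].misses + tlbs[2].misses;
  }
};

// Word-by-word copy; returns the number of TLB misses incurred.
template <typename Tlb>
std::size_t copy_misses(std::uint64_t src, std::uint64_t dst, std::size_t words) {
  Tlb t{};
  for (std::size_t i = 0; i < words; ++i) {
    t.access(Access::Load, src + 8 * i);
    t.access(Access::Store, dst + 8 * i);
  }
  return t.misses();
}
```

Copying one 4 KiB page between `0x10000000` and `0x10100000` (exactly 256 pages apart, so both map to index 0) misses on all 1024 accesses in the shared model, but only twice with split structures.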

Move commit logging check off the critical path

92e4f02

Suppress TLB refill when commit logging is enabled to facilitate this strategy.
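The strategy can be illustrated with a toy one-entry TLB, assuming logging happens in the slow path: if the TLB is never refilled while commit logging is on (and is flushed when logging is turned on), every logged access necessarily reaches the slow path, so the fast path never has to test the logging flag. `ToyMmu` and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

constexpr std::uint64_t kPageShift = 12;

class ToyMmu {
 public:
  std::vector<std::uint64_t> logged;  // addresses recorded by the slow path
  std::size_t slow_path_calls = 0;

  // Enabling logging flushes the TLB so no access can bypass the log.
  void set_commit_log(bool on) {
    log_enabled_ = on;
    if (on) tlb_vpn_.reset();
  }

  std::uint8_t load(std::uint64_t vaddr) {
    // Fast path: one tag compare; the logging flag is never tested here.
    if (tlb_vpn_ && *tlb_vpn_ == (vaddr >> kPageShift)) return memory_[vaddr];
    return load_slow(vaddr);
  }

  void poke(std::uint64_t vaddr, std::uint8_t value) { memory_[vaddr] = value; }

 private:
  std::uint8_t load_slow(std::uint64_t vaddr) {
    ++slow_path_calls;
    if (log_enabled_)
      logged.push_back(vaddr);         // log, and leave the TLB empty
    else
      tlb_vpn_ = vaddr >> kPageShift;  // refill only when not logging
    return memory_[vaddr];
  }

  bool log_enabled_ = false;
  std::optional<std::uint64_t> tlb_vpn_;                   // one-entry "TLB"
  std::unordered_map<std::uint64_t, std::uint8_t> memory_;  // sparse memory
};
```

The trade-off is that logged runs always take the slow path, which is acceptable because logging is already the expensive, uncommon case.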

DRY in mmu_t load/store

18baf4c

DRY in instruction fetch; eliminate fetch_temp

104c99e

Avoid memory-allocation anti-pattern on matched_trigger

59eebf0
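The commit title does not say which anti-pattern was removed. Assuming it was a small match record allocated with `new` on each trigger hit and deleted later, a typical replacement is to hold the record in place with `std::optional`, which needs no heap traffic and cannot leak if an exception unwinds past the cleanup code. `TriggerMatch` and `TriggerState` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical description of a debug-trigger match.
struct TriggerMatch {
  std::uint64_t address;
  int trigger_index;
};

// Holds at most one pending match in place instead of behind a raw pointer.
struct TriggerState {
  std::optional<TriggerMatch> matched;

  void record(std::uint64_t addr, int idx) { matched = TriggerMatch{addr, idx}; }

  // Returns the pending match, if any, and clears it.
  std::optional<TriggerMatch> take() {
    std::optional<TriggerMatch> m = matched;
    matched.reset();
    return m;
  }
};
```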

Move matched_trigger check off the critical path

cf9488b

Factor out load/store execution from permissions checks

2d846b1

Allow use of TLB even when memtracers are registered

c483949
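The PR does not describe the mechanism, so this is only one way it could work, sketched under that assumption: decide at refill time whether the tracer (for example a cache simulator) cares about the page, store that decision in the TLB entry, and on every hit reuse the cached translation to report the physical address. The tracer still sees every access, but the page walk happens once per page instead of once per access. All names, and the fake `vpn + 0x100` walk, are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::uint64_t kPageShift = 12;
constexpr std::uint64_t kPageMask = (1ull << kPageShift) - 1;

// Hypothetical tracer interested in physical range [lo, hi).
struct Tracer {
  std::uint64_t lo, hi;
  std::vector<std::uint64_t> seen;  // physical addresses reported
  bool interested_in_page(std::uint64_t ppage_base) const {
    return ppage_base < hi && ppage_base + (kPageMask + 1) > lo;
  }
  void trace(std::uint64_t paddr) { seen.push_back(paddr); }
};

struct Entry {
  bool valid = false;
  std::uint64_t vpn = 0, ppn = 0;
  bool traced = false;  // decided once, at refill time
};

class TracedTlb {
 public:
  explicit TracedTlb(Tracer& t) : tracer_(t) {}
  std::size_t walks = 0;

  void access(std::uint64_t vaddr) {
    std::uint64_t vpn = vaddr >> kPageShift;
    if (!entry_.valid || entry_.vpn != vpn) refill(vpn);
    std::uint64_t paddr = (entry_.ppn << kPageShift) | (vaddr & kPageMask);
    if (entry_.traced) tracer_.trace(paddr);  // hits still feed the tracer
  }

 private:
  void refill(std::uint64_t vpn) {
    ++walks;
    std::uint64_t ppn = vpn + 0x100;  // stand-in for a page-table walk
    entry_ = Entry{true, vpn, ppn, tracer_.interested_in_page(ppn << kPageShift)};
  }
  Tracer& tracer_;
  Entry entry_;
};
```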

Allow use of TLB for MMIO accesses

8518255
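A sketch of how a TLB entry can serve MMIO as well as ordinary memory, assuming the entry records either a host page or a device: the translation is cached either way, so repeated MMIO reads (e.g. of a boot ROM) skip the walk and physical-memory-map lookup, while each access still goes to the device. `Device`, `Rom`, `Entry`, and `Mmu` are hypothetical, and the memory map (page 0 is RAM, all others ROM) is a stand-in.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::uint64_t kPageShift = 12;
constexpr std::uint64_t kPageMask = (1ull << kPageShift) - 1;

struct Device {
  virtual ~Device() = default;
  virtual std::uint8_t read(std::uint64_t offset) = 0;
};

// Toy ROM whose byte at each offset is the low 8 bits of that offset.
struct Rom : Device {
  std::uint8_t read(std::uint64_t offset) override {
    return static_cast<std::uint8_t>(offset & 0xff);
  }
};

struct Entry {
  bool valid = false;
  std::uint64_t vpn = 0;
  std::uint8_t* host_page = nullptr;  // non-null: ordinary memory
  Device* dev = nullptr;              // non-null: MMIO page
  std::uint64_t dev_page_offset = 0;  // page's offset within the device
};

class Mmu {
 public:
  Mmu(std::uint8_t* ram_page, Device* rom) : ram_page_(ram_page), rom_(rom) {}
  std::size_t walks = 0;

  std::uint8_t load(std::uint64_t vaddr) {
    std::uint64_t vpn = vaddr >> kPageShift, off = vaddr & kPageMask;
    if (!e_.valid || e_.vpn != vpn) refill(vpn);
    if (e_.host_page) return e_.host_page[off];     // RAM: direct host access
    return e_.dev->read(e_.dev_page_offset + off);  // MMIO: cached translation
  }

 private:
  // Stand-in for a page walk plus memory-map lookup.
  void refill(std::uint64_t vpn) {
    ++walks;
    e_ = Entry{};
    e_.valid = true;
    e_.vpn = vpn;
    if (vpn == 0) {
      e_.host_page = ram_page_;
    } else {
      e_.dev = rom_;
      e_.dev_page_offset = (vpn - 1) << kPageShift;
    }
  }
  std::uint8_t* ram_page_;
  Device* rom_;
  Entry e_;
};
```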

Factor out instruction fetch from permissions checks

8bd26af

Significantly speed up uncommon-case load/store/fetch

607ba10

Mostly matters for misaligned loads and stores and MMIO accesses. Opportunistically skip checking the triggers and some other less-common checks.
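Two ideas make it cheap to skip trigger checks unless necessary; this sketch assumes both and uses hypothetical names. First, a misaligned access that stays within one page needs only one translation. Second, a "check triggers" flag can live in otherwise-unused high bits of the TLB tag, so the fast path remains a single equality compare that fails only for pages where triggers actually need checking; such pages still avoid a full walk.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kPageShift = 12;
constexpr std::uint64_t kPageSize = 1ull << kPageShift;
// Hypothetical flag kept above all valid VPN bits (assumes VPNs < 2^52).
constexpr std::uint64_t kCheckTriggers = 1ull << 63;

// True if [addr, addr + len) lies within a single page.
constexpr bool is_intrapage(std::uint64_t addr, std::uint64_t len) {
  return (addr & (kPageSize - 1)) + len <= kPageSize;
}

enum class Path { Fast, CheckTriggers, Full };

// Classifies an access against one cached tag. Any flag bit makes the
// first compare fail, so trigger checks cost nothing on unflagged pages.
inline Path classify(std::uint64_t tag, std::uint64_t addr, std::uint64_t len) {
  std::uint64_t vpn = addr >> kPageShift;
  if (!is_intrapage(addr, len)) return Path::Full;  // page-crossing access
  if (tag == vpn) return Path::Fast;
  if ((tag & ~kCheckTriggers) == vpn) return Path::CheckTriggers;
  return Path::Full;  // TLB miss
}
```

For example, a 4-byte load at `0x1003` with tag `0x1` takes the fast path even though it is misaligned, while the same load with tag `0x1 | kCheckTriggers` goes to the trigger check without a page walk.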

aswaterman requested a review from jerryz123 April 29, 2025 00:10

jerryz123 approved these changes Apr 29, 2025


aswaterman merged commit 77ea9de into master Apr 29, 2025
3 checks passed

aswaterman deleted the tlb-rework branch April 29, 2025 01:23
