Substantially improve uncommon-case memory-system performance #1966
Needed for valgrind safety, coupled with next commit.
Avoid forming pointers that point outside the ultimately referenced object.
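A minimal C++ sketch of what "no pointers outside the referenced object" can look like in a simulator's translation cache; the names (`tlb_entry`, `translate`, `PGSIZE`) are hypothetical, not Spike's actual data structures. A common fast-path trick stores a "biased" pointer (`host_base - guest_page_vaddr`) so a load needs only one add, but that bias points outside every allocation, which tools like valgrind may flag. A provenance-safe variant keeps the in-bounds page base and derives the page offset per access:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: keep the host base pointer in bounds and compute the
// page offset per access, instead of caching an out-of-object biased pointer.

constexpr uint64_t PGSIZE = 4096;

struct tlb_entry {
  uint8_t* host_base;  // start of the backing host page (always in bounds)
  uint64_t guest_vpn;  // guest virtual page number this entry maps
};

// Host address for a guest vaddr; every intermediate pointer stays within
// the referenced page, so valgrind sees only in-bounds pointer arithmetic.
inline uint8_t* translate(const tlb_entry& e, uint64_t vaddr) {
  assert(vaddr / PGSIZE == e.guest_vpn);  // caller checked the TLB tag
  return e.host_base + vaddr % PGSIZE;
}

uint8_t g_page[PGSIZE];  // example backing storage for one guest page
```

The cost is one extra mask/add on the fast path; the benefit is that no cached value ever violates pointer provenance.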
This is a performance enhancement, because it prevents some pathological conflict cases (e.g. aligned memcpy), but it also cleans up some aspects of the code (e.g. ITLB refills don't interact with the DTLB).
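To illustrate why aligned memcpy is a pathological case for a shared direct-mapped translation cache, here is a hypothetical model (the geometry, `tiny_tlb`, and `copy_misses` are illustrative, not Spike's implementation): when the source and destination pages map to the same index, each access evicts the other's entry, so every load and store misses.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical direct-mapped translation cache indexed by low VPN bits.
constexpr uint64_t PGSIZE = 4096, TLB_ENTRIES = 256;

struct tiny_tlb {
  uint64_t tag[TLB_ENTRIES] = {};  // stored VPN + 1 (0 = invalid)
  uint64_t misses = 0;

  void access(uint64_t vaddr) {
    uint64_t vpn = vaddr / PGSIZE;
    uint64_t idx = vpn % TLB_ENTRIES;
    if (tag[idx] != vpn + 1) {     // miss: refill, evicting the old entry
      tag[idx] = vpn + 1;
      ++misses;
    }
  }
};

// Count translation misses for a word-by-word copy of `words` 8-byte words.
uint64_t copy_misses(uint64_t src, uint64_t dst, int words) {
  tiny_tlb t;
  for (int i = 0; i < words; ++i) {
    t.access(src + 8 * i);  // load from source
    t.access(dst + 8 * i);  // store to destination
  }
  return t.misses;
}
```

If `src` and `dst` differ by a multiple of `TLB_ENTRIES * PGSIZE`, both map to the same index and every access misses; a copy between adjacently mapped pages misses only twice. Splitting instruction and data lookups (so ITLB refills no longer disturb the DTLB) removes a similar class of conflicts between fetch and data streams.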
Suppress TLB refill when commit logging is enabled to facilitate this strategy.
Mostly matters for misaligned loads and stores and MMIO accesses. Opportunistically skip checking the triggers and some other less-common checks.
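The structure described above can be sketched as a fast-path/slow-path split; the names here (`hart_state`, `load64`, `load_slow`) are hypothetical and the slow path is a stub that, for simplicity, assumes the access stays within the cached page. The hot path handles only aligned accesses that hit the translation cache with no triggers armed; misaligned accesses, MMIO, and trigger checks are shunted to one out-of-line slow path.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

constexpr uint64_t PGSIZE = 4096;

struct hart_state {
  uint8_t* tlb_host_base = nullptr;  // host page backing the cached VPN
  uint64_t tlb_vpn = ~0ull;          // cached guest virtual page number
  bool triggers_armed = false;       // any debug watchpoints present?
};

uint64_t load_slow(hart_state& s, uint64_t vaddr);  // full checks live here

inline uint64_t load64(hart_state& s, uint64_t vaddr) {
  bool aligned = vaddr % 8 == 0;
  bool tlb_hit = vaddr / PGSIZE == s.tlb_vpn;
  if (aligned && tlb_hit && !s.triggers_armed) {
    uint64_t v;  // fast path: a single copy from host memory
    std::memcpy(&v, s.tlb_host_base + vaddr % PGSIZE, 8);
    return v;
  }
  return load_slow(s, vaddr);  // uncommon: misaligned, MMIO, triggers, ...
}

// Stub slow path: byte-by-byte, in host byte order to match the fast path.
// (A real slow path would also handle page crossings, MMIO, and triggers.)
uint64_t load_slow(hart_state& s, uint64_t vaddr) {
  uint64_t v = 0;
  uint8_t* p = reinterpret_cast<uint8_t*>(&v);
  for (int i = 0; i < 8; ++i)
    p[i] = s.tlb_host_base[(vaddr + i) % PGSIZE];
  return v;
}

// Demo: the fast path and slow path must agree on the same aligned access.
bool demo() {
  static uint8_t page[PGSIZE] = {};
  for (int i = 0; i < 16; ++i) page[0x10 + i] = uint8_t(i + 1);
  hart_state s{page, 0x42, false};
  uint64_t base = 0x42 * PGSIZE;
  return load64(s, base + 0x10) == load_slow(s, base + 0x10);
}
```

Keeping the uncommon checks off the hot path shrinks the per-access work for the overwhelmingly common aligned, non-MMIO, trigger-free case.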
This is quite a lot of code change; I think I'd need to reread the fetch/load/store path in its entirety to appreciate the new flow, but the individual commits here look good.
How much faster is this now when running in a bus-functional model? I'm interpreting a 4x speedup from your changelog.
Yeah, if I knew what I was getting into beforehand, I would've split this up into multiple PRs over multiple days. Aggregate speedup is in the 4x range when half of instructions are loads/stores and use of the functional I$ is allowed. I didn't write down the measurements for the case where use of the I$ is disallowed, but it's at least another 2x on top of that.
Several commingled improvements, but no API changes: