AOS-RISC-V: Towards Always-On Heap Memory Safety
Figure 2: pacma instruction using QARMA to generate a PAC.

Figure 3: Data-pointer signing proposed in AOS [10]. (Pointer layout labels: bit positions 63, VA_SIZE-1, and 0; fields PAC and Pointer Addr.)

Figure 4: Memory check unit (MCU) structure. (Diagram labels: Load/store, bndstr/bndclr, Memory Check Unit (MCU), Load-Store Unit (LSU), L1 Cache, "!= 0?", Perform bounds checking, No bounds checking.)

Table 1: Code examples in C, LLVM IR, and assembly code.

Code                       Examples
C code                     char *ptr = (char *) malloc(10);
LLVM IR code (frontend)    %3 = call noalias i8* @malloc(i64 10) #3
                           %4 = call i8* @llvm.aos.pacma.p0i8(i8* %3, i64 0)
                           %5 = call i8* @llvm.aos.bndstr.p0i8(i8* %4, i64 10)
Assembly code (backend)    call malloc@plt
                           pacma a0, a0, a1
                           bndstr a0, a0, a1
3.2 Memory Check Unit (MCU)
To process new instructions, AOS adds a memory check unit (MCU) responsible for bounds checking and metadata management. Being located next to a load-store unit (LSU), the MCU takes all memory instructions, i.e., loads and stores, as well as bounds instructions, i.e., bndstr and bndclr. Depending on the instruction type, the MCU generates memory requests to load or store bounds from the HBT in memory. Since the HBT is indexed by PACs (embedded in memory addresses), the locations of bounds are calculated using the base address of the HBT and PACs, i.e., HBT[PAC].

Table 2: New control and status registers in AOS-RISC-V.

CSR Name          Permission   Description
enableAOS         R/W          Switch to enable AOS-RISC-V
baseAddrOfHBT     R/W          Base address of an HBT
numWaysOfHBT      R/W          Number of ways of an HBT
numBndstrFails    R/W          Number of bounds-store failures
numBndclrFails    R/W          Number of bounds-clear failures
numBndchkFails    R/W          Number of bounds-check failures
To adapt the MCU design to a real processor design, we decided to split it into two separate queues, namely a memory check queue (MCQ) and a bounds queue (BDQ). This design choice is based on the following two observations: 1) only bounds instructions require the bounds metadata field and 2) the number of bounds instructions is much smaller than the number of memory instructions, so the MCU mostly fills up with memory instructions, causing backpressure to the issue stage. For better resource utilization and performance, we size the MCQ sufficiently large that it can hold as many in-flight memory instructions as possible and size the BDQ reasonably small.

3.3 Compiler Support
As of the LLVM 9.0.0 release, the RISC-V target is no longer experimental, and the backend supports full codegen for the RV32I and RV64I base RISC-V instruction set variants. As such, we add new compiler passes to the optimizer and the RISC-V backend in LLVM 9.0.1 [11]. First, the aos-riscv-opt optimizer pass is designed to detect dynamic memory allocation and deallocation calls and insert new intrinsic functions at the LLVM intermediate representation (IR) level. The inserted intrinsic functions are detected by the aos-riscv backend pass and are replaced with new instructions, as shown in Table 1. Additionally, the aos-riscv-opt pass inserts a custom system call at the program entry, which enables the AOS mode and configures hardware registers. We introduce more details of the custom system call in Section 3.4.

The base address of an HBT is stored in the baseAddrOfHBT CSR, and the number of ways of an HBT is configured via the numWaysOfHBT CSR. To interface with these CSRs, we provide a custom system call, namely __aos_set(), that is inserted at the entry of a user program at compilation and sets the CSRs with the given argument values. Besides the CSRs for configuration, we also add extra CSRs to count the number of failures of bounds operations: numBndstrFails, numBndclrFails, and numBndchkFails. After a program terminates, the kernel reads and prints those CSRs to let the user know whether any bounds operation has failed during program execution.

Process management. Besides the hardware configurations through our system call, the kernel needs to keep track of the information of each user process. To do so, we add new fields to the process structure in the Linux kernel, i.e., task_struct. Those fields are initialized upon process creation and are properly set by our custom system call. During a context switch, if the current process is enabled with AOS-RISC-V, the kernel saves its configuration information in the process structure, including the base address and the number of ways of an HBT assigned to the process. Then, the kernel checks if the next process to execute is also enabled with AOS-RISC-V. If so, the kernel overwrites the CSRs with the configuration information of the next process before it begins its execution. Otherwise, the kernel initializes the CSRs with zero to disable AOS features for the next process.
5 EVALUATION
Figure 5 illustrates the normalized execution time of AOS-RISC-V across the SPEC 2006 workloads. Most benchmarks show moderate runtime overhead (20% on average) in our evaluation. Our analysis reveals that the performance overhead mainly derives from 1) increased cache port contention due to additional memory accesses for bounds checking and 2) cache pollution due to the extra bounds metadata. As the number of memory accesses requiring bounds checking increases, the increased cache port contention can delay regular memory accesses, slowing down normal program execution. In addition, as the memory footprint of bounds metadata increases, useful cache lines that could have been accessed by regular memory accesses in the near future can be evicted from caches, leading to increased memory latency for subsequent accesses.

Figure 6: The ratio of signed loads and stores requiring bounds checking over the total memory accesses. (Y-axis: Ratio (%); x-axis: bzip2, gobmk, hmmer, sjeng, libquantum.)

Notably, we observe that sjeng has near-zero runtime overhead. As shown in Figure 6, the ratio of signed loads and stores over the total memory accesses for sjeng is only 1%. This result indicates that only 1% of all memory accesses require bounds checking, and 99% of memory accesses do not cause extra overhead. In contrast, bzip2 and hmmer exhibit high ratios of signed memory accesses, close to 95% and 56%, respectively. Table 4 shows the number of signing and bounds instructions executed. Note that we insert a pacma and a bndstr after each malloc() and a bndclr and an xpacm before each free(), as shown in Figure 3. While most applications execute a marginal number of additional instructions, hmmer is shown to be the most malloc-intensive application among the evaluated applications.

6 DISCUSSION
In our current design, we observe a higher runtime overhead than that of AOS. This discrepancy seems to be caused by the limited data fetch width supported in the BOOM core. AOS assumes the data fetch width of 64 bytes supported in modern processors, and therefore up to eight sets of bounds metadata can be brought into the CPU pipeline with a single memory request. However, our baseline BOOM core supports at most an 8-byte data fetch width, so more bounds access requests need to be generated during iterative bounds search in the HBT.

7 FUTURE WORK
AOS-RISC-V is currently under active development, and we leave several tasks as future work.

Dynamic bounds-table resizing. In AOS, the set-associative HBT structure is introduced to handle possible PAC collisions and to accommodate multiple bounds metadata for each PAC. Nevertheless, the HBT can still overflow when an application creates numerous bounds metadata at runtime. AOS addresses this concern by adopting the dynamic bounds-table resizing method. In our current design, we only allocate a fixed-size HBT and leave the implementation of dynamic bounds-table resizing as future work.

Exception handling for bounds-operation failure. To support precise debugging or promptly prevent malicious attacks at runtime, a new class of exception would need to be defined to alert the user of a memory safety violation. Currently, we only count the number of bounds-operation failures and report the number after a user process is terminated.

Enhancing security guarantees. As mentioned, AOS considers heap exploitation the most prevalent and problematic attack vector and focuses on heap memory safety. To achieve complete memory safety, more sophisticated compiler techniques could be
developed to extend the security coverage to other memory types, such as stack and global memory.

Table 4: Number of additional signing and bounds instructions executed.

Name         pacma   xpacm   bndstr   bndclr
bzip2        28      24      24       24
gobmk        4181    4172    4181     4172
hmmer        90138   90138   90138    90138
sjeng        4       0       4        0
libquantum   95      95      95       95

8 CONCLUSIONS
In this paper, we presented AOS-RISC-V, a full-stack memory safety framework. Based on the open-source RISC-V BOOM core, we prototyped a full-system framework for heap memory safety, with modifications encompassing architecture, compiler, and OS support. Under the Linux kernel running on Amazon EC2 F1 instances, we conducted a performance evaluation and showed that AOS-RISC-V incurred a 20% average slowdown across the selected SPEC 2006 workloads.

REFERENCES
[1] Periklis Akritidis, Manuel Costa, Miguel Castro, and Steven Hand. 2009. Baggy Bounds Checking: An Efficient and Backwards-Compatible Defense against out-of-Bounds Errors. In Proceedings of the 18th USENIX Security Symposium (Security) (Montreal, Canada). USENIX Association, USA, 51–66.
[2] Krste Asanović, Rimas Avizienis, Jonathan Bachrach, Scott Beamer, David Biancolin, Christopher Celio, Henry Cook, Daniel Dabbelt, John Hauser, Adam Izraelevitz, Sagar Karandikar, Ben Keller, Donggyu Kim, John Koenig, Yunsup Lee, Eric Love, Martin Maas, Albert Magyar, Howard Mao, Miquel Moreto, Albert Ou, David A. Patterson, Brian Richards, Colin Schmidt, Stephen Twigg, Huy Vo, and Andrew Waterman. 2016. The Rocket Chip Generator. Technical Report UCB/EECS-2016-17. EECS Department, University of California, Berkeley. http://www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-17.html
[3] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood. 2011. The Gem5 Simulator. SIGARCH Computer Architecture News 39, 2 (Aug. 2011), 1–7.
[4] Baozeng Ding, Yeping He, Yanjun Wu, Alex Miller, and John Criswell. 2012. Baggy Bounds with Accurate Checking. In 2012 IEEE 23rd International Symposium on Software Reliability Engineering Workshops. IEEE, 195–200. https://doi.org/10.1109/ISSREW.2012.24
[5] Gregory J. Duck and Roland H. C. Yap. 2018. EffectiveSan: Type and Memory Error Detection Using Dynamically Typed C/C++. In Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (Philadelphia, PA, USA). Association for Computing Machinery, New York, NY, USA, 181–195.
[6] Google. 2017. Google Queue Hardening. https://security.googleblog.com/2019/05/queue-hardening-enhancements.html.
[7] John L. Henning. 2006. SPEC CPU2006 Benchmark Descriptions. SIGARCH Computer Architecture News 34, 4 (Sept. 2006), 1–17.
[8] Mohamed Tarek Ibn Ziad, Miguel A. Arroyo, Evgeny Manzhosov, Ryan Piersma, and Simha Sethumadhavan. 2021. No-FAT: Architectural Support for Low Overhead Memory Safety Checks. In Proceedings of the 48th Annual International Symposium on Computer Architecture (ISCA) (Virtual Event, Spain). IEEE Press, Piscataway, NJ, USA, 916–929. https://doi.org/10.1109/ISCA52012.2021.00076
[9] Sagar Karandikar, Howard Mao, Donggyu Kim, David Biancolin, Alon Amid, Dayeol Lee, Nathan Pemberton, Emmanuel Amaro, Colin Schmidt, Aditya Chopra, Qijing Huang, Kyle Kovacs, Borivoje Nikolic, Randy Katz, Jonathan Bachrach, and Krste Asanović. 2018. FireSim: FPGA-accelerated Cycle-exact Scale-out System Simulation in the Public Cloud. In Proceedings of the 45th Annual International Symposium on Computer Architecture (ISCA) (Los Angeles, California). IEEE Press, Piscataway, NJ, USA, 29–42. https://doi.org/10.1109/ISCA.2018.00014
[10] Yonghae Kim, Jaekyu Lee, and Hyesoon Kim. 2020. Hardware-based Always-on Heap Memory Safety. In Proceedings of the 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). IEEE Computer Society, Los Alamitos, CA, 1153–1166.
[11] Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization (CGO): Feedback-Directed and Runtime Optimization (Palo Alto, California). IEEE Computer Society, USA, 75–86.
[12] Michael LeMay, Joydeep Rakshit, Sergej Deutsch, David M. Durham, Santosh Ghosh, Anant Nori, Jayesh Gaur, Andrew Weiler, Salmin Sultana, Karanvir Grewal, and Sreenivas Subramoney. 2021. Cryptographic Capability Computing. In Proceedings of the 54th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (Virtual Event, Greece). Association for Computing Machinery, New York, NY, USA, 253–267. https://doi.org/10.1145/3466752.3480076
[13] Matt Miller. 2019. Trends, challenges, and strategic shifts in the software vulnerability mitigation landscape. https://github.com/microsoft/MSRC-Security-Research/blob/master/presentations/2019_02_BlueHatIL/2019_01%20-%20BlueHatIL%20-%20Trends%2C%20challenge%2C%20and%20shifts%20in%20software%20vulnerability%20mitigation.pdf.
[14] Santosh Nagarakatte, Jianzhou Zhao, Milo M.K. Martin, and Steve Zdancewic. 2009. SoftBound: Highly Compatible and Complete Spatial Memory Safety for C. In Proceedings of the 30th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI) (Dublin, Ireland). Association for Computing Machinery, New York, NY, USA, 245–258. https://doi.org/10.1145/1542476.1542504
[15] J. Newsome and D. Song. 2005. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. In NDSS. The Internet Society, USA.
[16] Hiroshi Sasaki, Miguel A. Arroyo, M. Tarek Ibn Ziad, Koustubha Bhat, Kanad Sinha, and Simha Sethumadhavan. 2019. Practical Byte-Granular Memory Blacklisting Using Califorms. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (Columbus, OH, USA). Association for Computing Machinery, New York, NY, USA, 558–571. https://doi.org/10.1145/3352460.3358299
[17] Konstantin Serebryany, Derek Bruening, Alexander Potapenko, and Dmitriy Vyukov. 2012. AddressSanitizer: A Fast Address Sanity Checker. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference (ATC). USENIX, 309–318.
[18] Blaise Tine, Krishna Praveen Yalamarthy, Fares Elsabbagh, and Kim Hyesoon. 2021. Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics. In MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (Virtual Event, Greece) (MICRO ’21). Association for Computing Machinery, New York, NY, USA, 754–766. https://doi.org/10.1145/3466752.3480128
[19] Andrew Waterman and Krste Asanović. 2019. The RISC-V Instruction Set Manual. https://riscv.org/wp-content/uploads/2019/12/riscv-spec-20191213.pdf.
[20] Nathaniel Wesley Filardo, Brett F. Gutstein, Jonathan Woodruff, Sam Ainsworth, Lucian Paul-Trifu, Brooks Davis, Hongyan Xia, Edward Tomasz Napierala, Alexander Richardson, John Baldwin, David Chisnall, Jessica Clarke, Khilan Gudka, Alexandre Joannou, A. Theodore Markettos, Alfredo Mazzinghi, Robert M. Norton, Michael Roe, Peter Sewell, Stacey Son, Timothy M. Jones, Simon W. Moore, Peter G. Neumann, and Robert N. M. Watson. 2020. Cornucopia: Temporal Safety for CHERI Heaps. In 2020 IEEE Symposium on Security and Privacy (SP). IEEE, Piscataway, NJ, USA, 608–625. https://doi.org/10.1109/SP40000.2020.00098
[21] J. Woodruff, A. Joannou, H. Xia, A. Fox, R. M. Norton, D. Chisnall, B. Davis, K. Gudka, N. W. Filardo, A. T. Markettos, M. Roe, P. G. Neumann, R. N. M. Watson, and S. W. Moore. 2019. CHERI Concentrate: Practical Compressed Capabilities. IEEE Transactions on Computers (TC) 68, 10 (2019), 1455–1469.
[22] Jonathan Woodruff, Robert N.M. Watson, David Chisnall, Simon W. Moore, Jonathan Anderson, Brooks Davis, Ben Laurie, Peter G. Neumann, Robert Norton, and Michael Roe. 2014. The CHERI Capability Model: Revisiting RISC in an Age of Risk. In Proceedings of the 41st Annual International Symposium on Computer Architecture (ISCA) (Minneapolis, Minnesota, USA). IEEE Press, Piscataway, NJ, USA, 457–468.
[23] Hongyan Xia, Jonathan Woodruff, Sam Ainsworth, Nathaniel W. Filardo, Michael Roe, Alexander Richardson, Peter Rugg, Peter G. Neumann, Simon W. Moore, Robert N. M. Watson, and Timothy M. Jones. 2019. CHERIvoke: Characterising Pointer Revocation Using CHERI Capabilities for Temporal Memory Safety. In Proceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (Columbus, OH, USA). Association for Computing Machinery, New York, NY, USA, 545–557.
[24] Shengjie Xu, Wei Huang, and D. Lie. 2021. In-fat pointer: hardware-assisted tagged-pointer spatial memory safety defense with subobject granularity protection. In Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Association for Computing Machinery, New York, NY, USA, 224–240.
[25] Jerry Zhao, Ben Korpan, Abraham Gonzalez, and Krste Asanovic. 2020. SonicBOOM: The 3rd Generation Berkeley Out-of-Order Machine. (May 2020).