AOS-RISC-V: Towards Always-On Heap Memory Safety
Figure 2: pacma instruction using QARMA to generate a PAC.

Figure 3: Data-pointer signing proposed in AOS [10].

Table 1: Code examples in C, LLVM IR, and assembly code.

Code             Examples
C code           char *ptr = (char *) malloc(10);
LLVM IR code     %3 = call noalias i8* @malloc(i64 10) #3
(frontend)       %4 = call i8* @llvm.aos.pacma.p0i8(i8* %3, i64 0)
                 %5 = call i8* @llvm.aos.bndstr.p0i8(i8* %4, i64 10)
Assembly code    call malloc@plt
(backend)        pacma a0, a0, a1
                 bndstr a0, a0, a1

Figure 4: Memory check unit (MCU) structure. (Diagram labels: load/store and bndstr/bndclr inputs; a pointer layout with the PAC in bits 63 to VA_SIZE and the pointer address in bits VA_SIZE-1 to 0; a "PAC != 0?" test in the MCU selecting between "perform bounds checking" and "no bounds checking"; load-store unit (LSU); L1 cache.)
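The labels recovered from Figure 4 suggest a pointer layout with the PAC in the upper bits and a simple PAC != 0 test that decides whether an access is bounds-checked. A minimal sketch of that layout in C follows. The value VA_SIZE = 48 and all helper names (aos_embed_pac, aos_get_pac, aos_get_addr, aos_needs_bounds_check) are assumptions made for illustration; the real pacma instruction computes the PAC with QARMA, which is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 48-bit virtual addresses, leaving 16 upper bits for the PAC.
 * The source only names the boundary VA_SIZE; this value is illustrative. */
#define VA_SIZE   48u
#define ADDR_MASK ((UINT64_C(1) << VA_SIZE) - 1)
#define PAC_MAX   ((UINT64_C(1) << (64u - VA_SIZE)) - 1)

/* Hypothetical helper: place a given PAC in bits 63..VA_SIZE of a pointer. */
static uint64_t aos_embed_pac(uint64_t ptr, uint64_t pac)
{
    assert(pac <= PAC_MAX);
    return (ptr & ADDR_MASK) | (pac << VA_SIZE);
}

/* Hypothetical helper: extract the PAC field (bits 63..VA_SIZE). */
static uint64_t aos_get_pac(uint64_t signed_ptr)
{
    return signed_ptr >> VA_SIZE;
}

/* Hypothetical helper: extract the address field (bits VA_SIZE-1..0). */
static uint64_t aos_get_addr(uint64_t signed_ptr)
{
    return signed_ptr & ADDR_MASK;
}

/* Decision shown in Figure 4: only accesses through signed pointers
 * (PAC != 0) are bounds-checked; unsigned pointers bypass the check. */
static bool aos_needs_bounds_check(uint64_t ptr)
{
    return aos_get_pac(ptr) != 0;
}
```

Under these assumptions, a pointer returned by malloc() and then signed carries a nonzero PAC and is routed to bounds checking, while stack or global pointers, which are never signed, take the unchecked path.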
3.2 Memory Check Unit (MCU)
To process new instructions, AOS adds a memory check unit (MCU) responsible for bounds checking and metadata management. Being located next to a load-store unit (LSU), the MCU takes all memory instructions, i.e., loads and stores, as well as bounds instructions, i.e., bndstr and bndclr. Depending on the instruction type, the MCU generates memory requests to load or store bounds from the HBT in memory. Since the HBT is indexed by PACs (embedded in memory addresses), the locations of bounds are calculated using the base address of the HBT and PACs, i.e., HBT[PAC].
To adapt the MCU design to a real processor design, we decide to break it into two separate queues, namely a memory check queue (MCQ) and a bounds queue (BDQ). This design choice is based on the following two observations: 1) only bounds instructions require the bounds metadata field and 2) the number of bounds instructions is much smaller than the number of memory instructions, so the MCU fills up mostly with memory instructions, causing backpressure to the issue stage. For better resource utilization and performance, we choose to size the MCQ sufficiently large such that it can hold as many inflight memory instructions as possible and size the BDQ reasonably small.

3.3 Compiler Support
Since the LLVM 9.0.0 release, the RISC-V target is no longer experimental, and the backend supports full codegen for the RV32I and RV64I base RISC-V instruction set variants. As such, we add new compiler passes to the optimizer and the RISC-V backend in LLVM 9.0.1 [11]. First, the aos-riscv-opt optimizer pass is designed to detect dynamic memory allocation and deallocation calls and insert new intrinsic functions at the LLVM intermediate representation (IR) level. The inserted intrinsic functions are detected at the aos-riscv backend pass and are replaced with new instructions, as shown in Table 1. Additionally, the aos-riscv-opt pass inserts a custom system call at the program entry, which enables the AOS mode and configures hardware registers. We introduce more details of the custom system call in Section 3.4.

Table 2: New control and status registers in AOS-RISC-V.

CSR Name         Permission   Description
enableAOS        R/W          Switch to enable AOS-RISC-V
baseAddrOfHBT    R/W          Base address of an HBT
numWaysOfHBT     R/W          Number of ways of an HBT
numBndstrFails   R/W          Number of bounds-store failures
numBndclrFails   R/W          Number of bounds-clear failures
numBndchkFails   R/W          Number of bounds-check failures

[...] of an HBT is stored in the baseAddrOfHBT CSR, and the number of ways of an HBT is configured via the numWaysOfHBT CSR. To interface with such CSRs, we provide a custom system call, namely __aos_set(), that is inserted into the entry of a user program at compilation and sets the CSRs with given argument values. Besides those configuration CSRs, we also add extra CSRs to count the number of failures of bounds operations: numBndstrFails, numBndclrFails, and numBndchkFails. After a program terminates, the kernel reads and prints those CSRs to let a user know whether any bounds operation has failed during program execution.
Process management. Besides the hardware configuration through our system call, the kernel needs to keep track of the information of each user process. To do so, we add new fields to the process structure in the Linux kernel, i.e., task_struct. Those fields are initialized upon process creation and are set by our custom system call. During a context switch, if the current process is enabled with AOS-RISC-V, the kernel saves its configuration information in the process structure, including the base address and the number of ways of the HBT assigned to the process. Then, the kernel checks if the next process to execute is also enabled with AOS-RISC-V. If so, the kernel overwrites the CSRs with the configuration information of the next process before it begins its execution. Otherwise, the kernel initializes the CSRs with zero to disable AOS features for the next process.
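The context-switch handling described under process management can be sketched as follows. This is a software model under stated assumptions, not the kernel patch itself: the CSRs are represented by a plain struct rather than accessed with CSR instructions, and the type and field names (aos_csrs, aos_task, aos_enabled, hbt_base, hbt_ways) are hypothetical, since the text says only that new fields are added to task_struct without naming them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Software model of the configuration CSRs listed in Table 2.
 * On real hardware these would be read and written with CSR instructions. */
struct aos_csrs {
    uint64_t enableAOS;
    uint64_t baseAddrOfHBT;
    uint64_t numWaysOfHBT;
};

/* Hypothetical per-process fields; the source does not name the fields
 * it adds to task_struct. */
struct aos_task {
    bool     aos_enabled;
    uint64_t hbt_base;
    uint64_t hbt_ways;
};

static void aos_context_switch(struct aos_csrs *csrs,
                               struct aos_task *prev,
                               const struct aos_task *next)
{
    assert(csrs && prev && next);

    /* Save the outgoing process's configuration if it uses AOS-RISC-V. */
    if (prev->aos_enabled) {
        prev->hbt_base = csrs->baseAddrOfHBT;
        prev->hbt_ways = csrs->numWaysOfHBT;
    }

    if (next->aos_enabled) {
        /* Overwrite the CSRs with the incoming process's configuration. */
        csrs->enableAOS     = 1;
        csrs->baseAddrOfHBT = next->hbt_base;
        csrs->numWaysOfHBT  = next->hbt_ways;
    } else {
        /* Zero the CSRs so AOS features are disabled for the next process. */
        csrs->enableAOS     = 0;
        csrs->baseAddrOfHBT = 0;
        csrs->numWaysOfHBT  = 0;
    }
}
```

Zeroing the CSRs on the non-AOS path keeps one process's HBT base address from remaining visible to, or being used on behalf of, a process that never called __aos_set().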
Figure 5 illustrates the normalized execution time of AOS-RISC-V across the SPEC 2006 workloads. Most benchmarks show moderate runtime overhead (20% on average) in our evaluation. Our analysis reveals that the performance overhead is mainly derived from 1) increased cache port contention due to additional memory accesses for bounds checking and 2) cache pollution due to the extra bounds metadata. As the number of memory accesses requiring bounds checking increases, the increased cache port contention can delay regular memory accesses, slowing down normal program execution. In addition, as the memory footprint of bounds metadata increases, useful cache lines that could have been accessed by regular memory accesses in the near future can be evicted from caches, leading to increased memory latency for subsequent accesses.
Notably, we observe that sjeng has near-zero runtime overhead. As shown in Figure 6, the ratio of signed loads and stores over the total memory accesses is only 1%. This result indicates that only 1% of all memory accesses require bounds checking, and 99% of memory accesses do not cause extra overhead. In contrast, bzip2 and hmmer exhibit high ratios of signed memory accesses, close to 95% and 56%, respectively. Table 4 shows the number of signing and bounds instructions executed. Note that we insert a pacma and a bndstr after each malloc() and a bndclr and xpacm before each free(), as shown in Figure 3. While most applications execute a marginal number of additional instructions, hmmer is shown to be the most malloc-intensive application among the evaluated applications.

Figure 6: The ratio of signed loads and stores requiring bounds checking over the total memory accesses. (Bar chart; y-axis: Ratio (%); benchmarks: bzip2, gobmk, hmmer, sjeng, libquantum.)

6 DISCUSSION
In our current design, we observe a higher runtime overhead than that of AOS. This discrepancy seems to be caused by the limited data fetch width supported in the BOOM core. AOS assumes the data fetch width of 64 bytes supported in modern processors, and therefore up to eight sets of bounds metadata can be brought into the CPU pipeline with a single memory request. However, our baseline BOOM core supports at most 8-byte data fetch width, so more bounds access requests need to be generated during iterative bounds search in the HBT.

7 FUTURE WORK
AOS-RISC-V is currently under active development, and we leave several tasks as future work.
Dynamic bounds-table resizing. In AOS, the set-associative HBT structure is introduced to handle possible PAC collisions and to accommodate multiple bounds metadata for each PAC. Nevertheless, the HBT can still overflow when a certain application creates numerous bounds metadata at runtime. AOS addresses this concern by adopting the dynamic bounds-table resizing method. In our current design, we only allocate a fixed-size HBT and leave the implementation of dynamic bounds-table resizing as future work.
Exception handling for bounds-operation failure. To support precise debugging or promptly prevent malicious attacks at runtime, a new class of exception would need to be defined to alert the user of a memory safety violation. Currently, we only count the number of bounds-operation failures and report the number after a user process is terminated.
Enhancing security guarantees. As mentioned, AOS considers heap exploitation the most prevalent and problematic attack vector and focuses on heap memory safety. To achieve complete memory safety, more sophisticated compiler techniques could be
