Hazard 3
Updated: 2024-Aug-07
Table of Contents
1. Introduction
1.1. Architectural Overview
1.1.1. Pipeline Stages
1.1.2. Bus Interfaces
1.1.3. Multiply/Divide
1.2. List of RISC-V Specifications
2. Configuration and Integration
2.1. Hazard3 Source Files
2.2. Top-level Modules
2.3. FPGA Synthesis
2.4. ASIC Synthesis
2.5. Interfaces (Top-level Ports)
2.5.1. Interfaces Common to All Wrappers
2.5.2. Interfaces for 1-port AHB5 CPU
2.5.3. Interfaces for 2-port AHB5 CPU
2.6. Configuration Parameters
2.6.1. Reset state configuration
2.6.2. Standard RISC-V ISA support
2.6.3. Custom Hazard3 Extensions
2.6.4. CSR support
2.6.5. External interrupt support
2.6.6. Identification Registers
2.6.7. Performance/size options
3. CSRs
3.1. Standard M-mode Identification CSRs
3.1.1. mvendorid
3.1.2. marchid
3.1.3. mimpid
3.1.4. mhartid
3.1.5. mconfigptr
3.1.6. misa
3.2. Standard M-mode Trap Handling CSRs
3.2.1. mstatus
3.2.2. mstatush
3.2.3. medeleg
3.2.4. mideleg
3.2.5. mie
3.2.6. mip
3.2.7. mtvec
3.2.8. mscratch
3.2.9. mepc
3.2.10. mcause
3.2.11. mtval
3.2.12. mcounteren
3.3. Standard Memory Protection CSRs
3.3.1. pmpcfg0…3
3.3.2. pmpaddr0…15
3.4. Standard M-mode Performance Counters
3.4.1. mcycle
3.4.2. mcycleh
3.4.3. minstret
3.4.4. minstreth
3.4.5. mhpmcounter3…31
3.4.6. mhpmcounter3…31h
3.4.7. mcountinhibit
3.4.8. mhpmevent3…31
3.5. Standard Trigger CSRs
3.5.1. tselect
3.5.2. tdata1…3
3.6. Standard Debug Mode CSRs
3.6.1. dcsr
3.6.2. dpc
3.6.3. dscratch0
3.6.4. dscratch1
3.7. Custom Debug Mode CSRs
3.7.1. dmdata0
3.8. Custom Interrupt Handling CSRs
3.8.1. meiea
3.8.2. meipa
3.8.3. meifa
3.8.4. meipra
3.8.5. meinext
3.8.6. meicontext
3.9. Custom Memory Protection CSRs
3.9.1. pmpcfgm0
3.10. Custom Power Control CSRs
3.10.1. msleep
4. Custom Extensions
4.1. Xh3irq: Hazard3 interrupt controller
4.2. Xh3pmpm: M-mode PMP regions
4.3. Xh3power: Hazard3 power management
4.3.1. h3.block
4.3.2. h3.unblock
4.4. Xh3bextm: Hazard3 bit extract multiple
4.4.1. h3.bextm
4.4.2. h3.bextmi
5. Debug
5.1. Debug Topologies
5.2. Implementation-defined behaviour
5.3. Debug Module to Core Interface
Appendix A: Instruction Cycle Counts
A.1. RV32I
A.2. M Extension
A.3. A Extension
A.4. C Extension
A.5. Privileged Instructions (including Zicsr)
A.6. Bit Manipulation
A.7. Zcb Extension
A.8. Zcmp Extension
A.9. Branch Predictor
Chapter 1. Introduction
Hazard3 is a configurable 3-stage RISC-V processor, implementing:
• M: integer multiply/divide/modulo
• C: compressed instructions
• F: Fetch
◦ Predecodes register numbers rs1/rs2, for faster register file read and register bypass
◦ Contains the address match logic for the optional branch predictor
• X: Execute
◦ Contains the read and write ports for the CSR file
◦ The ALU result is valid by the end of stage X
• M: Memory
The instruction fetch address phase is best thought of as residing in stage X. The 2-cycle feedback
loop between jump/branch decode into address issue in stage X, and the fetch data phase in stage F,
is what defines Hazard3’s jump/branch performance.
This document often refers to F, X and M as stages 1, 2 and 3 respectively. This numbering is useful
when describing dependencies between values held in different pipeline stages, as it makes the
direction and distance of the dependency more apparent.
1.1.2. Bus Interfaces
Hazard3 implements either one or two AHB5 bus manager ports. Use the single-port configuration
when ease of integration is a priority, since it supports simpler bus topologies. The dual-port
configuration adds a dedicated port for instruction fetch. Use the dual-port configuration for
maximum frequency and the best clock-for-clock performance.
Hazard3 uses AHB5 specifically, rather than older versions of the AHB standard, because of its
support for global exclusives. This is a bus feature that allows a processor to perform an ordered
read-modify-write sequence with a guarantee that no other processor has written to the same
address range in between. Hazard3 uses this to implement multiprocessor support for the A
(atomics) extension. Single-processor support for the A extension does not require these additional
signals.
AHB5 is one of the two protocols described in the AMBA 5 AHB protocol specification. Its full name
is (perhaps surprisingly) AMBA 5 AHB5. Refer to the protocol specification for more information
about this standard bus protocol.
1.1.3. Multiply/Divide
Set MUL_FAST to instantiate the single-cycle multiplier circuit. The fast multiplier returns results
either to stage 3 or stage 2, depending on the MUL_FASTER parameter.
By default the single-cycle multiplier only supports 32-bit mul, which is by far the most common of
the four multiply instructions. The remaining instructions still execute on the sequential
multiply/divide circuit. Set the MULH_FAST parameter to add single-cycle support for the high-half
instructions (mulh, mulhu and mulhsu), at the cost of additional logic delay and area.
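The difference between mul and the three high-half instructions can be illustrated with a small behavioural model. This is a Python sketch of the RV32M instruction semantics only, not of Hazard3's multiplier circuit; the function names are chosen here for illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def mul(rs1, rs2):
    """Low 32 bits of the product (same result for signed and unsigned)."""
    return ((rs1 & MASK32) * (rs2 & MASK32)) & MASK32


def mulh(rs1, rs2):
    """High 32 bits of signed x signed."""
    return ((to_signed32(rs1) * to_signed32(rs2)) >> 32) & MASK32


def mulhu(rs1, rs2):
    """High 32 bits of unsigned x unsigned."""
    return (((rs1 & MASK32) * (rs2 & MASK32)) >> 32) & MASK32


def mulhsu(rs1, rs2):
    """High 32 bits of signed rs1 x unsigned rs2."""
    return ((to_signed32(rs1) * (rs2 & MASK32)) >> 32) & MASK32
```

The high-half variants need the full 64-bit product with per-operand sign handling, which is consistent with MULH_FAST costing extra logic delay and area.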
The single-cycle multiplier is implemented as a simple * behavioural multiply, so that your tools can
infer the best multiply circuit for your platform. For example, Yosys infers DSP tiles on iCE40 UP5k
FPGAs. The multiplier is a self-contained module (in hdl/arith/hazard3_mul_fast.v), so you can
replace its implementation if you know of a faster or lower-area method for your platform.
Extension Specification
Chapter 2. Configuration and Integration
2.1. Hazard3 Source Files
Hazard3’s source is written in Verilog 2005, and is self-contained. It can be found here:
github.com/Wren6991/Hazard3/blob/stable/hdl. The file hdl/hazard3.f is a list of all the source files
required to instantiate Hazard3.
For more information on the Verilog 2005 language, refer to IEEE 1364-2005 (a PDF can be found
online).
Files ending with .vh are preprocessor include files used by the Hazard3 source. The following two
are particularly noteworthy:
• hazard3_config.vh: the main Hazard3 configuration header. Lists and describes Hazard3’s
global configuration parameters, such as ISA extension support
There are two ways to configure Hazard3 using these two files:
• Directly edit the parameter defaults in hazard3_config.vh in your local Hazard3 checkout (and
then let the top-level parameters default when instantiating Hazard3)
• Set all configuration parameters in your Hazard3 instantiation, and let the parameters
propagate down through the hierarchy
The latter method is recommended for mature projects because it supports multiple distinct
configurations of Hazard3 in the same system (for instance, a high-performance applications core
and a low-area control-plane core). You may find the former method more convenient for quick
hacking on the configuration.
• hazard3_cpu_1port
• hazard3_cpu_2port
These are both thin wrappers around the hazard3_core module. hazard3_cpu_1port has a single
AHB5 bus port which is shared for instruction fetch, loads, stores and AMOs. hazard3_cpu_2port has
two AHB5 bus ports, one for instruction fetch, and the other for loads, stores and AMOs. The 2-port
wrapper has higher potential for performance, but the 1-port wrapper may be simpler to integrate,
since there is no need to arbitrate multiple bus managers externally.
The core module hazard3_core can also be instantiated directly, which may be more efficient if
support for some other bus standard is desired. However, the interface of hazard3_core will not be
documented and is not guaranteed to be stable. By instantiating this module directly you are taking
on the risk that future Hazard3 releases may be incompatible with your integration.
You should synchronise the rst_n reset input externally. An example reset synchroniser is included
in the example SoC file, but the details depend on your FPGA synthesis flow and your platform-level
reset requirements.
It’s recommended to tie clk and clk_always_on to the exact same clock net to conserve global buffer
resources. Clock gating is supported on FPGA, but you must consult your toolchain documentation
for the correct primitives or inference techniques.
When applying the clk_en clock enable signal to the clk input in conjunction with the Xh3power
extension, you must instantiate an external clock gate cell appropriate to your platform (such as an
AND-and-latch type). Do not use a behavioural AND gate to gate the clock.
You must synchronise resets externally according to your STA constraints and your system-level
reset strategy. Hazard3 uses an asynchronous active-low reset internally, but this can be adapted to
other types by inserting an appropriate synchroniser in your core integration.
Width In/Out Name Description
These signals are used in the implementation of internal sleep states as configured by the msleep
CSR. They are used only when the Xh3power extension is enabled.
Width In/Out Name Description
All Debug Module signals should be connected to the signal with the matching name on the
Hazard3 Debug Module implementation.
Width In/Out Name Description
This subordinate bus port allows the standard System Bus Access (SBA) feature of the Debug
Module to share bus access with the core. Alternatively, use the standalone hazard3_sbus_to_ahb
adapter to provide dedicated SBA access to the system bus.
Width In/Out Name Description
Interrupt requests
NUM_IRQS In irq If Xh3irq is not configured, this is the RISC-V external interrupt line (mip.meip),
which you should connect to an external interrupt controller such as a standard RISC-V PLIC. If
Xh3irq is configured, this is a vector of level-sensitive active-high system interrupt requests,
which the core’s internal interrupt controller can route through the mip.meip vector. Tie low if
unused.
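When Xh3irq is not configured and NUM_IRQS is greater than 1, the configuration parameter description states that the external interrupts are simply OR'd into mip.meip. A minimal Python sketch of that reduction follows; the function name is hypothetical and the model ignores the optional input registers.

```python
def meip_without_xh3irq(irq, num_irqs):
    """Model mip.meip when Xh3irq is disabled: OR of all irq input bits.

    irq      -- integer whose bits 0..num_irqs-1 are the irq input vector
    num_irqs -- value of the NUM_IRQS parameter
    """
    mask = (1 << num_irqs) - 1  # only the implemented inputs are considered
    return int((irq & mask) != 0)
```

With this configuration, software cannot tell which input raised the interrupt from mip.meip alone, so an external controller such as a PLIC is expected to provide that information.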
This wrapper (hazard3_cpu_1port) adds a single standard AHB5 manager port. See the AMBA 5 AHB
specification from Arm for definitions of these signals in the context of the bus protocol.
Width In/Out Name Description
1 Out hwrite Driven high for a write transfer, low for a read
transfer.
3 Out hburst Tied off to 0 (SINGLE). Hazard3 does not issue bursts.
1 Out hmastlock Hazard3 does not use legacy bus locking, so this bit is
tied to 0.
1 In hresp Bus error signal. You must generate the complete two-
phase AHB response as per the AHB5 specification.
32 Out hwdata Write data bus. The LSB of the bus is always aligned to
a 4-byte boundary. Hazard3 drives the correct byte
lanes depending on the transfer size and bits 1:0 of
the address. Remaining byte lanes have undefined
contents.
32 In hrdata Read data bus. The LSB of the bus is always aligned to
a 4-byte boundary, so ensure you drive the correct
byte lanes for narrow transfers.
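The byte-lane rule for hwdata and hrdata can be sketched as follows. This Python model assumes naturally aligned transfers and little-endian lane numbering (lane 0 is bits 7:0); both assumptions, and the function names, are illustrative rather than taken from the Hazard3 source.

```python
def active_byte_lanes(addr, hsize):
    """Return a 4-bit mask of the byte lanes used by a transfer.

    hsize follows the AHB encoding: 0 = byte, 1 = halfword, 2 = word.
    Assumes the transfer is naturally aligned.
    """
    nbytes = 1 << hsize
    if nbytes > 4 or addr % nbytes:
        raise ValueError("unsupported size or misaligned address")
    return ((1 << nbytes) - 1) << (addr & 3)


def extract_read_data(hrdata, addr, hsize):
    """Pick the addressed bytes out of a 32-bit hrdata value."""
    nbytes = 1 << hsize
    shift = 8 * (addr & 3)
    return (hrdata >> shift) & ((1 << (8 * nbytes)) - 1)
```

A subordinate that drives read data on the wrong lanes for a byte or halfword access would return the wrong value to the core, even though the word at that address is correct.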
This wrapper (hazard3_cpu_2port) adds two standard AHB5 manager ports, with signals prefixed i_
for instruction and d_ for data. See the AMBA 5 AHB specification from Arm for definitions of these
signals in the context of the bus protocol.
The I port only generates word-aligned word-sized read accesses. It does not use AHB5 exclusives.
When shared System Bus Access (SBA) is used, the SBA bus accesses are routed through the D port.
Port I (Instruction)
3 Out i_hburst Tied off to 0 (SINGLE). Hazard3 does not issue bursts.
1 Out i_hmastlock Hazard3 does not use legacy bus locking, so this bit is
tied to 0.
1 In i_hresp Bus error signal. You must generate the complete two-
phase AHB response as per the AHB5 specification.
32 Out i_hwdata Write data bus. Tied to all-zeroes as this port is read-
only.
Port D (Data)
1 Out d_hwrite Driven high for a write transfer, low for a read
transfer.
3 Out d_hburst Tied off to 0 (SINGLE). Hazard3 does not issue bursts.
1 Out d_hmastlock Hazard3 does not use legacy bus locking, so this bit is
tied to 0.
1 In d_hresp Bus error signal. You must generate the complete two-
phase AHB response as per the AHB5 specification.
32 Out d_hwdata Write data bus. The LSB of the bus is always aligned to
a 4-byte boundary. Hazard3 drives the correct byte
lanes depending on the transfer size and bits 1:0 of
the address. Remaining byte lanes have undefined
contents.
32 In d_hrdata Read data bus. The LSB of the bus is always aligned to
a 4-byte boundary, so ensure you drive the correct
byte lanes for narrow transfers.
RESET_VECTOR
Address of the first instruction executed after Hazard3 comes out of reset.
MTVEC_INIT
Bits clear in MTVEC_WMASK will never change from this initial value. Bits set in MTVEC_WMASK
can be written/set/cleared as normal.
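The interaction between MTVEC_INIT and MTVEC_WMASK can be modelled as a masked register write. The following Python sketch is an illustration of the rule stated above, with a hypothetical function name; it is not derived from the Hazard3 RTL.

```python
def mtvec_write(current, value, wmask):
    """Return the new mtvec value after software writes `value`.

    Bits set in wmask take the written value; bits clear in wmask keep
    their current value, which is their MTVEC_INIT value after reset.
    """
    return ((current & ~wmask) | (value & wmask)) & 0xFFFFFFFF
```

For example, starting from a reset value of 0x20000001 with a write mask of 0xFFFFFFFC, a write of 0x12345678 leaves bits 1:0 at their initial value and updates the rest.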
EXTENSION_A
Support for the A extension: atomic read/modify/write. 0 for disable, 1 for enable.
Default value: 1
EXTENSION_C
Support for the C extension: compressed (variable-width). 0 for disable, 1 for enable.
Default value: 1
EXTENSION_M
Support for the M extension: hardware multiply/divide/modulo. 0 for disable, 1 for enable.
Default value: 1
EXTENSION_ZBA
Support for Zba address generation instructions. 0 for disable, 1 for enable.
Default value: 0
EXTENSION_ZBB
Support for Zbb basic bit manipulation instructions. 0 for disable, 1 for enable.
Default value: 0
EXTENSION_ZBC
Support for Zbc carry-less multiplication instructions. 0 for disable, 1 for enable.
Default value: 0
EXTENSION_ZBS
Support for Zbs single-bit manipulation instructions. 0 for disable, 1 for enable.
Default value: 0
EXTENSION_ZBKB
Requires: EXTENSION_ZBB. (Since Zbb and Zbkb have a large overlap, this flag enables only those
instructions which are in Zbkb but aren’t in Zbb. Therefore both flags must be set for full Zbkb
support.)
Default value: 0
EXTENSION_ZCB
Requires: EXTENSION_C. (Some Zcb instructions also require Zbb or M, as they are 16-bit aliases of
32-bit instructions present in those extensions.)
Note that Zca is equivalent to C here, because Hazard3 does not support the F extension.
Default value: 0
EXTENSION_ZCMP
Requires: EXTENSION_C.
Default value: 0
EXTENSION_ZIFENCEI
Support for the fence.i instruction. When the branch predictor is not present, this instruction is
optional, since a plain branch/jump is sufficient to flush the instruction prefetch queue. When the
branch predictor is enabled (BRANCH_PREDICTOR is 1), this instruction must be implemented.
Default value: 0
EXTENSION_XH3BEXTM
Custom bit manipulation instructions for Hazard3: h3.bextm and h3.bextmi. See Xh3bextm: Hazard3
bit extract multiple.
Default value: 0
EXTENSION_XH3IRQ
Default value: 0
EXTENSION_XH3PMPM
Custom PMPCFGMx CSRs to enforce PMP regions in M-mode without locking. See Xh3pmpm: M-
mode PMP regions.
Default value: 0
EXTENSION_XH3POWER
Custom power management controls for Hazard3. This adds the msleep CSR, and the h3.block and
h3.unblock hint instructions. See Xh3power: Hazard3 power management.
Default value: 0
2.6.4. CSR support
CSR_M_MANDATORY
Bare minimum CSR support e.g. misa. This flag is an absolute requirement for compliance with the
RISC-V privileged specification. However, the privileged specification itself is an optional extension.
Hazard3 allows the mandatory CSRs to be disabled to save a small amount of area in deeply-
embedded implementations.
Default value: 1
CSR_M_TRAP
Default value: 1
CSR_COUNTER
Include the basic performance counters (cycle/instret) and relevant CSRs. Note that these
performance counters are now in their own separate extension (Zicntr) and are no longer
mandatory.
Default value: 0
U_MODE
Support the U (user) privilege level. In U-mode, the core performs unprivileged bus accesses, and
software’s access to CSRs is restricted. Additionally, if the PMP is included, the core may restrict U-
mode software’s access to memory.
Requires: CSR_M_TRAP.
Default value: 0
PMP_REGIONS
Number of physical memory protection regions, or 0 for no PMP. PMP is more useful if U-mode is
supported, but this is not a requirement.
Hazard3’s PMP supports only the NAPOT and (if PMP_GRAIN is 0) NA4 region types.
Requires: CSR_M_TRAP.
Default value: 0
PMP_GRAIN
This is the G parameter in the privileged spec, which defines the granularity of PMP regions.
Minimum PMP region size is 1 << (G + 2) bytes.
If G > 0, pmpcfg.a cannot be set to NA4 (attempting to do so will set the region to OFF instead).
If G > 1, the G - 1 LSBs of pmpaddr are read-only-0 when pmpcfg.a is OFF, and read-only-1 when
pmpcfg.a is NAPOT.
Default value: 0
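The minimum region size formula, and the standard NAPOT address encoding from the RISC-V privileged specification, can be sketched in Python as follows. The function names are hypothetical, and the sketch does not model the read-only LSB behaviour described above.

```python
def pmp_min_region_bytes(g):
    """Minimum PMP region size in bytes for granularity G (PMP_GRAIN)."""
    return 1 << (g + 2)


def napot_pmpaddr(base, size):
    """Encode a naturally aligned power-of-two region as a pmpaddr value.

    size must be a power of two of at least 8 bytes, and base must be
    aligned to size. The number of trailing 1 bits encodes the size.
    """
    if size < 8 or size & (size - 1) or base % size:
        raise ValueError("region must be a naturally aligned power of two >= 8")
    return (base >> 2) | ((size >> 3) - 1)
```

For instance, with PMP_GRAIN set to 3 the smallest region is 32 bytes, so a NAPOT region must be at least that large.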
PMP_HARDWIRED
If a bit is 1, the corresponding region’s pmpaddr and pmpcfg registers are
read-only, with their values fixed when the processor is instantiated. PMP_GRAIN is ignored on
hardwired regions.
Hardwired regions are far cheaper, both in area and comparison delay, than dynamically
configurable regions.
Hardwired PMP regions are a good option for setting default U-mode permissions on regions which
have access controls outside of the processor, such as peripheral regions. For this case it’s
recommended to make hardwired regions the highest-numbered, so they can be overridden by
lower-numbered dynamic regions.
PMP_HARDWIRED_ADDR
Values of pmpaddr registers whose PMP_HARDWIRED bits are set to 1. Has no effect on PMP
regions which are not hardwired.
PMP_HARDWIRED_CFG
Values of pmpcfg registers whose PMP_HARDWIRED bits are set to 1. Has no effect on PMP regions
which are not hardwired.
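The recommendation to make hardwired regions the highest-numbered follows from the standard PMP priority rule: the lowest-numbered matching region decides the access. A simplified Python sketch, assuming regions have already been decoded into hypothetical (base, size, permissions) tuples:

```python
def pmp_lookup(regions, addr):
    """Return (index, perms) of the lowest-numbered region matching addr.

    regions -- list indexed by PMP region number; each entry is a
               (base, size, perms) tuple, with size 0 meaning OFF
    Returns None if no region matches.
    """
    for index, (base, size, perms) in enumerate(regions):
        if size and base <= addr < base + size:
            return index, perms
    return None
```

Placing a broad hardwired default in a high-numbered slot lets software carve out exceptions with lower-numbered dynamic regions, because those are checked first.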
DEBUG_SUPPORT
Support for run/halt and instruction injection from an external Debug Module, support for Debug
Mode, and Debug Mode CSRs.
Default value: 0
BREAKPOINT_TRIGGERS
Requires: DEBUG_SUPPORT
Default value: 0
NUM_IRQS
Number of external IRQs. Minimum 1, maximum 512. Note that if EXTENSION_XH3IRQ
(Hazard3 interrupt controller) is disabled then multiple external interrupts are simply OR’d into
mip.meip.
Default value: 1
IRQ_PRIORITY_BITS
Default value: 0
IRQ_INPUT_BYPASS
Disable the input registers on the external interrupts, to reduce latency by one cycle. Can be applied
on an IRQ-by-IRQ basis.
MVENDORID_VAL
Value of the mvendorid CSR. JEDEC JEP106-compliant vendor ID, or all-zeroes. 31:7 is continuation
code count, 6:0 is ID. Parity bit is not stored.
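The field layout described for MVENDORID_VAL can be assembled with a short helper. This Python sketch uses a hypothetical function name and takes the JEP106 ID byte as it appears in the JEDEC list, stripping the parity bit (bit 7) because it is not stored.

```python
def mvendorid_val(continuation_count, jep106_id_byte):
    """Pack a JEP106 vendor ID into the mvendorid layout.

    continuation_count -- number of 0x7F continuation codes (bits 31:7)
    jep106_id_byte     -- final ID byte; its parity bit is discarded
    """
    if not 0 <= continuation_count < (1 << 25):
        raise ValueError("continuation count does not fit in bits 31:7")
    return (continuation_count << 7) | (jep106_id_byte & 0x7F)
```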
MIMPID_VAL
Value of the mimpid CSR. Implementation ID for this specific version of Hazard3. Should be a git
hash, or all-zeroes.
Default value: all-zeroes.
MHARTID_VAL
Value of the mhartid CSR. Each Hazard3 core has a single hardware thread. Multiple cores should
have unique IDs.
MCONFIGPTR_VAL
Value of the mconfigptr CSR. Pointer to configuration structure blob, or all-zeroes. Must be at least
4-byte-aligned.
REDUCED_BYPASS
Remove all forwarding paths except X→X (so back-to-back ALU ops can still run at 1 CPI), to save
area. This has a significant impact on per-clock performance, so should only be considered for
extremely low-area implementations.
Default value: 0
MULDIV_UNROLL
Default value: 1
MUL_FAST
Use single-cycle multiply circuit for MUL instructions, retiring to stage 3. The sequential
multiply/divide circuit is still used for MULH*.
Default value: 0
MUL_FASTER
Retire fast multiply results to stage 2 instead of stage 3. Throughput is the same, but latency is
reduced from 2 cycles to 1 cycle.
Requires: MUL_FAST.
Default value: 0
MULH_FAST
Extend the fast multiply circuit to also cover MULH*, and remove the multiply functionality from
the sequential multiply/divide circuit.
Requires: MUL_FAST
Default value: 0
FAST_BRANCHCMP
Instantiate a separate comparator (eq/lt/ltu) for branch comparisons, rather than using the ALU.
Improves fetch address delay, especially if Zba extension is enabled. Disabling may save area.
Default value: 1
RESET_REGFILE
Whether to support reset of the general purpose registers. There are around 1k bits in the register
file, so the reset can be disabled e.g. to permit block-RAM inference on FPGA.
Default value: 1
BRANCH_PREDICTOR
Enable branch prediction. The branch predictor consists of a single BTB entry which is allocated on
a taken backward branch, and cleared on a mispredicted nontaken branch, a fence.i or a trap.
Successful prediction eliminates the 1-cycle fetch bubble on a taken branch, usually making tight
loops faster.
Requires: EXTENSION_ZIFENCEI
Default value: 0
MTVEC_WMASK
Mask of which bits in mtvec are writable. Full writability (except for bit 1) is
recommended, because a common idiom in setup code is to set mtvec just past code that may trap,
as a hardware try {…} catch block.
• The vectoring mode can be made fixed by clearing the LSB of MTVEC_WMASK
• In vectored mode, the vector table must be aligned to its size, rounded up to a power of two.
Chapter 3. CSRs
The RISC-V privileged specification affords flexibility as to which CSRs are implemented, and how
they behave. This section documents the concrete behaviour of Hazard3’s standard and
nonstandard M-mode CSRs, as implemented.
All CSRs are 32-bit; MXLEN is fixed at 32 bits on Hazard3. All CSR addresses not listed in this section
are unimplemented. Accessing an unimplemented CSR will cause an illegal instruction exception
(mcause = 2). This includes all U-mode and S-mode CSRs.
3.1.1. mvendorid
Address: 0xf11
Vendor identifier. Read-only, configurable constant. Should contain either all-zeroes, or a valid
JEDEC JEP106 vendor ID using the encoding in the RISC-V specification.
31:7 bank The number of continuation codes in the vendor JEP106 ID. One less
than the JEP106 bank number.
6:0 offset Vendor ID within the specified bank. The parity bit (MSB of the JEP106 ID byte) is not stored.
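The field layout above can be illustrated with a small C sketch that packs a JEP106 identity into an mvendorid value, for example when choosing MVENDORID_VAL. The helper name and its argument conventions are assumptions made for this example and are not part of Hazard3: bank is taken to be the 1-based JEP106 bank number, and id_byte the JEP106 ID byte including its parity bit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: build an mvendorid value from a JEP106 identity.
 * bank:    1-based JEP106 bank number (bank 1 has no continuation codes).
 * id_byte: JEP106 ID byte; bit 7 (parity) is discarded. */
static inline uint32_t mvendorid_encode(uint32_t bank, uint8_t id_byte)
{
    assert(bank >= 1);
    uint32_t continuation_count = bank - 1;   /* goes in mvendorid[31:7] */
    uint32_t offset = id_byte & 0x7fu;        /* goes in mvendorid[6:0]  */
    assert(continuation_count < (1u << 25));  /* must fit in 25 bits     */
    return (continuation_count << 7) | offset;
}
```

Under these assumptions, an ID byte of 0x89 in bank 10 encodes as 0x489: nine continuation codes in the bank field, and 0x09 in the offset field once the parity bit is dropped.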
3.1.2. marchid
Address: 0xf12
31 - 0: Open-source implementation
3.1.3. mimpid
Address: 0xf13
Bits Name Description
31:0 - Should contain the git hash of the Hazard3 revision from which the
processor was synthesised, or all-zeroes.
3.1.4. mhartid
Address: 0xf14
31:0 - Hazard3 cores possess only one hardware thread, so this is a unique
per-core identifier, assigned consecutively from 0.
3.1.5. mconfigptr
Address: 0xf15
3.1.6. misa
Address: 0x301
Read-only, constant. Value depends on which ISA extensions Hazard3 is configured with. The table
below lists the fields which are not always hardwired to 0:
3.2.1. mstatus
Address: 0x300
The table below lists the fields which are not hardwired to 0:
12:11 mpp Previous privilege level. If U-mode is supported, this register can
store the values 3 (M-mode) or 0 (U-mode). Otherwise, only 3 (M-
mode). If another value is written, hardware rounds to the nearest
supported mode.
3.2.2. mstatush
Address: 0x310
Hardwired to 0.
3.2.3. medeleg
Address: 0x302
Unimplemented, as neither U-mode traps nor S-mode are supported. Access will cause an illegal
instruction exception.
3.2.4. mideleg
Address: 0x303
Unimplemented, as neither U-mode traps nor S-mode are supported. Access will cause an illegal
instruction exception.
3.2.5. mie
Address: 0x304
Interrupt enable register. Not to be confused with mstatus.mie, which is a global enable, having the
final say in whether any interrupt which is both enabled in mie and pending in mip will actually
cause the processor to transfer control to a handler.
The table below lists the fields which are not hardwired to 0:
NOTE: RISC-V reserves bits 16+ of mie/mip for platform use, which Hazard3 could use for
external interrupt control. On RV32I this could only control 16 external interrupts,
so Hazard3 instead adds nonstandard interrupt enable registers starting at meiea,
and keeps the upper half of mie reserved.
3.2.6. mip
Address: 0x344
NOTE: The RISC-V specification lists mip as a read-write register, but the bits which are
writable correspond to lower privilege modes (S- and U-mode) which are not
implemented on Hazard3, so it is documented here as read-only.
The table below lists the fields which are not hardwired to 0:
3.2.7. mtvec
Address: 0x305
Trap vector base address. Read-write. Exactly which bits of mtvec can be modified (possibly none) is
configurable when instantiating the processor, but by default the entire register is writable. The
reset value of mtvec is also configurable.
31:2 base Base address for trap entry. In Vectored mode, this is OR’d with the
trap offset to calculate the trap entry address, so the table must be
aligned to its total size, rounded up to a power of 2. In Direct mode,
base is word-aligned.
NOTE: In the RISC-V specification, mode is a 2-bit write-any read-legal field in bits 1:0.
Hazard3 implements this by hardwiring bit 1 to 0.
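The alignment requirement on base follows from the use of OR rather than addition. The C model below is a sketch of the trap entry calculation, not Hazard3's RTL. It assumes the standard RISC-V rule that only interrupts are vectored while exceptions always enter at base, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of trap entry address selection from an mtvec value.
 * Assumption: exceptions always enter at base; interrupts are vectored
 * only when mtvec bit 0 (mode) is set. */
static inline uint32_t trap_entry_addr(uint32_t mtvec, bool is_interrupt,
                                       uint32_t cause)
{
    assert(cause < 16);                   /* mie/mip bits 16+ are reserved */
    uint32_t base = mtvec & ~0x3u;        /* bits 31:2 */
    bool vectored = (mtvec & 0x1u) != 0;  /* bit 0; bit 1 is hardwired to 0 */
    if (vectored && is_interrupt)
        return base | (cause << 2);       /* OR, not add */
    return base;
}
```

With a base of 0x1000 and cause 11, the entry address is 0x102c. With a misaligned base of 0x1020, the OR also produces 0x102c rather than 0x104c, which is why the table must be aligned to its size rounded up to a power of two.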
3.2.8. mscratch
Address: 0x340
Read-write 32-bit register. No specific hardware function — available for software to swap with a
register when entering a trap handler.
3.2.9. mepc
Address: 0x341
Exception program counter. When entering a trap, the current value of the program counter is
recorded here. When executing an mret, the processor jumps to mepc. Can also be read and written
by software.
On Hazard3, bits 31:2 of mepc are capable of holding all 30-bit values. Bit 1 is writable only if the C
extension is implemented, and is otherwise hardwired to 0. Bit 0 is hardwired to 0, as per the
specification.
All traps on Hazard3 are precise. For example, a load/store bus error will set mepc to the exact
address of the load/store instruction which encountered the fault.
3.2.10. mcause
Address: 0x342
Exception cause. Set when entering a trap to indicate the reason for the trap. Readable and writable
by software.
NOTE: On Hazard3, most bits of mcause are hardwired to 0. Only bit 31, and enough
least-significant bits to index all exception and all interrupt causes (at least four bits), are
backed by registers. Only these bits are writable; the RISC-V specification only
requires that mcause be able to hold all legal cause values.
The most significant bit of mcause is set to 1 to indicate an interrupt cause, and 0 to indicate an
exception cause. The following interrupt causes may be set by Hazard3 hardware:
Cause Description
Cause Description
2 Illegal instruction
3 Breakpoint
11 Environment call
3.2.11. mtval
Address: 0x343
Hardwired to 0.
3.2.12. mcounteren
Address: 0x306
Counter enable. Control access to counters from U-mode. Not to be confused with mcountinhibit.
31:3 - RES0
3.3.1. pmpcfg0…3
Configuration registers for up to 16 physical memory protection regions. Only present if PMP
support is configured. If so, all 4 registers are present, but some registers may be
partially/completely hardwired depending on the number of PMP regions present.
By default, M-mode has full permissions (RWX) on all of memory, and U-mode has no permissions.
A PMP region can be configured to alter this default within some range of addresses. For every
memory location executed, loaded or stored, the processor looks up the lowest-numbered active region that
overlaps that memory location, and applies its permissions to determine whether this access is
allowed. The full description can be found in the RISC-V privileged ISA manual.
Each pmpcfg register divides into four identical 8-bit chunks, each corresponding to one region, and
laid out as below:
6:5 - RES0
2 X Execute permission
1 W Write permission
0 R Read permission
3.3.2. pmpaddr0…15
Address registers for up to 16 physical memory protection regions. Only present if PMP support is
configured. If so, all 16 registers are present, but some may be fully/partially hardwired.
pmpaddr registers express addresses in units of 4 bytes, so on Hazard3 (a 32-bit processor with no
virtual address support) only the lower 30 bits of each address register are implemented.
The interpretation of the pmpaddr bits depends on the A mode configured in the corresponding
pmpcfg register field:
• For NA4, the entire 30-bit PMP address is matched against the 30 MSBs of the checked address.
• For NAPOT, pmpaddr bits up to and including the least-significant zero bit are ignored, and the
remaining bits are matched against the MSBs of the checked address.
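The NAPOT rule can be made concrete with a pair of C helpers that convert between a (base, size) pair and a pmpaddr value. This is a sketch following the standard RISC-V NAPOT encoding. The function names are hypothetical, and the 30-bit mask reflects the implemented pmpaddr width described above.

```c
#include <assert.h>
#include <stdint.h>

#define PMPADDR_MASK 0x3fffffffu  /* only the lower 30 bits are implemented */

/* Encode a naturally aligned power-of-two region of at least 8 bytes. */
static inline uint32_t pmp_napot_encode(uint32_t base, uint64_t size)
{
    assert(size >= 8 && (size & (size - 1)) == 0);  /* power of two     */
    assert((base & (size - 1)) == 0);               /* naturally aligned */
    return (uint32_t)(((base >> 2) | ((size >> 3) - 1)) & PMPADDR_MASK);
}

/* Decode a NAPOT pmpaddr value back into a byte base address and size. */
static inline void pmp_napot_decode(uint32_t pmpaddr, uint32_t *base,
                                    uint64_t *size)
{
    uint64_t a = pmpaddr & PMPADDR_MASK;
    unsigned ones = 0;
    while (ones < 30 && ((a >> ones) & 1u))
        ones++;                                     /* count trailing ones */
    uint64_t ignored = (1ull << (ones + 1)) - 1;    /* ones plus the zero  */
    *size = 1ull << (ones + 3);
    *base = (uint32_t)((a & ~ignored) << 2);
}
```

For example, a 4 KiB region at 0x20000000 encodes as 0x080001ff: nine trailing one bits followed by a zero, so bits 9:0 are ignored and the remaining bits are matched against address bits 31:12.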
3.4.1. mcycle
Address: 0xb00
Lower half of the 64-bit cycle counter. Readable and writable by software. Increments every cycle,
unless mcountinhibit.cy is 1, or the processor is in Debug Mode (as dcsr.stopcount is hardwired to 1).
If written with a value n and read on the very next cycle, the value read will be exactly n. The RISC-
V spec says this about mcycle: "Any CSR write takes effect after the writing instruction has otherwise
completed."
3.4.2. mcycleh
Address: 0xb80
Upper half of the 64-bit cycle counter. Readable and writable by software. Increments on cycles
where mcycle has the value 0xffffffff, unless mcountinhibit.cy is 1, or the processor is in Debug
Mode.
This includes when mcycle is written on that same cycle, since RISC-V specifies the CSR write takes
place after the increment for that cycle.
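Because mcycleh increments when mcycle wraps, software that reads the two halves with separate CSR instructions can observe a torn value. A common RV32 idiom, not specific to Hazard3, is to read high, low, then high again, and retry if the high half changed. The sketch below models this in host-runnable C. The fake_* functions and the fake_cycle variable are hypothetical stand-ins for CSR reads, present only so that the example runs off-target.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t (*csr_read_fn)(void);

/* Read a 64-bit counter split across two 32-bit CSRs without tearing:
 * re-read the high half and retry if it changed in between. */
static uint64_t read_counter64(csr_read_fn read_hi, csr_read_fn read_lo)
{
    assert(read_hi && read_lo);
    uint32_t hi, lo, hi_again;
    do {
        hi = read_hi();
        lo = read_lo();
        hi_again = read_hi();
    } while (hi != hi_again);
    return ((uint64_t)hi << 32) | lo;
}

/* Host-side stand-ins for mcycleh and mcycle (illustration only).
 * Each low-half read advances the fake counter by one "cycle". */
static uint64_t fake_cycle;
static uint32_t fake_read_mcycleh(void) { return (uint32_t)(fake_cycle >> 32); }
static uint32_t fake_read_mcycle(void)  { return (uint32_t)(fake_cycle++); }
```

On target, the two callbacks would wrap csrr instructions for mcycleh and mcycle (or minstreth and minstret). Starting the fake counter at 0xffffffff forces one retry, after which the function returns a consistent value.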
3.4.3. minstret
Address: 0xb02
Lower half of the 64-bit instruction retire counter. Readable and writable by software. Increments
with every instruction executed, unless mcountinhibit.ir is 1, or the processor is in Debug Mode (as
dcsr.stopcount is hardwired to 1).
If some value n is written to minstret, and it is read back by the very next instruction, the value read
will be exactly n. This is because the CSR write logically takes place after the instruction has
otherwise completed.
3.4.4. minstreth
Address: 0xb82
Upper half of the 64-bit instruction retire counter. Readable and writable by software. Increments
when the core retires an instruction and the value of minstret is 0xffffffff, unless mcountinhibit.ir
is 1, or the processor is in Debug Mode.
3.4.5. mhpmcounter3…31
Hardwired to 0.
3.4.6. mhpmcounter3…31h
Hardwired to 0.
3.4.7. mcountinhibit
Address: 0x320
Counter inhibit. Read-write. The table below lists the fields which are not hardwired to 0:
3.4.8. mhpmevent3…31
Hardwired to 0.
3.5.1. tselect
Address: 0x7a0
3.5.2. tdata1…3
3.6. Standard Debug Mode CSRs
This section describes the Debug Mode CSRs, which follow the 0.13.2 RISC-V debug specification.
The Debug section gives more detail on the remainder of Hazard3’s debug implementation,
including the Debug Module.
3.6.1. dcsr
Address: 0x7b0
Debug control and status register. Access outside of Debug Mode will cause an illegal instruction
exception. Relevant fields are implemented as follows:
31:28 xdebugver Hardwired to 4: external debug support as per RISC-V 0.13.2 debug
specification.
1:0 prv Read the privilege state the core was in when it entered Debug
Mode, and set the privilege state it will be in when it exits Debug
Mode. If U-mode is implemented, the values 3 and 0 are supported.
Otherwise hardwired to 3.
Cause Description
3 Processor entered Debug Mode due to a halt request, or a reset-halt request present
when the core reset was released.
4 Processor entered Debug Mode after executing one instruction with single-stepping
enabled.
Cause 5 (resethaltreq) is never set by hardware. This event is reported as a normal halt, cause 3.
Cause 2 (trigger) is never used because there are no triggers. (TODO?)
3.6.2. dpc
Address: 0x7b1
Debug program counter. When entering Debug Mode, dpc samples the current program counter,
e.g. the address of an ebreak which caused Debug Mode entry. When leaving Debug Mode, the
processor jumps to dpc. The host may read/write this register whilst in Debug Mode.
3.6.3. dscratch0
Address: 0x7b2
To provide data exchange between the Debug Module and the core, the Debug Module’s data0
register is mapped into the core’s CSR space at a read/write M-custom address — see dmdata0.
3.6.4. dscratch1
Address: 0x7b3
3.7.1. dmdata0
Address: 0xbff
The Debug Module’s internal data0 register is mapped to this CSR address when the core is in Debug
Mode. At any other time, access to this CSR address will cause an illegal instruction exception.
NOTE: The 0.13.2 debug specification allows for the Debug Module’s abstract data registers
to be mapped into the core’s CSR address space, but there is no Debug-custom space,
so the read/write M-custom space is used instead to avoid conflict with future
versions of the debug specification.
The Debug Module uses this mapping to exchange data with the core by injecting csrr/csrw
instructions into the prefetch buffer. This in turn is used to implement the Abstract Access Register
command. See Debug.
This CSR address is given by the dataaddress field of the Debug Module’s hartinfo register, and
hartinfo.dataaccess is set to 0 to indicate this is a CSR mapping, not a memory mapping.
3.8.1. meiea
Address: 0xbe0
External interrupt enable array. Contains a read-write bit for each external interrupt request: a 1
bit indicates that interrupt is currently enabled. At reset, all external interrupts are disabled.
If enabled, an external interrupt can cause assertion of the standard RISC-V machine external
interrupt pending flag (mip.meip), and therefore cause the processor to enter the external interrupt
vector. See meipa.
There are up to 512 external interrupts. The upper half of this register contains a 16-bit window
into the full 512-bit vector. The window is indexed by the 5 LSBs of the write data. For example:
csrrs a0, meiea, a0 // Read IRQ enables from the window selected by a0
csrw meiea, a0 // Write a0[31:16] to the window selected by a0[4:0]
csrr a0, meiea // Read from window 0 (edge case)
The purpose of this scheme is to allow software to index an array of interrupt enables (something
not usually possible in the CSR space) without introducing a stateful CSR index register which may
have to be saved/restored around IRQs.
31:16 window 16-bit read/write window into the external interrupt enable array
15:5 - RES0
4:0 index Write-only self-clearing field (no value is stored) used to control
which window of the array appears in window.
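As a worked example of the window scheme, the C helper below computes the single value that selects the right window and names the right bit for one IRQ. This is a sketch: the function name is hypothetical, and it assumes the value is used as the source operand of a csrs (enable) or csrc (disable) instruction on meiea, so that its low 5 bits select the window and its upper 16 bits select the bit within that window.

```c
#include <assert.h>
#include <stdint.h>

/* Value to pass to csrs/csrc on meiea to set or clear the enable
 * for a single external IRQ. */
static inline uint32_t irq_window_bit(unsigned irq)
{
    assert(irq < 512);
    uint32_t index = irq / 16;   /* which 16-bit window: bits 4:0   */
    uint32_t bit   = irq % 16;   /* position inside the window      */
    return (1u << (16 + bit)) | index;
}
```

For IRQ 17 this yields 0x00020001: window 1, bit 1 of the window. Because the index travels with the write data, no separate index register has to be saved and restored around interrupts.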
3.8.2. meipa
Address: 0xbe1
External interrupt pending array. Contains a read-only bit for each external interrupt request.
Similarly to meiea, this register is a window into an array of up to 512 external interrupt flags. The
status appears in the upper 16 bits of the value read from meipa, and the lower 5 bits of the value
written by the same CSR instruction (or 0 if no write takes place) select a 16-bit window of the full
interrupt pending array.
A 1 bit indicates that interrupt is currently asserted. IRQs are assumed to be level-sensitive, and the
relevant meipa bit is cleared by servicing the requestor so that it deasserts its interrupt request.
When any interrupt of sufficient priority is both set in meipa and enabled in meiea, the standard
RISC-V external interrupt pending bit mip.meip is asserted. In other words, meipa is filtered by meiea
to generate the standard mip.meip flag. So, an external interrupt is taken when all of the following
are true:
• The interrupt priority is greater than or equal to the preemption priority in meicontext
31:16 window 16-bit read-only window into the external interrupt pending array
15:5 - RES0
4:0 index Write-only, self-clearing field (no value is stored) used to control
which window of the array appears in window.
3.8.3. meifa
Address: 0xbe2
External interrupt force array. Contains a read-write bit for every interrupt request. Writing a 1 to a
bit in the interrupt force array causes the corresponding bit to become pending in meipa. Software
can use this feature to manually trigger a particular interrupt.
There are no restrictions on using meifa inside of an interrupt. The more useful case here is to
schedule some lower-priority handler from within a high-priority interrupt, so that it will execute
before the core returns to the foreground code. Implementers may wish to reserve some external
IRQs with their external inputs tied to 0 for this purpose.
Bits can be cleared by software, and are cleared automatically by hardware upon a read of meinext
which returns the corresponding IRQ number in meinext.irq (no matter whether meinext.update is
written).
meifa implements the same array window indexing scheme as meiea and meipa.
31:16 window 16-bit read/write window into the external interrupt force array
15:5 - RES0
4:0 index Write-only, self-clearing field (no value is stored) used to control
which window of the array appears in window.
3.8.4. meipra
Address: 0xbe3
External interrupt priority array. Each interrupt has an (up to) 4-bit priority value associated with
it, and each access to this register reads and/or writes a 16-bit window containing four such priority
values. When fewer than 16 priority levels are available, the LSBs of the priority fields are hardwired
to 0.
When an interrupt’s priority is lower than the current preemption priority meicontext.preempt, it is
treated as not being pending. The pending bit in meipa will still assert, but the machine external
interrupt pending bit mip.meip will not, so the processor will ignore this interrupt. See meicontext.
31:16 window 16-bit read/write window into the external interrupt priority array,
containing four 4-bit priority values.
15:7 - RES0
6:0 index Write-only, self-clearing field (no value is stored) used to control
which window of the array appears in window.
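Since each window holds four priorities, changing one IRQ's priority without disturbing its three neighbours takes a clear followed by a set. The C sketch below computes both operands. The type and function names are hypothetical, and the intended use (csrc meipra with clear_mask, then csrs meipra with set_value) is an assumption about how software would apply them.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t clear_mask;  /* operand for csrc: zero the 4-bit field   */
    uint32_t set_value;   /* operand for csrs: write the new priority */
} meipra_update_t;

static inline meipra_update_t meipra_update(unsigned irq, unsigned priority)
{
    assert(irq < 512 && priority < 16);
    uint32_t index = irq / 4;             /* window select: bits 6:0    */
    unsigned shift = 16 + 4 * (irq % 4);  /* field position in window   */
    meipra_update_t u = {
        (0xfu << shift) | index,
        ((uint32_t)priority << shift) | index
    };
    return u;
}
```

For IRQ 5 and priority 10, the clear mask is 0x00f00001 and the set value is 0x00a00001. Between the two instructions the IRQ briefly has priority 0, so software that cares may prefer to do the update with the IRQ disabled.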
3.8.5. meinext
Address: 0xbe4
Get next interrupt. Contains the index of the highest-priority external interrupt which is both
asserted in meipa and enabled in meiea, left-shifted by 2 so that it can be used to index an array of
32-bit function pointers. If there is no such interrupt, the MSB is set.
When multiple interrupts of the same priority are both pending and enabled, the lowest-numbered
wins. Interrupts with priority less than meicontext.ppreempt — the previous preemption
priority — are treated as though they are not pending. This is to ensure that a preempting interrupt
frame does not service interrupts which may be in progress in the frame that was preempted.
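The selection rule can be sketched as a small C model: among IRQs that are pending, enabled and of sufficient priority, the highest priority wins, and the lowest-numbered IRQ wins ties. This is a behavioural illustration only, with a hypothetical function name and a one-byte-per-IRQ array representation chosen for readability. Passing meicontext.ppreempt as min_priority models meinext, while passing meicontext.preempt models the condition for mip.meip.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the selected IRQ number, or -1 if nothing qualifies. */
static inline int select_irq(const uint8_t *pending, const uint8_t *enabled,
                             const uint8_t *priority, unsigned num_irqs,
                             unsigned min_priority)
{
    assert(num_irqs >= 1 && num_irqs <= 512);
    int best = -1;
    for (unsigned i = 0; i < num_irqs; i++) {
        if (!pending[i] || !enabled[i] || priority[i] < min_priority)
            continue;           /* treated as though not pending */
        if (best < 0 || priority[i] > priority[best])
            best = (int)i;      /* strict '>' keeps the lowest number on ties */
    }
    return best;
}
```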
30:11 - RES0
10:2 irq Index of the highest-priority active external interrupt. Zero when no
external interrupts with sufficient priority are both pending and
enabled.
1 - RES0
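A C handler that reads meinext can decode it as follows. This is a minimal sketch with hypothetical function names. It relies only on the layout described above: the MSB flags "no interrupt", and the irq field occupies bits 10:2, so the masked value is already a byte offset into a table of 32-bit entries.

```c
#include <assert.h>
#include <stdint.h>

#define MEINEXT_NOIRQ    0x80000000u  /* MSB: no eligible interrupt */
#define MEINEXT_IRQ_MASK 0x000007fcu  /* irq field, bits 10:2       */

/* IRQ number, or -1 when no interrupt is eligible. */
static inline int meinext_irq(uint32_t meinext)
{
    if (meinext & MEINEXT_NOIRQ)
        return -1;
    return (int)((meinext & MEINEXT_IRQ_MASK) >> 2);
}

/* Byte offset into a table of 32-bit entries; only valid when an IRQ is present. */
static inline uint32_t meinext_table_offset(uint32_t meinext)
{
    assert(!(meinext & MEINEXT_NOIRQ));
    return meinext & MEINEXT_IRQ_MASK;
}
```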
3.8.6. meicontext
Address: 0xbe5
External interrupt context register. Configures the priority level for interrupt preemption, and
helps software track which interrupt it is currently in. The latter is useful when a common
interrupt service routine handles interrupt requests from multiple instances of the same
peripheral.
A three-level stack of preemption priorities is maintained in the preempt, ppreempt and pppreempt
fields. The priority stack is saved when hardware enters the external interrupt vector, and restored
by an mret instruction if meicontext.mreteirq is set.
The top entry of the priority stack, preempt, is used by hardware to ensure that only higher-priority
interrupts can preempt the current interrupt. The next entry, ppreempt, is used to avoid servicing
interrupts which may already be in progress in a frame that was preempted. The third entry,
pppreempt, has no hardware effect, but ensures that preempt and ppreempt can be correctly
saved/restored across arbitrary levels of preemption.
31:28 pppreempt Previous ppreempt. Set to ppreempt on priority save, set to zero on
priority restore. Has no hardware effect, but ensures that when
meicontext is saved/restored correctly, preempt and ppreempt stack
correctly through arbitrarily many preemption frames.
23:21 - RES0
14:13 - RES0
1 clearts Write-1 self-clearing field. Writing 1 will clear mie.mtie and mie.msie,
and present their prior values in the mtiesave and msiesave fields of this
register. This makes it safe to re-enable IRQs (via mstatus.mie)
without the possibility of being preempted by the standard timer
and soft interrupt handlers, which may not be aware of Hazard3’s
interrupt hardware.
The clear due to clearts takes precedence over the set due to
mtiesave/msiesave, although it would be unusual for software to
write both on the same cycle.
0 mreteirq Enable restore of the preemption priority stack on mret. This bit is
set on entering the external interrupt vector, cleared by mret, and
cleared upon taking any trap other than an external interrupt.
The following is an example of an external interrupt vector (mip.meip) which implements nested,
prioritised interrupt dispatch using meicontext and meinext:
isr_external_irq:
// Save caller saves and exception return state whilst IRQs are disabled.
// We can't be pre-empted during this time, but if a higher-priority IRQ
// arrives ("late arrival"), that will be the one displayed in meinext.
addi sp, sp, -80
sw ra, 0(sp)
... snip
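sw t6, 60(sp)
// Save exception return state in the slots restored at no_more_irqs
csrr a0, mepc
sw a0, 64(sp)
// Set clearts (bit 1) while reading, to clear and save mie.mtie and mie.msie
csrrsi a0, meicontext, 0x2
sw a0, 68(sp)
csrr a0, mstatus
sw a0, 72(sp)
j get_next_irq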
dispatch_irq:
// Preemption priority was configured by meinext update, so enable preemption:
csrsi mstatus, 0x8
// meinext is pre-shifted by 2, so only an add is required to index table
la a1, _external_irq_table
add a1, a1, a0
jalr ra, a1
// Disable IRQs on returning so we can sample the next IRQ
csrci mstatus, 0x8
get_next_irq:
// Sample the current highest-priority active IRQ (left-shifted by 2) from
// meinext, and write 1 to the LSB to tell hardware to update
// meicontext with the preemption priority (and IRQ number) of this IRQ
csrrsi a0, meinext, 0x1
// MSB will be set if there is no active IRQ at the current priority level
bgez a0, dispatch_irq
no_more_irqs:
// Restore saved context and return from handler
lw a0, 64(sp)
csrw mepc, a0
lw a0, 68(sp)
csrw meicontext, a0
lw a0, 72(sp)
csrw mstatus, a0
lw ra, 0(sp)
... snip
lw t6, 60(sp)
addi sp, sp, 80
mret
Address: 0xbd0
PMP M-mode configuration. One bit per PMP region. Setting a bit makes the corresponding region
apply to M-mode (like the pmpcfg.L bit) but does not lock the region.
PMP is useful for non-security-related purposes, such as stack guarding and peripheral emulation.
This extension allows M-mode to freely use any currently unlocked regions for its own purposes,
without the inconvenience of having to lock them.
Note that this does not grant any new capabilities to M-mode, since in the base standard it is
already possible to apply unlocked regions to M-mode by locking them. In general, PMP regions
should be locked in ascending region number order so they can’t be subsequently overridden by
currently unlocked regions.
Note also that this is not the same as the "rule locking bypass" bit in the ePMP extension, which
does not permit locked and unlocked M-mode regions to coexist.
31:16 - RES0
15:0 m Regions apply to M-mode if this bit or the corresponding pmpcfg.L bit
is set. Regions are locked if and only if the corresponding pmpcfg.L
bit is set.
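The relationship between the m field and pmpcfg.L can be summarised in a short C model. This is a sketch with hypothetical function names. It assumes the standard position of the L bit (bit 7 of each 8-bit pmpcfg chunk) and takes the m field as a 16-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PMPCFG_L 0x80u  /* lock bit: bit 7 of a region's 8-bit config */

/* A region applies to M-mode if its m bit or its L bit is set. */
static inline bool pmp_region_applies_to_m(uint16_t m_field, uint8_t cfg,
                                           unsigned region)
{
    assert(region < 16);
    return ((m_field >> region) & 1u) != 0 || (cfg & PMPCFG_L) != 0;
}

/* A region is locked if and only if its L bit is set. */
static inline bool pmp_region_locked(uint8_t cfg)
{
    return (cfg & PMPCFG_L) != 0;
}
```

Setting only the m bit gives a region that is enforced on M-mode but can still be reconfigured, which suits uses such as a movable stack guard.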
Address: 0xbf0
31:3 - RES0
1 powerdown Release the external power request when going to sleep. The
function of this is platform-defined — it may do nothing, it may do
something simple like clock-gating the fabric, or it may be tied to
some complex system-level power controller.
0 deepsleep Deassert the processor clock enable when entering the sleep state. If
a clock gate is instantiated, this allows most of the processor
(everything except the power state machine and the interrupt and
halt input registers) to be clock gated whilst asleep, which may
reduce the sleep current. This adds one cycle to the wakeup latency.
Chapter 4. Custom Extensions
Hazard3 implements a small number of custom extensions. All are optional: custom extensions are
only included if the relevant feature flags are set to 1 when instantiating the processor
(Configuration Parameters). Hazard3 is always a conforming RISC-V implementation, and when
these extensions are disabled it is also a standard RISC-V implementation.
If any one of these extensions is enabled, the x bit in misa is set to indicate the presence of a
nonstandard extension.
This extension does not add any instructions, but does add several CSRs:
• meiea
• meipa
• meifa
• meipra
• meinext
• meicontext
If this extension is disabled then Hazard3 supports a single external interrupt input (or multiple
inputs that it simply ORs together in an uncontrolled fashion), so an external PLIC can be used for
standard interrupt support.
Note that, besides the additional CSRs, this extension is effectively a slightly more complicated way
of driving the standard mip.meip flag (see mip). The RISC-V trap handling CSRs themselves are always
completely standard.
This is useful when the PMP is used for non-security-related purposes such as stack guarding, or
trapping and emulation of peripheral accesses.
The msleep CSR controls how deeply the processor sleeps in the WFI sleep state. By default, a WFI is
implemented as a normal pipeline stall. By configuring msleep appropriately, the processor can gate
its own clock when asleep or, with a simple 4-phase req/ack handshake, negotiate power up/down
of external hardware with an external power controller. These options can improve the sleep
current at the cost of greater wakeup latency.
The hints allow processors to sleep until woken by other processors in a multiprocessor
environment. They are implemented on top of the standard WFI state, which means they interact in
the same way with external debug, and benefit from the same deep sleep states in msleep.
4.3.1. h3.block
Enter a WFI sleep state until either an unblock signal is received, or an interrupt is asserted that
would cause a WFI to exit.
If mstatus.tw is set, attempting to execute this instruction in privilege modes lower than M-mode
will generate an illegal instruction exception.
If an unblock signal has been received in the time since the last h3.block, this instruction executes
as a nop, and the processor does not enter the sleep state. Conceptually, the sleep state falls through
immediately because the corresponding unblock signal has already been received.
This instruction is encoded as slt x0, x0, x0, which is part of the custom nop-compatible hint
encoding space.
Example assembly macro:
.macro h3.block
slt x0, x0, x0
.endm
4.3.2. h3.unblock
Post an unblock signal to other processors in the system. For example, to notify another processor
that a work queue is now nonempty.
If mstatus.tw is set, attempting to execute this instruction in privilege modes lower than M-mode
will generate an illegal instruction exception.
This instruction is encoded as slt x0, x0, x1, which is part of the custom nop-compatible hint
encoding space.
Example assembly macro:
.macro h3.unblock
slt x0, x0, x1
.endm
4.4.1. h3.bextm
"Bit extract multiple", a multi-bit version of the bext instruction from Zbs. Perform a right-shift
followed by a mask of 1-8 LSBs.
Encoding (R-type):
: "r" (rs1), "r" (rs2), "i" ((((nbits) - 1) & 0x7) << 1)\
); \
__h3_bextm_rd; \
})
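For host-side testing it can help to have a plain C reference for the operation. The function below is a software model of the description above (right-shift, then mask to 1-8 LSBs), not the instruction itself. Its name is hypothetical, and the use of only the 5 LSBs of rs2 as the shift amount is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of h3.bextm: extract nbits (1 to 8) starting at bit rs2. */
static inline uint32_t h3_bextm_model(unsigned nbits, uint32_t rs1, uint32_t rs2)
{
    assert(nbits >= 1 && nbits <= 8);
    uint32_t shamt = rs2 & 0x1fu;                 /* assumed: 5 LSBs of rs2 */
    return (rs1 >> shamt) & ((1u << nbits) - 1u);
}
```

For example, extracting 8 bits at position 8 from 0xabcd1234 gives 0x12.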
4.4.2. h3.bextmi
Encoding (I-type):
: "=r" (__h3_bextmi_rd) \
: "r" (rs1), "i" ((((nbits) - 1) & 0x7) << 6 | ((shamt) & 0x1f)) \
); \
__h3_bextmi_rd; \
})
Chapter 5. Debug
Hazard3, along with its external debug components, implements version 0.13.2 of the RISC-V debug
specification. It supports the following:
• Automatic trigger of abstract command (abstractauto) on data0 or Program Buffer access for
efficient memory block transfers from the host
• Support for multiple harts (multiple Hazard3 cores) connected to a single Debug Module (DM)
• The hart array mask registers, for applying run/halt/reset controls to multiple cores
simultaneously
• (Optional) System Bus Access, either through a dedicated AHB5 manager interface, or
multiplexed with a processor load/store port
• An upstream AMBA 3 APB port — the "Debug Module Interface" — for host access to the Debug
Module
• Some reset request/acknowledge signals which require careful handshaking with system-level
reset logic
The DM must be connected directly to the processors without intervening registers. This implies the
DM is in the same clock domain as the processors, so multiple processors on the same DM must
share a common clock.
Upstream of the DM is at least one Debug Transport Module, which bridges some host-facing
interface such as JTAG to the APB DM Interface. Hazard3 provides an implementation of a standard
RISC-V JTAG-DTM, but any APB master could be used. The DM requires at least 7 bits of word
addressing, i.e. 9 bits of byte address space.
An APB arbiter could be inserted here, to allow multiple transports to be used, provided the host(s)
avoid using multiple transports concurrently. This also admits simple implementation of self-hosted
debug, by mapping the DM to a system-level peripheral address space.
The clock domain crossing (if any) occurs on the downstream port of the Debug Transport Module.
Hazard3’s JTAG-DTM implementation runs entirely in the TCK domain, and instantiates a bus clock-
crossing module internally to bridge a TCK-domain internal APB bus to an external bus in the
processor clock domain.
It is possible to instantiate multiple DMs, one per core, and attach them to a single Debug Transport
Module. This is not the preferred topology, but it does allow multiple cores to be independently
clocked. In this case, the first DM must be located at address 0x0 in the DMI address space, and you
must set the NEXT_DM_ADDR parameter on each DM so that the debugger can walk the (null-
terminated) linked list and discover all the DMs.
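From the debugger's side, discovering the DMs amounts to a linked-list walk. The C sketch below shows one way a host might do it. The nextdm register offset (0x1d) is taken from the 0.13.2 debug specification. Everything else is hypothetical: the dmi_read callback and its (base, register) signature, the function names, and the fake two-DM system at addresses 0x0 and 0x200, which exists only so the example can run on a host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DM_REG_NEXTDM 0x1du  /* nextdm register, debug spec 0.13.2 */

typedef uint32_t (*dmi_read_fn)(uint32_t dm_base, uint32_t reg);

/* Walk the null-terminated list of DMs, recording each base address.
 * Stops after max_dms entries so a malformed list cannot loop forever. */
static size_t enumerate_dms(dmi_read_fn dmi_read, uint32_t *bases, size_t max_dms)
{
    assert(dmi_read != NULL && bases != NULL);
    size_t count = 0;
    uint32_t base = 0;              /* the first DM is at DMI address 0x0 */
    while (count < max_dms) {
        bases[count++] = base;
        base = dmi_read(base, DM_REG_NEXTDM);
        if (base == 0)              /* null terminator: end of list */
            break;
    }
    return count;
}

/* Fake DMI for illustration: DM0 at 0x0 links to DM1 at 0x200, which is last. */
static uint32_t fake_dmi_read(uint32_t dm_base, uint32_t reg)
{
    if (reg == DM_REG_NEXTDM && dm_base == 0x0u)
        return 0x200u;
    return 0u;
}
```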
• Halt-on-reset, selectable per-hart
Not implemented:
• Branch, jal, jalr and auipc are illegal in debug mode, because they observe PC: attempting to
execute will halt Program Buffer execution and report an exception in abstractcs.cmderr
• The dret instruction is not implemented (a special purpose DM-to-core signal is used to signal
resume)
• The DM’s data0 register is mapped into the core as a CSR, dmdata0, address 0xbff.
◦ The DM ignores attempted core writes to the CSR, unless the DM is currently executing an
abstract command on that core
◦ Used by the DM to implement abstract GPR access, by injecting CSR read/write instructions
• dcsr.stopcount and dcsr.stoptime are hardwired to 1 (no counter or internal timer increment in
debug mode)
• dcsr.mprven is hardwired to 0
See also Standard Debug Mode CSRs for more details on the core-side Debug Mode registers.
The debug host must use the Program Buffer to access CSRs and memory. This carries some
overhead for individual accesses, but is efficient for bulk transfers: the abstractauto feature allows
the DM to trigger the Program Buffer and/or a GPR transfer automatically following every data0
access, which can be used for e.g. autoincrementing read/write memory bursts. Program Buffer
read/writes can also be used as abstractauto triggers: this is less useful than the data0 trigger, but
takes little extra effort to implement, and can be used to read/write a large number of CSRs
efficiently.
Abstract memory access is not implemented because, for bulk transfers, it offers no better
throughput than Program Buffer execution with abstractauto. Non-bulk transfers, while slower, are
still instantaneous from the perspective of the human at the other end of the wire.
The Hazard3 DM has experimental support for multi-core debug. Each core possesses exactly one
hardware thread (hart) which is exposed to the debugger. The RISC-V specification does not
mandate what mapping is used between the DM hart index hartsel and each core’s mhartid CSR, but
a 1:1 match of these values is the least likely to cause issues. Each core’s mhartid can be configured
using the MHARTID_VAL parameter during instantiation.
The DM’s data0 register is exposed to the core as a debug mode CSR (dmdata0, address 0xbff). By
issuing instructions to make the core read or write this dummy CSR, the DM can exchange data with
the core. To read from a GPR x into data0, the DM issues a csrw data0, x instruction. Similarly,
csrr x, data0 will write data0 to that GPR. The DM always follows the CSR instruction with an
ebreak, just like the implicit ebreak at the end of the Program Buffer, so that it is notified by the
core when the GPR read or write instruction sequence completes.
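As a rough illustration of the injected sequences, the sketch below builds the instruction words using the standard RISC-V encodings: csrw is csrrw with rd = x0, and csrr is csrrs with rs1 = x0. The helper names are hypothetical, and the sketch says nothing about how the DM hardware actually assembles these words.

```python
DMDATA0 = 0xBFF          # CSR address of dmdata0, as stated in this section
EBREAK = 0x00100073      # standard RV32I ebreak encoding
OPCODE_SYSTEM = 0x73     # major opcode shared by all CSR instructions


def _csr_insn(csr, rs1, funct3, rd):
    """Assemble a register-form CSR instruction (I-type layout)."""
    return (csr << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OPCODE_SYSTEM


def gpr_read_sequence(x):
    """Copy GPR x into data0: csrw dmdata0, x ; ebreak."""
    return [_csr_insn(DMDATA0, rs1=x, funct3=0b001, rd=0), EBREAK]


def gpr_write_sequence(x):
    """Copy data0 into GPR x: csrr x, dmdata0 ; ebreak."""
    return [_csr_insn(DMDATA0, rs1=0, funct3=0b010, rd=x), EBREAK]
```

For example, gpr_read_sequence(10) yields the words for csrw dmdata0, a0 followed by ebreak.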
Appendix A: Instruction Cycle Counts
All timings are given assuming perfect bus behaviour (no downstream bus stalls), and that the core
is configured with MULDIV_UNROLL = 2 and all other configuration options set for maximum
performance.
A.1. RV32I
Instruction Cycles Note
Integer Register-register
add rd, rs1, rs2 1
sub rd, rs1, rs2 1
slt rd, rs1, rs2 1
sltu rd, rs1, rs2 1
and rd, rs1, rs2 1
or rd, rs1, rs2 1
xor rd, rs1, rs2 1
sll rd, rs1, rs2 1
srl rd, rs1, rs2 1
sra rd, rs1, rs2 1
Integer Register-immediate
addi rd, rs1, imm 1 nop is a pseudo-op for addi x0, x0, 0
slti rd, rs1, imm 1
sltiu rd, rs1, imm 1
andi rd, rs1, imm 1
ori rd, rs1, imm 1
xori rd, rs1, imm 1
slli rd, rs1, imm 1
srli rd, rs1, imm 1
srai rd, rs1, imm 1
Large Immediate
lui rd, imm 1
auipc rd, imm 1
Control Transfer
jal rd, label 2 [1]
jalr rd, rs1, imm 2 [1]
beq rs1, rs2, label 1 or 2 [1] 1 if correctly predicted, 2 if mispredicted.
bne rs1, rs2, label 1 or 2 [1] 1 if correctly predicted, 2 if mispredicted.
blt rs1, rs2, label 1 or 2 [1] 1 if correctly predicted, 2 if mispredicted.
bge rs1, rs2, label 1 or 2 [1] 1 if correctly predicted, 2 if mispredicted.
bltu rs1, rs2, label 1 or 2 [1] 1 if correctly predicted, 2 if mispredicted.
bgeu rs1, rs2, label 1 or 2 [1] 1 if correctly predicted, 2 if mispredicted.
A.2. M Extension
Timings assume the core is configured with MULDIV_UNROLL = 2 and MUL_FAST = 1. I.e. the sequential
multiply/divide circuit processes two bits per cycle, and a separate dedicated multiplier is present
for the mul instruction.
32 × 32 → 32 Multiply
mul rd, rs1, rs2 1
A.3. A Extension
Instruction Cycles Note
Load-Reserved/Store-Conditional
lr.w rd, (rs1) 1 or 2 2 if next instruction is dependent [2], an lr.w, sc.w or amo*.w [3].
sc.w rd, rs2, (rs1) 1 or 2 2 if next instruction is dependent [2], an lr.w, sc.w or amo*.w [3].
A.4. C Extension
All C extension 16-bit instructions are aliases of base RV32I instructions. On Hazard3, they perform
identically to their 32-bit counterparts.
A consequence of the C extension is that 32-bit instructions can be non-naturally-aligned. This has
no penalty during sequential execution, but branching to a 32-bit instruction that is not 32-bit-
aligned carries a 1 cycle penalty, because the instruction fetch is cracked into two naturally-aligned
bus accesses.
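A minimal sketch of the alignment rule above: given a branch target address and the first 16 bits found there, it returns the extra cycles incurred. The function names are hypothetical. The length check uses the standard RISC-V rule that an instruction whose two least-significant bits are 0b11 is not a 16-bit instruction; the sketch treats every such instruction as 32 bits wide, which assumes no longer encodings are in use.

```python
def is_32bit_instruction(first_halfword):
    """True if the low two bits are 0b11, i.e. not a 16-bit instruction."""
    return (first_halfword & 0b11) == 0b11


def branch_target_penalty(target_addr, first_halfword):
    """Extra cycles when branching to target_addr (0 or 1).

    A 32-bit instruction that is not 32-bit-aligned straddles two naturally
    aligned words, so its fetch needs two bus accesses.
    """
    if is_32bit_instruction(first_halfword) and target_addr % 4 != 0:
        return 1
    return 0
```

A 16-bit instruction at a halfword-aligned address always fits within one aligned word, so it never incurs this penalty.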
CSR Access
csrrw rd, csr, rs1 1
csrrc rd, csr, rs1 1
csrrs rd, csr, rs1 1
csrrwi rd, csr, imm 1
csrrci rd, csr, imm 1
csrrsi rd, csr, imm 1
Instruction Cycles Note
Trap Request
ecall 3 Time given is for jumping to mtvec
ebreak 3 Time given is for jumping to mtvec
Instruction Cycles Note
clmulr rd, rs1, rs2 1
• RV32I base ISA: lbu, lh, lhu, sb, sh, zext.b (alias of andi), not (alias of xori)
• M extension: mul
A.9. Branch Predictor
Hazard3 includes a minimal branch predictor, to accelerate tight loops:
• If a predicted-taken branch is not taken, the predictor state is cleared, and it will be predicted
nontaken on its next execution.
Correctly predicted branches execute in one cycle: the frontend is able to stitch together the two
nonsequential fetch paths so that they appear sequential. Mispredicted branches incur a penalty
cycle, since a nonsequential fetch address must be issued when the branch is executed.
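The timing rule above can be sketched for a single loop branch considered in isolation. The function name is hypothetical. The rule used here for when the predictor starts predicting taken (after the branch has been seen taken once) is an assumption made for illustration. Only the clearing behaviour and the 1-cycle and 2-cycle costs come from this section. Alignment penalties on the branch target are ignored.

```python
def loop_branch_cycles(outcomes):
    """Total cycles spent on one branch, given its taken/not-taken history.

    outcomes: iterable of booleans, True meaning the branch was taken.
    """
    predict_taken = False   # predictor starts empty: predicted nontaken
    cycles = 0
    for taken in outcomes:
        # 1 cycle if the prediction matched, 2 cycles if mispredicted
        cycles += 1 if taken == predict_taken else 2
        # Taken: predictor set (assumption). Not taken: predictor cleared,
        # so the branch is predicted nontaken on its next execution.
        predict_taken = taken
    return cycles
```

Under these assumptions, a loop whose closing branch is taken nine times and then falls through costs 2 + 8 + 2 = 12 branch cycles: one misprediction on entry, eight correct predictions, and one misprediction on exit.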
[1] A jump or branch to a 32-bit instruction which is not 32-bit-aligned requires one additional cycle, because two naturally aligned
bus cycles are required to fetch the target instruction.
[2] If an instruction in stage 2 (e.g. an add) uses data from stage 3 (e.g. a lw result), a 1-cycle bubble is inserted between the pair. A
load data → store data dependency is not an example of this, because data is produced and consumed in stage 3. However, load
data → load address would qualify, as would e.g. sc.w → beqz.
[3] A pipeline bubble is inserted between lr.w/sc.w and an immediately-following lr.w/sc.w/amo*, because the AHB5 bus standard
does not permit pipelined exclusive accesses. A stall would be inserted between lr.w and sc.w anyhow, so the local monitor can be
updated based on the lr.w data phase in time to suppress the sc.w address phase.
[4] AMOs are issued as a paired exclusive read and exclusive write on the bus, at the maximum speed of 2 cycles per access, since
the bus does not permit pipelining of exclusive reads/writes. If the write phase fails due to the global monitor reporting a lost
reservation, the instruction loops at a rate of 4 cycles per loop, until success. If the read reservation is refused by the global
monitor, the instruction generates a Store/AMO Fault exception, to avoid an infinite loop.
[5] The single-register variants of cm.popret and cm.popretz take the same number of cycles as the two-register variants, because of
an internal load-use dependency on the loaded return address.