Processor Design Suite
University of Toronto
Final Report
Title: Parameterized Processor Design Suite
Project ID #: 0192000
Section: 5
2 Executive Summary
This document describes the design of a Parameterized Processor Design Suite.
Processors are traditionally implemented on Application Specific Integrated Circuit
(ASIC) chips. Designing a processor on an ASIC chip is usually very costly and time
consuming. As a result, programmable logic chips have been used as an alternative
processor platform. These chips allow users to change their designs without incurring
the manufacturing costs and delays involved in ASIC design. Programmable logic
chips, however, require the user to know a Hardware Description Language (HDL) to be
able to program them. Our project removes this need while retaining the benefits of
using programmable logic chips. Our software suite allows a user to easily generate
custom processors based on their needs without knowing an HDL.
The software suite is a collection of three programs: a Graphical User Interface
(GUI), a Hardware Generation Program (HGP), and an assembler. The GUI allows a
user to specify the processor parameters and passes this information to the HGP.
The HGP uses this information to identify all of the
hardware resources that the processor will need, all of the signals that will be
transmitted between these resources, and a set of instructions that the processor will
implement. Once the identification is complete, the HGP generates a set of HDL files
that describe the processor. These HDL files are based on a set of hardware templates
that are a major portion of this design project. These templates dictate the underlying
structure of the processor. The HGP also passes on its information to the assembler.
The assembler can take this information from the HGP and create machine-readable
code from a user’s source code.
Several processors generated using our software suite have been analyzed.
The analysis shows that larger parameter values increase the size of the
processor and reduce its speed. An increase in bus width has an
especially detrimental effect on the ALU. It is therefore beneficial to use
special-purpose processors that exactly meet the user’s needs.
The objectives of this project are to generate a set of customizable processor
components to be used as a template for any user-specified processor, to create an
easy-to-use, flexible, and portable software suite, and to analyze the performance
trade-offs of different parameterizations of a processor. We have met all three of these
objectives.
3 Team Members’ Contributions
Tables 3.1 through 3.4 outline the contributions made by the authors to the
design project. Table 3.5 outlines the contributions each author made to the writing of
the final report.
[Tables 3.1 to 3.4, with columns Task and Individual Responsible for the Task, are not reproduced here.]
Section                                     Individual Responsible for the Section
9 Design
• Sections 9.1, 9.3, 9.7.1, 9.7.3, 9.10     Navid Azizi
• Sections 9.2, 9.8, 9.9                    Michael Krejcik
• Sections 9.4, 9.6, 9.7.4                  Tomasz Czajkowski
• Sections 9.5, 9.7.2, 9.7.5                Borys Bradel
Table 3.5: Contributions made by the Team Members for the Final Report
4 Old Milestones
There are four main milestones that relate to the actual design of our project:
5 Revised Milestones
The following sections describe the various factors that affected our timeline and
a comparison between our accomplished and original milestones. The different
timelines that we had throughout the project are described in Appendices 1, 2, and 3.
Appendix 1 contains our original timeline. Appendix 2 contains the timeline that we had
at the time of our interim reports and Appendix 3 contains our most recent timeline.
this reason, these two tasks were moved to a later date in the schedule. To make room
for these tasks, everything else that could be moved to an earlier date in the schedule
was moved. As a result, Mike concentrated exclusively on the GUI and assembler,
while the other three team members concentrated on the hardware and the Hardware
Generation Program. With this division of responsibilities, we were able to do
everything in parallel and benefit from having several people working on two
tightly coupled problems.
Unfortunately, we had a heavier than expected workload in the second term. This
caused us to complete some of our milestones, namely the controller of the processor
and the Hardware Generation Program, later than we wanted. Hardware limitations
also made it impossible to place our processor on an FPGA. We could not fully compile
the processor at home because our versions of MAX+PLUS II did not have the proper
licenses to create designs for the appropriate hardware. The undergraduate SPARC
machines did not have enough RAM to compile the design, and when we tried to
compile the processor on a friend's EECG account, we ran out of hard drive space on
the partition. We therefore concentrated on compiling individual sections of the
processor and examining their performance characteristics. Our final timeline is
presented in Appendix 3.
6 Table of Contents
2 Executive Summary
3 Team Members’ Contributions
4 Old Milestones
5 Revised Milestones
5.1 Reasons for Modification
5.2 Milestone Status
6 Table of Contents
7 Acknowledgments
8 Introduction
8.1 Motivation
8.2 Background
8.3 Objectives
8.4 Design and Measurement Methodology
8.4.1 Design Methodology
8.4.2 Measurement Methodology
8.4.3 Report Outline
9 Design
9.1 Overview
9.1.1 Graphical User Interface
9.1.2 Hardware Generation Program
9.1.2.1 Parameterized HDL Code
9.1.3 Assembler
9.2 Graphical User Interface
9.2.1 Design
9.2.1.1 Ease of Use
9.2.1.2 Provide Information About the Program
9.2.1.3 Display Parameters the User can Choose
9.2.1.4 Limit User Input to only Acceptable Values
9.2.1.5 Cross Platform Portability
9.2.2 Implementation
9.2.3 Evolution of Graphical User Interface
9.2.3.1 Advantages of the Initial GUI Design
9.2.3.2 Disadvantages of the Initial GUI Design
9.2.3.3 The New GUI Design
9.2.4 Testing
9.2.5 Current Status
9.3 Processor Organization
9.3.1 Registerfile
9.3.1.1 Design
9.3.1.1.1 Input Circuitry
9.3.1.1.2 Output Circuitry
9.3.1.2 Testing
9.3.2 Arithmetic and Logic Unit (ALU)
9.3.2.1 Design
9.3.2.1.1 One-Bit ALU
9.3.2.1.2 Bit-Wise Operations
9.3.2.1.3 Complete ALU
9.3.2.2 Testing
9.3.3 Processor Datapath
9.3.3.1 Design
9.3.3.1.1 IR and Register Selection
9.3.3.1.2 Input A, ALU
9.3.3.1.3 Input B, ALU
9.3.3.1.4 Shift Amount Input, ALU
9.3.3.1.5 Memory Inputs
9.3.3.1.6 Program Counter (PC)
9.3.3.1.7 Register Data Input
9.3.3.2 Processor Parameters
9.3.3.3 Testing
9.4 Processor Control
9.4.1 Flow Control
9.4.2 Design Approach
9.4.2.1 Design Principles
9.4.2.2 Testing
9.5 Processor Input/Output
9.5.1 I/O Interface
9.5.2 I/O Controller
9.5.3 I/O Devices
9.5.3.1 Mouse
9.5.3.2 Generic PS/2 Port and the Keyboard
9.5.3.3 Monitor
9.5.4 Mouse Interface
9.5.5 Generic PS/2 Interface
9.5.6 VGA Interface
9.5.7 Testing
9.6 Memory/Cache Design
9.6.1 Overview
9.6.2 Mapping Function and Replacement Algorithm
9.6.3 Design Schematic
9.7 Hardware Generation Program
9.7.1 Overview
9.7.2 Read XML
9.7.3 Instruction Set Based Component Selection
9.7.3.1 ProcParameter Class
9.7.3.1.1 Members
9.7.3.1.2 Methods
9.7.3.2 Resource Class Hierarchy
9.7.3.3 Signal Class Hierarchy
9.7.3.4 Instruction Class Hierarchy
9.7.3.5 Processor Specification Determination
9.7.4 Write VHDL
9.7.4.1 Script Creation
9.7.4.2 Script Processing
9.7.4.2.1 Script Processing Ideology
9.7.4.2.2 Script Commands
9.7.4.2.3 Expression Processing
9.7.4.2.4 Testing
9.7.5 Write XML
9.8 Assembler
9.8.1 Design
9.8.1.1 Interface Design
9.8.1.2 Error Handling
9.8.2 Implementation
9.8.3 Testing
9.8.4 Current Status
9.9 Interfaces
9.9.1 User Information XML Language
9.9.1.1 Design
9.9.1.2 Specification
9.9.2 Assembler XML Language
9.9.2.1 Design
9.9.2.2 Specification
9.9.3 Current Status
9.10 Results
10 Conclusions
10.1 Future Work
Appendix 1: Timeline from Technical Proposal
Appendix 2: Timeline from Interim Reports
Appendix 3: Final Timeline
Appendix 4: Acronyms
Appendix 5: Instruction Set
Appendix 6: Test Cases
A6.1 Registerfile Simulation
A6.2 ALU Simulation
A6.3 PS/2 Mouse Port Simulation
A6.4 Generic PS/2 Port Simulation
A6.5 Memory Mapped Bus Simulation
Appendix 7: Sample Source Code
A7.1 GUI: Java Code
A7.2 XML Input/Output: Java Code
A7.3 Instruction Set Based Component Selection: Java Code
A7.4 Script Processing: Java Code
A7.5 Assembler: Java Code
A7.6 Datapath: VHDL Code
A7.7 Control: VHDL Code
A7.8 Cache Controller: VHDL Code
A7.9 I/O: VHDL Code
Bibliography
7 Acknowledgments
The authors would like to acknowledge their fellow students for giving their
technical advice during the course of this project. In particular the authors would like to
thank Deshanand Singh for his contribution to the project. Stephen Brown, the project’s
supervisor, is also acknowledged for his inspiration and motivation throughout the term
of the project.
Furthermore, the authors would like to acknowledge that a significant portion of the
concepts required for the completion of the project were acquired in the various courses
offered by the University of Toronto's Edward S. Rogers Sr. Department of Electrical
and Computer Engineering. Moreover, the publications cited throughout this report
provided the authors with new ideas and different perspectives, and assisted the
authors immeasurably. Applicable references are cited as much as possible in the
report, and we sincerely apologize for any omissions that we have made in the
references.
8 Introduction
Our project involves the creation of a software suite with an easy to use interface
that will give users the opportunity to implement custom processors based on their
needs. The software suite includes a Graphical User Interface (GUI) that allows the
user to specify the parameters for a processor, a Hardware Generation Program (HGP)
that produces the Hardware Description Language (HDL) code for the user-specified
processor, and an assembler that allows the user to create programs for this processor.
The project raises questions concerning the extent to which a processor can be
parameterized. Our processor design must achieve a balance between an overly generic
model, which would be hard to implement, and a more specialized model, which would
limit the choices a user can make.
8.1 Motivation
Processors are traditionally implemented on Application Specific Integrated
Circuit (ASIC) chips. Designing a processor on an ASIC chip is usually very costly and
time consuming due to the long and expensive manufacturing process that is involved
in the physical creation of the chip. As a result, programmable logic chips have been
used as an alternative processor platform. The two main types of programmable logic
chips are Field Programmable Gate Array (FPGA) and Complex Programmable Logic
Device (CPLD) chips. These chips allow users to change their designs without
incurring the manufacturing costs and delays involved in ASIC design. Programmable
logic chips, however, require the user to know an HDL to be able to program them. Our
project removes this need while retaining the benefits of using programmable logic
chips.
8.2 Background
To lay a better foundation for our project, a brief overview of the following topics
is given: an introduction to FPGA/CPLD chips, the differences between Reduced
Instruction Set Computers (RISC) and Complex Instruction Set Computers (CISC), and
the MIPS architecture. These topics are relevant to the report and provide background
to some of the design decisions made in the project.
An FPGA/CPLD chip is an integrated circuit that is characterized by an array of
reprogrammable logic blocks and a flexible configurable interconnect structure. For
example, an Altera FLEX EPF10K40 has 2304 logic blocks [1]. Logic blocks can be
viewed as building blocks that, when combined, can implement many different
functions. The connection between different blocks is
achieved via a matrix of wires and switches between the blocks. The matrix along with
the logic blocks allows an FPGA/CPLD chip to be customized to meet the individual
user’s needs.
There are two main processor design paradigms: RISC and CISC. CISC chips
are based on the principle of providing users with many complex instructions, with each
complex instruction encapsulating many simpler instructions. Conversely, RISC chips
provide users with only simple instructions. RISC instructions are usually a constant
size, perform one function per clock cycle, and access memory with a limited number of
instructions. This allows for a simpler processor design and the ability to pipeline
instructions, which increases the speed of the processor.
The MIPS architecture is a specific implementation of the RISC ideas presented
above. MIPS makes an additional simplification that memory can only be accessed
with load and store instructions, thus allowing the processor control and datapath to be
simplified even further. The processor implemented in this project is based on the
MIPS architecture.
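To make the idea of constant-size instructions concrete, the sketch below packs a standard 32-bit MIPS R-type instruction from its fields. The class and method names are illustrative only and are not part of the suite; a processor generated by this project may use different field widths, since its register count and bus width are parameters.

```java
// Illustrative only: packs the standard MIPS32 R-type layout
// (opcode 6 | rs 5 | rt 5 | rd 5 | shamt 5 | funct 6 bits).
// This is not the encoding used by the generated processors.
final class RTypeEncoder {
    static int encode(int opcode, int rs, int rt, int rd, int shamt, int funct) {
        // Each field is masked to its width, then shifted into position.
        return (opcode & 0x3F) << 26
             | (rs & 0x1F) << 21
             | (rt & 0x1F) << 16
             | (rd & 0x1F) << 11
             | (shamt & 0x1F) << 6
             | (funct & 0x3F);
    }

    public static void main(String[] args) {
        // add $t0, $t1, $t2  ->  rd = 8, rs = 9, rt = 10, funct = 0x20
        int word = encode(0, 9, 10, 8, 0, 0x20);
        System.out.printf("0x%08X%n", word); // prints 0x012A4020
    }
}
```

Because every instruction occupies one fixed-width word with fields in fixed positions, the decoder can extract the register numbers without first determining the instruction length.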
8.3 Objectives
There are several objectives that we wanted to achieve. Section 8.4, Design and
Measurement Methodology, explains our approach to attaining these objectives while
Section 9, Design, describes how we actually achieved them. The objectives of our
project are to:
• Design a set of hardware components to be used as a basis of a
user-defined processor
• Create a set of easy to use, portable, and flexible software that allows a user
to create a processor without knowing a Hardware Description Language
• Analyze the performance and benefits of using our approach
run on many operating systems and XML files are well-defined text files that should be
readable on any operating system.
Processors are usually implemented on ASIC chips. FPGA chips have been
used only to add functionality to these processors in the form of coprocessors [2-6]. The
methodology for our project differs from the above in that, in addition to producing
reconfigurable coprocessors or modules and connecting them to the processor, the
whole processor is reconfigurable. Our project follows many of the design principles
from “Rapid Prototyping of a RISC Architecture for Implementation in FPGAs” [7], but
allows for much greater flexibility.
9 Design
9.1 Overview
Our project can be divided into three components: a Graphical User Interface
(GUI), a Hardware Generation Program (HGP), and an assembler.
The design flow for the use of the suite of software tools is shown in Figure
9.1.1.
An overview of all three components, the GUI, the HGP, and the assembler, is given
below, and a more detailed discussion of the whole project follows in the
subsequent sections.
fundamentally the same on all platforms and no extra time will be needed to port the
program to other platforms.
To pass the information gained from the user to the HGP, the GUI must write a
file in a standard format. XML (Extensible Markup Language) was chosen for this
purpose. XML is very flexible in its ability to describe objects; custom
fields and their properties can be created and parsed easily. Because of this
flexibility and simplicity, XML has become a de facto standard for many data
sharing protocols. For future expandability and extensibility, XML was therefore
chosen as the file format for the transfer of information between the different
modules of the project.
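As a rough illustration of this hand-off, the sketch below serializes a set of processor parameters to XML using the standard Java DOM and transformer APIs. The element names (processor, registers, busWidth, addressWidth, instructions, instruction) and the class name are assumptions made for this example; the format actually used by the suite is the one specified in Section 9.9.1.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch: element names are invented for illustration and are
// not the suite's User Information XML Language.
final class UserInfoXmlSketch {
    static String toXml(int registers, int busWidth, int addressWidth,
                        String... instructions) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("processor");
            doc.appendChild(root);
            addText(doc, root, "registers", Integer.toString(registers));
            addText(doc, root, "busWidth", Integer.toString(busWidth));
            addText(doc, root, "addressWidth", Integer.toString(addressWidth));
            Element set = doc.createElement("instructions");
            root.appendChild(set);
            for (String name : instructions) {
                addText(doc, set, "instruction", name);
            }
            // Serialize the DOM tree to a string without the XML declaration.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize parameters", e);
        }
    }

    private static void addText(Document doc, Element parent, String tag, String text) {
        Element e = doc.createElement(tag);
        e.setTextContent(text);
        parent.appendChild(e);
    }

    public static void main(String[] args) {
        System.out.println(toXml(16, 32, 16, "add", "sub", "lw", "sw"));
    }
}
```

Adding a new parameter in such a scheme only requires one more element, which is the kind of extensibility that motivated the choice of XML.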
processor that perform functions on data. These components in a MIPS processor
implementation include the register file, the ALU, and the internal memory. Since the
project is a modified version of the MIPS implementation, the datapath contains
other components such as special function generators. The control system of the
processor commands the datapath and memory according to the instructions of the
program.
The HDL code for these components was written so that the HGP can easily load
and modify it to implement the user-configured processor. More specifically,
the customization of the HDL code relies on techniques such as building
components from repeated smaller modules.
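The idea of building a component from repeated smaller modules can be sketched as a small generator that emits one instantiation per bit of the bus. Everything named here is hypothetical: the component alu_1bit, its ports, and the carry signal are invented for illustration and do not reproduce the suite's hardware templates or its script language.

```java
// Hypothetical sketch: component and port names (alu_1bit, a, b, cin, cout,
// result, carry) are invented for illustration only.
final class SliceGeneratorSketch {
    /** Returns one VHDL component instantiation per bit of the bus width. */
    static String generateSlices(int busWidth) {
        if (busWidth < 1) {
            throw new IllegalArgumentException("busWidth must be positive");
        }
        StringBuilder vhdl = new StringBuilder();
        for (int i = 0; i < busWidth; i++) {
            // The carry out of slice i feeds the carry in of slice i + 1.
            vhdl.append(String.format(
                "  slice%d: alu_1bit port map (a => a(%d), b => b(%d), "
                + "cin => carry(%d), result => result(%d), cout => carry(%d));\n",
                i, i, i, i, i, i + 1));
        }
        return vhdl.toString();
    }

    public static void main(String[] args) {
        System.out.print(generateSlices(4));
    }
}
```

Changing the bus width then changes only the loop bound, not the description of the one-bit slice itself.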
9.1.3 Assembler
An assembler was written to transform source code written by a human into
machine-readable code that is decoded by the custom processor. Unlike most other
assemblers, this assembler must be able to deal with the availability of different
instructions and different machine codings for instructions. The assembler gathers
all the information it needs about the particular hardware it is assembling for
from the XML file produced by the HGP, and then translates the user-supplied
assembly language source code.
The assembler outputs the machine code that will be stored in the processor's
instruction memory in a format understandable by the FPGA used.
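A table-driven design is one way to handle instruction sets and codings that differ from processor to processor. The sketch below is an assumption-laden illustration, not the suite's assembler: the opcode table and register field width are passed in directly (in the suite they would come from the HGP's XML file), and the three-register syntax and bit layout are invented for the example.

```java
import java.util.Map;

// Hypothetical sketch: the syntax "op rD, rS, rT" and the packing
// opcode | rD | rS | rT are invented for illustration only.
final class TableDrivenAssemblerSketch {
    private final Map<String, Integer> opcodes; // mnemonic -> opcode value
    private final int registerBits;             // width of each register field

    TableDrivenAssemblerSketch(Map<String, Integer> opcodes, int registerBits) {
        this.opcodes = opcodes;
        this.registerBits = registerBits;
    }

    int assemble(String line) {
        String[] parts = line.trim().split("[\\s,]+");
        Integer opcode = opcodes.get(parts[0].toLowerCase());
        if (opcode == null) {
            throw new IllegalArgumentException(
                "Instruction not available on this processor: " + parts[0]);
        }
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected three registers: " + line);
        }
        int word = opcode;
        for (int i = 1; i <= 3; i++) {
            // Append each register number as a fixed-width field.
            word = (word << registerBits) | parseRegister(parts[i]);
        }
        return word;
    }

    private int parseRegister(String token) {
        if (!token.matches("r\\d+")) {
            throw new IllegalArgumentException("Bad register: " + token);
        }
        int n = Integer.parseInt(token.substring(1));
        if (n >= (1 << registerBits)) {
            throw new IllegalArgumentException("No such register: " + token);
        }
        return n;
    }

    public static void main(String[] args) {
        TableDrivenAssemblerSketch asm =
            new TableDrivenAssemblerSketch(Map.of("add", 1, "sub", 2), 3);
        System.out.println(asm.assemble("add r1, r2, r3")); // prints 595
    }
}
```

An instruction that the user left out of the processor is simply absent from the table, so the assembler rejects it instead of emitting a word the hardware cannot decode.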
9.2 Graphical User Interface
The GUI needs to inform the user about how to use our program and to display
the parameters of the processor that the user can choose. Currently, the design allows
the user to choose values for the number of registers, bus width, address width, and the
set of assembly instructions the processor will recognize.
9.2.1 Design
In order to meet our design requirements, the GUI should, in decreasing order of
importance:
1. Be easy to use
2. Provide information about the program
3. Display parameters the user can choose
   • Assembly instructions
   • Bus width
   • Address width
   • Number of registers
4. Limit user input to only acceptable values
5. Be cross platform portable
The current design of the GUI that meets these requirements is shown in Figure
9.2.1.
Figure 9.2.1: Graphical User Interface for the Parameterized Processor Design Suite
9.2.1.1 Ease of Use
Ease of use was chosen as the number one criterion for the GUI because the
GUI is the first thing potential users see. If users are intimidated, have a hard time
using the program, or never learn about all the features, then a lot of our hard work
goes to waste.
A number of features have been added to the GUI to make it easy to use. The
top-level menu contains a Help menu item, which is one of the first things the user
sees when the application is displayed. Clicking on it provides detailed instructions
to the user. Ease of use was also improved through the use of standard controls.
Users of common graphical operating systems recognize standard controls like
buttons, check boxes, menus, sliders, and tabbed panes. This makes the interface
intuitive even when the user does not yet understand the purpose of a particular
control. Tool Tips were added to all of the standard controls to give the user extra
information about their functionality. The standard controls were also grouped into
four logical sections: Assembly Commands, Number of Registers, Bus Width, and
Address Width. This way the user knows which choices they are making from the area
they are in. All of these features provide the user with an easily accessible interface.
9.2.1.4 Limit User Input to only Acceptable Values
Limiting user input to acceptable values removes the need to parse and validate the
input after it is entered, which also keeps the user interface simple.
The GUI achieves this goal by using checkboxes and sliders for input. These
controls accept only a limited range of values, and the designer sets that range.
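A small Java sketch of why a slider cannot deliver an out-of-range value: Swing sliders store their value in a BoundedRangeModel, which clamps every request to the configured range. The limits used here (8 to 64, default 32) are assumptions for illustration and not the suite's actual bus width limits.

```java
import javax.swing.DefaultBoundedRangeModel;

/** Sketch: the range model behind a Swing slider clamps requests to its limits. */
final class BoundedInputSketch {
    /** Returns the value the model actually stores after a request is made. */
    static int requestBusWidth(int requested) {
        // Arguments: initial value 32, extent 0, minimum 8, maximum 64 (assumed limits).
        DefaultBoundedRangeModel model = new DefaultBoundedRangeModel(32, 0, 8, 64);
        model.setValue(requested);   // out-of-range requests are clamped, not rejected
        return model.getValue();
    }
}
```

Since the stored value is always inside the range, the back end can write it to the output file without any further checking.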
9.2.2 Implementation
The GUI can be partitioned into two sections: the front end and the back end.
The front end collects user information about the processor and the back end outputs
the user information into an XML file. The XML file is used by the next process in the
Parameterized Processor Design Suite.
The front end of the GUI is written in Java using the Java Swing library. One of
the major concerns about the implementation of the front end of the GUI was the
possibility of future changes to the specification. Therefore the implementation had to
be written in a way to make it easily adaptable to change.
The implementation of the front end of the GUI has very few hard coded
elements in it so as to be as flexible as possible. Instead, the front end relies on an
external file to produce content and only hard codes the manner in which the content
will be displayed. For instance the GUI knows that it will display a number of check
boxes organized into categories. The actual categories for the checkboxes, the number
of checkboxes, the captions that will go on the checkboxes, help content for the
checkboxes and Tool Tips are all read in from a file before the GUI is displayed. The
same holds true for the sliders in the GUI. This scheme means that no extra code has
to be written if the project requires additional assembly instructions to be added to the
GUI checkbox list. To accomplish this the GUI uses an XML file that contains all the
necessary information. A lower level discussion of this process is included in Section
9.2.3.3.
The back end of the GUI has one requirement, which is to output the user’s
selections into a User Information XML Language (UIXL) compliant XML file.
Information about our design specification of UIXL is included in Section 9.9. The
back end starts generating the file when the user presses the button on the user
interface marked “Create It!”
9.2.3 Evolution of Graphical User Interface
The GUI has gone through a cycle of design, coding, and testing. The current
model of the interface is a result of what was learned from the initial design. Figure
9.2.2 shows the initial design of the GUI.
The initial GUI design already contained several key features that have been
used in the current design. The initial design effectively separates the interface into
three regions where the user selects the assembly commands to be included, the
number of registers and the bus width. Standard components, such as check boxes
and sliders, have been used in the design. As mentioned earlier, using standard
components has two advantages. First it minimizes the user’s learning curve since
these components are familiar to the user. Second it limits the user’s input to correct
values.
User testing was conducted in order to obtain useful feedback about the
interface. This testing revealed that although the interface was straightforward it
provided little additional information about the choices the user was making. For
instance, the assembly instructions should have detailed information about the
instructions upon request. Also the commands should be grouped in a logical order so
that similar commands are presented next to each other. Finally general instructions
for the entire program should be included.
The initial design also lacks flexibility. Instructions are hard coded into the
program. This makes changes to the instruction set difficult to implement in the GUI.
After the initial design was completed it was determined that another option should be
added to the user interface that allows the user to specify the address width. With the
initial design this option would have to be hard coded into the design as well.
<?xml version='1.0'?>
<!-- Parameters XML File used by Graphical User Interface -->
<ParameterList>
<Category name="Arithmetic">
<Instruction>
<Name>
Add
</Name>
<InstructionName>
add
</InstructionName>
<Parameter1>
rd
</Parameter1>
<Parameter2>
rs
</Parameter2>
<Parameter3>
rt
</Parameter3>
<Result>
rd = rs + rt
</Result>
<Description>
The sum of registers rs and rt is placed in register rd
</Description>
</Instruction>
.
.
.
</Category>
.
.
.
</ParameterList>
Figure 9.2.3 : Layout of the Parameters XML file used by the GUI to read in instructions. Note that
ellipses are used in the diagram to indicate that a parent tag may include more than one child tag.
Specifically the ParameterList tag may include many Category tags and Category tags may include
many Instruction tags.
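Reading a file with this layout can be sketched with the standard Java DOM parser. The class and method names are hypothetical, and the sketch only extracts the category name and the InstructionName of each instruction; it is not the GUI's actual reader.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: lists "category/instruction" pairs found in a Parameters XML document. */
final class ParameterListReader {
    static List<String> readInstructions(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> result = new ArrayList<>();
            NodeList categories = doc.getElementsByTagName("Category");
            for (int c = 0; c < categories.getLength(); c++) {
                Element category = (Element) categories.item(c);
                NodeList instructions = category.getElementsByTagName("Instruction");
                for (int i = 0; i < instructions.getLength(); i++) {
                    Element instruction = (Element) instructions.item(i);
                    // The tag text sits on its own line in the file, so trim the whitespace.
                    String name = instruction.getElementsByTagName("InstructionName")
                            .item(0).getTextContent().trim();
                    result.add(category.getAttribute("name") + "/" + name);
                }
            }
            return result;
        } catch (Exception e) {
            // A missing or malformed file is reported to the caller as one failure type.
            throw new IllegalStateException("could not read parameter list", e);
        }
    }
}
```

Each returned pair could then drive the creation of one checkbox on the tabbed pane for its category.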
In the current design, after the GUI reads in the Parameter XML file, it displays
the options to the user. Assembly commands in the same category are now displayed
on the same tabbed pane. Using tabbed panes allows the number of commands on
the screen at one time to be reduced. A help menu has been added to increase
usability. The help menu contains information about the instructions as well as the
overall usage of the program. Features such as tool tips have also been added to help
users understand the options they are choosing. Context sensitive help has also been
added to the GUI. Context sensitive help means that when a user presses F1, a help
menu appears with information specific to the part of the GUI that currently has focus.
Finally another slider has been added to allow the user to select the address width
parameter. A diagram of the new features added to the GUI is shown in Figure 9.2.4.
9.2.4 Testing
Testing of the interface was done in three parts. First, users unfamiliar with
the GUI performed usability testing. Second, tests were developed to make sure the
GUI was stable. These tests were further divided into testing the reading of the
Parameters XML file, general GUI functionality, and writing of the UIXL output file.
Finally, integration testing was performed to make sure the interface between the GUI
and the HGP works.
Usability testing is important to the GUI in order to discover what potential users
have trouble with. Since the change from the old design of the GUI and the addition of
all the help features, usability test results have improved greatly.
The actual code of the GUI was tested in two different ways. First, the general
behavior of functions was tested to make sure they perform adequately under normal
conditions. Next, testing was performed to make sure the program degrades gracefully
under adverse conditions. For instance, tests such as removing the input file, entering
erroneous data, or inputting too much data were performed.
Since the interface between modules was decided on near the beginning of the
project, integration testing produced few errors.
Figure 9.2.4: Revised design of the graphical user interface with added Help Menu, Tabbed panes, Tool
Tips, and Address Width selection
9.2.5 Current Status
Currently the graphical user interface is complete. All of the features discussed
have been implemented. This includes the new features discussed in the interim
report. Testing of the GUI has resulted in a stable product.
9.3 Processor Organization
9.3.1 Registerfile
The registerfile is the portion of the processor that keeps intermediate results of
program instructions. It is characterized by a range of registers, which hold the
intermediate results, along with functions to retrieve values and to store new values.
9.3.1.1 Design
The registerfile required for the design of the processor needed two main
capabilities. First, due to the Reduced Instruction Set Computing (RISC) paradigm that
the processor design was following, the registerfile needed to be able to supply two
values at once. Second, the registerfile needed to be parameterizable to handle
different bus widths and a varying number of registers.
To accommodate the above two requirements the registerfile needed to fit the
Very High Speed Integrated Circuit Hardware Description Language (VHDL) component
description seen in Figure 9.3.1.
COMPONENT registerfile
GENERIC (
BUSWIDTH: INTEGER:=32;
NUMREG: INTEGER:=32;
LOG2NUMREG: INTEGER:=5);
PORT (
write : IN STD_LOGIC;
clk : IN STD_LOGIC;
readreg1 : IN STD_LOGIC_VECTOR(LOG2NUMREG - 1 DOWNTO 0);
readreg2 : IN STD_LOGIC_VECTOR(LOG2NUMREG - 1 DOWNTO 0);
writereg : IN STD_LOGIC_VECTOR(LOG2NUMREG - 1 DOWNTO 0);
writedata : IN STD_LOGIC_VECTOR(BUSWIDTH - 1 DOWNTO 0);
readdata1 : OUT STD_LOGIC_VECTOR(BUSWIDTH - 1 DOWNTO 0);
readdata2 : OUT STD_LOGIC_VECTOR(BUSWIDTH - 1 DOWNTO 0));
END COMPONENT;
Figure 9.3.1: Registerfile VHDL Component Declaration
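The behaviour implied by this declaration can be sketched as a small Java model. It is an illustration only: the write is assumed to take effect on a clock edge when write is asserted, and any special treatment of register 0 is not modelled because the declaration does not show it.

```java
/** Hypothetical behavioural model of the registerfile: two read ports, one write port. */
final class RegisterFileModel {
    private final long[] regs;   // NUMREG registers
    private final long mask;     // keeps stored values within BUSWIDTH bits

    RegisterFileModel(int busWidth, int numReg) {
        if (busWidth < 1 || busWidth > 63 || numReg < 1) {
            throw new IllegalArgumentException("unsupported parameters");
        }
        this.regs = new long[numReg];
        this.mask = (1L << busWidth) - 1;
    }

    /** Models one clock edge: writeData is stored only when write is asserted. */
    void clock(boolean write, int writeReg, long writeData) {
        if (write) {
            regs[writeReg] = writeData & mask;
        }
    }

    /** Both read ports are served at once, returning {readdata1, readdata2}. */
    long[] read(int readReg1, int readReg2) {
        return new long[] { regs[readReg1], regs[readReg2] };
    }
}
```

The two constructor arguments play the role of the BUSWIDTH and NUMREG generics; LOG2NUMREG is implicit in the model because Java array indices are not sized in bits.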
Figure 9.3.2: Registerfile
9.3.1.1.2 Output Circuitry
The output circuitry of the registerfile allows data to be extracted from the
registerfile. As per the requirements for RISC architecture, there are two output ports.
The Read Reg 1 and Read Reg 2 signals (which are integers from 0 to the number of
registers - 1) select the appropriate register through the use of a multiplexor.
The multiplexors in the design are quite large (due to the multiplexing of a large
number of registers and the fact that each register output is also composed of many
wires) and therefore use many logic cells within the FPGA. A simpler and more
space-efficient design would consist of tri-state buffers, as seen in Figure 9.3.3, used to
connect all the registers to the output signal, but since FPGAs do not have
programmable tri-state buffers, multiplexors must be used instead.
The multiplexors were created with the aid of LPMs (Library of Parameterized
Modules) from within MAX+plus II and are parameterizable, and therefore the specific
multiplexor needed can be easily created.
9.3.1.2 Testing
The registerfile module has been completed and fully tested in simulation thus
meeting the mid-November deadline. Please see Appendix 6 where the timing
diagrams for the test cases are available. The test cases outline a scenario where
information is stored in each register and then the information is retrieved from each
output port.
9.3.2 Arithmetic and Logic Unit (ALU)
The ALU is the component of the processor that performs all the arithmetic
operations such as addition and multiplication as well as all the logical operations such
as comparison testing and shifting.
9.3.2.1 Design
The ALU required for the design of the processor needed to be extensively
modular so that the removal and addition of arithmetic and logic operations by the HGP
could be performed without upsetting the rest of the ALU. Furthermore, the ALU
needed to be able to handle different bus-widths and therefore needed to be modular
not only per instruction, but also in terms of the size of inputs it could handle.
The design of the ALU thus proceeded from the design of a one-bit ALU, which
could then be integrated into any size ALU.
END COMPONENT;
Figure 9.3.4: One-Bit ALU VHDL Description
The one-bit ALU is characterized by two parameters and a series of inputs. The
NUMBERBITWISEALUOP parameter is the number of bit-wise operations the ALU can
perform. The second parameter is actually a transformation of the first
(ceil(log2(NUMBERBITWISEALUOP))) and is included due to the inflexibility of VHDL
noted above.
The inputs to the one-bit ALU also constitute a parameterizable aspect of the
ALU that the HGP can adjust. As can be seen from the design of the one-bit ALU in
Figure 9.3.5, the inputs slt (set on less than) through sne (set on not equal) can be
included or omitted depending on the user specifications. For example, if the user
needs a ‘less than’ comparison then the slt input (Less in Figure 9.3.5) will be included,
otherwise it will not be part of the design. The other sXX inputs can be added or
removed in the same fashion by the HGP. Furthermore, as can be seen from Figure
9.3.4, one-bit operations such as AND and XNOR can be removed from the one-bit
ALU without affecting the other ALU operations. The user specifications determine
which gates are included in the one-bit ALU at compile time.
Figure 9.3.5: Design of One-Bit ALU
The control signal Bit Invert is used to allow for subtraction. If subtraction and
all other operations that need subtraction (such as branching or comparisons) are not
required in the ALU, this control signal and the accompanying multiplexor can be
removed from the one-bit ALU by the HGP.
HGP can generate as many one-bit ALUs as necessary to develop the customized
processor.
The only difference between the one-bit ALUs used in the total ALU is the
source of their input. The first one-bit ALU receives its Less, Greater and Equal signals
from the comparison-checking module, and all others receive ‘0’ as their input. This
mechanism is used to provide the ‘set on less than’ and comparable instructions. The
comparison-checking module receives the result of input A minus input B (not shown in
Figure 9.3.6) and thus can determine if the input A was equal, less than or greater than
input B. Furthermore the comparison-checking module will determine if overflow has
occurred during the arithmetic operation.
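The comparison logic described here can be sketched in Java for a 32-bit signed datapath. The bus width, the signed interpretation, and the way the flags are returned are assumptions made for the sketch; the module itself is implemented in VHDL.

```java
/** Hypothetical sketch of the comparison-checking module for signed 32-bit inputs. */
final class ComparisonSketch {
    /** Returns {less, greater, equal, overflow}, all derived from the difference a - b. */
    static boolean[] compare(int a, int b) {
        int diff = a - b;                                // wraps around exactly like a 32-bit subtractor
        // Overflow occurs when the operands have different signs and the
        // sign of the result differs from the sign of a.
        boolean overflow = ((a ^ b) & (a ^ diff)) < 0;
        boolean equal = diff == 0;
        // The sign bit of the difference gives "less than", corrected when overflow occurred.
        boolean less = (diff < 0) != overflow;
        boolean greater = !less && !equal;
        return new boolean[] { less, greater, equal, overflow };
    }
}
```

In a ‘set on less than’ operation only the first flag would be routed to the least significant one-bit ALU, while every other bit of the result receives ‘0’.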
9.3.2.1.3 Complete ALU
With the ALU designed above, the processor can perform a limited number of
operations. For a more extensive list of operations, such as multiply and divide, the
ALU designed above must be incorporated into a larger ALU. In Figure 9.3.7, the
bit-wise ALU is just one part of a larger modular ALU.
The complete ALU contains modules for shifting, rotating, multiplying, and
dividing, as well as the bit-wise ALU. The former modules were created
with the aid of LPMs contained in MAX+plus II and therefore can be easily modified by
the HGP to handle different bus-widths. Furthermore, any or all of these modules can
be removed by the HGP without affecting the functionality of the other modules.
With the removal of either multiply or divide, the two multiplexors adjacent to the
Hi and Lo registers in Figure 9.3.7 can be removed by the HGP. With the removal of
both multiply and divide the two registers can also be removed. The Hi and Lo registers
are contained in the ALU due to the observation that both multiply and divide will produce
results that are 2*N bits wide given inputs that are N bits wide, and thus the result must
be stored within the ALU so that the program can store each portion of the result
individually.
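A short Java sketch of the 2*N-bit result for N = 32. Signed operands are assumed, and placing the remainder in Hi and the quotient in Lo for division follows the usual MIPS convention, which is an assumption here rather than something the text above specifies.

```java
/** Hypothetical sketch: splitting a double-width result into Hi and Lo halves (N = 32). */
final class HiLoSketch {
    /** Returns {hi, lo} of the signed 64-bit product of two 32-bit inputs. */
    static int[] multiply(int a, int b) {
        long product = (long) a * (long) b;              // 2*N bits wide
        return new int[] { (int) (product >> 32), (int) product };
    }

    /** Returns {hi, lo} with the remainder in hi and the quotient in lo (assumed MIPS convention). */
    static int[] divide(int a, int b) {
        // Note: Java throws ArithmeticException when b is zero.
        return new int[] { a % b, a / b };
    }
}
```

A program would then read the two halves one at a time, which is why both registers have to live inside the ALU until they are copied out.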
COMPONENT total alu
From the VHDL component description of the complete ALU it can be
seen that the ALU can be parameterized with five parameters:
1. N: bus-width
2. ALUOPSIZE: ceil(log2(number of operations available))
3. BITWISEALUOPSIZE: ceil(log2(number of bit-wise operations))
4. FUNCTALUOPSIZE: ceil(log2(number of operations - number of bit-wise operations))
5. SHIFTSIZE: ceil(log2(N))
With these parameters the number of operations and the bus-widths can be
selected, and the HGP must then remove the unneeded modules from the ALU to
construct the final ALU needed for the customized processor.
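The five parameters can be derived with integer arithmetic only, as in the Java sketch below. The class and method names are hypothetical, and the operation counts in the usage note are example inputs rather than values from the suite.

```java
/** Hypothetical sketch: deriving the ALU generics from the operation counts. */
final class AluParameterSketch {
    /** ceil(log2(n)) for n >= 1, computed without floating point. */
    static int ceilLog2(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive");
        }
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    /** Returns {N, ALUOPSIZE, BITWISEALUOPSIZE, FUNCTALUOPSIZE, SHIFTSIZE}. */
    static int[] derive(int busWidth, int numOps, int numBitwiseOps) {
        return new int[] {
            busWidth,
            ceilLog2(numOps),
            ceilLog2(numBitwiseOps),
            ceilLog2(numOps - numBitwiseOps),   // requires at least one non-bit-wise operation
            ceilLog2(busWidth)
        };
    }
}
```

For example, a 32-bit ALU with 12 operations, 8 of them bit-wise, gives the parameter values 32, 4, 3, 2, and 5.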
9.3.2.2 Testing
The ALU module has been completed and fully tested in simulation, thus meeting
the end of November deadline. Please see Appendix 6 where the timing diagrams for
the test cases are available. The test cases outline all the ALU operations being
performed on two sets of inputs.
9.3.3 Processor Datapath
The processor datapath is the organization of different components within the
processor. The datapath displays how components such as the registerfile and ALU
are connected to each other. The general datapath for the processor that the HGP will
create can be seen in Figure 9.3.9. (For simplicity, control signals are not drawn in full,
but are replaced with red stubs.)
9.3.3.1 Design
The design of the datapath followed the MIPS architecture and was influenced
by the instruction set that the HGP supports (displayed in Appendix 5). The datapath
needs to be able to handle all possible instructions, and their combinations. To
illustrate the design decisions made for the datapath, the use of each multiplexor in the
design will be discussed. The Instruction Register (IR), which plays an important part in
the datapath, will also be discussed.
With the three different instruction formats it can be seen that the inputs to ‘Read
Reg 1’ and ‘Read Reg 2’ always come from the same portion of the IR. The ‘Write Reg’
input however may come from the 4th portion of the IR (in reg-type instructions) or the
3rd portion of the IR (in immediate-type instructions). Since the ‘Write Reg’ input may
come from different portions of the IR, a multiplexor is needed to select between the
different inputs depending on which type of instruction is being processed. The select
signal (not shown in Figure 10) for the multiplexor comes from the processor control
which determines what instruction is being performed from analyzing the opcode field in
the IR.
Note that this portion of the datapath cannot be simplified unless only reg-type
instructions were available in the processor, but such a design would serve no practical
purpose due to the inability to use memory.
3. The output of the Sign Extension Module, which receives an immediate value
from the IR and expands it to fill the whole bus-width. This scenario is used
with any immediate instruction such as ‘add immediate’ (immediate
instructions have a constant embedded in them).
4. The output of the Shift Left Module, which receives a sign-extended address
from the Sign Extension Module and shifts it left. This scenario is used with
branch instructions to convert the word offset to a byte offset.
If immediate instructions are not needed then the third input to the multiplexor
can be eliminated by the HGP, and if there are no branch instructions in the user
specifications the fourth input to the multiplexor can also be eliminated. Furthermore, if
neither immediate nor branch instructions are needed by the user, the Sign Extension
and Shift Left Modules can both be eliminated by the HGP during the creation of the
customized processor.
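What the two modules compute can be sketched in Java as follows. A 32-bit bus, a 16-bit immediate in the usage check, and four bytes per word are assumptions for the sketch; in the generated hardware these follow from the processor parameters.

```java
/** Hypothetical sketch of the Sign Extension and Shift Left modules on a 32-bit bus. */
final class ImmediateSketch {
    /** Sign-extends the low {@code bits} bits of value to 32 bits (1 <= bits <= 32). */
    static int signExtend(int value, int bits) {
        int shift = 32 - bits;
        // Shift the field's sign bit up to bit 31, then arithmetic-shift it back down.
        return (value << shift) >> shift;
    }

    /** Converts a word offset to a byte offset, assuming four bytes per word. */
    static int wordToByteOffset(int wordOffset) {
        return wordOffset << 2;
    }
}
```

A branch would use both steps in sequence, while an immediate instruction such as ‘add immediate’ would use only the sign extension.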
Figure 9.3.13: Shift Amount Input, ALU
The address for the memory may originate from either the PC, to retrieve the
next instruction in the program, or AluOut, where the computed address for a load or
store instruction is stored temporarily. Regardless of the user specifications, the HGP
may not simplify this portion of the design as it is required for even the simplest
processor.
Figure 9.3.15: PC input
As can be seen from Figure 9.3.15, the new value for the PC may come from
four sources:
1. AluOut: To handle the regular PC + 4 update.
2. Reg A: To handle the jump register instruction, where the next value of
the PC is held in the registerfile.
3. ALU: To handle branches, where the old value of the PC is added to the
branch offset.
4. Shift Left: To handle jump instructions, where the word address in the IR is
shifted left to construct a byte address and then combined with
the most significant bits of the original value of the PC to obtain
the new value of the PC.
This portion of the datapath can be considerably simplified by the HGP if jump
instructions are not required; the top two branches in Figure 9.3.15 can then be
eliminated. In addition, if branches are not required then the multiplexor can also be
removed.
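The four-way selection can be sketched in Java as below. The source names mirror the list above, while the 32-bit PC and the choice of keeping the top four PC bits for a jump (which corresponds to a 26-bit jump field, as in MIPS) are assumptions for illustration.

```java
/** Hypothetical sketch of the multiplexor that selects the next PC value. */
final class PcInputSketch {
    enum PcSource { ALU_OUT, REG_A, ALU, SHIFT_LEFT }

    static int nextPc(PcSource source, int aluOut, int regA, int alu, int pc, int jumpField) {
        switch (source) {
            case ALU_OUT:
                return aluOut;                           // regular PC + 4 update
            case REG_A:
                return regA;                             // jump register
            case ALU:
                return alu;                              // branch target
            case SHIFT_LEFT:
                // Word address shifted left to a byte address, combined with the
                // most significant bits of the current PC (top four bits assumed).
                return (pc & 0xF0000000) | ((jumpField << 2) & 0x0FFFFFFF);
            default:
                throw new IllegalArgumentException("unknown source");
        }
    }
}
```

Removing jump support corresponds to deleting the REG_A and SHIFT_LEFT cases, which is the simplification the HGP applies to the hardware.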
Figure 9.3.16: Register Data Input
The HGP cannot simplify this portion of the design since a processor without
load or arithmetic operations would be useless.
ALUSRCASIZE_A: INTEGER:=1;
ALUSRCBSIZE_A: INTEGER:=2;
POWERALUSRCBSIZE_A:INTEGER:=4;
PCSOURCESIZE_A: INTEGER:=2;
POWERPCSOURCESIZE_A:INTEGER:=4;
END proc;
Figure 9.3.17: VHDL Description for complete processor
The parameters include all the parameters needed for the creation of the
registerfile and ALU as well as parameters such as PCSOURCESIZE_A, which
gives the number of select bits needed to choose among the possible sources of the
value of the PC. Furthermore, parameters such as REGFIELDSIZE are included so that
the different content within the IR can be connected to the appropriate places.
9.3.3.3 Testing
The complete datapath has not been tested since the computing resources
needed to compile the processor are not available (please see Section 9.10).
The individual modules in the datapath, including the Sign Extension
Modules and Shift Left Modules, have been fully tested in simulation. Please see
Appendix 6 where the timing diagrams for the test cases are available.
9.4 Processor Control
The control circuitry determines how each instruction is processed by
the processor. The control system is essentially a finite state machine (FSM) [8]. The
state machine transitions through a set of states for each instruction in order to ensure
proper instruction execution.
The program will use the state information for each instruction along with
predefined states to generate a new transition table with all necessary states. Each
state will be responsible for handling a part of the execution of a single instruction.
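One way to picture the generated transition table is the Java sketch below. The predefined FETCH and DECODE states, the state naming scheme, and the per-instruction step lists are all assumptions for illustration and do not reflect the program's actual data structures.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: building a "state -> next state" table from per-instruction steps. */
final class ControlTableSketch {
    static Map<String, String> build(Map<String, List<String>> stepsPerInstruction) {
        Map<String, String> table = new LinkedHashMap<>();
        table.put("FETCH", "DECODE");                    // assumed predefined states
        for (Map.Entry<String, List<String>> entry : stepsPerInstruction.entrySet()) {
            String instruction = entry.getKey();
            // DECODE dispatches on the opcode, so it gets one row per instruction.
            String previous = "DECODE[" + instruction + "]";
            for (String step : entry.getValue()) {
                String state = instruction + "_" + step;
                table.put(previous, state);
                previous = state;
            }
            table.put(previous, "FETCH");                // last step returns to instruction fetch
        }
        return table;
    }
}
```

Because every instruction owns its own rows, leaving an instruction out of the input removes only those rows and leaves the rest of the table unchanged.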
9.4.2 Design Approach
In order to properly design the control unit for the parameterizable processor we
decided to create a generic control unit upon which all control units generated by our
software are based. The main advantage of this decision is that if a processor with a
full instruction set works with the generic control unit, then removing control steps
for unused instructions will not affect the flow of control for other instructions. The
following sections will explain how the control unit was designed and tested.
9.4.2.2 Testing
The testing of the control unit was done based on the full instruction set. The
basis for our testing method was the fact that if the control unit was able to properly
control all instructions provided by the design suite, then it would also properly control a
smaller subset of those instructions.
Once this testing was complete, we tested a reduced version of the
control unit. This stage of testing was necessary in order to verify that there are no
dependencies between states of different instructions, as well as to prove that the HGP
properly optimized the control circuitry.
9.5 Processor Input/Output
An I/O interface was designed, created, and tested for the processor. The
interface uses memory mapped I/O, which allows the processor to communicate only
with a limited set of devices. The majority of computer systems access input and output
devices in one of two ways: direct port access or memory mapped access. Direct port
access requires that devices have a separate set of connections to the processor and
that the processor has extra instructions to deal with these devices. Memory mapped
access, on the other hand, allows the processor to have a simpler design. Extra logic,
however, must be added outside of the processor to determine whether the processor
wants to access memory or an I/O device [8]. Only memory mapped I/O was
implemented so that the processor, the most complex hardware component, is as
simple as possible.
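The extra decode logic outside the processor can be sketched in Java as a single address comparison. The boundary address used here is hypothetical; the actual address map of the I/O interface is not given in this section.

```java
/** Hypothetical sketch of the logic that decides between memory and I/O devices. */
final class AddressDecodeSketch {
    // Assumed boundary: addresses at or above this value belong to I/O devices.
    static final int IO_BASE = 0xFFFF0000;

    /** Returns true when the address selects an I/O device rather than memory. */
    static boolean isIoAccess(int address) {
        // Addresses are unsigned quantities, so compare them as unsigned values.
        return Integer.compareUnsigned(address, IO_BASE) >= 0;
    }
}
```

The processor itself issues the same load and store operations in both cases, which is what keeps its design simple.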
The VHDL files that are being generated can be divided into two large sections,
the processor, and everything outside the processor. The processor can be further
subdivided into a register file, an Arithmetic Logic Unit (ALU), a data path, and a control
circuit. The processor can communicate with everything outside of it through a set of
address, data, and status lines that connect it to everything else. Everything else
includes memory for storage, I/O devices that allow the processor to interact with the
outside world, and a way to determine whether the processor wants to communicate
with the memory or with the I/O devices. Figure 9.5.1 is a block diagram that represents
what the design looks like at a high level of abstraction.
Figure 9.5.2: I/O Interface
unselected and cannot send any more signals to the processor unless it is selected
again.
9.5.3.1 Mouse
A PS/2 mouse communicates with another device through a 6-pin PS/2
connector. The 6 pins on the connector are:
1. Data
2. No connection
3. Ground
4. +5 V
5. Clock
6. No connection
The pin layout for the socket and the plug is shown in Figure 9.5.3.
The mouse sends data in packets of three bytes. The data contains
information concerning the last movement the mouse has made and the states of the
buttons (either pressed or not pressed). The way the information is stored is shown in
Table 9.5.1.
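Bit      7    6    5    4    3    2    1    0
Byte 1   YV   XV   YS   XS   1    0    R    L
Byte 2   X7   X6   X5   X4   X3   X2   X1   X0
Byte 3   Y7   Y6   Y5   Y4   Y3   Y2   Y1   Y0
YV, XV   Y and X movement overflow flags
XS, YS   X and Y movement sign bits
L, R     Left and right button states (1 = pressed)
X0-X7    X movement
Y0-Y7    Y movement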
Table 9.5.1 : The format of the information that the mouse transmits [1Borys]
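Decoding one packet laid out as in Table 9.5.1 can be sketched in Java as follows. The sketch assumes the usual PS/2 convention in which each sign bit extends its movement byte to a 9-bit two's complement value; the overflow flags are ignored.

```java
/** Hypothetical sketch: decoding the three bytes of a mouse packet. */
final class MousePacketSketch {
    /** Returns {dx, dy, left, right} for the given packet bytes. */
    static int[] decode(int byte1, int byte2, int byte3) {
        int left = byte1 & 1;                    // bit 0: L
        int right = (byte1 >> 1) & 1;            // bit 1: R
        int xSign = (byte1 >> 4) & 1;            // bit 4: XS
        int ySign = (byte1 >> 5) & 1;            // bit 5: YS
        // Treat the sign bit as bit 8 of a 9-bit two's complement number.
        int dx = (byte2 & 0xFF) - (xSign << 8);
        int dy = (byte3 & 0xFF) - (ySign << 8);
        return new int[] { dx, dy, left, right };
    }
}
```

For instance, a first byte of 0x19 with a second byte of 0xFE describes a movement of two counts in the negative X direction with the left button pressed.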
The clock frequency that the mouse generates is somewhere between 10 kHz
and 33 kHz. The mouse uses this clock signal to synchronize its communications.
When a mouse transmits a byte, it sends 11 bits:
• A start bit, which is always ‘0’
• Eight data bits
• A parity bit that is ‘1’ if the data contains an even number of ‘1’s
• A stop bit, which is always ‘1’. [10]
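Checking one received frame against this format can be sketched in Java. The sketch assumes the data bits arrive least significant bit first, which is the usual PS/2 ordering, and it reports any framing or parity error as -1.

```java
/** Hypothetical sketch: validating an 11-bit frame and extracting the data byte. */
final class Ps2FrameSketch {
    /** bits are in arrival order: start, d0..d7, parity, stop. Returns the byte, or -1 on error. */
    static int decode(int[] bits) {
        if (bits.length != 11 || bits[0] != 0 || bits[10] != 1) {
            return -1;                           // bad start or stop bit
        }
        int data = 0;
        int ones = 0;
        for (int i = 0; i < 8; i++) {
            data |= bits[1 + i] << i;            // least significant bit first (assumed)
            ones += bits[1 + i];
        }
        // The parity bit is '1' when the data contains an even number of '1's.
        int expectedParity = (ones % 2 == 0) ? 1 : 0;
        return bits[9] == expectedParity ? data : -1;
    }
}
```

A hardware receiver performs the same checks with a shift register and a parity tree, sampling one bit per clock period.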
The mouse can send data when its state has changed and it detects that both
clock and data are high. The mouse then sends data so that whatever it is connected to
can capture the data on the falling edge of the clock signal [9].
the keyboard sends make and break codes respectively. A buffer does not have to be
implemented to store past keystrokes because the keyboard has an internal buffer.
The keyboard has this buffer so it does not lose any information if it is stopped from
transmitting data to the computer [11].
9.5.3.3 Monitor
The monitor displays images on the screen by quickly going through each pixel
on the screen and setting it to a certain colour. The monitor does this about 60 times
a second. The pixels are set starting with the top left corner, going to the right, line by
line until the bottom right is reached. The process is then repeated. Each line can be
thought of as a horizontal cycle. The monitor knows what to do because it is sent
information concerning the red, green, and blue intensities of each pixel and a
horizontal synchronization indicator that indicates when a line is finished. There is also
a vertical synchronization indicator that indicates when a screen is finished being
updated. [4Borys] A simplified waveform is shown in Figure 9.5.4. The most difficult
part about sending information to a monitor is making sure that everything is sent at the
appropriate time.
Figure 9.5.4: A single screen write to a screen with a resolution of 2 pixels by 2 pixels
Figure 9.5.5: Mouse Interface
be used to communicate with the keyboard can be generated using software. This
approach allows for simple hardware and greater flexibility as to what is actually
implemented to communicate with the keyboard.
9.5.7 Testing
Each of the components in the I/O interface has been tested. The tests
encompass the following simulations:
• The mouse interface’s actions when a mouse writes to it
• The generic PS/2 interface’s actions when it has to read from and write to a
device
• The entire I/O interface’s responses when the processor communicates with all
of the devices
All of the simulations that were performed show that the interfaces work as
expected. The simulations are presented in Appendix 6.
9.6 Memory/Cache Design
In this section the design specification for the memory and cache controller will
be laid out. Further, the design choices will be explained as well as how the controller
fits together with the rest of the processor.
9.6.1 Overview
The memory and cache controller is an essential part of a processor. The
controller takes care of memory reads and writes and abstracts the memory circuitry
from the processor. This abstraction is necessary due to the fact that memory chips are
not created equal, and thus the controller abstracts away the differences, allowing the
processor to see a common interface. To speed up the memory access times a cache
is introduced.
A cache refers to memory that holds a copy of data stored in main memory, but
the access time required to read from a cache is much smaller than that required to
read from main memory. Since a cache is very fast, it is also very expensive and does
not hold much data compared to the main memory within a computer system. In order
to make good use of the cache, for example so that it holds the most frequently used
data along with the most recently accessed data, a set of mapping functions and
replacement algorithms have been developed.
keep track of which page in the set was accessed last and replace the other page in the
set. The larger the set size, the more complicated the algorithm becomes.
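For a set of two pages, the replacement rule described above needs only one bit of bookkeeping per set, as the Java sketch below illustrates. The tag representation and the use of -1 for an empty entry are assumptions for the sketch.

```java
/** Hypothetical sketch of one two-way set with "replace the page not accessed last". */
final class TwoWaySetSketch {
    private final int[] tags = { -1, -1 };       // -1 marks an empty way (tags assumed non-negative)
    private int lastAccessed = 1;                // start so that way 0 is filled first

    /** Returns true on a hit; on a miss, replaces the way that was not accessed last. */
    boolean access(int tag) {
        for (int way = 0; way < 2; way++) {
            if (tags[way] == tag) {
                lastAccessed = way;
                return true;
            }
        }
        int victim = 1 - lastAccessed;           // the other page in the set
        tags[victim] = tag;
        lastAccessed = victim;
        return false;
    }
}
```

With more than two pages per set a single bit is no longer enough, since the full access order has to be tracked, which is the added complexity noted above.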
Figure 9.6.2: Memory And Cache Controller module
9.7 Hardware Generation Program
The HGP takes in an XML file from the GUI and then generates the HDL for the
specified processor and the XML for the assembler. This section will explain the design
methods and issues involved in the creation of the HGP. First an overview of the HGP
operation is given, and then each submodule is explained in detail.
9.7.1 Overview
Figure 9.7.1: HGP Overview
9.7.3.1.1 Members
Table 9.7.1 lists the members of the ProcParameter class along with their utility:
Member                               Utility
protected boolean doShiftsExist;     Determines if shifts are included in the processor
protected int busWidth;              The bus width of the processor
protected int numReg;                The number of registers in the processor
protected int log2NumReg;            log2(numReg)
protected int numInstructions;       The number of instructions available in the processor
protected int opcodeFieldSize;       The number of bits available for the opcode in the instruction word
protected int regFieldSize;          The number of bits available to indicate a register in the instruction word
protected int shamtFieldSize;        The number of bits available for the shift amount in the instruction word
protected int functFieldSize;        The number of bits available for the function field in the instruction word
protected int jumpFieldSize;         The number of bits available for a jump-to address in the instruction word
protected int immediateFieldSize;    The number of bits available for an immediate value in the instruction word
protected int wordToByteOffset;      The number of bytes in a word
protected int aluSrcASize_A;         log2(number of paths into Input A of the ALU)
protected int aluSrcBSize_A;         log2(number of paths into Input B of the ALU)
protected int PCSourceSize_A;        log2(number of paths into the PC)
protected int powerPCSourceSize_A;   Number of paths into the PC
protected int aluOpSize_A;           The number of bits specified to the ALU to indicate an operation
protected int bitWiseAluOpSize_A;    The number of bits used by the bitwise ALU to select among different outputs
protected int functAluOpSize_A;      The number of bits used by the ALU to select among different outputs
Table 9.7.1: ProcParameter Members
9.7.3.1.2 Methods
The ProcParameter class has three types of methods:
1. Methods which allow the busWidth, numReg, numInstructions, and
doShiftsExist members to be set or initialized.
2. Methods which allow the value of each member (except doShiftsExist) to
be viewed.
3. The fixProcParameters method which analyzes the input received through
the set methods and determines a suitable value for all other members.
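The three groups of methods can be sketched as follows. This is only an illustration under stated assumptions, not the project's source: the class name ProcParameterSketch, the ceilLog2 helper, and the derivation rules inside fixProcParameters (register field as wide as log2NumReg, opcode field just wide enough for numInstructions, wordToByteOffset equal to busWidth / 8) are hypothetical choices made for the example.

```java
/** Hypothetical sketch of the set / view / fix pattern; not the project's source. */
class ProcParameterSketch {
    // Members set directly from user input.
    protected boolean doShiftsExist;
    protected int busWidth;
    protected int numReg;
    protected int numInstructions;

    // Members derived by fixProcParameters().
    protected int log2NumReg;
    protected int regFieldSize;
    protected int opcodeFieldSize;
    protected int wordToByteOffset;

    // 1. Set methods.
    void setDoShiftsExist(boolean exists) { doShiftsExist = exists; }
    void setBusWidth(int width) { busWidth = width; }
    void setNumReg(int count) { numReg = count; }
    void setNumInstructions(int count) { numInstructions = count; }

    // 2. View methods (doShiftsExist has none, as in the text).
    int getBusWidth() { return busWidth; }
    int getLog2NumReg() { return log2NumReg; }
    int getRegFieldSize() { return regFieldSize; }
    int getOpcodeFieldSize() { return opcodeFieldSize; }
    int getWordToByteOffset() { return wordToByteOffset; }

    // 3. Derive the remaining members from the values that were set.
    void fixProcParameters() {
        log2NumReg = ceilLog2(numReg);
        regFieldSize = log2NumReg;                    // assumption for this sketch
        opcodeFieldSize = ceilLog2(numInstructions);  // assumption for this sketch
        wordToByteOffset = busWidth / 8;              // assumption: 8-bit bytes
    }

    /** Smallest k such that 2^k >= n, for n >= 1. */
    static int ceilLog2(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }
}
```

Keeping the derivation in one method means the set methods can be called in any order; the derived members are only meaningful after fixProcParameters has run.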
The classes that extend the Signal class such as the FullAdderInputSignal class
contain another static count to determine how many instances of the particular class
have been instantiated. Please see Figure 9.7.3 for the Signal hierarchy.
if(!Instruction.resourceAlreadyThere("RegLo")) {
Resource.addElement(new RegLo());
Signals.addElement(new ALUOutputSignal ());
}
if(!Instruction.resourceAlreadyThere("RegHi")) {
Resource.addElement(new RegHi());
Signals.addElement(new ALUOutputSignal ());
}
if(!Instruction.resourceAlreadyThere("Divide")) {
Resource.addElement(new Divide());
Signals.addElement(new HiInputSignal ());
Signals.addElement(new LoInputSignal ());
}
}
}
Figure 9.7.4: div Class
The code within the constructor first calls the super constructor, then sets the
name of the Instruction as well as the opcode. Then by using the static method
resourceAlreadyThere in the Instruction class, the divide constructor searches for
resources that the divide function needs. If the resource has not already been created
by another instruction, the resource is instantiated and added to the Vector.
Furthermore, the Signals needed by the addition of that resource are also instantiated.
The divide constructor does this for every resource it needs.
One important point is that every third level class first calls the constructor of the
second level class. It was noted before that there is extra code within the second level
constructors as well. This code is very similar to the code seen in Figure 9.7.4, but it
allows for code reuse. For example, an “and immediate” class will only contain code in
its constructor to search for resources that it needs in addition to the resources that all
immediate functions need. The resources that are needed by all immediate functions
would be searched for in the constructor of the Immediate class which is a superclass
of the andi class. Please see Figure 9.7.5 for a subset of the Instruction class
Hierarchy.
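The constructor chaining described above can be illustrated with a small sketch. It is a simplification under assumptions: resources are represented as plain strings rather than Resource objects, signals are omitted, and the class names (InstructionSketch, ImmediateSketch, AndiSketch, OriSketch), the require helper, and the resource names "SignExtend" and "BitwiseALU" are hypothetical. Only resourceAlreadyThere and the use of a Vector come from the text.

```java
import java.util.Vector;

/** Hypothetical first level: one shared list of the resources created so far. */
abstract class InstructionSketch {
    static final Vector<String> resources = new Vector<>();
    protected String name;

    static boolean resourceAlreadyThere(String resourceName) {
        return resources.contains(resourceName);
    }

    /** Adds the resource only if no earlier instruction has already created it. */
    protected static void require(String resourceName) {
        if (!resourceAlreadyThere(resourceName)) {
            resources.addElement(resourceName);
        }
    }
}

/** Second level: resources shared by every immediate instruction. */
abstract class ImmediateSketch extends InstructionSketch {
    ImmediateSketch() {
        super();
        require("SignExtend"); // assumed resource name
    }
}

/** Third level: only what "and immediate" needs beyond its superclass. */
class AndiSketch extends ImmediateSketch {
    AndiSketch() {
        super(); // the second level constructor runs first
        name = "andi";
        require("BitwiseALU"); // assumed resource name
    }
}

/** A second immediate instruction that reuses both resources. */
class OriSketch extends ImmediateSketch {
    OriSketch() {
        super();
        name = "ori";
        require("BitwiseALU");
    }
}
```

Constructing AndiSketch and then OriSketch leaves exactly two entries in the shared list, because the second instruction finds both resources already present.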
need to be added to the available instruction set, then no existing classes must be modified;
only new classes defining the resources and signals needed for the particular
instruction must be created.
The following sections describe how the script processor has been designed
and the features provided by the scripts.
Command       Parameters                  Purpose
define        name value                  Allows declaration of a variable that can be used in a loop or as a parameter.
undefine      name                        Removes declaration from the list. It is needed in order to ensure that local
                                          declarations do not interfere with global variables. Additionally it prevents
                                          creating a full set of unique variable names for local segments.
label         name                        Enables the script to recognize a subsection of the database. The program can
                                          look for the label and process only the data within the label scope.
end           name                        Ends a label scope.
load          filename label              Loads contents from a database “filename” under a label “label.” The contents
                                          of the label scope are not executed, just copied.
execute       filename label              Executes the contents of the label scope “label” in the database file
                                          “filename.” The commands within this scope are executed as though they
                                          applied to the caller of this command.
instructions  instruction1 instruction2   Specifies the set of instructions that are to be used by the processor. The
              ... instructionN EOI        list is terminated by the keyword EOI.
generate      param                       The parameter “param” has to be one of the following:
                                          - STATES - to generate a set of states for the circuit to follow, based on
                                            instruction set and declare them in VHDL
                                          - FLOW - to generate the states and their transition information based on
                                            instruction set
for           variable start end step     Creates a set of VHDL commands for the code provided in the for loop scope.
                                          For each iteration of the loop the “variable” will change value by step.
close                                     Closes the for loop scope.
Keyword         Purpose
EOI             Specifies the End Of Instruction set in the instruction command
BUSWIDTH        Specifies the size of the bus to be used
ADDRESSWIDTH    Specifies the address space for the processor
PAGESIZE        Specifies the size of a memory page
REGISTERCOUNT   Specifies the number of registers to be used
script processor to use script variables not only as macros to be placed within the VHDL
code, but to actually manipulate them.
The main goal that has been achieved by adding this feature is the ability to
handle more complex operations within the script, thus making the script processor
something more than just a variable substitution tool. The expressions handled by the
script processor are all of an integer or string type. The integer type allows the results of
the expression to be used for width evaluation, counting, and also reusing VHDL
defined generic variables. The string type gives the script processor a little more
flexibility. Now an expression can be also treated as a parameter to a VHDL
instantiated module. This feature is very useful, because it allows the script processor
to change connections between different modules or eliminate them completely in order
to optimize the design.
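The distinction between integer and string expressions can be illustrated with a small sketch. It is not the project's Expression class: the name ExpressionSketch, the restriction to terms joined by + or -, and the variable names used in the example are assumptions. An expression whose terms all resolve to integers is evaluated numerically; anything else is treated as a string, with a lone defined name replaced by its value.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of integer-or-string expression handling. */
class ExpressionSketch {

    /**
     * Evaluates terms joined by + or - when every term is an integer or a
     * variable defined as an integer. Otherwise the expression is treated as
     * a string: a defined name yields its value, anything else is unchanged.
     */
    static String evaluate(String expr, Map<String, String> defines) {
        String trimmed = expr.trim();
        int total = 0;
        // Split before each sign so that "BUSWIDTH-1" gives "BUSWIDTH" and "-1".
        for (String part : trimmed.split("(?=[+-])")) {
            String term = part.trim();
            int sign = 1;
            if (term.startsWith("+") || term.startsWith("-")) {
                sign = term.startsWith("-") ? -1 : 1;
                term = term.substring(1).trim();
            }
            String value = defines.getOrDefault(term, term);
            try {
                total += sign * Integer.parseInt(value);
            } catch (NumberFormatException e) {
                // Not an integer expression, so fall back to the string type.
                return defines.getOrDefault(trimmed, trimmed);
            }
        }
        return Integer.toString(total);
    }

    public static void main(String[] args) {
        Map<String, String> defines = new HashMap<>();
        defines.put("BUSWIDTH", "32");
        defines.put("CARRY", "carry_out"); // made-up signal name
        System.out.println(evaluate("BUSWIDTH-1", defines)); // width evaluation
        System.out.println(evaluate("CARRY", defines));      // module parameter
    }
}
```

The integer branch covers uses such as width evaluation and counting, while the string branch is what would let a script swap the signal connected to a module port.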
9.7.4.2.4 Testing
All of the currently available commands listed in Table 9.7.2 have been fully tested. We
were able to use the script to develop many different processors without any problems.
Some of the templates used by the design tool have been modified to include script
commands. Generating proper VHDL code using the script-enhanced code has been
successful.
9.8 Assembler
The main function of any assembler is to convert assembly language into
machine code for a target processor. The target processor in this case is generated
according to user specifications. Therefore the assembler is dynamic and it can deal
with several different processors and a large number of permutations of assembly
instructions. The assembler also provides meaningful errors when it encounters them
in the assembly code.
9.8.1 Design
The assembler design requires that the assembler be able to handle many
different instruction sets. To accommodate this requirement, the HGP must output all
the assembly instruction details. The assembler proceeds by reading in these details
and forming assembly rules. The second phase of the assembler applies the assembly
rules to a user created assembly file.
9.8.2 Implementation
The implementation of the assembler design was done using Java. The
assembler itself can be partitioned into two phases. The first phase is the construction
of the assembler and the second phase is the application of the assembler onto the
assembly code.
In the first phase the assembler is constructed to fulfill the needs of a target
processor and assembly code. All the information the assembler needs is located in an
Assembler XML Language (AXL) compliant file. The details of AXL are provided in
section 9.9.2. The AXL file contains information on the instructions the assembler
recognizes and the code it will output. Internally each assembler instruction is stored in
an object that is placed in a hash table for quick lookup.
The second phase of the assembler applies the rules stored in the AXL file. In
this phase the assembler parses the user created assembly file. The user assembly
language file is read one line at a time. An outline of the parsing procedure for each
line is shown below:
1. Line is read in
2. Comments, denoted by a #, are removed
3. Labels, denoted by a :, are stored in a symbol table and are removed
4. The assembly instruction is looked up in the hash table generated when
reading the AXL file
5. Parameter 1 is evaluated
6. Parameter 2 is evaluated
7. Parameter 3 is evaluated
8. Finished machine code is generated
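Steps 2 through 4 of the outline above can be sketched as follows. This is a simplified illustration rather than the assembler's actual code: the class name LineParserSketch, the choice to store the line number as the symbol table value, and the use of IllegalArgumentException for an unknown instruction are assumptions.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of comment removal, label handling, and instruction lookup. */
class LineParserSketch {
    // Label name mapped to the line it was found on (assumed representation).
    final Map<String, Integer> symbolTable = new HashMap<>();
    // Instruction name mapped to its base code, as loaded from the AXL file.
    final Map<String, String> instructionCodes = new HashMap<>();

    /** Returns the instruction followed by its parameters, or an empty array. */
    String[] parseLine(String line, int lineNumber) {
        // Step 2: everything from the first '#' onward is a comment.
        int hash = line.indexOf('#');
        if (hash >= 0) {
            line = line.substring(0, hash);
        }
        // Step 3: a label ends with ':' and is stored, then removed.
        int colon = line.indexOf(':');
        if (colon >= 0) {
            symbolTable.put(line.substring(0, colon).trim(), lineNumber);
            line = line.substring(colon + 1);
        }
        line = line.trim();
        if (line.isEmpty()) {
            return new String[0];
        }
        // Step 4: the first token must be a known instruction.
        String[] tokens = line.split("[\\s,]+");
        if (!instructionCodes.containsKey(tokens[0])) {
            throw new IllegalArgumentException(
                    "unknown instruction '" + tokens[0] + "' on line " + lineNumber);
        }
        return tokens;
    }
}
```

The remaining tokens are the parameters that steps 5 through 7 would evaluate in turn.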
Table 9.8.1 summarizes the following example of how the parsing procedure
may be applied to a sample assembly instruction. Line one, in the table, indicates the
assembler has read in the following line:
start: addi $1, $2, 15 # reg1 = reg2 + #15.
This assembly code is based on the MIPS instruction set. The line contains a
label, “start”, and a comment, “reg1 = reg2 + # 15”. The command used is add
immediate, “addi”, which adds a register and an immediate value and stores the result
in another register. In this particular case, the parameters to the addition are register 2,
denoted by $2, and the number 15. The destination is register 1, denoted by $1.
Line two of the table applies the parsing procedure to remove the comment.
Thus, in the assembly code column of line two, the comment “reg1 = reg2 + #15”, is
removed.
The parsing procedure next stores the label in a symbol table for future
reference. This is shown in line three of the table. The assembly code that is displayed
in line three of the table is what remains after both the comments and the label have
been processed.
The next step of the parsing procedure processes the assembly instruction,
“addi”. “addi” is looked up in the AXL file and the code associated with the instruction is
retrieved. In this example the code associated with “addi” is
“00100000000000000000000000000000”. Line four of the table shows the remaining
assembly code to be processed at this stage, in the assembly code column, and the
code that has been generated so far, in the action column.
Lines five through seven of the table process each of the parameters of the
assembly instruction. Each parameter has code generated for it. This code is inserted
into the code already retrieved for the instruction. The table shows where the new code
for each parameter is inserted by boldfacing the type in the Action column. The AXL
file contains information regarding the position and value of the code that is generated.
The final step of the parsing procedure is to output the generated code.
This sample assumes that the AXL file contained an entry for addi, $1 and $2
and code associated with these entries.
Step  Assembly code being processed                 Action
1     start: addi $1, $2, 15 # reg1 = reg2 + #15
2     start: addi $1, $2, 15                        Removed comment
3     addi $1, $2, 15                               Add ‘start’ to symbol table
4     $1, $2, 15                                    00100000000000000000000000000000
5     $2, 15                                        00100000000000010000000000000000
6     15                                            00100000010000010000000000000000
7                                                   00100000010000010000000000001111
8                                                   Output: 00100000010000010000000000001111
Table 9.8.1 Application of parsing procedure on a sample assembly instruction.
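The bit patterns in steps 4 through 7 of the table can be reproduced with a short sketch. The field positions used here (bit 16 for $1, bit 21 for $2, bit 0 for the 16-bit immediate) are read off the table's bit patterns and match the MIPS I-type layout; the class and method names are hypothetical, and in the real assembler the positions and widths would come from the AXL file rather than being hard-coded.

```java
/** Hypothetical sketch of inserting parameter codes into an instruction word. */
class EncodeSketch {

    /** ORs the low 'width' bits of 'value' into 'word', starting at bit 'offset'. */
    static int insertField(int word, int value, int width, int offset) {
        int mask = (width == 32) ? -1 : (1 << width) - 1;
        return word | ((value & mask) << offset);
    }

    /** Formats a word as 32 binary digits with leading zeros. */
    static String toBinary32(int word) {
        return String.format("%32s", Integer.toBinaryString(word)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int word = 0x20000000;                   // step 4: code retrieved for addi
        System.out.println(toBinary32(word));
        word = insertField(word, 1, 5, 16);      // step 5: destination register $1
        System.out.println(toBinary32(word));
        word = insertField(word, 2, 5, 21);      // step 6: source register $2
        System.out.println(toBinary32(word));
        word = insertField(word, 15, 16, 0);     // step 7: immediate value 15
        System.out.println(toBinary32(word));
    }
}
```

Each printed line corresponds to the Action column of one table row, ending with 00100000010000010000000000001111.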
Internally the assembler must deal with many different circumstances that lead to
errors. At this point the assembler has to stop generating code for the current line,
report an error and reset its state to begin the parsing of the next line. To best
accomplish this, the assembler throws a custom exception each time an error is
encountered. A class called AssemblerException that extends java.lang.Exception was
written to represent the exception. This allows all errors to be reported in a uniform
manner because they are all generated from the same class of objects. It also enables
the assembler to quickly reset its state for the next instruction.
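The error handling strategy can be sketched as below. The class name AssemblerException, its superclass, and the (message, line number) constructor follow the text and the appendix listing; the getLineNumber accessor, the ErrorRecoverySketch driver, and the single placeholder check in assembleLine are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Custom exception carrying the line on which the error occurred. */
class AssemblerException extends Exception {
    private final int lineNumber;

    AssemblerException(String message, int lineNumber) {
        super(message);
        this.lineNumber = lineNumber;
    }

    int getLineNumber() { // hypothetical accessor
        return lineNumber;
    }
}

/** Hypothetical driver showing per-line recovery. */
class ErrorRecoverySketch {

    /** Assembles every line, collecting one uniform message per failing line. */
    static List<String> assemble(List<String> lines) {
        List<String> errors = new ArrayList<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            try {
                assembleLine(line, lineNumber);
            } catch (AssemblerException e) {
                // Abandon this line only; the loop continues with the next one.
                errors.add("line " + e.getLineNumber() + ": " + e.getMessage());
            }
        }
        return errors;
    }

    /** Placeholder for the real per-line work; it checks a single rule. */
    static void assembleLine(String line, int lineNumber) throws AssemblerException {
        if (line.indexOf(':') != line.lastIndexOf(':')) {
            throw new AssemblerException(
                    "statement cannot have more than one ':' per line", lineNumber);
        }
    }
}
```

Because the try block wraps a single line, any state built up for that line is discarded when the exception propagates, which is one simple way to obtain the reset behaviour described above.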
9.8.3 Testing
Testing of the assembler has proved to be difficult because of the number of
combinations of inputs. Currently test cases have been developed to test for all errors
the assembler reports. Tests have also been performed to account for erroneous data
read by the assembler. Interface testing between the Assembler and the HGP was
eased because interfaces between the sections of the project were decided early on.
9.9 Interfaces
One of the most important parts of any project is to clearly define interfaces
between sections early on. This way sections are written against interfaces without
worrying about the underlying code. This also allows test cases to be written before the
sections are complete. Interfaces can be thought of as a contract that the different
sections must fulfill. Each component in our design interfaces with the next component
using ASCII text files. In order to write clear interfaces we chose to develop XML
languages that these files must conform to.
9.9.1.1 Design
The design of UIXL was intended to be human readable. UIXL conveys
information about the parameters the user selected in the GUI. When looking at a UIXL
document one can see what the user has selected. The attribute tags allow information
to be conveyed using name-value pairs, while the command tags specify which
commands the processor should be able to process.
The structure of UIXL is specific but the content is not. This was done to allow
the future additions of attribute and command tags. The program that reads this file will
look to see if an attribute or command is present and act accordingly. That is, if a
required attribute or command is not set, the program can set a default value or
raise an error.
9.9.1.2 Specification
A tree diagram of the UIXL specification is shown in Figure 9.9.2. In order for a
file to conform to UIXL it must contain a CPU tag as the parent of all other tags. The
CPU tag may optionally have the following children tags: the Attributes tag and the
Commands tag.
The Attributes tag may have zero or more Attribute tags contained inside it.
Each Attribute tag must contain a Name tag and a Value tag. For example an Attribute
tag has a Name tag that contains NumberOfRegisters and a Value tag, which contains
2.
The Commands tag may have zero or more Command tags contained inside it.
Each Command tag must contain a text value. For example, one Command tag may
contain add, and another Command tag may contain sub.
<?xml version='1.0'?>
<!-- User Information XML Language (UIXL) file -->
<CPU>
<Attributes>
<Attribute>
<Name>
NumberOfRegisters
</Name>
<Value>
2
</Value>
</Attribute>
<Attribute>
<Name>
BusWidth
</Name>
<Value>
2
</Value>
</Attribute>
.
.
.
</Attributes>
<Commands>
<Command>
add
</Command>
<Command>
sub
</Command>
.
.
.
</Commands>
</CPU>
Figure 9.9.1: Layout of the User Information XML Language (UIXL) used to transfer information from
the graphical user interface to the hardware generator. Note that ellipses are used in the diagram to
indicate that a parent tag may include more than one child tag. Specifically the Attributes tag may
include many Attribute tags and Commands tags may include many Command tags.
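A reader for a UIXL document of this shape could look like the sketch below, which uses the standard Java DOM parser. It is an illustration and not the project's reader: the class name UixlReaderSketch, the intAttribute helper, and the policy of falling back to a caller-supplied default when an attribute is absent are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch that collects UIXL attributes and commands. */
class UixlReaderSketch {
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<String> commands = new ArrayList<>();

    UixlReaderSketch(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // Each Attribute tag holds one Name tag and one Value tag.
            NodeList attributeTags = doc.getElementsByTagName("Attribute");
            for (int i = 0; i < attributeTags.getLength(); i++) {
                Element tag = (Element) attributeTags.item(i);
                attributes.put(childText(tag, "Name"), childText(tag, "Value"));
            }
            // Each Command tag holds a single text value.
            NodeList commandTags = doc.getElementsByTagName("Command");
            for (int i = 0; i < commandTags.getLength(); i++) {
                commands.add(commandTags.item(i).getTextContent().trim());
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable UIXL document", e);
        }
    }

    private static String childText(Element parent, String tagName) {
        return parent.getElementsByTagName(tagName).item(0).getTextContent().trim();
    }

    /** Returns the named attribute as an integer, or the default if it is absent. */
    int intAttribute(String name, int defaultValue) {
        String value = attributes.get(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }
}
```

Because attributes are looked up by name, a file that gains new Attribute or Command entries can still be read by code that does not know about them.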
9.9.2 Assembler XML Language
The Assembler XML Language (AXL) is used to pass information from the HGP
to the assembler. AXL is defined for this project to give the assembler all the
information it needs to convert assembly code to machine code for a specific processor.
Since the assembler is written in a very generic manner the AXL language must be rich
enough to allow for very precise specification of how the assembler should behave. An
example of the layout of AXL is shown in Figure 9.9.3.
9.9.2.1 Design
AXL can be interpreted in three parts: assembly instructions, instruction types,
and parameters. These three parts are represented by Instruction tags, TypeDef tags,
and Parameter tags respectively.
The Instruction tags define a set of instructions the assembler is aware of. Each
Instruction tag contains the name of the instruction, the binary code associated with it,
and the type of instruction. The value in the type of instruction field must refer to an
instruction type defined by a TypeDef tag.
The second part of AXL is the user defined instruction types represented by the
set of TypeDef tags. Many instructions may be associated to one type. This feature of
AXL reduces data repetition and the overall length considerably. For instance, the
assembly language recognized by the MIPS processor will only require four TypeDef
tags. If the writers of an AXL file need a new instruction type for each instruction this
can also be accomplished. The instruction type contains information on the number of
parameters, the parameter widths, the parameter types, and the location of each of the
parameter’s code inside the instruction code (by using an offset).
Finally the Parameter tags contain information about each of the processor
named parameters accessible to the assembly code. For each parameter there must
be a name and code.
Like UIXL, the structure of AXL is specific but the content is not. This was done
in order to allow the assembler to be as flexible as possible. This also allows for future
additions and changes to the instructions, their types and parameters. Programs that
parse files that conform to the AXL language should query if an element exists and take
appropriate action if it does not.
9.9.2.2 Specification
A tree diagram of the AXL specification is shown in Figure 9.9.4. AXL requires
that the top-level tag be the Language tag. Under the Language tag there can be at
most one of each of the following tags: TypeDefs, Parameters, and Instructions.
The TypeDefs tag can have zero or more TypeDef tags as children. Each
TypeDef tag requires a Name tag and can have at most one of each of the following
tags: Param1, Param2, and Param3. Each of the Param1, Param2, and Param3 tags
must contain tags RightShift, Width, and Type. Each of the RightShift, Width, and Type
tags must contain a text literal.
The Parameters tag can have zero or more Parameter tags as children. Each
Parameter tag must contain a Name tag and a Code tag. The Name and Code tags
must contain text literals.
The Instructions tag can have zero or more Instruction tags as children. Each
Instruction tag must contain a Name tag, a Type tag, and a Code tag. The Name,
Type, and Code tags must contain text literals.
<Instructions>
<Instruction>
<Name>
add
</Name>
<Type>
RegType
</Type>
<Code>
00000000000000000000000000000000
</Code>
</Instruction> ...
</Instructions>
</Language>
Figure 9.9.3 Layout of the Assembler XML Language (AXL) used to transfer information from the
Hardware Generator to the Assembler. Note that ellipses are used in the diagram to indicate that a
parent tag may include more than one child tag. Specifically the TypeDefs, Parameters, and
Instructions tags may include many TypeDef, Parameter, and Instruction tags respectively.
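One way to hold the three parts of AXL in memory, and to check the rule that every Instruction must refer to a defined TypeDef, is sketched below. All class, field, and method names are hypothetical, and the RightShift, Width, and Type values are kept as uninterpreted text because the specification only requires them to be text literals.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of an AXL file. */
class AxlModelSketch {

    /** One Param1, Param2, or Param3 entry of a TypeDef. */
    static class ParamSpec {
        final String rightShift;
        final String width;
        final String type;

        ParamSpec(String rightShift, String width, String type) {
            this.rightShift = rightShift;
            this.width = width;
            this.type = type;
        }
    }

    /** A named instruction type with at most three parameter entries. */
    static class TypeDef {
        final String name;
        final List<ParamSpec> params = new ArrayList<>();

        TypeDef(String name) {
            this.name = name;
        }

        void addParam(ParamSpec spec) {
            if (params.size() >= 3) {
                throw new IllegalStateException("a TypeDef has at most three parameters");
            }
            params.add(spec);
        }
    }

    /** An Instruction tag: name, referenced type name, and base code. */
    static class Instruction {
        final String name;
        final String type;
        final String code;

        Instruction(String name, String type, String code) {
            this.name = name;
            this.type = type;
            this.code = code;
        }
    }

    final Map<String, TypeDef> typeDefs = new HashMap<>();
    final Map<String, Instruction> instructions = new HashMap<>();

    /** Names of instructions whose Type does not match any TypeDef. */
    List<String> undefinedTypeReferences() {
        List<String> bad = new ArrayList<>();
        for (Instruction inst : instructions.values()) {
            if (!typeDefs.containsKey(inst.type)) {
                bad.add(inst.name);
            }
        }
        return bad;
    }
}
```

Storing instructions in a map keyed by name also gives the constant-time lookup that the assembler's hash table provides.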
Figure 9.9.4: Tree diagram of the AXL language
9.10 Results
The objective of this project was to create a design suite that generates custom
processors so that the most efficient processor could be created for a specific task. To
measure if the objective was met the speed and space utilization of different custom
processors would be measured. However, due to a lack of computing power these
measurements could not be performed in full. The lack of computing power included a
lack of physical memory on the ugsparc machines, a lack of disk space on a partition
on the eecg network, and the lack of large devices included in the student license for
Max+plus2 on Windows machines.
With these computing restrictions in place, only subsections of the whole
processor could be compiled and analyzed. Please see Figures 9.10.1 through 9.10.3
for the results.
As can be seen on the plots, the size and speed of the components change with
different parameters. The registerfile’s size increases drastically when either the
buswidth or the number of registers is increased. Therefore it is profitable to define a
processor with the exact specifications needed to save space.
Furthermore, the size and speed of the ALU also increase with increased
functionality or buswidth, again showing that a processor exactly suited to a task is the
best option, and a general processor with all the options is not the correct tool to use.
Thus, the results point to the observation that the tool does create customized
processors that are better suited to the task than a complete general processor by
allowing them to be faster and take up less space.
[Figure: Size of Customizable ALU versus BusWidth (8, 16, 32) for four configurations: Basic, No Multiplier/Division, No Shifting, Everything]
[Figure: Tcritical (ns) versus BusWidth (8, 16, 32) for four configurations: Basic, No Multiplier/Division, No Shift, Everything]
[Figure: Size of the Customizable RegisterFile versus Number of Registers (8, 16, 32) for registers 8, 16, and 32 bits wide]
10 Conclusions
The project included three main objectives as indicated in Section 8 and
repeated below. Each of the objectives and their state will be individually discussed
below.
Design a set of hardware components to be used as a basis of a
user-defined processor
Create a set of easy to use, portable, and flexible software that allows a user
to create a processor without knowing a Hardware Description Language
Analyze the performance and benefits of using our approach
The first objective was successfully met. Throughout the design of the hardware
description language for the processor it became clear that certain coding practices
made it very easy to create customizable and parameterizable hardware components.
These practices included the use of Generic Maps and multiplexor coding standards.
When the HDL was used in conjunction with the script processor to create scripts to
make customizable processors, the true flexibility and ease of the coding standard was
discovered. Thus, not only was the objective met, but a standard for creating new
hardware modules that could easily be integrated into future software suites was also
found.
The second objective, which is the most important objective, was also met
successfully. The software suite allows a user to create a hardware description of a
processor and a custom assembler for that processor in a matter of minutes. The GUI,
with its help features, allows the user to start using the program without a steep learning
curve since there are no command line options. The HGP does not need any user
interaction and thus is very easy to use. Finally the assembler produces machine
readable code for correct source code, and outputs meaningful errors if there are
source code errors.
In addition to its ease of use, the software suite created for the project has many
other desirable characteristics including portability, loose coupling, flexibility, and
modularity. The whole design suite was designed in Java and Java Swing and
therefore can be used on most computer platforms including Windows, UNIX, Linux,
and Apple systems. The portability of our software suite does not limit its use, which is
an important ease of use factor.
The XML files used between different components in the software suite allow the
software to be loosely coupled. By using the XML as a defined interface between the
components, each component could be designed independently of all other
components. For example, if another implementation of the HGP was produced, then
the old HGP could be removed and the new one used without affecting the GUI or the
assembler since the XML allows for a clean division between the components.
Finally, the software suite is both flexible and modular. Due to the inherent
nature of XML to be extensible (as the name implies) and the design of each software
component, new features or instructions can be added to the software suite with very
little change. As an example, take the scenario where a new instruction needs to be
included in the software suite, then only the following actions need to be taken:
1. XML input into GUI must be changed to include the instruction, instruction
format, and tool tips. (GUI code is unchanged)
2. New Java classes for the instruction and any new resources it uses have to be
created in the HGP, but because of the hierarchy in the Instruction and
Resource classes, each one of these classes will be quite small.
Notice that the assembler is completely unchanged. Of course the addition of a
new instruction will need more work in terms of creating a customizable and
parameterizable hardware description in all of the components used for the new
instruction, but the changes in the software suite are minimal, which keeps the
software suite upgradeable.
Thus the creation of the software suite not only meets our objectives, but
provides a framework for an extension of its use in the future.
The third and final objective was not completely met. As indicated in previous
sections, due to resource limitations, the compiled processor could not be analyzed in
full. However, components of the custom processor were analyzed and they do point to
the benefits to having a custom processor suited to a particular task. In particular, the
processor will be able to operate faster, and take up less area, allowing for other logic
to be placed on the same programmable chip.
Thus, the motivation for the project, which was to create a tool that could easily
and quickly develop processors suited to a task so that it would perform those
operations most efficiently, has been met. The tools created and described above are
easy to use, need no hardware knowledge, and create processors that are faster and
more space efficient compared to a processor that includes the full functionality.
Appendix 1: Timeline from Technical Proposal
Appendix 2: Timeline from Interim Reports
Appendix 3: Final Timeline
Appendix 4: Acronyms
Appendix 5: Instruction Set
Appendix 6: Test Cases
A6.2 ALU Simulation
Figure A.6.2 outlines a test simulation where both positive and negative
numbers are added and subtracted. Results are correct for all four cases.
Figure A.6.3 outlines a test scenario where all the logical operations are
applied to the same inputs. All results are correct.
Figure A.6.4 outlines a test scenario where 13 is divided by 5, and then the
remainder and quotient are subsequently extracted with the “Move from Lo” and “Move
from Hi” instructions respectively. Then, 2147483635 is multiplied by 16 and the result
is again extracted with “Move from Lo” and “Move from Hi” instructions.
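The values this test should produce can be worked out with a short sketch. It assumes a 32-bit bus and that, for multiplication, Hi holds the upper 32 bits of the 64-bit product and Lo the lower 32 bits, as in MIPS; the class and method names are hypothetical, and the sketch makes no claim about which register the simulated hardware uses for the quotient or the remainder.

```java
/** Hypothetical sketch of the expected divide and multiply results. */
class HiLoSketch {

    static int quotient(int a, int b) {
        return a / b;
    }

    static int remainder(int a, int b) {
        return a % b;
    }

    /** Upper 32 bits of the 64-bit product. */
    static int productHi(int a, int b) {
        return (int) (((long) a * (long) b) >>> 32);
    }

    /** Lower 32 bits of the 64-bit product. */
    static int productLo(int a, int b) {
        return (int) ((long) a * (long) b);
    }

    public static void main(String[] args) {
        // 13 / 5 gives quotient 2 and remainder 3.
        System.out.println(quotient(13, 5) + " remainder " + remainder(13, 5));
        // 2147483635 is 0x7FFFFFF3, so multiplying by 16 gives 0x7FFFFFF30.
        System.out.printf("%08X %08X%n",
                productHi(2147483635, 16), productLo(2147483635, 16));
    }
}
```

The product does not fit in 32 bits, which is why the test needs both registers: the upper word is 00000007 and the lower word is FFFFFF30.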
Figure A.6.5 outlines a test scenario where the numbers 5,6,7 are compared with
the number 6 with the instructions “set on less than” through “set on not equal.” All test
cases provide the correct output.
Figure A.6.6 outlines a test scenario where the hex number 8FFFFFF8 is shifted
and rotated in all directions by varying amounts. All test cases provide the correct
output.
Figure A.6.5: ALU Comparison Simulation
Figure A.6.6: ALU Shifting Simulation
A6.3 PS/2 Mouse Port Simulation
Figure A.6.7 shows the reaction of the mouse interface to a stream of bytes from
the mouse. The bytes are read in serially and then output to the registers byte0, byte1,
and byte2. The interface also shows how byte0 changes when it is read from by the
processor.
Figure A.6.8 shows two cases of the processor accessing a generic
PS/2 port. First, the processor writes the value 6C to the data port. The processor then
reads in a value, 96, from the data port.
Figure A.6.9: Memory Mapped Bus Simulation
Appendix 7: Sample Source Code
Due to the large amount of code in the project, only a selection of code from
each part of the project is shown here.
menuBar.add(menu);
frame.setJMenuBar(menuBar);
A7.2 XML Input/Output: Java Code
Sample section from the Java object that writes the Assembler XML file:
/* Write all of the instructions */
Sample section from the Java object that reads the User Interface XML file:
// Traverse over the node list containing Instructions
A7.3 Instruction Set Based Component Selection: Java Code
public abstract class Instruction {
public Instruction() {
}
//--------------------------------------------------
public PCInSignal() {
super();
numberOfinner++;
}
A7.4 Script Processing: Java Code
/* This method is called to initiate script processing. Once it is completed a
script has been fully processed. */
void Run() {
String a;
int result;
while ((a=input.readLine())!=null) {
/* Process input line and determine if it is within valid block. If not then
omit it. */
result = ProcessString(a);
switch(result) {
case ENTERED_SEARCH_BLOCK: within_search_block = true;
break;
case LEFT_SEARCH_BLOCK: within_search_block = false;
return;
case RETURN_OK: break;
default: System.out.println("Error processing code");
System.exit(1); /* On error */
break;
}
}
if (close_destination == true) {
try {
output.close();
}
catch (Exception e) {}
}
input.DoneParsing();
}
if (s.length()>0) {
switch (ProcessCommand(token)) {
case READ_COMMAND: break;
case RETURN_OK: if (SkipToNextEnd==true) break;
if ((within_search_block ||(!specific_search)))
{
if (ExecuteExpressionEvaluation)
WriteToFile(Expression.EvaluateParameters(s,defines));
else
WriteToFile(s);
}
break;
case RETURN_ERROR: System.out.println("Error: Invalid command encountered\n");
System.exit(1);
break;
default: break;
}
}
else {
a = "";
if( (within_search_block || (!specific_search)) )
WriteToFile(a);
}
return RETURN_OK;
}
A7.5 Assembler: Java Code
// Now that all files that we need are opened
// Start parsing the input and generating the output
//
try
{
StringTokenizer st;
String AssemblerLabel;
int nTokenCount;
int nLineNumber = 0;
String sInstruction, sParam1, sParam2, sParam3;
StringBuffer sbOutput;
Instruction inst;
TypeDef tpdef;
Param param1, param2, param3;
try
{
//
// Remove comments indicated by the # character
//
st = new StringTokenizer(line, "#");
nTokenCount = st.countTokens();
if ((st.hasMoreTokens()) && (line.indexOf("#") != 0))
{
line = st.nextToken();
}
else
{
line = "";
}
//
// Check for labels
//
st = new StringTokenizer(line, ":");
nTokenCount = st.countTokens();
if ((nTokenCount == 1) && (line.indexOf(":") != -1))
{
// label was found but it is either blank or not associated
//with a line
throw new AssemblerException ("label is either blank or not associated with an instruction", nLineNumber);
}
else if (nTokenCount == 2)
{
// label was found
AssemblerLabel = st.nextToken();
line = st.nextToken();
}
else if (nTokenCount > 2)
{
// found more than one colon on a line
// ACTION: report an error and skip this line
throw new AssemblerException ("statement cannot have more than one ':' per line", nLineNumber);
}
A7.6 Datapath: VHDL Code
--Instruction Register-------------------------------------
--Register File--------------------------------------------
rf0: registerfile GENERIC MAP (BUSWIDTH => BUSWIDTH, NUMREG => NUMREG,
LOG2NUMREG => LOG2NUMREG)
PORT MAP (rfwrite,
clk,
irout(BUSWIDTH-OPCODEFIELDSIZE-1 DOWNTO
BUSWIDTH-OPCODEFIELDSIZE-REGFIELDSIZE),
irout(BUSWIDTH-OPCODEFIELDSIZE-REGFIELDSIZE-1 DOWNTO
BUSWIDTH-OPCODEFIELDSIZE-2*REGFIELDSIZE),
rfwritereg,
rfwritedata,
rfreaddata1,
Rfreaddata2);
A7.7 Control: VHDL Code
BEGIN
-- Obtain instruction opcode
opcode <= IR_Data(BUSSIZE-1 downto BUSSIZE-OPCODESIZE);
A7.8 Cache Controller: VHDL Code
BEGIN
-- Setup registers to hold tags for each bin
REGA : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_a,
clock, reset, en_a, a);
REGB : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_b,
clock, reset, en_b, b);
REGC : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_c,
clock, reset, en_c, c);
REGD : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_d,
clock, reset, en_d, d);
REGE : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_e,
clock, reset, en_e, e);
REGF : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_f,
clock, reset, en_f, f);
REGG : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_g,
clock, reset, en_g, g);
REGH : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_h,
clock, reset, en_h, h);
CONTROL : regcontrol
GENERIC map( WIDTH => BINTAGWIDTH,
ADDRESSWIDTH => ADDRESSWIDTH )
PORT map( Address, a,b,c,d,e,f,g,h,enable,
en_a,en_b,en_c,en_d,en_e,en_f,en_g,en_h,
cl_a,cl_b,cl_c,cl_d,cl_e,cl_f,cl_g,cl_h,
clock, Replace, CellSelect, newaddr);
-- Pass the new address to memory, now that the cache bin holding the
-- data is known.
NewAddress <= newaddr;
END Behaviour;
A7.9 I/O: VHDL Code
The main process that reads from and writes to the PS/2 port
PROCESS(clk,reset)
BEGIN
IF reset='1' THEN
state<=waiting;
ELSIF clk'EVENT AND clk='0' THEN
CASE state IS
WHEN waiting =>
temp_data <= d_in;
count <= "0000";
IF (rw='0') THEN
state <= writeData;
ELSE
state <= readData;
END IF;
WHEN readData => -- read in the data bit by bit and increment counter
ShiftRead: FOR i IN 7 DOWNTO 1
LOOP
temp_data(i) <= temp_data(i-1);
END LOOP;
temp_data(0)<=data;
count <= count + 1;
IF count="0111" THEN
state <= readParity;
ELSE
state <= readData;
END IF;
WHEN readParity =>
state <= readStop;
WHEN readStop =>
state <= waiting;
WHEN writeData => -- shift the data left one bit per clock; the shifted-out bits are discarded
ShiftWrite: FOR i IN 7 DOWNTO 1
LOOP
temp_data(i) <= temp_data(i-1);
END LOOP;
count <= count + 1;
IF count="0111" THEN
state <= writeParity;
ELSE
state <= writeData;
END IF;
WHEN writeParity =>
state <= writeStop;
WHEN writeStop =>
state <= writeAck;
WHEN writeAck =>
state <= done;
WHEN done =>
state <= done;
END CASE;
ELSE
state <= state;
END IF;
END PROCESS;
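The read path of this state machine (waiting, readData, readParity, readStop) can be mirrored in a small software model to check how many clock edges a byte takes and in what order the bits end up in temp_data. The Java sketch below is an illustration only, not part of the generated hardware: the class Ps2ReadModel and its method step are hypothetical names, the write path and the rw test are left out, and the parity and stop bits are skipped without being checked, as in the listing.

```java
class Ps2ReadModel {
    enum State { WAITING, READ_DATA, READ_PARITY, READ_STOP }

    State state = State.WAITING;
    int tempData = 0; // models the 8-bit temp_data register
    int count = 0;    // models the 4-bit count signal

    /**
     * Advances the model by one falling clock edge.
     * dIn is the parallel value loaded while waiting; dataBit is the serial data line.
     */
    void step(int dIn, int dataBit) {
        switch (state) {
            case WAITING:
                tempData = dIn & 0xFF;
                count = 0;
                state = State.READ_DATA; // only the read branch (rw /= '0') is modelled
                break;
            case READ_DATA:
                // A VHDL signal keeps its old value until the process suspends,
                // so the comparison uses count before it is incremented.
                boolean lastBit = (count == 7);
                tempData = ((tempData << 1) | (dataBit & 1)) & 0xFF;
                count = (count + 1) & 0xF;
                state = lastBit ? State.READ_PARITY : State.READ_DATA;
                break;
            case READ_PARITY:
                state = State.READ_STOP; // parity bit is skipped, not verified
                break;
            case READ_STOP:
                state = State.WAITING;
                break;
        }
    }

    public static void main(String[] args) {
        Ps2ReadModel m = new Ps2ReadModel();
        int[] bits = { 1, 0, 1, 1, 0, 0, 1, 0 };
        m.step(0, 0);                  // leave the waiting state
        for (int b : bits) {
            m.step(0, b);              // shift in eight data bits
        }
        System.out.println("tempData=0x" + Integer.toHexString(m.tempData)
                + " state=" + m.state);
    }
}
```

In this model the first bit received ends up in the most significant position of temp_data after eight shifts, and two further clock edges return the machine to the waiting state.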
Bibliography
[1] Xilinx Inc., “The Future of FPGAs”, [Online Document], 1999 Apr 1, Available HTTP:
http://www.xilinx.com/prs_rls/5yrwhite.htm
[2] B. Kastrup, A. Bink and J. Hoogerbrugge, “ConCISe: A Compiler-Driven CPLD-Based Instruction
Set Accelerator,” in Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable
Custom Computing Machines, K. Pocek and J. Arnold. Los Alamitos, California: IEEE Computer
Society Press, 1999, pp. 92-102.
[3] A. Chien, “Safe and Protected Execution in the Morph/AMRM Reconfigurable Processor,” in
Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing
Machines, K. Pocek and J. Arnold. Los Alamitos, California: IEEE Computer Society Press, 1999,
pp. 209-221.
[4] J. Hauser and J. Wawrzynek, “Garp: A MIPS Processor with a Reconfigurable Coprocessor,” in
Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, K. Pocek and J.
Arnold. Los Alamitos, California: IEEE Computer Society Press, 1997, pp. 12-21.
[5] D. Robinson and P. Lysaght, “Modelling and Synthesis of Configuration Controllers for
Dynamically Reconfigurable Logic Systems Using the DCS CAD Framework,” in
Field-Programmable Logic and Applications, P. Lysaght, J. Irvine, and R. Hartenstein. Berlin,
Germany: Springer, 1999, pp. 41-50.
[6] G. McGregor and P. Lysaght, “Self Controlling Dynamic Reconfiguration: A Case Study,” in
Field-Programmable Logic and Applications, P. Lysaght, J. Irvine, and R. Hartenstein. Berlin,
Germany: Springer, 1999, pp. 144-154.
[7] R. Meier, “Rapid Prototyping of a RISC Architecture for Implementation in FPGAs,” in Proceedings
of the IEEE Symposium on FPGAs for Custom Computing Machines, P. Athanas and K. Pocek. Los
Alamitos, California: IEEE Computer Society Press, 1995, pp. 190-196.
[8] V. C. Hamacher, Z. G. Vranesic and S. G. Zaky, Computer Organization, Fourth Edition. New York,
New York: McGraw-Hill, 1996.
[9] T. Engdahl, “PC Mouse Info”, [Online Document], 1999 Aug 13, Available HTTP:
http://www.hut.fi/Misc/Electronics/docs/pc/mouse.html
[10] A. Chapweske, “The PS/2 Mouse/Keyboard Protocol”, [Online Document], 2000 Oct 13, Available
HTTP: http://panda.cs.ndsu.nodak.edu/~achapwes/PICmicro/PS2/ps2.htm
[11] A. Chapweske, “The AT Keyboard”, [Online Document], 2000 Nov 10, Available HTTP:
http://panda.cs.ndsu.nodak.edu/~achapwes/PCImicro/keyboard/atkeyboard.htm