Processor Design Suite

Department of Electrical and Computer Engineering
University of Toronto
Final Report
Title: Parameterized Processor Design Suite
Project ID #: 0192000
Prepared by: Navid Azizi nazizi@ecf.utoronto.ca
Borys Bradel bradel@ecf.utoronto.ca
Tomasz Czajkowski czajkow@ecf.utoronto.ca
Mike Krejcik krejcim@ecf.utoronto.ca
Supervisor: Stephen Brown
Section: 5
Section Coordinator: D. Beresford
Date: April 11, 2001
2 Executive Summary
This document describes the design of a Parameterized Processor Design Suite.
Processors are traditionally implemented on Application Specific Integrated Circuit
(ASIC) chips. Designing a processor on an ASIC chip is usually very costly and
time-consuming. As a result, programmable logic chips have been used as an alternative
processor platform. These chips allow users to change their designs without incurring
the manufacturing costs and delays involved in ASIC design. Programmable logic
chips, however, require the user to know a Hardware Description Language (HDL) to be
able to program them. Our project removes this need while retaining the benefits of
using programmable logic chips. Our software suite allows a user to easily generate
custom processors based on their needs without knowing an HDL.
The software suite is a collection of three programs: a Graphical User Interface
(GUI), a Hardware Generation Program (HGP), and an assembler. The GUI allows a
user to specify the processor parameters and passes this information on to the HGP.
The HGP uses this information to identify all of the hardware resources that the
processor will need, all of the signals that will be transmitted between these resources,
and the set of instructions that the processor will implement. Once the identification is
complete, the HGP generates a set of HDL files that describe the processor. These HDL
files are based on a set of hardware templates, which are a major portion of this design
project and dictate the underlying structure of the processor. The HGP also passes its
information on to the assembler, which uses it to create machine-readable code from a
user’s source code.
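To make the GUI-to-HGP flow more concrete, the following Java sketch shows, under simplifying assumptions, how a generator of this kind might select hardware resources from a chosen instruction set and then fill an HDL template with a parameter such as the bus width. The class name HgpSketch, the instruction-to-resource table, and the ${NAME} placeholder syntax are all hypothetical; they are not the suite's actual classes or template format, and signal selection is omitted.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical sketch of an HGP-like step: the instruction names, resource
// names, and ${NAME} template syntax are illustrative assumptions only.
public class HgpSketch {

    // Assumed table: which hardware resources each instruction requires.
    static final Map<String, Set<String>> RESOURCES_FOR = Map.of(
            "add", Set.of("RegisterFile", "ALU"),
            "sll", Set.of("RegisterFile", "ALU", "Shifter"),
            "lw", Set.of("RegisterFile", "ALU", "Memory"),
            "sw", Set.of("RegisterFile", "ALU", "Memory"),
            "beq", Set.of("RegisterFile", "ALU", "BranchUnit"));

    // The processor needs the union of the resources of every chosen instruction.
    public static SortedSet<String> selectResources(List<String> instructions) {
        SortedSet<String> needed = new TreeSet<>();
        for (String insn : instructions) {
            Set<String> required = RESOURCES_FOR.get(insn);
            if (required == null) {
                throw new IllegalArgumentException("Unknown instruction: " + insn);
            }
            needed.addAll(required);
        }
        return needed;
    }

    // Replace every ${NAME} placeholder in an HDL template with its parameter value.
    public static String fillTemplate(String template, Map<String, String> params) {
        String out = template;
        for (Map.Entry<String, String> e : params.entrySet()) {
            out = out.replace("${" + e.getKey() + "}", e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // Resources for a processor that only supports add, lw, and sw.
        System.out.println(selectResources(List.of("add", "lw", "sw")));
        // prints [ALU, Memory, RegisterFile]

        String aluTemplate =
                "entity alu is\n"
                + "  port (a, b : in  std_logic_vector(${BUS_WIDTH}-1 downto 0);\n"
                + "        y    : out std_logic_vector(${BUS_WIDTH}-1 downto 0));\n"
                + "end alu;";
        System.out.println(fillTemplate(aluTemplate, Map.of("BUS_WIDTH", "16")));
    }
}
```

Taking the union of per-instruction requirements means an unused unit (the Shifter above) is never generated, which is the intuition behind tailoring a processor to the instructions a user actually selects.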
Several processors generated with our software suite have been analyzed. The
analysis shows that larger parameter values increase the size of the processor and
reduce its speed. An increase in bus width has an especially detrimental effect on the
ALU. It is therefore beneficial to use special-purpose processors that exactly meet the
user’s needs.
The objectives of this project are to generate a set of customizable processor
components to be used as a template for any user-specified processor, to create an
easy-to-use, flexible, and portable software suite, and to analyze the performance
trade-offs of different parameterizations of a processor. We have met all three of these
objectives.
3 Team Members’ Contributions
Tables 3.1 through 3.4 outline the contributions made by the authors to the
design project. Table 3.5 outlines the contributions each author made to the writing of
the final report.
Task | Individual Responsible for the Task
Design Project Research | Navid Azizi (this responsibility was shared with the other members of the team)
Development of Parameterizable Processor Templates in VHDL | Navid Azizi
Testing and Simulation of Parameterizable Processor Templates | Navid Azizi
Development of Instruction, Resource, and Signal classes for Instruction Set Based Component Selection | Navid Azizi
Testing of Instruction, Resource, and Signal classes | Navid Azizi
Testing and Numerical Analysis of Specified Processor and Components | Navid Azizi
Table 3.1: Contributions Made by Navid Azizi
Task | Individual Responsible for the Task
Design Project Research | Borys Bradel
Development of I/O Interface to Parameterizable Processor | Borys Bradel
Testing and Simulation of I/O Interface to Parameterizable Processor | Borys Bradel
Development of Java Modules to Read and Write defined XML Interfaces | Borys Bradel (this responsibility was shared with Michael Krejcik)
Testing and Simulation of Java Modules to Read and Write defined XML Interfaces | Borys Bradel
Integration Testing of Java Design Suite | Borys Bradel
Table 3.2: Contributions Made by Borys Bradel
Task | Individual Responsible for the Task
Design Project Research | Tomasz Czajkowski
Development of Memory/Cache and Controller | Tomasz Czajkowski
Testing and Simulation of Memory/Cache and Controller | Tomasz Czajkowski
Development of Script Processing Engine | Tomasz Czajkowski
Testing and Simulation of Script Processing Engine | Tomasz Czajkowski
Development of Processor Control Unit | Tomasz Czajkowski
Testing and Simulation of Processor Control Unit | Tomasz Czajkowski
Table 3.3: Contributions Made by Tomasz Czajkowski
Task | Individual Responsible for the Task
Design Project Research | Michael Krejcik
Development of XML Interfaces | Michael Krejcik
Development of Java Modules to Read and Write defined XML Interfaces | Michael Krejcik (this responsibility was shared with Borys Bradel)
Development of Graphical User Interface | Michael Krejcik
Testing of Graphical User Interface | Michael Krejcik
Development of Assembler | Michael Krejcik
Testing of Assembler | Michael Krejcik
Table 3.4: Contributions Made by Michael Krejcik
Section | Individual Responsible for the Section
1 Cover Page | Navid Azizi
2 Executive Summary | Borys Bradel
3 Team Members’ Contributions | Navid Azizi
4 Old Milestones | Borys Bradel
5 Revised Milestones | Borys Bradel
6 Table of Contents | Navid Azizi
7 Acknowledgments | Tomasz Czajkowski
8 Introduction | Borys Bradel
9 Design
• Sections 9.1, 9.3, 9.7.1, 9.7.3, 9.10 | Navid Azizi
• Sections 9.2, 9.8, 9.9 | Michael Krejcik
• Sections 9.4, 9.6, 9.7.4 | Tomasz Czajkowski
• Sections 9.5, 9.7.2, 9.7.5 | Borys Bradel
10 Conclusions | Navid Azizi
Table 3.5: Contributions made by the Team Members for the Final Report
4 Old Milestones
There are four main milestones that relate to the actual design of our project:
• Our processor design completed by the first week of March
• The processor parameterizer completed by the second week of February
• The assembler completed by the second week of March
• Our testing and analysis completed by the third week of March
The complete list of milestones as stated in our proposal is given in Section 5.2,
along with the status of each milestone at the end of our project. Appendix 1 contains
the timeline that corresponds to the initial set of milestones. We found that, although
the initial milestones did not change significantly when viewed as a whole, some parts
of our project took more time than expected and others took less. Notably, parts with
very different completion times were sometimes grouped under a single milestone. A
good example is the processor parameterizer, which was divided into two programs, the
GUI and the HGP, yet had a single milestone.
The original division of responsibilities has the same shortcomings as our initial
timeline. The responsibilities were not detailed enough and had to be modified. The
following is the original division of responsibilities taken verbatim from our proposal:
All four people will work equally hard on the submission unit, and the final testing
of the processor. The processor parametrizer and the assembler will be developed
primarily by Borys and Michael. The initial planning and first attempt at designing the
processor will be performed by all four people and divided based on personal
preferences. The latter parts of the processor design will be dealt with primarily by
Navid and Tom. The work will be divided evenly between everybody, and every person
will have to know and understand what the other people are doing so as to gain the
maximum benefit from this project.
5 Revised Milestones
The following sections describe the various factors that affected our timeline and
compare our accomplished milestones with our original ones. The different timelines
that we had throughout the project are given in Appendices 1, 2, and 3. Appendix 1
contains our original timeline, Appendix 2 contains the timeline that we had at the time
of our interim reports, and Appendix 3 contains our final timeline.
5.1 Reasons for Modification

The main factor that affected our schedule was the realization that certain parts of
our project are tightly coupled to each other while the rest are almost completely
independent. Two other factors were a heavier-than-expected workload in the second
term and the limitations of the hardware available to us.
We discovered in the latter part of the first term that the Hardware Generation
Program and the controller for the processor were more complicated than we first
anticipated and depended on the instruction set and design of the processor. For
this reason, these two tasks were moved to a later date in the schedule. To make room
for them, every other task that could be moved to an earlier date was moved. As a
result, Mike concentrated exclusively on the GUI and assembler, while the other three
team members concentrated on the hardware and the Hardware Generation Program.
This division of responsibilities allowed us to work in parallel and to benefit from having
several people working together on two tightly coupled problems.
Unfortunately, we had a heavier-than-expected workload in the second term. This
caused us to complete some of our milestones, namely the controller of the processor
and the Hardware Generation Program, later than we wanted. Hardware limitations
also made it impossible to place our processor on an FPGA. We could not fully compile
the processor at home because our copies of MAX+PLUS II do not have the licenses
required to target the appropriate hardware. The undergraduate SPARC machines do
not have enough RAM to compile the design, and when we tried to compile the
processor on a friend's EECG account, we ran out of hard drive space on the partition.
We therefore concentrated on compiling individual sections of the processor and
examining their performance characteristics. Our final timeline is presented in
Appendix 3.
5.2 Milestone Status

The following is a list of our original milestones and their status at the end of our
project:
1. Technical Proposal [3rd week of Oct.]
Achieved on time
2. Completion of all processor units [2nd week of Dec.]
Achieved on time
3. Interim Report [1st week of Jan.]
Achieved on time
4. Debugging of all processor units [3rd week of Jan.]
Only the controller was not completed on time
The controller was completed by the end of the 3rd week of Feb.
5. Completion of the first attempt at the processor parameterizer [4th week of Jan.]
The final version of the GUI was completed by the 3rd week of Jan.
The final version of the HGP was completed by the 3rd week of March
6. Completion of the first attempt at an assembler [3rd week of Feb.]
Reached by the 1st week of Jan.
7. Completion of a debugged version of the processor [1st week of March]
Reached by the 2nd week of March
8. Completion of the assembler [2nd week of March]
The assembler was finished by the 4th week of Feb.
9. Completion of Testing [3rd week of March]
Testing could not be as complete as we wanted it to be due to hardware limitations
Testing was completed in the last week of March
10. Final Report [1st week of April]
Achieved on time
6 Table of Contents
2 Executive Summary
3 Team Members’ Contributions
4 Old Milestones
5 Revised Milestones
5.1 Reasons for Modification
5.2 Milestone Status
6 Table of Contents
7 Acknowledgments
8 Introduction
8.1 Motivation
8.2 Background
8.3 Objectives
8.4 Design and Measurement Methodology
8.4.1 Design Methodology
8.4.2 Measurement Methodology
8.4.3 Report Outline
9 Design
9.1 Overview
9.1.1 Graphical User Interface
9.1.2 Hardware Generation Program
9.1.2.1 Parameterized HDL Code
9.1.3 Assembler
9.2 Graphical User Interface
9.2.1 Design
9.2.1.1 Ease of Use
9.2.1.2 Provide Information About the Program
9.2.1.3 Display Parameters the User can Choose
9.2.1.4 Limit User Input to only Acceptable Values
9.2.1.5 Cross Platform Portability
9.2.2 Implementation
9.2.3 Evolution of Graphical User Interface
9.2.3.1 Advantages of the Initial GUI Design
9.2.3.2 Disadvantages of the Initial GUI Design
9.2.3.3 The New GUI Design
9.2.4 Testing
9.2.5 Current Status
9.3 Processor Organization
9.3.1 Registerfile
9.3.1.1 Design
9.3.1.1.1 Input Circuitry
9.3.1.1.2 Output Circuitry
9.3.1.2 Testing
9.3.2 Arithmetic and Logic Unit (ALU)
9.3.2.1 Design
9.3.2.1.1 One-Bit ALU
9.3.2.1.2 Bit-Wise Operations
9.3.2.1.3 Complete ALU
9.3.2.2 Testing
9.3.3 Processor Datapath
9.3.3.1 Design
9.3.3.1.1 IR and Register Selection
9.3.3.1.2 Input A, ALU
9.3.3.1.3 Input B, ALU
9.3.3.1.4 Shift Amount Input, ALU
9.3.3.1.5 Memory Inputs
9.3.3.1.6 Program Counter (PC)
9.3.3.1.7 Register Data Input
9.3.3.2 Processor Parameters
9.3.3.3 Testing
9.4 Processor Control
9.4.1 Flow Control
9.4.2 Design Approach
9.4.2.1 Design Principles
9.4.2.2 Testing
9.5 Processor Input/Output
9.5.1 I/O Interface
9.5.2 I/O Controller
9.5.3 I/O Devices
9.5.3.1 Mouse
9.5.3.2 Generic PS/2 Port and the Keyboard
9.5.3.3 Monitor
9.5.4 Mouse Interface
9.5.5 Generic PS/2 Interface
9.5.6 VGA Interface
9.5.7 Testing
9.6 Memory/Cache Design
9.6.1 Overview
9.6.2 Mapping Function and Replacement Algorithm
9.6.3 Design Schematic
9.7 Hardware Generation Program
9.7.1 Overview
9.7.2 Read XML
9.7.3 Instruction Set Based Component Selection
9.7.3.1 ProcParameter Class
9.7.3.1.1 Members
9.7.3.1.2 Methods
9.7.3.2 Resource Class Hierarchy
9.7.3.3 Signal Class Hierarchy
9.7.3.4 Instruction Class Hierarchy
9.7.3.5 Processor Specification Determination
9.7.4 Write VHDL
9.7.4.1 Script Creation
9.7.4.2 Script Processing
9.7.4.2.1 Script Processing Ideology
9.7.4.2.2 Script Commands
9.7.4.2.3 Expression Processing
9.7.4.2.4 Testing
9.7.5 Write XML
9.8 Assembler
9.8.1 Design
9.8.1.1 Interface Design
9.8.1.2 Error Handling
9.8.2 Implementation
9.8.3 Testing
9.9 Interfaces
9.9.1 User Information XML Language
9.9.1.1 Design
9.9.1.2 Specification
9.9.2 Assembler XML Language
9.9.2.1 Design
9.9.2.2 Specification
9.10 Results
10 Conclusions
10.1 Future Work
Appendix 1: Timeline from Technical Proposal
Appendix 2: Timeline from Interim Reports
Appendix 3: Final Timeline
Appendix 4: Acronyms
Appendix 5: Instruction Set
Appendix 6: Test Cases
A6.1 Registerfile Simulation
A6.2 ALU Simulation
A6.3 PS/2 Mouse Port Simulation
A6.4 Generic PS/2 Port Simulation
A6.5 Memory Mapped Bus Simulation
Appendix 7: Sample Source Code
A7.1 GUI: Java Code
A7.2 XML Input/Output: Java Code
A7.3 Instruction Set Based Component Selection: Java Code
A7.4 Script Processing: Java Code
A7.5 Assembler: Java Code
A7.6 Datapath: VHDL Code
A7.7 Control: VHDL Code
A7.8 Cache Controller: VHDL Code
A7.9 I/O: VHDL Code
Bibliography
7 Acknowledgments
The authors would like to thank their fellow students for their technical advice
during the course of this project. In particular, the authors would like to thank
Deshanand Singh for his contribution to the project. Stephen Brown, the project’s
supervisor, is also acknowledged for his inspiration and motivation throughout the
project.
Furthermore, the authors would like to acknowledge that many of the concepts
required to complete the project were acquired in the various courses offered by the
University of Toronto's Edward S. Rogers Sr. Department of Electrical and Computer
Engineering. Moreover, the publications cited throughout this report provided the
authors with new ideas and different perspectives, and assisted the authors
immeasurably. References are cited wherever applicable in the report, and we
sincerely apologize for any omissions that we have made in the references.
8 Introduction
Our project involves the creation of a software suite with an easy-to-use interface
that gives users the opportunity to implement custom processors based on their
needs. The software suite includes a Graphical User Interface (GUI) that allows the
user to specify the parameters for a processor, a Hardware Generation Program (HGP)
that produces the Hardware Description Language (HDL) code for the user-specified
processor, and an assembler that allows the user to create programs for this processor.
The project raises questions concerning the extent to which a processor can be
parameterized. Our processor design must achieve a balance between an overly generic
model, which would be hard to implement, and a more specialized model, which would
limit the choices a user can make.
8.1 Motivation
Processors are traditionally implemented on Application Specific Integrated
Circuit (ASIC) chips. Designing a processor on an ASIC chip is usually very costly and
time-consuming due to the long and expensive manufacturing process that is involved
in the physical creation of the chip. As a result, programmable logic chips have been
used as an alternative processor platform. The two main types of programmable logic
chips are Field Programmable Gate Array (FPGA) and Complex Programmable Logic
Device (CPLD) chips. These chips allow users to change their designs without
incurring the manufacturing costs and delays involved in ASIC design. Programmable
logic chips, however, require the user to know an HDL to be able to program them. Our
project removes this need while retaining the benefits of using programmable logic
chips.
8.2 Background
To lay a foundation for our project, this section gives a brief overview of the following
topics: FPGA/CPLD chips, the differences between Reduced Instruction Set Computers
(RISC) and Complex Instruction Set Computers (CISC), and the MIPS architecture.
These topics provide background for some of the design decisions made in the project.
An FPGA/CPLD chip is an integrated circuit that is characterized by an array of
reprogrammable logic blocks and a flexible, configurable interconnect structure. For
example, an Altera FLEX EPF10K40 has 2304 logic blocks [1]. Logic blocks can be
viewed as building blocks that, when combined, can implement many different
functions. The blocks are connected to each other through a matrix of wires and
switches. This matrix, together with the logic blocks, allows an FPGA/CPLD chip to be
customized to meet each user’s needs.
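The report does not describe the inside of a logic block. As an illustrative assumption, the sketch below models a block as a 4-input lookup table (LUT), a structure used in many FPGA families: 16 configuration bits hold a truth table, and the four inputs select one of those bits. Reprogramming the chip then amounts to loading different configuration bits, which is why the same silicon can implement many different functions. The class LutSketch is hypothetical and ignores the registers and routing that a real logic block typically also contains.

```java
import java.util.function.IntPredicate;

// Hypothetical model of a logic block as a 4-input lookup table (LUT).
// Real logic blocks typically also contain registers and carry logic, omitted here.
public class LutSketch {
    private final int config; // 16 configuration bits: one output bit per input combination

    public LutSketch(int config) {
        this.config = config & 0xFFFF;
    }

    // "Program" the LUT: evaluate f for all 16 input combinations and store the results.
    public static LutSketch program(IntPredicate f) {
        int bits = 0;
        for (int index = 0; index < 16; index++) {
            if (f.test(index)) {
                bits |= 1 << index;
            }
        }
        return new LutSketch(bits);
    }

    // The four input bits (each 0 or 1) form an index that selects one configuration bit.
    public boolean eval(int a, int b, int c, int d) {
        int index = (a & 1) | ((b & 1) << 1) | ((c & 1) << 2) | ((d & 1) << 3);
        return ((config >> index) & 1) == 1;
    }

    public int config() {
        return config;
    }

    public static void main(String[] args) {
        // The same LUT structure configured as two different functions.
        LutSketch and4 = program(index -> index == 0b1111);
        LutSketch parity4 = program(index -> Integer.bitCount(index) % 2 == 1);

        System.out.println(Integer.toHexString(and4.config()));    // 8000
        System.out.println(Integer.toHexString(parity4.config())); // 6996
        System.out.println(and4.eval(1, 1, 1, 1));    // true
        System.out.println(parity4.eval(1, 0, 1, 1)); // true
    }
}
```

Only the 16-bit configuration word differs between the AND and parity functions; the evaluation logic is identical, mirroring how one physical block is reused for different designs.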
There are two main paradigms for processor design: RISC and CISC. CISC chips
are based on the principle of providing users with many complex instructions, with each
complex instruction encapsulating the work of several simpler instructions. Conversely,
RISC chips provide users with only simple instructions. RISC instructions usually have
a constant size and perform one function per clock cycle, and only a limited number of
instructions access memory. This allows for a simpler processor design and makes it
possible to pipeline instructions, which increases the speed of the processor.
The MIPS architecture is a specific implementation of the RISC ideas presented
above. MIPS makes an additional simplification that memory can only be accessed
with load and store instructions, thus allowing the processor control and datapath to be
simplified even further. The processor implemented in this project is based on the
MIPS architecture.
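To make the register-only operands of MIPS concrete, the Java sketch below packs a standard MIPS32 R-type instruction (6-bit opcode, three 5-bit register fields, a 5-bit shift amount, and a 6-bit function code). This is the published MIPS encoding, shown for background only; the processor generated in this project assigns its own instruction encodings, and the class name is hypothetical.

```java
/**
 * Packs a standard MIPS32 R-type instruction:
 * op(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6).
 * Every operand field is a register number: arithmetic never names memory.
 */
final class MipsRType {
    static final int FUNCT_ADD = 0x20; // function code of add in the MIPS32 ISA

    static int encode(int op, int rs, int rt, int rd, int shamt, int funct) {
        return (op & 0x3F) << 26
             | (rs & 0x1F) << 21
             | (rt & 0x1F) << 16
             | (rd & 0x1F) << 11
             | (shamt & 0x1F) << 6
             | (funct & 0x3F);
    }
}
```

Encoding add $8, $9, $10 (rd = 8, rs = 9, rt = 10) with `MipsRType.encode(0, 9, 10, 8, 0, MipsRType.FUNCT_ADD)` yields 0x012A4020.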
8.3 Objectives
There are several objectives that we wanted to achieve. Section 8.4, Design and
Measurement Methodology, explains our approach to attaining these objectives while
Section 9, Design, describes how we actually achieved them. The objectives of our
project are to:
Design a set of hardware components to be used as a basis of a
user-defined processor
Create a set of easy to use, portable, and flexible software that allows a user
to create a processor without knowing a Hardware Description Language
Analyze the performance and benefits of using our approach
8.4 Design and Measurement Methodology

We made several decisions at the beginning of our project that determined
the focus of our work. The design that we selected remained unchanged, while our
initial ideas for measurement had to be modified.
8.4.1 Design Methodology

There are two aspects to our design. The first is a set of software tools that
allows the user to specify, generate, and use a processor. The second is a set of core
processor components that define the underlying structure of the processor.
We decided to make all of our software flexible, user friendly, and portable. We
divided the software into three components: a GUI, an HGP, and an assembler. This
division keeps each piece of software as independent of the others as possible, so that
each piece can be modified without affecting the other parts of the project. Each piece
of software is also designed to take full advantage of object-oriented design to
facilitate expansion. The user interface is also generated on the fly each time it is
executed, based on a file that contains information regarding the choices that the user
can make. To make the software user friendly, we developed an easy to use and
intuitive GUI. All of the software is written in Java, and all of the interfaces between
the different pieces of software use XML. These two factors make the software highly
portable because Java can be
run on many operating systems and XML files are well-defined text files that should be
readable on any operating system.
Processors are usually implemented on ASIC chips. FPGA chips have typically been
used only to add functionality to these processors in the form of coprocessors [2-6]. Our
methodology differs from these approaches: rather than only producing reconfigurable
coprocessors or modules and connecting them to a fixed processor, we make the whole
processor reconfigurable. Our project follows many of the design principles from
“Rapid Prototyping of a RISC Architecture for Implementation in FPGAs” [7], but
allows for much greater flexibility.
8.4.2 Measurement Methodology

Our design allows the user to customize the whole processor on an FPGA. The
customization is achieved through the HGP, which uses parameterizable HDL modules
to create exactly the processor the user specified. This allows for the implementation
of a processor that contains no more functionality than needed. We have taken
measurements of our final product to evaluate the efficiency of this approach.
To determine the efficiency of our model, we analyzed two aspects of the FPGA
implementation of the processor. First, we documented the maximum clock frequency
of a fully featured processor and compared it with that of a processor that lacks the
functionality to perform multiply, divide, and shift operations. Second, we measured the
number of logic blocks used in the FPGA. Both of these measurements were
performed within the Max+plus II hardware compilation program. Initially we planned
to make measurements on the entire circuit. This, however, proved impractical
due to limitations of the hardware that we have access to. These limitations have already
been described in Section 5, Revised Milestones. The measurements that we have
taken are presented in Section 9.10.
8.4.3 Report Outline

The report is divided into three parts. The first part contains introductory and
work management information. The second part describes our design. The final part
contains our conclusions. The sections that precede the introduction of this report give
an overview of the work distribution for this project. The sections contain a description
of the contributions of the different team members, the modifications that were made to
the timeline of this project, and a review of our performance in relation to our initial
milestones. Our design is described in the next section, and the results that we have
obtained are presented at the end of that description. At the end of our report we
analyze our results, discuss the project’s benefits and limitations, and describe
possible extensions and applications of the project.
9 Design
9.1 Overview
Our project consists of three main components: a Graphical User Interface (GUI), a
Hardware Generation Program (HGP), and an assembler.
The design flow for the suite of software tools is shown below in Figure 9.1.1.
Figure 9.1.1: Design flow for design tool
An overview of each of the three components (GUI, HGP, and assembler) is given
below, and each is discussed in greater detail in the following sections.
9.1.1 Graphical User Interface

The GUI allows the user to create a customized processor. The GUI includes a
list of supplied instructions with checkboxes. If the user wants to include an instruction
into the processor, they check the corresponding checkbox. Furthermore, there are
other parameters, such as the bus width and the number of registers, that the user
specifies with sliders. Once the user is satisfied with all of the parameters, they click
the “Create It!” button, and the GUI creates an XML file that is submitted to the
HGP.
The GUI is written in Java using the Java Swing library. The reasoning behind this choice is
that Java can be run on a multitude of hardware and operating systems, such as UNIX,
Linux, Microsoft Windows, and MacOS. Therefore, the GUI will look and act
fundamentally the same on all platforms and no extra time will be needed to port the
program to other platforms.
To pass the information gained from the user to the HGP, the GUI must write it in a
standard file format. An XML (Extensible Markup Language) file has been chosen for this
purpose. XML is very flexible in its ability to provide information about objects; custom
fields and their properties can be created and parsed easily. Due to its flexibility and
simplicity, XML has become a de facto standard for many data sharing protocols, and
therefore, for future expandability and extensibility, XML was chosen as the file
format for the transfer of information between the different modules of the project.
9.1.2 Hardware Generation Program

The HGP is the program that performs the most critical part of the work in the
project. It uses the information contained in the User Information XML file to create a
customized processor. The datapath of the processor is created by using a library of
parameterized HDL code. The HGP extracts the relevant information from the XML file
and instantiates the HDL modules with the proper parameters. Some modules are
always instantiated by the HGP, although with different parameters, while other
modules are only instantiated when the user specifications require them.
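As an illustration of what instantiating a module with the proper parameters can look like, the hypothetical Java helper below writes a VHDL component instantiation whose GENERIC MAP carries the values chosen by the user. The class, method, and port signal names are assumptions; only the registerfile generics are taken from Figure 9.3.1.

```java
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of how a generator could emit a VHDL component
 * instantiation whose GENERIC MAP carries the user's parameters.
 */
final class VhdlInstantiator {
    static String instantiate(String label, String component,
                              Map<String, Integer> generics,
                              Map<String, String> ports) {
        return label + " : " + component + "\n"
             + "  GENERIC MAP (" + join(generics) + ")\n"
             + "  PORT MAP (" + join(ports) + ");\n";
    }

    /** Formats entries as "formal => actual", comma-separated, in map order. */
    private static String join(Map<String, ?> associations) {
        return associations.entrySet().stream()
                .map(e -> e.getKey() + " => " + e.getValue())
                .collect(Collectors.joining(", "));
    }
}
```

Called with an insertion-ordered map holding BUSWIDTH = 16, NUMREG = 8, and LOG2NUMREG = 3, it emits `GENERIC MAP (BUSWIDTH => 16, NUMREG => 8, LOG2NUMREG => 3)` for a 16-bit processor with eight registers; a module the user did not request is simply never emitted.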
While the construction of the datapath of the RISC processor is a comparatively
simple process, the creation of the control is complex. The HGP must analyze all
the modules present in the datapath and create the opcodes, as well as the sequencing
of all control signals inside the processor. Once this is complete, an HDL file describing
the control module is created and a complete processor is available.
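The report does not specify how the HGP numbers its opcodes. One simple possibility, shown here purely as a hypothetical sketch, is to give the selected instructions consecutive binary codes in a field just wide enough for the selection, so that a processor with fewer instructions also gets a narrower opcode field.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical opcode assignment: each selected instruction gets a distinct
 * binary code just wide enough to cover the whole selection.
 */
final class OpcodeAssigner {
    /** Smallest number of bits that gives n instructions distinct codes. */
    static int opcodeWidth(int n) {
        int bits = 0;
        while ((1 << bits) < n) {
            bits++;
        }
        return bits;
    }

    static Map<String, String> assign(List<String> instructions) {
        int width = Math.max(1, opcodeWidth(instructions.size()));
        Map<String, String> codes = new LinkedHashMap<>();
        for (int i = 0; i < instructions.size(); i++) {
            // Left-pad the binary string with zeros to the opcode width.
            String bits = String.format("%" + width + "s", Integer.toBinaryString(i))
                                .replace(' ', '0');
            codes.put(instructions.get(i), bits);
        }
        return codes;
    }
}
```

For example, five selected instructions need a 3-bit field, and the fifth instruction in the selection receives code 100.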
The HGP is written in Java for the same reasons that the GUI is written in Java.
Furthermore, the object-oriented nature of Java made it straightforward to implement
instructions and control signals.
The HGP has two outputs: the first is the complete customized processor written
in HDL, and the second is an Assembler XML file. The Assembler XML file allows the
HGP to communicate to the assembler (discussed below) which instructions are
available in the custom processor and the instruction format of each instruction. Once
again, XML is used because of its flexibility in transferring information.
9.1.2.1 Parameterized HDL Code

The reconfigurable HDL code that is the source of the generated processor is
based on the MIPS architecture. The MIPS architecture is a simple RISC architecture
that is the foundation of many commercial processors, such as the processors found in
Silicon Graphics workstations. MIPS processors can only access memory through load
and store instructions, which move a value from memory to the register file (or
vice versa). Other instructions, such as add or subtract, cannot have operands in
memory; the operands must be in the register file. Such a system simplifies the
design of the control for the processor.
The MIPS architecture has four major components: a datapath, a control system,
input/output (I/O), and memory. The datapath consists of the components in the
processor that perform functions on data. In a MIPS processor implementation, these
components include the register file, the ALU, and the internal memory. Since the
project is a modified version of the MIPS implementation, the datapath also contains
other components, such as special function generators. The control system of the
processor commands the datapath and memory according to the instructions of the
program.
The HDL code that implements these components was written in a way that allows
the HGP to easily load and change it to implement the processor the user configured.
In particular, the HDL code was made customizable by, among other techniques,
writing components as an amalgamation of repeated smaller modules.
9.1.3 Assembler
An assembler was written to transform human-written source code into machine
code that is decoded by the custom processor. Unlike most other assemblers, this
assembler must be able to deal with the availability of different instructions and
different machine encodings for instructions. The assembler gathers all the information
needed about the particular hardware it is assembling for from the XML file produced
by the HGP, and then translates the user-supplied assembly language source code.
The assembler outputs the machine code that will be stored in the processor’s
instruction memory, in a format understandable by the FPGA used.
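The assembler's data structures are not shown in this report. As a minimal sketch, assuming the Assembler XML file has already been read into a table keyed by mnemonic, the Java code below encodes one instruction by emitting the opcode followed by fixed-width register fields. The class, the `Format` type, and the field layout (opcode first, then one field per register operand) are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of a table-driven assembler. The table would be filled
 * from the Assembler XML file; the field layout here is invented.
 */
final class MiniAssembler {
    /** Opcode value, width of each register field, and operand count. */
    static final class Format {
        final int opcode;
        final int regBits;
        final int operands;

        Format(int opcode, int regBits, int operands) {
            this.opcode = opcode;
            this.regBits = regBits;
            this.operands = operands;
        }
    }

    private final Map<String, Format> table = new HashMap<>();

    void define(String mnemonic, Format format) {
        table.put(mnemonic, format);
    }

    /** Encodes a line such as "add r3, r1, r2" as the opcode followed by register fields. */
    long assemble(String line) {
        String[] tok = line.trim().split("[\\s,]+");
        Format f = table.get(tok[0]);
        if (f == null) {
            // The instruction was not selected for this processor.
            throw new IllegalArgumentException("unsupported instruction: " + tok[0]);
        }
        if (tok.length - 1 != f.operands) {
            throw new IllegalArgumentException("wrong number of operands: " + line);
        }
        long word = f.opcode;
        for (int i = 1; i < tok.length; i++) {
            int reg = Integer.parseInt(tok[i].substring(1)); // drop the leading 'r'
            if (reg < 0 || reg >= (1 << f.regBits)) {
                throw new IllegalArgumentException("no such register: " + tok[i]);
            }
            word = (word << f.regBits) | reg;
        }
        return word;
    }
}
```

With a table containing only add and sub, a line using mul is rejected, which is how an assembler of this kind copes with processors that lack some instructions.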
9.2 Graphical User Interface
The GUI needs to inform the user about how to use our program and to display
the parameters of the processor that the user can choose. Currently, the design allows
the user to choose values for the number of registers, bus width, address width, and the
set of assembly instructions the processor will recognize.
9.2.1 Design
In order to meet our design requirements, the GUI should, in decreasing order of
importance:
1. Be easy to use
2. Provide information about the program
3. Display parameters the user can choose
Assembly instructions
Bus width
Address width
Number of registers
4. Limit user input to only acceptable values
5. Be cross platform portable
The current design of the GUI that meets these requirements is shown in Figure
9.2.1.
Figure 9.2.1: Graphical User Interface for the Parameterized Processor Design Suite
9.2.1.1 Ease of Use
Ease of use was chosen as the number one criterion for the GUI because the
GUI is the first thing potential users see. If users are intimidated, have a hard time
using the program, or never learn about all of its features, then much of the work
invested in the rest of the suite goes to waste.
A number of features have been added to the GUI to make it easy to use. The
top-level menu contains a Help menu item. The Help menu item is one of the first
things the user sees when the application is displayed. Clicking on it will provide
detailed instructions to the user. Ease of use was also improved through the use of
standard controls. Users of common graphical operating systems recognize standard
controls like buttons, check boxes, menus, sliders, and tabbed panes. This makes the
interface intuitive even if the user does not yet understand the purpose of each control.
Tool Tips were added to all of the standard controls to give the user extra information
about their functionality. Also, the standard controls were grouped into four logical
sections: Assembly Commands, Number of Registers, Bus Width, and Address Width.
This way the user knows which choices they are making depending on which area they
are in. All of these features provide the user with an easily accessible interface.
9.2.1.2 Provide Information About the Program

The GUI contains a Help menu item that leads to documentation about the
program. The documentation contains information about how to use the GUI and how
to use the Parameterized Processor Design Suite.
One feature of the documentation is that each assembly instruction that the user
can choose is given a complete description. An example of the documentation
provided for the Add assembly instruction is shown in Table 9.2.1 and a complete list of
all the assembly instructions with descriptions is shown in Appendix 5. The
documentation gives the category the assembly instruction belongs to, the
instruction’s full name, the parameters it takes, the result of using the instruction, and a
text description of the instruction.
Category   | Name | Instruction | Parameters | Result       | Description
Arithmetic | Add  | Add         | rd rs rt   | rd = rs + rt | The sum of registers rs and rt is placed in register rd
Table 9.2.1: Example of Documentation for assembly instructions
9.2.1.3 Display Parameters the User can Choose

The main function of the GUI is to provide the user with all of the choices they
can make. This criterion is accomplished by using standard controls to obtain input
from the user. When the user clicks the button labeled “Create It!” the program
captures the current state of the controls. The GUI uses tabbed panes to present each
of the assembly instructions the user can select. This also allows instructions to be
grouped together by category. Also sliders are used for parameters such as bus width,
number of registers, and address width.
9.2.1.4 Limit User Input to only Acceptable Values
Limiting user input to acceptable values removes the need to validate the input
after it is entered. It also keeps the user interface simpler.
The GUI achieves this goal by using checkboxes and sliders for input. These
controls only accept values within a range set by the designer.
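In Swing, a `JSlider` stores its value in a `BoundedRangeModel`, and the standard `DefaultBoundedRangeModel` clamps every value it is given to the range [minimum, maximum - extent]. The sketch below shows that clamping on its own; the 8 to 64 bit bus-width range and the class name are hypothetical, and the model would be handed to a slider with `new JSlider(model)`.

```java
import javax.swing.BoundedRangeModel;
import javax.swing.DefaultBoundedRangeModel;

/** Sketch: the value model behind a bus-width slider (hypothetical range). */
final class BusWidthInput {
    static BoundedRangeModel create() {
        // value = 32, extent = 0, minimum = 8, maximum = 64
        return new DefaultBoundedRangeModel(32, 0, 8, 64);
    }
}
```

Any out-of-range value, whether from dragging or from code, is pulled back to 8 or 64, so the back end never sees an impossible bus width.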
9.2.1.5 Cross Platform Portability

Portability was achieved in the design of the GUI by using Java and the standard
Java Swing library. By taking advantage of the Java framework, the design is
theoretically portable to any platform that has a Java Virtual Machine (JVM) written for
it. We believe that grounding our design in Java makes it more portable than any other
framework available to us would have.
9.2.2 Implementation
The GUI can be partitioned into two sections: the front end and the back end.
The front end collects user information about the processor and the back end outputs
the user information into an XML file. The XML file is used by the next process in the
Parameterized Processor Design Suite.
The front end of the GUI is written in Java using the Java Swing library. One of
the major concerns about the implementation of the front end of the GUI was the
possibility of future changes to the specification. Therefore the implementation had to
be written in a way to make it easily adaptable to change.
The implementation of the front end of the GUI has very few hard coded
elements in it so as to be as flexible as possible. Instead, the front end relies on an
external file to produce content and only hard codes the manner in which the content
will be displayed. For instance the GUI knows that it will display a number of check
boxes organized into categories. The actual categories for the checkboxes, the number
of checkboxes, the captions that will go on the checkboxes, help content for the
checkboxes and Tool Tips are all read in from a file before the GUI is displayed. The
same holds true for the sliders in the GUI. This scheme means that no extra code has
to be written if the project requires additional assembly instructions to be added to the
GUI checkbox list. To accomplish this the GUI uses an XML file that contains all the
necessary information. A lower level discussion of this process is included in Section
9.2.3.3.
The back end of the GUI has one requirement, which is to output the user’s
selections into a User Information XML Language (UIXL) compliant XML file.
Our specification of UIXL is given in Section 9.9. The back end starts generating
the file when the user presses the button on the user interface marked “Create It!”
9.2.3 Evolution of Graphical User Interface
The GUI has gone through a cycle of design, coding, and testing. The current
model of the interface is a result of what was learned from the initial design. Figure
9.2.2 shows the initial design of the GUI.
Figure 9.2.2: Initial design of the Graphical User Interface
9.2.3.1 Advantages of the Initial GUI Design
The initial GUI design already contained several key features that have been
used in the current design. The initial design effectively separates the interface into
three regions where the user selects the assembly commands to be included, the
number of registers and the bus width. Standard components, such as check boxes
and sliders, have been used in the design. As mentioned earlier, using standard
components has two advantages. First it minimizes the user’s learning curve since
these components are familiar to the user. Second it limits the user’s input to correct
values.
9.2.3.2 Disadvantages of the Initial GUI Design

There are several drawbacks to the initial design. Specifically, the problems concerned
the effective use of space in the layout, usability, and flexibility of the design.
When the design of the instruction set, shown in Appendix 5, was completed, it
included over sixty assembly commands. The area allocated for assembly commands
in the initial GUI design could not accommodate over sixty commands. Even if the space
were expanded, sixty checkboxes cannot be effectively displayed on the screen at one
time without overwhelming the user.
User testing was conducted in order to obtain useful feedback about the
interface. This testing revealed that although the interface was straightforward it
provided little additional information about the choices the user was making. For
instance, the assembly instructions should have detailed information about the
instructions upon request. Also the commands should be grouped in a logical order so
that similar commands are presented next to each other. Finally, general instructions
for the entire program should be included.
The initial design also lacks flexibility. Instructions are hard coded into the
program. This makes changes to the instruction set difficult to implement in the GUI.
After the initial design was completed it was determined that another option should be
added to the user interface that allows the user to specify the address width. With the
initial design this option would have to be hard coded into the design as well.
9.2.3.3 The New GUI Design

To improve on all of the problematic areas in the initial design a new GUI design
was planned.
In order to allow for greater flexibility and future growth of the project the GUI
now reads its assembly instructions from a Parameters XML file. The Parameters XML
file layout is shown in Figure 9.2.3. This file contains not only the name of the
instructions but also what category they should be grouped into, the number and type of
the parameters, the result of the instructions, and a description of the instructions. This
XML file, with all fields filled out, has been generated for the instruction set. If additional
instructions need to be added then they can easily be appended to the list.
<?xml version='1.0'?>

<ParameterList>
  <Category name="Arithmetic">
    <Instruction>
      <Name>Add</Name>
      <InstructionName>add</InstructionName>
      <Parameter1>rd</Parameter1>
      <Parameter2>rs</Parameter2>
      <Parameter3>rt</Parameter3>
      <Result>rd = rs + rt</Result>
      <Description>The sum of registers rs and rt is placed in register rd</Description>
    </Instruction>
    ...
  </Category>
  ...
</ParameterList>
Figure 9.2.3: Layout of the Parameters XML file used by the GUI to read in instructions. Note that
ellipses are used in the diagram to indicate that a parent tag may include more than one child tag.
Specifically the ParameterList tag may include many Category tags and Category tags may include
many Instruction tags.
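A minimal sketch of how a GUI could read the layout in Figure 9.2.3, using the DOM parser that ships with the JDK (`javax.xml.parsers`). The class name is hypothetical, and only the category names and instruction `Name` fields are extracted; the other fields (parameters, result, description) would be read the same way.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Sketch: maps each Category name to the Names of its Instructions, in file order. */
final class ParameterReader {
    static Map<String, List<String>> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) { // configuration, parse, or I/O failure
            throw new IllegalArgumentException("cannot parse Parameters XML", e);
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        NodeList categories = doc.getElementsByTagName("Category");
        for (int i = 0; i < categories.getLength(); i++) {
            Element category = (Element) categories.item(i);
            List<String> names = new ArrayList<>();
            NodeList instructions = category.getElementsByTagName("Instruction");
            for (int j = 0; j < instructions.getLength(); j++) {
                Element instruction = (Element) instructions.item(j);
                // Text may carry surrounding line breaks, so trim it.
                names.add(instruction.getElementsByTagName("Name").item(0)
                                     .getTextContent().trim());
            }
            result.put(category.getAttribute("name"), names);
        }
        return result;
    }
}
```

Because the categories and instructions come from the file, adding an instruction means appending an `<Instruction>` element; the code that builds the tabbed panes does not change.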
In the current design, after the GUI reads in the Parameters XML file, it displays
the options to the user. Assembly commands in the same category are now displayed
on the same tabbed pane. Using tabbed panes allows the number of commands on
the screen at one time to be reduced. A help menu has been added to increase
usability. The help menu contains information about the instructions as well as the
overall usage of the program. Features such as tool tips have also been added to help
users understand the options they are choosing. Context sensitive help has also been
added to the GUI. Context sensitive help means that when a user presses F1, a help
menu appears with information specific to the part of the GUI that currently has focus.
Finally another slider has been added to allow the user to select the address width
parameter. A diagram of the new features added to the GUI is shown in Figure 9.2.4.
9.2.4 Testing
Testing of the interface was done using two methods. First, users unfamiliar with
the GUI performed usability testing. Second, tests were developed to make sure the
GUI was stable. These tests were further divided into testing the reading of the
Parameters XML file, general GUI functionality, and the writing of the UIXL output file.
Finally, integration testing was also performed to make sure the interface between the
GUI and the HGP works.
Usability testing is important to the GUI in order to discover what potential users
have trouble with. Since the GUI was redesigned and the help features were added,
usability test results have improved greatly.
The actual code of the GUI was tested in two different ways. First, the general
behavior of functions was tested to make sure they perform adequately under normal
conditions. Next, testing was performed to make sure the program degrades gracefully
under adverse conditions, for instance by removing the input file, entering
erroneous data, or inputting too much data.
Since the interface between modules was decided on near the beginning of the
project, integration testing produced few errors.
Figure 9.2.4: Revised design of the graphical user interface with added Help Menu, Tabbed panes, Tool
Tips, and Address Width selection
9.2.5 Current Status
Currently, the graphical user interface is complete. All of the features discussed
have been implemented, including the new features discussed in the interim
report. Testing of the GUI has resulted in a stable product.
9.3 Processor Organization
9.3.1 Registerfile
The registerfile is the portion of the processor that keeps intermediate results of
program instructions. It is characterized by a range of registers, which hold the
intermediate results, along with functions to retrieve values and to store new values.
9.3.1.1 Design
The registerfile required for the design of the processor needed two main
capabilities. First, due to the Reduced Instruction Set Computing (RISC) paradigm that
the processor design was following, the registerfile needed to be able to supply two
values at once. Second, the registerfile needed to be parameterizable to handle
different bus widths and a varying number of registers.
To accommodate the above two requirements the registerfile needed to fit the
Very High Speed Integrated Circuit Hardware Description Language (VHDL) component
description seen in Figure 9.3.1.
COMPONENT registerfile
  GENERIC (
    BUSWIDTH   : INTEGER := 32;
    NUMREG     : INTEGER := 32;
    LOG2NUMREG : INTEGER := 5);
  PORT (
    write     : IN  STD_LOGIC;
    clk       : IN  STD_LOGIC;
    readreg1  : IN  STD_LOGIC_VECTOR(LOG2NUMREG - 1 DOWNTO 0);
    readreg2  : IN  STD_LOGIC_VECTOR(LOG2NUMREG - 1 DOWNTO 0);
    writereg  : IN  STD_LOGIC_VECTOR(LOG2NUMREG - 1 DOWNTO 0);
    writedata : IN  STD_LOGIC_VECTOR(BUSWIDTH - 1 DOWNTO 0);
    readdata1 : OUT STD_LOGIC_VECTOR(BUSWIDTH - 1 DOWNTO 0);
    readdata2 : OUT STD_LOGIC_VECTOR(BUSWIDTH - 1 DOWNTO 0));
END COMPONENT;
Figure 9.3.1: Registerfile VHDL Component Declaration
As can be seen from the component declaration, the registerfile can be customized
with the BUSWIDTH and NUMREG parameters. (Note: LOG2NUMREG is not itself an
independent parameter; it equals ceil(log2(NUMREG)) and is included in the parameter
list because this value could not be computed within the declaration of the component.)
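Because LOG2NUMREG must be supplied alongside NUMREG, whatever writes the generic values has to compute the ceiling of the logarithm itself. A minimal Java sketch of such a helper follows; the class name is hypothetical, since the report does not show the HGP's code.

```java
/** Hypothetical helper for deriving generics such as LOG2NUMREG = ceil(log2(NUMREG)). */
final class Log2 {
    static int ceilLog2(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be positive: " + n);
        }
        // For n > 1, ceil(log2(n)) equals the bit length of n - 1; n = 1 gives 0.
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }
}
```

For the default of 32 registers this gives LOG2NUMREG = 5, while 33 registers would need a 6-bit select signal.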
The design within the registerfile can be seen in Figure 9.3.2. There are three main
parts to the design: the registers, the output circuitry, and the input circuitry, all of
which can be individually parameterized to construct the correct registerfile.
Figure 9.3.2: Registerfile
9.3.1.1.1 Input Circuitry

The input circuitry of the registerfile allows new data to be stored in the
registerfile. As can be seen from Figure 9.3.2, the Write Data Signal, which contains
new data to be stored, is connected to the data input of all the registers. However, the
data cannot be written into the register unless the enable signal for the particular
register is high.
Thus a particular register must be selected with the aid of a decoder that takes
the Write Reg signal (which is an integer from 0 to the number of registers - 1) and sets
the enable signal of that register to high. On its own, however, this scheme would
write to the selected register on every clock pulse. Therefore, the output of the decoder
is ANDed with the Write Enable signal so that the processor can control when new data
is written into a register.
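The write path described above can be modelled behaviorally in Java. This is a sketch for illustration, not the project's VHDL: the class name is hypothetical, register contents are held in `long` values (so the bus width is limited to 64 bits), and each call to `clockEdge` stands for one rising clock edge.

```java
/**
 * Behavioral sketch of the registerfile write path: a decoder turns
 * writeReg into a one-hot enable, which is ANDed with the write signal.
 */
final class RegisterFileWritePath {
    private final long[] regs;
    private final long mask; // keeps stored values BUSWIDTH bits wide

    RegisterFileWritePath(int numReg, int busWidth) {
        regs = new long[numReg];
        mask = busWidth >= 64 ? -1L : (1L << busWidth) - 1;
    }

    /** Decoder output: enable[i] is true only for the addressed register. */
    boolean[] decode(int writeReg) {
        boolean[] enable = new boolean[regs.length];
        enable[writeReg] = true;
        return enable;
    }

    /** Models one rising clock edge. */
    void clockEdge(boolean write, int writeReg, long writeData) {
        boolean[] enable = decode(writeReg);
        for (int i = 0; i < regs.length; i++) {
            if (enable[i] && write) { // decoder output ANDed with Write Enable
                regs[i] = writeData & mask;
            }
        }
    }

    long read(int reg) {
        return regs[reg];
    }
}
```

A clock edge with the write signal low leaves every register unchanged, even though the decoder still selects one of them.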
9.3.1.1.2 Output Circuitry
The output circuitry of the registerfile allows data to be extracted from the
registerfile. As per the requirements of the RISC architecture, there are two output ports.
The Read Reg 1 and Read Reg 2 signals (which are integers from 0 to the number of
registers - 1) each select the appropriate register through a multiplexor.
The multiplexors in the design are quite large (because they multiplex a large
number of registers, and each register output is itself composed of many
wires) and therefore use many logic cells within the FPGA. A simpler and more
space-efficient design would use tri-state buffers, as seen in Figure 9.3.3, to
connect all the registers to the output signal, but since the target FPGA does not have
programmable internal tri-state buffers, multiplexors must be used instead.
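To give a sense of scale, suppose (as an assumption; the LPM multiplexors produced by Max+plus II may map onto logic cells differently) that each read port is built as a binary tree of 2-to-1 multiplexors. Selecting one of NUMREG registers then takes NUMREG - 1 such multiplexors per bit, so for the default 32 registers of 32 bits:

$$
N_{2:1} = 2 \times \mathrm{BUSWIDTH} \times (\mathrm{NUMREG} - 1) = 2 \times 32 \times 31 = 1984
$$

This compares with 32 × 32 = 1024 bits of register storage, so the read multiplexing grows with the product of the bus width and the number of registers.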
The multiplexors were created with the aid of LPMs (Library of Parameterized
Modules) from within Max+plus II; because LPMs are parameterizable, the specific
multiplexor needed can be easily created.
Figure 9.3.3: Tri-Stated Output Circuitry
9.3.1.2 Testing
The registerfile module has been completed and fully tested in simulation thus
meeting the mid-November deadline. Please see Appendix 6 where the timing
diagrams for the test cases are available. The test cases outline a scenario where
information is stored in each register and then the information is retrieved from each
output port.
9.3.2 Arithmetic and Logic Unit (ALU)
The ALU is the component of the processor that performs all the arithmetic
operations such as addition and multiplication as well as all the logical operations such
as comparison testing and shifting.
9.3.2.1 Design
The ALU required for the design of the processor needed to be highly modular so
that the HGP could remove or add arithmetic and logic operations without upsetting
the rest of the ALU. Furthermore, the ALU needed to be able to handle different bus
widths and therefore needed to be modular not only per instruction, but also in terms of
the size of inputs it could handle.
The design of the ALU thus proceeded from the design of a one-bit ALU, which
could then be integrated into any size ALU.
9.3.2.1.1 One-Bit ALU

The one-bit ALU is the building block of the whole ALU, and therefore needs to
be designed in a parallel fashion to reduce the dependencies between instructions.
The VHDL component declaration of the one-bit ALU can be seen in Figure 9.3.4.
COMPONENT onebitalu
  GENERIC (
    BITWISEALUOPSIZE   : INTEGER := 4;
    NUMBERBITWISEALUOP : INTEGER := 16);
  PORT (
    a           : IN  STD_LOGIC;
    b           : IN  STD_LOGIC;
    carryin     : IN  STD_LOGIC;
    slt         : IN  STD_LOGIC;
    slte        : IN  STD_LOGIC;
    sgt         : IN  STD_LOGIC;
    sgte        : IN  STD_LOGIC;
    seq         : IN  STD_LOGIC;
    sne         : IN  STD_LOGIC;
    bitaluop    : IN  STD_LOGIC_VECTOR(BITWISEALUOPSIZE - 1 DOWNTO 0);
    binvert     : IN  STD_LOGIC;
    f           : OUT STD_LOGIC;
    carryout    : OUT STD_LOGIC;
    addoutfinal : OUT STD_LOGIC);
END COMPONENT;
Figure 9.3.4: One-Bit ALU VHDL Description
The one-bit ALU is characterized by two parameters and a series of inputs. The
NUMBERBITWISEALUOP parameter is the number of bit-wise operations the ALU can
perform. The BITWISEALUOPSIZE parameter is derived from it (it equals
ceil(log2(NUMBERBITWISEALUOP))) and is included for the same reason as
LOG2NUMREG in the registerfile.
The inputs to the one-bit ALU also constitute a parameterizable aspect of the
ALU that the HGP can adjust. As can be seen from the design of the one-bit ALU in
Figure 9.3.5, the inputs slt (set on less than) through sne (set on not equal) can be
included or omitted depending on the user specifications. For example, if the user
needs a ‘less than’ comparison, then the slt input (Less in Figure 9.3.5) will be included;
otherwise it will not be part of the design. The other sXX inputs can be added or
removed by the HGP in the same fashion. Furthermore, as can be seen from Figure
9.3.5, one-bit operations such as AND and XNOR can be removed from the one-bit
ALU without affecting the other ALU operations. The user specifications determine
which gates are included in the one-bit ALU at compile time.
Figure 9.3.5: Design of One-Bit ALU
The control signal Bit Invert is used to allow for subtraction. If subtraction, and
all other operations that need subtraction (such as branching or comparisons) are not
required in the ALU, this control signal and the accompanying multiplexor can be
removed from the one-bit ALU by the HGP.
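The behavior of a one-bit slice can be sketched in Java as follows. The set of operations, their names, and the choice to apply Bit Invert only to the adder input are assumptions made for illustration; in the VHDL the operation is selected by the bitaluop signal.

```java
/**
 * Behavioral sketch of a one-bit ALU slice: a few gates, a full adder with
 * optional b inversion, and a "less" input used by set-on-less-than.
 * Operation names are hypothetical.
 */
final class OneBitAlu {
    enum Op { AND, OR, XNOR, ADD, LESS }

    /** Result bit and carry-out of one slice. */
    static final class Out {
        final boolean f;
        final boolean carryOut;

        Out(boolean f, boolean carryOut) {
            this.f = f;
            this.carryOut = carryOut;
        }
    }

    static Out eval(Op op, boolean a, boolean b, boolean carryIn,
                    boolean bInvert, boolean less) {
        boolean bb = bInvert ? !b : b;                  // Bit Invert multiplexor
        boolean sum = a ^ bb ^ carryIn;                 // full-adder sum
        boolean carryOut = (a & bb) | (a & carryIn) | (bb & carryIn);
        switch (op) {
            case AND:  return new Out(a & b, carryOut);
            case OR:   return new Out(a | b, carryOut);
            case XNOR: return new Out(!(a ^ b), carryOut);
            case ADD:  return new Out(sum, carryOut);
            default:   return new Out(less, carryOut); // LESS: pass the less input through
        }
    }
}
```

Dropping an operation from the user's selection corresponds to deleting one `case` here, and the remaining cases are unaffected, which mirrors the gate-level modularity described above.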
9.3.2.1.2 Bit-Wise Operations

Following the design of the one-bit ALU, the HGP can use many one-bit ALUs in
tandem to construct a variable-size ALU that can perform bit-wise operations
(Figure 9.3.6). All the one-bit ALUs are identical, and therefore the HGP can generate
as many one-bit ALUs as necessary to develop the customized processor.
Figure 9.3.6: Variable Size ALU
The only difference between the one-bit ALUs used in the variable-size ALU is the
source of their inputs. The first one-bit ALU receives its Less, Greater and Equal
signals from the comparison-checking module, and all others receive ‘0’ on these
inputs. This mechanism is used to provide the ‘set on less than’ instruction and similar
comparison instructions. The comparison-checking module receives the result of input A
minus input B (not shown in Figure 9.3.6) and can thus determine whether input A was
equal to, less than, or greater than input B. The comparison-checking module also
determines whether overflow has occurred during the arithmetic operation.
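A minimal Python model of this arrangement is sketched below. It ripples N identical slices with binvert set to 1 to form A minus B, then derives the Less, Equal and Greater flags and overflow from the result. Using the sign bit XOR overflow as the signed less-than result is a standard technique assumed here; the report does not state how the comparison-checking module is built, and the function name is hypothetical.

```python
def subtract_and_compare(a: int, b: int, n: int) -> dict:
    """Ripple N one-bit slices to form A - B, then derive the comparison flags.

    Operands are N-bit two's complement values passed as unsigned integers.
    """
    mask = (1 << n) - 1
    a, b = a & mask, b & mask
    carry = 1            # carry into bit 0 is 1 for subtraction: A + NOT B + 1
    carry_into_msb = 0
    diff = 0
    for i in range(n):
        if i == n - 1:
            carry_into_msb = carry
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ 1          # every slice has binvert = 1
        total = ai + bi + carry
        diff |= (total & 1) << i
        carry = total >> 1
    overflow = carry_into_msb ^ carry    # signed overflow out of the MSB
    negative = (diff >> (n - 1)) & 1
    less = negative ^ overflow           # true sign of A - B
    equal = int(diff == 0)
    greater = int(not less and not equal)
    return {"diff": diff, "less": less, "equal": equal,
            "greater": greater, "overflow": overflow}


flags = subtract_and_compare(3, 5, 8)
slt_result = flags["less"]   # only bit 0 receives Less; every other bit receives 0
print(flags, slt_result)
```

For example, with 8-bit operands, -128 minus 1 overflows, yet the Less flag is still 1 because the overflow bit corrects the misleading sign of the wrapped result.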
9.3.2.1.3 Complete ALU
With the ALU designed above, the processor can perform a limited number of
operations. For a more extensive list of operations, such as multiply and divide, the
ALU designed above must be incorporated into a larger ALU. In Figure 9.3.7, the
bit-wise ALU is just one part of a larger modular ALU.
Figure 9.3.7: Full ALU
The complete ALU contains modules for shifting, rotating, multiplying, and
dividing, as well as the bit-wise ALU. The shift, rotate, multiply, and divide modules
were created with the aid of the Library of Parameterized Modules (LPM) functions in
MAX+plusII and can therefore be easily modified by the HGP to handle different
bus-widths. Furthermore, the HGP can remove any or all of these modules without
affecting the functionality of the other modules.
With the removal of either multiply or divide, the HGP can remove the two
multiplexors adjacent to the Hi and Lo registers in Figure 9.3.7. With the removal of
both multiply and divide, the two registers can also be removed. The Hi and Lo registers
are contained in the ALU because both multiply and divide produce results that are 2*N
bits wide given inputs that are N bits wide; the result must therefore be stored within
the ALU so that the program can store each N-bit portion of the result individually.
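The need for Hi and Lo can be seen with a short Python sketch, assuming unsigned multiplication: the product of two N-bit operands needs up to 2*N bits, which are split into an upper (Hi) and a lower (Lo) half so that each half can be moved out separately. The function name is hypothetical.

```python
def multiply_hi_lo(a: int, b: int, n: int) -> tuple:
    """Unsigned N x N multiply; returns the (Hi, Lo) halves of the 2N-bit product."""
    mask = (1 << n) - 1
    product = (a & mask) * (b & mask)    # up to 2*N bits wide
    hi = (product >> n) & mask           # upper N bits go to the Hi register
    lo = product & mask                  # lower N bits go to the Lo register
    return hi, lo


print([hex(x) for x in multiply_hi_lo(0xFFFFFFFF, 2, 32)])  # ['0x1', '0xfffffffe']
```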
COMPONENT total_alu
GENERIC (N : INTEGER :=32;
ALUOPSIZE : INTEGER :=5;
BITWISEALUOPSIZE : INTEGER :=4;
FUNCTALUOPSIZE : INTEGER :=3;
SHIFTSIZE : INTEGER :=5);
PORT (
clk : IN STD_LOGIC;
a : IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
b : IN STD_LOGIC_VECTOR(N-1 DOWNTO 0);
aluopin : IN STD_LOGIC_VECTOR(ALUOPSIZE - 1 DOWNTO 0);
shiftamt : IN STD_LOGIC_VECTOR(SHIFTSIZE - 1 DOWNTO 0);
funct : OUT STD_LOGIC_VECTOR(N-1 DOWNTO 0);
zero : OUT STD_LOGIC;
overflow : OUT STD_LOGIC;
lt : OUT STD_LOGIC;
lte : OUT STD_LOGIC;
gt : OUT STD_LOGIC;
gte : OUT STD_LOGIC);
END COMPONENT;
Figure 9.3.8: Complete ALU VHDL Description
The VHDL component description of the complete ALU shows that the ALU is
parameterized by five generics:
1. N: bus-width
2. ALUOPSIZE: ceil(log2(number of operations available))
3. BITWISEALUOPSIZE: ceil(log2(number of bit-wise operations))
4. FUNCTALUOPSIZE: ceil(log2(number of operations - number of bit-wise operations))
5. SHIFTSIZE: ceil(log2(N))
With these parameters, the number of operations and the bus-width can be
selected; the HGP must then remove the unneeded modules from the ALU to construct the
final ALU needed for the customized processor.
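The derived generics can be computed mechanically from the user's choices. The Python sketch below is a hypothetical helper, not part of the HGP as described in this report: it evaluates ceil(log2(x)) with exact integer arithmetic and fills in the five ALU generics. The floor of one select bit is an added assumption so that no select signal collapses to zero width.

```python
def clog2(x: int) -> int:
    """ceil(log2(x)) for x >= 1, using exact integer arithmetic."""
    if x < 1:
        raise ValueError("x must be at least 1")
    return (x - 1).bit_length()


def alu_generics(n: int, num_ops: int, num_bitwise_ops: int) -> dict:
    """Derive the complete ALU generics from the bus-width and operation counts."""

    def select_width(count: int) -> int:
        return max(1, clog2(count))      # assumed minimum of one select bit

    return {
        "N": n,
        "ALUOPSIZE": select_width(num_ops),
        "BITWISEALUOPSIZE": select_width(num_bitwise_ops),
        "FUNCTALUOPSIZE": select_width(num_ops - num_bitwise_ops),
        "SHIFTSIZE": clog2(n),
    }


# Hypothetical operation counts that reproduce the default generics of Figure 9.3.8
print(alu_generics(32, num_ops=24, num_bitwise_ops=16))
```

Computing the widths in one place avoids the VHDL limitation noted earlier, where a width cannot be derived from another generic inside the template.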
9.3.2.2 Testing
The ALU module has been completed and fully tested in simulation, meeting the
end-of-November deadline. The timing diagrams for the test cases are available in
Appendix 6. The test cases show all the ALU operations being performed on two sets
of inputs.
9.3.3 Processor Datapath
The processor datapath is the organization of different components within the
processor. The datapath displays how components such as the registerfile and ALU
are connected to each other. The general datapath for the processor that the HGP will
create can be seen in Figure 9.3.9. (For simplicity, control signals are not drawn in
full but are represented by red stubs.)
Figure 9.3.9: Datapath
9.3.3.1 Design
The design of the datapath followed the MIPS architecture and was influenced
by the instruction set that the HGP supports (displayed in Appendix 5). The datapath
needs to be able to handle all possible instructions, and their combinations. To
illustrate the design decisions made for the datapath, the use of each multiplexor in the
design will be discussed. The Instruction Register (IR), which plays an important part in
the datapath, will also be discussed.
9.3.3.1.1 IR and Register Selection

The design of the binary values that constitute instructions is of vital importance
for making the datapath efficient and simple. Due to the variability of the instruction
format that the HGP will generate, only a brief discussion will follow.
It is assumed that instructions follow one of three formats. The three formats are
listed below, each followed by the information it stores, in parentheses.
1. Reg-Type, register-to-register operations such as ‘add’:
(Opcode, read reg1, read reg2, write reg, shift amount)
2. Immediate/Address-Type, operations involving a constant or an address, such as
‘add immediate’ and ‘branch’:
(Opcode, read reg1, write reg, immediate value/address)
3. Jump-Type, instructions to unconditionally jump from one region of a program
to another:
(Opcode, address)
The IR contains the instruction presently being executed in the processor, and
the instruction can be in one of the three formats explained above. Thus the
information flowing out of the IR loosely follows Figure 9.3.10. (It is not exact due to the
variability that can be generated by the HGP.)
Figure 9.3.10: IR and Register Selection
With the three different instruction formats, the inputs to ‘Read Reg 1’ and ‘Read
Reg 2’ always come from the same portion of the IR. The ‘Write Reg’ input, however, may
come from the 4th portion of the IR (in reg-type instructions) or the 3rd portion of the
IR (in immediate-type instructions). Since the ‘Write Reg’ input may come from different
portions of the IR, a multiplexor is needed to select between these inputs depending on
which type of instruction is being processed. The select signal for the multiplexor (not
shown in Figure 9.3.10) comes from the processor control, which determines which
instruction is being performed by analyzing the opcode field in the IR.
Note that this portion of the datapath could be simplified only if the processor had
nothing but reg-type instructions, and such a design would serve no practical purpose
because it could not access memory.
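The register selection and the ‘Write Reg’ multiplexor can be sketched in Python as follows. The bit positions assume the default 32-bit layout implied by the generics in Figure 9.3.17 (6-bit opcode, 5-bit register fields, then shift amount and function fields), and treating opcode 0 as reg-type is an assumption borrowed from MIPS; the HGP may generate other layouts, and the function names are hypothetical.

```python
# Assumed default layout: opcode[31:26], reg1[25:21], reg2[20:16], write reg[15:11]
OPCODE_LSB, REG1_LSB, REG2_LSB, WRITE_LSB = 26, 21, 16, 11
OPCODE_BITS, REG_BITS = 6, 5
REG_TYPE_OPCODE = 0   # assumption: as in MIPS, reg-type instructions use opcode 0


def field(word: int, lsb: int, width: int) -> int:
    """Extract a bit field of the given width starting at bit position lsb."""
    return (word >> lsb) & ((1 << width) - 1)


def select_registers(ir: int) -> tuple:
    """Return (read_reg1, read_reg2, write_reg) for one instruction word."""
    is_reg_type = field(ir, OPCODE_LSB, OPCODE_BITS) == REG_TYPE_OPCODE
    read_reg1 = field(ir, REG1_LSB, REG_BITS)     # always the 2nd field
    read_reg2 = field(ir, REG2_LSB, REG_BITS)     # always the 3rd field
    fourth = field(ir, WRITE_LSB, REG_BITS)       # 4th field of a reg-type word
    # Write Reg multiplexor: 4th field for reg-type, 3rd field for immediate-type
    write_reg = fourth if is_reg_type else read_reg2
    return read_reg1, read_reg2, write_reg


print(select_registers(0x00221820))   # reg-type word: (1, 2, 3)
print(select_registers(0x20850007))   # immediate-type word: (4, 5, 5)
```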
9.3.3.1.2 Input A, ALU

The first input to the ALU can originate from two places as seen in Figure 9.3.11.
The ALU input can originate from register A, which temporarily holds the information
received from the first output of the register file, or the ALU input can originate from the
Program Counter (PC), which contains the address in memory of the current instruction.
The first scenario is used when any reg-type instruction is being performed, and the
second scenario is used whenever the address of the next instruction must be
computed (usually PC = PC + 4). Thus, regardless of the user specifications, the HGP
cannot simplify this portion of the datapath either.
Figure 9.3.11: Input A, ALU
9.3.3.1.3 Input B, ALU

Like the first input, the second input to the ALU can originate from several places;
Figure 9.3.12 shows its four possible sources.
The four locations and scenarios are the following:
1. Register B, temporarily containing the information received from the second
output of the register file. This scenario is used with all reg-type instructions.
2. The number 4, used to increment the PC (i.e. PC = PC + 4).
3. The output of the Sign Extension Module, which receives an immediate value
from the IR and extends it to fill the whole bus-width. This scenario is used
with any immediate instruction (an instruction with a constant embedded in it),
such as ‘add immediate’.
4. The output of the Shift Left Module, which receives a sign-extended word offset
from the Sign Extension Module and shifts it left to convert it to a byte offset.
This scenario is used with branch instructions.
Figure 9.3.12: Input B, ALU
If immediate instructions are not needed, the HGP can eliminate the third input to
the multiplexor, and if there are no branch instructions in the user specifications, the
fourth input can also be eliminated. Furthermore, if neither immediate nor branch
instructions are needed by the user, the HGP can eliminate both the Sign Extension and
Shift Left Modules during the creation of the customized processor.
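A behavioral Python sketch of this multiplexor follows. The 16-bit immediate width and the shift of 2 come from the IMMEDIATEFIELDSIZE and WORDTOBYTEOFFSET defaults in Figure 9.3.17; numbering the select values 0 to 3 in the order of the list above is an assumption, and the function names are hypothetical.

```python
WORD_TO_BYTE_OFFSET = 2   # matches the WORDTOBYTEOFFSET generic default


def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Replicate the sign bit of a from_bits-wide value up to to_bits bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):                       # negative value
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def alu_src_b(select: int, reg_b: int, immediate: int,
              buswidth: int = 32, imm_bits: int = 16) -> int:
    """ALU input B multiplexor; assumed select order follows the list (0..3)."""
    mask = (1 << buswidth) - 1
    extended = sign_extend(immediate, imm_bits, buswidth)
    sources = [
        reg_b & mask,                                  # 0: Register B
        4,                                             # 1: constant 4 for PC = PC + 4
        extended,                                      # 2: Sign Extension Module
        (extended << WORD_TO_BYTE_OFFSET) & mask,      # 3: Shift Left Module
    ]
    return sources[select]


print(hex(alu_src_b(3, 0, 0xFFFF)))   # a branch offset of -1 word becomes -4 bytes
```

Dropping an unused source from the `sources` list corresponds to the HGP removing that multiplexor input.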
9.3.3.1.4 Shift Amount Input, ALU

The Shift Amount Input to the ALU determines how many bits input A should be
shifted when a shift or rotate instruction is being performed. As can be seen in Figure
9.3.13, the shift amount can originate from two places, depending on whether the shift
is a normal shift operation or a variable shift.
In a normal shift operation, the shift amount is hard-coded in the IR, but in a
variable shift operation, the shift amount is stored in a register in the registerfile
and is therefore temporarily held in Register B.
If either type of instruction, normal shifts or variable shifts, is not in the user
specifications, the HGP may remove the multiplexor.
Figure 9.3.13: Shift Amount Input, ALU
9.3.3.1.5 Memory Inputs

The memory is where the instructions composing the program and the majority
of the data used within the program are stored. The design of the memory is not
discussed in this section as it is discussed in Section 9.6. Suffice it to say that the
memory takes a memory address and either supplies the value at that address or stores
new data there, depending on the value of the Read/Write control signal.
Figure 9.3.14: Memory
The address for the memory may originate from either the PC, to retrieve the
next instruction in the program, or AluOut, where the computed address for a load or
store instruction is stored temporarily. Regardless of the user specifications, the HGP
cannot simplify this portion of the design, as it is required for even the simplest
processor.
9.3.3.1.6 Program Counter (PC)

As explained above, the PC contains the memory address of the next instruction
to be performed in the processor. The contents of the PC are usually incremented to
point to the next instruction to be retrieved, but jumps and branches can alter the
program flow and place arbitrary values in the PC.
Figure 9.3.15: PC input
As can be seen from Figure 9.3.15, the new value for the PC may come from
four sources:
1. AluOut: To handle the regular PC + 4 update.
2. Reg A: To handle the jump register instruction, where the next value of
the PC is held in the registerfile.
3. ALU: To handle branches, where the old value of the PC is added to the
branch offset.
4. Shift Left: To handle jump instructions, where the word address in the IR is
shifted left to construct a byte address and then combined with
the most significant bits of the original PC value to obtain the new
value of the PC.
This portion of the datapath can be considerably simplified by the HGP if jump
instructions are not required; the top two inputs in Figure 9.3.15 can then be
eliminated. If branches are not required either, the multiplexor itself can be
removed.
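The four PC sources, including the jump-target construction described in item 4, can be sketched in Python. The 26-bit jump field and the shift of 2 follow the JUMPFIELDSIZE and WORDTOBYTEOFFSET defaults in Figure 9.3.17, the upper bits are taken from the original PC as the list states, and the select numbering (0 to 3 in list order) and function name are assumptions.

```python
def next_pc(select: int, alu_out: int, reg_a: int, alu_result: int,
            pc: int, jump_field: int, buswidth: int = 32,
            word_to_byte: int = 2, jump_bits: int = 26) -> int:
    """PC source multiplexor; assumed select order follows the list above (0..3)."""
    mask = (1 << buswidth) - 1
    low_bits = jump_bits + word_to_byte                 # 28 bits with the defaults
    byte_target = (jump_field & ((1 << jump_bits) - 1)) << word_to_byte
    upper = pc & mask & ~((1 << low_bits) - 1)          # most significant PC bits
    sources = [
        alu_out,                # 0: AluOut, regular PC + 4 update
        reg_a,                  # 1: Reg A, jump register
        alu_result,             # 2: ALU, branch target
        upper | byte_target,    # 3: Shift Left, jump target
    ]
    return sources[select] & mask


print(hex(next_pc(3, 0, 0, 0, pc=0x40000010, jump_field=0x100)))  # 0x40000400
```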
9.3.3.1.7 Register Data Input

New data that will be stored in the register file can originate from two places.
Either the data is the result of an arithmetic or logic operation and is stored temporarily
in AluOut or the data is being retrieved from memory by a load instruction and is stored
temporarily in the Memory Data Register (MDR). Thus, as can be seen in Figure
9.3.16, there is a multiplexor to select between the two sources of data into the register
file.
Figure 9.3.16: Register Data Input
The HGP cannot simplify this portion of the design since a processor without
load or arithmetic operations would be useless.
9.3.3.2 Processor Parameters

With the design of the processor reviewed, the parameters needed to construct a
custom processor can be readily chosen. Figure 9.3.17 contains the VHDL entity
description for the processor.
ENTITY proc IS
GENERIC (
BUSWIDTH: INTEGER:=32;
NUMREG: INTEGER:=32;
LOG2NUMREG: INTEGER:=5;
NUMINSTRUCTIONS: INTEGER :=32;
OPCODEFIELDSIZE: INTEGER :=6;
REGFIELDSIZE: INTEGER:=5;
SHAMTFIELDSIZE: INTEGER:=5;
FUNCTFIELDSIZE: INTEGER:=6;
JUMPFIELDSIZE: INTEGER:=26;
IMMEDIATEFIELDSIZE: INTEGER:=16;
WORDTOBYTEOFFSET:INTEGER:=2;
ALUSRCASIZE_A: INTEGER:=1;
ALUSRCBSIZE_A: INTEGER:=2;
POWERALUSRCBSIZE_A:INTEGER:=4;
PCSOURCESIZE_A: INTEGER:=2;
POWERPCSOURCESIZE_A:INTEGER:=4;
ALUOPSIZE_A : INTEGER :=5;
BITWISEALUOPSIZE_A : INTEGER :=4;
FUNCTALUOPSIZE_A : INTEGER :=3);
PORT (clk : IN STD_LOGIC;
aluout : OUT STD_LOGIC_VECTOR(BUSWIDTH-1 DOWNTO 0));
END proc;
Figure 9.3.17: VHDL Description for complete processor
The parameters include all of those needed to create the registerfile and ALU, as
well as parameters such as PCSOURCESIZE_A, which determines the number of possible
locations that the value of the PC can come from. Furthermore, parameters such as
REGFIELDSIZE are included so that the different fields within the IR can be connected
to the appropriate places.
9.3.3.3 Testing
The complete datapath has not been tested, since the computing resources needed
to compile the processor are not available (see Section 9.10).
The individual modules in the datapath, including the Sign Extension and Shift Left
Modules, have been fully tested in simulation. The timing diagrams for the test cases
are available in Appendix 6.
9.4 Processor Control
The control circuitry determines how each instruction is processed by the
processor. The control system is essentially a finite state machine (FSM) [8]. The
state machine transitions through a set of states for each instruction in order to ensure
proper instruction execution.
The HGP will use the state information for each instruction along with
predefined states to generate a new transition table with all the necessary states. Each
state is responsible for handling one part of the execution of a single instruction.
9.4.1 Flow Control

The processor executes commands on an instruction-by-instruction basis. For the
program to be executed correctly, the processor control circuitry repeatedly
progresses through four stages of instruction execution. The four stages are shown in
Figure 9.4.1.
The first step is to fetch an instruction from the memory. The instruction being
loaded from memory is located at the address specified by the program counter (PC).
Once this instruction is loaded into the instruction register (IR) it can then be decoded.
In the decoding step, the controller reads the opcode of the instruction to identify its
purpose and if necessary reads additional information from memory. The next step is to
execute the instruction. This is done by loading all appropriate registers and executing
the instruction specified by the opcode. The last step is to store the result of the
operation in a specified location.
Figure 9.4.1: Processor Modes of Execution
9.4.2 Design Approach
To design the control unit for the parameterizable processor, we decided to create
a generic control unit upon which all control units generated by our software are
based. The main advantage of this decision is that if a processor with the full
instruction set works with the generic control unit, then removing the control steps for
unused instructions will not affect the flow of control for other instructions. The
following sections explain how the control unit was designed and tested.
9.4.2.1 Design Principles

The design of the control unit follows the four-stage model of execution shown in
Figure 9.4.1. Each instruction, with a few exceptions, goes through exactly the same
four stages before it completes. This similarity allows for optimizations such as having
only one set of instruction fetch states shared by all instructions, rather than a separate
set for each instruction. The second advantage of this approach is that some
instructions can take less time because an unnecessary step in the execution model is
skipped. For example, the jump instruction requires only three stages. The reason is
that, unlike instructions such as add, subtract or multiply, the jump instruction only
needs to update the program counter (PC) register with a new address value. This
value is stored in the address field of the instruction, so it does not need to be computed
by the ALU, which allows one stage of execution to be skipped.
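The shared-fetch idea, and the HGP trimming states for unused instructions, can be sketched as a transition table in Python. The state names, the three-instruction set, and the choice that a jump finishes its PC update in its execute state (skipping a separate store state) are all assumptions made for illustration; the report does not list the actual states.

```python
# Hypothetical per-instruction states that follow the shared FETCH and DECODE states.
TAILS = {
    "add": ["ADD_EXECUTE", "ADD_STORE"],
    "sub": ["SUB_EXECUTE", "SUB_STORE"],
    "j": ["J_EXECUTE"],   # assumed: the PC is written here, so no store state is needed
}


def build_transition_table(instructions: list) -> dict:
    """Build the FSM for the chosen instructions; '*' marks an unconditional edge."""
    table = {"FETCH": {"*": "DECODE"}, "DECODE": {}}   # one fetch path for everything
    for name in instructions:
        tail = TAILS[name]
        table["DECODE"][name] = tail[0]                # decode branches on the opcode
        for state, nxt in zip(tail, tail[1:]):
            table[state] = {"*": nxt}
        table[tail[-1]] = {"*": "FETCH"}               # last state returns to fetch
    return table


def run(table: dict, instruction: str) -> list:
    """Return the states visited while executing one instruction."""
    state, visited = "FETCH", []
    while True:
        visited.append(state)
        edges = table[state]
        nxt = edges.get(instruction, edges.get("*"))
        if nxt == "FETCH":
            return visited
        state = nxt


print(run(build_transition_table(["add", "j"]), "j"))
```

Because each instruction only adds its own tail states, building the table for a subset leaves the path of every remaining instruction unchanged, which is the property the testing in the next section relies on.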
9.4.2.2 Testing
The testing of the control unit was done based on the full instruction set. The
basis for our testing method was the premise that if the control unit could properly
control all instructions provided by the design suite, then it would also properly control
a smaller subset of those instructions.
Once this testing was complete, we also tested a reduced version of the control
unit. This stage of testing was necessary to verify that there are no dependencies
between the states of different instructions, and to confirm that the HGP properly
optimized the control circuitry.
9.5 Processor Input/Output
An I/O interface was designed, created, and tested for the processor. The
interface uses memory-mapped I/O, which allows the processor to communicate with
only a limited set of devices. There are two different ways to access input and output
devices in the majority of computer systems: direct port access and memory-mapped
access. Direct port access requires that devices have a separate set of connections to
the processor and that the processor have extra instructions to deal with these devices.
Memory-mapped access, on the other hand, allows the processor to have a simpler
design. Extra logic, however, must be added outside of the processor to determine
whether the processor wants to access memory or an I/O device [8]. Only memory-
mapped I/O was implemented so that the processor, the most complex hardware
component, is as simple as possible.
The VHDL files that are being generated can be divided into two large sections,
the processor, and everything outside the processor. The processor can be further
subdivided into a register file, an Arithmetic Logic Unit (ALU), a data path, and a control
circuit. The processor can communicate with everything outside of it through a set of
address, data, and status lines that connect it to everything else. Everything else
includes memory for storage, I/O devices that allow the processor to interact with the
outside world, and a way to determine whether the processor wants to communicate with
the memory or with the I/O devices. Figure 9.5.1 is a block diagram that represents what
the design looks like at a high level of abstraction.
Figure 9.5.1: High Level Abstraction of Hardware
9.5.1 I/O Interface

The I/O interface examines the information that has been sent by the processor
and then determines which device the processor wants to communicate with. The I/O
interface can be connected to a mouse, a keyboard, any number of PS/2 devices, and
to a monitor. The mouse, keyboard, and other PS/2 devices have relatively simple
interfaces, while the monitor has a rather complicated interface. Figure 9.5.2 shows
how the I/O interface is divided.
Figure 9.5.2: I/O Interface
9.5.2 I/O Controller

The I/O controller communicates with the processor in the same way that
memory does. The processor sends an address, a signal that indicates when the
processor wants to communicate, a signal that specifies if the processor wants to read
data or to write data, and data, if the processor wants to write data. These signals are
passed along wires that have been labeled a (address), as (address strobe), rw
(read/write), and d_in (data in), respectively.
The processor in turn can receive signals from the I/O interface. These signals
include a signal that indicates that the operation is successful and data that the device
sends to the processor if the processor wants to read information. These signals are
passed along the dtack (data acknowledge) and d_out (data out) lines respectively.
This entire unit also uses a clock to synchronize with the monitor and a slow
clock to communicate with any PS/2 devices. The devices also interact with the outside
world by using several lines that end in pins that can connect to the devices. Each
PS/2 device requires data, clock, ground, and power supply lines. The monitor requires
pins that carry red, green, and blue intensities as well as horizontal and vertical refresh
signals. These pins must go through all of the design files in a cascading fashion
because they must be assigned as pins in the main program file that encapsulates all of
the design files.
The I/O controller waits until the processor wants to interact with an I/O device.
Once this event occurs, the I/O controller looks at what device the processor wants to
communicate with, and whether the communication is a read or a write. The device is
selected based on the address that the processor sends. The controller then passes
the appropriate information to the device it has chosen and waits until the
communication has completed. The operation is completed when the address is no
longer valid. Once the communication is completed the selected device becomes
unselected and cannot send any more signals to the processor unless it is selected
again.
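The selection step of the I/O controller can be sketched in Python. The report does not give the device addresses, so the base address, the per-device address span, and the device order below are purely hypothetical; the sketch only shows how the address strobe and address together decide which unit answers, and that no device stays selected once the strobe is released.

```python
from typing import Optional

# Hypothetical memory map: the report does not specify the device addresses.
IO_BASE = 0xFFFF0000
DEVICE_SPAN = 0x10
DEVICES = ["mouse", "ps2", "vga"]


def select_target(address: int, address_strobe: bool) -> Optional[str]:
    """Return which unit should answer a bus cycle, or None if nothing is selected."""
    if not address_strobe:
        return None                      # no valid address: every device is unselected
    if address < IO_BASE:
        return "memory"                  # ordinary memory access
    index = (address - IO_BASE) // DEVICE_SPAN
    return DEVICES[index] if index < len(DEVICES) else None


print(select_target(0xFFFF0004, True))   # mouse
```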
9.5.3 I/O Devices

Unlike the rest of the processor, I/O devices have a fixed number of pins that
they use. All PS/2 devices have six pins in their sockets; PS/2 devices, however, require
only four of these pins, and the rest are unused [9]. A VGA monitor requires five pins,
as mentioned previously. PS/2 devices usually operate at a frequency between 10 and
33 kHz [9], while a monitor must be driven with very specific timing in order to work
correctly.
9.5.3.1 Mouse
A PS/2 mouse communicates with another device through a 6-pin PS/2
connector. The 6 pins on the connector are:
1. Data
2. No connection
3. Ground
4. +5 V
5. Clock
6. No connection
The pin layout for the socket and the plug is shown in Figure 9.5.3.
Figure 9.5.3: Socket and Plug Layout
The mouse sends data in groups of three packets (bytes). The data contains
information concerning the last movement the mouse has made and the states of the
buttons (either pressed or not pressed). The way the information is stored is shown in
Table 9.5.1.
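Bit     7   6   5   4   3   2   1   0
Byte 1  YV  XV  YS  XS  1   0   R   L
Byte 2  X7  X6  X5  X4  X3  X2  X1  X0
Byte 3  Y7  Y6  Y5  Y4  Y3  Y2  Y1  Y0

YV, XV: Y and X overflow flags
XS, YS: X and Y movement sign bits
L, R: left and right button states (1 = pressed)
X0-X7: X movement
Y0-Y7: Y movement
Table 9.5.1: The format of the information that the mouse transmits [1Borys]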
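The following Python sketch decodes one three-byte packet laid out as in Table 9.5.1. Treating XS with X7-X0 (and YS with Y7-Y0) as a 9-bit two's complement movement follows the standard PS/2 mouse protocol and is an assumption here, since the report does not describe how the movement values are interpreted; the function name is hypothetical.

```python
def decode_mouse_packet(byte1: int, byte2: int, byte3: int) -> dict:
    """Unpack one three-byte packet laid out as in Table 9.5.1."""
    if not (byte1 >> 3) & 1:
        raise ValueError("bit 3 of byte 1 must always be 1")
    x_sign = (byte1 >> 4) & 1
    y_sign = (byte1 >> 5) & 1
    return {
        "left": byte1 & 1,
        "right": (byte1 >> 1) & 1,
        # Assumed: XS:X7..X0 and YS:Y7..Y0 form 9-bit two's complement movements
        "dx": byte2 - 256 if x_sign else byte2,
        "dy": byte3 - 256 if y_sign else byte3,
        "x_overflow": (byte1 >> 6) & 1,
        "y_overflow": (byte1 >> 7) & 1,
    }


print(decode_mouse_packet(0x19, 0xFE, 0x05))   # left pressed, dx = -2, dy = 5
```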
The clock frequency that the mouse generates is somewhere between 10 kHz
and 33 kHz. The mouse uses this clock signal to synchronize its communications.
When a mouse transmits a byte, it sends 11 bits:
• A start bit, which is always ‘0’
• Eight data bits
• A parity bit that is ‘1’ if the data contains an even number of ‘1’s (odd parity)
• A stop bit, which is always ‘1’. [10]
The mouse can send data when its state has changed and it detects that both
clock and data are high. The mouse sends the data so that whatever it is connected to
can capture each bit on the falling edge of the clock [9].
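The 11-bit frame described above can be built and checked with the Python sketch below. Sending the data bits least significant bit first is the standard PS/2 order and is an assumption here, as the report does not state it; the function names are hypothetical. Note that, unlike this sketch, the mouse interface described later does not check the parity bit.

```python
def ps2_frame(byte: int) -> list:
    """Build the 11-bit frame for one byte, in transmission order."""
    data = [(byte >> i) & 1 for i in range(8)]   # assumed: least significant bit first
    parity = 1 - sum(data) % 2                   # '1' when the data has an even number of 1s
    return [0] + data + [parity, 1]              # start bit, data, parity, stop bit


def ps2_unframe(bits: list) -> int:
    """Check the framing and parity of an 11-bit frame and return the data byte."""
    if len(bits) != 11 or bits[0] != 0 or bits[10] != 1:
        raise ValueError("bad start or stop bit")
    data = bits[1:9]
    if (sum(data) + bits[9]) % 2 != 1:
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(data))


print(ps2_frame(0x00))   # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
```

Because a byte sent by the computer uses the same format, the same two functions describe both directions of transfer.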
9.5.3.2 Generic PS/2 Port and the Keyboard

The transmission of a byte of data from any PS/2 device is the same as that for
a mouse; the mouse simply sends three bytes of data in sequence each time.
Transmitting to a device, however, requires some extra work. The data and clock pins
on a PS/2 port are bi-directional and open collector. This means that data and clock
are ‘1’ unless either the device or the computer it is connected to pulls them to ‘0’.
The computer can send information to a device by controlling the data and clock lines in
a certain way [10].
For the computer to send a byte to the device, the computer has to:
• Set clock to ‘0’ for at least 60 microseconds
• Set data to ‘0’
• Allow the clock to go back to ‘1’ and allow the device to control the clock
• Transmit the data by changing the values on the data line when the clock
signal is low
• Wait for an acknowledge bit
A byte that a computer sends and a byte that a device sends both have the
same format. That is, a start bit, eight data bits, a parity bit, and a stop bit, as described
above. [10]
A keyboard can use the PS/2 port to communicate with a computer. The
keyboard can respond to commands sent to it, and when a key is pressed and released,
the keyboard sends make and break codes, respectively. A buffer does not have to be
implemented to store past keystrokes because the keyboard has an internal buffer.
The keyboard has this buffer so that it does not lose any information if it is stopped
from transmitting data to the computer [11].
9.5.3.3 Monitor
The monitor displays images by quickly going through each pixel on the screen
and setting it to a certain colour. The monitor does this about 60 times a second. The
pixels are set starting with the top left corner, going to the right, line by line until the
bottom right is reached. The process is then repeated. Each line can be thought of as a
horizontal cycle. The monitor knows what to display because it is sent the red, green,
and blue intensities of each pixel, along with a horizontal synchronization signal that
indicates when a line is finished. There is also a vertical synchronization signal that
indicates when a screen has finished being updated. [4Borys] A simplified waveform is
shown in Figure 9.5.4. The most difficult part of sending information to a monitor is
making sure that everything is sent at the appropriate time.
Figure 9.5.4: A single screen write to a screen with a resolution of 2 pixels by 2 pixels
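The report does not state which resolution or timing the monitor FSM uses. Purely as an illustration, the Python sketch below assumes the standard 640 x 480 at 60 Hz VGA mode (800 pixel clocks per line, 525 lines per frame, active-low sync pulses) and shows how a horizontal and a vertical counter decide when pixels are visible and when each synchronization pulse is asserted. The constant and function names are hypothetical.

```python
# Assumed standard 640 x 480 @ 60 Hz timing (visible, front porch, sync, back porch)
H_VISIBLE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_VISIBLE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_VISIBLE + H_FRONT + H_SYNC + H_BACK     # pixel clocks per line
V_TOTAL = V_VISIBLE + V_FRONT + V_SYNC + V_BACK     # lines per frame


def vga_signals(h: int, v: int) -> tuple:
    """Return (visible, hsync, vsync) for pixel counter h and line counter v.

    hsync and vsync are active low: False means the pulse is being driven.
    """
    visible = h < H_VISIBLE and v < V_VISIBLE
    hsync = not (H_VISIBLE + H_FRONT <= h < H_VISIBLE + H_FRONT + H_SYNC)
    vsync = not (V_VISIBLE + V_FRONT <= v < V_VISIBLE + V_FRONT + V_SYNC)
    return visible, hsync, vsync


print(H_TOTAL, V_TOTAL)   # 800 525
```

At the usual 25.175 MHz pixel clock for this mode, 800 x 525 counts give roughly 60 frames per second, consistent with the refresh rate mentioned above.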
9.5.4 Mouse Interface

The mouse interface is designed so that the mouse can write to a set of registers
whenever it wants. The processor can read these registers whenever it wants (except
when the mouse is writing to them) and is responsible for ensuring that it detects all of
the mouse’s movements. The processor, however, cannot send information to the
mouse. The mouse interface waits for the mouse to send information and then reads
what the mouse sends one bit at a time. The bits are received as part of a set of three
bytes, as specified by the PS/2 interface. Each byte is sent as a start bit, eight data bits,
a parity bit, and a stop bit. The parity bit is not checked to see whether the data has
been sent successfully. Once all three bytes are read in, they are stored in three
registers that can be read by the processor.
To ensure that the processor knows whether the data is new or whether it has
already read the data, one of the twenty-four bits is used as a status flag. One of the
bits that the mouse transmits (bit 3 of byte 1 in Table 9.5.1) is always a one. Once one of
the registers has been read by the processor, that bit is set to 0 in the register. The
design of this circuit is shown in Figure 9.5.5.
Figure 9.5.5: Mouse Interface
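The status-flag scheme can be sketched as a small Python model. It assumes that the always-one bit is bit 3 of the first byte (Table 9.5.1) and that reading any of the three registers clears it; the report's wording leaves open exactly which read clears the flag. The class and method names are hypothetical.

```python
STATUS_BIT = 0x08   # bit 3 of byte 1, which the mouse always sends as '1'


class MouseRegisters:
    """Behavioral sketch of the three registers filled by the mouse interface."""

    def __init__(self) -> None:
        self.regs = [0, 0, 0]

    def mouse_write(self, packet: list) -> None:
        # A complete three-byte packet arrives; byte 1 carries STATUS_BIT = 1.
        self.regs = [b & 0xFF for b in packet]

    def has_new_data(self) -> bool:
        return bool(self.regs[0] & STATUS_BIT)

    def processor_read(self, index: int) -> int:
        value = self.regs[index]
        self.regs[0] &= ~STATUS_BIT & 0xFF   # assumed: any read clears the flag
        return value


regs = MouseRegisters()
regs.mouse_write([0x09, 0x01, 0x02])
if regs.has_new_data():                       # the processor polls the flag
    packet = [regs.processor_read(i) for i in range(3)]
```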
9.5.5 Generic PS/2 Interface

This interface can be used to interact with any PS/2 device. The processor will
use this interface to interact with the keyboard. This interface can also be used to
interact with the mouse and any other PS/2 device. This port works by stopping the
clock on the device until the processor wants to communicate. The device continues
working but cannot send and receive information via its PS/2 port. Once the processor
wants to communicate, the clock on the device is allowed to run, and the processor
either writes to the device or reads from it. Once the operation is completed, the
interface sends any necessary data and a signal that indicates that the operation is
finished. Figure 9.5.6 shows the design of this interface.
Figure 9.5.6: Generic PS/2 Interface
This generic interface is used to communicate with an AT keyboard. XT
keyboards have a different, although similar, method of communication, which can be
implemented at a later time. An advantage of this interface is that a buffer does not
have to be created to store past keystrokes, since keyboards have this feature built into
them. Another advantage is that many of the different commands and signals that can
be used to communicate with the keyboard can be generated using software. This
approach allows for simple hardware and greater flexibility as to what is actually
implemented to communicate with the keyboard.
9.5.6 VGA Interface

The VGA interface allows a user to place text and coloured rectangles, referred
to as pixels, on the screen. The processor, however, cannot read data from the VGA
interface. The graphics screen has a resolution of 64 columns and 30 rows. The
original design of the VGA interface allowed the processor to write to either a text
screen or a graphics screen. As the project continued, the amount of memory that
the interface used had to be reduced, and the interface was modified accordingly.
The interface is organized as a piece of memory and a finite state machine that
outputs the contents of the memory onto the screen. The processor can either write to
the memory or clear the entire memory. The finite state machine takes the contents of
the memory and outputs them along with the proper control signals to the screen. The
processor can put an element at one of 1920 (64*30) places on the screen. To reduce
the amount of memory that is used, there are only 64 elements that the screen can
display: the 26 uppercase letters, the 10 digits, rectangles of 8 different colours, and
20 punctuation and mathematical characters. The 8 colours are black, white, red,
green, blue, yellow, magenta, and cyan. The 20 extra characters are listed within the
curly braces, which are not part of the character set: {,.:;?!()*+-/=<>“’|&^}. The
memory therefore requires a RAM element with 2048 entries, each of which is 6 bits
long. This corresponds to 11 bits on the address line and 6 bits on the data line. Since
the minimum widths that our software allows for these lines are larger, a 7th bit on the
data line is available, and it is used to clear the screen: if this bit is high for any write
to the memory, the memory is reset.
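The memory organization above can be sketched in Python. Placing the 5-bit row in the upper address bits and the 6-bit column in the lower bits is an assumption (it is one natural way to fit 64 x 30 positions into an 11-bit address), as is using code 0 for a blank element after a clear; the names are hypothetical.

```python
COLS, ROWS = 64, 30
ENTRIES = 2048          # 11-bit address space
DATA_MASK = 0x3F        # 6-bit element code (64 possible elements)
CLEAR_BIT = 1 << 6      # 7th data bit: reset the whole memory


def vga_address(row: int, col: int) -> int:
    """Assumed layout: 5-bit row in the upper bits, 6-bit column in the lower bits."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise ValueError("position outside the 64 x 30 screen")
    return (row << 6) | col


class VgaMemory:
    def __init__(self) -> None:
        self.cells = [0] * ENTRIES   # code 0 is assumed to be a blank element

    def write(self, address: int, data: int) -> None:
        if data & CLEAR_BIT:
            self.cells = [0] * ENTRIES       # any write with the 7th bit set clears it
        else:
            self.cells[address & (ENTRIES - 1)] = data & DATA_MASK


print(vga_address(29, 63))   # 1919, the last of the 1920 screen positions
```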
9.5.7 Testing
Each of the components in the I/O interface has been tested. The tests
encompass the following simulations:
• The mouse interface’s actions when a mouse writes to it
• The generic PS/2 interface’s actions when it has to read from and write to a
device
• The entire I/O interface’s responses when the processor communicates with all
of the devices
All of the simulations that were performed show that the interfaces work as
expected. The simulations are presented in Appendix 6.
9.6 Memory/Cache Design
In this section, the design specification for the memory and cache controller is
laid out. The design choices are also explained, as well as how the controller fits
together with the rest of the processor.
9.6.1 Overview
The memory and cache controller is an essential part of a processor. The
controller takes care of memory reads and writes and abstracts the memory circuitry
from the processor. This abstraction is necessary because memory chips are not all
alike; the controller abstracts away their differences, allowing the processor to see a
common interface. To speed up memory accesses, a cache is introduced.
A cache is memory that holds a copy of data stored in main memory, but the
access time required to read from a cache is much smaller than that required to read
from main memory. Since a cache is very fast, it is also very expensive and does not
hold much data compared to the main memory within a computer system. To make
good use of the cache, for example so that it holds frequently used data along with the
most recently accessed data, a set of mapping functions and replacement algorithms
has been developed.
9.6.2 Mapping Function and Replacement Algorithm

The mapping function determines where in the cache a memory page can be stored.
In this project set-associative mapping has been selected, because it can be searched
faster and more cheaply than fully associative mapping (only the pages of one set have to
be compared) and it does not create a lot of overhead when implementing the
replacement algorithm.
A memory system can be thought of as a collection of memory blocks of a fixed
length, also called memory pages. Similarly the cache memory can be thought of in the
same way. The obvious difference is the size of the memory compared to the cache.
The cache size is much smaller than the main memory, but it has to be able to
copy any memory page into one of its own pages. One way of managing the memory
mapping is a set-associative mapping. As an example, consider the two-way set-associative
mapping shown in Figure 9.6.1. The memory size is 16 pages and the cache size is 8
pages, so the cache is organized as 4 sets of 2 pages each. As shown in the figure, each
memory page has a corresponding mapping set (its page number modulo 4), and within
that set it can occupy either page. Therefore, memory pages 0, 4, 8, and 12 can be mapped
into set 0. Similarly, pages 1, 5, 9, and 13 can be mapped into set 1, and so on. When a
new page has to be added to a full cache, one of the pages already in the cache must
be removed. The selection of the precise page to be removed is determined by the
replacement algorithm of a given cache. The cache designed here uses the Least Recently
Used (LRU) algorithm. This algorithm chooses the page in the set that has not been
accessed for the longest period of time and replaces it with the contents of the new
memory page. In the example above the algorithm is easy to implement, since the
controller only has to keep track of which page in the set was accessed last and replace
the other page in the set. The larger the set size, the more complicated the algorithm becomes.
Figure 9.6.1: Mapping Function
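The two-way mapping and LRU replacement of Figure 9.6.1 can be sketched in software as follows. The class and method names are hypothetical, the real controller is hardware rather than Java, and storing the memory page number per cache page stands in for the tag comparison.

```java
import java.util.Arrays;

// Sketch of a two-way set-associative cache with LRU replacement.
// With 8 cache pages there are 4 sets; memory page p maps to set p % 4.
class TwoWayCache {
    private final int numSets;
    private final int[][] tags;       // tags[set][way] = memory page held there, -1 if empty
    private final int[] lastUsedWay;  // per set: the way that was accessed most recently

    TwoWayCache(int cachePages) {
        numSets = cachePages / 2;
        tags = new int[numSets][2];
        for (int[] set : tags) Arrays.fill(set, -1);
        lastUsedWay = new int[numSets];
    }

    int setOf(int memoryPage) {
        return memoryPage % numSets;  // pages 0, 4, 8, 12 -> set 0, and so on
    }

    /** Accesses a memory page; returns true on a hit, false on a miss (page swapped in). */
    boolean access(int memoryPage) {
        int set = setOf(memoryPage);
        for (int way = 0; way < 2; way++) {
            if (tags[set][way] == memoryPage) {
                lastUsedWay[set] = way;
                return true;
            }
        }
        // Miss: fill an empty way first; otherwise the LRU victim in a
        // two-way set is simply the way that was NOT used last.
        int victim = (tags[set][0] == -1) ? 0
                   : (tags[set][1] == -1) ? 1
                   : 1 - lastUsedWay[set];
        tags[set][victim] = memoryPage;
        lastUsedWay[set] = victim;
        return false;
    }
}
```

For example, with `new TwoWayCache(8)`, accessing pages 0, 4, 0, 8 evicts page 4 (page 0 was used more recently), so a following access to 0 hits and an access to 4 misses. A single bit per set is enough for LRU here, which is why the two-way case is cheap.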
9.6.3 Design Schematic

The memory and cache controller module consists of four key sub-modules. The
modules will be connected to the main system bus, main system memory and any I/O
device controller that uses memory mapped I/O as a means of communication with the
processor. A top level view of the memory and cache controller module is shown in
Figure 9.6.2.
The address decode and tag search sub-module will be responsible for properly
decoding the memory address and deciding whether it corresponds to a real memory
address or an I/O device access port. In the latter case, this sub-module will in turn
request the proper behaviour from the memory mapped I/O sub-module. In case of a real
memory access, the control machine will generate a memory read request from cache
memory. The cache memory sub-module will process that request and perform a memory
page swap if necessary. Any read and write commands that deal directly with the main
memory will be handled by that sub-module. In case of a write command, the cache and the
main memory will be updated simultaneously (a write-through policy), so that an entire
page of memory never has to be written back to main system memory during a page swap.
Figure 9.6.2: Memory And Cache Controller module
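The decode step and the write-through policy can be illustrated with a small software model. This is a sketch, not the controller design: the I/O boundary IO_BASE, the class name, and the word-granular cache map (in place of pages) are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical software model of address decode plus write-through caching.
class WriteThroughController {
    static final int IO_BASE = 0xF000;                   // assumed start of the I/O region
    final int[] mainMemory = new int[IO_BASE];
    final Map<Integer, Integer> cache = new HashMap<>(); // address -> cached word
    final Map<Integer, Integer> ioPorts = new HashMap<>();

    int read(int address) {
        if (address >= IO_BASE) {                        // decode: I/O port access
            return ioPorts.getOrDefault(address, 0);
        }
        Integer cached = cache.get(address);
        if (cached != null) return cached;               // cache hit
        int value = mainMemory[address];                 // miss: fetch and keep a copy
        cache.put(address, value);
        return value;
    }

    void write(int address, int value) {
        if (address >= IO_BASE) {
            ioPorts.put(address, value);
            return;
        }
        if (cache.containsKey(address)) {
            cache.put(address, value);                   // keep the cached copy current
        }
        mainMemory[address] = value;                     // write-through: memory always updated
    }

    // Every write has already reached main memory, so eviction needs no write-back.
    void evict(int address) {
        cache.remove(address);
    }
}
```

The trade-off is that every write costs a main-memory access, in exchange for never having to copy a dirty page back during a swap.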
9.7 Hardware Generation Program
The HGP takes in an XML file from the GUI and then generates the HDL for the
specified processor and the XML for the assembler. This section explains the design
methods and issues involved in the creation of the HGP. First an overview of the HGP
operation is given, and then each sub-module is explained in detail.
9.7.1 Overview
The HGP performs the following steps in its execution:

1) Read XML
2) Select Instruction Set Based Components
3) Write VHDL
4) Write XML
In the first step the HGP reads the XML from the GUI to determine what
instructions should be included in the processor, as well as the bus width, number of
registers, and address width. Once the information from the GUI is transferred to the
HGP, the HGP can determine the resources and signals that are required for the
processor, as well as the exact low-level processor parameters. At this
stage, the HGP knows all the details about the processor, all optimizations on the
datapath have been completed, and the VHDL code is ready to be written. The HGP
produces the VHDL code by running through VHDL templates and modifying them to
produce the precise processor. Finally, an XML file indicating which instructions are
included and their format is produced so that the assembler can produce machine-readable
code for the processor. The HGP block overview can be seen in Figure 9.7.1.
9.7.2 Read XML

This part of the HGP is quite simple. It parses the XML file and, for each
instruction it encounters, instantiates a new instance of the specific instruction
class (described below in the "Instruction Set Based Component Selection" section).
The HGP also receives the number of registers and the bus and address widths from the
XML file. The HGP stores these values in an instance of the ProcParameter class
(also described below in the "Instruction Set Based Component Selection" section).
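The parsing half of this step can be sketched with the standard Java DOM parser (javax.xml.parsers) applied to the UIXL layout defined in Section 9.9.1. UixlReader is a hypothetical name; the sketch only collects attribute name/value pairs and command names, whereas the HGP would go on to instantiate an Instruction subclass for each command and pass the attributes to a ProcParameter object.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical reader for a UIXL document (CPU / Attributes / Commands).
class UixlReader {
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<String> commands = new ArrayList<>();

    UixlReader(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Malformed UIXL document", e);
        }
        // Each Attribute holds a Name and a Value, e.g. NumberOfRegisters = 2.
        NodeList attrs = doc.getElementsByTagName("Attribute");
        for (int i = 0; i < attrs.getLength(); i++) {
            Element attr = (Element) attrs.item(i);
            attributes.put(childText(attr, "Name"), childText(attr, "Value"));
        }
        // Each Command names one instruction, e.g. add or sub.
        NodeList cmds = doc.getElementsByTagName("Command");
        for (int i = 0; i < cmds.getLength(); i++) {
            commands.add(cmds.item(i).getTextContent().trim());
        }
    }

    // Text of the first child element with the given tag, trimmed of the
    // whitespace that the indented UIXL layout introduces.
    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

Trimming matters because UIXL places each value on its own indented line, so the raw text content of a Name or Value tag includes surrounding whitespace.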
9.7.3 Instruction Set Based Component Selection

This section of the HGP depends heavily on four types of classes (the Instruction
class and subclasses, the Resource class and subclasses, the Signal class and subclasses,
and the ProcParameter class), and thus the structure and operation of these classes are
detailed before describing how the HGP determines the processor details and
parameters from the user specifications.
Figure 9.7.1: HGP Overview
9.7.3.1 ProcParameter Class

The ProcParameter class stores all the parameters needed for the creation of a
new processor. Its operation will be described in terms of the members and methods
included in the class.
9.7.3.1.1 Members
Table 9.7.1 lists the members of the ProcParameter class along with their utility:
Member | Utility
protected boolean doShiftsExist; | Determines whether shifts are included in the processor
protected int busWidth; | The bus width of the processor
protected int numReg; | The number of registers in the processor
protected int log2NumReg; | log2(numReg)
protected int numInstructions; | The number of instructions available in the processor
protected int opcodeFieldSize; | The number of bits available for the opcode in the instruction word
protected int regFieldSize; | The number of bits available to indicate a register in the instruction word
protected int shamtFieldSize; | The number of bits available for the shift amount in the instruction word
protected int functFieldSize; | The number of bits available for the function field in the instruction word
protected int jumpFieldSize; | The number of bits available for a jump target address in the instruction word
protected int immediateFieldSize; | The number of bits available for an immediate value in the instruction word
protected int wordToByteOffset; | The number of bytes in a word
protected int aluSrcASize_A; | log2(number of paths into input A of the ALU)
protected int aluSrcBSize_A; | log2(number of paths into input B of the ALU)
protected int PCSourceSize_A; | log2(number of paths into the PC)
protected int powerPCSourceSize_A; | Number of paths into the PC
protected int aluOpSize_A; | The number of bits specified to the ALU to indicate an operation
protected int bitWiseAluOpSize_A; | The number of bits used by the bitwise ALU to select among different outputs
protected int functAluOpSize_A; | The number of bits used by the ALU to select among different outputs
Table 9.7.1: ProcParameter Members
9.7.3.1.2 Methods
The ProcParameter class has three types of methods:
1. Methods which allow the busWidth, numReg, numInstructions, and
doShiftsExist members to be set or initialized.
2. Methods which allow the value of each member (except doShiftsExist) to
be viewed.
3. The fixProcParameters method which analyzes the input received through
the set methods and determines a suitable value for all other members.
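The report does not list the rules that fixProcParameters applies, so the following is only one plausible set, sketched to show the kind of derivation involved. The class name ProcParameterSketch is hypothetical, and the instruction layouts (an instruction word as wide as the bus, an immediate format with two register fields, a jump format with opcode plus target) are assumptions.

```java
// Hypothetical sketch of rules fixProcParameters could apply.
class ProcParameterSketch {
    int busWidth, numReg, numInstructions;           // set through the set methods
    int log2NumReg, regFieldSize, opcodeFieldSize;   // derived below
    int immediateFieldSize, jumpFieldSize, wordToByteOffset;

    // Smallest k such that 2^k >= n, for n >= 1.
    static int ceilLog2(int n) {
        return 32 - Integer.numberOfLeadingZeros(n - 1);
    }

    void fixProcParameters() {
        log2NumReg = ceilLog2(numReg);
        regFieldSize = log2NumReg;                   // one field selects one register
        opcodeFieldSize = ceilLog2(numInstructions); // enough codes for every instruction
        // Assumed immediate format: opcode | source register | destination register | immediate.
        immediateFieldSize = busWidth - opcodeFieldSize - 2 * regFieldSize;
        // Assumed jump format: opcode | jump target.
        jumpFieldSize = busWidth - opcodeFieldSize;
        wordToByteOffset = busWidth / 8;             // bytes per word
    }
}
```

With a 32-bit bus, 32 registers and 20 instructions, these rules give 5-bit register and opcode fields, a 17-bit immediate, a 27-bit jump field and 4 bytes per word.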
9.7.3.2 Resource Class Hierarchy

The Resource class hierarchy includes a class for every resource available in a
full processor (seen as block diagrams in the discussion of the processor above).
The Resource class by itself is defined abstract so that it cannot be instantiated
alone; all other Resources must extend it. The Resource class has one member, a
name, which is a String, and one method, getName, to retrieve the name.
The classes that extend the Resource class such as the RegisterFile class will
initialize the name member in their constructors. Some Resources such as the ALU are
also defined abstract since they include other resources within them. Please see
Figure 9.7.2 for the Resource hierarchy (for clarity, not all Resources are shown).
Figure 9.7.2: Resource Hierarchy
9.7.3.3 Signal Class Hierarchy

The Signal class hierarchy includes a class for every data bus in the datapath
whose data may come from two or more places. Examples from the processor
datapath discussion include all locations where a multiplexor is used.
The Signal class by itself is defined abstract so that it cannot be instantiated
alone; all other Signals must extend it. The Signal class has one static member (a
member which is shared among all instances of a class) that records the number of
Signals instantiated. The count is incremented in the constructor. The Signal class
also has a method to retrieve the value of the count.
The classes that extend the Signal class such as the FullAdderInputSignal class
contain another static count to determine how many instances of the particular class
have been instantiated. Please see Figure 9.7.3 for the Signal hierarchy.
Figure 9.7.3: Signal Hierarchy
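The two levels of counting described above (one static count shared by all Signals, plus one static count per subclass) can be sketched in Java as follows. The method names getSignalCount and getCount are hypothetical; only FullAdderInputSignal and ALUOutputSignal are names taken from this report.

```java
// Sketch of the two-level instance counting in the Signal hierarchy.
abstract class Signal {
    private static int signalCount = 0;      // one counter shared by all Signals
    protected Signal() { signalCount++; }
    static int getSignalCount() { return signalCount; }
}

class FullAdderInputSignal extends Signal {
    private static int count = 0;            // counts only FullAdderInputSignals
    FullAdderInputSignal() { count++; }      // Signal() runs first and bumps the total
    static int getCount() { return count; }
}

class ALUOutputSignal extends Signal {
    private static int count = 0;            // counts only ALUOutputSignals
    ALUOutputSignal() { count++; }
    static int getCount() { return count; }
}
```

Creating two FullAdderInputSignal objects and one ALUOutputSignal leaves the shared total at 3, while each subclass reports only its own instances (2 and 1).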
9.7.3.4 Instruction Class Hierarchy

The Instruction class Hierarchy includes a class for all instructions
implementable in the customizable processor. The Instruction class by itself is defined
abstract so that it cannot be instantiated alone; all other Instructions must extend it.
The Instruction class has two static members, a Vector (dynamic array in Java) of
Signals and a Vector of Resources. Furthermore, the Instruction class has two other
members, a name and an opcode. The Instruction class has three methods: two to
retrieve the signal and resource Vectors, and a resourceAlreadyThere(String name)
method, which searches the resource Vector for a Resource with a particular name
and returns whether it is already in the Vector.
The classes that extend the Instruction class are themselves defined abstract.
The second level of hierarchy in the Instruction class Hierarchy contains classes for
different types of instructions such as Immediates or Memory Instructions. These
second level classes do not contain any additional members or methods, but they have
extra code within their constructors which will be explained below.
The third level of classes includes all the individual instructions. These classes
also do not have any extra members or methods, but again have extra code within their
constructors. Please see Figure 9.7.4 for some representative code found in the
constructor of a third level class in the Instruction Hierarchy (more specifically the div
(divide) instruction).
public class div extends RType {
    public div() {
        super();  // RType constructor handles the resources common to all R-type instructions
        name = "div";
        opcode = 5;
        // Resources: the static Vector of Resources in Instruction (name assumed)
        if (!Instruction.resourceAlreadyThere("RegLo")) {
            Resources.addElement(new RegLo());
            Signals.addElement(new ALUOutputSignal());
        }
        if (!Instruction.resourceAlreadyThere("RegHi")) {
            Resources.addElement(new RegHi());
            Signals.addElement(new ALUOutputSignal());
        }
        // The divide unit itself, plus the Hi and Lo input signals it introduces
        if (!Instruction.resourceAlreadyThere("Divide")) {
            Resources.addElement(new Divide());
            Signals.addElement(new HiInputSignal());
            Signals.addElement(new LoInputSignal());
        }
    }
}
Figure 9.7.4: div Class
The code within the constructor first calls the super constructor, then sets the
name of the Instruction as well as the opcode. Then by using the static method
resourceAlreadyThere in the Instruction class, the divide constructor searches for
resources that the divide function needs. If the resource has not already been created
by another instruction, the resource is instantiated and added to the Vector.
Furthermore, the Signals needed by the addition of that resource are also instantiated.
The divide constructor does this for every resource it needs.
One important point is that every third level class first calls the constructor of the
second level class. It was noted before that there is extra code within the second level
constructors as well. This code is very similar to the code seen in Figure 9.7.4, but it
allows for code reuse. For example, an “and immediate” class will only contain code in
its constructor to search for resources that it needs in addition to the resources that all
immediate functions need. The resources that are needed by all immediate functions
would be searched for in the constructor of the Immediate class which is a superclass
of the andi class. Please see Figure 9.7.5 for a subset of the Instruction class
Hierarchy.
Figure 9.7.5: Instruction Class Hierarchy
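The resource-sharing pattern of Figures 9.7.4 and 9.7.5 can be reduced to a self-contained Java sketch. The concrete resource classes (RegisterFile, SignExtend, BitwiseUnit, Adder), the static Vector name resources, the accessor name getResources, and the opcode values are placeholders chosen for illustration; Signals are left out to keep the example short.

```java
import java.util.Vector;

// Simplified sketch of the Resource and Instruction hierarchies.
abstract class Resource {
    protected String name;
    String getName() { return name; }
}
class RegisterFile extends Resource { RegisterFile() { name = "RegisterFile"; } }
class SignExtend extends Resource { SignExtend() { name = "SignExtend"; } }
class BitwiseUnit extends Resource { BitwiseUnit() { name = "BitwiseUnit"; } }
class Adder extends Resource { Adder() { name = "Adder"; } }

abstract class Instruction {
    // Static, so every instruction shares one list and each resource is created once.
    protected static final Vector<Resource> resources = new Vector<>();
    protected String name;
    protected int opcode;

    static Vector<Resource> getResources() { return resources; }

    static boolean resourceAlreadyThere(String name) {
        for (Resource r : resources) {
            if (r.getName().equals(name)) return true;
        }
        return false;
    }
}

// Second level: resources that every immediate instruction needs.
abstract class Immediate extends Instruction {
    Immediate() {
        if (!resourceAlreadyThere("RegisterFile")) resources.addElement(new RegisterFile());
        if (!resourceAlreadyThere("SignExtend")) resources.addElement(new SignExtend());
    }
}

// Third level: only the extra resources of each individual instruction.
class andi extends Immediate {
    andi() {
        super();
        name = "andi";
        opcode = 12;  // placeholder value
        if (!resourceAlreadyThere("BitwiseUnit")) resources.addElement(new BitwiseUnit());
    }
}

class addi extends Immediate {
    addi() {
        super();
        name = "addi";
        opcode = 8;   // placeholder value
        if (!resourceAlreadyThere("Adder")) resources.addElement(new Adder());
    }
}
```

Instantiating andi, addi and then andi again leaves exactly four resources in the shared Vector, because the constructors of the second andi find everything already present.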
9.7.3.5 Processor Specification Determination

To determine the processor details and parameters given the user
specifications, the HGP need only call the fixProcParameters method in the
ProcParameter class. Due to the way the Instruction class and its subclasses were
organized, the needed resources and signal contention areas for the processor were
already determined when the instructions were read from the XML file and instantiated.
This leads to modular, flexible, and reusable code. If further instructions need to be
added to the available instruction set in the future, no existing classes must be
modified; only new classes defining the particular instruction, and any new resources
and signals it needs, must be created.
9.7.4 Write VHDL

In order to create a complete circuit the HGP will use the Signals, Resources,
and ProcParameter classes to create a script. The script will then be processed using
information stored in a database and predefined templates. The database will contain
all the information necessary to properly create every state of the control finite state
machine: each state needed by any given instruction, component declarations, design
fragments, etc. The templates will act as a reference for the script to be able to create
the needed datapath. Both the database and the templates will be based on VHDL
code that will be extended by script commands to make it reusable. Once the
processing is completed a new control circuit will be generated.
The two steps, script creation and script processing, will be explained in the
following sections.
9.7.4.1 Script Creation
The script is created by the HGP in order to facilitate parameterization and
optimization features of the processor design tool. This script is based on a template
that is specific to the module being created, for example the datapath module. A
general description of the datapath is already included within a script. This script is
edited by the HGP in order to signal what options, resources, and signals are to be
used with the new processor.
The editing step of the script creation process requires the template script to
support features required by the processor. This means that the template should have
all components marked in such a way as to be recognized by the script processor. The
marking of the script is done with the help of ifdef statements (table 9.7.2). For
example, the multiply unit in the ALU is marked with a MULTIPLY flag. This flag
signifies that any processor design requiring the multiply unit should have the
MULTIPLY flag set.
The HGP sets these flags when editing the script. Based on the set
of resources and signals, the HGP is able to determine which flags need to be set and
places the corresponding information within a copy of the template script. This copy is
what will be used by the script processor in order to generate a new HDL description of
a processor.
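How a script processor might honour such flags can be sketched as a simple line filter. The report does not give the exact marking syntax, so the "ifdef FLAG" / "endif" line form, the class name IfdefFilter, and the nesting rule (an inner block is kept only if its enclosing block is kept) are assumptions.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

// Sketch of flag-controlled template filtering with assumed "ifdef"/"endif" lines.
class IfdefFilter {
    static List<String> filter(List<String> template, Set<String> flags) {
        List<String> out = new ArrayList<>();
        Deque<Boolean> active = new ArrayDeque<>();
        active.push(true);                              // top-level lines are always kept
        for (String line : template) {
            String t = line.trim();
            if (t.startsWith("ifdef ")) {
                String flag = t.substring("ifdef ".length()).trim();
                // A nested block is kept only if its parent is kept and its flag is set.
                active.push(active.peek() && flags.contains(flag));
            } else if (t.equals("endif")) {
                if (active.size() == 1) throw new IllegalArgumentException("endif without ifdef");
                active.pop();
            } else if (active.peek()) {
                out.add(line);
            }
        }
        if (active.size() != 1) throw new IllegalArgumentException("unterminated ifdef");
        return out;
    }
}
```

If the HGP's copy of the template sets MULTIPLY, the lines between "ifdef MULTIPLY" and "endif" survive; otherwise the multiply unit is dropped from the generated VHDL.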
9.7.4.2 Script Processing
The script processor is a Java program, which will read a set of specifications
from a file and then using its database compile a new hardware description in VHDL.
The set of specifications is written in a script file as described above. The script
processor uses the database that has been created using script commands, thus being
able to take a design template and change it into a circuit specific to the user’s needs.
The following sections describe how the script processor has been designed
and the features provided by the scripts.
9.7.4.2.1 Script Processing Ideology

The idea behind the script is to create an intermediate step between the user
and the VHDL source code. In order to create a complete circuit a user specifies
processor parameters. These parameters are then used by the HGP to create a script
with the corresponding information. The script is then processed using information
stored in its database with respect to a predefined template. The database will contain
all information necessary to properly create each and every component of the new
processor. A template acts as a reference in order for the script to be able to identify
components that the HGP refers to when parameterizing and optimizing the processor.
Both the database and the template are written in VHDL; however, they are extended by
script commands to make them reusable.
The advantage of the script approach is that once the program to process the
scripts has been developed, different scripts for different types of processors can be
written, thus allowing the software to grow by simply extending the libraries. Also this
will allow a user to modify a script to make an optimization if it is necessary. The
disadvantage of this approach is that most of the code will be in the scripts which
means some of the VHDL code will be duplicated.
9.7.4.2.2 Script Commands

The commands available through the script are listed in Table 9.7.2 along with
the rationale for their existence. Table 9.7.3 lists a set of script keywords that should
only be used in their specified context.
Command | Parameters | Purpose
define | name value | Allows declaration of a variable that can be used in a loop or as a parameter.
undefine | name | Removes a declaration from the list. It is needed in order to ensure that local declarations do not interfere with global variables. Additionally, it avoids having to create a full set of unique variable names for local segments.
label | name | Enables the script to recognize a subsection of the database. The program can look for the label and process only the data within the label scope.
end | name | Ends a label scope.
load | filename label | Loads contents from a database “filename” under a label “label.” The contents of the label scope are not executed, just copied.
execute | filename label | Executes the contents of the label scope “label” in the database file “filename.” The commands within this scope are executed as though they applied to the caller of this command.
instructions | instruction1 instruction2 ... instructionN EOI | Specifies the set of instructions that are to be used by the processor. The list is terminated by the keyword EOI.
generate | param | The parameter “param” has to be one of the following: STATES, to generate a set of states for the circuit to follow, based on the instruction set, and declare them in VHDL; FLOW, to generate the states and their transition information based on the instruction set.
for | variable start end step | Creates a set of VHDL commands for the code provided in the for loop scope. For each iteration of the loop the “variable” changes value by step.
close | (none) | Closes the for loop scope.
Table 9.7.2: Script command listing
Keyword | Purpose
EOI | Specifies the End Of Instruction set in the instructions command
BUSWIDTH | Specifies the size of the bus to be used
ADDRESSWIDTH | Specifies the address space for the processor
PAGESIZE | Specifies the size of a memory page
REGISTERCOUNT | Specifies the number of registers to be used
Table 9.7.3: Script Keyword List
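The for command in Table 9.7.2 amounts to textual unrolling. A minimal sketch follows; the class name ForExpander is hypothetical, and the report does not specify several details, so this sketch assumes that start, end and step are integer literals, that the end value is inclusive, that step is positive, and that the loop variable is substituted wherever it appears as a whole word in the loop body.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical expansion of a "for variable start end step ... close" block.
class ForExpander {
    static List<String> expand(String header, List<String> body) {
        String[] p = header.trim().split("\\s+");    // for variable start end step
        if (p.length != 5 || !p[0].equals("for")) {
            throw new IllegalArgumentException("Malformed for command: " + header);
        }
        String var = p[1];
        int start = Integer.parseInt(p[2]);
        int end = Integer.parseInt(p[3]);
        int step = Integer.parseInt(p[4]);
        if (step <= 0) throw new IllegalArgumentException("step must be positive");

        // Match the variable only as a whole word, so "i" does not hit "signal".
        Pattern word = Pattern.compile("\\b" + Pattern.quote(var) + "\\b");
        List<String> out = new ArrayList<>();
        for (int v = start; v <= end; v += step) {
            String value = Matcher.quoteReplacement(Integer.toString(v));
            for (String line : body) {
                out.add(word.matcher(line).replaceAll(value));
            }
        }
        return out;
    }
}
```

Expanding "for i 0 2 1" over the body line "reg(i) <= d(i);" yields three lines, one each for indices 0, 1 and 2.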
9.7.4.2.3 Expression Processing

A further feature of the script processor is its ability to evaluate expressions.
The reason for this addition was to support future features such as loops and more
complex conditional statements. Another reason was to enable the
script processor not only to use script variables as macros to be placed within the VHDL
code, but also to manipulate them.
The main goal achieved by adding this feature is the ability to
handle more complex operations within the script, making the script processor
more than just a variable substitution tool. The expressions handled by the
script processor are all of integer or string type. The integer type allows the results of
an expression to be used for width evaluation, counting, and reusing VHDL-defined
generic variables. The string type gives the script processor a little more
flexibility: an expression can also be treated as a parameter to an instantiated VHDL
module. This feature is very useful, because it allows the script processor
to change connections between different modules, or eliminate them completely, in order
to optimize the design.
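Integer expression evaluation of this kind is commonly done with a small recursive-descent parser. The sketch below is one such evaluator, not the project's implementation: the class name ExprEval, the operator set (+, -, *, /, unary minus, parentheses), and the use of a map for variables created with define are assumptions, and string-typed expressions are not covered.

```java
import java.util.Map;

// Minimal recursive-descent evaluator for integer script expressions
// such as "BUSWIDTH-1" or "(REGISTERCOUNT*2)/4".
class ExprEval {
    private final String s;
    private final Map<String, Integer> vars;
    private int pos = 0;

    private ExprEval(String s, Map<String, Integer> vars) {
        this.s = s.replaceAll("\\s+", "");
        this.vars = vars;
    }

    static int eval(String expr, Map<String, Integer> vars) {
        ExprEval e = new ExprEval(expr, vars);
        int v = e.expr();
        if (e.pos != e.s.length()) throw new IllegalArgumentException("Unexpected input at " + e.pos);
        return v;
    }

    private int expr() {                  // expr := term (('+' | '-') term)*
        int v = term();
        while (pos < s.length() && (s.charAt(pos) == '+' || s.charAt(pos) == '-')) {
            char op = s.charAt(pos++);
            int rhs = term();
            v = (op == '+') ? v + rhs : v - rhs;
        }
        return v;
    }

    private int term() {                  // term := factor (('*' | '/') factor)*
        int v = factor();
        while (pos < s.length() && (s.charAt(pos) == '*' || s.charAt(pos) == '/')) {
            char op = s.charAt(pos++);
            int rhs = factor();
            v = (op == '*') ? v * rhs : v / rhs;
        }
        return v;
    }

    private int factor() {                // factor := number | name | '(' expr ')' | '-' factor
        if (pos >= s.length()) throw new IllegalArgumentException("Unexpected end of expression");
        char c = s.charAt(pos);
        if (c == '-') { pos++; return -factor(); }
        if (c == '(') {
            pos++;
            int v = expr();
            if (pos >= s.length() || s.charAt(pos) != ')') throw new IllegalArgumentException("Missing )");
            pos++;
            return v;
        }
        int start = pos;
        if (Character.isDigit(c)) {
            while (pos < s.length() && Character.isDigit(s.charAt(pos))) pos++;
            return Integer.parseInt(s.substring(start, pos));
        }
        while (pos < s.length() && (Character.isLetterOrDigit(s.charAt(pos)) || s.charAt(pos) == '_')) pos++;
        if (start == pos) throw new IllegalArgumentException("Unexpected '" + c + "'");
        String name = s.substring(start, pos);
        Integer v = vars.get(name);
        if (v == null) throw new IllegalArgumentException("Undefined variable " + name);
        return v;
    }
}
```

With BUSWIDTH defined as 32, "BUSWIDTH-1" evaluates to 31, the kind of value needed for a VHDL range such as (31 downto 0). Multiplication binds tighter than addition because term() sits below expr() in the grammar.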
9.7.4.2.4 Testing
The list of currently available commands in Table 9.7.2 has been fully tested. We
were able to use the script to develop many different processors without any problems.
Some of the templates used by the design tool have been modified to include script
commands. Generating proper VHDL code using the script enhanced code has been
successful.
9.7.5 Write XML

The very last task that the HGP has to perform is the creation of an XML file that
specifies the types of instructions that the generated processor can perform. The
assembler subsequently generates machine-readable code from a program written in
assembly language based upon this XML file. The format of the XML
file is described in Section 9.9. The XML file consists of a list of register descriptions, a
list of instruction descriptions, and a description of the types of instructions that the
processor can execute.
The information in the XML file comes from two sources. The user provides the
number of registers, the size of the instructions, and the list of instructions through the GUI.
The HGP generates all of the details of the implementation, such as the size of all of
the fields of the instruction (e.g. opcode, destination register, and immediate value
fields) and the way that the processor interprets different instruction types such as
memory instructions and register-only instructions. All of the required information
can be gathered from the ProcParameter class and a list of Instruction objects based
upon the Instruction Class Hierarchy.
9.8 Assembler
The main function of any assembler is to convert assembly language into
machine code for a target processor. The target processor in this case is generated
according to user specifications. Therefore the assembler is dynamic and it can deal
with several different processors and a large number of permutations of assembly
instructions. The assembler also reports meaningful error messages when it encounters
errors in the assembly code.
9.8.1 Design
The assembler design requires that the assembler be able to handle many
different instruction sets. To accommodate this requirement, the HGP must output all
the assembly instruction details. The assembler proceeds by reading in these details
and forming assembly rules. The second phase of the assembler applies the assembly
rules to a user created assembly file.
9.8.1.1 Interface Design

The interface design for the assembler is command line only. At a later point a
GUI shell could be built around the command line functionality. The first command line
argument to the assembler indicates the assembly language file the assembler is going
to convert. The Parameterized Processor Design Suite convention dictates that this file
should have the extension .paf. The assembler writes to a file with the same name as the
assembly language file, except that the extension is replaced with .out.
For instance, if the user wishes to assemble an assembly file called program.paf
then they would issue the following command:
assembler program.paf
This would generate a file called program.out, which would contain the machine code.
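The naming rule has one edge case worth handling: a dot in a directory name is not an extension. A small sketch follows; OutputName and outputFileFor are hypothetical names, and appending .out to an input that has no extension at all is an assumption, since the report does not cover that case.

```java
// Sketch of deriving the output file name (program.paf -> program.out).
class OutputName {
    static String outputFileFor(String inputPath) {
        // Only a dot after the last path separator starts an extension.
        int sep = Math.max(inputPath.lastIndexOf('/'), inputPath.lastIndexOf('\\'));
        int dot = inputPath.lastIndexOf('.');
        String base = (dot > sep) ? inputPath.substring(0, dot) : inputPath;
        return base + ".out";
    }
}
```

For instance, "src.v2/program" becomes "src.v2/program.out" rather than "src.out".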
9.8.1.2 Error Handling

The assembler must generate errors if the assembly code is badly formed.
When the assembler encounters an error it enters an error recovery routine. Because it
is useful to identify more than one error when parsing code, the assembler does not
quit when it finds the first error. Instead its error recovery mechanism reports the line
number and the cause of the error and continues parsing the file without generating
code for the current line. This error recovery technique will only generate further errors
based on the original error if a label was not recognized because of the original error.
This method provides the user with more information about the correctness of their
assembly code.
9.8.2 Implementation
The implementation of the assembler design was done using Java. The
assembler itself can be partitioned into two phases. The first phase is the construction
of the assembler and the second phase is the application of the assembler onto the
assembly code.
In the first phase the assembler is constructed to fulfill the needs of a target
processor and assembly code. All the information the assembler needs is located in an
Assembler XML Language (AXL) compliant file. The details of AXL are provided in
section 9.9.2. The AXL file contains information on the instructions the assembler
recognizes and the code it will output. Internally each assembler instruction is stored in
an object that is placed in a hash table for quick lookup.
The second phase of the assembler applies the rules stored in the AXL file. In
this phase the assembler parses the user created assembly file. The user assembly
language file is read one line at a time. An outline of the parsing procedure for each
line is shown below:
1. Line is read in
2. Comments, denoted by a #, are removed
3. Labels, denoted by a :, are stored in a symbol table and are removed
4. The assembly instruction is looked up in the hash table generated when
reading the AXL file
5. Parameter 1 is evaluated
6. Parameter 2 is evaluated
7. Parameter 3 is evaluated
8. Finished machine code is generated
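Steps 2 and 3, together with splitting the remainder into the mnemonic and its parameters, can be sketched as follows. LineParser is a hypothetical name, and storing each label with the address of the instruction on its line is an assumption; the report says only that labels go into a symbol table.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of parsing steps 2 and 3, plus splitting the remainder
// into the mnemonic (step 4) and its parameters (steps 5 to 7).
class LineParser {
    // Assumption: a label maps to the address of the instruction on its line.
    final Map<String, Integer> symbolTable = new HashMap<>();

    List<String> parse(String line, int address) {
        int hash = line.indexOf('#');                 // step 2: drop the comment
        if (hash >= 0) line = line.substring(0, hash);
        int colon = line.indexOf(':');                // step 3: record and drop the label
        if (colon >= 0) {
            symbolTable.put(line.substring(0, colon).trim(), address);
            line = line.substring(colon + 1);
        }
        List<String> tokens = new ArrayList<>();
        line = line.trim();
        if (line.isEmpty()) return tokens;            // blank or label/comment-only line
        String[] parts = line.split("\\s+", 2);       // mnemonic, then the parameter list
        tokens.add(parts[0]);
        if (parts.length > 1) {
            for (String param : parts[1].split(",")) tokens.add(param.trim());
        }
        return tokens;
    }
}
```

Removing the comment before looking for the label matters: a colon inside a comment would otherwise be mistaken for a label.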
Table 9.8.1 summarizes the following example of how the parsing procedure
may be applied to a sample assembly instruction. Line one, in the table, indicates the
assembler has read in the following line:
start: addi $1, $2, 15 # reg1 = reg2 + #15.
This assembly code is based on the MIPS instruction set. The line contains a
label, “start”, and a comment, “reg1 = reg2 + #15”. The command used is add
immediate, “addi”, which adds a register and an immediate value and stores the result
in another register. In this particular case, the parameters to the addition are register 2,
denoted by $2, and the number 15. The destination is register 1, denoted by $1.
Line two of the table applies the parsing procedure to remove the comment.
Thus, in the assembly code column of line two, the comment “reg1 = reg2 + #15”, is
removed.
The parsing procedure next stores the label in a symbol table for future
reference. This is shown in line three of the table. The assembly code that is displayed
in line three of the table is what remains after both the comments and the label have
been processed.
The next step of the parsing procedure processes the assembly instruction,
“addi”. “addi” is looked up in the AXL file and the code associated with the instruction is
retrieved. In this example the code associated with “addi” is
“00100000000000000000000000000000”. Line four of the table shows the remaining
assembly code to be processed at this stage, in the assembly code column, and the
code that has been generated so far, in the action column.
Lines five through seven of the table process each of the parameters of the
assembly instruction. Each parameter has code generated for it. This code is inserted
into the code already retrieved for the instruction. Comparing each entry in the Action
column with the entry above it shows where the new code for each parameter is inserted.
The AXL file contains information regarding the position and value of the code that is generated.
The final step of the parsing procedure is to output the generated code.
This sample assumes that the AXL file contained an entry for addi, $1 and $2
and code associated with these entries.
Step | Assembly code being processed | Action
1 | start: addi $1, $2, 15 # reg1 = reg2 + #15 |
2 | start: addi $1, $2, 15 | Removed comment
3 | addi $1, $2, 15 | Add ‘start’ to symbol table
4 | $1, $2, 15 | 00100000000000000000000000000000
5 | $2, 15 | 00100000000000010000000000000000
6 | 15 | 00100000010000010000000000000000
7 | | 00100000010000010000000000001111
8 | | Output: 00100000010000010000000000001111
Table 9.8.1 Application of parsing procedure on a sample assembly instruction.
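The bit manipulation behind steps 4 to 7 can be reproduced in a few lines. In the real assembler the field positions come from the AXL file; this sketch hard-codes the MIPS I-type positions (destination register in bits 16 to 20, source register in bits 21 to 25, immediate in bits 0 to 15, counting from the least significant bit) as an assumption, and FieldEncoder is a hypothetical name.

```java
// Sketch of inserting parameter codes into an instruction word.
class FieldEncoder {
    // Replace bits [lsb, lsb + width) of word with value (requires width < 32).
    static int insert(int word, int value, int lsb, int width) {
        int mask = ((1 << width) - 1) << lsb;
        return (word & ~mask) | ((value << lsb) & mask);
    }

    // 32-character binary string, as in the Action column of Table 9.8.1.
    static String toBits(int word) {
        return String.format("%32s", Integer.toBinaryString(word)).replace(' ', '0');
    }

    public static void main(String[] args) {
        int word = 0b001000 << 26;         // step 4: base code retrieved for addi
        word = insert(word, 1, 16, 5);     // step 5: $1, destination register field
        word = insert(word, 2, 21, 5);     // step 6: $2, source register field
        word = insert(word, 15, 0, 16);    // step 7: immediate value 15
        System.out.println(toBits(word));  // prints 00100000010000010000000000001111
    }
}
```

Each intermediate value of word matches the corresponding row of Table 9.8.1.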
Internally the assembler must deal with many different circumstances that lead to
errors. When an error occurs, the assembler has to stop generating code for the current
line, report the error and reset its state to begin parsing the next line. To best
accomplish this, the assembler throws a custom exception each time an error is
encountered. A class called AssemblerException that extends java.lang.Exception was
written to represent the exception. This allows all errors to be reported in a uniform
manner because they are all generated from the same class of objects. It also enables
the assembler to quickly reset its state for the next instruction.
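The recovery strategy (report the error, emit no code for the line, carry on) maps naturally onto a try/catch around each line. In this sketch the lineNumber field of AssemblerException, the RecoveringAssembler class, and the placeholder assembleLine method that accepts only "nop" are all hypothetical; only the idea of an AssemblerException extending java.lang.Exception comes from the text.

```java
import java.util.ArrayList;
import java.util.List;

// AssemblerException as described in the text; the line-number field is an assumption.
class AssemblerException extends Exception {
    final int lineNumber;

    AssemblerException(int lineNumber, String message) {
        super("line " + lineNumber + ": " + message);
        this.lineNumber = lineNumber;
    }
}

// Hypothetical driver showing the recovery strategy: report, skip the line, continue.
class RecoveringAssembler {
    final List<String> machineCode = new ArrayList<>();
    final List<String> errors = new ArrayList<>();

    // Placeholder for the real per-line translation; it accepts only "nop" here.
    String assembleLine(int lineNumber, String line) throws AssemblerException {
        if (!line.trim().equals("nop")) {
            throw new AssemblerException(lineNumber, "unknown instruction '" + line.trim() + "'");
        }
        return "00000000000000000000000000000000";
    }

    void assemble(List<String> lines) {
        for (int i = 0; i < lines.size(); i++) {
            try {
                machineCode.add(assembleLine(i + 1, lines.get(i)));
            } catch (AssemblerException e) {
                errors.add(e.getMessage());   // no code emitted for this line
            }
        }
    }
}
```

Because the exception unwinds everything done for the current line, any partial state for that line is discarded automatically and the loop simply moves on.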
9.8.3 Testing
Testing of the assembler has proved to be difficult because of the number of
combinations of inputs. Currently test cases have been developed to test for all errors
the assembler reports. Tests have also been performed to account for erroneous data
read by the assembler. Interface testing between the Assembler and the HGP was
eased because interfaces between the sections of the project were decided early on.

The design and implementation of the assembler are complete, and testing of the
assembler has produced a stable product.
9.9 Interfaces
One of the most important parts of any project is to clearly define interfaces
between sections early on. This way sections are written against interfaces without
worrying about the underlying code. This also allows test cases to be written before the
sections are complete. Interfaces can be thought of as a contract that the different
sections must fulfill. Each component in our design interfaces with the next component
using ASCII text files. In order to write clear interfaces we chose to develop XML
languages that these files must conform to.
9.9.1 User Information XML Language

The User Information XML Language (UIXL) is a standard language defined for
this project. It facilitates the transfer of information between the GUI and the HGP. An
example of the layout of the UIXL language is shown in Figure 9.9.1.
9.9.1.1 Design
UIXL was designed to be human readable. It conveys the parameters that the user
selected in the GUI, so anyone looking at a UIXL document can see what the user has
selected. Attribute tags convey information as name-value pairs, while Command tags
specify which commands the processor should be able to process.
The structure of UIXL is fixed but its content is not, which allows attribute and
command tags to be added in the future. The program that reads a UIXL file checks
whether an attribute or command is present and acts accordingly: if a required attribute
or command is not set, the program can either use a default value or raise an error.
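The default-or-error behaviour described above might look like the following minimal sketch. It assumes the standard JAXP DOM parser that ships with the JDK; the class and method names (UixlAttributes, getInt, requireInt) are hypothetical and are not taken from the HGP source.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads UIXL Attribute name/value pairs and applies defaults.
class UixlAttributes {
    private final Map<String, String> values = new HashMap<>();

    UixlAttributes(String uixl) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(uixl.getBytes(StandardCharsets.UTF_8)));
            NodeList attrs = doc.getElementsByTagName("Attribute");
            for (int i = 0; i < attrs.getLength(); i++) {
                Element a = (Element) attrs.item(i);
                values.put(childText(a, "Name"), childText(a, "Value"));
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("not a readable UIXL document", e);
        }
    }

    // Text of a required child tag; Figure 9.9.1 pads it with newlines, so trim it.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        if (list.getLength() == 0)
            throw new IllegalArgumentException("Attribute without a " + tag + " tag");
        return list.item(0).getTextContent().trim();
    }

    // Optional attribute: fall back to a default value when it is absent.
    int getInt(String name, int defaultValue) {
        String v = values.get(name);
        return v == null ? defaultValue : Integer.parseInt(v);
    }

    // Required attribute: raise an error when it is absent.
    int requireInt(String name) {
        String v = values.get(name);
        if (v == null) throw new IllegalStateException("missing required attribute " + name);
        return Integer.parseInt(v);
    }
}
```

Because the reader only asks whether a name is present, new attribute tags can appear in a UIXL file without breaking older readers, which is the point of keeping the content open-ended.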
9.9.1.2 Specification
A tree diagram of the UIXL specification is shown in Figure 9.9.2. To conform to
UIXL, a file must contain a CPU tag as the parent of all other tags. The CPU tag may
optionally have an Attributes tag and a Commands tag as children.
The Attributes tag may contain zero or more Attribute tags. Each Attribute tag
must contain a Name tag and a Value tag. For example, the first Attribute tag in Figure
9.9.1 has a Name tag containing NumberOfRegisters and a Value tag containing 2.
The Commands tag may contain zero or more Command tags. Each Command tag
must contain a text value. For example, in Figure 9.9.1 one Command tag contains add
and another contains sub.
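On the GUI side, a document that follows this tree could be produced with the JDK's DOM and transformer APIs, as in the sketch below. The class name UixlWriter and its write method are hypothetical; the GUI's actual writer is not shown in this report.

```java
import java.io.StringWriter;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical GUI-side writer producing a document that follows the UIXL tree.
class UixlWriter {
    static String write(Map<String, String> attributes, List<String> commands) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element cpu = doc.createElement("CPU");            // required root tag
            doc.appendChild(cpu);
            Element attrs = doc.createElement("Attributes");   // optional child of CPU
            cpu.appendChild(attrs);
            for (Map.Entry<String, String> e : attributes.entrySet()) {
                Element attr = doc.createElement("Attribute");
                appendText(doc, attr, "Name", e.getKey());     // each Attribute needs a Name...
                appendText(doc, attr, "Value", e.getValue());  // ...and a Value
                attrs.appendChild(attr);
            }
            Element cmds = doc.createElement("Commands");      // optional child of CPU
            cpu.appendChild(cmds);
            for (String c : commands) appendText(doc, cmds, "Command", c);

            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not build UIXL document", e);
        }
    }

    private static void appendText(Document doc, Element parent, String tag, String text) {
        Element child = doc.createElement(tag);
        child.setTextContent(text);
        parent.appendChild(child);
    }
}
```

Building the document through DOM rather than concatenating strings means that characters such as < or & in a value are escaped automatically.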

<CPU>
  <Attributes>
    <Attribute>
      <Name>
        NumberOfRegisters
      </Name>
      <Value>
        2
      </Value>
    </Attribute>
    <Attribute>
      <Name>
        BusWidth
      </Name>
      <Value>
        2
      </Value>
    </Attribute>
    .
    .
    .
  </Attributes>
  <Commands>
    <Command>
      add
    </Command>
    <Command>
      sub
    </Command>
    .
    .
    .
  </Commands>
</CPU>
Figure 9.9.1: Layout of the User Information XML Language (UIXL) used to transfer information from
the graphical user interface to the hardware generator. Note that ellipses are used in the diagram to
indicate that a parent tag may include more than one child tag. Specifically, the Attributes tag may
include many Attribute tags, and the Commands tag may include many Command tags.
Figure 9.9.2: Tree diagram of the UIXL language
9.9.2 Assembler XML Language
The Assembler XML Language (AXL) is used to pass information from the HGP
to the assembler. AXL was defined for this project to give the assembler all the
information it needs to convert assembly code to machine code for a specific processor.
Since the assembler is written in a very generic manner, AXL must be rich enough to
specify precisely how the assembler should behave. An example of the layout of AXL is
shown in Figure 9.9.3.
9.9.2.1 Design
AXL consists of three parts: assembly instructions, instruction types, and
parameters. These three parts are represented by Instruction tags, TypeDef tags, and
Parameter tags respectively.
The Instruction tags define the set of instructions the assembler recognizes. Each
Instruction tag contains the name of the instruction, the binary code associated with it,
and the instruction's type. The type must refer to an instruction type defined by a
TypeDef tag.
The second part of AXL is the set of user-defined instruction types, represented
by TypeDef tags. Many instructions may be associated with one type, which
considerably reduces repetition and the overall length of the file. For instance, the
assembly language recognized by the MIPS processor would require only four TypeDef
tags. If the authors of an AXL file need a separate type for every instruction, this can
also be accomplished. An instruction type specifies the number of parameters, their
widths and types, and the location of each parameter's code inside the instruction code
(given as an offset).
Finally, the Parameter tags describe each of the processor's named parameters
that are accessible to assembly code, such as the register r1 in Figure 9.9.3. Each
parameter must have a name and a code.
As with UIXL, the structure of AXL is fixed but its content is not. This allows the
assembler to be as flexible as possible and permits future additions and changes to the
instructions, their types, and their parameters. Programs that parse AXL files should
check whether an element exists and take appropriate action if it does not.
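How a TypeDef locates each parameter inside the instruction code can be illustrated with a short sketch. It assumes that RightShift counts bit positions from the most significant end of the instruction word; with the offsets 6, 11, and 16 and width 5 from Figure 9.9.3, this places the three RegType fields in bits 25-21, 20-16, and 15-11 of a 32-bit word, the positions MIPS uses for its register fields. This reading is an interpretation, not a documented rule, and AxlEncoder and ParamField are hypothetical names.

```java
// Hypothetical assembler helper that places parameter codes into an instruction word.
class AxlEncoder {
    // One ParamN entry of a TypeDef: RightShift and Width (the Type field is not needed here).
    static final class ParamField {
        final int rightShift;
        final int width;
        ParamField(int rightShift, int width) { this.rightShift = rightShift; this.width = width; }
    }

    // ASSUMPTION: rightShift counts bits from the most significant end of a wordWidth-bit word.
    static long insertField(long word, long paramCode, ParamField f, int wordWidth) {
        int lsb = wordWidth - f.rightShift - f.width;  // index of the field's lowest bit
        if (lsb < 0) throw new IllegalArgumentException("field does not fit in the word");
        if ((paramCode >>> f.width) != 0) throw new IllegalArgumentException("parameter code too wide");
        long mask = ((1L << f.width) - 1) << lsb;
        return (word & ~mask) | (paramCode << lsb);   // clear the field, then insert the code
    }

    // Encodes an instruction from its base Code and the codes of its parameters.
    static long encode(long baseCode, ParamField[] type, long[] params, int wordWidth) {
        if (params.length != type.length) throw new IllegalArgumentException("wrong number of parameters");
        long word = baseCode;
        for (int i = 0; i < type.length; i++) word = insertField(word, params[i], type[i], wordWidth);
        return word;
    }
}
```

For example, with the RegType fields of Figure 9.9.3, the all-zero base Code of add, and hypothetical register codes 1, 2, and 3, encode returns 0x00221800 (binary 0000 0000 0010 0010 0001 1000 0000 0000). Keeping the layout in the TypeDef rather than in the assembler is what lets one generic assembler serve every generated processor.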
9.9.2.2 Specification
A tree diagram of the AXL specification is shown in Figure 9.9.4. AXL requires
that the top-level tag be the Language tag. Under the Language tag there can be at
most one of each of the following tags: TypeDefs, Parameters, and Instructions.
The TypeDefs tag can have zero or more TypeDef tags as children. Each
TypeDef tag requires a Name tag and can have at most one of each of the following
tags: Param1, Param2, and Param3. Each of the Param1, Param2, and Param3 tags
must contain tags RightShift, Width, and Type. Each of the RightShift, Width, and Type
tags must contain a text literal.
The Parameters tag can have zero or more Parameter tags as children. Each
Parameter tag must contain a Name tag and a Code tag. The Name and Code tags
must contain text literals.
The Instructions tag can have zero or more Instruction tags as children. Each
Instruction tag must contain a Name tag, a Type tag, and a Code tag. The Name,
Type, and Code tags must contain text literals.
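The rule that every Instruction's Type must name a declared TypeDef, together with the required child tags listed above, can be checked mechanically. The sketch below is a hypothetical checker (AxlChecker is not part of the project) built on the JDK's DOM parser; it collects problems rather than stopping at the first one, and it does not check the Param or Parameter tags.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical consistency check for an AXL document; returns the problems found.
class AxlChecker {
    static List<String> check(String axl) {
        List<String> problems = new ArrayList<>();
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(axl.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            problems.add("not well-formed XML: " + e.getMessage());
            return problems;
        }
        if (!"Language".equals(doc.getDocumentElement().getTagName()))
            problems.add("top-level tag must be Language");

        // Collect the names of all declared instruction types.
        Set<String> typeNames = new HashSet<>();
        NodeList typeDefs = doc.getElementsByTagName("TypeDef");
        for (int i = 0; i < typeDefs.getLength(); i++) {
            String name = childText((Element) typeDefs.item(i), "Name");
            if (name == null) problems.add("TypeDef without a Name");
            else typeNames.add(name);
        }

        // Every Instruction needs Name, Type and Code, and its Type must be declared.
        NodeList insts = doc.getElementsByTagName("Instruction");
        for (int i = 0; i < insts.getLength(); i++) {
            Element inst = (Element) insts.item(i);
            String name = childText(inst, "Name");
            String type = childText(inst, "Type");
            if (name == null || type == null || childText(inst, "Code") == null)
                problems.add("Instruction " + i + " is missing Name, Type or Code");
            else if (!typeNames.contains(type))
                problems.add("Instruction " + name + " refers to undefined type " + type);
        }
        return problems;
    }

    // Trimmed text of the first descendant with the given tag, or null if there is none.
    private static String childText(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }
}
```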

All the work on the interfaces between the parts was completed early in the
project, so that development of the various parts could proceed independently of one
another.

<Language>
  <TypeDefs>
    <TypeDef>
      <Name>
        RegType
      </Name>
      <Param1>
        <RightShift>
          6
        </RightShift>
        <Width>
          5
        </Width>
        <Type>
          Registor
        </Type>
      </Param1>
      <Param2>
        <RightShift>
          11
        </RightShift>
        <Width>
          5
        </Width>
        <Type>
          Registor
        </Type>
      </Param2>
      <Param3>
        <RightShift>
          16
        </RightShift>
        <Width>
          5
        </Width>
        <Type>
          Registor
        </Type>
      </Param3>
    </TypeDef> ...
  </TypeDefs>
  <Parameters>
    <Parameter>
      <Name>
        r1
      </Name>
      <Code>
        00001
      </Code>
    </Parameter> ...
  </Parameters>
  <Instructions>
    <Instruction>
      <Name>
        add
      </Name>
      <Type>
        RegType
      </Type>
      <Code>
        00000000000000000000000000000000
      </Code>
    </Instruction> ...
  </Instructions>
</Language>
Figure 9.9.3: Layout of the Assembler XML Language (AXL) used to transfer information from the
Hardware Generator to the Assembler. Note that ellipses are used in the diagram to indicate that a
parent tag may include more than one child tag. Specifically, the TypeDefs, Parameters, and
Instructions tags may include many TypeDef, Parameter, and Instruction tags respectively.
Figure 9.9.4: Tree diagram of the AXL language
9.10 Results
The objective of this project was to create a design suite that generates custom
processors, so that the most efficient processor could be created for a specific task. To
determine whether this objective was met, we planned to measure the speed and area of
several custom processors. However, a lack of computing resources prevented these
measurements from being performed in full: the ugsparc machines lacked physical
memory, a partition on the eecg network lacked disk space, and the student license for
MAX+PLUS II on the Windows machines did not include the larger devices.
Because of these restrictions, only individual components of the processor could
be compiled and analyzed. The results are shown in Figures 9.10.1 through 9.10.3.
As the plots show, the size and speed of the components change with their
parameters. The size of the register file increases sharply when either the bus width or
the number of registers is increased, so defining a processor with exactly the
specifications needed saves space.
Likewise, the size and critical path delay of the ALU increase with added
functionality or bus width. This again shows that a processor suited exactly to a task is
the best option, and that a general processor with every option is not the right tool.
Thus, the results indicate that the tool creates customized processors that are
faster and smaller, and therefore better suited to their task, than a complete general
processor.
Figure 9.10.1: Size of Customizable ALU (plot of logic cells synthesized versus bus width of 8, 16, and
32 bits for the Basic, No Multiplier/Division, No Shifting, and Everything ALU configurations)
Figure 9.10.2: Speed of Customizable ALU (plot of critical path delay, Tcritical in ns, versus bus width
of 8, 16, and 32 bits for the same four ALU configurations)
Figure 9.10.3: Size of Customizable RegisterFile (plot of logic cells synthesized versus number of
registers, 8, 16, and 32, for register files 8, 16, and 32 bits wide)
10 Conclusions
The project had three main objectives, stated in Section 8 and repeated below.
Each objective and its status is discussed in turn.
1. Design a set of hardware components to be used as the basis of a user-defined
processor.
2. Create a set of easy-to-use, portable, and flexible software that allows a user to
create a processor without knowing a Hardware Description Language.
3. Analyze the performance and benefits of using our approach.
The first objective was successfully met. While writing the hardware description
of the processor, it became clear that certain coding practices made it very easy to
create customizable and parameterizable hardware components. These practices
included the use of generic maps and a multiplexor coding standard. When the HDL was
used with the script processor to generate customizable processors, the full flexibility
and convenience of this coding standard became apparent. Thus, not only was the
objective met, but a standard was also established for creating new hardware modules
that can easily be integrated into future software suites.
The second and most important objective was also met. The software suite
allows a user to create a hardware description of a processor, and a custom assembler
for that processor, in a matter of minutes. The GUI, with its help features, lets the user
start working without a steep learning curve, since there are no command-line options to
learn. The HGP requires no user interaction and is therefore very easy to use. Finally,
the assembler produces machine code from correct source code and reports meaningful
errors when the source code contains mistakes.
In addition to its ease of use, the software suite has several other desirable
characteristics, including portability, loose coupling, flexibility, and modularity. The
whole suite was written in Java using Java Swing and can therefore run on most
computer platforms, including Windows, UNIX, Linux, and Apple systems. Because of
this portability, users are not tied to a particular platform, which is an important
ease-of-use factor.
The XML files used between different components in the software suite allow the
software to be loosely coupled. By using the XML as a defined interface between the
components, each component could be designed independently of all other
components. For example, if another implementation of the HGP was produced, then
the old HGP could be removed and the new one used without affecting the GUI or the
assembler since the XML allows for a clean division between the components.
Finally, the software suite is both flexible and modular. Because XML is
inherently extensible (as the name implies), and because of the way each software
component was designed, new features or instructions can be added to the suite with
very little change. For example, to add a new instruction to the software suite, only the
following actions are needed:
1. The XML input to the GUI must be changed to include the instruction, its format,
and its tool tips. (The GUI code is unchanged.)
2. New Java classes must be created in the HGP for the instruction and for any new
resources it uses, but because of the hierarchy of the Instruction and Resource
classes, each of these classes will be quite small.
Notice that the assembler is completely unchanged. Adding a new instruction does
require more work on the hardware side, since every component the new instruction
uses needs a customizable and parameterizable hardware description, but the changes
to the software suite itself are minimal, which makes the suite easy to upgrade.
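To illustrate step 2, the sketch below shows how small a new instruction class can be when the shared behaviour lives in the base class. The Instruction and Resource classes here are simplified stand-ins modelled loosely on the listing in A7.3, not the project code, and the max instruction, its opcode, and the Comparator resource are hypothetical.

```java
import java.util.Vector;

// Simplified stand-in for the HGP's Instruction base class (see A7.3).
abstract class Instruction {
    protected String name = "";
    protected int opcode;
    protected static final Vector<Resource> resources = new Vector<>();

    public String getName() { return name; }
    public int getOpcode() { return opcode; }
    public static Vector<Resource> getResources() { return resources; }

    public static boolean resourceAlreadyThere(String name) {
        for (Resource r : resources)
            if (r.getName().equals(name)) return true;
        return false;
    }

    // Adds a hardware resource once, however many instructions require it.
    protected static void requireResource(String name) {
        if (!resourceAlreadyThere(name)) resources.add(new Resource(name));
    }
}

// Hypothetical hardware resource record.
class Resource {
    private final String name;
    Resource(String name) { this.name = name; }
    String getName() { return name; }
}

// Hypothetical new instruction "max rd rs rt": the only new HGP class it needs.
class Max extends Instruction {
    Max() {
        name = "max";
        opcode = 0x40;                  // hypothetical opcode value
        requireResource("ALU");         // reuses a resource shared with other instructions
        requireResource("Comparator");  // hypothetical new resource
    }
}
```

Because resources are shared through a static list and added only once, selecting several instructions that use the same unit still yields a single instance of that unit in the generated hardware.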
Thus the creation of the software suite not only meets our objectives, but
provides a framework for an extension of its use in the future.
The third and final objective was not completely met. As indicated in previous
sections, resource limitations prevented the compiled processor from being analyzed in
full. However, individual components of the custom processor were analyzed, and they
do point to the benefits of having a custom processor suited to a particular task: such a
processor can operate faster and take up less area, leaving room for other logic on the
same programmable chip.
Thus, the motivation for the project, which was to create a tool that could easily
and quickly develop processors suited to a task so that it would perform those
operations most efficiently, has been met. The tools created and described above are
easy to use, need no hardware knowledge, and create processors that are faster and
more space efficient compared to a processor that includes the full functionality.
10.1 Future Work

Due to the flexibility and modularity of the software suite described above, much
future work could be incorporated into our design, including the following:
1. Increasing the instruction set
2. Making the processor pipelined
3. Adding off-chip memory
Before any new features are added to the processor, however, the first task is to
obtain the resources needed to compile the complete hardware description of the
processor and run a program on it, to demonstrate that the design does in fact work.
Appendix 1: Timeline from Technical Proposal
Appendix 2: Timeline from Interim Reports
Appendix 3: Final Timeline
Appendix 4: Acronyms
ALU: Arithmetic and Logic Unit

ASIC: Application Specific Integrated Circuit
CISC: Complex Instruction Set Computing
CPLD: Complex Programmable Logic Device
FPGA: Field Programmable Gate Array
FSM: Finite State Machine
GUI: Graphical User Interface
HDL: Hardware Description Language
HGP: Hardware Generation Program
RISC: Reduced Instruction Set Computing
VHDL: Very High Speed Integrated Circuit Hardware Description Language
Appendix 5: Instruction Set
Category | Name | Instruction | Parameters | Result
Arithmetic | Add | add | rd rs rt | rd = rs + rt
Arithmetic | Add Immediate | addi | rd rs imm | rd = rs + imm
Arithmetic | Subtract | sub | rd rs rt | rd = rs - rt
Arithmetic | Subtract Immediate | subi | rd rs imm | rd = rs - imm
Arithmetic | Divide | div | rs rt | LO = rs / rt; HI = rs % rt
Arithmetic | Multiply | mult | rd rt | LO = low half of rd * rt; HI = high half of rd * rt
Logical | And | and | rd rs rt | rd = rs AND rt
Logical | And Immediate | andi | rd rs imm | rd = rs AND imm
Logical | Nand | nand | rd rs rt | rd = rs NAND rt
Logical | Nand Immediate | nandi | rd rs imm | rd = rs NAND imm
Logical | Or | or | rd rs rt | rd = rs OR rt
Logical | Or Immediate | ori | rd rs imm | rd = rs OR imm
Logical | Not | not | rd rs | rd = NOT rs
Logical | Nor | nor | rd rs rt | rd = rs NOR rt
Logical | Nor Immediate | nori | rd rs imm | rd = rs NOR imm
Logical | Xor | xor | rd rs rt | rd = rs XOR rt
Logical | Xor Immediate | xori | rd rs imm | rd = rs XOR imm
Logical | Xnor | xnor | rd rs rt | rd = rs XNOR rt
Logical | Xnor Immediate | xnori | rd rs imm | rd = rs XNOR imm
Shift | Shift Left Logical | sll | rd rt shift | rd = rt << shift
Shift | Shift Left Logical Variable | sllv | rd rs rt | rd = rs << rt
Shift | Shift Right Logical | srl | rd rt shift | rd = rt >> shift
Shift | Shift Right Logical Variable | srlv | rd rs rt | rd = rs >> rt
Shift | Shift Left Arithmetic | sla | rd rt shift | rd = rt << shift
Shift | Shift Left Arithmetic Variable | slav | rd rs rt | rd = rs << rt
Shift | Shift Right Arithmetic | sra | rd rt shift | rd = rt >> shift (sign-extended)
Shift | Shift Right Arithmetic Variable | srav | rd rs rt | rd = rs >> rt (sign-extended)
Shift | Rotate Left | rol | rd rs shift | rd = rs rotated left by shift
Shift | Rotate Right | ror | rd rs shift | rd = rs rotated right by shift
Shift | Rotate Left Variable | rolv | rd rs rt | rd = rs rotated left by rt
Shift | Rotate Right Variable | rorv | rd rs rt | rd = rs rotated right by rt
Load | Load Upper Immediate | lui | rt imm | rt = imm << (buswidth / 2)
Load | Load Immediate | li | rt imm | rt = imm
Load | Load Word | lw | rt addr | rt = memory[addr] (buswidth bits)
Store | Store Word | sw | rt addr | memory[addr] = rt (buswidth bits)
Move | Move From HI Register | mfhi | rd | rd = HI
Move | Move From LO Register | mflo | rd | rd = LO
Comparison | Set on Less Than | slt | rd rs rt | rd = (rs < rt)
Comparison | Set on Less Than Immediate | slti | rd rs imm | rd = (rs < imm)
Comparison | Set on Equal | seq | rd rs rt | rd = (rs == rt)
Comparison | Set on Equal Immediate | seqi | rd rs imm | rd = (rs == imm)
Comparison | Set on Not Equal | sne | rd rs rt | rd = (rs != rt)
Comparison | Set on Not Equal Immediate | snei | rd rs imm | rd = (rs != imm)
Comparison | Set on Greater Than | sgt | rd rs rt | rd = (rs > rt)
Comparison | Set on Greater Than Immediate | sgti | rd rs imm | rd = (rs > imm)
Comparison | Set on Greater Than Equal | sge | rd rs rt | rd = (rs >= rt)
Comparison | Set on Greater Than Equal Immediate | sgei | rd rs imm | rd = (rs >= imm)
Comparison | Set on Less Than Equal | sle | rd rs rt | rd = (rs <= rt)
Comparison | Set on Less Than Equal Immediate | slei | rd rs imm | rd = (rs <= imm)
Branch | Unconditional Branch | b | label | branch (label)
Branch | Branch on Equal | beq | rs rt label | if (rs == rt) branch (label)
Branch | Branch on Not Equal | bne | rs rt label | if (rs != rt) branch (label)
Branch | Branch on Greater Than Equal | bge | rs rt label | if (rs >= rt) branch (label)
Branch | Branch on Greater Than | bgt | rs rt label | if (rs > rt) branch (label)
Branch | Branch on Less Than Equal | ble | rs rt label | if (rs <= rt) branch (label)
Branch | Branch on Less Than | blt | rs rt label | if (rs < rt) branch (label)
Branch | Unconditional Branch and Link | bal | label | branch (label)
Branch | Branch and Link on Equal | beqal | rs rt label | if (rs == rt) branch (label)
Branch | Branch and Link on Not Equal | bneal | rs rt label | if (rs != rt) branch (label)
Branch | Branch and Link on Greater Than Equal | bgeal | rs rt label | if (rs >= rt) branch (label)
Branch | Branch and Link on Greater Than | bgtal | rs rt label | if (rs > rt) branch (label)
Branch | Branch and Link on Less Than Equal | bleal | rs rt label | if (rs <= rt) branch (label)
Jump | Jump | j | target | jump (target)
Jump | Jump and Link | jal | target | jump (target)
Jump | Jump Register | jr | rs | jump (rs)
Appendix 6: Test Cases
A6.1 Registerfile Simulation
The simulation in Figure A.6.1 outlines a case with eight 32-bit registers. The
simulation starts by placing values in all the registers during the first eight clock cycles.
Notice that the write signal is high for the first eight clock cycles except the sixth.
The next four clock cycles extract the information that was previously stored in the
registers. When register five is retrieved, a value of zero is returned, showing that the
value intended for register five was not stored because the write signal was low.
Figure A.6.1: Registerfile Simulation
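The behaviour seen in Figure A.6.1 can be reproduced with a small cycle-level model. This is a hypothetical Java sketch, not the VHDL under test; it assumes that all registers start at zero and that register i is written in clock cycle i (counting from zero, so the sixth cycle writes register five). The stored values used with it are arbitrary.

```java
// Hypothetical cycle-level model of the register file test in Figure A.6.1.
class RegisterFileModel {
    private final int[] regs;

    RegisterFileModel(int numRegisters) {
        regs = new int[numRegisters];  // Java arrays start at zero, like the simulated registers
    }

    // One clock edge: the addressed register is updated only while the write signal is high.
    void clock(boolean write, int writeReg, int writeData) {
        if (write) regs[writeReg] = writeData;
    }

    int read(int reg) { return regs[reg]; }
}
```

Driving eight cycles with the write signal low only in the sixth leaves register five at zero, which matches the value read back in the simulation.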
A6.2 ALU Simulation
Figure A.6.2: ALU Add and Subtract Simulation
Figure A.6.2 outlines a test simulation where both positive and negative
numbers are added and subtracted. Results are correct for all four cases.
Figure A.6.3: ALU Logical Operations Simulation
Figure A.6.3 outlines a test scenario where all the logical operations are
applied to the same inputs. All results are correct.
Figure A.6.4: ALU Multiply and Divide Simulation
Figure A.6.4 outlines a test scenario in which 13 is divided by 5, and the quotient
and remainder are then extracted with the “Move from Lo” and “Move from Hi”
instructions. Then, 2147483635 is multiplied by 16, and the result is again extracted
with the “Move from Lo” and “Move from Hi” instructions.
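The multiply case is a useful check of the HI/LO split because the product does not fit in 32 bits: 2147483635 × 16 = 34,359,738,160 = 0x7FFFFFF30, so HI should hold 0x00000007 and LO 0xFFFFFF30, while 13 / 5 gives a quotient of 2 and a remainder of 3. The sketch below assumes a 32-bit signed ALU and the convention listed in Appendix 5 (LO receives the quotient, HI the remainder); HiLo is a hypothetical name.

```java
// Hypothetical model of the HI/LO results of mult and div on a 32-bit signed ALU.
class HiLo {
    // Returns {HI, LO}: the upper and lower 32 bits of the 64-bit product.
    static int[] mult(int a, int b) {
        long product = (long) a * b;  // widen before multiplying so no bits are lost
        return new int[] { (int) (product >>> 32), (int) product };
    }

    // Returns {HI, LO} with HI = remainder and LO = quotient, as listed in Appendix 5.
    static int[] div(int a, int b) {
        return new int[] { a % b, a / b };
    }

    public static void main(String[] args) {
        int[] m = mult(2147483635, 16);
        System.out.printf("mult: HI=0x%08X LO=0x%08X%n", m[0], m[1]);
        int[] d = div(13, 5);
        System.out.printf("div:  HI=%d LO=%d%n", d[0], d[1]);
    }
}
```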
Figure A.6.5 outlines a test scenario where the numbers 5,6,7 are compared with
the number 6 with the instructions “set on less than” through “set on not equal.” All test
cases provide the correct output.
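Each set-on instruction writes a single truth value to rd. The sketch below assumes, as on MIPS, that true is written as 1 and false as 0, and that comparisons are signed; SetOn is a hypothetical name, and the printed table simply mirrors the 5, 6, 7 versus 6 scenario described above.

```java
// Hypothetical model of the "set on" comparison instructions: rd = 1 if true, else 0.
class SetOn {
    static int slt(int rs, int rt) { return rs < rt ? 1 : 0; }
    static int sle(int rs, int rt) { return rs <= rt ? 1 : 0; }
    static int sgt(int rs, int rt) { return rs > rt ? 1 : 0; }
    static int sge(int rs, int rt) { return rs >= rt ? 1 : 0; }
    static int seq(int rs, int rt) { return rs == rt ? 1 : 0; }
    static int sne(int rs, int rt) { return rs != rt ? 1 : 0; }

    public static void main(String[] args) {
        // Compare 5, 6 and 7 against 6, as in the Figure A.6.5 test.
        for (int rs = 5; rs <= 7; rs++)
            System.out.printf("%d vs 6: slt=%d sle=%d sgt=%d sge=%d seq=%d sne=%d%n",
                    rs, slt(rs, 6), sle(rs, 6), sgt(rs, 6), sge(rs, 6), seq(rs, 6), sne(rs, 6));
    }
}
```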
Figure A.6.6 outlines a test scenario where the hex number 8FFFFFF8 is shifted
and rotated in all directions by varying amounts. All test cases provide the correct
output.
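The difference between the logical, arithmetic, and rotate variants shows up clearly on 0x8FFFFFF8 because its sign bit is set. The sketch models them with Java's int operators and the standard Integer.rotateLeft and Integer.rotateRight methods; the shift amount of 4 is an arbitrary choice, not one of the amounts used in Figure A.6.6, and ShiftDemo is a hypothetical name. Appendix 5 gives sla the same result as sll, so it is not modelled separately.

```java
// Shift and rotate semantics on a 32-bit word, modelled with Java int operators.
class ShiftDemo {
    static int sll(int x, int n) { return x << n; }   // logical left: zeros enter on the right
    static int srl(int x, int n) { return x >>> n; }  // logical right: zeros enter on the left
    static int sra(int x, int n) { return x >> n; }   // arithmetic right: the sign bit is replicated
    static int rol(int x, int n) { return Integer.rotateLeft(x, n); }
    static int ror(int x, int n) { return Integer.rotateRight(x, n); }

    public static void main(String[] args) {
        int x = 0x8FFFFFF8;
        System.out.printf("sll 4: %08X%n", sll(x, 4)); // FFFFFF80
        System.out.printf("srl 4: %08X%n", srl(x, 4)); // 08FFFFFF
        System.out.printf("sra 4: %08X%n", sra(x, 4)); // F8FFFFFF
        System.out.printf("rol 4: %08X%n", rol(x, 4)); // FFFFFF88
        System.out.printf("ror 4: %08X%n", ror(x, 4)); // 88FFFFFF
    }
}
```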
Figure A.6.5: ALU Comparison Simulation
Figure A.6.6: ALU Shifting Simulation
A6.3 PS/2 Mouse Port Simulation
Figure A.6.7: PS/2 Mouse Port Simulation
Figure A.6.7 shows the reaction of the mouse interface to a stream of bytes from
the mouse. The bytes are read in serially and then output to the registers byte0, byte1,
and byte2. The simulation also shows how byte0 changes when the processor reads it.
A6.4 Generic PS/2 Port Simulation
Figure A.6.8: PS/2 Port Simulation
Figure A.6.8 shows two cases of the processor accessing a generic PS/2 port.
First, the processor writes the value 6C to the data port; it then reads the value 96 from
the data port.
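For reference, a standard PS/2 frame carries one byte as 11 bits: a start bit (0), eight data bits sent least significant bit first, an odd parity bit, and a stop bit (1). The sketch below encodes and decodes such frames in Java; it ignores clocking and the host's request-to-send and acknowledge handshake, it is not the project's VHDL interface, and Ps2Frame is a hypothetical name.

```java
// Standard PS/2 11-bit frame: start (0), 8 data bits LSB first, odd parity, stop (1).
class Ps2Frame {
    // Builds the frame bits in transmission order.
    static int[] encode(int data) {
        int[] bits = new int[11];
        bits[0] = 0;                          // start bit
        int ones = 0;
        for (int i = 0; i < 8; i++) {
            bits[1 + i] = (data >> i) & 1;    // least significant bit first
            ones += bits[1 + i];
        }
        bits[9] = (ones % 2 == 0) ? 1 : 0;    // make the count of ones in data + parity odd
        bits[10] = 1;                         // stop bit
        return bits;
    }

    // Recovers the data byte, rejecting frames with a bad start, stop, or parity bit.
    static int decode(int[] bits) {
        if (bits.length != 11 || bits[0] != 0 || bits[10] != 1)
            throw new IllegalArgumentException("bad framing");
        int data = 0, ones = 0;
        for (int i = 0; i < 8; i++) {
            data |= bits[1 + i] << i;
            ones += bits[1 + i];
        }
        if ((ones + bits[9]) % 2 != 1) throw new IllegalArgumentException("parity error");
        return data;
    }
}
```

For the value 96 in the test above, the data bits contain four ones, so the parity bit is 1.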
A6.5 Memory Mapped Bus Simulation

The VGA interface is located at addresses 0x000 through 0x7FF, the mouse at
addresses 0x801 through 0x803, and a generic PS/2 port at address 0x804. The test in
Figure A.6.9 shows the processor reading a value from the first address of the mouse
interface twice through the I/O bus. The test shows that the I/O bus correctly identifies
the device that the processor wants to access. It also shows that once the mouse
interface detects that the processor has read its data, the interface changes the value of
a bit in its first register.
Figure A.6.9: Memory Mapped Bus Simulation
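The address decoding performed by the I/O bus can be modelled as a simple range check. The sketch below encodes the address map given in A6.5; IoBusDecoder and its methods are hypothetical, and treating every other address (such as 0x800) as unmapped is an assumption of this model.

```java
// Hypothetical Java model of the I/O bus address decoding described in A6.5.
class IoBusDecoder {
    enum Device { VGA, MOUSE, PS2, NONE }

    static Device select(int address) {
        if (address >= 0x000 && address <= 0x7FF) return Device.VGA;   // VGA range
        if (address >= 0x801 && address <= 0x803) return Device.MOUSE; // three mouse registers
        if (address == 0x804) return Device.PS2;                       // generic PS/2 data port
        return Device.NONE;                                            // unmapped in this model
    }

    // Register offset within the selected device, e.g. 0 for the mouse's first register.
    static int offset(int address) {
        switch (select(address)) {
            case VGA:   return address;
            case MOUSE: return address - 0x801;
            case PS2:   return 0;
            default:    throw new IllegalArgumentException("unmapped address " + address);
        }
    }
}
```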
Appendix 7: Sample Source Code
Due to the large amount of code in the project, only a selection of code from
each part of the project is shown here.
A7.1 GUI: Java Code

public void display()
{
// Set Look and Feel of this application to be the same across all platforms
try
{
UIManager.setLookAndFeel(
UIManager.getCrossPlatformLookAndFeelClassName());
}
catch (Exception e)
{
System.out.println("Error 456yf: Unable to Set look and Feel of window");
}
//Create the top-level container and add contents to it.
frame = new JFrame("Parameter Selection");
Component contents = this.createComponents();
frame.getContentPane().add(contents, BorderLayout.CENTER);
// Create the Menu Bar

JMenuBar menuBar = new JMenuBar();
JMenu menu = new JMenu("Help");
menu.setMnemonic(KeyEvent.VK_H);
menu.addMenuListener(new MenuListener()
{
public void menuSelected(MenuEvent e)
{
// This function handles events caused when a menu item is selected
// Display Help Menu
JFrame fHelp = new JFrame("Top Level Help Menu");
javax.swing.text.StyledDocument doc = new DefaultStyledDocument();

JTextPane tp = new JTextPane(doc);
tp.setSize(100,100);
tp.setEditable(false);
.
.
.
}
public void menuDeselected(MenuEvent e)
{
// This function is blank because we don't want to handle a menu deselected event
}
public void menuCanceled(MenuEvent e)
{
// This function is blank because we don't want to handle a menu canceled event
}
});
menuBar.add(menu);
frame.setJMenuBar(menuBar);
//Finish setting up the frame, and show it.

frame.addWindowListener(new WindowAdapter()
{
public void windowClosing(WindowEvent e)
{
System.exit(0);
}
});
// Set Frame to the size of its components

frame.pack();
// Show the window
frame.setVisible(true);
}
A7.2 XML Input/Output: Java Code
Sample section from the Java object that writes the Assembler XML file:
/* Write all of the instructions */
for (i = 0; i < instructions.size(); i++) {
// Retrieve the data for each instruction

inst=(Instruction)instructions.elementAt(i);
// Write part of the instruction information to the XML file

xmlfile.write("\t<Instruction>\n");
xmlfile.write("\t\t<Name>\n");
xmlfile.write("\t\t\t"+inst.getName()+"\n");
xmlfile.write("\t\t</Name>\n");
xmlfile.write("\t\t<Type>\n");
xmlfile.write("\t\t\t");
// Keep track of the types of instructions that are implemented

if (inst instanceof Immediate) {
isImmediate=true;
xmlfile.write("Immediate");
}
else if (inst instanceof RType) {
isRType=true;
xmlfile.write("RegType");
}
Sample section from the Java object that reads the User Information XML (UIXL) file:
// Traverse over the node list containing Instructions
for (int i = 0; i < NumberOfInstructions; i++) {
// retrieve data for each instruction

nInstruction = nlInstructions.item(i);
// add the appropriate instruction to the list

// compare string contents with equals(); == only compares object references
if (nInstruction.getNodeName().equals("add"))
instructions.add(new add());
else if (nInstruction.getNodeName().equals("addi"))
instructions.add(new addi());
else if (nInstruction.getNodeName().equals("and"))
instructions.add(new and());
A7.3 Instruction Set Based Component Selection: Java Code
public abstract class Instruction {
protected String name ="";

protected int opcode;
static protected Vector Resource = new Vector();
static protected Vector Signals = new Vector();
public Instruction() {
}
public String getName() {

return(name);
}
public int getOpcode() {

return(opcode);
}
public static Vector getResources() {

return(Resource);
}
public static Vector getSignals() {

return(Signals);
}
public static boolean resourceAlreadyThere(String name) {

for (int i = 0; i < Resource.size(); i++) {
Resources r = (Resources)Resource.elementAt(i);
if (r.getName().equals(name))
return(true);
}
return(false);
}
}
//--------------------------------------------------
public class PCInSignal extends Signal {
protected static int numberOfinner = 0;

protected static Vector Names = new Vector();
public PCInSignal() {
super();
numberOfinner++;
}
public static int getNumberOf() {

return(numberOfinner);
}
public static int getTotalNumberOf() {

return(Signal.getNumberOf());
}
public static Vector getNames() {

return(PCInSignal.Names);
}
}
A7.4 Script Processing: Java Code
/* This method is called to initiate script processing. Once it is completed a
script has been fully processed. */
void Run() {
String a;
int result;
while ((a=input.readLine())!=null) {
/* Process input line and determine if it is within a valid block. If not, then
omit it. */
result = ProcessString(a);
switch(result) {
case ENTERED_SEARCH_BLOCK: within_search_block = true;
break;
case LEFT_SEARCH_BLOCK: within_search_block = false;
return;
case RETURN_OK: break;
default: System.out.println("Error processing code");
System.exit(1); /* On error */
break;
}
}
if (close_destination == true) {
try {
output.close();
}
catch (Exception e) {}
}
input.DoneParsing();
}
/* This function processes a string and executes proper commands. */

int ProcessString(String s) {
StringTokenizer token = new StringTokenizer(s);
String a;
/* Process the input line based on command vs. code line analysis.
A command will be responded to, where as a VHDL code line will be parsed
for expression to evaluate and copied to target file.*/
if (s.length()>0) {
switch (ProcessCommand(token)) {
case READ_COMMAND: break;
case RETURN_OK: if (SkipToNextEnd==true) break;
if ((within_search_block ||(!specific_search)))
{
if (ExecuteExpressionEvaluation)
WriteToFile(Expression.EvaluateParameters(s,defines));
else
WriteToFile(s);
}
break;
case RETURN_ERROR: System.out.println("Error: Invalid command encountered\n");
System.exit(1);
break;
default: break;
}
}
else {
a = "";
if( (within_search_block || (!specific_search)) )
WriteToFile(a);
}
return RETURN_OK;
}
A7.5 Assembler: Java Code
// Now that all files that we need are opened
// Start parsing the input and generating the output
//
try
{
StringTokenizer st;
String AssemblerLabel;
int nTokenCount;
int nLineNumber = 0;
String sInstruction, sParam1, sParam2, sParam3;
StringBuffer sbOutput;
Instruction inst;
TypeDef tpdef;
Param param1, param2, param3;
for (String line = in.readLine(); line != null; line = in.readLine())

{
nLineNumber++;
try
{
//
// Remove comments indicated by the # character
//
st = new StringTokenizer(line, "#");
nTokenCount = st.countTokens();
if ((st.hasMoreTokens()) && (line.indexOf("#") != 0))
{
line = st.nextToken();
}
else
{
line = "";
}
//
// Check for labels
//
st = new StringTokenizer(line, ":");
nTokenCount = st.countTokens();
if ((nTokenCount == 1) && (line.indexOf(":") != -1))
{
// label was found but it is either blank or not associated
//with a line
throw new AssemblerException ("label is either blank or not associated with an instruction", nLineNumber);
}
else if (nTokenCount == 2)
{
// label was found
AssemblerLabel = st.nextToken();
line = st.nextToken();
}
else if (nTokenCount > 2)
{
// found more than one colon on a line
// ACTION: report an error and skip this line
throw new AssemblerException ("statement cannot have more than one ':' per line", nLineNumber);
}
A7.6 Datapath: VHDL Code
--Instruction Register-------------------------------------
ir0 : myregister GENERIC MAP (BUSWIDTH => BUSWIDTH)

PORT MAP (memout,clk,irwrite,irout);
--Mux into register file write reg--------------------------
mux2g: FOR i IN 0 TO REGFIELDSIZE-1 GENERATE

mux2input(0,i) <= irout(BUSWIDTH-OPCODEFIELDSIZE-2*REGFIELDSIZE+i);
mux2input(1,i) <= irout(BUSWIDTH-OPCODEFIELDSIZE-3*REGFIELDSIZE+i);
END GENERATE;
mux2: mux2to1 GENERIC MAP (MUX_WIDTH => REGFIELDSIZE)

PORT MAP (mux2input,regdst,rfwritereg);
--Mux into register file data ------------------------------
mux3g: FOR i IN 0 TO BUSWIDTH-1 GENERATE

mux3input(0,i) <= aluout(i);
mux3input(1,i) <= mdrout(i);
END GENERATE;
mux3: mux2to1 GENERIC MAP (MUX_WIDTH => BUSWIDTH)

PORT MAP (mux3input,memtoreg,rfwritedata);
--Register File--------------------------------------------
rf0: registerfile GENERIC MAP (BUSWIDTH => BUSWIDTH, NUMREG => NUMREG,
LOG2NUMREG => LOG2NUMREG)
PORT MAP (rfwrite,
clk,
irout(BUSWIDTH-OPCODEFIELDSIZE-1 DOWNTO
BUSWIDTH-OPCODEFIELDSIZE-REGFIELDSIZE),
irout(BUSWIDTH-OPCODEFIELDSIZE-REGFIELDSIZE-1 DOWNTO
BUSWIDTH-OPCODEFIELDSIZE-2*REGFIELDSIZE),
rfwritereg,
rfwritedata,
rfreaddata1,
Rfreaddata2);
--Mux into input A of ALU--------------------------------
mux4g: FOR i IN 0 TO BUSWIDTH-1 GENERATE

mux4input(0,i) <= pcout(i);
mux4input(1,i) <= regaout(i);
END GENERATE;

PORT MAP (mux4input,alusrca(0),aluinputa);
--Mux into input B of ALU--------------------------------
mux5ga: FOR i IN 0 TO BUSWIDTH-1 GENERATE

mux5input(0,i) <= regbout(i);
mux5input(1,i) <= '1' WHEN i = WORDTOBYTEOFFSET ELSE '0';
mux5input(2,i) <= signout(i);
mux5input(3,i) <= shiftout1(i);
END GENERATE;

PORT MAP (mux5input,alusrcb,aluinputb);
A7.7 Control: VHDL Code
BEGIN
-- Obtain instruction opcode
opcode <= IR_Data(BUSSIZE-1 downto BUSSIZE-OPCODESIZE);
-- Process stage changes on every positive clock edge

process(clk)
begin
if (clk'EVENT and clk = '1') then
case y IS
WHEN IF1 =>
-- The fetch stage. It has been split into three
-- separate states to make it work
memaddr_select <= '0';
option <= "00";
mem_go <= '1';
IR_Enable <= '1';
W_Enable <= '0';
y<=IF2;
WHEN IF2 =>
if (mem_done='1') then
y<=IF3;
IR_Enable <= '0';
mem_go <= '0';
end if;
WHEN IF3 =>
ALU_A_Select <= '0';
ALU_B_Select <= "01";
NewPC_Select <= "10";
PC_Enable <= '1';
WREG_Select <= '1';
y <= ID1;
WHEN ID1 =>
-- Decode stage
PC_Enable <= '0';
WDATA_Select <= '1';
case opcode IS
-- Arithmetic
WHEN "0000001" => y <= ADD1;
WHEN "0000010" => y <= ADD1; -- Immediate operand
WREG_Select <='0';
...
...
...
WHEN "0111111" => y <= JRL1;
WHEN OTHERS => y <= Halt;
end case;
-- Execute Stage
-- Load
WHEN LOAD1 => y <= LOAD1;
ALU_Operation <= "00000";
memaddr_select <= '1';
if (mem_done = '1') then
mem_go <= '0';
WDATA_Select <= '0';
y <= WB;
A7.8 Cache Controller: VHDL Code
BEGIN
-- Setup registers to hold tags for each bin
REGA : myreg
generic map (WIDTH => BINTAGWIDTH)
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_a,
clock, reset, en_a, a);
REGB : myreg
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_b,
clock, reset, en_b, b);
REGC : myreg
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_c,
clock, reset, en_c, c);
REGD : myreg
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_d,
clock, reset, en_d, d);
REGE : myreg
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_e,
clock, reset, en_e, e);
REGF : myreg
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_f,
clock, reset, en_f, f);
REGG : myreg
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_g,
clock, reset, en_g, g);
REGH : myreg
port map ( newaddr(ADDRESSWIDTH-1 downto 8), cl_h,
clock, reset, en_h, h);
-- Connect a controller to manage tag searches and tag updates
CONTROL : regcontrol
GENERIC map( WIDTH => BINTAGWIDTH,
ADDRESSWIDTH => ADDRESSWIDTH )
PORT map( Address, a,b,c,d,e,f,g,h,enable,
en_a,en_b,en_c,en_d,en_e,en_f,en_g,en_h,
cl_a,cl_b,cl_c,cl_d,cl_e,cl_f,cl_g,cl_h,
clock, Replace, CellSelect, newaddr);
-- Pass to memory the new address, knowing the cache bin data is
-- located in.
NewAddress <= newaddr;
END Behaviour;
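The internals of regcontrol are not listed in this appendix. Because every tag register is loaded from newaddr(ADDRESSWIDTH-1 downto 8), BINTAGWIDTH presumably equals ADDRESSWIDTH-8, with the low eight address bits acting as the offset within a bin. As a hypothetical sketch of the tag search such a controller performs, the entity below compares the upper bits of an incoming address with each stored tag and reports a hit and the matching bin. The entity name tag_match, its ports, and the packing of all tags into one vector are assumptions for illustration only; valid bits and the replacement policy are omitted.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical tag-search sketch: the entity name, ports and tag packing
-- are assumptions, not taken from the original regcontrol component.
entity tag_match is
  generic (
    ADDRESSWIDTH : integer := 16;  -- assumed address width
    BINS         : integer := 8    -- one tag per cache bin (a to h above)
  );
  port (
    address : in  std_logic_vector(ADDRESSWIDTH-1 downto 0);
    -- Tag for bin i occupies bits (i+1)*(ADDRESSWIDTH-8)-1 downto i*(ADDRESSWIDTH-8).
    tags    : in  std_logic_vector(BINS*(ADDRESSWIDTH-8)-1 downto 0);
    hit     : out std_logic;
    hit_bin : out integer range 0 to BINS-1
  );
end entity tag_match;

architecture rtl of tag_match is
  constant W : integer := ADDRESSWIDTH - 8;  -- tag width (BINTAGWIDTH)
begin
  search : process (address, tags)
    variable found : std_logic;
    variable idx   : integer range 0 to BINS-1;
  begin
    found := '0';
    idx   := 0;
    for i in 0 to BINS-1 loop
      -- A bin hits when its stored tag equals the upper address bits;
      -- the lowest-numbered matching bin is reported.
      if found = '0' and tags((i+1)*W-1 downto i*W) = address(ADDRESSWIDTH-1 downto 8) then
        found := '1';
        idx   := i;
      end if;
    end loop;
    hit     <= found;
    hit_bin <= idx;
  end process search;
end architecture rtl;
```

Packing the tags into a single vector lets one loop scan any number of bins; the listing above instead uses eight separately named registers (a to h) wired individually to regcontrol.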
A7.9 I/O: VHDL Code
The main process, which reads data from and writes data to the PS/2 port, is shown below.
PROCESS(clk,reset)
BEGIN
IF reset='1' THEN
state<=waiting;
ELSIF clk'EVENT AND clk='0' THEN
CASE state IS
WHEN waiting =>
temp_data <= d_in;
count <= "0000";
IF (rw='0') THEN
state <= writeData;
ELSE
state <= readData;
END IF;
WHEN readData => -- read in the data bit by bit and increment counter
ShiftRead: FOR i IN 7 DOWNTO 1
LOOP
temp_data(i) <= temp_data(i-1);
END LOOP;
temp_data(0)<=data;
count <= count + 1;
IF count="0111" THEN
state <= readParity;
ELSE
state <= readData;
END IF;
WHEN readParity =>
state <= readStop;
WHEN readStop =>
state <= waiting;
WHEN writeData => -- shift temp_data by one bit per clock (bits shifted out are discarded) and increment counter
ShiftWrite: FOR i IN 7 DOWNTO 1
LOOP
temp_data(i) <= temp_data(i-1);
END LOOP;
count <= count + 1;
IF count="0111" THEN
state <= writeParity;
ELSE
state <= writeData;
END IF;
WHEN writeParity =>
state <= writeStop;
WHEN writeStop =>
state <= writeAck;
WHEN writeAck =>
state <= done;
WHEN done =>
state <= done;
END CASE;
END IF;
END PROCESS;
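A PS/2 frame consists of a start bit (always 0), eight data bits sent least significant bit first, an odd parity bit, and a stop bit (always 1); a host-to-device transfer is additionally followed by an acknowledge bit from the device. In the process above, the readParity and writeParity states only advance the state machine, so the parity value is neither checked nor produced there. A minimal sketch of the parity computation follows, assuming a hypothetical package named ps2_util; neither the package nor the function odd_parity is part of the original design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package; not part of the original design.
package ps2_util is
  -- Returns the PS/2 parity bit for a data byte. PS/2 uses odd parity:
  -- the data bits plus the parity bit must contain an odd number of '1's.
  function odd_parity(d : std_logic_vector) return std_logic;
end package ps2_util;

package body ps2_util is
  function odd_parity(d : std_logic_vector) return std_logic is
    variable p : std_logic := '1';  -- starting at '1' makes x"00" give '1'
  begin
    for i in d'range loop
      p := p xor d(i);  -- toggle once for every '1' in the byte
    end loop;
    return p;
  end function odd_parity;
end package body ps2_util;
```

For example, odd_parity(x"01") returns '0', because the single '1' in the data already makes the count odd. Since parity does not depend on bit order, the bit sampled in readParity could be compared with odd_parity(temp_data). On the write side, writeData discards bits as it shifts, so the parity would have to be computed from the byte before shifting starts, for instance when d_in is captured in the waiting state.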
Bibliography
[1] Xilinx Inc., “The Future of FPGAs”, [Online document], 1999 Apr 1, Available HTTP:
http://www.xilinx.com/prs_rls/5yrwhite.htm
[2] B. Kastrup, A. Bink and J. Hoogerbrugge, “ConCISe: A Compiler-Drived CPLD-Based Instruction
Set Accelerator,” in Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable
Custom Computing Machines, K. Pocek and J. Arnold. Los Alamitos, California: IEEE Computer
Society Press, 1999, pp. 92-102.
[3] A. Chien, “Safe and Protected Execution in the Morph/AMRM Reconfigurable Processor,” in
Proceedings of the Seventh Annual IEEE Symposium on Field-Programmable Custom Computing
Machines, K. Pocek and J. Arnold. Los Alamitos, California: IEEE Computer Society Press, 1999,
pp. 209-221.
[4] J. Hauser and J. Wawrzynek,”Garp: A MIPS Processor with a Reconfigurable Coprocessor,” in
Proceedings of the IEEE Symposium on FPGAs for Custom Computing Machines, K. Pocek and J.
Arnold. Los Alamitos, California: IEEE Computer Society Press, 1997, pp. 12-21.
[5] D. Robinson, and Patrick Lysaght, “Modelling and Synthesis of Configuration Controllers for
Dynamically Reconfigurable Logic Systems Using the DCS CAD Framework,” in
Field-Programmable Logic and Applications, P. Lysaght, J. Irvine, and R. Hartenstein. Berlin,
Germany: Springer, 1999, pp.41-50.
[6] G. McGregor, and Patrick Lysaght, “Self Controlling Dynamic Reconfiguration: A Case Study,” in
Field-Programmable Logic and Applications, P. Lysaght, J. Irvine, and R. Hartenstein. Berlin,
Germany: Springer, 1999, pp.144-154.
[7] R. Meier, “Rapid Prototyping of a RISC Architecture for Implementation in FPGAs,” in Proceedings
of the IEEE Symposium on FPGAs for Custom Computing Machines, P. Athanas and K. Pocek. Los
Alamitos, California: IEEE Computer Society Press, 1995, pp. 190-196.
[8] V. C. Hamacher, Z.G. Vranesic and S.G. Zaky, Computer Organization, Fourth Edition New York,
New York: McGraw-Hill, 1996.
[9] T. Engdahl, “PC Mouse Info”, [Online Document], 1999, Aug 13, Available HTTP:
http://www.hut.fi/Misc/Electronics/docs/pc/mouse.html
[10] A. Chapweske, “The PS/2 Mouse/Keyboard Protocal”, [Online Document], 2000 Oct 13, Available
HTTP: http://panda.cs.ndsu/nodak.edu/~achapwes/PICmicro/PS2/ps2.htm
[11] A. Chapweske, “The AT Keyboard”, [Online Document] 2000, Nov 10, Available HTTP:
http://panda.cs.ndsu.nodak.edu/~achapwes/PCImicro/keyboard/atkeyboard.htm