party cores, and the same GNR description is used for compilation, simulation and RTL generation. Since the designer does not describe the controller in our approach, GNR descriptions are much shorter than descriptions in other ADLs. We have developed a web-based interface for our toolset, so that users can upload and evaluate new architectures described in GNR. Our compiler supports various architectural features such as controller/datapath pipelining, multi-cycle/pipelined units, and heterogeneous forwarding paths. The compilation algorithm and the datapath optimizations have been discussed in [9] and [10], respectively.

The rest of the paper is organized as follows. Sections 2 and 3 explain the GNR modeling approach and its syntax. Section 4 discusses the details of GNR using several examples. Section 5 presents the flow of our tools, followed by experimental results in Section 6. Section 7 presents related work and Section 8 concludes the paper.

2. GNR MODELING APPROACH
GNR models a system as a hierarchical description of components (objects) and their connections (composition). GNR contains a set of predefined component and port types. These types are used for enforcing the composition rules. A typical system consists of several RTL components and processing elements (PEs). The behavior of each PE is captured in the C language. In GNR, the PEs are represented by components of type behavioralIP. A behavioralIP may contain a custom datapath that is captured by a component of type NiscArchitecture. The NiscArchitecture contains basic RTL components that are used by our compiler. Figure 1 shows a simple example of a system with two PEs (BIP1, BIP2), a bus, and an arbiter. BIP2 is implemented by a programmable NISC and has a control memory (Cmem) and a data memory (Dmem). In the rest of this section, we present the details of the GNR objects and composition rules.

Figure 1- A sample system in GNR.

2.1 GNR formalism
In GNR, a component x is represented by the tuple (τx, Px, Cx, Lx, Ax), where τx is the component's type, Px is the set of ports, Cx is the set of components inside x, Lx is the set of its internal point-to-point connections, and Ax is the list of aspects that describe the behavior of x for the different tools in the toolset. Component type τx is defined as follows:

τx ∈ T, T = {register, register-file, bus, mux, tri-state buffer, functional-unit, memory-proxy, controller, NiscArchitecture, behavioralIP, module, system}

where NiscArchitecture, behavioralIP, module, system, and controller are hierarchical components and contain an internal netlist, while the others are basic RTL components with no internal netlist.

Each port p in Px has a bit-width βp and a type θp defined as follows:

θp ∈ {clkPort, ctrlPort, inPort, outPort, cwPort}

Type clkPort indicates that the port is a clock, and type ctrlPort indicates that the port is used to control the component. For example, a register has one port of each type clkPort, inPort, outPort, and ctrlPort (i.e. load enable). Type cwPort means the port is a control-word port and is used to drive the control ports of the components in the NiscArchitecture (see Section 2.2).

The set of connections Lx is defined between a bit-slice of a port p1 and an equally sized bit-slice of a port p2 as follows:

$$L_x = \{(p_1, p_2, s_1, e_1, s_2, e_2) \mid p_1, p_2 \in P_x \cup \bigcup_{y \in C_x} P_y,\ 0 \le s_1 \le e_1 < \beta_{p_1},\ 0 \le s_2 \le e_2 < \beta_{p_2},\ e_1 - s_1 = e_2 - s_2\}$$

where s1 and s2 are the start indices, and e1 and e2 are the end indices, of the slices of p1 and p2, respectively.

Ax is a list of aspects required by different tools for processing component x. Aspects are defined based on component types. Currently, in our toolset, each component has three aspects: the compilation aspect CAx, the simulation aspect MAx, and the synthesis aspect NAx. The compilation aspect usually captures the relation between the component's behavior and C-language operations or application functions. The simulation and synthesis aspects usually contain the description of the component in an HDL, or the information required for generating a hardwired core (e.g. memory, divider, etc.). For some component types, if an aspect is not specified by the designer, the toolset generates it automatically. For example, the simulation/synthesis aspects of hierarchical components can either be generated automatically from their internal components or be explicitly specified by the designer. This feature allows modeling of third-party cores and pre-laid-out components that have special technology or manufacturability considerations. Aspects are also used in defining proxy components in a NiscArchitecture. A proxy component is a component that resides outside of the IP block but is controlled by the IP. For example, a memory proxy represents a memory or cache hierarchy that resides outside of the IP. The HDL implementation of a proxy may be as simple as wiring inputs to outputs. However, its compiler aspect captures the information for controlling the external component. The NiscArchitecture and behavioralIP component types have additional properties as follows:

NiscArchitecture: The NiscArchitecture represents our target architecture, which does not have an instruction set; its control words are generated by the cycle-accurate compiler. The compiler aspect of a NiscArchitecture ξ is modeled by CAξ = (freqξ, CNSTξ, Γξ, sPtξ, fPtξ). The freqξ specifies the clock frequency of the NiscArchitecture and is used by the compiler to generate the proper control words considering the component delays. A control word contains the control values of components as well as a set of constant fields CNSTξ. The constant fields are used for jumps and other operations with a constant operand. Each constant field f in CNSTξ has a bit-width, or size, denoted by βf. Γξ is a function that defines the ordering of the constant and control fields in the control word. This ordering is used by the compiler to generate the correct control words. The sPtξ and fPtξ are the storage components used for the stack pointer and frame pointer. The storage components can be separate registers or registers in a register file.

BehavioralIP: A behavioralIP is a component whose behavior is specified in the C language and is handled by our cycle-accurate compiler, a traditional compiler, or a high-level synthesis (HLS) tool. The compiler aspect of the behavioralIP specifies the set of application files (e.g. header files and C files) that execute on that IP. In our approach, the netlist of a behavioralIP contains a NiscArchitecture and, if necessary, a memory subsystem (Figure 1). The cycle-accurate compiler compiles the application C code directly onto the datapath of the NiscArchitecture. The behavioralIP can cover instruction-set based general-purpose or custom processors as well, in which case the synthesis aspect is usually a third-party core.
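The component tuple and the connection set Lx of Section 2.1 map naturally onto simple data structures. The Python sketch below is only an illustration under our own assumptions: the classes Port and Component and the functions visible_ports and is_valid_link are hypothetical names, the aspects Ax are omitted, and a connection is checked purely against the three conditions in the definition of Lx (both ports belong to Px or to some Py with y ∈ Cx, the slice indices lie within the port widths, and the two slices have equal width).

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)  # ports are compared by identity, not by field values
class Port:
    name: str
    width: int   # beta_p
    ptype: str   # theta_p in {clkPort, ctrlPort, inPort, outPort, cwPort}


@dataclass(eq=False)
class Component:
    """Hypothetical model of x = (tau_x, P_x, C_x, L_x, A_x); aspects omitted."""
    name: str
    ctype: str                                                 # tau_x
    ports: List[Port] = field(default_factory=list)            # P_x
    children: List["Component"] = field(default_factory=list)  # C_x
    # L_x: (p1, p2, s1, e1, s2, e2) bit-slice connections
    links: List[Tuple[Port, Port, int, int, int, int]] = field(default_factory=list)


def visible_ports(x: Component) -> List[Port]:
    """P_x together with the ports of every direct sub-component y in C_x."""
    return list(x.ports) + [p for y in x.children for p in y.ports]


def is_valid_link(x: Component, link) -> bool:
    """Membership test for the set L_x defined in Section 2.1."""
    p1, p2, s1, e1, s2, e2 = link
    visible = visible_ports(x)
    if p1 not in visible or p2 not in visible:
        return False
    return (0 <= s1 <= e1 < p1.width and 0 <= s2 <= e2 < p2.width
            and e1 - s1 == e2 - s2)


# Example: bits 7..0 of a 32-bit register-file output feed an 8-bit ALU input.
rf = Component("rf", "register-file", ports=[Port("r1", 32, "outPort")])
alu = Component("alu", "functional-unit", ports=[Port("a", 8, "inPort")])
dp = Component("dp", "NiscArchitecture", children=[rf, alu])
r1, a = rf.ports[0], alu.ports[0]

print(is_valid_link(dp, (r1, a, 0, 7, 0, 7)))    # True
print(is_valid_link(dp, (r1, a, 0, 15, 0, 15)))  # False: e2 = 15 exceeds beta_a - 1
print(is_valid_link(dp, (r1, a, 0, 7, 1, 7)))    # False: slice widths differ
```

Using identity equality for ports mirrors the formalism, where two ports with the same name and width on different components are still distinct elements of the port sets.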
2.2 GNR Rules
Our formal and typed description allows us to define rules to validate the correctness of the given netlist. Enforcing such rules significantly improves the productivity of the designer by identifying most of the problems without simulation. Depending on the component type, the rules can restrict the number and types of the ports, the instantiated components, and their connectivity. There are two groups of rules: general rules and NISC-specific rules.

General rules:
• Clock ports can only connect to clock ports:
$$\forall (p_1, p_2, \ldots) \in L_x:\ \theta_{p_1} = \text{clkPort} \iff \theta_{p_2} = \text{clkPort}$$
• Connections in Lx are defined between source ports (i.e. outPort) and destination ports (i.e. inPort). For boundary connections (i.e. the connections that involve ports in Px), the input ports of Px must be sources and its output ports must be destinations.
• At most one connection is allowed to any bit of any destination port. The only exception is for input ports of bus-type components, where multiple connections are valid. In digital design, connecting several output ports to a single input port is not valid, unless it is done through tri-state buffers. For any two distinct connections in Lx:
$$\forall (p_1,p_2,s_1,e_1,s_2,e_2) \ne (p_3,p_4,s_3,e_3,s_4,e_4) \in L_x:\ p_2 = p_4 \Rightarrow (\exists y \in C_x: p_2 \in P_y \wedge \tau_y = \text{bus}) \vee (s_2 > e_4) \vee (s_4 > e_2)$$

NISC-specific rules:
• Each NiscArchitecture ξ has one and only one component of type controller:
$$\exists!\, x \in C_\xi:\ \tau_x = \text{controller}$$
• Only a component x with τx = controller can have a port of type cwPort, and it has exactly one:
$$\big(\exists!\, p \in P_x:\ \theta_p = \text{cwPort}\big) \iff \tau_x = \text{controller}$$
• Each NiscArchitecture ξ has at least one component of type register-file:
$$\exists\, x \in C_\xi:\ \tau_x = \text{register-file}$$
• In NiscArchitecture ξ, the bit-width of the cw port of the controller component c must be equal to the sum of the bit-widths of all control ports CPξ (the ports with θp = ctrlPort of the components in ξ), plus the sum of the bit-widths of all constant fields in CNSTξ:
$$\forall cw \in P_c:\ \theta_{cw} = \text{cwPort} \Rightarrow \beta_{cw} = \sum_{p \in CP_\xi} \beta_p + \sum_{f \in CNST_\xi} \beta_f$$

3. GNR SYNTAX
We use XML [12] to describe IP models in GNR. We define the GNR syntax in XML Schema [13] to enforce syntax and semantics checking on the given input model. The Schema can also be used for code completion, which further increases the productivity of the designers. Figure 2 shows the partial block diagram of the Schema for modeling a custom IP (NiscArchitecture). The IP has several child tags, including <Ports>, <Components>, <Connections>, <CwFields>, <Compiler-aspect>, <Simulation-aspect>, and <Synthesis-aspect>, representing Pξ, Cξ, Lξ, Γξ, CAξ, MAξ, and NAξ, respectively. All components in GNR have a <Params> tag that parameterizes that component. For example, the delay or bit-width of the component can be specified as parameters.

Figure 2- Block diagram of GNR schema for NiscArchitecture.

4. EXAMPLE GNR MODELS
In this section, we discuss modeling IPs in more detail using several examples. We first explain how a simple component, namely an ALU, is defined in GNR. Then, we explain how components are integrated to form a simple IP that can execute C code. Finally, we show how this IP is extended to a system.

4.1 Modeling a custom ALU
The ALU is a component of type functional-unit. Figure 3 shows the GNR description of a custom ALU that executes three operations: Add, Sub, and Not. The component has two parameters: BIT_WIDTH and DELAY. The parameters are initialized when the component is instantiated in a datapath. This ALU has two input ports, one output port, and a control port. Since this ALU executes three operations, the ctrl port must be at least two bits wide. The simulatable and synthesizable code of the ALU is described in the <Simulation-aspect> and <Synthesis-aspect> tags (not shown in the figure). For some components, it is also possible to generate the HDL description automatically from the component entity information and the compiler aspect.

Figure 3- Partial description of a custom ALU in GNR.

In <Compiler-aspect>, the operations that the ALU executes are described in detail. Each operation has a name and a delay attribute: the name is selected from the list of valid C operations, and the delay is specified either in number of cycles or in nanoseconds, according to the selected target technology. Each operation has a set of input ports and at most one output port. An operation may also require a specific value on one or more control ports. These values are specified using the <Ctrl> tag. Using this modeling approach, new functional units can be described and added to the library.
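The compiler aspect of Section 4.1 can be made concrete with a small hypothetical fragment. The sketch below assumes an XML layout of our own: only <Params>, <Compiler-aspect>, and <Ctrl> are named in the text, while <Param>, <Operation>, all attribute names, and the nanosecond delay values are invented for illustration. It reads the operations with Python's xml.etree.ElementTree, derives the minimum ctrl-port width (three operations need ⌈log2 3⌉ = 2 bits, matching the text), and converts each delay into whole cycles at an assumed clock frequency, which is one plausible way a compiler could use freqξ.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical GNR fragment; tag and attribute names other than <Params>,
# <Compiler-aspect>, and <Ctrl> are assumptions made for this sketch.
ALU_XML = """
<Component name="ALU" type="functional-unit">
  <Params>
    <Param name="BIT_WIDTH" value="32"/>
    <Param name="DELAY" value="3.5"/>
  </Params>
  <Compiler-aspect>
    <Operation name="+" delay="3.5"><Ctrl port="ctrl" value="0"/></Operation>
    <Operation name="-" delay="3.5"><Ctrl port="ctrl" value="1"/></Operation>
    <Operation name="~" delay="2.0"><Ctrl port="ctrl" value="2"/></Operation>
  </Compiler-aspect>
</Component>
"""


def load_operations(xml_text):
    """Return (C operator, delay in ns, {ctrl port: value}) for each operation."""
    root = ET.fromstring(xml_text.strip())
    ops = []
    for op in root.iter("Operation"):
        ctrl = {c.get("port"): int(c.get("value")) for c in op.iter("Ctrl")}
        ops.append((op.get("name"), float(op.get("delay")), ctrl))
    return ops


def min_ctrl_width(n_ops):
    """Fewest ctrl bits giving every operation a distinct code (0 for one op)."""
    return (n_ops - 1).bit_length()


def delay_to_cycles(delay_ns, freq_mhz):
    """Whole clock cycles covering delay_ns at freq_mhz (period = 1000/freq_mhz ns)."""
    return math.ceil(delay_ns * freq_mhz / 1000.0)


ops = load_operations(ALU_XML)
print(min_ctrl_width(len(ops)))                                # 2
print({name: delay_to_cycles(d, 400) for name, d, _ in ops})   # {'+': 2, '-': 2, '~': 1}
```

At an assumed 400 MHz clock the period is 2.5 ns, so the 3.5 ns add and subtract span two cycles while the 2.0 ns Not fits in one.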
Some functional units are more complex than others. For example, some of them are pipelined, or may require instantiation of hardwired cores provided by a third party. In the case of a pipelined unit, a netlist of the main functional unit and the pipeline registers is defined as a module in GNR. Most of today's synthesis tools apply retiming to this netlist and generate a properly pipelined functional unit. In the case of hardwired cores, the information about the third-party tool that must be called for core generation is specified in <Synthesis-aspect>.

4.2 Modeling a simple IP
Figure 4(a) shows the block diagram of a simple NiscArchitecture that can execute simple C code. The architecture consists of a controller, a register file (RF), a data memory proxy, an ALU, a comparator, and a few multiplexers. The bus-width of the IP is 32 bits. The register file has 32 registers, two read ports, and one write port.

In this IP, suppose that a constant field of 10 bits is used for operations with a constant operand. Figure 5 shows the GNR description of the IP. The IP has one clock port, a reset port, and several IO ports for communicating with the data memory unit. The <Netlist> tag shows the components and connections of the IP. For each instantiated component, the proper parameters, such as BIT_WIDTH and REG_COUNT, are initialized. Thirty-four connections are defined for this IP. Each connection determines the source component src, the source port sPort, the destination component dest, and the destination port dPort. Of these connections, 19 are shown in Figure 5, and the rest are clock and control connections.

In <Compiler-aspect>, the ordering of the control fields is specified by listing the fields in the <CwFields> tag. This information is used by the compiler for generating the control words. In this architecture, the total bit-width of the control ports is 35 bits, and the constant width is 10 bits. Therefore, the bit-width of the control words is 35 + 10 = 45 bits.
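The 45-bit figure is simply the cw-width rule of Section 2.2 applied to 35 control bits and one 10-bit constant field. The following Python sketch shows how a field ordering such as Γξ could be used to pack and unpack a control word. It rests on assumptions: the field names, their individual widths, and the MSB-first placement are hypothetical (Figure 5 is not reproduced here); only the totals of 35 control bits and 10 constant bits come from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical <CwFields> ordering (Gamma_xi). Names and widths are illustrative
# only, chosen so that the control fields total 35 bits and the constant 10 bits.
CW_FIELDS: List[Tuple[str, int, str]] = [
    ("const", 10, "constant"),
    ("alu_ctrl", 2, "control"),
    ("cmp_ctrl", 3, "control"),
    ("rf_we", 1, "control"),
    ("rf_waddr", 5, "control"),
    ("rf_raddr1", 5, "control"),
    ("rf_raddr2", 5, "control"),
    ("mux_sel", 4, "control"),
    ("mem_ctrl", 2, "control"),
    ("pc_ctrl", 8, "control"),
]


def cw_width(fields) -> Tuple[int, int, int]:
    """Return (control bits, constant bits, total control-word bits)."""
    control = sum(w for _, w, kind in fields if kind == "control")
    constant = sum(w for _, w, kind in fields if kind == "constant")
    return control, constant, control + constant


def pack(fields, values: Dict[str, int]) -> int:
    """Concatenate field values in the listed order, first field in the MSBs."""
    word = 0
    for name, width, _ in fields:
        v = values.get(name, 0)  # unspecified fields default to 0
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word


def unpack(fields, word: int) -> Dict[str, int]:
    """Inverse of pack: slice the word back into named fields."""
    values = {}
    for name, width, _ in reversed(fields):
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


print(cw_width(CW_FIELDS))  # (35, 10, 45)
word = pack(CW_FIELDS, {"const": 0x3FF, "alu_ctrl": 1, "rf_we": 1})
print(word >> 35 == 0x3FF)  # True: the constant occupies the top 10 bits
```

Keeping the ordering as an explicit list means that reordering fields, as a change to <CwFields> would, changes only the data and not the packing code.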
component, then it is possible to detect and report the problem. Note that such connections are valid in HDLs, but they result in incorrect design behavior. Using such simple checks in GNR, most architecture problems are identified quickly.

System4 that includes MicroBlaze, OPB bus, bridge, DHS bus, and three custom IPs (one DCT and two IMDCTs).
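Rule checks of this kind are cheap once the netlist is typed. As a minimal sketch, assuming a simplified netlist representation of our own (the classes Port and Comp and the functions label and check_nisc are hypothetical and not part of the GNR toolset), the Python code below checks a subset of the Section 2.2 rules for one NiscArchitecture: the clock rule, the single-driver rule with its bus exception, and the four NISC-specific rules. The source/destination direction rule is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity semantics, so ports can be used as dict keys
class Port:
    name: str
    width: int                      # beta_p
    ptype: str                      # theta_p, e.g. "inPort", "clkPort"
    owner: Optional["Comp"] = None  # owning sub-component (None = boundary port)


@dataclass(eq=False)
class Comp:
    name: str
    ctype: str                      # tau_x, e.g. "controller", "bus"
    ports: List[Port] = field(default_factory=list)

    def add(self, name, width, ptype):
        p = Port(name, width, ptype, self)
        self.ports.append(p)
        return p


def label(p):
    return f"{p.owner.name}.{p.name}" if p.owner else p.name


def check_nisc(children, connections, const_widths):
    """Check a subset of the Section 2.2 rules for one NiscArchitecture.

    children: C_xi; connections: L_xi as (p1, p2, s1, e1, s2, e2) tuples;
    const_widths: beta_f for every constant field f in CNST_xi.
    """
    errors = []
    driven = {}  # destination port -> bit indices already driven
    for p1, p2, s1, e1, s2, e2 in connections:
        # General rule: clock ports connect only to clock ports.
        if (p1.ptype == "clkPort") != (p2.ptype == "clkPort"):
            errors.append(f"clock mismatch: {label(p1)} -> {label(p2)}")
        # General rule: one driver per destination bit, except bus input ports.
        if p2.owner is not None and p2.owner.ctype == "bus":
            continue
        bits = set(range(s2, e2 + 1))
        seen = driven.setdefault(p2, set())
        if seen & bits:
            errors.append(f"multiple drivers on {label(p2)}")
        seen |= bits
    # NISC rule: exactly one controller.
    controllers = [c for c in children if c.ctype == "controller"]
    if len(controllers) != 1:
        errors.append(f"expected one controller, found {len(controllers)}")
    # NISC rule: exactly one cwPort on the controller, none elsewhere.
    for c in children:
        n_cw = sum(p.ptype == "cwPort" for p in c.ports)
        if (c.ctype == "controller") != (n_cw == 1) or n_cw > 1:
            errors.append(f"{c.name}: bad number of cwPorts ({n_cw})")
    # NISC rule: at least one register file.
    if not any(c.ctype == "register-file" for c in children):
        errors.append("no register-file component")
    # NISC rule: beta_cw = sum of control-port widths + sum of constant widths.
    expected = sum(p.width for c in children for p in c.ports
                   if p.ptype == "ctrlPort") + sum(const_widths)
    for c in controllers:
        for p in c.ports:
            if p.ptype == "cwPort" and p.width != expected:
                errors.append(f"{label(p)}: width {p.width}, expected {expected}")
    return errors


clk = Port("clk", 1, "clkPort")  # boundary clock input of the IP
ctrl, rf, alu = Comp("ctrl", "controller"), Comp("rf", "register-file"), Comp("alu", "functional-unit")
cw = ctrl.add("cw", 13, "cwPort")  # 2 (alu.ctrl) + 1 (rf.we) + 10 (constant)
clk_c, clk_r = ctrl.add("clk", 1, "clkPort"), rf.add("clk", 1, "clkPort")
we, w = rf.add("we", 1, "ctrlPort"), rf.add("w", 32, "inPort")
r1, r2 = rf.add("r1", 32, "outPort"), rf.add("r2", 32, "outPort")
a, b = alu.add("a", 32, "inPort"), alu.add("b", 32, "inPort")
out, op = alu.add("out", 32, "outPort"), alu.add("ctrl", 2, "ctrlPort")
links = [(clk, clk_c, 0, 0, 0, 0), (clk, clk_r, 0, 0, 0, 0),
         (r1, a, 0, 31, 0, 31), (r2, b, 0, 31, 0, 31), (out, w, 0, 31, 0, 31),
         (cw, op, 0, 1, 0, 1), (cw, we, 2, 2, 0, 0)]
print(check_nisc([ctrl, rf, alu], links, [10]))  # []
print(check_nisc([ctrl, rf, alu], links + [(out, a, 0, 31, 0, 31)], [10]))
# ['multiple drivers on alu.a']
```

The second call adds a second driver to alu.a; an HDL would accept this wiring, but the single-driver rule reports it before any simulation is run.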
we changed one system to create the next, we did not need to rewrite the whole description. The number of modified lines of code in each step is shown in the fifth column of the table. For example, when generating System3 from System2, we reused the IP description and only needed to modify the system description to instantiate and connect it (40 lines). The last two columns of the table show the size of the Verilog and other core-related files that are generated automatically. Note that, while the GNR descriptions are only a few hundred lines of code, the generated files are several thousand lines. This shows the productivity gain of using GNR.

Overall, we could perform different system-level (coarse-grained) and IP-level (fine-grained) architecture explorations using relatively small GNR descriptions. The productivity gain was due to several factors, including parametrizable component descriptions, static rule checking, and automatic compilation and RTL generation for the custom IPs. Since GNR enabled us to make detailed architectural adjustments, we were able to achieve significant performance improvement while meeting the area constraints.

7. RELATED WORKS
Over the past years, several ADLs and their supporting software tools have been introduced. A complete survey of these ADLs can be found in [1], [2]. Among these ADLs, only the following have directly or indirectly addressed synthesis of the architecture.

LISA [3], a state-of-the-art commercial product, and EXPRESSION [4] are behavioral ADLs that capture a processor in terms of its instruction-set behavior and a high-level block diagram of its pipeline. They were originally designed for compilation and simulation and have recently been extended to generate the RTL of the processor by synthesizing the instruction behaviors. Since instruction behaviors are described at a very high abstraction level so that they can be used by the compiler, high-quality synthesis is less likely to be achieved in these approaches. Furthermore, the designer has no control over the details of the final implementation and is limited to describing the functionality of instructions. Since these ADLs are behavioral, they must capture all possible configurations of instructions, which can lead to very lengthy descriptions. For example, in LISA the descriptions of two RISC processors with four and seven pipeline stages have been reported to be more than 2000 and more than 9000 lines of code, respectively [8].

UDL/I [5] is a hardware description language (HDL) that captures the architecture at the register-transfer (RT) level. A target-specific compiler can be generated based on the instruction set extracted from the UDL/I description. UDL/I cannot support architectures with any instruction-level parallelism.

MIMOLA [6] is another HDL that captures the architecture netlist at the RT level and is used for hardware synthesis, simulation, test generation, and code generation. The RECORD compiler [7] extracts a behavioral model of instructions from the MIMOLA HDL description. It processes the structure of the datapath from destination storages towards source storages to extract valid register transfers (RTs). After analyzing the controller and the instruction decoder, it rejects illegal RTs that do not correspond to an instruction and uses the remaining RTs in the compiler. MIMOLA does not support pipelined architectures and assumes single-cycle operations. Furthermore, the designer must describe the instruction decoder, from which the compiler extracts the set of valid operations. Although RT-level descriptions are more familiar to hardware designers, describing the instruction decoder at the RT level is very tedious. Also, instruction-set extraction from an RT-level description is very difficult and is typically possible only for a limited target scope.

8. CONCLUSION AND FUTURE WORK
In this paper we presented GNR, a formal modeling approach developed to improve the productivity of designing systems and processing elements, in the same way that traditional ADLs improved productivity for designing processors. GNR captures a system as a hierarchical netlist of components annotated by compilation, simulation and synthesis aspects. Our tools and GNR improve the productivity of system design by means of parametrizable component descriptions, static rule checking, and automatic compilation and RTL generation for the custom PEs.

Furthermore, GNR enhances the designer's control over structural details of the design and hence improves design predictability. Using relatively short GNR descriptions, we explored several designs for implementing an MP3 decoder and achieved a 3.25x speedup compared to the MicroBlaze processor. Future work will address TLM-to-GNR translation.

9. REFERENCES
[1] P. Mishra and N. Dutt, “Architecture Description Languages for Programmable Embedded Systems”, IEE Proc. on Computers and Digital Techniques (CDT), Special issue on Embedded Microelectronic Systems: Status and Trends, vol. 152, no. 3, 2005.
[2] W. Qin and S. Malik, “Architecture Description Languages for Retargetable Compilation”, in The Compiler Design Handbook: Optimizations & Machine Code Generation, Y. N. Srikant and Priti Shankar (eds.), CRC Press, 2002.
[3] A. Hoffmann, T. Kogel, A. Nohl, G. Braun, O. Schliebusch, A. Wieferink, and H. Meyr, “A Novel Methodology for the Design of Application Specific Instruction Set Processors (ASIP) Using a Machine Description Language”, IEEE Transactions on Computer-Aided Design, 20(11):1338–1354, Nov. 2001.
[4] P. Mishra, A. Kejariwal, and N. Dutt, “Synthesis-driven Exploration of Pipelined Embedded Processors”, International Conference on VLSI Design, 2004.
[5] H. Akaboshi, “A Study on Design Support for Computer Architecture Design”, Doctoral Thesis, Dept. of Information Systems, Kyushu Univ., Japan, Jan. 1996.
[6] R. Leupers and P. Marwedel, “Retargetable Code Generation based on Structural Processor Descriptions”, Design Automation for Embedded Systems, vol. 3, no. 1, 1998.
[7] R. Leupers and P. Marwedel, “Retargetable Generation of Code Selectors from HDL Processor Models”, European Design and Test, 1997.
[8] A. Chattopadhyay, D. Kammler, E. Witte, O. Schliebusch, H. Ishebabi, B. Geukes, R. Leupers, and G. Ascheid, “Automatic Low Power Optimizations during ADL-driven ASIP Design”, VLSI-DAT, 2006.
[9] M. Reshadi and D. Gajski, “A Cycle-Accurate Compilation Algorithm for Custom Pipelined Datapaths”, CODES+ISSS, 2005.
[10] B. Gorjiara and D. Gajski, “Custom Processor Design Using NISC: A Case-Study on DCT algorithm”, ESTIMEDIA, 2005.
[11] http://www.cecs.uci.edu/~nisc
[12] XML: http://www.w3.org/XML/
[13] XML Schema: http://www.w3.org/XML/Schema
[14] http://www.underbit.com/products/mad/
[15] H. Cho, S. Abdi, and D. Gajski, “Design and Implementation of Transducer for ARM-TMS Communication”, In Proc. ASPDAC, Design Contest, 2006.
[16] B. Gorjiara, M. Reshadi, and D. Gajski, “NISC Communication Interface”, Center for Embedded Computer Systems (CECS) Technical Report TR 06-05, 2006.