A Network on Chip Architecture and Design Methodology

Shashi Kumar (1), Axel Jantsch (1), Juha-Pekka Soininen (2), Martti Forsell (2), Mikael Millberg (1), Johny Öberg (1), Kari Tiensyrjä (2) and Ahmed Hemani (3)

(1) Laboratory of Electronics and Computer Systems, Department of Microelectronics and Information Technology, Royal Institute of Technology, 164 40 Kista, Stockholm, Sweden
(2) VTT Electronics, Box 1100, Oulu, FIN-90571, Finland
(3) Spirea AB, Kista Science Park, Electrum 209, S-164 40, Stockholm

Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI’02)
0-7695-1486-3/02 $17.00 © 2002 IEEE

Abstract

We propose a packet switched platform for single chip systems which scales well to an arbitrary number of processor like resources. The platform, which we call Network-on-Chip (NOC), includes both the architecture and the design methodology.

The NOC architecture is an m × n mesh of switches, and resources are placed on the slots formed by the switches. We assume a direct layout of the 2-D mesh of switches and resources providing physical-architectural level design integration. Each switch is connected to one resource and four neighboring switches, and each resource is connected to one switch. A resource can be a processor core, memory, an FPGA, a custom hardware block or any other intellectual property (IP) block, which fits into the available slot and complies with the interface of the NOC. The NOC architecture essentially is the on-chip communication infrastructure comprising the physical layer, the data link layer and the network layer of the OSI protocol stack. We define the concept of a region, which occupies an area of any number of resources and switches. This concept allows the NOC to accommodate large resources such as large memory banks, FPGA areas, or special purpose computation resources such as high performance multi-processors.

The NOC design methodology consists of two phases. In the first phase a concrete architecture is derived from the general NOC template. The concrete architecture defines the number of switches and shape of the network, the kind and shape of regions and the number and kind of resources. The second phase maps the application onto the concrete architecture to form a concrete product.

1. Introduction

Current algorithm on chip and system on chip design methodologies cannot respond to the needs of the billion-transistor era. The design would take too much time and mapping of applications to dedicated architectures would be impossible. The possible solutions must be searched from platform based design and computer system design, which rely on the reuse of components, architectures, applications and implementations. The essential issue is the trade-off between generality and performance. Generality provides reusability of hardware, operating systems and development practices, while performance (delay, cost, power, etc.) is achieved by using application specific structures.

We propose a NOC platform, consisting of architecture and design methodology, which scales from a few dozen to several hundred or even thousands of resources. A resource may be a processor core, a DSP core, an FPGA block, a dedicated HW block, a mixed signal block, or a memory block of any kind such as RAM, ROM or CAM. We base this proposal on three assumptions:
1. Moore's law will continue to hold for another five to 15 years. In that case our platform should prove useful in the time period 2005-2015 [1].
2. Single processors will not be able to utilize the transistors of an entire chip. Single synchronous clock regions will span only a small fraction of the chip area [16, 2, 3].
3. Applications will be modeled as a large number of communicating tasks. The different tasks may have very different characteristics (e.g. control or data flow dominated) and origins (most of them are reused from earlier products or from external sources) [4]. This will make a heterogeneous implementation with different kinds of resources for different tasks the most cost effective solution.

From this we conclude that a large number of different kinds of blocks, each of the size of a few hundred thousand gates, will constitute the computational resources. They have to be connected efficiently.

The increasing non-recurring cost of these chips requires that the design cost of chips be shared across applications. Furthermore, the same or different variants of the same application have to be mapped onto different variants of
the product, each establishing a different solution of the cost/performance/functionality trade-off. If this can be done quickly and cost effectively, many product versions for various market niches can be supported. Physical level and architectural level design integration will be very useful for this. This implies that physical layout and implementation issues are kept in mind while taking architectural decisions, or the architectural design is carried out within constraints of physical size and a floor plan.

The proposed NOC platform would effectively separate the specification of inter-task communication from the implementation of that communication; separate the design, implementation and verification of individual tasks from the rest of the application (a precondition for task reuse); separate the development, optimization and verification of the individual resource from the network infrastructure. We argue that the consequent separation of different concerns is a way to develop high-performance, cost-effective products while boosting design productivity.

This is not the place to speculate about the kind of products to be expected within five to ten years. However, we assume that the future devices will have the following requirements and features:
1. Processing of multiple ultra high data rate (> 100 MB/sec) streams of data including audio and video data. The devices will be required to store this data and process it in real time.
2. Devices will be multi-functional. The functionality could be a mix of entertainment (like games, music instruments), communication, remote control, surveillance etc.
3. Devices will have high-capacity wire line or more likely wireless interfaces to standard networks like telephone network, Internet, and will need to be able to handle multiple communication protocols simultaneously.
4. Security and secrecy of data stored and flowing through these devices will become important.

Clearly, a NOC based design will not always be the preferred solution for all kinds of applications. We expect that NOC based designs will provide good solutions for flexible products that should be reconfigurable and programmable; for designs which are the basis for several product variants; for applications with a heterogeneous task mix; for applications with stringent time to market requirements; for products where reuse both at the block and the function and feature level is considered valuable.

The design costs can be justified by increasing the implementation volumes and it is likely that the billion-transistor chips are not designed for single product instances or single applications. The design methodology must therefore support product family management. Tolerance of incomplete specifications, management of configurations and modifications, support for multiple languages and methods, and capability to handle different abstraction levels simultaneously are desirable characteristics.

Verification and testing are ever increasing challenges in today's design routines. With every new technology generation they are becoming more pressing. We argue that the NOC platform effectively addresses these challenges by separating the computation resources from each other and from the communication network for all issues of design, verification and testing.

In section 2, we list some other research work related to complex system design on a chip. In section 3 we describe the basic ideas and concepts of our proposed NOC architecture. In section 4 we describe the principles of design methodology for NOC based systems. In section 5 we discuss issues of physical implementation and performance for NOC architecture.

2. Related work

It is being realized, by all research groups involved in system level design, that it is absolutely necessary to allow reuse of already designed components or blocks. Gajski et al. [5] have proposed an IP-centric embedded system design methodology. The major challenges in the IP centric methodologies are the interface synthesis among various IP blocks and system verification. Recently, Platform Based Design methodology [6] has been proposed which not only allows reuse of components but also reuse of system architectures and topologies. The basic idea is that an architecture which is suitable and efficient for one application will also be suitable and efficient for many similar applications. The idea of using the same architecture (platform) for development of applications not only speeds up application design but also reduces its verification time. Keutzer et al. [7] have extended the idea of platform based design by including a layer of software on top of the hardware platform to help application development. This layer is called Software Platform. The combination of hardware and software platforms is referred to as System Platform. It has also been realized that the key to reuse and integration of IP components is the communication from the physical to the system and conceptual level, and consequently communication centric architectures, platforms and methodologies have been developed [8, 9, 10].

Many architectural templates have been proposed for hardware platforms for future SoCs. There is a general emphasis on providing efficient and standardized communication infrastructure for connecting multiple resources on the chip [11, 8, 9]. There is a trend to adapt the layered approach of the OSI reference model to on-chip communication [12, 10, 13].

It is estimated that video and audio processing are going to be common tasks in many applications. These applications are going to require storage and processing of large amounts of data. It is predicted that memories are going to take around two thirds of the chip area in future systems on chip [14]. Many researchers have concentrated on analyzing hierarchical organizations of memories and

optimization of memory sizes and data storage strategies for data intensive applications [15]. Researchers have also simulated theoretically elegant shared memory model on message passing parallel computers in order to develop data intensive applications on them [19].

The future system on a chip, incorporating many different types of processing and memory elements, has to operate using the Globally Asynchronous Locally Synchronous (GALS) paradigm [16], at least at the hardware level. The GALS paradigm not only avoids the problem of clock skew but also leads to lower power consumption.

3. Network on Chip Architecture

The NOC architecture provides the communication infrastructure for the resources. We have two main objectives. Firstly, it is possible to develop the hardware of resources independently as stand-alone blocks and create the NOC by connecting the blocks as elements in the network. Secondly, the scalable and configurable network is a flexible platform that can be adapted to the needs of different workloads, while maintaining the generality of application development methods and practices.

3.1. The NOC network

We chose a simple mesh interconnection topology as basic topology, because it is simplest from a layout perspective and the local interconnections between resources and switches are independent of the size of the network. Moreover, routing in a two-dimensional mesh is easy, resulting in potentially small switches, high capacity, short clock cycle, and overall scalability.

A NOC consists (Figure 1) of resources and switches that are connected using channels as a mesh (Manhattan-like structure) so that they are able to communicate with each other by sending messages. A resource R is a computation or storage unit or their combination. A switch S (Figure 2) routes and buffers messages between resources. Each switch is connected to four other neighboring switches through input and output channels. A channel C consists of two one-directional point-to-point buses between two switches or a resource and a switch. Switches may have internal queues to handle congestion. We call this approach Chip-Level Integration of Communicating Heterogeneous Elements (CLICHÉ).

The precise layout and geometry depends on the technology generation. We expect that the area of a resource is the maximal synchronous region in a given technology. It is expected to shrink with every new technology generation. Consequently the number of resources will grow, the switch-to-switch and the switch-to-resource bandwidth will grow, but the network wide communication protocols will be unaffected. Figure 1 illustrates the principles of the physical floor plan within the NOC. Consider a 60nm CMOS technology expected in 2008, a 22mm × 22mm chip size, a resource size of 2mm × 2mm and a minimum wire pitch of 300nm. A NOC would accommodate 10 × 10 resources, each switch would occupy 30µm × 30µm and the channels would be 30µm wide. Assuming that we can use 3 metal layers for the switch-to-switch connection we have space for 300 wires. Since we need control, handshaking and signaling bits, this will yield an effective data bus width of 256 bits.

[Figure 1. A NOC with 16 resources: a 4 × 4 mesh of switches (S), each connected through a resource network interface (rni) to one resource.]

[Figure 2. Block diagram of a switch, built from selection logic, multiplexers (mux) and queues.]

3.2. NOC resources

The NOC would allow for arbitrary resources. Typical examples would be embedded processor and DSP cores provided with caches as well as local memories, dedicated hardware resources, and configurable hardware resources. Since the area of resource equals one synchronous clock

domain, the resource can be a combination of all previous types. The internal communication inside a resource is synchronous. In Figure 3 RNI=resource network interface, P=processor core, D=DSP core, c=cache, M=memory and re=reconfigurable block.

[Figure 3. A typical NOC CLICHÉ featuring various types of resources.]

The model of computation is a heterogeneous network of resources executing local computation. Communication between the resources is implemented by passing messages over the mesh network. Resources operate asynchronously with respect to each other. Synchronization is provided by synchronization primitives, which are implemented by passing messages around the network. Even a non-local memory is accessed through message passing.

In order to make the NOC interface with the outside world, dedicated resources such as I/O elements are needed. The I/O could be of various kinds: they could glue many NOC chips together, interface with external memory or implement a TCP/IP interface. Interface modules also handle data buffering and packet reordering.

3.3. Communication

Every resource has a unique address and is connected to a network via a switch. It communicates with the switch through an RNI. Thus, any resource can be plugged into the network if its footprint fits into an available slot and if it is equipped with an RNI. The NOC defines four protocol layers:
1. The physical layer determines the number and length of wires connecting resources and switches.
2. The data-link layer defines the protocol to transmit a cell between a resource and a switch and between two switches. Both the physical and the data link layer are dependent on the technology. Thus, for each new technology generation these two layers are defined. Let w be the number of wires in the physical layer and c be the cell size of the data link layer. We expect that c = n(w - wc) with n = 1, 2, 3 or 4. For n = 2, 3 or 4 the channel would be pipelined, accommodating n data link cells at any time instant. wc is the number of control wires required by the physical layer, e.g. synchronization signals.
3. The network layer defines how a packet is transmitted over the network from an arbitrary sender to an arbitrary receiver directed by the receiver's network address. This layer is again technology dependent and each network layer packet, together with the destination address, is exactly 1 data link cell. Thus, taking up our previous example, we have w=300 and c may be 290. We need roughly 10 bits for the address and a few control bits (e.g. a hop count) for switching. Hence, the network packet would be 256 bit.
4. The transport layer is technology independent. The transport layer message size can be variable. The RNI interface has to pack transport layer messages into network layer packets.

The RNI implements all four layers towards the network. The switch-to-switch interfaces implement only the three lower protocol layers. The basic communication mechanism envisioned among computing resources is message passing. However, it is possible to add additional protocols on top of the transport layer to provide for instance a virtual shared memory abstraction, which will help the programmers in development of data and computation intensive applications.

3.4. Regions and wrappers

A 2-D mesh topology provides access to all resources of the NOC, it is scalable and it has a simple structure. However, there are applications for which the CLICHÉ structure is not suitable for performance reasons. Examples can be found from parallel computation, digital signal processing and data flow processing areas.

A region G is an area inside the NOC, which is insulated from the network and which may have different internal topology and communication mechanisms. The concept of region allows for resources of larger size than the atomic slots in the mesh. In this way development, management, communication and instantiation concerns of various regions can be separated. Regions are connected to the NOC by special communication arrangements called wrappers W, which route packets so that regions are insulated from external traffic. Specific IO wrappers Wio allow communication between the region and its environment. They are also responsible for converting the messages into appropriate format. Thus, the region concept in NOC can be seen to address four aspects:
1. A region can be used to dedicate a set of resources and a part of the network to a specific task like processing of streaming-oriented data, processing of block-oriented data or parallel processing.
2. One can arrange communication inside a region differently than in the other regions. A NOC designer may e.g. want to define a region with high

communication capacity for efficient work-optimal implementation of shared memory abstraction [20].
3. A region can be used to insulate a set of resources from the traffic happening between the resources not belonging to the region.
4. A region can be used for encapsulating a specific technology into a NOC. For example an area dedicated to FPGA or embedded memory could be larger than the area of resource.

However, the shape of regions cannot be arbitrary but their boundaries must be convex. This definition of regions implies that resources requiring high-capacity intercommunication need to be placed into the same region, because wrappers between regions may cause some constraints to capacity and latency of communication. From the point of view of the network layer, regions do not form separate sub-networks, instead they can be considered as just lightweight mechanisms to organize communication in a more efficient and rational way.

4. Backbone-Platform-System Methodology

Our NOC concept is based on the idea to have a backbone based application specific platform where the final applications can be mapped as software or configurable hardware. Combination of design productivity and system quality requirements has led us to the backbone-platform-system design methodology (BPS). The idea with the BPS is to encapsulate the design work into reusable platforms. A NOC based system consists of a hierarchy of structural and behavioral objects, e.g. backbone, platform and system concepts. BPS has two main phases, platform development and application mapping, as depicted in Figure 4.

[Figure 4. NOC based system design. Diagram labels: generic backbone, optimised virtual components, communication structure, platform development ("application area specific IPR"), product area specific platform, processors and hardware, optimised intellectual property, application mapping ("product specific IPR"), code and configuration, NOC system.]

Even in a small 4x4 mesh of switches and resources there are 16 subsystems, each with the complexity of a current state-of-the-art SOC design. Management of such complexity must be based on extremely structured architecture and extensive reuse. In BPS methodology the generic, structured architecture and system development principles are described as a backbone concept.

Development of several SOC complexity level subsystems, e.g. resources in CLICHÉ topology, must be based on the reuse of optimized virtual components or even computer systems. If we assume that current SOC design has a moderate complexity of 10 million gates, then even in small 4x4 mesh the hardware complexity would approach 200 million gates.

The computational capacity of NOC based system depends on the type of resources. If we assume that resources are general-purpose processor based computer systems with a capacity of 1000 MIPS each, the 4x4 mesh would have a total capacity of 16 GIPS. In a real system, part of the capacity would be wasted due to communication and allocation problems, but it is obvious that reuse of applications, middleware and system architectures is required.

4.1. Backbone design

The NOC backbone encapsulates the topological and communication issues such as channels, switches, and network interfaces. The backbone is the development platform for all NOC based systems, so it is important that every system follows the basic operation principles defined in the backbone.

During the backbone design the focus is the network communication resources, e.g. switches and interfaces, and NOC system services and performance of different region topologies. From the definition of resource area follows that the connections between neighboring switches and the switch design are issues where physical design has an important role. The system-level communication challenges the technological limits. The amount of wires, wire lengths, synchronization, and buffering are all problems where physical layout and characteristics set constraints. Customized region topology enables NOC based systems where the quality of the application mapping is optimized in the beginning. Definition of region requires that potential applications are analyzed and modeled. Mathematical and performance analyses and even performance simulations are the main tools to be used.

4.2. Platform design

The objective of platform development is to create a computation platform for an intended application area. Scaling of the network, definition of regions, design of the resource nodes, and definition of the system control are the main activities. It requires thorough understanding of the functionality of the target systems, but due to the platform

nature it is not possible to use exact applications as a starting point for architecture requirement definition. Use of optimized virtual components and knowledge of application-area requirements are essential in managing the complexity and performance requirements of the target system. During the development the characterization of application area domain and architecture and system quality estimations are essential tools. The application area specific platform encapsulates the hardware design problems and serves as a manufacturing integration platform for system developers.

For example, in 4x4 mesh CLICHÉ system, we have to define and design 16 resources, e.g. 16 communicating computer systems, if the NOC platforms would be used for the parallel implementation of heterogeneous applications. If we want to optimize the platform for some specific application area, we certainly need very efficient ways of making the right decisions and new figures of merit to describe the quality of NOC. Currently used metrics (performance, utilization, capacity) must be adapted to handle temporal and spatial effects that are inevitable with target systems. For example with combined communication and computing systems, the required architectural features may vary from bit-based processing to parallel manipulation of huge data sets. The communication throughput and latency requirements are different in the same way.

4.3. System design

In the application mapping the functionality of application is mapped to the resources. The NOC concept should ultimately support both dynamic and static mapping of applications, but the main problems with both are the resource allocation, optimisation of network usage and verification of performance and correctness. Basically these issues are rather similar to what distributed and parallel system designers have to face.

The proposed NOC platform is very heterogeneous. The resources can vary from configurable hardware to multiprocessor computers of almost every type. Therefore, several modeling languages should be supported by NOC application development environment making it easy to integrate different tools into the design flow. As with platform design, the decision support and quality validation needs special attention and new approaches.

4.4. Methods and tools

Implementation of the BPS methodology or any other design flow for NOC systems will be a challenge for EDA industry. The traditional SOC, platform, and intellectual property based design flows must be extended to cover network-related issues, e.g. distribution and parallelism effects as described in Table 1.

Table 1. Design responsibilities during different phases of NOC development.

Instance                  Responsibilities during design
------------------------  ------------------------------------------------
Backbone development      Region types
                          Communication channels and switches
                          Network interfaces of resources
                          Communication protocols (specification)
Platform development      Region scaling
                          Resource design (units, interconnections)
                          Dedicated hardware blocks
                          System level control (implementation of
                          communication, diagnostics, monitoring)
Application development   Resource level control (OS)
                          Functionality of resources (SW, configurable HW)
                          Control of the network
                          Functionality of the network

Our NOC backbone defines the implementation of the network. The main task for designer at system level is to decide what to put into the NOC as resources, how to map functionality into those resources, and how to validate the decisions. The actual design relies on the reuse of virtual components and intellectual property, and enhanced methods and tools to support them are required. Especially at system level it is important to use abstract models and descriptions of both resources and applications. Otherwise the computational complexity of analyses, estimations and simulations will exceed the computational capacity of design tools. In traditional system design approaches the design space exploration has been done using analytical approaches or with design methods and tools similar to those of the actual design. Most often, only the abstraction level of system models has been different.

In NOC design, we propose a clear distinction between decision making support, development and verification methods and tools. The decision environment should include methods for advanced complexity estimation, resource selection, and network analysis. Complexity estimation is needed for the scaling of NOC and for region type selection. The characteristic of computation is one issue that needs to be added to operational complexity. In the resource selection the mappability of algorithms and architectures is one alternative extension to currently used performance metrics that could provide more knowledge on the potential quality of the system. Similar analysis could be used during application mapping. Analysis of network behavior is a critical part of region definition and allocation of resources to functions. Modeling of network behavior, workload characterization and efficient simulation are the potential methods, if adapted to NOC concept. The development and verification environments should provide a virtual machine and development environment for software development, and tools for hardware design. Complexity is the biggest challenge in both. Abstraction, partitioning of problems and distribution of computation look like viable alternatives.
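
As a rough illustration of the static application mapping problem from Section 4.3, the following Python sketch scores a placement of tasks on mesh slots and searches all placements of a tiny example exhaustively. The cost metric (traffic volume times Manhattan hop count), the task names, the traffic figures and the function names are assumptions made for this illustration only; the methodology described here does not prescribe a mapping algorithm, and exhaustive search is practical only for very small meshes.

```python
from itertools import permutations


def hops(a, b):
    """Manhattan distance between two mesh slots given as (row, col)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def mapping_cost(traffic, placement):
    """Sum of traffic volume times hop count over all communicating task pairs.

    traffic:   dict mapping (src_task, dst_task) -> traffic volume (assumed unit)
    placement: dict mapping task -> (row, col) slot in the mesh
    """
    return sum(
        volume * hops(placement[src], placement[dst])
        for (src, dst), volume in traffic.items()
    )


def best_static_mapping(tasks, slots, traffic):
    """Try every assignment of tasks to distinct slots and keep the cheapest."""
    best = None
    for chosen in permutations(slots, len(tasks)):
        placement = dict(zip(tasks, chosen))
        cost = mapping_cost(traffic, placement)
        if best is None or cost < best[0]:
            best = (cost, placement)
    return best


if __name__ == "__main__":
    # Hypothetical 2 x 2 mesh with four tasks and made-up traffic volumes.
    slots = [(0, 0), (0, 1), (1, 0), (1, 1)]
    traffic = {("a", "b"): 10, ("b", "c"): 5, ("c", "d"): 10, ("a", "d"): 1}
    cost, placement = best_static_mapping(["a", "b", "c", "d"], slots, traffic)
    print(cost, placement)
```

For realistic mesh sizes the search space grows factorially, so a heuristic would have to replace the exhaustive loop; the cost function itself could stay the same.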

5. Discussion

Design of a new product using NOC architecture is similar to the problem of designing a computer network with some computing and communication requirements. We have adapted ns-2 from the University of California at Berkeley to study various design options in NOC architecture and their effect on performance [17].

[Figure 5. Drop probability vs. buffer size. Vertical axis: drop probability (0 to 0.2). Horizontal axis: log2 buffer size (0 to 13). One curve per traffic rate: 80, 100, 120, 150 and 190 Mb/s.]

We have used a homogeneous 5 x 5 NOC architecture for our simulation experiments. In particular, we have studied the effect of buffer size in switches and network traffic (called network load) on delay and probability of message loss. These simulation experiments have been carried out using various types of network traffic cases like random traffic and local traffic and mix of these. Figure 5 shows the relationship between the probability of a packet being dropped versus the size of buffer in the switch for each direction. We have assumed that a link between two switches supports a maximum traffic of 200Mbits/sec. Various lines in the graph show the drop probability versus buffer size for various actual traffic rates. We observe that for actual traffic of up to 100Mbits/sec, the drop probability is very close to zero if a buffer size of four packets is used. Traffic rate is controlled by controlling the rate at which a subset of resources generate packets and by controlling the destination address of the generated packets. The traffic generated for this study had a mix of local and random traffic. We have carried out many other similar experiments [17].

These simulation studies have resulted in many interesting conclusions: For moderate traffic, a buffer size of 8 messages for each direction leads to almost zero drop probability. Message delay increases with buffer size as

5.1. Physical Aspects of NOC

We have investigated some physical issues in the design of the switches and the inter-switch connections for on-chip communication networks like NOC [18]. In particular, we have compared two distinct layouts for a switch, called “thin switch” and “square switch”. In thin switch, the switch functionality is distributed around a resource and wires are routed across the resources. A square switch is placed on the crossings in dedicated channels left between resources. The wires are routed in these channels.

We have considered wireability, delay and maximum signal bandwidth between switches, positioning of pads and positioning of repeaters in our study. The study has been conducted based on the 60nm CMOS technology expected in about 6 years. The main conclusion is that in five years 10 x 10 NOC architectures will be feasible. It will be possible to route 256 wires between a resource and a switch and between two neighboring switches in the mesh. The study also shows that the square switch option is superior with respect to performance and bandwidth while the thin switch requires relatively low area.

5.2. System development

The main objective for the NOC development environment will be to separate different concerns and activities and to shield some tools and design tasks from details in other tools and tasks.

The BPS methodology tries to benefit from reuse as much as possible and to give support for application development. The idea has also been to find an optimal balance between manufacturing and system level integration platforms. The role of the backbone is to provide a solid starting point for ASIC design with guidelines and flexibility.

The NOC system development environment will provide layered system services, which will shield an application developer from the details of the NOC lower level architecture. It will provide application level communication, synchronization, memory management, and resource management services.

Design tools, which map applications onto the NOC, must eventually implement all communications between resources by means of the three protocol layers provided by the network. This can be considered as a contract. If the applications comply with these protocols the network guarantees the communication services. Ideally we would
well as network load. Message delay is more sensitive to like to extend this contract also to performance issues, for
network load than to buffer size. If the network load instance with a contract where applications guarantee a
increases beyond 50% of network capacity, then it is maximum number of messages per time unit and the
impossible to avoid message drop even with large buffers. network guarantees a maximum transport delay of all
This study helps us to decide size of buffer in switches. messages. It is part of our future work to define the
It also emphasizes the need for good mapping of conditions under which such a contract is feasible.
applications to the NOC architecture so that the resulting
traffic is local to a small area of the NOC. This will reduce
network traffic.
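To make the performance contract mentioned in Section 5.2 more concrete, the following minimal sketch checks a recorded message trace against both sides of such a contract: the application's bound on messages per time unit and the network's bound on transport delay. It is only an illustration under stated assumptions, not part of the NOC tools: the names `Message`, `max_messages_in_window` and `check_contract`, the sliding-window interpretation of "messages per time unit", and the abstract time units are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """One traced message (hypothetical record, times in arbitrary units)."""
    send_time: float     # when the resource hands the message to the network
    receive_time: float  # when the destination resource receives it


def max_messages_in_window(send_times, window):
    """Largest number of sends falling in any interval shorter than `window`."""
    if window <= 0:
        raise ValueError("window must be positive")
    times = sorted(send_times)
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Advance the left edge until the span from times[start] to t
        # is strictly shorter than the window.
        while t - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


def check_contract(messages, max_per_window, window, max_delay):
    """Return (application_ok, network_ok) for a message trace.

    application_ok: the sender never exceeded max_per_window messages
                    within any sliding window of the given length.
    network_ok:     every message was delivered within max_delay.
    """
    peak = max_messages_in_window([m.send_time for m in messages], window)
    application_ok = peak <= max_per_window
    network_ok = all(m.receive_time - m.send_time <= max_delay for m in messages)
    return application_ok, network_ok
```

The two results are reported separately because, under the contract as described, the network's delay guarantee would only be expected to hold while the application keeps its side; a trace with `application_ok == False` says nothing about whether the network misbehaved.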

Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI’02)


0-7695-1486-3/02 $17.00 © 2002 IEEE
6. Conclusions

In this paper we have described an architectural template, called network on chip architecture, for developing large and complex systems on a single chip. The architecture supports physical level and architectural level design integration. The basic communication mechanism between resources is envisioned to be packet switched message passing through the switches. The NOC architecture defines a four-layer inter-resource communication protocol stack (physical, data-link, network and transport layer), adapted from the OSI standard. These protocols must be implemented in the resource to network interface (RNI) for every resource in NOC. We have also described a two-phase design methodology for developing systems for the proposed NOC architecture.
The NOC concept has been necessitated by three factors: First, there is the increasing demand for on-chip interconnect bandwidth. The second, equally crucial factor is the need to amortize the enormous engineering cost involved in designing such large chips over multiple applications. The third factor is the demand for easy-to-use methods to exploit the parallel processing capacity provided by multiple computational resources. Programmable interconnectivity and efficient implementation of a shared memory abstraction are keys to providing this generality.
Before the NOC architectural template can be used to develop applications, one needs to work out the details of architecture, communication, design flow, and system services. Currently we are building many simulators for evaluating various architectural and communication options at different levels. We are also interested in analytical study of architectural options for NOC.

7. Acknowledgements

We gratefully acknowledge many valuable discussions we had with Dr. Li-Rong Zheng and Dinesh Pamunuwa. This work is a part of the joint Finnish-Swedish EXSITE (Explorative System Integrated Technologies) research program. This work was sponsored by TEKES (The National Technology Agency of Finland), VINNOVA (Swedish Agency for Innovation Systems), Nokia Oyj, Ericsson Radio Systems AB, and Spirea AB Kista.

References

[1] Semiconductor Industry Association, International Technology Roadmap for Semiconductors, World Semiconductor Council, Edition 1999, 1999.
[2] D. Sylvester and K. Keutzer, “Getting to the Bottom of Deep Submicron”, Proc. of the Int. Conference on Computer-Aided Design, 1998, 203-211.
[3] D. Sylvester and K. Keutzer, “Getting to the Bottom of Deep Submicron II: a global wiring paradigm”, Proc. of the 1999 Int. Symp. on Physical Design, 1999, 193-200.
[4] C. Szyperski, Component Software: Beyond Object Oriented Software, Reading, MA, ACM/Addison Wesley, 1998.
[5] D. Gajski, R. Dömer and J. Zhu, “IP-Centric Methodology and Design with the SpecC Language” in System Level Design, Edited by Ahmed A. Jerraya and Jean Mermet, Nato Science Series 357, 1999.
[6] F. Vahid and T. Givargis, “Platform Tuning for Embedded Systems Design”, IEEE Computer 34, 3.
[7] K. Keutzer, S. Malik, A. Newton, J. Rabaey and A. Sangiovanni-Vincentelli, “System Level Design: Orthogonalization of Concerns and Platform-Based Design”, IEEE Trans. on Computer-Aided Design of Integrated Circuits and Systems 19, 12 (Dec. 2000).
[8] W. Dally and B. Towles, “Route Packets, Not Wires: On-Chip Interconnection Networks”, Proc. of the Design Automation Conference, Jun. 2001.
[9] D. Wingard, “MicroNetwork-Based Integration of SOCs”, Proc. of the 38th Design Automation Conference, Jun. 2001.
[10] M. Sgroi et al., “Addressing the System-on-a-Chip Interconnect Woes Through Communication-Based Design”, Proc. of the 38th Design Automation Conference, Jun. 2001.
[11] A. Hemani, A. Jantsch, S. Kumar, A. Postula, J. Öberg, M. Millberg, and D. Lindqvist, “Network on Chip: An architecture for billion transistor era”, Proc. of the IEEE NorChip Conference, Nov. 2000.
[12] A. Jantsch, J. Soininen, M. Forsell, L. Zheng, S. Kumar, M. Millberg, and J. Öberg, “Networks on Chip”, Workshop at the European Solid State Circuits Conference, Sep. 2001.
[13] L. Benini and G. DeMicheli, “Powering Networks on Chip”, Proc. of the 14th Int. Symp. on System Synthesis, 33-38, Oct. 2001.
[14] F. Catthoor, D. Verkest, and E. Brockmeyer, “Proposal for unified system design meta flow in task-level and instruction-level design technology research for multi-media applications”, Proc. 11th Int. Symp. on System Synthesis, 1998, 89-95.
[15] P. Panda, N. D. Dutt, and A. Nicolau, “Local Memory Exploration and Optimization in Embedded Systems”, IEEE Trans. on Computer Aided Design of Integrated Circuits and Systems 18, 1 (1999), 3-13.
[16] A. Hemani et al., “Lowering power consumption in clock by using Globally Asynchronous Locally Synchronous Design style”, Proc. of Design Automation Conference, 1999, USA.
[17] Yi-Ran Sun, “Simulation and Performance Evaluation for Network on Chip”, MSc thesis, Dept. of Microelectronics and Information Technology, Royal Institute of Technology, Stockholm.
[18] Dinesh Pamunuwa et al., “A study of Physical Issues in the design of an on-chip regular communication network”, submitted to DAC 2002.
[19] V. Leppänen, Studies on the realization of PRAM, Dissertation 3, TUCS, University of Turku, 1996.
[20] M. Forsell and S. Kumar, Virtual Distributed Shared Memory for Network on Chip, Proc. of the 19th IEEE NORCHIP Conference, Nov. 12-13, 2001, Kista.
