Instruction Level Parallelism: Module 5: Chapter 12
Topics
Computer Architecture
Basic Design Issues
Computer Architecture and Computer Organization
Architecture describes what the computer does.
Organization describes how it does it.
Example:
Say you are constructing a house: the design and all its
low-level details come under computer architecture, while
building it brick by brick and connecting the parts together,
keeping the basic architecture in mind, comes under
computer organization.
12.1 Computer Architecture
Computer architecture is defined as the arrangement
by which the various system building blocks –
processors, functional units, main memory, cache,
data paths and so on – are interconnected and
interoperated to achieve the desired system performance.
Processor Design
Processor performance
System performance is the key benchmark in the
study of computer architecture.
Criteria: Scalability, price, usability, reliability, power
consumption, physical size
A basic rule of system design is that there should be
no performance bottlenecks in the system.
Typically, a performance bottleneck arises when one
part of the system cannot keep up with the overall
throughput requirements of the system.
For example, a system may exhibit a performance mismatch between
the processors, main memory and the processor-memory bus.
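The bottleneck idea can be sketched numerically. The snippet below is only an illustration: it assumes the components sit in series, so the slowest one caps overall throughput. The function name `system_throughput` and the GB/s figures are hypothetical and do not come from the slides.

```python
def system_throughput(stage_rates):
    """Return (name, rate) of the slowest component in a series of components.

    Assumption: every unit of work must pass through all components,
    so the component with the lowest sustained rate limits the system.
    """
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical sustained rates in GB/s (illustrative numbers only).
rates = {
    "processors": 40.0,
    "processor-memory bus": 12.0,
    "main memory": 20.0,
}

bottleneck, rate = system_throughput(rates)
print(f"bottleneck: {bottleneck} at {rate} GB/s")
```

Under these assumed numbers the bus is the bottleneck, so speeding up the processors alone would not raise system throughput.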
Summary
Processor design is the central element of computer
system design.
Since system design can only be carried out with specific
target application loads in mind, it follows that processor
design should also be tailored for target application loads.
To satisfy overall system performance criteria, various
elements of the system must be balanced in terms of their
performance.
No element of the system should become a performance
bottleneck.
12.2 Basic Design Issues
Designs which aim to maximize the exploitation of
instruction level parallelism need deeper pipelines;
such designs may support higher clock rates.
Instruction-level parallelism (ILP) is a measure of
how many of the instructions in a computer program
can be executed simultaneously.
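One common way to put a number on this measure is to divide the instruction count by the length of the longest dependence chain. The sketch below assumes an idealized machine (unlimited functional units, one cycle per instruction, only true data dependences); the function `ilp_estimate` and the six-instruction fragment are hypothetical examples, not taken from the slides.

```python
def ilp_estimate(deps):
    """Estimate ILP as instruction count / critical path length.

    deps maps each instruction index (in program order) to the list of
    earlier instructions whose results it needs.
    Assumptions: unit latency and unlimited hardware resources.
    """
    depth = {}
    for i in sorted(deps):
        # An instruction can start one step after its latest producer.
        depth[i] = 1 + max((depth[d] for d in deps[i]), default=0)
    critical_path = max(depth.values())
    return len(deps) / critical_path


# Hypothetical fragment: instructions 0, 1 and 3 are independent,
# 2 needs 0 and 1, 4 needs 3, and 5 needs 2 and 4.
deps = {0: [], 1: [], 2: [0, 1], 3: [], 4: [3], 5: [2, 4]}
print(ilp_estimate(deps))
```

A fully serial chain of instructions gives an estimate of 1.0 under the same assumptions, meaning no two instructions could run together.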
Let us examine the trade-off involved in this context in a
simplified way:
total chip area = number of cores × chip area per core
or
total transistor count = number of cores × transistor count per core
At a given time, VLSI technology limits the left-hand side of the
above equations, while the designer must select the two factors on
the right. Aggressive exploitation of instruction level parallelism,
with multiple functional units and more complex control logic,
increases the chip area, and the transistor count, per processor core.
Alternatively, for a different category of target applications, the
designer may select simpler cores, and thereby place a larger
number of them on a single chip.
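The trade-off in the equations above can be tried out with a few lines of arithmetic. This is a rough sketch only: the transistor budget and per-core counts are hypothetical, and it ignores shared caches, interconnect and other non-core logic.

```python
def cores_that_fit(total_transistors, transistors_per_core):
    """Number of whole cores that fit within a fixed transistor budget."""
    return total_transistors // transistors_per_core


# Hypothetical budget fixed by the VLSI technology of the day.
budget = 8_000_000_000

# Hypothetical per-core costs: aggressive ILP hardware vs. a simpler core.
designs = {
    "complex core": 1_000_000_000,
    "simple core": 250_000_000,
}

for name, per_core in designs.items():
    print(f"{name}: {cores_that_fit(budget, per_core)} cores")
```

With the left-hand side fixed, the assumed simple core allows four times as many cores as the complex one.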
Within a processor, a set of instructions is in
various stages of execution at a given time: within
the pipeline stages, functional units, operation
buffers, reservation stations, and so on.
Therefore machine instructions are not in general
executed in the order in which they are stored in
memory, and all instructions under execution must
be seen as ‘work in progress’.
To maintain the work flow of instructions within the
processor, a superscalar processor makes use of
branch prediction—i.e. the result of a conditional
branch instruction is predicted even before the
instruction executes—so that instructions from the
predicted branch can continue to be processed,
without causing pipeline stalls. The strategy works
provided fairly good branch prediction accuracy is
maintained.
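To make the role of prediction accuracy concrete, here is a minimal sketch of one well-known scheme, a two-bit saturating counter. The slides do not say which predictor is meant, so this choice, the class name `TwoBitPredictor` and the loop-branch pattern are all assumptions made for illustration.

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0-1 predict not taken, 2-3 taken."""

    def __init__(self):
        self.counter = 2  # assumed starting state: weakly taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step towards the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def accuracy(outcomes):
    """Fraction of branch outcomes predicted correctly, in sequence."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# Hypothetical loop branch: taken 9 times, then not taken once, repeated 3 times.
outcomes = ([True] * 9 + [False]) * 3
print(accuracy(outcomes))
```

In this pattern only the loop exits are mispredicted, so most instructions fetched along the predicted path are useful work rather than discarded.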
We shall assume that instructions are committed in order.
Here committing an instruction means that the instruction
is no longer ‘under execution’—the processor state and
program state reflect the completion of all operations
specified in the instruction.
Thus we assume that, at any time, the set of committed
instructions corresponds with the program order of
instructions and the conditional branches actually taken.
Any hardware exceptions generated within the processor
must reflect the processor and program state resulting
from instructions which have already committed.
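In-order commit can be pictured as a queue held in program order, from which an instruction leaves only when it has finished and everything ahead of it has already left. The sketch below assumes a reorder-buffer-like structure, which the slides do not name; `ReorderBuffer` and its methods are hypothetical names used only for illustration.

```python
from collections import deque


class ReorderBuffer:
    """Holds in-flight instructions in program order; commits from the head."""

    def __init__(self):
        self.entries = deque()  # oldest instruction on the left
        self.committed = []     # names of committed instructions, in order

    def issue(self, name):
        entry = {"name": name, "done": False}
        self.entries.append(entry)
        return entry

    def complete(self, entry):
        # Execution may finish in any order.
        entry["done"] = True
        self._commit_ready()

    def _commit_ready(self):
        # Commit only from the head, so commit order equals program order.
        while self.entries and self.entries[0]["done"]:
            self.committed.append(self.entries.popleft()["name"])


rob = ReorderBuffer()
i1, i2, i3 = (rob.issue(n) for n in ("i1", "i2", "i3"))

rob.complete(i3)  # finishes first, but cannot commit yet
rob.complete(i2)
after_out_of_order = list(rob.committed)

rob.complete(i1)  # the head finishes, so all three commit in program order
print(after_out_of_order, rob.committed)
```

Because nothing commits past an unfinished older instruction, an exception raised at any point sees only the state produced by instructions that precede it in program order.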
Summary
One of the main processor design trade-offs faced in this
context is this:
Should the processor be designed to squeeze the
maximum possible parallelism from a single thread, or
should processor hardware support multiple independent
threads, with less aggressive exploitation of instruction
level parallelism within each thread?
In this chapter, we will study the various standard
techniques for exploiting instruction level parallelism, and
also discuss some of the related design issues and trade-
offs.