Unit 5


Pipelining

The term Pipelining refers to a technique of decomposing a sequential process into sub-operations, with each sub-operation being executed in a dedicated segment that operates concurrently with all other segments. The most important characteristic of a pipeline technique is that several computations can be in progress in distinct segments at the same time. The overlapping of computation is made possible by associating a register with each segment in the pipeline. The registers provide isolation between segments so that each can operate on distinct data simultaneously.

The structure of a pipeline organization can be represented simply by including an input register for each segment followed by a combinational circuit.

Let us consider an example of a combined multiplication and addition operation to get a better understanding of the pipeline organization. The combined multiplication and addition operation is performed on a stream of numbers such as:

Ai * Bi + Ci    for i = 1, 2, 3, ..., 7

The operation to be performed on the numbers is decomposed into sub-operations, with each sub-operation implemented in a segment within a pipeline.

The sub-operations performed in each segment of the pipeline are defined as:

R1 ← Ai, R2 ← Bi           Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci      Multiply and input Ci
R5 ← R3 + R4               Add Ci to the product
The following block diagram represents the combined operation as well as the sub-operations performed in each segment of the pipeline.

Registers R1, R2, R3, R4, and R5 hold the data, and the combinational circuits operate in a particular segment.

The output generated by the combinational circuit in a given segment is applied to the input register of the next segment. For instance, from the block diagram, we can see that register R3 is used as one of the input registers of the combinational adder circuit.
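
To make the flow through the segments concrete, the following minimal Python sketch (an illustration only, not hardware; the sample values of Ai, Bi, and Ci are assumptions) simulates the three segments clock by clock. In every cycle each segment works on a different element of the stream, which is exactly the overlap the pipeline provides.

# Simulate the three-segment pipeline for Ai * Bi + Ci (i = 1..7).
A = [1, 2, 3, 4, 5, 6, 7]        # assumed sample values
B = [7, 6, 5, 4, 3, 2, 1]
C = [10, 20, 30, 40, 50, 60, 70]

R1 = R2 = R3 = R4 = R5 = None    # segment registers, all clocked together
results = []

for clock in range(len(A) + 2):  # the extra cycles drain the pipeline
    # Compute the next register values from the values latched last cycle.
    new_R5 = R3 + R4 if R3 is not None else None                 # segment 3: add
    new_R3 = R1 * R2 if R1 is not None else None                 # segment 2: multiply
    new_R4 = C[clock - 1] if 1 <= clock <= len(C) else None      # segment 2: input Ci
    new_R1 = A[clock] if clock < len(A) else None                # segment 1: input Ai
    new_R2 = B[clock] if clock < len(B) else None                # segment 1: input Bi

    R1, R2, R3, R4, R5 = new_R1, new_R2, new_R3, new_R4, new_R5  # clock edge
    if R5 is not None:
        results.append(R5)       # one finished result per clock after fill-up

print(results)                   # A[i] * B[i] + C[i] for each i

After the first two cycles fill the pipeline, one result emerges every clock, even though each individual result still passes through all three segments.
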
In general, pipeline organization is applicable in two areas of computer design, which include:

1. Arithmetic Pipeline
2. Instruction Pipeline

Arithmetic Pipeline
Arithmetic Pipelines are mostly used in high-speed computers. They are
used to implement floating-point operations, multiplication of fixed-
point numbers, and similar computations encountered in scientific
problems.

To understand the concept of the arithmetic pipeline more easily, let us consider an example of a pipeline unit for floating-point addition and subtraction.

The inputs to the floating-point adder pipeline are two normalized floating-point binary numbers defined as:

X = A * 2^a = 0.9504 * 10^3
Y = B * 2^b = 0.8200 * 10^2

The combined operation of floating-point addition and subtraction is divided into four segments. Each segment contains the corresponding suboperation to be performed in the given pipeline. The suboperations performed in the four segments are:

1. Compare the exponents by subtraction.
2. Align the mantissas.
3. Add or subtract the mantissas.
4. Normalize the result.

We will discuss each suboperation in more detail later in this section. The following block diagram represents the suboperations performed in each segment of the pipeline.
1. Compare exponents by subtraction:

The exponents are compared by subtracting them to determine their difference. The larger exponent is chosen as the exponent of the result. The difference of the exponents, i.e., 3 - 2 = 1, determines how many times the mantissa associated with the smaller exponent must be shifted to the right.

2. Align the mantissas:

The mantissa associated with the smaller exponent is shifted according to the difference of exponents determined in segment one.

X = 0.9504 * 10^3
Y = 0.08200 * 10^3

3. Add mantissas:

The two mantissas are added in segment three.

Z = X + Y = 1.0324 * 10^3

4. Normalize the result:

After normalization, the result is written as:

Z = 0.10324 * 10^4
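
The four suboperations can also be expressed as a short behavioral sketch in Python (a functional illustration of what each segment computes, not a cycle-accurate pipeline model; the decimal (mantissa, exponent) representation mirrors the example above).

# Each block below corresponds to one segment of the floating-point adder.
def fp_add(x, y):
    (ma, ea), (mb, eb) = x, y    # numbers stored as mantissa * 10**exponent

    # Segment 1: compare exponents by subtraction; keep the larger exponent.
    diff = ea - eb
    exp = max(ea, eb)

    # Segment 2: align the mantissa belonging to the smaller exponent.
    if diff > 0:
        mb /= 10 ** diff
    elif diff < 0:
        ma /= 10 ** (-diff)

    # Segment 3: add (or subtract) the aligned mantissas.
    m = ma + mb

    # Segment 4: normalize so that 0.1 <= |mantissa| < 1.
    while abs(m) >= 1:
        m /= 10
        exp += 1
    while 0 < abs(m) < 0.1:
        m *= 10
        exp -= 1
    return m, exp

X = (0.9504, 3)                  # X = 0.9504 * 10^3
Y = (0.8200, 2)                  # Y = 0.8200 * 10^2
m, e = fp_add(X, Y)
print(round(m, 5), e)            # 0.10324 4, i.e. Z = 0.10324 * 10^4
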
Instruction Pipeline
Pipeline processing can occur not only in the data stream but in the
instruction stream as well.

Most digital computers with complex instructions require an instruction pipeline to carry out operations such as fetching, decoding, and executing instructions.

In general, the computer needs to process each instruction with the following sequence of steps:

1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

Each step is executed in a particular segment, and there are times when
different segments may take different times to operate on the incoming
information. Moreover, there are times when two or more segments may
require memory access at the same time, causing one segment to wait
until another is finished with the memory.

The organization of an instruction pipeline will be more efficient if the instruction cycle is divided into segments of equal duration. One of the most common examples of this type of organization is the four-segment instruction pipeline.
A four-segment instruction pipeline combines two or more of the different segments into a single one. For instance, the decoding of the instruction can be combined with the calculation of the effective address into one segment.

The following block diagram shows a typical example of a four-segment instruction pipeline. The instruction cycle is completed in four segments.
Segment 1:

The instruction fetch segment can be implemented using a first-in, first-out (FIFO) buffer.

Segment 2:

The instruction fetched from memory is decoded in the second segment, and eventually, the effective address is calculated in a separate arithmetic circuit.

Segment 3:

An operand from memory is fetched in the third segment.

Segment 4:

The instructions are finally executed in the last segment of the pipeline
organization.
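
The overlap among the four segments can be visualized with a small space-time table. The sketch below (the segment abbreviations FI, DA, FO, and EX for fetch instruction, decode and compute address, fetch operand, and execute are assumptions for illustration) prints which instruction occupies each segment in every clock cycle.

# Print a space-time diagram for the four-segment instruction pipeline.
SEGMENTS = ["FI", "DA", "FO", "EX"]
instructions = ["I1", "I2", "I3", "I4", "I5"]

n_cycles = len(instructions) + len(SEGMENTS) - 1
print("cycle  " + "  ".join(f"{s:>2}" for s in SEGMENTS))
for cycle in range(1, n_cycles + 1):
    row = []
    for seg in range(len(SEGMENTS)):
        k = cycle - 1 - seg          # which instruction sits in this segment
        row.append(instructions[k] if 0 <= k < len(instructions) else " -")
    print(f"{cycle:>5}  " + "  ".join(f"{x:>2}" for x in row))

With five instructions and four segments, all five complete in eight clock cycles instead of the twenty cycles that strictly sequential processing would need.
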
Parallel Processing
Parallel processing can be described as a class of techniques that enables a system to carry out simultaneous data-processing tasks in order to increase its computational speed.

A parallel processing system can carry out simultaneous data processing to achieve faster execution time. For instance, while an instruction is being processed in the ALU component of the CPU, the next instruction can be read from memory.

The primary purpose of parallel processing is to enhance the computer's processing capability and increase its throughput, i.e., the amount of processing that can be accomplished during a given interval of time.

A parallel processing system can be achieved by having a multiplicity of functional units that perform identical or different operations simultaneously. The data can be distributed among the various functional units.

The following diagram shows one possible way of separating the execution unit into eight functional units operating in parallel.

The operation performed in each functional unit is indicated in each block of the diagram:

o The adder and integer multiplier perform arithmetic operations
with integer numbers.
o The floating-point operations are separated into three circuits
operating in parallel.
o The logic, shift, and increment operations can be performed
concurrently on different data. All units are independent of each
other, so one number can be shifted while another number is being
incremented.
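
As a software analogy of these independent functional units (an illustration only; the unit names and operand values are assumptions, and real hardware units run in parallel without an operating-system thread pool), the sketch below dispatches unrelated operations so they can proceed concurrently:

# Independent "functional units" operating on different data at the same time.
from concurrent.futures import ThreadPoolExecutor

functional_units = {
    "integer_add":      lambda a, b: a + b,
    "integer_multiply": lambda a, b: a * b,
    "logic_and":        lambda a, b: a & b,
    "shift_left":       lambda a, b: a << b,
    "increment":        lambda a, _: a + 1,
}

# Each entry is (unit name, operand pair); the operations are independent,
# so one value can be shifted while another is being incremented.
work = [
    ("integer_add", (3, 4)),
    ("integer_multiply", (5, 6)),
    ("logic_and", (0b1100, 0b1010)),
    ("shift_left", (1, 3)),
    ("increment", (41, None)),
]

with ThreadPoolExecutor(max_workers=len(work)) as pool:
    futures = {name: pool.submit(functional_units[name], *args) for name, args in work}
    for name, fut in futures.items():
        print(f"{name}: {fut.result()}")
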
A RISC (Reduced Instruction Set Computer) pipeline in computer organization
refers to the implementation of a processor architecture that follows the principles
of RISC design. RISC processors typically have a simplified instruction set, with a
focus on executing instructions in a small number of clock cycles. The pipeline
architecture is a key component of RISC processors, allowing for efficient
instruction execution by breaking down the instruction execution process into
multiple stages.

Here are the typical stages of a RISC pipeline:

IF (Instruction Fetch): Fetches the next instruction from memory.

ID (Instruction Decode): Decodes the fetched instruction and fetches the register operands.

EX (Execution): Executes the operation specified by the instruction.

MEM (Memory Access): Accesses memory (for load/store instructions) or computes branch targets.

WB (Write Back): Writes the result of the executed instruction back to the register file.

Each stage operates concurrently, and instructions move through the pipeline
stages in sequence. The pipeline stages are synchronized by a clock signal. As the
clock ticks, each stage processes the instruction currently in its pipeline stage and
passes it to the next stage on the next clock cycle. This overlapping of instruction
execution enables high throughput and efficient use of hardware resources.
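
A minimal Python sketch of this clocked movement is shown below: the pipeline is modeled as a row of stage latches, and on every clock edge each instruction shifts to the next stage while a new one is fetched. The sample instruction strings are assumptions for illustration.

# Model the five RISC pipeline stages as a shift register of latches.
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
program = ["add r1,r2,r3", "lw r4,0(r1)", "sub r5,r4,r2",
           "sw r5,4(r1)", "add r6,r5,r1"]

latches = [None] * len(STAGES)   # latches[i] = instruction currently in stage i
to_fetch = list(program)

n_cycles = len(program) + len(STAGES) - 1
for clock in range(1, n_cycles + 1):
    # Clock edge: every instruction advances one stage; IF takes a new instruction.
    latches = [to_fetch.pop(0) if to_fetch else None] + latches[:-1]
    occupied = ", ".join(f"{stage}: {instr}"
                         for stage, instr in zip(STAGES, latches) if instr)
    print(f"cycle {clock}: {occupied}")

Once the pipeline is full (from cycle 5 onward in this run), a different instruction occupies every stage and one instruction completes per clock.
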

RISC Processor
RISC stands for Reduced Instruction Set Computer, a microprocessor architecture with a small and highly optimized set of instructions. It is built to minimize instruction execution time by optimizing and limiting the number of instructions, so that each instruction cycle requires only one clock cycle and consists of three steps: fetch, decode, and execute. Complex operations are carried out by combining simpler instructions. RISC chips require fewer transistors, which makes them cheaper to design and reduces instruction execution time.
Examples of RISC processors are Sun's SPARC, PowerPC, Microchip PIC processors, and RISC-V.

Advantages of RISC Processor

1. The RISC processor's performance is better due to its simple and limited instruction set.
2. It requires fewer transistors, which makes it cheaper to design.
3. The simplicity of the instructions frees up space on the microprocessor for other functions.
4. A RISC processor is simpler than a CISC processor because of its simple and quick design, and it can complete its work in one clock cycle.

Disadvantages of RISC Processor

1. The RISC processor's performance may vary according to the code executed, because subsequent instructions may depend on a previous instruction for their execution in a cycle.
2. Programmers and compilers often need complex operations, which must be built from several simpler RISC instructions.
3. RISC processors require very fast memory; they typically rely on a large cache to supply instructions quickly.

RISC Architecture
RISC architecture uses a highly customized set of instructions and is used in portable devices such as the Apple iPod, mobile phones/smartphones, and the Nintendo DS because of its system reliability.
Features of RISC Processor
Some important features of RISC processors are:


1. One-cycle execution time: RISC processors require one CPI (clock cycle per instruction) to execute each instruction, and each instruction cycle includes the fetch, decode, and execute steps.
2. Pipelining technique: The pipelining technique is used in RISC processors to execute multiple stages of instructions concurrently and thus perform more efficiently.
3. A large number of registers: RISC processors are provided with multiple registers that can hold data close to the processor, allowing it to respond quickly and minimizing interaction with main memory.
4. It supports simple addressing modes and a fixed instruction length, which simplifies pipelining.
5. It uses LOAD and STORE instructions to access memory locations.
6. Simple and limited instructions reduce the execution time of a process in a RISC machine.

CISC Processor
CISC stands for Complex Instruction Set Computer, an approach developed by Intel. It has a large collection of instructions, ranging from simple to very complex and specialized, at the assembly-language level, and these instructions take a long time to execute. The CISC approach therefore tries to reduce the number of instructions in each program while ignoring the number of cycles per instruction. It emphasizes building complex instructions directly into the hardware, on the assumption that hardware is always faster than software. CISC chips are relatively slower per instruction than RISC chips, but they use fewer instructions than RISC. Examples of CISC processors are the VAX, AMD and Intel x86 processors, and the System/360.

Characteristics of CISC Processor

Following are the main characteristics of the CISC processor:

1. The length of the code is short, so it requires very little RAM.
2. CISC or complex instructions may take longer than a single clock cycle to execute.
3. Fewer instructions are needed to write an application.
4. It provides easier programming in assembly language.
5. It supports complex data structures and easy compilation of high-level languages.
6. It is composed of fewer registers and more addressing modes, typically 5 to 20.
7. Instructions can be larger than a single word.
8. It emphasizes building instructions in hardware because hardware is faster than software.

CISC Processor Architecture

The CISC architecture helps reduce program code by embedding multiple operations in each program instruction, which makes the CISC processor more complex. CISC-based computers are designed to decrease memory costs: large programs would otherwise require large memory space to store instructions and data, and a large amount of memory increases the cost, making such systems more expensive.

Advantages of CISC Processors

1. The compiler requires little effort to translate high-level programs or statements into assembly or machine language on CISC processors.
2. The code length is quite short, which minimizes the memory requirement.
3. Storing the instructions requires very little RAM.
4. The execution of a single instruction carries out several low-level tasks.
5. CISC provides mechanisms to manage power usage that adjust clock speed and voltage.
6. It uses fewer instructions to perform the same task than RISC.

Disadvantages of CISC Processors

1. CISC chips are slower than RISC chips at executing each instruction cycle of a program.
2. The performance of the machine decreases due to the lower clock speed.
3. Pipelining in a CISC processor is complicated to implement.
4. CISC chips require more transistors than RISC designs.
5. In CISC, typically only about 20% of the existing instructions are used in a program.

Difference between the RISC and CISC Processors

RISC: It is a Reduced Instruction Set Computer.
CISC: It is a Complex Instruction Set Computer.

RISC: It emphasizes software to optimize the instruction set.
CISC: It emphasizes hardware to optimize the instruction set.

RISC: It uses a hard-wired programming unit.
CISC: It uses a microprogramming unit.

RISC: It requires multiple register sets to store the instructions.
CISC: It requires a single register set to store the instructions.

RISC: It has simple decoding of instructions.
CISC: It has complex decoding of instructions.

RISC: Use of the pipeline is simple.
CISC: Use of the pipeline is difficult.

RISC: It uses a limited number of instructions that require less time to execute.
CISC: It uses a large number of instructions that require more time to execute.

RISC: LOAD and STORE are independent instructions used in register-to-register interaction.
CISC: LOAD and STORE operations are part of the memory-to-memory interaction of a program.

RISC: It has more transistors devoted to memory registers.
CISC: It has transistors devoted to storing complex instructions.

RISC: The execution time is very short.
CISC: The execution time is longer.

RISC: The architecture is used in high-end applications such as telecommunication, image processing, and video processing.
CISC: The architecture is used in low-end applications such as home automation and security systems.

RISC: It has fixed-format instructions.
CISC: It has variable-format instructions.

RISC: Programs written for RISC architecture tend to take more space in memory.
CISC: Programs written for CISC architecture tend to take less space in memory.

RISC: Examples include ARM, PA-RISC, Power Architecture, Alpha, AVR, ARC, and SPARC.
CISC: Examples include the VAX, the Motorola 68000 family, System/360, and AMD and Intel x86 CPUs.
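
The register-to-register versus memory-to-memory distinction in the comparison above can be illustrated with a small sketch. The toy memory, register file, and instruction helpers below are assumptions for illustration only and do not model any real instruction set.

# Same job done two ways: mem[C] = mem[A] + mem[B].
memory = {"A": 5, "B": 7, "C": 0}
regs = {"r1": 0, "r2": 0, "r3": 0}

# CISC style: one memory-to-memory instruction does the whole job,
# e.g. ADD C, A, B with operands taken directly from memory.
def cisc_add(dest, src1, src2):
    memory[dest] = memory[src1] + memory[src2]

# RISC style: only LOAD and STORE touch memory; ADD is register-to-register.
def load(reg, addr):   regs[reg] = memory[addr]
def store(addr, reg):  memory[addr] = regs[reg]
def add(dest, s1, s2): regs[dest] = regs[s1] + regs[s2]

cisc_add("C", "A", "B")          # 1 complex instruction
print(memory["C"])               # 12

memory["C"] = 0
load("r1", "A")                  # 4 simple instructions
load("r2", "B")
add("r3", "r1", "r2")
store("C", "r3")
print(memory["C"])               # 12
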

Types of Array Processor

An array processor performs computations on large arrays of data. There are two types of array processors: the attached array processor and the SIMD array processor. They are explained below.
1. Attached Array Processor:
To improve the performance of the host computer in numerical computation tasks, an auxiliary processor is attached to it.

The attached array processor has two interfaces:

1. An input/output interface to a common processor.
2. An interface with a local memory.

Here, the local memory is interconnected with the main memory. The host computer is a general-purpose computer, and the attached processor is a back-end machine driven by the host computer. The array processor is connected through an I/O controller to the computer, and the computer treats it as an external interface.
2. SIMD Array Processor:
This is a computer with multiple processing units operating in parallel. Both types of array processors manipulate vectors, but their internal organization is different.

A SIMD array processor is a computer with multiple processing units operating in parallel. The processing units are synchronized to perform the same operation under the control of a common control unit, thus providing a single-instruction, multiple-data-stream (SIMD) organization. As shown in the figure, the SIMD array processor contains a set of identical processing elements (PEs), each having a local memory M.
Each PE includes –
 ALU
 Floating point arithmetic unit
 Working registers

The master control unit controls the operation of the PEs. Its function is to decode the instructions and determine how they are to be executed. If an instruction is a scalar or program-control instruction, it is executed directly within the master control unit.
The main memory is used for storage of the program, while each PE uses operands stored in its local memory.
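
The following minimal sketch (the PE class, operand names, and vector values are assumptions for illustration) mirrors this organization: the data is first distributed to each PE's local memory, and then the master control unit broadcasts a single ADD instruction that every PE applies to its own operands.

# Identical processing elements, each with its own local memory,
# all executing the same broadcast instruction (single instruction,
# multiple data streams).
class PE:
    def __init__(self, local_memory):
        self.local = dict(local_memory)   # the PE's private operand storage

    def execute(self, op, dest, src1, src2):
        if op == "ADD":
            self.local[dest] = self.local[src1] + self.local[src2]
        elif op == "MUL":
            self.local[dest] = self.local[src1] * self.local[src2]

# Distribute one element of each vector to every PE's local memory.
A = [1, 2, 3, 4]
B = [10, 20, 30, 40]
pes = [PE({"a": a, "b": b}) for a, b in zip(A, B)]

# The master control unit broadcasts one instruction; every PE applies it
# to its own data in the same step.
for pe in pes:
    pe.execute("ADD", "c", "a", "b")

print([pe.local["c"] for pe in pes])      # [11, 22, 33, 44]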
