Reduced Instruction Set Computer (RISC) vs. Complex Instruction Set Computer (CISC)

RISC                            CISC
Hardwired control               Microprogrammed control
Highly pipelined                Less pipelined
Complexity in the compiler      Complexity in the microcode
Example (RISC approach): 5 x 10 = ?
MOV AX, 0        ; running total = 0
MOV BX, 10       ; value to be added repeatedly
MOV CX, 5        ; loop counter
BEGIN: ADD AX, BX
LOOP BEGIN       ; decrement CX and repeat until CX = 0
10 + 10 + 10 + 10 + 10 = 50
Total clock cycles = (3 MOV x 1 clock cycle) + (5 ADD x 1 clock cycle) + (5 LOOP x 1 clock cycle) = 13 clock cycles

Example (CISC approach): 5 x 10 = ?
MOV AX, 10
MOV BX, 5
MUL BX           ; DX:AX = AX * BX (AX is an implicit operand)
10 x 5 = 50
Total clock cycles = (2 MOV x 1 clock cycle) + (1 MUL x 30 clock cycles) = 32 clock cycles
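A quick sketch in Python (not part of the original material) that simply redoes the cycle arithmetic above, assuming 1 clock cycle for each MOV, ADD, and LOOP, and 30 clock cycles for MUL:

def risc_cycles(multiplier):
    movs = 3                 # MOV AX,0 / MOV BX,10 / MOV CX,5
    adds = multiplier        # one ADD per loop iteration
    loops = multiplier       # one LOOP per iteration
    return movs * 1 + adds * 1 + loops * 1

def cisc_cycles():
    return 2 * 1 + 1 * 30    # two MOVs plus a single 30-cycle MUL

print(risc_cycles(5))        # -> 13 cycles for 5 x 10 by repeated addition
print(cisc_cycles())         # -> 32 cycles for a single multiply instruction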
Superscalar vs. Superpipeline
Pipelines
Pipelines in computers are used to improve the performance of the basic instruction cycle.
The goal is to improve the throughput of the computer, i.e., the number of instructions completed per second (often measured in MIPS, millions of instructions per second), by overlapping the tasks of the instruction cycle.
Pipeline Stages:
Fetch instruction (FI): Fetch the next expected instruction into a buffer.
Decode instruction (DI): Determine the opcode and the operand specifiers.
Calculate operands (CO): Calculate the effective address of each source operand.
Fetch operands (FO): Fetch each operand from memory. Operands in registers need not be fetched.
Execute instruction (EI): Perform the indicated operation and store the result, if any, in the specified destination operand location.
Write operand (WO): Store the result in memory.
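A minimal sketch in Python (not from the original material) of why pipelining raises throughput: with k one-cycle stages and no hazards, n instructions take k + (n - 1) cycles instead of n * k, because once the pipeline is full one instruction completes every cycle.

def cycles_unpipelined(n, k):
    return n * k

def cycles_pipelined(n, k):
    return k + (n - 1)

n, k = 100, 6                     # 100 instructions, six stages (FI ... WO)
print(cycles_unpipelined(n, k))   # -> 600
print(cycles_pipelined(n, k))     # -> 105, close to a 6x throughput gain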
Pipeline Hazards:
A hazard is a situation in which the pipeline must stall (stop) for one or more clock cycles.
Resource Hazard: occurs when two or more instructions that are already in the pipeline need the same resource.
Example: consider a simplified five-stage pipeline in which each stage takes one clock cycle.
All instruction fetches and data reads/writes must be performed one at a time; an operand read or write from memory cannot be performed in parallel with an instruction fetch.
Assume that the source operand for instruction I1 is in memory, rather than in a register.
Therefore, the fetch instruction stage of the pipeline must idle for one cycle before beginning the instruction fetch for instruction I3.
Solution:
Increase the available resources, for example by providing multiple ports into main memory and multiple ALUs.
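A minimal sketch in Python of the resource hazard above (the stage order FI, DI, FO, EX, WB, one cycle per stage, and a single shared memory port are assumptions for illustration): when an earlier instruction's operand fetch (FO) uses memory, a later instruction fetch (FI) must idle until the port is free.

FO_OFFSET = 2                      # FO occurs two cycles after that instruction's FI

def fetch_cycles(reads_memory):
    # reads_memory[i] is True if instruction i fetches a source operand from memory
    fi_cycles = []
    port_busy = set()              # cycles in which the memory port is used by an FO
    cycle = 1
    for reads in reads_memory:
        while cycle in port_busy:  # resource hazard: the fetch stage stalls
            cycle += 1
        fi_cycles.append(cycle)
        if reads:
            port_busy.add(cycle + FO_OFFSET)
        cycle += 1
    return fi_cycles

# I1's source operand is in memory; I2 and I3 use registers only:
print(fetch_cycles([True, False, False]))   # -> [1, 2, 4]: I3's fetch idles for one cycle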
Data Hazard:
Occurs when two instructions in a program are to be executed in sequence and both access a particular memory or register operand.
For example, an ADD instruction writes a register and the SUB instruction that follows reads it: the ADD does not update the register until the end of its stage 5, at clock cycle 5, but the SUB needs that value at the beginning of its stage 2, which occurs at clock cycle 4.
To maintain correct operation, the pipeline must stall (stop) for two clock cycles.
Instruction i1 is calculating a value to be saved in register R2, and i2 is going to use this value to compute a result for R4.
However, in a pipeline, when we fetch the operands for the second instruction, the result of the first will not yet have been saved, and hence we have a data dependency.
We say that there is a data dependency with instruction i2, as it is dependent on the completion of instruction i1.
If there is a chance that i2 may be completed before i1 (i.e., with concurrent execution), we must ensure that we do not store the result into register R5 before i1 has had a chance to fetch its operands.
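A minimal sketch in Python of the stall arithmetic, using numbers consistent with the example above: the writer's result becomes available at the end of clock cycle 5, while the reader needs it at the start of clock cycle 4, so the reader must be held back two cycles.

def raw_stall(result_written_cycle, operand_needed_cycle):
    first_usable = result_written_cycle + 1   # the value can be consumed the cycle after it is written
    return max(0, first_usable - operand_needed_cycle)

print(raw_stall(result_written_cycle=5, operand_needed_cycle=4))   # -> 2 stall cycles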
Control hazard:
Occurs when the pipeline makes the wrong decision on a branch prediction and therefore brings
instructions into the pipeline that must subsequently be discarded.
Example: suppose instruction 3 is a conditional branch.
As soon as the branch is executed in step 6, the pipeline is flushed (instruction 3 itself is still able to complete) and instructions starting at instruction #20 are loaded into the pipeline.
Solutions:
Multiple streams
Loop buffer
Branch prediction (a small sketch follows this list)
Delayed branching
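A minimal sketch in Python (not from the original material) of the branch prediction idea listed above, using a two-bit saturating counter: the prediction is only reversed after two wrong guesses in a row, so a single loop-exit mispredict does not flip a strongly-taken prediction.

class TwoBitPredictor:
    def __init__(self):
        self.counter = 2             # 0-1 predict not taken, 2-3 predict taken

    def predict(self):
        return self.counter >= 2     # True means "predict taken"

    def update(self, taken):
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

# A loop-closing branch: taken nine times, then not taken once at loop exit.
p = TwoBitPredictor()
mispredicts = 0
for taken in [True] * 9 + [False]:
    if p.predict() != taken:
        mispredicts += 1             # wrong guess: fetched instructions must be flushed
    p.update(taken)
print(mispredicts)                   # -> 1: only the final, loop-exit branch is mispredicted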
We assume a superscalar pipeline capable of fetching and decoding two instructions at a time,
and having two instances of the write-back pipeline stage.
Instructions are fetched two at a time and passed to the decode unit.
Because instructions are fetched in pairs, the next two instructions must wait until the pair of
decode pipeline stages has cleared.
To guarantee in-order completion, when there is a conflict for a functional unit or when a
functional unit requires more than one cycle to generate a result, the issuing of instructions
temporarily stalls.
In this example, the elapsed time from decoding the first instruction to writing the last results is
eight cycles.
Out-of-order completion allows I3 to be completed earlier, with the net result of a savings of one cycle.
With out-of-order completion, any number of instructions may be in the execution stage at any
one time, up to the maximum degree of machine parallelism across all functional units.
With out-of-order issue, the decode and execute stages are decoupled by a buffer called the instruction window. During each of the first three cycles, two instructions are fetched into the decode stage.
During each cycle, subject to the constraint of the buffer size, two instructions move from the
decode stage to the instruction window.
In this example, it is possible to issue instruction I6 ahead of I5 (recall that I5 depends on I4, but I6
does not).
Thus, one cycle is saved in both the execute and write-back stages, and the end-to-end savings, compared with the in-order-issue case, is one cycle.
An instruction being in the window simply implies that the processor has sufficient information
about that instruction to decide when it can be issued.
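A minimal sketch in Python of issuing from an instruction window (the register names, one-cycle result latency, and two-wide issue are illustrative assumptions): an instruction is issued as soon as its source values are available and an issue slot is free, which is how I6 can be issued ahead of I5 while I5 waits on I4.

window = [
    ("I4", "r4", set()),             # produces r4, no outstanding dependencies
    ("I5", "r5", {"r4"}),            # needs r4, i.e. depends on I4
    ("I6", "r6", set()),             # independent of I4 and I5
]

ISSUE_WIDTH = 2                      # at most two instructions issued per cycle
available = set()                    # operand values already produced
issue_order = []
cycle = 0

while window:
    cycle += 1
    issued_now = [entry for entry in window
                  if entry[2] <= available][:ISSUE_WIDTH]
    for entry in issued_now:
        window.remove(entry)
        issue_order.append((cycle, entry[0]))
    available |= {entry[1] for entry in issued_now}   # results usable in the next cycle

print(issue_order)                   # -> [(1, 'I4'), (1, 'I6'), (2, 'I5')]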