dpco unit 4
1. The memory-reference instructions load word (lw) and store word (sw)
2. The arithmetic-logical instructions add, sub, AND, OR, and slt
3. The instructions branch equal (beq) and jump (j)
Much of what is needed to implement these three classes of instructions is the same,
independent of the exact class of instruction. For every instruction, the first two steps are identical:
1. Send the program counter (PC) to the memory that contains the code and fetch the
instruction from that memory.
2. Read one or two registers, using fields of the instruction to select the registers to read.
The load word instruction needs to read only one register, but most other instructions
require reading two registers.
After the above two steps, the actions required to complete the instruction depend on
the instruction class.
1. Memory Reference
2. Arithmetic-Logical
3. Branches
After using the ALU, the actions required to complete various instruction classes differ.
A memory-reference instruction will need to access the memory either to read data for
a load or write data for a store.
An arithmetic-logical or load instruction must write the data from the ALU or memory
back into a register.
A branch instruction may need to change the next instruction address based on the
comparison; otherwise, the PC should be incremented by 4 to get the address of the
next instruction.
All instructions start by using the program counter to supply the instruction address to
the instruction memory.
After the instruction is fetched, the register operands used by an instruction are
specified by fields of that instruction.
Once the register operands have been fetched, they can be operated on
1. To compute a memory address (for a load or store),
2. To compute an arithmetic result (for an integer arithmetic-logical instruction),
3. To compare (for a branch).
If the instruction is an arithmetic-logical instruction, the result from the ALU must be
written to a register.
If the operation is a load or store, the ALU result is used as an address to either store
a value from the registers or load a value from memory into the registers.
The result from the ALU or memory is written back into the register file.
Branches require the use of the ALU output to determine the next instruction address,
which comes either from the ALU (where the PC and branch offset are summed) or
from an adder that increments the current PC by 4.
Although Figure 3.1 shows most of the flow of data through the processor, it omits two
important aspects of instruction execution:
1. Data going to a particular unit may come from two different sources.
2. Several of the units must be controlled depending on the type of instruction.
First aspect: Data going to a particular unit may come from two different sources.
The value written into the PC can come from one of two adders.
The data written into the register file can come from either the ALU or the data memory.
The second input to the ALU can come from a register or the immediate field of the
instruction.
These data lines cannot simply be wired together; a logic element must be added that
chooses from among the multiple sources and steers one of those sources to its
destination.
This selection is commonly done with a device called a multiplexor, also known as a
data selector.
The multiplexor selects from among several inputs based on the setting of its
control lines. The control lines are set based primarily on information taken from the
instruction being executed.
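As a rough illustration of the selection just described, a two-input multiplexor can be modelled as a function that passes one of its two data inputs to its output according to a single control line. The name mux2 and the convention that a control value of 0 selects the first input are assumptions made for this sketch only.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """Two-input multiplexor: pass in0 when select is 0, in1 when select is 1."""
    if select not in (0, 1):
        raise ValueError("select must be 0 or 1")
    return in1 if select else in0


# Example: choosing the second ALU input (the values are illustrative only).
register_value = 7
immediate_value = 100
alu_input_for_r_type = mux2(0, register_value, immediate_value)  # from a register
alu_input_for_load = mux2(1, register_value, immediate_value)    # from the immediate field
```

Because each multiplexor here has only two inputs, one control line is enough to drive it.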
Second aspect: Several of the units must be controlled depending on the type of
instruction.
The following Figure 3.2 shows the data path with the three required multiplexors added, as
well as control lines for the major functional units.
Control unit:
A control unit, which has the instruction as an input, is used to determine how to set
the control lines for the functional units and two of the multiplexors.
Function of the multiplexors:
The top multiplexor selects the value that replaces the PC (PC + 4 or the branch target
address).
The middle multiplexor, whose output returns to the register file, is used to steer the output
of the ALU (in the case of an arithmetic-logical instruction) or the output of the data memory
(in the case of a load) for writing into the register file.
Finally, the bottommost multiplexor is used to determine whether the second ALU input is
from the registers (for an arithmetic-logical instruction or a branch) or from the offset field of
the instruction (for a load or store).
The ALU control lines determine which operation the ALU performs: address calculation,
arithmetic-logical operation, or comparison.
Instruction memory:
The instruction memory need only provide read access because the data path does
not write instructions.
Because it is only read, the instruction memory can be treated as a combinational
element.
The output at any time reflects the contents of the location specified by the address
input.
No read control signal is needed.
Program Counter:
The program counter is the register containing the address of the instruction in the
program being executed.
It is a 32-bit register that is written at the end of every clock cycle, so it does not
need a write control signal.
Adder:
The adder is an ALU wired to always add its two 32-bit inputs and place the sum on its
output.
Fetching Phase:
To execute any instruction, we must start by fetching the instruction from memory.
To prepare for executing the next instruction, we must also increment the program
counter so that it points at the next instruction, 4 bytes later (PC + 4).
Together, the instruction memory, the program counter, and the adder form the portion
of the data path used for fetching instructions and incrementing the program
counter.
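The fetch step can be sketched as follows. This is a minimal model, assuming the instruction memory is represented as a Python dictionary from byte address to 32-bit word; the function name fetch, the addresses, and the instruction words are all hypothetical placeholders.

```python
WORD_MASK = 0xFFFFFFFF  # keep values within 32 bits


def fetch(pc: int, instruction_memory: dict) -> tuple:
    """Read the instruction at address pc and compute the incremented PC."""
    instruction = instruction_memory[pc]  # read only; no read control signal is modelled
    next_pc = (pc + 4) & WORD_MASK        # each instruction is 4 bytes long
    return instruction, next_pc


# Hypothetical program: the addresses and words are arbitrary placeholders.
program = {0x00400000: 0x11111111, 0x00400004: 0x22222222}
instruction, next_pc = fetch(0x00400000, program)
```

In the hardware both actions happen in the same clock cycle; the function simply returns both results together.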
Register File
A register file is a state element that contains a set of registers that can be read or
written by specifying the number of the register to be accessed.
The register file contains the register state of the computer.
Two elements are needed to implement the R-format ALU operations:
1. Register file
2. ALU
Register file:
The register file contains all the registers and has two read ports and one write port.
The register file always outputs the contents of the registers corresponding to the
Read register inputs, so no other control inputs are needed for reading.
A register write must be explicitly indicated by asserting the write control signal.
Each register number input is 5 bits wide, enough to specify one of the 32 registers;
the data inputs and outputs are each 32 bits wide.
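The behaviour of the register file can be sketched as a small class. This is a simplified model under stated assumptions: the class and method names are hypothetical, timing is not modelled, and the MIPS convention that register 0 always reads as zero is ignored.

```python
class RegisterFile:
    """32 registers with two read ports and one write port (simplified model)."""

    def __init__(self) -> None:
        self.regs = [0] * 32

    def read(self, read_reg1: int, read_reg2: int) -> tuple:
        # Reads need no control signal: the outputs always follow the
        # 5-bit register number inputs.
        return self.regs[read_reg1 & 0x1F], self.regs[read_reg2 & 0x1F]

    def write(self, reg_write: int, write_reg: int, data: int) -> None:
        # A write happens only when the write control signal is asserted.
        if reg_write:
            self.regs[write_reg & 0x1F] = data & 0xFFFFFFFF


rf = RegisterFile()
rf.write(1, 9, 42)   # write signal asserted: register 9 is updated
rf.write(0, 10, 99)  # write signal not asserted: register 10 is unchanged
value_a, value_b = rf.read(9, 10)
```

Masking the register numbers with 0x1F reflects the 5-bit width of the register number inputs.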
ALU:
The ALU takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit signal
that indicates whether the result is 0.
The operation to be performed by the ALU is controlled by the ALU operation
signal, which is 4 bits wide.
The Zero detection output is used to implement branches.
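A minimal sketch of such an ALU is given below. The 4-bit operation encodings (0000 AND, 0001 OR, 0010 add, 0110 subtract, 0111 set on less than) follow the usual textbook convention and are an assumption here, since these notes do not list them; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def to_signed(value: int) -> int:
    """Interpret a 32-bit pattern as a two's complement signed integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def alu(operation: int, a: int, b: int) -> tuple:
    """Return (result, zero) for a 4-bit operation code (assumed encodings)."""
    a &= MASK32
    b &= MASK32
    if operation == 0b0000:    # AND
        result = a & b
    elif operation == 0b0001:  # OR
        result = a | b
    elif operation == 0b0010:  # add
        result = (a + b) & MASK32
    elif operation == 0b0110:  # subtract
        result = (a - b) & MASK32
    elif operation == 0b0111:  # set on less than (signed comparison)
        result = 1 if to_signed(a) < to_signed(b) else 0
    else:
        raise ValueError("unsupported ALU operation")
    zero = 1 if result == 0 else 0  # 1-bit Zero output
    return result, zero
```

Subtracting two equal values gives a result of 0 and therefore Zero = 1, which is how the Zero output supports branch equal.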
Consider the MIPS load word and store word instructions, which have the general
form:
1. lw $t1,offset_value($t2)
2. sw $t1,offset_value($t2)
These instructions compute a memory address by adding the base register, which is
$t2, to the 16-bit signed offset field contained in the instruction.
If the instruction is a store, the value to be stored must also be read from the
register file, where it resides in $t1.
If the instruction is a load, the value read from memory must be written into the
register file in the specified register, which is $t1.
To implement the MIPS load and store instructions, we need the following four units:
1. Register file
2. ALU
3. Data Memory
4. Sign extension unit
1. Register file:
The register file contains all the registers and has two read ports and one write port.
The register file always outputs the contents of the registers corresponding to the
Read register inputs, so no other control inputs are needed for reading.
A register write must be explicitly indicated by asserting the write control signal.
Each register number input is 5 bits wide, enough to specify one of the 32 registers.
2. ALU:
The ALU takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit signal
that indicates whether the result is 0.
The operation to be performed by the ALU is controlled by the ALU operation
signal, which is 4 bits wide.
The Zero detection output is used to implement branches.
3. Data Memory:
The memory unit is a state element with inputs for the address and the write data.
It produces a single output for the read result.
The data memory unit has separate read and write control lines.
Unlike the register file, the memory unit needs a read signal, because reading the
value of an invalid address can cause problems.
4. Sign extension unit:
The sign extension unit takes a 16-bit input and sign-extends it into a 32-bit result.
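The sign extension and address calculation used by lw and sw can be sketched as below. The function names are hypothetical, and values are kept as unsigned 32-bit patterns, which is an assumption of this model rather than something the notes specify.

```python
def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit pattern."""
    value &= 0xFFFF
    if value & 0x8000:       # sign bit set: fill the upper 16 bits with 1s
        value |= 0xFFFF0000
    return value


def effective_address(base: int, offset16: int) -> int:
    """Base register plus sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF


forward = effective_address(0x1000, 0x0008)   # offset +8
backward = effective_address(0x1000, 0xFFFC)  # offset -4 as a 16-bit pattern
```

A negative offset such as 0xFFFC (that is, -4) ends up subtracting from the base once the result is wrapped to 32 bits.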
1. beq - branch equal
2. bne - branch not equal
The beq instruction has three operands:
1. Two operands are registers that are compared for equality.
2. One operand is a 16-bit offset used to compute the branch target address relative to
the branch instruction address.
beq $t1,$t2,offset
Branch target address: the address specified in a branch, which becomes the new program
counter (PC) if the branch is taken.
If the operands are equal, the branch target address becomes the new PC, and the
branch is said to be taken.
If the operands are not equal, the incremented PC replaces the current PC,
and the branch is said to be not taken.
The branch data path performs two operations:
1. Compute the branch target address.
2. Compare the register contents.
To compute the branch target address, the branch data path includes a sign
extension unit and an adder.
To perform the compare, the register file supplies the two register operands and the
ALU compares them.
The adder computes the branch target as the sum of the incremented PC and the
sign-extended lower 16 bits of the instruction, shifted left by 2 bits.
Shift left 2 is simply a routing of the signals between input and output that adds
00 (binary) to the low-order end of the sign-extended offset field.
Control logic is used to decide whether the incremented PC or the branch target should
replace the PC, based on the Zero output of the ALU, as shown in fig 3.3.
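The branch target calculation and the choice of the next PC can be sketched as follows. The signal name branch (a control line that is 1 for a beq instruction) and the function names are assumptions made for this illustration.

```python
def branch_target(pc_plus_4: int, offset16: int) -> int:
    """Incremented PC plus the sign-extended offset shifted left by 2 bits."""
    offset16 &= 0xFFFF
    extended = (offset16 | 0xFFFF0000) if offset16 & 0x8000 else offset16
    return (pc_plus_4 + (extended << 2)) & 0xFFFFFFFF


def next_pc(pc: int, offset16: int, branch: int, zero: int) -> int:
    """Select PC + 4 or the branch target, based on the ALU Zero output."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    taken = branch & zero  # taken only for a branch whose operands are equal
    return branch_target(pc_plus_4, offset16) if taken else pc_plus_4


taken_pc = next_pc(0x1000, 3, branch=1, zero=1)      # 0x1004 + 3 * 4
not_taken_pc = next_pc(0x1000, 3, branch=1, zero=0)  # falls through to PC + 4
```

The offset counts instructions rather than bytes, which is why it is shifted left by 2 before being added.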
By combining the data path components of the individual instruction classes, we can form a
single data path and add the control to complete the implementation.
This single data path executes every instruction in one clock cycle, which means that no
data path resource can be used more than once per instruction.
Any element needed more than once must therefore be duplicated.
To share a data path element between two different instruction classes, we need to
allow multiple connections to the input of the element.
To provide multiple connections, we use a multiplexor and a control signal to
select among the multiple inputs.
For example, consider two different instruction classes:
1. Arithmetic and logical instructions (R-type)
2. Memory instructions
Differences between R-type (arithmetic and logical) instructions and memory instructions:
S.No  R-type instruction                      Memory instruction
1     Gets both operands from registers to    Gets one operand from a register and the
      perform the ALU operation.              other from the sign-extended 16-bit offset
                                              field of the instruction to do the address
                                              calculation.
2     The ALU result is stored in the         For a load, the value read from memory is
      destination register.                   stored in the destination register.
A single data path must serve both of these instruction classes.
This can be achieved by using a single register file and a single ALU to handle both types of
instructions, together with multiplexors.
To create a data path with only a single register file and a single ALU, we need to
provide two different sources for the second input of the ALU,
because both instruction classes take the first operand from a register but differ in the
second operand.
The two classes also take their result from different places, so we need to support two
different sources for the data stored into the register file.
This requires two multiplexors: one placed at the ALU input and
another at the data input to the register file.
Fig 3.4 Data path for the memory and R-Type instructions
The simple data path for the core MIPS architecture is obtained by combining the data
path for instruction fetch, the data path for R-type and memory instructions shown in
figure 3.4, and the data path for branches.
The branch instruction uses the main ALU for comparison of the register operands,
so a separate adder is needed to compute the branch target address.
An additional multiplexor is required to select either the sequentially following
instruction address (PC + 4) or the branch target address to be written into the
PC.
To complete this simple data path, we must add the control unit.
The control unit must be able to take inputs and generate a write signal for each state
element, the selector control for each multiplexor, and the ALU control.
The ALU control is different in a number of ways, and it is useful to design it first,
before designing the rest of the control unit.
The simple data path for the core MIPS architecture, formed by combining the elements
required by the different instruction classes, is shown in fig 3.5.
Fig 3.5 Simple data path for the core MIPS architecture by combining elements required
by different instruction classes
Any instruction set can be implemented in many different ways like single-cycle
implementation and multicycle implementation.
In a basic single-cycle implementation, all operations take the same amount of time:
a single cycle.
A multicycle implementation allows faster operations to take less time than slower
ones, so overall performance can be increased.
Load word and store word instructions: the ALU computes the memory address by
addition.
R-type instructions: the ALU performs one of five actions (AND, OR, subtract,
add, or set on less than), depending on the value of the 6-bit funct (or function) field in
the low-order bits of the instruction.
Branch equal: the ALU performs a subtraction.
We can generate the 4-bit ALU control input using a small control unit.
Its inputs are the function field of the instruction and a 2-bit control field called ALUOp.
ALUOp indicates which of three kinds of operation is to be performed:
1. add (00) for loads and stores,
2. subtract (01) for beq, or
3. the operation encoded in the funct field (10).
The output of the ALU control unit is a 4-bit signal that directly controls the ALU by generating
one of the 4-bit combinations.
Figure 3.6 shows how to set the ALU control inputs based on the 2-bit ALUOp
control and the 6-bit function code.
Fig 3.6 The ALU control inputs based on the 2-bit ALUOp control and the 6-bit function
code.
When ALUOp is 00 or 01, the ALU action does not depend on the function code
field.
In these cases we do not care about the value of the function code, so the function field
is shown as XXXXXX.
When ALUOp is 10, the function code is used to set the ALU control
input.
Multiple levels of decoding:
1. The main control unit generates the ALUOp bits.
2. The ALUOp bits are used as an input to the ALU control.
3. The ALU control generates the actual signals that control the ALU.
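The second level of decoding can be sketched as a small function from ALUOp and funct to the 4-bit ALU control value. The funct codes are the standard MIPS encodings for add, sub, AND, OR, and slt; the 4-bit output encodings follow the usual textbook convention and are an assumption here, since the figure with the actual values is not reproduced in these notes. The names alu_control and FUNCT_TO_OPERATION are hypothetical.

```python
# funct field (bits 5:0) -> assumed 4-bit ALU control value
FUNCT_TO_OPERATION = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # subtract
    0b100100: 0b0000,  # AND
    0b100101: 0b0001,  # OR
    0b101010: 0b0111,  # set on less than
}


def alu_control(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and the 6-bit funct field to the ALU control bits."""
    if alu_op == 0b00:  # loads and stores: add, funct is a don't care
        return 0b0010
    if alu_op == 0b01:  # beq: subtract, funct is a don't care
        return 0b0110
    if alu_op == 0b10:  # R-type: the operation comes from the funct field
        return FUNCT_TO_OPERATION[funct & 0x3F]
    raise ValueError("ALUOp value 11 is not used in this sketch")
```

For ALUOp values 00 and 01 the funct argument is ignored, which corresponds to the XXXXXX entries.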
Mapping the 2-bit ALUOp field and 6-bit funct field
There are several different ways to implement the mapping from the 2-bit ALUOp
field and the 6-bit funct field to the four ALU operation control bits.
Of the 64 possible values of the function field, only a small number are
of interest.
The function field is used only when the ALUOp bits equal 10.
A small piece of logic can therefore recognize the subset of possible values and cause the
correct setting of the ALU control bits.
To design this logic, we first create a truth table for the function code field and the
ALUOp bits.
Truth table: It is a representation of a logical operation by listing all the values of the inputs
and then in each case showing what the resulting outputs should be.
Don’t-care term: An element of a logical function in which the output does not depend on the
values of all the inputs. Don’t-care terms may be specified in different ways.
The truth table for the 4 ALU control bits (called Operation).
Opcode: The field that denotes the operation and format of an instruction.
(a) Instruction format for R-format instructions, which all have an opcode of 0. These
instructions have three register operands: rs, rt, and rd. Fields rs and rt are sources, and rd is
the destination. The ALU function is in the funct field and is decoded by the ALU control
design in the previous section. The R-type instructions that we implement are add, sub, AND,
OR, and slt. The shamt field is used only for shifts.
(b) Instruction format for load (opcode = 35 decimal) and store (opcode = 43 decimal) instructions.
The register rs is the base register that is added to the 16-bit address field to form the memory
address. For loads, rt is the destination register for the loaded value. For stores, rt is the
source register whose value should be stored into memory.
(c) Instruction format for branch equal (opcode = 4). The registers rs and rt are the source
registers that are compared for equality. The 16-bit address field is sign-extended, shifted, and
added to PC + 4 to compute the branch target address.
The op field, also called the opcode, is always contained in bits 31:26.
The two registers to be read are always specified by the rs and rt fields, at positions
25:21 and 20:16.
The base register for load and store instructions is always in bit positions 25:21 (rs).
The 16‑bit offset for branch equal, load, and store is always in positions 15:0.
The destination register is in one of two places. For a load it is in bit positions 20:16
(rt); for an R-type instruction it is in bit positions 15:11 (rd).
So we need to add a multiplexor to select which field of the instruction is used to
indicate the register number to be written.
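The field positions listed above can be sketched as shifts and masks, together with the multiplexor that picks the write register. The names decode_fields and write_register are hypothetical, and the convention that a RegDst value of 1 selects rd is an assumption of this sketch.

```python
def decode_fields(instruction: int) -> dict:
    """Split a 32-bit instruction word into the fields used by the data path."""
    return {
        "op": (instruction >> 26) & 0x3F,  # bits 31:26
        "rs": (instruction >> 21) & 0x1F,  # bits 25:21
        "rt": (instruction >> 16) & 0x1F,  # bits 20:16
        "rd": (instruction >> 11) & 0x1F,  # bits 15:11
        "funct": instruction & 0x3F,       # bits 5:0
        "offset": instruction & 0xFFFF,    # bits 15:0
    }


def write_register(fields: dict, reg_dst: int) -> int:
    """Select rd (R-type) or rt (load) as the register number to be written."""
    return fields["rd"] if reg_dst else fields["rt"]


# add $t1,$t2,$t3 built from its fields: $t1 = 9, $t2 = 10, $t3 = 11, funct = 100000.
word = (10 << 21) | (11 << 16) | (9 << 11) | 0b100000
fields = decode_fields(word)
```

The funct and offset fields overlap in the word; which one is meaningful depends on the instruction format.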
Using this information, we can add the instruction labels and extra multiplexor to the
simple datapath as shown in fig 3.7
Fig 3.7 The datapath with all necessary multiplexors and all control lines
identified.
Figure 3.7 shows these additions plus the ALU control block, the write signals for state
elements, the read signal for the data memory, and the control signals for the
multiplexors. Since all the multiplexors have two inputs, they each require a single control
line.
The setting of the control lines depends only on the opcode, so we define whether each
control signal should be 0, 1, or don’t care (X) for each of the opcode values.
The truth table below shows how the control signals should be set for each opcode.
R-Format:
The first row of the table corresponds to the R-format instructions (add, sub, AND, OR,
and slt).
For all these instructions, the source register fields are rs and rt, and the destination
register field is rd; this defines how the signals ALUSrc and RegDst are set.
An R-type instruction writes a register (RegWrite = 1), but neither reads nor writes data
memory.
The ALUOp field for R-type instructions is set to 10 to indicate that the ALU control
should be generated from the funct field.
Branch instruction:
The branch instruction is similar to an R-format operation, since it sends the rs and rt
registers to the ALU.
The ALUOp field for branch is set for a subtract (ALUOp = 01), which is used to
test for equality. Notice that the MemtoReg field is irrelevant when the RegWrite signal
is 0: since the register is not being written, the value of the data on the register data
write port is not used.
Thus, the entry MemtoReg in the last two rows of the table is replaced with X for don’t
care. Don’t cares can also be added to RegDst when RegWrite is 0.
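The control settings discussed above can be written out as a lookup table keyed by opcode. The values follow the standard textbook design for this data path; the signal names MemRead, MemWrite, and Branch are assumed here because they are not named in these notes, and X is modelled as None to stand for don't care.

```python
X = None  # don't care

# opcode -> control signal settings (assumed standard single-cycle values)
CONTROL_TABLE = {
    0:  dict(RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),  # R-format
    35: dict(RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),  # lw
    43: dict(RegDst=X, ALUSrc=1, MemtoReg=X, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),  # sw
    4:  dict(RegDst=X, ALUSrc=0, MemtoReg=X, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),  # beq
}


def main_control(opcode: int) -> dict:
    """Return the control line settings for one of the four supported opcodes."""
    return CONTROL_TABLE[opcode]
```

The don't-care entries appear only in rows where RegWrite is 0, matching the reasoning given for MemtoReg and RegDst.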
3.3.3 Operation of the DataPath:
1. R-type instructions
2. Load and store instructions
3. Branch instructions
For an R-type instruction, consider add $t1,$t2,$t3; it and the remaining four operations (sub,
AND, OR, slt) each execute in one clock cycle, as shown in fig 3.9.
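The handling of add $t1,$t2,$t3 can be traced with a short sketch that follows the sequence described earlier: fetch, register read, ALU operation, and write back. The function name, the use of a list for the registers, and the dictionary for instruction memory are assumptions of this model; in the hardware all of these actions take place within one clock cycle.

```python
def execute_r_type_add(pc: int, instruction_memory: dict, registers: list) -> int:
    """Carry out one add instruction and return the next PC."""
    # Fetch the instruction and increment the PC.
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF
    # Read the two source registers selected by the rs and rt fields.
    rs = (instruction >> 21) & 0x1F
    rt = (instruction >> 16) & 0x1F
    rd = (instruction >> 11) & 0x1F
    a, b = registers[rs], registers[rt]
    # The ALU adds the two register values.
    result = (a + b) & 0xFFFFFFFF
    # The ALU result is written into the destination register rd.
    registers[rd] = result
    return next_pc


registers = [0] * 32
registers[10] = 5  # $t2
registers[11] = 7  # $t3
add_word = (10 << 21) | (11 << 16) | (9 << 11) | 0b100000  # add $t1,$t2,$t3
new_pc = execute_r_type_add(0, {0: add_word}, registers)
```

After the call, register 9 ($t1) holds the sum and the PC has advanced by 4.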
The jump instruction looks somewhat like a branch instruction but computes the target
PC differently and is not conditional.
Like a branch, the low-order 2 bits of a jump address are always 00 (binary).
The next lower 26 bits of this 32-bit address come from the 26-bit immediate field in
the instruction.
The upper 4 bits of the address that should replace the PC come from the PC of the
jump instruction plus 4.
Thus, we can implement a jump by storing into the PC the concatenation of
1. The upper 4 bits of the current PC + 4
2. The 26-bit immediate field of the jump instruction
3. The bits 00 (binary)
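The concatenation that forms the jump address can be sketched with masks and shifts. The function name jump_target and the example values are hypothetical.

```python
def jump_target(pc: int, instruction: int) -> int:
    """Build the 32-bit jump address from PC + 4 and the 26-bit immediate field."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    upper4 = pc_plus_4 & 0xF0000000        # bits 31:28 of PC + 4
    immediate26 = instruction & 0x03FFFFFF  # bits 25:0 of the instruction
    # Shifting left by 2 places 00 in the two low-order bits.
    return upper4 | (immediate26 << 2)


# Illustrative values only: an immediate field of 1 with PC = 0x90000000.
example = jump_target(0x90000000, 0x08000001)
```

Shifting the 26-bit field left by 2 gives a 28-bit value, which together with the upper 4 bits forms the full 32-bit address.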
Fig 3.12 The simple control and datapath are extended to handle the jump instruction
The single-cycle implementation is rarely used in practice, for the following reasons:
1. It is inefficient.
2. The clock cycle must have the same length for every instruction.
3. Overall performance is poor because the clock cycle must be long enough for the
slowest instruction.