CA Classes-116-120

The document discusses the execution of load and store instructions in computer architecture. It describes the subtasks involved in executing load and store instructions, as well as different approaches for processing load/store instructions sequentially or in parallel with other instructions.

Computer Architecture Unit 5

Let us first consider a load instruction. Its execution begins with the
determination of the effective memory address (EA) from which data is to
be fetched. In straightforward cases, such as RISC processors, this can be done
in two steps: fetching the referenced address register(s) and calculating the
effective address. For CISC processors, however, address calculation may
be a complex task, requiring multiple subsequent register fetches and
address calculations, as, for instance, in the case of indexed, post-
incremented, or relative addressing. Once the effective address is available,
the next step is usually to forward the effective (virtual) address to the MMU
for translation and to access the data cache. Here, and in the subsequent
discussion, we shall not go into details of whether the referenced cache is
physically or virtually addressed, and thus we neglect the corresponding
issues. Furthermore, we assume that the referenced data is available in the
cache and thus it is fetched in one or a few cycles. Usually, fetched data is
made directly available to the requesting unit, such as the FX or FP unit,
through bypassing. Finally, the last subtask to be performed is writing the
accessed data into the specified register.
For a store instruction, the address calculation phase is identical to that
already discussed for loads. However, subsequently both the virtual address
and the data to be stored can be sent out in parallel to the MMU and the
cache, respectively. This concludes the processing of the store instruction.
Figure 5.8 shows the subtasks involved in executing load and store
instructions.

Figure 5.8: Subtasks of Executing Load and Store Instructions
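The subtasks described above can be sketched as a small simulation. This is a minimal illustrative sketch in Python, not a real ISA model: the register file, TLB, and cache are hypothetical dictionary stand-ins, and a cache/TLB hit is assumed throughout, as in the text.

```python
# Minimal sketch of the load/store subtasks described above.
# regs, tlb and cache are illustrative stand-ins, not a real ISA model.

regs = {"r1": 0x1000, "r2": 0}    # architectural register file
tlb = {0x1000: 0x8000}            # MMU: virtual -> physical translation
cache = {0x8000: 42}              # data cache, keyed by physical address

def execute_load(dst, base_reg, offset=0):
    ea = regs[base_reg] + offset  # 1. fetch address register, compute EA
    pa = tlb[ea]                  # 2. forward virtual EA to the MMU
    data = cache[pa]              # 3. access the data cache (hit assumed)
    regs[dst] = data              # 4. write fetched data into the register
    return data

def execute_store(src_reg, base_reg, offset=0):
    ea = regs[base_reg] + offset  # address calculation, identical to loads
    pa = tlb[ea]                  # the address and the data can then be sent
    cache[pa] = regs[src_reg]     # to the MMU and the cache in parallel

execute_load("r2", "r1")          # r2 now holds the cached value 42
```

Note how the store has no write-back subtask: once the address and data have been sent out, its processing is complete, exactly as the text states.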

Manipal University Jaipur B1648 Page No. 116



5.5.2 The design space


While considering the design space of pipelined load/store processing, we
take into account only one aspect: whether load/store operations
are executed sequentially or in parallel with FX instructions (Figure 5.9).
In traditional pipeline implementations, load and store instructions are
processed by the master pipeline. Thus, loads and stores are executed
sequentially with other instructions (Figure 5.9).

Figure 5.9: Sequential vs. Parallel Execution of Load/Store Instructions

In this case, the required address calculation of a load/store instruction can
be performed by the adder of the execution stage. However, one instruction
slot is needed for each load or store instruction.


A more effective technique for load/store instruction processing is to do it in
parallel with data manipulations (see again Figure 5.9). Obviously, this
approach assumes the existence of an autonomous load/store unit which
can perform address calculations on its own.
Let’s discuss both these techniques in detail.
5.5.3 Sequential consistency of instruction execution
When a processor operates multiple EUs (execution units) in parallel,
instruction execution can finish very quickly. However, instruction execution
must still maintain sequential consistency, which has two aspects:
1. Processor consistency - the order in which instructions complete;
2. Memory consistency - the order in which memory is accessed.
Processor consistency: The phrase processor consistency refers to the
consistency of instruction completion with sequential instruction execution.
Superscalar processors exhibit two types of processor consistency, namely
weak and strong consistency.
Weak processor consistency allows instructions to complete out of program
order, provided that no data dependencies are violated. Data dependencies
must therefore be detected and resolved during execution.
Strong processor consistency forces instructions to complete in program
order. This can be attained through a ROB (reorder buffer), a storage area
through which results are written back in program order.
Memory consistency: Another aspect of superscalar instruction execution is
whether memory accesses are performed in the same order as in a
sequential processor.
Memory consistency is weak if memory accesses may occur out of the strict
sequential program order, provided that data dependencies are not violated.
Simply stated, weak consistency permits load and store reordering, as long
as memory data dependencies are detected and resolved.
Memory consistency is strong if memory accesses occur strictly in program
order, i.e., load/store reordering is prohibited.
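The weak-consistency rule above can be made concrete with a small check: reordering of memory accesses is legal as long as accesses to the same address (of which at least one is a store) keep their relative program order. This is a hedged sketch; the access sequences are invented examples, not taken from the text.

```python
# Sketch: does an observed memory-access order respect the data
# dependencies of the program order? Accesses are (kind, address) pairs.

def dependencies_preserved(program_order, observed_order):
    for i, a in enumerate(program_order):
        for b in program_order[i + 1:]:
            same_addr = a[1] == b[1]
            has_store = a[0] == "store" or b[0] == "store"
            if same_addr and has_store:
                # dependent pair: relative order must match program order
                if observed_order.index(a) > observed_order.index(b):
                    return False
    return True

program = [("store", "A"), ("load", "B"), ("load", "A")]
# Hoisting the independent load of B is a legal weak-consistency reordering:
assert dependencies_preserved(
    program, [("load", "B"), ("store", "A"), ("load", "A")])
# Moving the load of A above the store of A breaks a data dependence:
assert not dependencies_preserved(
    program, [("load", "A"), ("store", "A"), ("load", "B")])
```

Under strong memory consistency, only the identity ordering (observed order equal to program order) would be accepted.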


Load and Store reordering


Load and store instructions affect both the processor and the memory.
First, the ALU or a dedicated address unit computes the addresses; then the
load and store instructions are executed.
A load fetches the referenced data from the data cache once its address is
available; similarly, once the generated address is received, a store
instruction can send out its operand.
A processor implementing weak memory consistency permits memory
access reordering. This is advantageous for three reasons:
1. it permits load/store bypassing;
2. it makes speculative loads and stores feasible;
3. it allows cache misses to be hidden.
Load/Store bypassing
Load/store bypassing means that either loads can bypass pending stores or
vice versa, without violating memory data dependencies. Allowing loads to
bypass stores has the advantage of enabling the runtime overlapping of loop
iterations: loads at the beginning of an iteration can access memory without
having to wait until the stores at the end of the previous iteration are
finished. To prevent fetching a stale data value, a load may bypass pending
stores only if none of the preceding stores has the same target address as
the load. However, the addresses of some pending stores may not yet be
available.
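The bypass condition just stated can be expressed as a short predicate. This is a minimal sketch under the text's assumptions: a pending store whose address is still unknown is represented as `None`, and the load is then conservatively blocked.

```python
# Sketch of the load-bypass check described above: a load may bypass
# the pending stores only if no earlier store targets the same address
# and no earlier store address is still unknown (None).

def load_may_bypass(load_addr, pending_store_addrs):
    for store_addr in pending_store_addrs:
        if store_addr is None:        # store address not yet computed:
            return False              # conservatively block the load
        if store_addr == load_addr:   # same target address: true dependence
            return False
    return True

assert load_may_bypass(0x100, [0x200, 0x300])        # no conflict: bypass
assert not load_may_bypass(0x100, [0x200, 0x100])    # address match: wait
assert not load_may_bypass(0x100, [0x200, None])     # unknown address: wait
```

Speculative loads, discussed next, relax exactly the last case: instead of blocking on an unknown store address, the load proceeds and is checked later.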
Speculative loads
Speculative loads avoid memory access delays that would otherwise be
caused by addresses that have not yet been computed or by clashes among
addresses. Speculative loads must be checked for correctness and, if
necessary, corrective measures taken; in this respect they resemble
speculative branches.
To perform the address check, the computed target addresses of loads and
stores are written into the ROB (reorder buffer), where the address
comparison is carried out.
Reorder buffer (ROB)
The ROB was introduced in 1988 as a solution to the precise interrupt
problem. Today, the ROB is the mechanism that ensures sequentially
consistent execution when multiple EUs operate in parallel.

The ROB is a circular buffer with head and tail pointers. Instructions enter
the ROB in program order only, and an instruction can retire only if it has
finished and all previous instructions have already retired.
Sequential consistency is maintained by having instructions update the
program state, i.e., write their results into memory or the referenced
architectural register(s), in proper program order. The ROB can thus
successfully support both precise interrupt handling and speculative
execution.
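The circular-buffer behaviour just described can be sketched as follows. This is an illustrative model only, with invented field names; a real ROB also tracks destination registers, results, and exception status.

```python
# Minimal circular-buffer ROB sketch: instructions enter at the tail in
# program order and retire from the head only when they have finished
# and all earlier entries have already retired.

class ReorderBuffer:
    def __init__(self, size):
        self.entries = [None] * size   # circular storage
        self.head = 0                  # oldest instruction (next to retire)
        self.tail = 0                  # next free slot
        self.count = 0

    def dispatch(self, instr):
        """Enter an instruction at the tail, in program order."""
        assert self.count < len(self.entries), "ROB full"
        self.entries[self.tail] = {"instr": instr, "finished": False}
        self.tail = (self.tail + 1) % len(self.entries)
        self.count += 1

    def mark_finished(self, instr):
        """Record out-of-order completion of an instruction."""
        for e in self.entries:
            if e and e["instr"] == instr:
                e["finished"] = True

    def retire(self):
        """Retire finished instructions strictly from the head."""
        retired = []
        while self.count and self.entries[self.head]["finished"]:
            retired.append(self.entries[self.head]["instr"])
            self.entries[self.head] = None
            self.head = (self.head + 1) % len(self.entries)
            self.count -= 1
        return retired
```

For example, if instruction i2 finishes before i1, `retire()` returns nothing until i1 also finishes, after which both retire in program order; this is precisely how the ROB turns out-of-order completion into in-order state update.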
5.5.4 Instruction Issuing and parallel execution
In this phase, execution tuples are created, and it is then decided which
execution tuples can be issued. Checking the availability of data and
resources at run time is known as instruction issuing. From the instruction-
issue area, multiple pipelines are fed.
In figure 5.10 you can see a reorder buffer which follows FIFO order.

Figure 5.10: A Reorder Buffer.

In this buffer, entries are received and released in FIFO order. An instruction
can be executed as soon as its input operands are available; other
instructions may still be waiting in the issue stage.
Other constraints are associated with the buffers carrying the execution
tuples. In figure 5.11 you can see the Parallel Execution Schedule (PES) of
