Addis Ababa Science & Technology University
College of Electrical & Mechanical Engineering
Course Name: Computer Architecture & Organization
Instructor Name: Tayachew Fikire
Mail Address: tayachew.fikire@aastu.edu.et
Major References: William Stallings, Computer Organization and Architecture
Course contents
• Basic Concepts & Computer Evolution
• The Central Processing Unit (CPU)
• Memory Systems
• Input/Output Systems
• Advanced Topics (parallel processing, introduction to operating systems)
Prepared by: Tayachew Fikire Computer architecture & organization 2
Course objective
The objective of the course is to enable the student to:
• Understand fundamentals of computer organization and architecture
along with design and simulation of a basic computer system.
Learning outcome
At the end of this course students will be able to:
• Distinguish between computer architecture and organization
• Know the different processor architectures, instruction sets, and addressing modes
• Describe memory organization and structures in a computer system
• Explain interfacing techniques for I/O devices with processors
Chapter 1: Basic Concepts & Computer Evolution
Lesson Objective:
• The objective of this lesson is to introduce students to the basic concepts of computer architecture & organization along with computer evolution
• Topics to be covered
• Basic Introduction
• Computer evolution
Chapter 1: Basic Concepts & Computer Evolution
Learning outcomes:
At the end of this chapter students will be able to:
• Distinguish between computer organization and architecture
• Understand computer structure & function
• Know about the historical background of computer systems
• Understand performance issues of computers
Part 1: Basic Concepts
Computer Architecture & Organization
Computer Architecture
• Attributes of a system visible to the programmer
• Have a direct impact on the logical execution of a program
Architectural attributes include:
• Instruction set, number of bits used to represent various data types, I/O mechanisms, techniques for addressing memory
Computer Organization
• The operational units and their interconnections that realize the architectural specifications
Organizational attributes include:
• Hardware details transparent to the programmer, control signals, interfaces between the computer and peripherals, memory technology used
Structure & Function
Hierarchical system
Set of interrelated subsystems
Hierarchical nature of complex systems is essential to both their design and their description
Designer need only deal with a particular level of the system at a time
Concerned with structure and function at each level
Structure: the way in which components relate to each other
Function: the operation of individual components as part of the structure
Function
There are four basic functions that a computer can perform:
Data processing
Data may take a wide variety of forms, and the range of processing requirements is broad
Data storage
Short-term
Long-term
Data movement
Input-output (I/O): when data are received from or delivered to a device (peripheral) that is directly connected to the computer
Data communications: when data are moved over longer distances, to or from a remote device
Control
A control unit manages the computer’s resources and orchestrates the performance of its functional parts in response to instructions
Structure: Top Level (figure)
Structure: CPU (figure)
Structure: Control Unit (figure)
Structural components
There are four main structural components of the computer:
CPU – controls the operation of the computer and performs its data processing functions
Main Memory – stores data
I/O – moves data between the computer and its external environment
System Interconnection – some mechanism that provides for communication among CPU, main memory, and I/O
Structural components
CPU major structural components:
Control Unit – controls the operation of the CPU and hence the computer
Arithmetic and Logic Unit (ALU) – performs the computer’s data processing function
Registers – provide storage internal to the CPU
CPU Interconnection – some mechanism that provides for communication among the control unit, ALU, and registers
Multicore Computer Structure
Central processing unit (CPU)
Portion of the computer that fetches and executes instructions
Consists of an ALU, a control unit, and registers
Referred to as a processor in a system with a single processing unit
Core
An individual processing unit on a processor chip
May be equivalent in functionality to a CPU on a single-CPU system
Specialized processing units are also referred to as cores
Processor
A physical piece of silicon containing one or more cores
Is the computer component that interprets and executes instructions
Referred to as a multicore processor if it contains multiple cores
Cache Memory
Multiple layers of memory between the processor and main memory
Is smaller and faster than main memory
Used to speed up memory access by placing in the cache data from main
memory that is likely to be used in the near future
A greater performance improvement may be obtained by using multiple levels of
cache, with level 1 (L1) closest to the core and additional levels (L2, L3, etc.)
progressively farther from the core
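To make "placing in the cache data that is likely to be used in the near future" concrete, here is a minimal sketch of a cache lookup. It assumes a direct-mapped cache (each memory block can live in exactly one cache line) with one-word blocks; the slides do not specify a mapping scheme, and the class and method names are hypothetical.

```python
# Toy direct-mapped cache (assumption: 1-word blocks, integer addresses).
class DirectMappedCache:
    def __init__(self, num_lines: int) -> None:
        self.num_lines = num_lines
        self.tags = [None] * num_lines  # tag held by each line; None = empty
        self.hits = 0
        self.misses = 0

    def access(self, address: int) -> bool:
        """Return True on a hit; on a miss, load the block and return False."""
        line = address % self.num_lines   # the only line this block may occupy
        tag = address // self.num_lines   # tells apart blocks sharing that line
        if self.tags[line] == tag:
            self.hits += 1
            return True
        self.misses += 1
        self.tags[line] = tag             # bring the block in from main memory
        return False


cache = DirectMappedCache(num_lines=4)
for addr in [0, 1, 2, 3, 0, 1, 2, 3]:     # a loop reusing the same addresses
    cache.access(addr)
print(cache.hits, cache.misses)           # 4 4
```

The first pass through the loop misses on every address; the second pass finds every block already in the cache, which is the locality effect that makes caches pay off.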
Elements of a Multicore Computer (figure)
Part 2: History & Evolution of Computers
History of Computers
First Generation: Vacuum Tubes
Vacuum tubes were used for digital logic elements and memory
ENIAC-Electronic Numerical Integrator And Computer
Decimal (not binary)
20 accumulators of 10 digits
Programmed manually by switches
18,000 vacuum tubes
15,000 square feet
140 kW power consumption
5,000 additions per second
History of Computers
First Generation: Vacuum Tubes
Von Neumann Machine (IAS computer)
Stored-program concept
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory and executing them
Input and output equipment operated by the control unit
Institute for Advanced Study (IAS) computer, Princeton
Completed 1952
Prototype of all subsequent general-purpose computers
History of Computers
Von Neumann Machine (structure figure)
History of Computers/IAS Computer
IAS computer: Details
Memory of 1,000 × 40-bit words; each word represents either
One 40-bit binary number, or
Two 20-bit instructions (each an 8-bit opcode followed by a 12-bit address)
Set of registers (storage locations in the CPU)
Memory Buffer Register (MBR)
Memory Address Register (MAR)
Instruction Register (IR)
Instruction Buffer Register (IBR)
Program Counter (PC)
Accumulator (AC)
Multiplier Quotient (MQ)
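The instruction word format above can be shown with a few shifts and masks. This is a minimal sketch, assuming the IAS convention of numbering bits from the left (bit 0 is the most significant), so the left instruction occupies the upper 20 bits. The function names are hypothetical, and the opcodes 00000001 and 00000101 are taken to be LOAD M(X) and ADD M(X) from the IAS instruction set table.

```python
# Split a 40-bit IAS word into two 20-bit (opcode, address) instructions.
MASK_12 = (1 << 12) - 1
MASK_20 = (1 << 20) - 1
WORD_LIMIT = 1 << 40


def split_instruction(instr: int) -> tuple[int, int]:
    """Split a 20-bit instruction into (8-bit opcode, 12-bit address)."""
    return (instr >> 12) & 0xFF, instr & MASK_12


def decode_word(word: int) -> tuple[tuple[int, int], tuple[int, int]]:
    """Return ((left_opcode, left_address), (right_opcode, right_address))."""
    if not 0 <= word < WORD_LIMIT:
        raise ValueError("an IAS word is exactly 40 bits wide")
    left = (word >> 20) & MASK_20   # bits 0-19 (most significant half)
    right = word & MASK_20          # bits 20-39 (least significant half)
    return split_instruction(left), split_instruction(right)


def encode_word(left: tuple[int, int], right: tuple[int, int]) -> int:
    """Pack two (opcode, address) pairs into one 40-bit word."""
    def pack(opcode: int, address: int) -> int:
        return ((opcode & 0xFF) << 12) | (address & MASK_12)
    return (pack(*left) << 20) | pack(*right)


# LOAD M(500) in the left half, ADD M(501) in the right half
w = encode_word((0b00000001, 500), (0b00000101, 501))
print(decode_word(w))  # ((1, 500), (5, 501))
```

Note that a 12-bit address field can name 2^12 = 4,096 locations, more than the 1,000 words the IAS actually had.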
History of Computers/IAS Computer: Registers
Memory buffer register (MBR): contains a word to be stored in memory or sent to the I/O unit, or is used to receive a word from memory or from the I/O unit
Memory address register (MAR): specifies the address in memory of the word to be written from or read into the MBR
Instruction register (IR): contains the 8-bit opcode of the instruction being executed
Instruction buffer register (IBR): employed to temporarily hold the right-hand instruction from a word in memory
Program counter (PC): contains the address of the next instruction pair to be fetched from memory
Accumulator (AC) and multiplier quotient (MQ): employed to temporarily hold operands and results of ALU operations
History of Computers/IAS Computer
IAS Memory Formats (figure)
History of Computers/IAS Computer
IAS: the Fetch-Execute Cycle
Fetch: load the binary code of the instruction from memory (or from the IBR)
Opcode goes into the IR
Address goes into the MAR
Execute: send the appropriate control signals to carry out what the instruction specifies
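The fetch and execute steps can be traced in a small simulator. This is a sketch under several assumptions: only three IAS instructions are modeled, with opcodes as given in the IAS instruction set table (LOAD M(X) = 00000001, ADD M(X) = 00000101, STOR M(X) = 00100001); the HALT opcode 0 is hypothetical (added only so the loop can stop); and numbers are treated as unsigned 40-bit values, whereas the real IAS used a sign-magnitude format.

```python
# Minimal IAS-style fetch-execute loop (sketch; see assumptions above).
LOAD, ADD, STOR = 0b00000001, 0b00000101, 0b00100001  # LOAD M(X), ADD M(X), STOR M(X)
HALT = 0b00000000  # hypothetical: not part of the IAS instruction set
MASK_12, MASK_20, MASK_40 = (1 << 12) - 1, (1 << 20) - 1, (1 << 40) - 1


def word(left_op: int, left_addr: int, right_op: int, right_addr: int) -> int:
    """Pack two (opcode, address) instructions into one 40-bit word."""
    left = (left_op << 12) | left_addr
    right = (right_op << 12) | right_addr
    return (left << 20) | right


def run(memory: list[int], pc: int = 0) -> list[int]:
    """Execute from address pc until HALT; return the (modified) memory."""
    ac = 0      # accumulator
    ibr = None  # instruction buffer register: pending right-hand instruction
    while True:
        # FETCH
        if ibr is not None:              # right-hand instruction already buffered
            ir, mar = ibr >> 12, ibr & MASK_12
            ibr = None
            pc += 1                      # both halves of this word are done
        else:
            mar = pc
            mbr = memory[mar]            # MBR <- M(MAR): read the instruction word
            ibr = mbr & MASK_20          # keep the right-hand instruction for later
            left = mbr >> 20
            ir, mar = left >> 12, left & MASK_12  # opcode -> IR, address -> MAR
        # EXECUTE
        if ir == HALT:
            return memory
        if ir == LOAD:
            ac = memory[mar]                      # AC <- M(X)
        elif ir == ADD:
            ac = (ac + memory[mar]) & MASK_40     # AC <- AC + M(X), kept to 40 bits
        elif ir == STOR:
            memory[mar] = ac                      # M(X) <- AC
        else:
            raise ValueError(f"unsupported opcode {ir:#010b}")


program = [
    word(LOAD, 3, ADD, 4),   # AC <- M(3); AC <- AC + M(4)
    word(STOR, 5, HALT, 0),  # M(5) <- AC; stop
    0,                       # unused
    7,                       # M(3)
    35,                      # M(4)
    0,                       # M(5): the result is stored here
]
print(run(program)[5])  # 42
```

Only one memory read is needed per instruction word: the left instruction goes straight into IR/MAR, and the right instruction waits in the IBR until the next fetch.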
IAS Instruction Set (table)
History of Computers
Second Generation: Transistors
Smaller
Cheaper
Dissipates less heat than a vacuum tube
Is a solid-state device made from silicon
Was invented at Bell Labs in 1947
It was not until the late 1950s that fully transistorized computers were commercially available
Computer Generations
Generation | Approximate Dates | Technology | Typical Speed (operations per second)
1 | 1946–1957 | Vacuum tube | 40,000
2 | 1957–1964 | Transistor | 200,000
3 | 1965–1971 | Small and medium scale integration | 1,000,000
4 | 1972–1977 | Large scale integration | 10,000,000
5 | 1978–1991 | Very large scale integration | 100,000,000
6 | 1991– | Ultra large scale integration | >1,000,000,000
Second-Generation Computers
Introduced:
More complex arithmetic and logic units and control units
The use of high-level programming languages
Provision of system software, which provided the ability to:
Load programs
Move data to peripherals
Use libraries to perform common computations
History of Computers
Third Generation: Integrated Circuits
1958 – the invention of the integrated circuit
Discrete component
Single, self-contained transistor
Manufactured separately, packaged in their own containers, and
soldered or wired together onto masonite-like circuit boards
Manufacturing process was expensive and cumbersome
The two most important members of the third generation were
the IBM System/360 and the DEC PDP-8
Third-Generation Computers: Integrated Circuits
A computer consists of gates, memory cells, and interconnections among these elements
Data storage – provided by memory cells
Data processing – provided by gates
Data movement – the paths among components are used to move data from memory to memory and from memory through gates to memory
Control – the paths among components can carry control signals
The gates and memory cells are constructed of simple digital electronic components
Exploits the fact that such components as transistors, resistors, and conductors can be fabricated from a semiconductor such as silicon
Many transistors can be produced at the same time on a single wafer of silicon
Transistors can be connected with a process of metallization to form circuits
Third-Generation Computers: Elements
Key concepts in an integrated circuit
A thin wafer of silicon is divided into a matrix of small areas, each a few millimeters square.
The identical circuit pattern is fabricated in each area, and the wafer is broken up into chips.
Each chip consists of many gates and/or memory cells plus a number of input and output attachment points.
This chip is then packaged in a housing that protects it and provides pins for attachment to devices beyond the chip.
A number of these packages can then be interconnected on a printed circuit board to produce larger and more complex circuits.
Moore’s Law
1965; Gordon Moore – co-founder of Intel
Observed that the number of transistors that could be put on a single chip was doubling every year
The pace slowed to a doubling every 18 months in the 1970s but has sustained that rate ever since
Consequences of Moore’s law:
• The cost of computer logic and memory circuitry has fallen at a dramatic rate
• The electrical path length is shortened, increasing operating speed
• The computer becomes smaller and is more convenient to use in a variety of environments
• Reduction in power and cooling requirements
• Fewer interchip connections
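A fixed doubling period means exponential growth: after t months a count N0 becomes N0 · 2^(t/T), where T is the doubling period. The sketch below evaluates that idealized curve; the starting counts and time spans are hypothetical inputs, and real chips only roughly follow it.

```python
import math


def projected_count(n0: float, years: float, doubling_months: float = 18.0) -> float:
    """Idealized count after `years` if it doubles every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return n0 * 2 ** doublings


def years_to_grow(factor: float, doubling_months: float = 18.0) -> float:
    """Years needed to grow by `factor` at a fixed doubling period."""
    return math.log2(factor) * doubling_months / 12


print(projected_count(1_000, 3))      # 4000.0  (two 18-month doublings)
print(projected_count(1_000, 3, 12))  # 8000.0  (the original one-year doubling)
print(years_to_grow(1_024))           # 15.0    (ten 18-month doublings)
```

The comparison between the 12- and 18-month periods shows why the slowdown mentioned above matters: over decades, the gap between the two curves compounds.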
Moore’s Law: growth in transistor count in integrated circuits (DRAM memory) (figure)
Moore’s Law: growth in CPU transistor count (figure)
History of Computers: Later Generations
Later generations are marked by:
LSI – Large Scale Integration
VLSI – Very Large Scale Integration
ULSI – Ultra Large Scale Integration
Semiconductor Memory
Microprocessors
Generations of Computers
Vacuum tube - 1946-1957
Transistor - 1958-1964
Small scale integration - 1965 on
Up to 100 devices on a chip
Medium scale integration - to 1971
100-3,000 devices on a chip
Large scale integration - 1971-1977
3,000 - 100,000 devices on a chip
Very large scale integration - 1978-1991
100,000 - 100,000,000 devices on a chip
Ultra large scale integration - 1991 on
Over 100,000,000 devices on a chip
Semiconductor Memory
In 1970 Fairchild produced the first relatively capacious semiconductor memory
The chip was about the size of a single core, could hold 256 bits of memory, had non-destructive readout, and was much faster than core
In 1974 the price per bit of semiconductor memory dropped below the price per bit of core memory
There has been a continuing and rapid decline in memory cost accompanied by a corresponding increase in physical memory density
Developments in memory and processor technologies changed the nature of computers in less than a decade
Since 1970 semiconductor memory has been through 13 generations
Each generation has provided four times the storage density of the previous generation, accompanied by declining cost per bit and declining access time
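The cumulative effect of "four times the density per generation" is easy to underestimate. The arithmetic below is idealized: it assumes every one of the 13 generations was an exact 4x step, and the function name is hypothetical.

```python
def density_factor(generations: int, factor_per_generation: int = 4) -> int:
    """Overall density multiplier after `generations` steps of `factor_per_generation`."""
    return factor_per_generation ** generations


total = density_factor(13)
print(f"{total:,}")      # 67,108,864
print(total == 2 ** 26)  # True: each 4x step is two doublings
```

Because 4 = 2^2, thirteen 4x generations amount to 26 doublings.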
Microprocessors
The density of elements on processor chips continued to rise
More and more elements were placed on each chip so that fewer and fewer
chips were needed to construct a single computer processor
1971 Intel developed 4004
First chip to contain all of the components of a CPU on a single chip
Birth of microprocessor
1972 Intel developed 8008
First 8-bit microprocessor
1974 Intel developed 8080
First general purpose microprocessor
Faster, has a richer instruction set, has a large addressing capability
Evolution of Intel Microprocessors: 1970s Intel processors
Processor | 4004 | 8008 | 8080 | 8086 | 8088
Introduced | 1971 | 1972 | 1974 | 1978 | 1979
Clock speeds | 108 kHz | 108 kHz | 2 MHz | 5 MHz, 8 MHz, 10 MHz | 5 MHz, 8 MHz
Bus width | 4 bits | 8 bits | 8 bits | 16 bits | 8 bits
Number of transistors | 2,300 | 3,500 | 6,000 | 29,000 | 29,000
Feature size (µm) | 10 | 8 | 6 | 3 | 6
Addressable memory | 640 Bytes | 16 KB | 64 KB | 1 MB | 1 MB
Evolution of Intel Microprocessors: 1980s Intel processors
Processor | 80286 | 386™ DX | 386™ SX | 486™ DX CPU
Introduced | 1982 | 1985 | 1988 | 1989
Clock speeds | 6 MHz - 12.5 MHz | 16 MHz - 33 MHz | 16 MHz - 33 MHz | 25 MHz - 50 MHz
Bus width | 16 bits | 32 bits | 16 bits | 32 bits
Number of transistors | 134,000 | 275,000 | 275,000 | 1.2 million
Feature size (µm) | 1.5 | 1 | 1 | 0.8 - 1
Addressable memory | 16 MB | 4 GB | 16 MB | 4 GB
Virtual memory | 1 GB | 64 TB | 64 TB | 64 TB
Cache | None | None | None | 8 kB
Evolution of Intel Microprocessors: 1990s Intel processors
Processor | 486™ SX | Pentium | Pentium Pro | Pentium II
Introduced | 1991 | 1993 | 1995 | 1997
Clock speeds | 16 MHz - 33 MHz | 60 MHz - 166 MHz | 150 MHz - 200 MHz | 200 MHz - 300 MHz
Bus width | 32 bits | 32 bits | 64 bits | 64 bits
Number of transistors | 1.185 million | 3.1 million | 5.5 million | 7.5 million
Feature size (µm) | 1 | 0.8 | 0.6 | 0.35
Addressable memory | 4 GB | 4 GB | 64 GB | 64 GB
Virtual memory | 64 TB | 64 TB | 64 TB | 64 TB
Cache | 8 kB | 8 kB | 512 kB L1 and 1 MB L2 | 512 kB L2
Evolution of Intel Microprocessors: Recent Intel processors
Processor | Pentium III | Pentium 4 | Core 2 Duo | Core i7 EE 4960X
Introduced | 1999 | 2000 | 2006 | 2013
Clock speeds | 450 - 660 MHz | 1.3 - 1.8 GHz | 1.06 - 1.2 GHz | 4 GHz
Bus width | 64 bits | 64 bits | 64 bits | 64 bits
Number of transistors | 9.5 million | 42 million | 167 million | 1.86 billion
Feature size (nm) | 250 | 180 | 65 | 22
Addressable memory | 64 GB | 64 GB | 64 GB | 64 GB
Virtual memory | 64 TB | 64 TB | 64 TB | 64 TB
Cache | 512 kB L2 | 256 kB L2 | 2 MB L2 | 1.5 MB L2/15 MB L3
Number of cores | 1 | 1 | 2 | 6
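The "Addressable memory" rows in these tables follow from the width of the physical address: n address bits can select 2^n distinct byte addresses. The sketch below reproduces several table entries; the address widths it uses are the commonly cited address-bus widths for these processors, which are not themselves listed in the tables, and the helper names are hypothetical.

```python
def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses selectable with `address_bits` lines."""
    return 2 ** address_bits


def human_size(n_bytes: int) -> str:
    """Format a power-of-two byte count with binary units (1 KB = 1024 B)."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if n_bytes < 1024:
            return f"{n_bytes} {unit}"
        n_bytes //= 1024
    return f"{n_bytes} PB"


# Assumed physical address widths (not taken from the tables above)
ADDRESS_BITS = {"8080": 16, "8086": 20, "80286": 24, "386 DX": 32, "Pentium Pro": 36}

for cpu, bits in ADDRESS_BITS.items():
    print(f"{cpu:12} {bits:2} bits -> {human_size(addressable_bytes(bits))}")
```

Each extra address bit doubles the addressable space, which is why the step from 20 bits (8086) to 24 bits (80286) multiplies the range by 16, from 1 MB to 16 MB.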
The Evolution of the Intel x86 Architecture
Two important processor families are the Intel x86 and the ARM architectures
Current x86 offerings represent the results of decades of design effort on complex instruction set computers (CISCs)
An alternative approach to processor design is the reduced instruction set computer (RISC)
ARM architecture is used in a wide variety of embedded systems and is one of the most powerful and best-designed RISC-based systems on the market
Highlights of the Evolution of the Intel Product Line:
8080
• World’s first general-purpose microprocessor
• 8-bit machine, 8-bit data path to memory
• Was used in the first personal computer (Altair)
8086
• A more powerful 16-bit machine
• Has an instruction cache, or queue, that prefetches a few instructions before they are executed
• The first appearance of the x86 architecture
• The 8088 was a variant of this processor and was used in IBM’s first personal computer (securing the success of Intel)
80286
• Extension of the 8086 enabling addressing a 16-MB memory instead of just 1 MB
80386
• Intel’s first 32-bit machine
• First Intel processor to support multitasking
80486
• Introduced the use of much more sophisticated and powerful cache technology and sophisticated instruction pipelining
• Also offered a built-in math coprocessor
ARM
Refers to a processor architecture that has evolved from RISC design
principles and is used in embedded systems
Family of RISC-based microprocessors and microcontrollers designed
by ARM Holdings, Cambridge, England
Chips are high-speed processors that are known for their small die size
and low power requirements
Probably the most widely used embedded processor architecture and
indeed the most widely used processor architecture of any kind in the
world
Acorn RISC Machine/Advanced RISC Machine
Highlights of the Evolution of the Intel Product Line (continued):
Pentium
• Intel introduced the use of superscalar techniques, which allow multiple instructions to execute in parallel
Pentium Pro
• Continued the move into superscalar organization with aggressive use of register renaming, branch prediction, data flow analysis, and speculative execution
Pentium II
• Incorporated Intel MMX technology, which is designed specifically to process video, audio, and graphics data efficiently
Pentium III
• Incorporated additional floating-point instructions
• Streaming SIMD Extensions (SSE)
Pentium 4
• Includes additional floating-point and other enhancements for multimedia
Core
• First Intel x86 microprocessor with a dual core (two cores on a single chip)
Core 2
• Extends the Core architecture to 64 bits
• Core 2 Quad provides four cores on a single chip
• More recent Core offerings have up to 10 cores per chip
• An important addition to the architecture was the Advanced Vector Extensions instruction set
Embedded Systems
The use of electronics and software within a product
Billions of computer systems are produced each year that are
embedded within larger devices
Today many devices that use electric power have an embedded
computing system
Often embedded systems are tightly coupled to their
environment
This can give rise to real-time constraints imposed by the need to
interact with the environment
Constraints such as required speeds of motion, required precision
of measurement, and required time durations, dictate the timing of
software operations
If multiple activities must be managed simultaneously this imposes
more complex real-time constraints
Embedded Systems
Possible organization of an embedded system (figure)
Deeply Embedded Systems
Subset of embedded systems
Has a processor whose behavior is difficult to observe both by the programmer and the
user
Uses a microcontroller rather than a microprocessor
Is not programmable once the program logic for the device has been burned into ROM
Has no interaction with a user
Dedicated, single-purpose devices that detect something in the environment, perform
a basic level of processing, and then do something with the results
Often have wireless capability and appear in networked configurations, such as
networks of sensors deployed over a large area
Typically have extreme resource constraints in terms of memory, processor size, time,
and power consumption
Embedded Operating Systems
There are two general approaches to developing an embedded operating system (OS):
• Take an existing OS and adapt it for the embedded application
• Design and implement an OS intended solely for embedded use
Application Processors versus Dedicated Processors
Application processors
• Defined by the processor’s ability to execute complex operating systems
• General-purpose in nature
• An example is the smartphone: the embedded system is designed to support numerous apps and perform a wide variety of functions
Dedicated processor
• Is dedicated to one or a small number of specific tasks required by the host device
• Because such an embedded system is dedicated to a specific task or tasks, the processor and associated components can be engineered to reduce size and cost
The Internet of Things (IoT)
Term that refers to the expanding interconnection of smart devices, ranging from
appliances to tiny sensors
Is primarily driven by deeply embedded devices
Generations of deployment culminating in the IoT:
Information technology (IT)
PCs, servers, routers, firewalls, and so on, bought as IT devices by enterprise IT people and
primarily using wired connectivity
Operational technology (OT)
Machines/appliances with embedded IT built by non-IT companies, such as medical machinery,
SCADA, process control, and kiosks, bought as appliances by enterprise OT people and primarily
using wired connectivity
Personal technology
Smartphones, tablets, and eBook readers bought as IT devices by consumers exclusively using
wireless connectivity and often multiple forms of wireless connectivity
Sensor/actuator technology
Single-purpose devices bought by consumers, IT, and OT people exclusively using wireless
connectivity, generally of a single form, as part of larger systems
It is the fourth generation that is usually thought of as the IoT and it is marked by the use of
billions of embedded devices
Cloud Computing
NIST defines cloud computing as:
“A model for enabling ubiquitous, convenient,
on-demand network access to a shared pool of
configurable computing resources that can be
rapidly provisioned and released with minimal
management effort or service provider interaction.”
You get economies of scale, professional network
management, and professional security management
The individual or company only needs to pay for the storage
capacity and services they need
Cloud provider takes care of security
Cloud Computing
Cloud Networking
Refers to the networks and network management functionality that must be in place to
enable cloud computing
One example is the provisioning of high-performance and/or high-reliability networking
between the provider and subscriber
The collection of network capabilities required to access a cloud, including making use of
specialized services over the Internet, linking enterprise data center to a cloud, and using
firewalls and other network security devices at critical points to enforce access security
policies
Cloud Storage
Subset of cloud computing
Consists of database storage and database applications hosted remotely on cloud servers
Enables small businesses and individual users to take advantage of data storage that scales
with their needs and to take advantage of a variety of database applications without having
to buy, maintain, and manage the storage assets
Cloud Computing (figure)
Part 3: Computer Performance Issues
Designing for Performance
The cost of computer systems continues to drop dramatically, while the performance and capacity of
those systems continue to rise equally dramatically
Today’s laptops have the computing power of an IBM mainframe from 10 or 15 years ago
Processors are so inexpensive that we now have microprocessors we throw away
Desktop applications that require the great power of today’s microprocessor-based systems include:
Image processing
Three-dimensional rendering
Speech recognition
Videoconferencing
Multimedia authoring
Voice and video annotation of files
Simulation modeling
Businesses are relying on increasingly powerful servers to handle transaction and database
processing and to support massive client/server networks that have replaced the huge mainframe
computer centers of yesteryear
Cloud service providers use massive high-performance banks of servers to satisfy high-volume,
high-transaction-rate applications for a broad spectrum of clients
Microprocessor Speed
Techniques built into contemporary processors include:
• Pipelining – the processor moves data or instructions into a conceptual pipe with all stages of the pipe processing simultaneously
• Branch prediction – the processor looks ahead in the instruction code fetched from memory and predicts which branches, or groups of instructions, are likely to be processed next
• Superscalar execution – the ability to issue more than one instruction in every processor clock cycle (in effect, multiple parallel pipelines are used)
• Data flow analysis – the processor analyzes which instructions are dependent on each other’s results, or data, to create an optimized schedule of instructions
• Speculative execution – using branch prediction and data flow analysis, some processors speculatively execute instructions ahead of their actual appearance in the program execution, holding the results in temporary locations and keeping execution engines as busy as possible
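The benefit of pipelining can be quantified with an idealized timing model. This sketch assumes k stages that each take exactly one clock cycle and no stalls from branches or data dependencies (the hazards that branch prediction, data flow analysis, and speculative execution try to hide); the function names are hypothetical.

```python
def cycles_unpipelined(n: int, k: int) -> int:
    """Cycles for n instructions when each passes through all k stages before the next starts."""
    return n * k


def cycles_pipelined(n: int, k: int) -> int:
    """Cycles for n instructions in an ideal k-stage pipeline (no stalls)."""
    if n == 0:
        return 0
    return k + (n - 1)  # k cycles to fill the pipe, then one result per cycle


def speedup(n: int, k: int) -> float:
    """Ideal speedup of the pipelined over the unpipelined execution."""
    if n < 1:
        raise ValueError("need at least one instruction")
    return cycles_unpipelined(n, k) / cycles_pipelined(n, k)


print(cycles_unpipelined(100, 5), cycles_pipelined(100, 5))  # 500 104
print(round(speedup(100, 5), 2))                             # 4.81
```

As the number of instructions grows, the speedup approaches the number of stages k; every stall pushes it back down, which is why the other techniques in the list exist.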
Performance Balance
Adjust the organization and architecture to compensate for the mismatch among the capabilities of the various components
Architectural examples include:
• Increase the number of bits that are retrieved at one time by making DRAMs “wider” rather than “deeper” and by using wide bus data paths
• Change the DRAM interface to make it more efficient by including a cache or other buffering scheme on the DRAM chip
• Reduce the frequency of memory access by incorporating increasingly complex and efficient cache structures between the processor and main memory
• Increase the interconnect bandwidth between processors and memory by using higher speed buses and a hierarchy of buses to buffer and structure data flow
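Two of these remedies, wider data paths and faster buses, act on the two factors of a simple product: peak bandwidth = (bytes per transfer) × (transfers per second). The bus widths and transfer rates below are hypothetical round numbers chosen for illustration, and the function name is hypothetical too.

```python
def peak_bandwidth(bus_width_bits: int, transfers_per_second: int) -> int:
    """Peak bytes per second for a path moving bus_width_bits on every transfer."""
    if bus_width_bits % 8:
        raise ValueError("bus width must be a whole number of bytes")
    return (bus_width_bits // 8) * transfers_per_second


narrow = peak_bandwidth(32, 100_000_000)  # 32-bit path, 100 M transfers/s
wide = peak_bandwidth(64, 100_000_000)    # same rate, twice as wide
faster = peak_bandwidth(64, 200_000_000)  # wider and faster
print(narrow, wide, faster)               # 400000000 800000000 1600000000
```

Doubling either the width or the transfer rate doubles the peak rate; doing both quadruples it. Real sustained bandwidth is lower, which is one reason the cache-based remedies matter too.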
Improvements in Chip Organization and Architecture
Increase hardware speed of processor
Fundamentally due to shrinking logic gate size
More gates, packed more tightly, increasing clock rate
Propagation time for signals reduced
Increase size and speed of caches
Dedicating part of the processor chip to cache
Cache access times drop significantly
Change processor organization and architecture
Increase effective speed of instruction execution
Parallelism
Problems with Clock Speed and Logic Density
Power
Power density increases with density of logic and clock speed
Dissipating heat
RC delay
Speed at which electrons flow limited by resistance and capacitance of metal
wires connecting them
Delay increases as the RC product increases
As components on the chip decrease in size, the wire interconnects become
thinner, increasing resistance
Also, the wires are closer together, increasing capacitance
Memory latency
Memory speeds lag processor speeds
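The RC-delay point can be made quantitative with a simplified first-order wire model. This is an illustrative assumption, not a full interconnect model: resistance R = ρL/(W·T) for a wire of length L, width W, and thickness T, and coupling capacitance to a neighbor C ≈ εT·L/S for spacing S, ignoring fringing fields and capacitance to the substrate. Then RC ∝ L²/(W·S), and the thickness cancels. The function name is hypothetical.

```python
def relative_rc(width_scale: float, spacing_scale: float, length_scale: float = 1.0) -> float:
    """RC delay relative to the original wire after scaling its geometry.

    First-order model (assumption): R ~ L / (W * T), C ~ T * L / S,
    so RC ~ L**2 / (W * S).
    """
    return length_scale ** 2 / (width_scale * spacing_scale)


print(relative_rc(0.5, 0.5))       # 4.0: same length, half the width and spacing
print(relative_rc(0.5, 0.5, 0.5))  # 1.0: a local wire that also gets shorter
```

Under this model, thinner wires raise resistance and closer wires raise capacitance, so long wires that do not shrink along with the transistors become relatively slower as feature sizes drop.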
Multicore
The use of multiple processors on the same chip provides the potential to increase performance without increasing the clock rate
Strategy is to use two simpler processors on the chip rather than one more complex processor
With two processors, larger caches are justified
As caches became larger, it made performance sense to create two and then three levels of cache on a chip