CS6461 - Computer Architecture Fall 2016 Instructor Morris Lancaster


CS6461 Computer Architecture

Fall 2016
Instructor Morris Lancaster
Adapted from Professor Stephen Kaisler's slides
Lecture 2 - Basic System Design
Hierarchical System Architecture

10/7/2017 CS6461 Computer Architecture - 2014 2


Dept. of Computer Science
Technology Trends

Processor
logic capacity: 2 x increase in performance every 1.5 - 2 years;
clock rate: about 25% per year
overall performance: 1000 x in last decade
Main Memory
DRAM capacity: 2 x every 2 years; 1000 x size in last decade
memory speed: about 10% per year
cost / bit: improves about 25% per year
Disk
capacity: > 2 x increase in capacity every 1.5 years
cost / bit: improves about 60% per year
120 x capacity in last decade
Disk architecture not much different from IBM's 10 MByte disks of the
early 1980s
Network Bandwidth
Bandwidth: 1 Gbit/s standard to the desktop in many places
Bandwidth: Probably 1 Tbit/s by end of decade, but may require new
infrastructure

Intel Processor Evolution

Processor Clock Speed

Cost Per GFLOP

# Servers Comprising WWW

Technology Progress

Compound annual growth rates (from the chart):
Transistors/chip: >100,000x since 1971
Disk density: >100,000,000x since 1956
Disk speed: only 12.5x since 1956
The disk speed barrier dominates everything!

The 1,000,000:1 disk-speed barrier

RAM access times: ~5-7.5 nanoseconds
CPU clock speed: <1 nanosecond
Interprocessor communication can be ~1,000x slower than on-chip
Disk seek times: ~2.5-3 milliseconds
Limit = rotation: 1/30,000 minutes = 1/500 seconds = 2 ms
Tiering brings it closer to ~1,000:1 in practice, but even so the difference is VERY BIG
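A quick back-of-the-envelope calculation makes the gap concrete (the latency values are representative figures taken from the slide, not measurements):

```java
public class SpeedGap {
    // Representative latencies from the slide (assumed typical values)
    static final double RAM_NS  = 6.0;   // RAM access: ~5-7.5 ns
    static final double DISK_MS = 2.0;   // rotational limit: ~2 ms

    // Ratio of disk latency to RAM latency
    static double ratio() {
        double diskNs = DISK_MS * 1_000_000;  // milliseconds -> nanoseconds
        return diskNs / RAM_NS;
    }

    public static void main(String[] args) {
        System.out.printf("Disk/RAM latency ratio: ~%.0f:1%n", ratio());
    }
}
```

With these inputs the ratio comes out in the hundreds of thousands, which is why the slide rounds it to "1,000,000:1".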

State of the Art

State-of-the-art PC (on your desk) now:


Processor clock speed: ~4 GigaHertz
Memory capacity: 2 to 8 GigaBytes (Windows 7 limits to 8
GBytes; Windows 8 limits to 128 GBytes on x64 )
Disk capacity: 1 TByte for <$79; 2 TBytes for <$129
Wow!!
In five years, we will need new units!
Mega -> Giga -> Tera -> Peta -> Exa (Big Data!)

Intel 4004 Die Photo

(2,250 transistors, 12 mm², 108 kHz, 1970)

Intel 80486 Die Photo

(1,200,000 transistors, 81 mm², 25 MHz, 1989)

Pentium Die Photo

(3,100,000 transistors; 296 mm²; 60 MHz, 1993)

I/O System Side

Each bus and adapter has its own specifications.


Interfaces are where the problems are - between functional units and
between the computer and the outside world
Need to design against constraints of performance, power, area, and cost

Issues

Performance:
the key to computing for most intensive problems
what's the secret? TIME, TIME, TIME
analogy to Real Estate: Location, Location, Location
Response Time:
How long does it take for my job/program to run?
How long does it take to execute my job/program?
[NOTE: These are not equivalent. Why not?]
How long must I wait for a database query?
Throughput:
How many jobs can the machine run at once?
What is the average execution rate?
How much work is getting done?
How long does it take to handle an interrupt?
Execution Times:
Elapsed Time: counts everything, disk and memory accesses, I/O waits, etc.
Sometimes, a useful number, but not good for comparison purposes
CPU Time: counts instruction execution times, but not I/O time; basis for
MIPS/MFLOPS; often divided into system time and user time
Q? What are MIPS and MFLOPS good measures of, if anything?
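The relationships among these metrics can be sketched directly; the formulas below are the standard ones (CPU time = instruction count × CPI / clock rate; MIPS = instructions / (time × 10⁶)), and the numbers in main are made up for illustration:

```java
public class PerfMetrics {
    // Classic CPU-time formula: instructions × cycles-per-instruction / clock rate
    static double cpuTimeSeconds(long instructions, double cpi, double clockHz) {
        return instructions * cpi / clockHz;
    }

    // MIPS counts instruction throughput but says nothing about work per instruction
    static double mips(long instructions, double execSeconds) {
        return instructions / (execSeconds * 1e6);
    }

    public static void main(String[] args) {
        long n = 1_000_000_000L;                      // 1B instructions (made up)
        double t = cpuTimeSeconds(n, 2.0, 1e9);       // CPI 2, 1 GHz clock
        System.out.println("CPU time: " + t + " s, MIPS: " + mips(n, t));
    }
}
```

Note what MIPS hides: a machine with a richer instruction set does more work per instruction, so two machines with equal MIPS can differ greatly in useful throughput.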

Let's start to design the machine for the CS211 CISC Computer!

State machine:
Reset -> Init (Initialize Machine) -> Fetch Instr. -> XEQ
XEQ dispatches on the instruction class: Branch, Load/Store, or Register-to-Register
Branch Taken: fetch from the target; Branch Not Taken (and all other instructions): Incr. PC, then Fetch
Analyze LDR/STR Instructions

From our analysis of LDR/LDA/STR instructions, what do we know? We need:
Memory Address Register (MAR)
Memory Buffer Register (MBR)
Program Counter (PC)
4 GPRs (given)
Instruction Register (IR)
Register Select Register (RSR)
Instruction Operation Register (Opcode)

How do these hook together?

How many registers do I need to access the RF? See Mul/Div instructions.
How do I hook in the Index Registers?
In the sketch: Memory connects through MAR and MBR; the IR holds the OpCode; index registers X1-X3 sit alongside GPRs R0-R3 and RFI; the ALU (with Carry), the PC, and the Condition Codes complete the datapath.

Execution Structure
The register file (R0-R3), IR, MBR, and PC feed two MUXes under Control.
The ALU comprises an Arithmetic Unit (inputs Data1, Data2, Carry), a Logical Unit (inputs Data1, Data2), and a Shifter (inputs Data1, Count); the Opcode selects the operation.
Results go to LRR, ARR, and SRR, from which ALU-Result is taken.
xRR = result registers, hold result of operation for store on next cycle

Comments on Multiplexors

Both the arithmetic unit and the logic unit are active
and produce outputs.
The mux determines whether the final result comes from the
arithmetic or logic unit.
The output of the other one is effectively ignored.
Our hardware scheme may seem like wasted effort, but it's not really.
Deactivating one or the other wouldn't save that much time.
We have to build hardware for both units anyway, so we might
as well run them together.
This is a very common use of multiplexers in logic
design.
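The scheme above can be sketched behaviorally in Java (the single operation chosen for each unit is a placeholder, not the machine's full repertoire):

```java
public class AluMux {
    // Both units always compute, exactly as on the slide
    static int arithmetic(int a, int b) { return a + b; }   // sketch: add only
    static int logical(int a, int b)    { return a & b; }   // sketch: AND only

    // 2:1 multiplexer: select == 0 -> arithmetic result, otherwise logic result
    static int mux(int select, int arithResult, int logicResult) {
        return (select == 0) ? arithResult : logicResult;
    }

    static int alu(int select, int a, int b) {
        int arith = arithmetic(a, b);        // computed regardless of select
        int logic = logical(a, b);           // computed regardless of select
        return mux(select, arith, logic);    // the other output is simply ignored
    }
}
```

For example, alu(0, 3, 5) yields the sum while alu(1, 3, 5) yields the AND; the unselected unit's work is discarded, just as in the hardware.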

Shifter

A shifter is most useful for arithmetic operations, since shifting is equivalent to multiplication by powers of two.
Shifting is necessary, for example, during floating-point arithmetic.
The simplest shifter is the shift register, which can shift
by one position per clock cycle.
So, the number of shifts equals the number of clock
cycles consumed.
A barrel shifter can shift by any number of positions in a single cycle, and allows rotations as well
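Both kinds can be sketched for an 18-bit word (the course machine's width; the method names are ours):

```java
public class Shifters {
    static final int WIDTH = 18;                 // 18-bit machine word
    static final int MASK  = (1 << WIDTH) - 1;

    // Shift register: one position per clock cycle, so n shifts cost n cycles.
    // One left shift doubles the value (overflowing bits fall off the top).
    static int shiftLeftOnce(int value) {
        return (value << 1) & MASK;
    }

    // Barrel shifter: rotate by any count in a single step
    static int rotateLeft(int value, int count) {
        count %= WIDTH;
        return ((value << count) | (value >>> (WIDTH - count))) & MASK;
    }
}
```

shiftLeftOnce(5) gives 10 (multiplication by two), while rotateLeft wraps the top bit back around instead of discarding it.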

Adder

The adder is probably the most studied digital circuit.


There are a great many ways to perform binary addition, each
with its own area/delay trade-offs.
Adder delay is dominated by carry chain.
Full Adder:
Computes one-bit sum, carry:
si = ai XOR bi XOR ci
ci+1 = aibi + aici + bici
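The two equations above translate directly into a ripple-carry adder, whose delay grows with the carry chain (bit widths here are a parameter; 18 matches the course machine):

```java
public class RippleCarryAdder {
    // One full adder: s_i = a_i XOR b_i XOR c_i; c_{i+1} = a_i b_i + a_i c_i + b_i c_i
    static int[] fullAdder(int a, int b, int cin) {
        int sum  = a ^ b ^ cin;
        int cout = (a & b) | (a & cin) | (b & cin);
        return new int[] { sum, cout };
    }

    // Chain 'width' full adders; the carry ripples bit by bit, dominating delay
    static int add(int a, int b, int width) {
        int result = 0, carry = 0;
        for (int i = 0; i < width; i++) {
            int ai = (a >> i) & 1, bi = (b >> i) & 1;
            int[] sc = fullAdder(ai, bi, carry);
            result |= sc[0] << i;
            carry = sc[1];
        }
        return result;  // carry out of the top bit is discarded in this sketch
    }
}
```

Faster adders (carry-lookahead, carry-select) spend area to shorten that carry chain, which is the area/delay trade-off the slide mentions.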

Instruction Path

Program Counter (PC)


Keeps track of program execution
Address of next instruction to read from memory
May have auto-increment feature or use ALU
Instruction Register (IR)
Current instruction
Includes ALU operation and address of operand
Also holds target of jump instruction
Immediate operands
Relationship to Data Path
PC may be incremented through ALU or separate adder
Contents of IR may also be required as input to ALU
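A minimal behavioral sketch of the fetch step, assuming a simple auto-incrementing PC (register names follow the slides; the memory size is arbitrary):

```java
public class FetchStep {
    // PC -> MAR, memory[MAR] -> MBR -> IR, then PC advances
    int[] memory = new int[2048];
    int pc, mar, mbr, ir;

    void fetch() {
        mar = pc;            // address of the next instruction
        mbr = memory[mar];   // memory read lands in the buffer register
        ir  = mbr;           // current instruction now in IR
        pc  = pc + 1;        // auto-increment; could instead route through the ALU
    }
}
```

Whether that final increment uses the main ALU or a dedicated adder is exactly the design question raised on the next slide.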

Questions?

How will you do Scalar Integer Multiply/Divide?


Just use the Java operators, but be sure to do the arithmetic on only 18
bits
Think about using a wrapper class that keeps just 18 bits? (java.lang.Integer is final, so it cannot be subclassed.)
There is no negating instruction. How will you compute
the negative of a number?
Should you use the Adder to increment the PC, or just
provide a separate adder circuit?
How will you detect overflow/underflow when doing
adding/subtracting?
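For the overflow question, one standard answer, sketched here for an 18-bit two's-complement word (the representation is our assumption): addition overflows when both operands have the same sign but the result's sign differs.

```java
public class Overflow18 {
    static final int WIDTH = 18;
    static final int SIGN  = 1 << (WIDTH - 1);   // bit 17 is the sign bit
    static final int MASK  = (1 << WIDTH) - 1;

    // Two's-complement overflow test for a + b, confined to 18 bits
    static boolean addOverflows(int a, int b) {
        int sum = (a + b) & MASK;
        boolean sameSign      = ((a ^ b)   & SIGN) == 0;  // operands agree in sign
        boolean resultDiffers = ((a ^ sum) & SIGN) != 0;  // result's sign flipped
        return sameSign && resultDiffers;
    }
}
```

For subtraction the same test applies to a + (-b). The largest positive 18-bit value plus one overflows; 1 + 1 does not.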

Simple Procedure Calls

Using a procedure involves the following sequence of


actions:
1. Put arguments in places known to procedure (registers)
2. Transfer control to procedure, saving the return address (JSR)
3. Acquire storage space, if required, for use by the procedure
4. Perform the desired task
5. Put results in places known to calling program (registers or
elsewhere)
6. Return control to calling point (RFS)
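Steps 2 and 6 can be sketched behaviorally; the use of R3 for the return address is an assumption here, suggested only by the later slide's remark about "implicit use of r3":

```java
public class CallSketch {
    int[] r = new int[4];   // R0-R3; assumed convention: R3 holds the return address
    int pc;

    // Step 2: transfer control to the procedure, saving the return address
    void jsr(int target) {
        r[3] = pc + 1;      // address of the instruction after the call
        pc = target;
    }

    // Step 6: return control to the calling point, with a result code
    void rfs(int code) {
        r[0] = code;        // assumed: immediate result placed in R0
        pc = r[3];
    }
}
```

Note that without a stack, a procedure that itself calls jsr would clobber R3, so this convention supports only non-nested calls.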

Example: Finding the absolute value of an integer

jsr abs ; assume integer in r0


. ; instruction after subroutine call

abs
str r0,0,<tempInt> ; store r0 in <tempInt>, some location
ldr r1,0,smask ; mask for sign bit = 100 000 000 000 000 000
and r1,r0 ; AND r1 and r0: if r0 bit is set it will be set in r1
jz r1,0,pos ; test if sign = 0, e.g., r0 bit 0 is 0
src r0,1,1,1 ; shift r0 logical left 1 bit
src r0,1,0,1 ; shift r0 logical right sets sign bit to 0
pos
rfs 1 ; return with 1 => true and r0 has absolute integer
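The same shift trick in Java for an 18-bit word (a sketch of the slide's method, not the course ISA): shifting left one position and then logically right one position clears the sign bit.

```java
public class Abs18 {
    static final int WIDTH = 18;
    static final int SIGN  = 1 << (WIDTH - 1);   // 100 000 000 000 000 000
    static final int MASK  = (1 << WIDTH) - 1;

    static int abs(int r0) {
        if ((r0 & SIGN) == 0) return r0;         // jz: sign clear, already positive
        // Mirror of the assembly: shift left 1 (sign bit falls off the top),
        // then logical right 1 (a zero comes back in at the sign position).
        return ((r0 << 1) & MASK) >>> 1;
    }
}
```

Note this clears the sign bit, which yields the magnitude only under a sign-magnitude reading of the word; in two's complement you would negate (invert and add one) instead.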

Soooo!

Convoluted?? Yes!
Why??
1. No jump-less-than or jump-greater-than instructions!
2. Did we really need them or were they a matter of convenience?
E.g., how many instructions did we save by not having them?
3. Implicit use of r3
